feat(agent): add GraphNet Agent with single-model and multi-GPU parallel extraction by fangfangssj · Pull Request #704 · PaddlePaddle/GraphNet

fangfangssj · 2026-05-13T12:23:38Z

PR Category

Feature Enhancement

Description

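Adds a GraphNet Agent module that implements a fully automated extraction pipeline from a HuggingFace model ID to a GraphNet Sample, supporting both single-model extraction and multi-GPU parallel batch extraction.

Automated pipeline: Fetch → Analyze → CodeGen → Extract → Deduplicate → Verify
LLM auto-repair: when a template script fails to run, the agent automatically calls ducc/claude -p to repair the script and retries at most 2 times
Multi-GPU parallelism: parallel_extract.py schedules work dynamically from a shared task queue, which balances load naturally; model lists can be fetched in bulk from a file or from the HuggingFace Hub
OOM protection: input sequence length is capped at 128 and image size at 512, so that a large max_position_embeddings (up to 131072) cannot directly cause OOM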
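The description says a failing template script is handed to `ducc`/`claude -p` for repair and retried at most twice. A minimal sketch of that bounded retry loop, assuming hypothetical `run` and `fix` callables that stand in for the agent's real subprocess runner and LLM fixer (this is not the PR's code):

```python
from typing import Callable, Optional

# The PR description says the script is repaired and retried at most 2 times.
MAX_LLM_RETRIES = 2


def run_with_llm_retry(
    script: str,
    run: Callable[[str], Optional[str]],
    fix: Callable[[str, str], str],
    max_retries: int = MAX_LLM_RETRIES,
) -> bool:
    """Run a generated script; on failure, ask an LLM to repair it and rerun.

    `run(script)` returns None on success or an error message on failure.
    `fix(script, error)` returns a repaired script (e.g. via an LLM CLI).
    Both callables are hypothetical stand-ins for the agent's real hooks.
    """
    error = run(script)  # first attempt uses the unmodified template script
    retries = 0
    while error is not None and retries < max_retries:
        script = fix(script, error)  # feed the failure message back to the LLM
        retries += 1
        error = run(script)
    return error is None
```

With `max_retries=2` a script is executed at most three times: the original attempt plus two repaired versions.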

… docs
- Remove tests/ directory (broken test cases referencing non-existent methods)
- Fix concurrent output dir collision: remove time-based Strategy 3 in SubprocessGraphExtractor._find_output_dir_robust to prevent workers from grabbing each other's output directories
- Fix generated code missing import torchvision for resnet/vgg/densenet; then remove torchvision entirely, so all models now go through AutoConfig
- Cap input sequence length to 128 and image size to 512 in ConfigMetadataAnalyzer to prevent OOM from large max_position_embeddings
- Remove hardcoded paths (/work/graphnet_workspace, GPU list [2,3,4,5], python3.12 nvidia path, /root/.comate path injection); workspace now resolves from GRAPH_NET_EXTRACT_WORKSPACE or ~/graphnet_workspace, GPUs auto-detected via CUDA_VISIBLE_DEVICES / nvidia-smi, nvidia lib path via sysconfig, PATH injection derived from found binary
- Fix broken import of deleted tests module in parallel_extract.py; inline load_models_from_file / get_models_from_hf / HUGGINGFACE_HUB_AVAILABLE
- Update README: remove torchvision dep, remove tests section, add parallel_extract.py detailed docs, LLM retry section, OOM limits section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
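The commit message says GPUs are auto-detected via `CUDA_VISIBLE_DEVICES` or `nvidia-smi`. A sketch of one way to do that; the function name `detect_gpus` and the fallback order (environment variable first, then `nvidia-smi`) are assumptions rather than the PR's implementation:

```python
import os
import shutil
import subprocess
from typing import List


def detect_gpus() -> List[str]:
    """Return the GPU IDs available to this process (hypothetical helper)."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        # Respect an explicit restriction; an empty string means "no GPUs".
        return [d.strip() for d in visible.split(",") if d.strip()]
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []  # driver missing or nvidia-smi failed: treat as no GPUs
    return [line.strip() for line in out.splitlines() if line.strip()]
```

Each worker can then pin itself to one device by setting `CUDA_VISIBLE_DEVICES` to its assigned ID in the environment of the extraction subprocess it launches.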

paddle-bot · 2026-05-13T12:23:46Z

Thanks for your contribution!

Xreki · 2026-05-14T02:35:52Z

@@ -222,4 +208,8 @@ def _find_hash_named_dir(self, workspace_path: Path) -> Optional[Path]:
    def _is_valid_sample_dir(self, dir_path: Path) -> bool:


The check here still needs to be improved.

Xreki · 2026-05-14T02:43:34Z

+"""
+
+
+class ForwardVerifier(BaseSampleVerifier):


This should be named ModelRunnableVerifier; alternatively, you could simply use the ModelRunnablePredictor that graph_net already implements.

Xreki · 2026-05-14T02:47:14Z

-    """Basic verifier that checks file existence and basic structure"""
+    """Basic verifier that checks file existence and basic structure.
+
+    Supports both single-graph and multi-subgraph (subgraph_0/, subgraph_1/, …) layouts.


Doesn't this BasicSampleVerifier duplicate the functionality of _is_valid_sample_dir above?

Xreki · 2026-05-14T02:48:47Z

+        Args:
+            timeout: seconds to wait for each forward-pass subprocess (default 5 min)
+        """
+        self._basic = BasicSampleVerifier()


This way of writing it is not quite standard.
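The docstring in the snippet above describes a per-subprocess timeout for the forward pass (default 5 min). A sketch of running a verification script in a child interpreter under that timeout; `run_forward_in_subprocess` and its `(ok, message)` return shape are hypothetical. `subprocess.run` kills the child once the timeout expires, so a hung forward pass does not leave a stray process behind:

```python
import subprocess
import sys
from typing import Tuple

DEFAULT_TIMEOUT_S = 300  # "default 5 min", per the docstring in the diff


def run_forward_in_subprocess(
    script_path: str, timeout: float = DEFAULT_TIMEOUT_S
) -> Tuple[bool, str]:
    """Execute a forward-pass script in a fresh interpreter (hypothetical helper)."""
    try:
        proc = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    if proc.returncode != 0:
        return False, proc.stderr[-2000:]  # keep only the tail of the traceback
    return True, ""
```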

Xreki · 2026-05-14T03:33:38Z

+        help="Number of models to fetch from the HuggingFace Hub (used when model-list is not specified; default 100)",
+    )
+    parser.add_argument(
+        "--task",


So the agent does make use of the task type; it should be written into graph_net.json.
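Following the review comment, a sketch of persisting the `--task` value into an existing `graph_net.json`. The key name `task` and the helper `record_task` are assumptions, not the schema GraphNet actually uses:

```python
import json
from pathlib import Path


def record_task(sample_dir: Path, task: str, key: str = "task") -> dict:
    """Merge the task type into <sample_dir>/graph_net.json (hypothetical helper)."""
    meta_path = Path(sample_dir) / "graph_net.json"
    # Preserve whatever metadata the extractor already wrote.
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    meta[key] = task
    meta_path.write_text(json.dumps(meta, indent=4) + "\n")
    return meta
```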

Xreki · 2026-05-14T03:35:15Z

+    _print_summary(results)
+    print(f"\n[DONE] Total elapsed: {elapsed_total:.0f}s")
+
+    return 0 if results["success_rate"] > 0 else 1


This function is too long.

Copilot

Pull request overview

Adds a GraphNet Agent workflow for HuggingFace model extraction with config-only model loading, LLM-assisted retry, forward verification, and multi-GPU batch extraction.

Changes:

Adds LLM retry support, forward-pass verification, and multi-subgraph verification support.
Adds parallel_extract.py for multi-GPU batch extraction and updates Agent docs.
Changes fetching/codegen behavior to avoid downloading weights and cap input sizes; removes existing agent tests.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 24 comments.


| File | Description |
|------|-------------|
| `graph_net/agent/graph_net_agent.py` | Updates the core extraction pipeline with optional workspace defaults, LLM retry, forward verifier, and model-name JSON fixups. |
| `graph_net/agent/code_generator/template_generator.py` | Switches generated scripts to config-only random-weight loading and static graph extraction names. |
| `graph_net/agent/code_generator/llm_code_fixer.py` | Adds LLM-based script repair via `ducc`/`claude`. |
| `graph_net/agent/code_generator/__init__.py` | Exports the new LLM fixer. |
| `graph_net/agent/graph_extractor/subprocess_graph_extractor.py` | Changes subprocess execution, timeout cleanup, workspace handling, and output directory discovery. |
| `graph_net/agent/metadata_analyzer/config_metadata_analyzer.py` | Caps sequence length and image size during metadata-derived input generation. |
| `graph_net/agent/model_fetcher/huggingface_fetcher.py` | Adds retry behavior, endpoint support, and weight-file ignore patterns for downloads. |
| `graph_net/agent/sample_verifier/basic_sample_verifier.py` | Extends basic verification to multi-subgraph outputs. |
| `graph_net/agent/sample_verifier/forward_verifier.py` | Adds subprocess-based eager forward verification. |
| `graph_net/agent/sample_verifier/__init__.py` | Exports `ForwardVerifier`. |
| `graph_net/agent/parallel_extract.py` | Adds shared-queue multi-GPU batch extraction CLI. |
| `graph_net/agent/README.md` | Updates setup, usage, workflow, LLM retry, and parallel extraction documentation. |
| `graph_net/agent/agent_usage.md` | Adds a detailed usage guide for single and batch extraction. |
| `graph_net/agent/tests/__init__.py` | Removes the agent tests package marker. |
| `graph_net/agent/tests/test_utils.py` | Removes utility/workspace tests. |
| `graph_net/agent/tests/test_model_metadata.py` | Removes metadata validation tests. |
| `graph_net/agent/tests/test_integration.py` | Removes integration workflow tests. |
| `graph_net/agent/tests/test_code_generator.py` | Removes template code generator tests. |
| `graph_net/agent/tests/test_batch_success_rate.py` | Removes batch success-rate test script. |
| `graph_net/agent/tests/run_500_models_test.py` | Removes large batch test runner. |

Comments suppressed due to low confidence (4)

graph_net/agent/tests/test_integration.py:1

The PR deletes the agent's existing unit and integration coverage while adding new extraction, retry, verifier, and parallel scheduling behavior. Please keep or replace these tests so the existing API contracts and new paths remain covered.

graph_net/agent/agent_usage.md:136

This flow still says the final step archives the script, but GraphNetAgent.extract_sample() no longer calls an archive method after verification. Keeping this in the guide makes the documented pipeline disagree with the actual behavior.

⑥ Generate graph_hash.txt + deduplication check + verify output file integrity + archive script
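Step ⑥ pairs `graph_hash.txt` generation with a deduplication check. A minimal sketch of hash-based dedup, assuming each sample directory stores its graph hash as the text of `graph_hash.txt`; how the hash itself is computed is not shown in this PR page, and `is_duplicate_sample` is a hypothetical helper:

```python
from pathlib import Path
from typing import Set


def is_duplicate_sample(sample_dir: Path, known_hashes: Set[str]) -> bool:
    """Return True if this sample's graph hash was already seen; else record it."""
    graph_hash = (Path(sample_dir) / "graph_hash.txt").read_text().strip()
    if graph_hash in known_hashes:
        return True  # same computation graph as an earlier sample
    known_hashes.add(graph_hash)
    return False
```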

graph_net/agent/agent_usage.md:50

This table repeats /work/graphnet_workspace as the default, but GraphNetAgent defaults to ~/graphnet_workspace when no workspace is provided. Please align the documented default with the implementation.

| Parameter | Default | Description |
|------|--------|------|
| `workspace` | `/work/graphnet_workspace` | Working directory; its subdirectory structure is created automatically |
| `hf_token` | `None` | HF access token; not needed for public models |

graph_net/agent/agent_usage.md:190

This success checklist still includes run_model.py, but the archive method and call were removed from GraphNetAgent. As written, users can see extract_sample() return True while this documented file is absent.

**Q: How do I check whether an extraction succeeded?**

`extract_sample()` returns `True` on success; you can also check that the output directory contains these 7 files:
`model.py`, `graph_net.json`, `input_meta.py`, `input_tensor_constraints.py`,
`weight_meta.py`, `graph_hash.txt`, `run_model.py`.


+            # Ensure GRAPH_NET_EXTRACT_WORKSPACE points to our workspace
+            if "GRAPH_NET_EXTRACT_WORKSPACE" not in env:
+                env["GRAPH_NET_EXTRACT_WORKSPACE"] = str(self.workspace)
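The snippet above forwards the agent's workspace to child processes only when the variable is not already set. A sketch of the resolution order described in the commit message (explicit argument, then `GRAPH_NET_EXTRACT_WORKSPACE`, then `~/graphnet_workspace`); the helper names and the precedence of the argument over the environment variable are assumptions:

```python
import os
from pathlib import Path
from typing import Dict, Optional


def resolve_workspace(workspace: Optional[str] = None) -> Path:
    """Pick the workspace: argument, then env var, then the home-dir default."""
    raw = (
        workspace
        or os.environ.get("GRAPH_NET_EXTRACT_WORKSPACE")
        or "~/graphnet_workspace"
    )
    return Path(raw).expanduser()


def subprocess_env(workspace: Path) -> Dict[str, str]:
    """Copy the environment, setting the workspace only if it is not already set."""
    env = os.environ.copy()
    env.setdefault("GRAPH_NET_EXTRACT_WORKSPACE", str(workspace))
    return env
```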


+            # Check if all workers are done
+            alive = [p for p in processes if p.is_alive()]
+            if not alive:
+                break
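The `is_alive()` polling above is the tail end of the shared-queue scheduler: every worker pulls the next model ID from one queue until the queue is empty, so a GPU that finishes early simply takes more models. A sketch of that pattern using threads for brevity (the PR's workers are processes); `run_shared_queue` and `extract_one` are hypothetical names:

```python
import queue
import threading
import time
from typing import Callable, Dict, Iterable, List


def run_shared_queue(
    model_ids: Iterable[str],
    gpus: List[str],
    extract_one: Callable[[str, str], bool],
    poll_interval: float = 0.05,
) -> Dict[str, bool]:
    """Dispatch model IDs to one worker per GPU through a single shared queue."""
    tasks: "queue.Queue[str]" = queue.Queue()
    for model_id in model_ids:
        tasks.put(model_id)

    results: Dict[str, bool] = {}
    lock = threading.Lock()

    def worker(gpu: str) -> None:
        while True:
            try:
                model_id = tasks.get_nowait()  # claim the next unprocessed model
            except queue.Empty:
                return  # queue drained: this worker is done
            try:
                ok = bool(extract_one(model_id, gpu))
            except Exception:
                ok = False  # one failing model must not kill the worker
            with lock:
                results[model_id] = ok

    workers = [threading.Thread(target=worker, args=(g,), daemon=True) for g in gpus]
    for w in workers:
        w.start()

    # Same shape as the loop in the diff: stop once no worker is alive.
    while True:
        alive = [w for w in workers if w.is_alive()]
        if not alive:
            break
        time.sleep(poll_interval)
    return results
```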


+            # image_size may be an int or a [H, W] list
+            raw_size = config.get("image_size", 224)
+            if isinstance(raw_size, (list, tuple)):
+                raw_size = raw_size[0]
+            image_size = min(int(raw_size), _MAX_IMAGE_SIZE)
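A sketch of the OOM caps from the PR description (sequence length 128, image size 512) applied to a HuggingFace-style config dict. `_MAX_SEQ_LEN`, `capped_input_sizes`, and the fallback defaults are assumptions; `_MAX_IMAGE_SIZE` and the `[H, W]` handling mirror the snippet above:

```python
from typing import Tuple

_MAX_SEQ_LEN = 128      # sequence-length cap stated in the PR description
_MAX_IMAGE_SIZE = 512   # image-size cap stated in the PR description


def capped_input_sizes(
    config: dict, default_seq_len: int = 128, default_image_size: int = 224
) -> Tuple[int, int]:
    """Derive (seq_len, image_size) from a config dict, clamped to the caps."""
    # max_position_embeddings can be as large as 131072, so never use it directly.
    raw_seq = config.get("max_position_embeddings", default_seq_len)
    seq_len = min(int(raw_seq), _MAX_SEQ_LEN)

    # image_size may be an int or an [H, W] list.
    raw_size = config.get("image_size", default_image_size)
    if isinstance(raw_size, (list, tuple)):
        raw_size = raw_size[0]
    image_size = min(int(raw_size), _MAX_IMAGE_SIZE)
    return seq_len, image_size
```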


-            return f'model = AutoModel.from_pretrained("{model_path}")'
+        return (
+            f"from transformers import AutoConfig\n"
+            f'_config = AutoConfig.from_pretrained("{model_path}", trust_remote_code=True)\n'


+            subgraph_dirs = sorted(sample_dir.glob("subgraph_*/"))
+            targets = subgraph_dirs if subgraph_dirs else [sample_dir]
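The two lines above check every `subgraph_*` directory when any exist and fall back to the sample root otherwise. A sketch of a completeness check built on that rule. The required-file list is taken from the success checklist in `agent_usage.md` (minus `run_model.py`), and checking all of those files in each subgraph is an assumption; `missing_sample_files` is a hypothetical helper:

```python
from pathlib import Path
from typing import List

# From the success checklist in agent_usage.md, minus run_model.py.
REQUIRED_FILES = (
    "model.py",
    "graph_net.json",
    "input_meta.py",
    "input_tensor_constraints.py",
    "weight_meta.py",
    "graph_hash.txt",
)


def missing_sample_files(sample_dir: Path) -> List[str]:
    """List required files absent from the sample (or from each subgraph)."""
    sample_dir = Path(sample_dir)
    # A trailing "/" in the glob only restricts matches to directories on
    # Python 3.11+, so filter with is_dir() explicitly.
    subgraph_dirs = sorted(p for p in sample_dir.glob("subgraph_*") if p.is_dir())
    targets = subgraph_dirs if subgraph_dirs else [sample_dir]
    missing = []
    for target in targets:
        for name in REQUIRED_FILES:
            path = target / name
            if not path.is_file():
                missing.append(path.relative_to(sample_dir).as_posix())
    return missing
```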


+# Set proxy (required to access HuggingFace)
+export http_proxy=http://agent.baidu.com:8891
+export https_proxy=http://agent.baidu.com:8891


+export https_proxy=http://agent.baidu.com:8891
+
+# The LLM fallback feature requires the ducc CLI (optional)
+export PATH="/root/.comate/baidu-cc/bin:$PATH"


+# Candidate binary names / paths to search for ducc CLI
+_DUCC_CANDIDATES = [
+    "ducc",
+    "claude",
+    "/root/.comate/baidu-cc/bin/ducc",
+    "/usr/local/bin/ducc",
+    os.path.expanduser("~/.local/bin/ducc"),
+]
+
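A later commit in this PR drops the hardcoded `/root/.comate/baidu-cc/bin/ducc` candidate, and the earlier commit message says the `PATH` injection is derived from the binary actually found. A sketch of both steps with `shutil.which`; `find_llm_cli` and `env_with_cli_on_path` are hypothetical helpers:

```python
import os
import shutil
from typing import Dict, Iterable, Optional

DEFAULT_CANDIDATES = ("ducc", "claude", os.path.expanduser("~/.local/bin/ducc"))


def find_llm_cli(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate that resolves to an executable, else None."""
    for candidate in candidates:
        # Bare names are searched on PATH; paths with a directory are checked directly.
        path = shutil.which(candidate)
        if path:
            return path
    return None


def env_with_cli_on_path(cli_path: str) -> Dict[str, str]:
    """Prepend the found binary's directory to PATH for child processes."""
    env = os.environ.copy()
    bin_dir = os.path.dirname(cli_path)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env
```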


    def __init__(
        self,
-        workspace: str,
+        workspace: Optional[str] = None,
        hf_token: Optional[str] = None,
+        llm_retry: bool = True,
    ):


+_GRAPHNET_ROOT = _SCRIPT_DIR.parent.parent  # GraphNet/
+if str(_GRAPHNET_ROOT) not in sys.path:
+    sys.path.insert(0, str(_GRAPHNET_ROOT))


…_extract to English
- Remove baidu proxy settings and baidu-cc PATH from agent_usage.md
- Remove /root/.comate/baidu-cc/bin/ducc hardcoded path from llm_code_fixer.py
- Translate all Chinese comments/docstrings/help text in parallel_extract.py to English
- Remove _setup_nvidia_ld_library_path and unused sysconfig import from parallel_extract.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

luotao1

parallel_extract_20260418_040238.json can be deleted.

Xreki

Let's merge a first version for now.

Xreki · 2026-05-14T09:16:29Z

+# Directory
+Just run it from the GraphNet directory; no installation is required.
+
+```


L12 is extraneous.

fangfangssj and others added 3 commits March 31, 2026 06:56

init

b370282

fix agent

b44b98f

Xreki reviewed May 14, 2026


Xreki requested a review from Copilot May 14, 2026 03:39

Copilot started reviewing on behalf of Xreki May 14, 2026 03:39

Copilot AI reviewed May 14, 2026


fangfangssj and others added 2 commits May 14, 2026 03:57

chore: remove stale parallel extraction result file

935487f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

luotao1 reviewed May 14, 2026


Comment thread graph_net/agent/code_generator/llm_code_fixer.py

fangfangssj force-pushed the agent branch from fdf3d9e to 935487f May 14, 2026 07:55

Xreki approved these changes May 14, 2026


Xreki merged commit 6c8c1a3 into PaddlePaddle:develop May 14, 2026
3 checks passed


4 participants