feat(agent): add GraphNet Agent with single-model and multi-GPU parallel extraction #704

Merged
Xreki merged 5 commits into PaddlePaddle:develop from fangfangssj:agent
May 14, 2026

@fangfangssj
Collaborator

PR Category

Feature Enhancement

Description

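Adds a new GraphNet Agent module that implements a fully automated extraction pipeline from a HuggingFace model ID to a GraphNet Sample, supporting both single-model extraction and multi-GPU parallel batch extraction.

  • Automated pipeline: Fetch → Analyze → CodeGen → Extract → Deduplicate → Verify
  • LLM auto-repair: when the template script fails to run, ducc/claude -p is invoked automatically to fix the script, with at most 2 retries
  • Multi-GPU parallelism: parallel_extract.py schedules work dynamically from a shared task queue, which balances load naturally, and supports getting the model list in bulk from a file or from HuggingFace Hub
  • OOM protection: sequence length is capped at 128 and image size at 512, so that a large model's max_position_embeddings (which can reach 131072) does not directly cause OOM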
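
For readers unfamiliar with the shared-queue scheduling mentioned above, here is a minimal sketch of the idea. It is not the code in parallel_extract.py: `run_workers` and `extract_fn` are hypothetical names, and threads stand in for the per-GPU worker processes so the example stays self-contained. Each worker takes the next model ID as soon as it is free, so a slow model on one GPU does not hold up the others.

```python
import queue
import threading


def run_workers(model_ids, gpu_ids, extract_fn):
    """Run extract_fn(model_id, gpu_id) for every model, one worker per GPU.

    Workers pull from a single shared queue, so work is balanced dynamically
    instead of being split into fixed per-GPU chunks up front.
    """
    tasks = queue.Queue()
    for model_id in model_ids:
        tasks.put(model_id)

    results = {}
    lock = threading.Lock()

    def worker(gpu_id):
        while True:
            try:
                model_id = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            try:
                ok = bool(extract_fn(model_id, gpu_id))
            except Exception:
                ok = False  # one failing model must not kill the worker
            with lock:
                results[model_id] = (gpu_id, ok)

    threads = [threading.Thread(target=worker, args=(g,)) for g in gpu_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A real multi-GPU runner would typically start one process per GPU and set CUDA_VISIBLE_DEVICES for each; the queue-draining loop stays the same.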

fangfangssj and others added 3 commits March 31, 2026 06:56
… docs

- Remove tests/ directory (broken test cases referencing non-existent methods)
- Fix concurrent output dir collision: remove time-based Strategy 3 in
  SubprocessGraphExtractor._find_output_dir_robust to prevent workers
  from grabbing each other's output directories
- Fix generated code missing import torchvision for resnet/vgg/densenet;
  then remove torchvision entirely — all models now go through AutoConfig
- Cap input sequence length to 128 and image size to 512 in
  ConfigMetadataAnalyzer to prevent OOM from large max_position_embeddings
- Remove hardcoded paths (/work/graphnet_workspace, GPU list [2,3,4,5],
  python3.12 nvidia path, /root/.comate path injection); workspace now
  resolves from GRAPH_NET_EXTRACT_WORKSPACE or ~/graphnet_workspace,
  GPUs auto-detected via CUDA_VISIBLE_DEVICES / nvidia-smi, nvidia lib
  path via sysconfig, PATH injection derived from found binary
- Fix broken import of deleted tests module in parallel_extract.py;
  inline load_models_from_file / get_models_from_hf / HUGGINGFACE_HUB_AVAILABLE
- Update README: remove torchvision dep, remove tests section, add
  parallel_extract.py detailed docs, LLM retry section, OOM limits section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
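
Two of the fixes in the commit message above (the input-size caps and the workspace lookup) can be sketched as follows. This is an illustration under assumptions, not the code in ConfigMetadataAnalyzer: `_MAX_SEQ_LEN`, `capped_seq_len`, `capped_image_size` and `resolve_workspace` are hypothetical names, and the fallback values used when a config key is missing are assumed. Only the limits (128 and 512), the environment variable name and the `~/graphnet_workspace` default come from the PR text.

```python
import os
from pathlib import Path

_MAX_SEQ_LEN = 128      # cap stated in the PR description
_MAX_IMAGE_SIZE = 512   # cap stated in the PR description


def capped_seq_len(config: dict) -> int:
    """Clamp the sequence length so a huge max_position_embeddings cannot cause OOM."""
    raw = config.get("max_position_embeddings", _MAX_SEQ_LEN)  # assumed fallback
    return min(int(raw), _MAX_SEQ_LEN)


def capped_image_size(config: dict) -> int:
    """Clamp the image size; image_size may be an int or a [H, W] list."""
    raw = config.get("image_size", 224)
    if isinstance(raw, (list, tuple)):
        raw = raw[0]
    return min(int(raw), _MAX_IMAGE_SIZE)


def resolve_workspace(env=None) -> Path:
    """Use GRAPH_NET_EXTRACT_WORKSPACE if set, otherwise ~/graphnet_workspace."""
    env = os.environ if env is None else env
    configured = env.get("GRAPH_NET_EXTRACT_WORKSPACE")
    return Path(configured) if configured else Path.home() / "graphnet_workspace"
```

With these caps, a config reporting `max_position_embeddings = 131072` yields an input of length 128 rather than an allocation sized for the full context window.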
@paddle-bot

paddle-bot Bot commented May 13, 2026

Thanks for your contribution!

Comment thread graph_net/agent/code_generator/llm_code_fixer.py
@@ -222,4 +208,8 @@ def _find_hash_named_dir(self, workspace_path: Path) -> Optional[Path]:
def _is_valid_sample_dir(self, dir_path: Path) -> bool:
Collaborator

The check here still needs to be improved.

"""


class ForwardVerifier(BaseSampleVerifier):
Collaborator

This should be named ModelRunnableVerifier; alternatively, you could directly use the ModelRunnablePredictor that is already implemented in graph_net.

"""Basic verifier that checks file existence and basic structure"""
"""Basic verifier that checks file existence and basic structure.

Supports both single-graph and multi-subgraph (subgraph_0/, subgraph_1/, …) layouts.
Collaborator

Doesn't this BasicSampleVerifier duplicate the functionality of _is_valid_sample_dir above?

Args:
timeout: seconds to wait for each forward-pass subprocess (default 5 min)
"""
self._basic = BasicSampleVerifier()
Collaborator

Writing it this way is not quite standard practice.

Comment thread graph_net/agent/parallel_extract.py Outdated
Comment thread graph_net/agent/parallel_extract.py Outdated
help="Number of models to fetch from HuggingFace Hub (used when model-list is not specified; default 100)",
)
parser.add_argument(
"--task",
Collaborator

So the task type is actually used inside the agent; it needs to be written into graph_net.json.

_print_summary(results)
print(f"\n[DONE] Total elapsed: {elapsed_total:.0f}s")

return 0 if results["success_rate"] > 0 else 1
Collaborator

This function is too long.

Comment thread parallel_extract_20260418_040238.json Outdated

Copilot AI left a comment

Pull request overview

Adds a GraphNet Agent workflow for HuggingFace model extraction with config-only model loading, LLM-assisted retry, forward verification, and multi-GPU batch extraction.

Changes:

  • Adds LLM retry support, forward-pass verification, and multi-subgraph verification support.
  • Adds parallel_extract.py for multi-GPU batch extraction and updates Agent docs.
  • Changes fetching/codegen behavior to avoid downloading weights and cap input sizes; removes existing agent tests.
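
The LLM retry behavior summarized above can be pictured as a bounded repair loop. The sketch below is an assumption-laden illustration, not the logic in graph_net_agent.py or llm_code_fixer.py: `run_with_llm_retry`, `run_script` and `fix_script` are hypothetical names, and the only detail taken from the PR is the limit of 2 retries.

```python
MAX_LLM_RETRIES = 2  # "at most 2 retries", per the PR description


def run_with_llm_retry(script, run_script, fix_script, max_retries=MAX_LLM_RETRIES):
    """Run a generated script; on failure ask a fixer for a repaired version.

    run_script(script) -> (ok, error_log)
    fix_script(script, error_log) -> repaired script text, or None if no fix
    Returns (ok, final_script, retries_used).
    """
    ok, error_log = run_script(script)
    retries = 0
    while not ok and retries < max_retries:
        fixed = fix_script(script, error_log)
        if fixed is None:
            break  # fixer unavailable or gave up; stop retrying
        script = fixed
        retries += 1
        ok, error_log = run_script(script)
    return ok, script, retries
```

Passing the runner and fixer in as callables keeps the loop testable without an LLM CLI installed.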

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 24 comments.

| File | Description |
|------|------|
| `graph_net/agent/graph_net_agent.py` | Updates the core extraction pipeline with optional workspace defaults, LLM retry, forward verifier, and model-name JSON fixups. |
| `graph_net/agent/code_generator/template_generator.py` | Switches generated scripts to config-only random-weight loading and static graph extraction names. |
| `graph_net/agent/code_generator/llm_code_fixer.py` | Adds LLM-based script repair via ducc/claude. |
| `graph_net/agent/code_generator/__init__.py` | Exports the new LLM fixer. |
| `graph_net/agent/graph_extractor/subprocess_graph_extractor.py` | Changes subprocess execution, timeout cleanup, workspace handling, and output directory discovery. |
| `graph_net/agent/metadata_analyzer/config_metadata_analyzer.py` | Caps sequence length and image size during metadata-derived input generation. |
| `graph_net/agent/model_fetcher/huggingface_fetcher.py` | Adds retry behavior, endpoint support, and weight-file ignore patterns for downloads. |
| `graph_net/agent/sample_verifier/basic_sample_verifier.py` | Extends basic verification to multi-subgraph outputs. |
| `graph_net/agent/sample_verifier/forward_verifier.py` | Adds subprocess-based eager forward verification. |
| `graph_net/agent/sample_verifier/__init__.py` | Exports ForwardVerifier. |
| `graph_net/agent/parallel_extract.py` | Adds shared-queue multi-GPU batch extraction CLI. |
| `graph_net/agent/README.md` | Updates setup, usage, workflow, LLM retry, and parallel extraction documentation. |
| `graph_net/agent/agent_usage.md` | Adds a detailed usage guide for single and batch extraction. |
| `graph_net/agent/tests/__init__.py` | Removes the agent tests package marker. |
| `graph_net/agent/tests/test_utils.py` | Removes utility/workspace tests. |
| `graph_net/agent/tests/test_model_metadata.py` | Removes metadata validation tests. |
| `graph_net/agent/tests/test_integration.py` | Removes integration workflow tests. |
| `graph_net/agent/tests/test_code_generator.py` | Removes template code generator tests. |
| `graph_net/agent/tests/test_batch_success_rate.py` | Removes batch success-rate test script. |
| `graph_net/agent/tests/run_500_models_test.py` | Removes large batch test runner. |
Comments suppressed due to low confidence (4)

graph_net/agent/tests/test_integration.py:1

  • The PR deletes the agent's existing unit and integration coverage while adding new extraction, retry, verifier, and parallel scheduling behavior. Please keep or replace these tests so the existing API contracts and new paths remain covered.

graph_net/agent/agent_usage.md:136

  • This flow still says the final step archives the script, but GraphNetAgent.extract_sample() no longer calls an archive method after verification. Keeping this in the guide makes the documented pipeline disagree with the actual behavior.
⑥ Generate graph_hash.txt + deduplication check + verify output file completeness + archive the script

graph_net/agent/agent_usage.md:50

  • This table repeats /work/graphnet_workspace as the default, but GraphNetAgent defaults to ~/graphnet_workspace when no workspace is provided. Please align the documented default with the implementation.
| Parameter | Default | Description |
|------|--------|------|
| `workspace` | `/work/graphnet_workspace` | Working directory; the subdirectory structure is created automatically |
| `hf_token` | `None` | HF access token; not needed for public models |

graph_net/agent/agent_usage.md:190

  • This success checklist still includes run_model.py, but the archive method and call were removed from GraphNetAgent. As written, users can see extract_sample() return True while this documented file is absent.
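**Q: How can I check whether an extraction succeeded?**

`extract_sample()` returns `True` on success; you can also check that the output directory contains these 7 files:
`model.py`, `graph_net.json`, `input_meta.py`, `input_tensor_constraints.py`,
`weight_meta.py`, `graph_hash.txt`, `run_model.py`.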
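
A file-existence check along the lines of this checklist might look like the sketch below. It is not BasicSampleVerifier: `missing_sample_files` and `EXPECTED_FILES` are hypothetical names, `run_model.py` is left out because the comment above notes it is no longer produced, and the assumption that each `subgraph_*` directory holds the full file set is unverified.

```python
from pathlib import Path

# Files named in the checklist above, minus run_model.py (no longer archived).
EXPECTED_FILES = (
    "model.py",
    "graph_net.json",
    "input_meta.py",
    "input_tensor_constraints.py",
    "weight_meta.py",
    "graph_hash.txt",
)


def missing_sample_files(sample_dir):
    """Return relative paths of expected files that are absent.

    If subgraph_*/ directories exist, each one is checked (assumed layout);
    otherwise the sample directory itself is checked.
    """
    sample_dir = Path(sample_dir)
    subgraph_dirs = sorted(p for p in sample_dir.glob("subgraph_*") if p.is_dir())
    targets = subgraph_dirs or [sample_dir]
    missing = []
    for target in targets:
        for name in EXPECTED_FILES:
            path = target / name
            if not path.is_file():
                missing.append(str(path.relative_to(sample_dir)))
    return missing
```

An empty return value means every expected file was found; a non-empty list names exactly what is missing, which is more useful in a batch log than a bare boolean.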

Comment on lines +63 to +65
# Ensure GRAPH_NET_EXTRACT_WORKSPACE points to our workspace
if "GRAPH_NET_EXTRACT_WORKSPACE" not in env:
env["GRAPH_NET_EXTRACT_WORKSPACE"] = str(self.workspace)
Comment on lines +393 to +396
# Check if all workers are done
alive = [p for p in processes if p.is_alive()]
if not alive:
break
Comment on lines +129 to +133
# image_size may be an int or a [H, W] list
raw_size = config.get("image_size", 224)
if isinstance(raw_size, (list, tuple)):
raw_size = raw_size[0]
image_size = min(int(raw_size), _MAX_IMAGE_SIZE)
return f'model = AutoModel.from_pretrained("{model_path}")'
return (
f"from transformers import AutoConfig\n"
f'_config = AutoConfig.from_pretrained("{model_path}", trust_remote_code=True)\n'
Comment on lines +26 to +27
subgraph_dirs = sorted(sample_dir.glob("subgraph_*/"))
targets = subgraph_dirs if subgraph_dirs else [sample_dir]
Comment thread graph_net/agent/agent_usage.md Outdated
Comment on lines +12 to +14
# Set up a proxy (required to access HuggingFace)
export http_proxy=http://agent.baidu.com:8891
export https_proxy=http://agent.baidu.com:8891
Comment thread graph_net/agent/agent_usage.md Outdated
export https_proxy=http://agent.baidu.com:8891

# The LLM fallback feature requires the ducc CLI (optional)
export PATH="/root/.comate/baidu-cc/bin:$PATH"
Comment on lines +14 to +22
# Candidate binary names / paths to search for ducc CLI
_DUCC_CANDIDATES = [
"ducc",
"claude",
"/root/.comate/baidu-cc/bin/ducc",
"/usr/local/bin/ducc",
os.path.expanduser("~/.local/bin/ducc"),
]

Comment on lines 29 to 34
def __init__(
self,
workspace: str,
workspace: Optional[str] = None,
hf_token: Optional[str] = None,
llm_retry: bool = True,
):
Comment on lines +36 to +38
_GRAPHNET_ROOT = _SCRIPT_DIR.parent.parent # GraphNet/
if str(_GRAPHNET_ROOT) not in sys.path:
sys.path.insert(0, str(_GRAPHNET_ROOT))
fangfangssj and others added 2 commits May 14, 2026 03:57
…_extract to English

- Remove baidu proxy settings and baidu-cc PATH from agent_usage.md
- Remove /root/.comate/baidu-cc/bin/ducc hardcoded path from llm_code_fixer.py
- Translate all Chinese comments/docstrings/help text in parallel_extract.py to English
- Remove _setup_nvidia_ld_library_path and unused sysconfig import from parallel_extract.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

@luotao1 luotao1 left a comment

parallel_extract_20260418_040238.json can be deleted.

Comment thread graph_net/agent/code_generator/llm_code_fixer.py
Collaborator

@Xreki Xreki left a comment

Let's merge this version first.

# Directory
Just run it from the GraphNet directory; no installation is required.

```
Collaborator

L12 is extraneous.

@Xreki Xreki merged commit 6c8c1a3 into PaddlePaddle:develop May 14, 2026
3 checks passed
4 participants