Skip to content

feat(agent): support MoE, multimodal, audio, seq2seq, diffusion model #710

Merged
Xreki merged 1 commit into PaddlePaddle:develop from fangfangssj:llm on May 15, 2026


@fangfangssj
Collaborator

PR Category

Feature Enhancement

Description

The graph-extraction agent now supports MoE, multimodal, audio, seq2seq, and diffusion model architectures. Small-scale tests pass; large-scale testing has not been done yet.

Extend GraphNet Agent to correctly identify and extract computation graphs for a wider range of model architectures beyond basic text/vision models.

Key changes:

  • ModelMetadata: add an architecture_type field ("text"/"vision"/"seq2seq"/"audio"/"multimodal"/"diffusion"/"moe")
  • ConfigMetadataAnalyzer: use AutoConfig.from_pretrained() for richer config introspection; classify the architecture via transformers' own task mapping tables (MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES, etc.) rather than hardcoded lists; add per-architecture input shape builders covering Whisper decoder_input_ids, CLIP text_config seq_len, diffusion sample/timestep/encoder_hidden_states, and MoE/seq2seq-specific inputs; add a field-based model_type inference fallback for configs that lack model_type (e.g. prajjwal1/bert-tiny)
  • TemplateCodeGenerator: choose the model loader by architecture (AutoModelForSeq2SeqLM for seq2seq, UNet2DConditionModel for diffusion); add diffusion-specific script generation that passes inputs as positional arguments; inject the inferred model_type into the config when it is absent
  • LLMCodeFixer: extend _SYSTEM_PROMPT with MoE and diffusion input specifications and error patterns; add MoE routing, UNet, seq2seq, GQA, and audio config fields to _extract_key_fields
  • GraphNetAgent: add _resolve_model_dir() to detect diffusers pipelines (by the presence of model_index.json) and automatically redirect to the unet/ subdirectory
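The ConfigMetadataAnalyzer change classifies a model by looking up its model_type in transformers' task mapping tables. A minimal sketch of that idea, not the PR's code: it assumes the config is a plain dict (e.g. AutoConfig.from_pretrained(path).to_dict(), or a parsed config.json for a diffusers UNet), that checks run in the order shown, and that MoE is detected from expert-count fields. classify_architecture, DEMO_TABLES, and the MoE field list are hypothetical; the tiny tables stand in for the real ones.

```python
from typing import Any, Literal, Mapping

# The seven labels listed for ModelMetadata.architecture_type.
ArchitectureType = Literal[
    "text", "vision", "seq2seq", "audio", "multimodal", "diffusion", "moe"
]

# Assumption: expert-count keys as used by Mixtral / Qwen-MoE / DeepSeek-style configs.
_MOE_FIELDS = ("num_local_experts", "num_experts", "n_routed_experts")

# Tiny stand-ins for transformers' {model_type: class_name} *_MAPPING_NAMES tables.
DEMO_TABLES = {
    "MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES": {"whisper": "WhisperForConditionalGeneration"},
    "MODEL_FOR_ZERO_SHOT_IMAGE_CLASSIFICATION_MAPPING_NAMES": {"clip": "CLIPModel"},
    "MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES": {"t5": "T5ForConditionalGeneration"},
    "MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES": {
        "convnextv2": "ConvNextV2ForImageClassification",
        "clip": "CLIPForImageClassification",
    },
}


def classify_architecture(
    config: Mapping[str, Any], tables: Mapping[str, Mapping[str, str]]
) -> ArchitectureType:
    """Map a config dict to an architecture label (hypothetical helper)."""
    # diffusers component configs record their class under "_class_name".
    if "UNet" in str(config.get("_class_name", "")):
        return "diffusion"
    # More than one expert means a routed (MoE) model.
    if any(int(config.get(f) or 0) > 1 for f in _MOE_FIELDS):
        return "moe"

    model_type = config.get("model_type", "")

    def in_table(name: str) -> bool:
        return model_type in tables.get(name, {})

    # Order matters: multimodal types can also appear in the image table.
    if in_table("MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES"):
        return "audio"
    if in_table("MODEL_FOR_ZERO_SHOT_IMAGE_CLASSIFICATION_MAPPING_NAMES"):
        return "multimodal"
    if in_table("MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES"):
        return "seq2seq"
    if in_table("MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES"):
        return "vision"
    return "text"


if __name__ == "__main__":
    print(classify_architecture({"model_type": "whisper"}, DEMO_TABLES))
```

With transformers installed, the real tables can be passed instead, e.g. `{name: getattr(modeling_auto, name) for name in DEMO_TABLES}` after `import transformers.models.auto.modeling_auto as modeling_auto`.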
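The per-architecture input shape builders in ConfigMetadataAnalyzer derive dummy-input shapes from config fields. A sketch under stated assumptions: build_input_shapes is a hypothetical name, shapes are plain tuples rather than tensors, the defaults (seq_len=16, 224-pixel images, 77 text tokens for diffusion conditioning) are illustrative, and the Whisper frame count is taken as 2 * max_source_positions (3000 for a 1500-position encoder).

```python
from typing import Any, Dict, Mapping, Tuple

Shape = Tuple[int, ...]


def build_input_shapes(
    arch: str, config: Mapping[str, Any], batch: int = 1, seq_len: int = 16
) -> Dict[str, Shape]:
    """Return {forward-argument name: shape} for a dummy forward pass (sketch)."""
    if arch == "audio":
        # Whisper-style: log-mel features plus a one-token decoder prompt.
        frames = 2 * config.get("max_source_positions", 1500)
        return {
            "input_features": (batch, config.get("num_mel_bins", 80), frames),
            "decoder_input_ids": (batch, 1),
        }
    if arch == "seq2seq":
        return {"input_ids": (batch, seq_len), "decoder_input_ids": (batch, seq_len)}
    if arch == "multimodal":
        # CLIP-style: text length from text_config, image size from vision_config.
        text_cfg = config.get("text_config", {})
        vision_cfg = config.get("vision_config", {})
        size = vision_cfg.get("image_size", 224)
        return {
            "input_ids": (batch, text_cfg.get("max_position_embeddings", 77)),
            "pixel_values": (batch, vision_cfg.get("num_channels", 3), size, size),
        }
    if arch == "diffusion":
        # UNet sample_size may be an int or an [h, w] pair.
        size = config.get("sample_size", 64)
        h, w = (size, size) if isinstance(size, int) else tuple(size)
        return {
            "sample": (batch, config.get("in_channels", 4), h, w),
            "timestep": (batch,),
            # Assumption: 77 tokens, the usual CLIP text-encoder length.
            "encoder_hidden_states": (batch, 77, config.get("cross_attention_dim", 768)),
        }
    if arch == "vision":
        size = config.get("image_size", 224)
        return {"pixel_values": (batch, config.get("num_channels", 3), size, size)}
    # "text" and decoder-only "moe" models both take token ids.
    return {"input_ids": (batch, seq_len), "attention_mask": (batch, seq_len)}


if __name__ == "__main__":
    print(build_input_shapes("audio", {"num_mel_bins": 80, "max_source_positions": 1500}))
```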
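The field-based model_type fallback (for configs such as prajjwal1/bert-tiny that omit model_type) and the TemplateCodeGenerator step that injects the inferred value can be sketched together. The matching rules below are hypothetical: the description does not say which fields the agent keys on, and a signature such as type_vocab_size is shared by other BERT-family configs (RoBERTa, for instance), so rule order and specificity matter.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple

# Hypothetical rules: (model_type, fields that must all be present), checked in order.
_FIELD_RULES: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("whisper", ("num_mel_bins", "max_source_positions")),
    ("t5", ("d_model", "d_kv", "num_decoder_layers")),
    ("bert", ("type_vocab_size", "hidden_size", "num_attention_heads")),
)


def infer_model_type(config: Mapping[str, Any]) -> Optional[str]:
    """Guess model_type from characteristic config fields; None if no rule matches."""
    for model_type, fields in _FIELD_RULES:
        if all(f in config for f in fields):
            return model_type
    return None


def with_model_type(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with model_type filled in when it is missing."""
    patched = dict(config)
    if not patched.get("model_type"):
        inferred = infer_model_type(patched)
        if inferred is not None:
            patched["model_type"] = inferred
    return patched


if __name__ == "__main__":
    tiny = {"hidden_size": 128, "num_attention_heads": 2, "type_vocab_size": 2}
    print(with_model_type(tiny)["model_type"])
```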
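The loader branching in TemplateCodeGenerator can be sketched as rendering the load-and-call lines of a generated script. The seq2seq and diffusion loaders follow the description; the AutoModel fallback, pick_loader, render_load_snippet, and the script variables inputs, sample, timestep, and encoder_hidden_states (assumed to be defined earlier in the generated script) are assumptions for illustration.

```python
from typing import Tuple

# (module, class) per architecture; the AutoModel fallback is an assumption.
_LOADERS = {
    "seq2seq": ("transformers", "AutoModelForSeq2SeqLM"),
    "diffusion": ("diffusers", "UNet2DConditionModel"),
}


def pick_loader(arch: str) -> Tuple[str, str]:
    """Return the (module, class name) used to load a model of this architecture."""
    return _LOADERS.get(arch, ("transformers", "AutoModel"))


def render_load_snippet(arch: str, model_dir: str) -> str:
    """Render the model-loading and forward-call lines of a generated script."""
    module, cls = pick_loader(arch)
    lines = [
        f"from {module} import {cls}",
        f"model = {cls}.from_pretrained({model_dir!r}).eval()",
    ]
    if arch == "diffusion":
        # UNet2DConditionModel.forward takes sample, timestep, encoder_hidden_states
        # as its leading parameters, so the script passes them positionally.
        lines.append("out = model(sample, timestep, encoder_hidden_states)")
    else:
        lines.append("out = model(**inputs)")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_load_snippet("diffusion", "tiny-stable-diffusion-pipe/unet"))
```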
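The LLMCodeFixer change adds architecture-specific fields to _extract_key_fields, presumably so the LLM sees the config values that determine input shapes. A minimal sketch, assuming a flat config dict: the grouping and exact field list are illustrative, not taken from the PR, although the names themselves are real Hugging Face / diffusers config keys (e.g. Mixtral's num_local_experts and num_experts_per_tok).

```python
from typing import Any, Dict, Mapping

# Illustrative field groups; which keys the real _extract_key_fields selects is not stated.
KEY_FIELD_GROUPS = {
    "moe_routing": ("num_local_experts", "num_experts_per_tok", "router_aux_loss_coef"),
    "unet": ("in_channels", "sample_size", "cross_attention_dim"),
    "seq2seq": ("is_encoder_decoder", "decoder_start_token_id", "d_model"),
    "gqa": ("num_attention_heads", "num_key_value_heads"),
    "audio": ("num_mel_bins", "max_source_positions"),
}


def extract_key_fields(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only the listed fields that the config actually defines."""
    return {
        name: config[name]
        for fields in KEY_FIELD_GROUPS.values()
        for name in fields
        if name in config
    }


def format_for_prompt(config: Mapping[str, Any]) -> str:
    """Render the selected fields as 'key: value' lines for an LLM prompt."""
    return "\n".join(f"{k}: {v}" for k, v in extract_key_fields(config).items())


if __name__ == "__main__":
    print(format_for_prompt({"num_local_experts": 8, "num_key_value_heads": 2, "vocab_size": 32000}))
```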
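The pipeline detection in GraphNetAgent._resolve_model_dir() can be sketched with pathlib. Assumptions: resolve_model_dir is a hypothetical stand-in, and it redirects only when unet/ actually exists, a guard the description does not mention.

```python
import tempfile
from pathlib import Path
from typing import Union


def resolve_model_dir(model_dir: Union[str, Path]) -> Path:
    """Return the directory to load the model from.

    A diffusers pipeline is recognised by its model_index.json; in that case
    the UNet component under unet/ is used instead of the pipeline root.
    """
    root = Path(model_dir)
    unet_dir = root / "unet"
    if (root / "model_index.json").is_file() and unet_dir.is_dir():
        return unet_dir
    return root


if __name__ == "__main__":
    # Build a fake pipeline layout in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "model_index.json").write_text("{}")
        (root / "unet").mkdir()
        print(resolve_model_dir(root) == root / "unet")
```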

Tested on: bert-tiny (text), convnextv2 (vision), t5-small (seq2seq), whisper-tiny (audio), clip-vit-base-patch32 (multimodal), tiny-random-MixtralForCausalLM (moe), tiny-stable-diffusion-pipe (diffusion)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

paddle-bot Bot commented May 15, 2026

Thanks for your contribution!

Xreki (Collaborator) left a comment

LGTM

Xreki (Collaborator) left a comment

LGTM

Xreki merged commit dc25adf into PaddlePaddle:develop on May 15, 2026
3 checks passed

2 participants