[Docs] Add supported model tables to pretrain_sft advanced tutorial #1728
CyCle1024 wants to merge 1 commit into InternLM:main
…ted-models skill
- Add model support tables to en/zh pretrain_sft advanced tutorial model.md
- Include base configs, concrete configs, compose models and inheritance hierarchy
- Add xtuner-sync-supported-models skill under .claude/skills/ with scan script
@claude review
| `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~4B parameters, for multimodal |
| `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~8B parameters, for multimodal |
| `DeepSeekV3Config` | — | MoE | ~671B total / ~37B activated |
| `GptOss21BA3P6Config` | `GptOssConfig` | MoE | ~21B total / ~3.6B activated |
| `GptOss117BA5P8Config` | `GptOssConfig` | MoE | ~117B total / ~5.8B activated |
| `Qwen3MoE30BA3Config` | `Qwen3MoEConfig` | MoE | ~30B total / ~3B activated |
Claude: Critical — incorrect inheritance in table and tree
The "Base Class / Family" for these VL text backbone configs is listed as Qwen3DenseConfig / Qwen3MoEConfig, but the actual direct parents are the concrete configs:
| Config | Listed parent | Actual parent |
|---|---|---|
| `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | `Qwen3Dense4BConfig` |
| `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | `Qwen3Dense8BConfig` |
| `Qwen3VLTextMoE30BA3Config` | `Qwen3MoEConfig` | `Qwen3MoE30BA3Config` |
| `Qwen3VLTextMoE235BA22Config` | `Qwen3MoEConfig` | `Qwen3MoE235BA22Config` |
Source references:
xtuner/v1/model/dense/qwen3vl_text.py:96: class Qwen3VLTextDense4BConfig(Qwen3Dense4BConfig)
xtuner/v1/model/moe/qwen3vl_text.py:234: class Qwen3VLTextMoE30BA3Config(Qwen3MoE30BA3Config)
The same error appears in the inheritance hierarchy tree at the bottom of this file, as well as in the Chinese version and the SKILL.md file.
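The difference between a direct parent and a family base can be checked at runtime. A minimal sketch, assuming empty stand-in classes that only mirror the names cited above (they are hypothetical placeholders, not the real xtuner definitions):

```python
# Hypothetical stand-ins mirroring the class names cited in the review;
# the real definitions live under xtuner/v1/model/ and carry actual fields.
class Qwen3DenseConfig: ...
class Qwen3Dense4BConfig(Qwen3DenseConfig): ...
class Qwen3VLTextDense4BConfig(Qwen3Dense4BConfig): ...


def direct_parents(cls: type) -> list[str]:
    """Names of the classes listed in the class statement itself."""
    return [base.__name__ for base in cls.__bases__ if base is not object]


def ancestors(cls: type) -> list[str]:
    """Names of every ancestor in method resolution order, excluding cls and object."""
    return [base.__name__ for base in cls.__mro__[1:] if base is not object]


print(direct_parents(Qwen3VLTextDense4BConfig))  # ['Qwen3Dense4BConfig']
print(ancestors(Qwen3VLTextDense4BConfig))       # ['Qwen3Dense4BConfig', 'Qwen3DenseConfig']
```

Under these assumptions, a "Base Class" column would be populated from `direct_parents`, while a "Family" column would take the last config in `ancestors`. Which of the two the table means to report is the open question in this thread.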
@claude Here we only list the family base class of each config, which indicates the config family it belongs to. If you agree, please resolve this conversation.
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
Claude: Critical — inheritance tree does not match actual code
The VL text backbone configs are shown as direct children of the family base, but they actually inherit from the concrete scale configs. The corrected tree should be:
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       │   └── Qwen3VLTextDense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       │   └── Qwen3VLTextDense4BConfig
│   │       └── Qwen3Dense0P6BConfig
Similarly for the MoE section below, Qwen3VLTextMoE30BA3Config should be nested under Qwen3MoE30BA3Config, and Qwen3VLTextMoE235BA22Config under Qwen3MoE235BA22Config.
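One way to keep such a tree from drifting is to render it from a child-to-parent mapping rather than writing it by hand. A rough sketch, assuming a hypothetical `PARENT_OF` dictionary filled with the direct parents cited in this review (`render_tree` is an illustrative helper, not part of the PR's scan script):

```python
from collections import defaultdict

# Hypothetical child -> direct parent mapping, using the parents cited in the review.
PARENT_OF: dict[str, str] = {
    "Qwen3Dense8BConfig": "Qwen3DenseConfig",
    "Qwen3Dense4BConfig": "Qwen3DenseConfig",
    "Qwen3Dense0P6BConfig": "Qwen3DenseConfig",
    "Qwen3VLTextDense8BConfig": "Qwen3Dense8BConfig",
    "Qwen3VLTextDense4BConfig": "Qwen3Dense4BConfig",
}


def render_tree(root: str, parent_of: dict[str, str]) -> list[str]:
    """Return the lines of a box-drawing tree rooted at `root`."""
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    lines = [root]

    def walk(node: str, prefix: str) -> None:
        kids = children.get(node, [])
        for i, kid in enumerate(kids):
            last = i == len(kids) - 1
            lines.append(prefix + ("└── " if last else "├── ") + kid)
            # Continue the vertical rule only while siblings remain below.
            walk(kid, prefix + ("    " if last else "│   "))

    walk(root, "")
    return lines


print("\n".join(render_tree("Qwen3DenseConfig", PARENT_OF)))
```

With this mapping, each VL text config is printed one level below its concrete scale config, matching the nesting proposed in the review.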
}


def scan_file(path: Path):
Claude: Warning — missing type hints on function signatures
Per CLAUDE.md: "All new code must include type hints for function signatures (parameters and return types)."
scan_file and main are both missing return type annotations:
Current: def scan_file(path: Path):
Suggested: def scan_file(path: Path) -> list[dict[str, str | list[str]]]:
And main should have -> None.
children: dict[str, list[str]] = {}
for cfg in all_configs:
    for p in cfg["parents"]:
Claude: Nit — RELEVANT_BASES set is redundant
The condition p in RELEVANT_BASES or p.endswith("Config") always takes the right branch for every class in the codebase since all relevant parent classes already end with Config. The RELEVANT_BASES set could be removed and the condition simplified to just p.endswith("Config").
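The redundancy can be checked mechanically. A small sketch, assuming hypothetical contents for `RELEVANT_BASES` (the real set is not shown in this thread; the three names below are taken from elsewhere in the PR and may not match it):

```python
# Hypothetical contents: the real RELEVANT_BASES set is not shown in this thread.
RELEVANT_BASES: set[str] = {"TransformerConfig", "MoEConfig", "BaseComposeConfig"}


def is_relevant_original(parent: str) -> bool:
    return parent in RELEVANT_BASES or parent.endswith("Config")


def is_relevant_simplified(parent: str) -> bool:
    return parent.endswith("Config")


# The membership test could only matter for a name that does not end with "Config".
assert all(name.endswith("Config") for name in RELEVANT_BASES)

samples = sorted(RELEVANT_BASES) + ["Qwen3DenseConfig", "BaseModel", "object"]
mismatches = [s for s in samples if is_relevant_original(s) != is_relevant_simplified(s)]
print(mismatches)  # []
```

The simplification holds only while every entry of the set ends with `Config`; if a differently named base were ever added, the two predicates would diverge and the set would be needed again.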
Claude: ## Summary
Docs PR that replaces "Coming soon…" placeholders in the EN/ZH model advanced tutorial with comprehensive supported-model tables, inheritance hierarchy, and a Claude skill + scan script for keeping them in sync.
Issues
Critical
Warning
Nit
Verdict
REQUEST_CHANGES: the inheritance hierarchy errors affect 4 classes across 3 files (EN doc, ZH doc, SKILL.md) and would mislead users about the actual model config structure.
## Inheritance Hierarchy

The diagram below shows the complete inheritance hierarchy of all config classes supported by `TrainEngine`, covering the two main branches `TransformerConfig` and `BaseComposeConfig`.
Summary
This PR adds comprehensive supported model documentation to the pretrain/sft advanced tutorial, replacing the previous "Coming soon..." placeholder.
Changes
English model doc ()
Chinese model doc ()
Skill for auto-sync ()
Supported models covered