
[Docs] Add supported model tables to pretrain_sft advanced tutorial#1728

Open
CyCle1024 wants to merge 1 commit into InternLM:main from CyCle1024:docs/add-supported-models-table

Conversation

@CyCle1024
Collaborator

Summary

This PR adds comprehensive supported model documentation to the pretrain/sft advanced tutorial, replacing the previous "Coming soon..." placeholder.

Changes

  1. English model doc

    • Base Config Classes table (5 families)
    • Concrete Model Configs table (15 configs)
    • Compose Models section with base & concrete tables (10 configs)
    • Complete inheritance hierarchy tree covering both the `TransformerConfig` and `BaseComposeConfig` branches
  2. Chinese model doc

    • Fully synchronized with the English version
  3. Skill for auto-sync

    • Workflow guide for the AI to update the docs when new model configs are added
    • Script to scan and discover all Config classes with inheritance info
    • Symlinked for skill discovery
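
The scan script itself isn't quoted in this thread; as a rough illustration of the approach it describes, discovery of Config classes with inheritance info could be done with Python's `ast` module. The function name `scan_configs` and the record shape below are assumptions for this sketch, not the PR's actual code:

```python
import ast
from pathlib import Path


def scan_configs(path: Path) -> dict[str, list[str]]:
    """Map every *Config class in a Python file to the names of its direct bases."""
    tree = ast.parse(path.read_text())
    configs: dict[str, list[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name.endswith("Config"):
            # Handle both bare names (Base) and attribute access (pkg.Base).
            configs[node.name] = [
                base.id if isinstance(base, ast.Name) else base.attr
                for base in node.bases
                if isinstance(base, (ast.Name, ast.Attribute))
            ]
    return configs
```

Parsing with `ast` rather than importing the modules keeps the scan side-effect free, which matters for config files that pull in heavy training dependencies at import time.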

Supported models covered

  • Dense: Qwen2 (7B), Qwen3 (0.6B / 4B / 8B)
  • MoE: DeepSeek-V3, GPT-OSS (21B / 117B), Qwen3-MoE (30B / 235B / FoPE)
  • VL text backbones: Qwen3 VL Dense (4B / 8B), Qwen3 VL MoE (30B / 235B), Qwen3.5 VL MoE (35B)
  • Compose models: Qwen3-VL, Qwen3.5-VL, InternVL, InternS1

…ted-models skill

- Add model support tables to en/zh pretrain_sft advanced tutorial model.md
- Include base configs, concrete configs, compose models and inheritance hierarchy
- Add xtuner-sync-supported-models skill under .claude/skills/ with scan script
@CyCle1024
Collaborator Author

@claude review

Comment on lines +27 to +32
| `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~4B parameters, for multimodal |
| `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | Dense (VL backbone) | ~8B parameters, for multimodal |
| `DeepSeekV3Config` | — | MoE | ~671B total / ~37B activated |
| `GptOss21BA3P6Config` | `GptOssConfig` | MoE | ~21B total / ~3.6B activated |
| `GptOss117BA5P8Config` | `GptOssConfig` | MoE | ~117B total / ~5.8B activated |
| `Qwen3MoE30BA3Config` | `Qwen3MoEConfig` | MoE | ~30B total / ~3B activated |
Contributor

Claude: Critical — incorrect inheritance in table and tree

The "Base Class / Family" for these VL text backbone configs is listed as Qwen3DenseConfig / Qwen3MoEConfig, but the actual direct parents are the concrete configs:

| Config | Listed parent | Actual parent |
| --- | --- | --- |
| `Qwen3VLTextDense4BConfig` | `Qwen3DenseConfig` | `Qwen3Dense4BConfig` |
| `Qwen3VLTextDense8BConfig` | `Qwen3DenseConfig` | `Qwen3Dense8BConfig` |
| `Qwen3VLTextMoE30BA3Config` | `Qwen3MoEConfig` | `Qwen3MoE30BA3Config` |
| `Qwen3VLTextMoE235BA22Config` | `Qwen3MoEConfig` | `Qwen3MoE235BA22Config` |

Source references:

The same error appears in the inheritance hierarchy tree at the bottom of this file, in the Chinese version, and in the SKILL.md file.

Collaborator Author


@claude Here we intentionally list the family base class of each config rather than its direct parent, since the base class indicates which config family the config belongs to. If you agree, please resolve this conversation.

Comment on lines +72 to +82
├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)
Contributor


Claude: Critical — inheritance tree does not match actual code

The VL text backbone configs are shown as direct children of the family base, but they actually inherit from the concrete scale configs. The corrected tree should be:

Suggested change

Before:

├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       ├── Qwen3Dense0P6BConfig
│   │       ├── Qwen3VLTextDense4BConfig
│   │       └── Qwen3VLTextDense8BConfig
│   └── MoE Models (via MoEConfig)

After:

├── TransformerConfig
│   ├── Dense Models
│   │   ├── Qwen2DenseConfig
│   │   │   └── Qwen2Dense7BConfig
│   │   └── Qwen3DenseConfig
│   │       ├── Qwen3Dense8BConfig
│   │       │   └── Qwen3VLTextDense8BConfig
│   │       ├── Qwen3Dense4BConfig
│   │       │   └── Qwen3VLTextDense4BConfig
│   │       └── Qwen3Dense0P6BConfig
│   └── MoE Models (via MoEConfig)

Similarly for the MoE section below, Qwen3VLTextMoE30BA3Config should be nested under Qwen3MoE30BA3Config, and Qwen3VLTextMoE235BA22Config under Qwen3MoE235BA22Config.
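
One way to keep such trees consistent with a scanned parent→children mapping is to generate them rather than hand-edit them. A minimal sketch (`render_tree` is a hypothetical helper, not part of the PR):

```python
def render_tree(node: str, children: dict[str, list[str]], prefix: str = "") -> list[str]:
    """Render the subtree below `node` as box-drawing lines (the node itself is omitted)."""
    lines: list[str] = []
    kids = children.get(node, [])
    for i, kid in enumerate(kids):
        last = i == len(kids) - 1
        lines.append(prefix + ("└── " if last else "├── ") + kid)
        # Continue the vertical rule only while later siblings remain.
        lines += render_tree(kid, children, prefix + ("    " if last else "│   "))
    return lines


# Example mapping with the VL configs nested under their concrete parents.
children = {
    "Qwen3DenseConfig": ["Qwen3Dense8BConfig", "Qwen3Dense4BConfig"],
    "Qwen3Dense8BConfig": ["Qwen3VLTextDense8BConfig"],
    "Qwen3Dense4BConfig": ["Qwen3VLTextDense4BConfig"],
}
for line in render_tree("Qwen3DenseConfig", children):
    print(line)
```

Driving the doc tree from the same `children` mapping the scan script builds would make this class of drift impossible.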

}


def scan_file(path: Path):
Contributor


Claude: Warning — missing type hints on function signatures

Per CLAUDE.md: "All new code must include type hints for function signatures (parameters and return types)."

scan_file and main are both missing return type annotations and parameter type hints:

Suggested change
def scan_file(path: Path):
def scan_file(path: Path) -> list[dict[str, str | list[str]]]:

And main should have -> None.

Comment on lines +53 to +55
children: dict[str, list[str]] = {}
for cfg in all_configs:
for p in cfg["parents"]:
Contributor


Claude: Nit — RELEVANT_BASES set is redundant

The condition p in RELEVANT_BASES or p.endswith("Config") always takes the right branch for every class in the codebase since all relevant parent classes already end with Config. The RELEVANT_BASES set could be removed and the condition simplified to just p.endswith("Config").
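
The claimed equivalence is easy to sanity-check. In the snippet below the `RELEVANT_BASES` contents are assumed for illustration; the simplification holds for any set whose members all end with "Config":

```python
RELEVANT_BASES = {"TransformerConfig", "MoEConfig", "BaseComposeConfig"}  # assumed contents

# The "or" branch subsumes the membership test iff every entry ends with "Config".
assert all(name.endswith("Config") for name in RELEVANT_BASES)


def keep(p: str) -> bool:
    """Original condition from the review comment."""
    return p in RELEVANT_BASES or p.endswith("Config")


def keep_simplified(p: str) -> bool:
    """Proposed simplification."""
    return p.endswith("Config")


for name in ["Qwen3DenseConfig", "TransformerConfig", "nn.Module", "GPTModel"]:
    assert keep(name) == keep_simplified(name)
```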

@claude
Contributor

claude Bot commented Apr 28, 2026

Claude: Summary

Docs PR that replaces "Coming soon…" placeholders in the EN/ZH model advanced tutorial with comprehensive supported-model tables, inheritance hierarchy, and a Claude skill + scan script for keeping them in sync.

Issues

Critical

  • docs/en/pretrain_sft/advanced_tutorial/model.md & inheritance tree: Four VL text backbone configs have incorrect parent classes in both the Concrete Model table and the inheritance hierarchy tree. The same error is repeated in the Chinese model.md and in SKILL.md. Specifically:
    • Qwen3VLTextDense4BConfig inherits from Qwen3Dense4BConfig, not Qwen3DenseConfig
    • Qwen3VLTextDense8BConfig inherits from Qwen3Dense8BConfig, not Qwen3DenseConfig
    • Qwen3VLTextMoE30BA3Config inherits from Qwen3MoE30BA3Config, not Qwen3MoEConfig
    • Qwen3VLTextMoE235BA22Config inherits from Qwen3MoE235BA22Config, not Qwen3MoEConfig

Warning

  • The scan script's scan_file and main functions are missing type hints required by CLAUDE.md (see inline comment).

Nit

  • The RELEVANT_BASES set in the scan script is redundant; p.endswith("Config") alone gives the same result (see inline comment).

Verdict

REQUEST_CHANGES — the inheritance hierarchy errors affect 4 classes across 3 files (EN doc, ZH doc, SKILL.md) and would mislead users about the actual model config structure.

@CyCle1024 CyCle1024 requested review from HAOCHENYE and jayhenry April 28, 2026 14:26
## Inheritance Hierarchy

The diagram below shows the complete inheritance hierarchy of all config classes supported by `TrainEngine`, covering both the `TransformerConfig` and `BaseComposeConfig` branches.

Collaborator


LGTM
