
Add qwen3 #1827

Merged

oyilmaz-nvidia merged 11 commits into NVIDIA-NeMo:main from oyilmaz-nvidia:onur/add-qwen3-model on Apr 21, 2026

Conversation

@oyilmaz-nvidia
Contributor

Summary

  • Add qwen3 as a supported captioning variant alongside qwen2.5 (replacing the generic qwen alias)
  • qwen_vl.py: Add Qwen/Qwen3-VL-8B-Instruct (rev 0c351dd) and a per-variant _QWEN_VL_PIXEL_PARAMS dict; pixel budget params (image_factor, min_pixels, max_pixels, video_*_pixels) are now merged into mm_processor_kwargs on setup(). Qwen2.5 uses factor-28 params; Qwen3 uses factor-32.
  • qwen_lm.py: Add Qwen/Qwen3-14B (rev 8268fe3) as the qwen3 LM variant via _QWEN_LM_VARIANTS_INFO; model_variant param added to constructor and download_weights_on_node.
  • prompt_formatter.py: Replace "qwen" entry with "qwen2.5" / "qwen3" in VARIANT_MAPPING; generate_inputs() check updated accordingly.
  • caption_generation.py: Variant check uses _QWEN_VARIANTS_INFO membership instead of == "qwen"; download_weights_on_node now receives the variant.
  • caption_enhancement.py: Add captioning_model_variant field so the enhancement stage reads captions from the correct variant key (previously hardcoded to "qwen"); LM variant check uses _QWEN_LM_VARIANTS_INFO.
  • video_split_clip_example.py: --captioning-algorithm and --enhance-captions-algorithm choices updated to ["qwen2.5", "qwen3", ...]; captioning_model_variant wired through to CaptionEnhancementStage.
  • All unit and integration tests updated to reflect the renamed variants and new pixel-param expectations.

Test plan

  • uv run pytest tests/models/test_qwen_vl.py tests/models/test_qwen_lm.py tests/models/test_prompt_formatter.py -v
  • uv run pytest tests/stages/video/caption/ -v -m "not gpu"
  • GPU integration: uv run pytest tests/stages/video/caption/ -v -m gpu (via nemo_curator_benchmarking:20260414175401UTC)
  • uv run ruff check . && uv run ruff format --check .
  • python tutorials/video/getting-started/video_split_clip_example.py --help — verify qwen2.5 and qwen3 appear in --captioning-algorithm choices

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
oyilmaz-nvidia requested review from a team, abhinavg4, and suiyoubi as code owners on April 17, 2026 at 17:45

copy-pr-bot Bot commented Apr 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


greptile-apps Bot commented Apr 17, 2026

Greptile Summary

This PR adds qwen3 as a supported captioning and enhancement variant alongside qwen2.5 (replacing the old generic "qwen" alias), wiring per-variant pixel parameters, model IDs, and revision hashes throughout the VL model, LM model, stage, and tutorial layers.

  • P1: QwenLM.model_id_names is implemented as a plain method but ModelInterface declares it @property @abc.abstractmethod; any call-site accessing it as a property will receive a bound method instead of list[str].

Confidence Score: 4/5

Core rename and qwen3 wiring is correct; one new P1 (missing @property on model_id_names) and two previously flagged open P1s (hardcoded "qwen_lm" enhanced_caption key, Qwen3 thinking mode) should be resolved before merge.

Previously flagged stale defaults are fixed. A new P1 was found (model_id_names missing @property). Two prior P1 findings (hardcoded enhanced_caption key, Qwen3 think-block contamination) remain unaddressed in the diff.

nemo_curator/models/qwen_lm.py (missing @property), nemo_curator/stages/video/caption/caption_enhancement.py (hardcoded "qwen_lm" key)

Important Files Changed

Filename Overview
nemo_curator/models/qwen_lm.py Adds qwen3 LM variant via _QWEN_LM_VARIANTS_INFO, new _weights_complete helper, and download_weights_on_node variant param; model_id_names missing @property decorator that the abstract base requires.
nemo_curator/models/qwen_vl.py Adds Qwen3-VL-8B-Instruct model, per-variant pixel params dict (_QWEN_VL_PIXEL_PARAMS), and variant-aware download; correctly imports _weights_complete from qwen_lm.
nemo_curator/models/prompt_formatter.py Replaces "qwen" with "qwen2.5"/"qwen3" in VARIANT_MAPPING; _generate_qwen_inputs now receives fps and builds the (ndarray, metadata) tuple; docstring still references the removed "qwen" variant.
nemo_curator/stages/video/caption/caption_enhancement.py Adds captioning_model_variant field and updates LM variant check to use _QWEN_LM_VARIANTS_INFO; enhanced_caption key is still hardcoded to "qwen_lm" regardless of model_variant (previously flagged).
nemo_curator/stages/video/caption/caption_generation.py model_variant default correctly updated to "qwen2.5"; variant check uses _QWEN_VARIANTS_INFO membership; download_weights_on_node receives variant — looks correct.
nemo_curator/stages/video/caption/caption_preparation.py model_variant default correctly updated to "qwen2.5"; logic unchanged otherwise — no issues found.
tutorials/video/getting-started/video_split_clip_example.py --captioning-algorithm and --enhance-captions-algorithm choices updated to include qwen3; captioning_model_variant wired to CaptionEnhancementStage; --enhanced-caption-models still hardcoded to "qwen_lm".

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CaptionPreparationStage<br/>model_variant: qwen2.5 or qwen3"] -->|PromptFormatter| B["_generate_qwen_inputs<br/>builds fps + frames metadata tuple"]
    B --> C["window.llm_inputs[model_variant]"]
    C --> D["CaptionGenerationStage<br/>model_variant: qwen2.5 or qwen3"]
    D -->|"QwenVL.setup<br/>applies _QWEN_VL_PIXEL_PARAMS per variant"| E["window.caption[model_variant]"]
    E --> F["CaptionEnhancementStage<br/>reads caption via captioning_model_variant"]
    F -->|"QwenLM.generate<br/>apply_chat_template"| G["window.enhanced_caption<br/>hardcoded 'qwen_lm' key"]
    subgraph VL_Model["Vision-Language Model"]
        H["QwenVL<br/>_QWEN_VARIANTS_INFO<br/>_QWEN_VL_PIXEL_PARAMS"]
    end
    subgraph LM_Model["Language Model"]
        I["QwenLM<br/>_QWEN_LM_VARIANTS_INFO<br/>model_id_names missing @property"]
    end
    D -.->|initialises| H
    F -.->|initialises| I

Reviews (11): Last reviewed commit: "Merge branch 'main' into onur/add-qwen3-..." | Re-trigger Greptile

  clip_idx, window_idx = mapping[idx]
- original_caption = video.clips[clip_idx].windows[window_idx].caption["qwen"]
+ original_caption = video.clips[clip_idx].windows[window_idx].caption[self.captioning_model_variant]
  video.clips[clip_idx].windows[window_idx].enhanced_caption["qwen_lm"] = result
Contributor


P2 Hardcoded "qwen_lm" key in enhanced_caption

Every other caption/input key in this pipeline was migrated from the old "qwen" string to the variant-aware value, but the enhanced-caption key is still hardcoded to "qwen_lm" regardless of whether model_variant is "qwen2.5" or "qwen3". This is inconsistent with the surrounding design, and any downstream consumer that reads enhanced_caption["qwen3_lm"] (or a similarly named key) will get a KeyError. Consider using f"{self.model_variant}_lm" to keep it variant-aware.
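A variant-aware key along the lines suggested might look like this minimal sketch; the class shape and property name here are hypothetical, not the stage's actual code:

```python
from dataclasses import dataclass

@dataclass
class EnhancementKeying:
    # Hypothetical sketch of the reviewer's suggestion: derive the
    # enhanced_caption key from model_variant instead of hardcoding "qwen_lm".
    model_variant: str = "qwen2.5"

    @property
    def enhanced_caption_key(self) -> str:
        return f"{self.model_variant}_lm"
```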

@oyilmaz-nvidia
Contributor Author

/ok to test 3de2e4e

_QWEN_LM_MODEL_REVISION = "cf98f3b"
_QWEN_LM_VARIANTS_INFO: Final = {
    "qwen2.5": ("Qwen/Qwen2.5-14B-Instruct", "cf98f3b"),
    "qwen3": ("Qwen/Qwen3-14B", "f8c293d"),
Contributor


P1 Qwen3-14B revision hash mismatch with PR description

The PR description states rev 8268fe3 for Qwen/Qwen3-14B, but the code pins f8c293d. These are different commit SHAs — one of them is incorrect, and download_weights_on_node will fetch whichever commit hash is in this dict. Please confirm the intended pinned revision matches the model checkpoint you tested against.

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Contributor Author

/ok to test 2398340

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@oyilmaz-nvidia
Contributor Author

/ok to test 9b02695

Contributor

@suiyoubi suiyoubi left a comment


thanks for your effort!

@oyilmaz-nvidia
Contributor Author

/ok to test 9b02695

@oyilmaz-nvidia
Contributor Author

/ok to test 5731502

Comment on lines 66 to +68

  def model_id_names(self) -> list[str]:
-     return [_QWEN_LM_MODEL_ID]
+     model_id, _ = _QWEN_LM_VARIANTS_INFO[self.model_variant]
+     return [model_id]
Contributor


P1 model_id_names missing @property decorator

ModelInterface declares model_id_names as @property @abc.abstractmethod. QwenVL correctly overrides it with @property, but QwenLM implements it as a plain method. Any call-site that accesses model.model_id_names without parentheses (the property convention) will receive the bound-method object rather than a list[str], silently returning wrong data.

Suggested change

- def model_id_names(self) -> list[str]:
-     model_id, _ = _QWEN_LM_VARIANTS_INFO[self.model_variant]
-     return [model_id]
+ @property
+ def model_id_names(self) -> list[str]:
+     model_id, _ = _QWEN_LM_VARIANTS_INFO[self.model_variant]
+     return [model_id]
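The failure mode can be reproduced in isolation; the class names below are illustrative stand-ins for ModelInterface and QwenLM, not the repository's actual classes:

```python
import abc

class ModelInterface(abc.ABC):
    @property
    @abc.abstractmethod
    def model_id_names(self) -> list[str]: ...

class PlainMethodLM(ModelInterface):
    # Missing @property: attribute access yields the bound method itself,
    # not the list it would return when called.
    def model_id_names(self) -> list[str]:  # type: ignore[override]
        return ["Qwen/Qwen3-14B"]

class PropertyLM(ModelInterface):
    @property
    def model_id_names(self) -> list[str]:
        return ["Qwen/Qwen3-14B"]

# PlainMethodLM().model_id_names is a bound method (callable), while
# PropertyLM().model_id_names evaluates directly to the list as intended.
```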
