Skip to content

[None][doc] Gemma 4: usage examples#14303

Merged
Hudayday merged 5 commits into
NVIDIA:mainfrom
Hudayday:docs/gemma4-examples-readme
May 20, 2026
Merged

[None][doc] Gemma 4: usage examples#14303
Hudayday merged 5 commits into
NVIDIA:mainfrom
Hudayday:docs/gemma4-examples-readme

Conversation

@Hudayday
Copy link
Copy Markdown
Collaborator

@Hudayday Hudayday commented May 19, 2026

Summary

Append a new "Run Gemma 4" section to examples/models/core/gemma/README.md
covering the PyTorch backend flow that we actually develop and test against:

  • variant table (E2B / E4B / 26B-A4B / 31B with modalities)
  • trtllm-serve (OpenAI-compatible REST) launch + curl /v1/chat/completions
  • quickstart_multimodal.py for image and audio inputs (E2B / E4B only)
  • trtllm-eval mmmu and trtllm-eval covost2 entry points

Gemma 4 runs on the PyTorch backend with HF checkpoints loaded directly;
the legacy convert_checkpoint.py / trtllm-build flow is intentionally
omitted since it does not apply to Gemma 4.

Also registers the new section and its three subsections in the
top-of-file Table Of Contents.

Summary by CodeRabbit

  • Documentation
    • Added detailed Gemma 4 documentation including deployment setup, OpenAI-compatible API integration with usage examples, multimodal inference capabilities for image and audio processing, accuracy benchmarking and evaluation tools with sample commands, practical quick-start examples, and guidance on supported model variants.

Review Change Stack

Hudayday added 2 commits May 19, 2026 03:04
…lm-eval)

Append a "Run Gemma 4" section to examples/models/core/gemma/README.md
covering the PyTorch backend flow:
- variant table (E2B / E4B / 26B-A4B / 31B with modalities)
- trtllm-serve (OpenAI-compatible API) + curl example
- quickstart_multimodal.py for image and audio inputs
- trtllm-eval mmmu and covost2 entry points

Gemma 4 runs on the PyTorch backend with HF checkpoints loaded directly;
the legacy convert_checkpoint.py / trtllm-build flow is intentionally
omitted since it does not apply.

Signed-off-by: Hudayday <tianruih@nvidia.com>
Follow-up to de5e5b4: register the new "Run Gemma 4" section and
its three subsections in the Table Of Contents at the top of
examples/models/core/gemma/README.md.

Signed-off-by: Hudayday <tianruih@nvidia.com>
@Hudayday Hudayday requested review from a team as code owners May 19, 2026 10:27
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

📝 Walkthrough

Walkthrough

This PR adds documentation for Gemma 4 model inference and evaluation. A new "Run Gemma 4" section describes launching trtllm-serve with Gemma 4 checkpoints, querying the OpenAI-compatible API, running multimodal inference, and evaluating accuracy using trtllm-eval on MMMU and CoVoST 2 benchmarks. The Table of Contents is updated to reflect these subsections.

Changes

Gemma 4 Documentation

Layer / File(s) Summary
Gemma 4 server and API documentation
examples/models/core/gemma/README.md
Table of Contents updated with new "Run Gemma 4" entry and sub-links. New section documents supported Gemma 4 HuggingFace variants/modalities, provides trtllm-serve launch examples using PyTorch backend, curl chat-completions query examples with internal chat-template handling, multimodal inference commands for image/audio via quickstart script, and trtllm-eval benchmark commands for MMMU and CoVoST 2 evaluation.

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description check ✅ Passed The description explains the change clearly (new README section for Gemma 4) and provides sufficient detail about what is being added.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title '[None][doc] Gemma 4: usage examples' is directly related to the main change, which is adding usage examples for Gemma 4 to the README documentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/models/core/gemma/README.md`:
- Around line 853-854: The README's MMMU support line incorrectly lists
text-only checkpoints (26B-A4B and 31B) as supported for this vision
multiple-choice task; update the MMMU support matrix so it only lists
multimodal-capable checkpoints (e.g., E2B / E4B) and remove or mark 26B-A4B and
31B as "text-only / not supported" in the MMMU entry so the modality claim is
consistent with the earlier note about text-only checkpoints.
- Around line 833-845: The commands in the README invoke
quickstart_multimodal.py using a repo-root-relative path which will fail when
run from examples/models/core/gemma/; update the examples to either prepend the
correct relative path to quickstart_multimodal.py or add an explicit instruction
to cd to the repository root before running. Specifically edit the README
entries that reference quickstart_multimodal.py (the multimodal example lines)
to use a correct relative path from examples/models/core/gemma/ to the script or
add a preceding "cd" instruction so the quickstart_multimodal.py invocation
resolves correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bbb42d75-0cae-47d7-8bfb-b07cfcff52dd

📥 Commits

Reviewing files that changed from the base of the PR and between 989671b and a704288.

📒 Files selected for processing (1)
  • examples/models/core/gemma/README.md

Comment thread examples/models/core/gemma/README.md Outdated
Comment thread examples/models/core/gemma/README.md
Hudayday added 2 commits May 19, 2026 03:31
…age+video)

The previous variant table mislabeled 26B-A4B and 31B as text-only. All four
Gemma 4 checkpoints ship a vision tower covering image and video; only E2B
and E4B additionally ship an audio tower. Update the modality table, the MM
intro paragraph, the inference subsection, the table-of-contents anchor, and
add a video example to quickstart_multimodal.py invocations.

Signed-off-by: Tianrui Hu <tianruih@nvidia.com>
Signed-off-by: Hudayday <tianruih@nvidia.com>
The quickstart_multimodal.py invocation in the previous revision hits
`AttributeError: 'TrtllmAttentionMetadata' object has no attribute 'kv_layout'`
on the current Gemma 4 flow, so the example is not safe to ship as-is.
Remove the multimodal subsection and its table-of-contents entry; serve
and trtllm-eval examples remain unchanged.

Signed-off-by: Tianrui Hu <tianruih@nvidia.com>
Signed-off-by: Hudayday <tianruih@nvidia.com>
@Hudayday Hudayday changed the title [None][doc] Gemma 4: usage examples (trtllm-serve / multimodal / trtllm-eval) [None][doc] Gemma 4: usage examples May 19, 2026
Comment thread examples/models/core/gemma/README.md Outdated
Comment thread examples/models/core/gemma/README.md Outdated
Comment thread examples/models/core/gemma/README.md Outdated
@Hudayday Hudayday enabled auto-merge (squash) May 20, 2026 05:18
- Drop --backend pytorch from trtllm-serve / trtllm-eval examples;
  pytorch is the default backend now.
- Reword the apply_chat_template explanation per reviewer suggestion.

Signed-off-by: Hudayday <tianruih@nvidia.com>
@Hudayday
Copy link
Copy Markdown
Collaborator Author

/bot skip --comment "document updates, all commands tested"

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #49345 [ skip ] triggered by Bot. Commit: f5383b8 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #49345 [ skip ] completed with state SUCCESS. Commit: f5383b8
Skipping testing for commit f5383b8

Link to invocation

@Hudayday Hudayday merged commit aac0d65 into NVIDIA:main May 20, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants