
[TRTLLM-11523][feat] Handle different chat template types #12336

Merged
2ez4bz merged 1 commit into NVIDIA:main from 2ez4bz:dev-chat-template
Apr 4, 2026

Conversation

@2ez4bz
Collaborator

@2ez4bz 2ez4bz commented Mar 19, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced content format detection system for multimodal models, enabling automatic and explicit specification of how chat templates handle content across different models.
    • Added support for interleaved multimodal placeholder insertion in text content.
    • Implemented content format registry for transparent model-specific configuration.
  • Tests

    • Added comprehensive test coverage for content format detection and multimodal placeholder handling.

Description

  • Why?

Previously, multimodal placeholder insertion was dictated by hardcoded exception lists, which added cognitive burden when onboarding new models and did not account for the fact that different versions of the same model architecture can ship different types of chat templates requiring different handling.

In addition, all placeholders were inserted either before or after the text, never interleaved with it.

  • What?

This commit addresses the above gaps by mimicking what is done in vLLM.

To that end, it:

  1. introduces content format detection based on Jinja AST inspection of the chat template.
  2. preserves the interleaved positions of text and media items during message parsing.
  3. dispatches to the appropriate logic based on the (possibly auto-detected) content format before applying the chat template: either the template handles multimodal content natively (OpenAI-style dicts), or expects plain strings with placeholders pre-inserted.
  4. inserts the multimodal placeholders.
  5. validates that the expected count is met.

Models can also explicitly declare their content_format during registration if they are only meant to support one.
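The detection in step 1 can be sketched as follows. `ContentFormat` and `detect_content_format` are names from this PR, but the AST heuristic below is illustrative only, not the shipped implementation (the autoescape setting and narrow except follow the inline review feedback later in this thread):

```python
from enum import Enum

import jinja2
from jinja2 import nodes


class ContentFormat(str, Enum):
    OPENAI = "openai"            # template iterates over content dicts itself
    STRING = "string"            # template expects plain strings + placeholders
    PASSTHROUGH = "passthrough"  # template application is skipped entirely


def detect_content_format(chat_template: str) -> ContentFormat:
    """Guess whether a template consumes OpenAI-style content lists by
    looking for a {% for ... in message['content'] %} style loop."""
    try:
        ast = jinja2.Environment(autoescape=True).parse(chat_template)
    except jinja2.TemplateSyntaxError:
        return ContentFormat.STRING
    for loop in ast.find_all(nodes.For):
        it = loop.iter
        subscript = (isinstance(it, nodes.Getitem)
                     and isinstance(it.arg, nodes.Const)
                     and it.arg.value == "content")
        attribute = isinstance(it, nodes.Getattr) and it.attr == "content"
        if subscript or attribute:
            return ContentFormat.OPENAI
    return ContentFormat.STRING
```

A template that loops over a message's content parts (e.g. `{% for part in message['content'] %}`) is detected as OPENAI; one that renders `message['content']` as a plain string, or that fails to parse, falls back to STRING.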

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz 2ez4bz force-pushed the dev-chat-template branch 2 times, most recently from 6f59a91 to 07962d4, on March 19, 2026 19:12
@2ez4bz 2ez4bz marked this pull request as ready for review March 19, 2026 19:12
@2ez4bz 2ez4bz requested review from a team as code owners March 19, 2026 19:12
@coderabbitai
Contributor

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

This PR introduces a content-format-driven system to replace hardcoded chat template exceptions. It adds a ContentFormat enum (OPENAI, STRING, PASSTHROUGH) with Jinja-template-based auto-detection, refactors multimodal placeholder handling to support interleaved insertion, and updates model registrations to declare explicit content formats.

Changes

Cohort / File(s) Summary
Core Content Format Infrastructure
tensorrt_llm/inputs/content_format.py, tensorrt_llm/inputs/__init__.py
Introduces ContentFormat enum with three modes, implements detect_content_format() function that parses Jinja templates to identify multimodal dictionary iteration patterns, and re-exports at package level for public API.
Registry and Metadata
tensorrt_llm/inputs/registry.py
Adds content_format: Optional[ContentFormat] field to MultimodalPlaceholderMetadata and introduces get_content_format() method to retrieve registered format for a model type.
Multimodal Utilities Refactoring
tensorrt_llm/inputs/utils.py
Replaces hardcoded exception lists with content-format-driven logic; refactors ConversationMessage to include content_parts for tracking interleaved text/media; adds interleave_mm_placeholders(), _build_openai_content(), and _validate_and_fix_placeholders() functions; updates apply_chat_template() and default_multimodal_input_loader() to dispatch on resolved content format.
Chat Message Processing
tensorrt_llm/serve/chat_utils.py
Extends parse_chat_message_content_parts() to construct ordered content_parts sequence alongside flattened content; updates parse_chat_messages_coroutines() to select placeholder strategy using registered content format and conditionally invoke interleave_mm_placeholders() for STRING format models.
Model-Specific Registrations
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_flash.py, tensorrt_llm/_torch/models/modeling_llava_next.py, tensorrt_llm/_torch/models/modeling_mistral.py, tensorrt_llm/_torch/models/modeling_vila.py
Updates multimodal placeholder metadata to include explicit content_format parameter (OPENAI or PASSTHROUGH) in register_input_processor() decorators, replacing previous exception-based handling.
New Test Coverage
tests/unittest/inputs/test_content_format.py, tests/unittest/inputs/test_chat_template_dispatch.py
Adds comprehensive tests for detect_content_format() Jinja parsing logic, ContentFormat enum construction, and validates _resolve_content_format(), _build_openai_content(), interleave_mm_placeholders(), and _validate_and_fix_placeholders() across explicit and auto-detected formats.
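The interleaving utility described in the utils refactoring might look roughly like this (the name `interleave_mm_placeholders` comes from the summary above; the signature and token mapping are assumptions):

```python
from typing import Dict, List, Union

# A content part is either a text string or a media dict such as
# {"type": "image"}, captured at its original position in the message.
Part = Union[str, Dict[str, str]]


def interleave_mm_placeholders(content_parts: List[Part],
                               placeholders: Dict[str, str]) -> str:
    """Join ordered text/media parts into one string, substituting each
    media part with its model-specific placeholder token in place."""
    rendered: List[str] = []
    for part in content_parts:
        if isinstance(part, str):
            rendered.append(part)
        else:
            rendered.append(placeholders[part["type"]])
    return "".join(rendered)
```

With `content_parts = ["Compare ", {"type": "image"}, " with ", {"type": "image"}, "."]` and `{"image": "<image>"}`, this yields `"Compare <image> with <image>."` instead of pushing both placeholders to one end.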

Sequence Diagram

```mermaid
sequenceDiagram
    participant Chat as Chat Handler
    participant Registry as Placeholder Registry
    participant Resolver as Content Format Resolver
    participant Template as Template Processor
    participant Placeholder as Placeholder Manager

    Chat->>Registry: get_content_format(model_type)
    Registry-->>Chat: ContentFormat (OPENAI/STRING/PASSTHROUGH/None)

    Chat->>Resolver: _resolve_content_format(format, template)
    alt Explicit Format
        Resolver-->>Chat: ContentFormat (explicit)
    else Auto-detect (None)
        Resolver->>Template: parse_jinja_template(template)
        Template->>Template: _ast_has_content_iteration()
        alt Has multimodal iteration
            Template-->>Resolver: OPENAI
        else Plain content
            Template-->>Resolver: STRING
        end
        Resolver-->>Chat: Detected ContentFormat
    end

    alt PASSTHROUGH
        Chat->>Template: skip_template_processing()
    else OPENAI
        Chat->>Placeholder: _build_openai_content(message)
        Placeholder-->>Chat: rebuild content with dicts
        Chat->>Template: render_with_openai_dicts()
    else STRING
        Chat->>Placeholder: interleave_mm_placeholders()
        Placeholder-->>Chat: inject placeholders at media positions
        Chat->>Template: render_with_placeholders()
    end

    Template-->>Chat: rendered_output
```
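The three-way dispatch in the diagram, condensed into a minimal stand-alone sketch (the ContentFormat branches follow the walkthrough; the render logic here is an illustrative placeholder, not the real template processor):

```python
from enum import Enum


class ContentFormat(Enum):
    OPENAI = "openai"            # template consumes content dicts natively
    STRING = "string"            # template expects pre-inserted placeholders
    PASSTHROUGH = "passthrough"  # template application is skipped


def render(fmt: ContentFormat, text_parts: list, media_count: int,
           placeholder: str = "<image>") -> str:
    """Minimal dispatch mirroring the diagram's three branches."""
    text = " ".join(text_parts)
    if fmt is ContentFormat.PASSTHROUGH:
        return text  # skip template processing entirely
    if fmt is ContentFormat.OPENAI:
        # The real code rebuilds OpenAI-style dicts and lets the template
        # render them; tagging the output here just marks the branch taken.
        return f"<openai:{media_count} media>{text}"
    # STRING: inject placeholders before rendering.
    return placeholder * media_count + text
```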

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 31.91%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title clearly summarizes the main feature: introducing support for handling different chat template types, which aligns with the primary change introducing content format detection and dispatch.
Description check ✅ Passed The PR description clearly explains the problem (hardcoded exceptions, lack of interleaving), the solution (content format detection, AST inspection, dispatch), and includes the required PR checklist with confirmation. Test coverage specifics are omitted but overall structure is complete.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
tensorrt_llm/inputs/utils.py (3)

732-741: Consider extracting modality inference logic to a shared helper.

The logic to infer modality from placeholder string (checking for "video", "audio", "so_embedding" in lowercase) is duplicated between _build_openai_content (lines 734-739) and interleave_mm_placeholders (lines 577-584). Extracting this to a small helper function would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/inputs/utils.py` around lines 732 - 741, Extract the modality
inference logic duplicated in _build_openai_content and
interleave_mm_placeholders into a small helper function (e.g.,
infer_modality_from_placeholder) that accepts a placeholder string and returns
"image", "video", or "audio"; replace the inline checks in both functions to
call this helper (use mm_placeholder_count and placeholder variables as inputs)
so both sites use the same implementation and avoid duplicated lowercase checks
for "video", "audio", and "so_embedding".
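The suggested extraction could take this shape (the helper name and the substring checks follow the review comment; where "so_embedding" should map is an assumption here):

```python
def infer_modality_from_placeholder(placeholder: str) -> str:
    """Single home for the duplicated lowercase substring checks."""
    lowered = placeholder.lower()
    if "video" in lowered:
        return "video"
    # Assumption: "so_embedding" placeholders are grouped with audio.
    if "audio" in lowered or "so_embedding" in lowered:
        return "audio"
    return "image"  # default modality when no other marker matches
```

Both _build_openai_content and interleave_mm_placeholders would then call this helper instead of repeating the checks.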

663-670: Potential false positive with substring matching in placeholder count.

rendered_text.count(placeholder) may over-count if the placeholder string appears as a substring of another token. For example, if placeholder is "<image>" and text contains "<image_special>", both would be counted.

Consider using a regex with word boundaries or a more precise matching approach if placeholder substrings are a concern in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/inputs/utils.py` around lines 663 - 670, The current count using
rendered_text.count(placeholder) can over-count when a placeholder appears as a
substring of another token; change the counting to use a regex-based exact-token
match: escape the placeholder with re.escape and build a pattern that matches
the placeholder as a standalone token (e.g., using lookarounds like (?<!\S) and
(?!\S) or appropriate word-boundary logic), then compute actual_count =
len(re.findall(pattern, rendered_text)). Update the block that references
placeholder, expected_count, actual_count, rendered_text, and model_type to use
this regex count (and add import re if needed).

800-804: Add strict=True to zip() to catch length mismatches.

The zip() call assumes conversation and mm_placeholder_counts have the same length. Adding strict=True (Python 3.10+) would raise a ValueError if the lengths differ, catching potential bugs early rather than silently truncating.

```diff
-        for conv, mm_placeholder_count in zip(conversation,
-                                              mm_placeholder_counts):
+        for conv, mm_placeholder_count in zip(conversation,
+                                              mm_placeholder_counts,
+                                              strict=True):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/inputs/utils.py` around lines 800 - 804, The zip over
conversation and mm_placeholder_counts in the loop can silently truncate if
lengths differ; update the zip(...) call used with conversation and
mm_placeholder_counts inside the loop (where _build_openai_content is invoked)
to use zip(..., strict=True) so a ValueError is raised on length mismatch,
catching bugs early (requires Python 3.10+).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/inputs/content_format.py`:
- Around line 109-116: Replace the insecure and overly broad parse block by
constructing the Jinja2 Environment with autoescaping enabled (e.g., call
Environment(autoescape=True) or use select_autoescape) instead of Environment(),
and narrow the except to only the specific parse exception (catch
jinja2.exceptions.TemplateSyntaxError) so that _ast_has_content_iteration(ast)
is still checked and on TemplateSyntaxError the function falls back to returning
STRING; update imports to reference TemplateSyntaxError and adjust the
try/except around env.parse accordingly while leaving the ContentFormat.OPENAI
return intact.

In `@tensorrt_llm/serve/chat_utils.py`:
- Around line 316-324: parse_chat_messages_coroutines currently resolves content
format using only MULTIMODAL_PLACEHOLDER_REGISTRY (defaulting to
ContentFormat.STRING), which can conflict with _resolve_content_format
(inputs/utils.py) that auto-detects Jinja/OPENAI formats; update
parse_chat_messages_coroutines to accept either the chat_template or a
pre-resolved content_format flag and call _resolve_content_format (or use the
passed format) instead of registry-only logic so its placeholder strategy
matches apply_chat_template and _build_openai_content, or alternatively add a
clear comment/docstring in parse_chat_messages_coroutines explaining why
registry-only fallback is safe and referencing MULTIMODAL_PLACEHOLDER_REGISTRY
and _resolve_content_format.

In `@tests/unittest/inputs/test_content_format.py`:
- Around line 92-98: The test_caching currently only checks identity (result1 is
result2) which will pass due to enum singletons; instead verify the actual
lru_cache behavior by inspecting detect_content_format.cache_info() before and
after calls: call detect_content_format.cache_info() to capture initial
hits/misses, invoke detect_content_format(template) twice (or once then again)
and assert that cache_info().hits increased (or misses decreased appropriately)
to prove a cache hit, and optionally call detect_content_format.cache_clear() at
the start to ensure a clean state; reference the test method name test_caching
and the function detect_content_format and its cache_info()/cache_clear()
methods when making the changes.
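The cache_info()-based assertion the comment asks for, demonstrated with a stand-in lru_cache'd function (detect_content_format is the real target; detect_stub here is hypothetical):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def detect_stub(template: str) -> str:
    """Stand-in for the lru_cache'd detect_content_format."""
    return "openai" if "content" in template else "string"


detect_stub.cache_clear()               # start from a clean cache state
detect_stub("{{ messages }}")           # first call: recorded as a miss
hits_before = detect_stub.cache_info().hits
detect_stub("{{ messages }}")           # repeat call: must be a cache hit
assert detect_stub.cache_info().hits == hits_before + 1
```

Unlike the identity check, this assertion fails if the cache is not actually consulted, even though enum members are singletons.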


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cb64dcc0-c6aa-4bd0-b922-0a4cb6b02bac

📥 Commits

Reviewing files that changed from the base of the PR and between e37493a and 07962d4.

📒 Files selected for processing (11)
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_flash.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
  • tensorrt_llm/_torch/models/modeling_mistral.py
  • tensorrt_llm/_torch/models/modeling_vila.py
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/inputs/content_format.py
  • tensorrt_llm/inputs/registry.py
  • tensorrt_llm/inputs/utils.py
  • tensorrt_llm/serve/chat_utils.py
  • tests/unittest/inputs/test_chat_template_dispatch.py
  • tests/unittest/inputs/test_content_format.py

@2ez4bz 2ez4bz force-pushed the dev-chat-template branch 3 times, most recently from 3b47046 to 1077763, on March 19, 2026 22:25
@2ez4bz
Collaborator Author

2ez4bz commented Mar 19, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39642 [ run ] triggered by Bot. Commit: 1077763 Link to invocation

Collaborator

@jdebache jdebache left a comment


Is it correct that this will simply preserve the existing behaviour for Mistral models?

@tensorrt-cicd
Collaborator

PR_Github #39642 [ run ] completed with state SUCCESS. Commit: 1077763
/LLM/main/L0_MergeRequest_PR pipeline #30847 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Mar 20, 2026

> Is it correct that this will simply preserve the existing behaviour for Mistral models?

@hypdeb that is definitely the intent. I will run some A/B tests for some of the models to print the rendered prompts before and after the change and update the PR description.

Collaborator

@JunyiXu-nv JunyiXu-nv left a comment


Overall LGTM. Are there any e2e tests that could validate these changes?

@2ez4bz 2ez4bz force-pushed the dev-chat-template branch 2 times, most recently from 4f31be0 to 679381b, on April 2, 2026 19:56
@2ez4bz
Collaborator Author

2ez4bz commented Apr 2, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41499 [ run ] triggered by Bot. Commit: 679381b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41499 [ run ] completed with state SUCCESS. Commit: 679381b
/LLM/main/L0_MergeRequest_PR pipeline #32417 completed with status: 'FAILURE'


Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@2ez4bz 2ez4bz enabled auto-merge (squash) April 3, 2026 04:26
@tensorrt-cicd
Collaborator

PR_Github #41587 [ run ] triggered by Bot. Commit: 679381b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41587 [ run ] completed with state FAILURE. Commit: 679381b

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41598 [ run ] triggered by Bot. Commit: 679381b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41598 [ run ] completed with state SUCCESS. Commit: 679381b
/LLM/main/L0_MergeRequest_PR pipeline #32506 completed with status: 'FAILURE'


Link to invocation

* Why?

Previously, the multimodal placeholder insertion was dictated by
hardcoded exception lists, which add cognitive burden when onboarding
new models, and did not account for the fact that different versions
of the same model architecture could have different types of chat
templates that require different handling.

In addition, all placeholders were either added before or after the
text, instead of possibly interleaved.

* What?

This commit addresses the above gaps by mimicking what is done in vLLM.

To that end, it:

1. introduces content format detection based on Jinja AST inspection of
   the chat template.
2. preserves the interleaved positions of text and media items during
   message parsing.
3. dispatches to the appropriate logic based on the (possibly
   auto-detected) content format before applying the chat template:
   either the template handles multimodal content natively (OpenAI-style
   dicts), or expects plain strings with placeholders pre-inserted.
4. inserts the multimodal placeholders.
5. validates that the expected count is met.

Models can also explicitly declare their `content_format` during
registration if they are only meant to support one.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-chat-template branch from 679381b to 3802ba7 on April 3, 2026 06:43
@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41613 [ run ] triggered by Bot. Commit: 3802ba7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41613 [ run ] completed with state SUCCESS. Commit: 3802ba7
/LLM/main/L0_MergeRequest_PR pipeline #32522 completed with status: 'FAILURE'


Link to invocation

@yechank-nvidia
Collaborator

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41639 [ run ] triggered by Bot. Commit: 3802ba7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41639 [ run ] completed with state SUCCESS. Commit: 3802ba7
/LLM/main/L0_MergeRequest_PR pipeline #32546 completed with status: 'FAILURE'


Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41714 [ run ] triggered by Bot. Commit: 3802ba7 Link to invocation

Collaborator

@amukkara amukkara left a comment


Approving phi model changes

@tensorrt-cicd
Collaborator

PR_Github #41714 [ run ] completed with state SUCCESS. Commit: 3802ba7
/LLM/main/L0_MergeRequest_PR pipeline #32615 completed with status: 'FAILURE'


Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41758 [ run ] triggered by Bot. Commit: 3802ba7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41758 [ run ] completed with state SUCCESS. Commit: 3802ba7
/LLM/main/L0_MergeRequest_PR pipeline #32657 completed with status: 'SUCCESS'

CI Report

Link to invocation

@2ez4bz 2ez4bz merged commit 82c5102 into NVIDIA:main Apr 4, 2026
5 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026