
[Usage]: [Qwen2-VL] Inquiry Regarding the Usage of cross_attention_mask Input in C++ Runtime #7677

@deadpoppy

Description

System Info

System Information:

  • OS:
  • Python version:
  • CUDA version:
  • GPU model(s): H800
  • Driver version:
  • TensorRT-LLM version: 0.21


How would you like to use TensorRT-LLM

Problem Description
I am attempting to implement a special computation for the Qwen2-VL model, which requires dynamically passing a custom attention_mask during inference.

After reviewing the code, I found that the forward method in the Python-side model definition (tensorrt_llm/models/qwen/model.py#L177) does indeed have an attention_mask parameter. However, the request interface of TensorRT-LLM's C++ Runtime (tensorrt_llm::executor::Executor) does not directly expose an attention_mask field.

As an alternative, I discovered the cross_attention_mask field (e.g., in tensorrt_llm::executor::Request::input). I have successfully modified the Qwen2-VL model definition (model.py) so that its forward method can receive and process the cross_attention_mask parameter.

However, when I create a request in C++ and pass in cross_attention_mask, the model does not seem to receive this input, or I am unsure how to pass it correctly.
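For reference, this is roughly what my C++ attempt looks like. It is only a minimal sketch: the setCrossAttentionMask accessor, the Tensor::cpu factory usage, and the assumed [decoder_len, encoder_len] / bool mask layout are my own assumptions rather than confirmed API usage; they are exactly the points I am asking about below.

```cpp
#include <cstring>
#include <filesystem>

#include "tensorrt_llm/executor/executor.h"

namespace tle = tensorrt_llm::executor;

int main()
{
    // Tokenized prompt, including the multimodal placeholder tokens produced by the
    // Qwen2-VL preprocessor (dummy values here).
    tle::VecTokens inputTokens{1, 2, 3, 4};
    tle::Request request(inputTokens, /*maxTokens=*/64);

    // Assumption: the mask is a boolean tensor of shape [decoder_len, encoder_len].
    // This is exactly the shape/dtype question asked below.
    int64_t const decoderLen = 4;
    int64_t const encoderLen = 16;
    auto mask = tle::Tensor::cpu(tle::DataType::kBOOL, {decoderLen, encoderLen});
    std::memset(mask.getData(), 1, mask.getSizeInBytes()); // all positions visible

    // Assumption: Request exposes a setter for the cross-attention mask.
    request.setCrossAttentionMask(mask);

    tle::ExecutorConfig execConfig;
    tle::Executor executor(std::filesystem::path{"/path/to/qwen2_vl_engine"},
                           tle::ModelType::kDECODER_ONLY, execConfig);

    auto const requestId = executor.enqueueRequest(request);
    auto const responses = executor.awaitResponses(requestId);
    return 0;
}
```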

My questions are:

  1. Is the input name cross_attention_mask officially supported in the C++ Runtime, or is it an internal implementation detail?

  2. If supported, what are its expected tensor shape and data type? (e.g., [batch_size, seq_len] or [batch_size, 1, seq_len, seq_len]? Data type int32 or bool?)

  3. After passing cross_attention_mask, will it affect the model's behavior, and what are the potential impacts?

  4. How should cross_attention_mask be used in the C++ Runtime? Are any additional settings required to enable it?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Labels

  • LLM API<NV>: High-level LLM Python API & tools (e.g., trtllm-llmapi-launch) for TRTLLM inference/workflows
  • question: Further information is requested
  • triaged: Issue has been triaged by maintainers
