
Lookahead decoding and multimodal input support #3137

@maxilevi

Description


Hi,

I get the following error when:

  • Lookahead decoding is enabled
  • The request has multimodal input (e.g. just a custom prompt table with a fake vocabulary)
  • Batch size > 1
  • In-flight fused batching is enabled

The model is Llama 8B.

[TensorRT-LLM][ERROR] IExecutionContext::inferShapes: Error Code 7: Internal Error (LLaMAForCausalLM/transformer/vocab_embedding/__add___L322/elementwise_binary_L2901/ELEMENTWISE_SUM_0: dimensions not compatible for elementwise. Broadcast has incompatible dimensions: 2 != 18 && 2 != 1 && 18 != 1. Instruction: CHECK_BROADCAST 2 18.)
[TensorRT-LLM][ERROR] Encountered an error in forwardAsync function: Invalid input shape (/home/jenkins/agent/workspace/LLM/release-0.17/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:574)
1       0x7f4097fd7277 /home/maximilianolevi/.cache/pypoetry/virtualenvs/tensorrt-inference-8MUMp6os-py3.10/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x6e3277) [0x7f4097fd7277]
2       0x7f4098cadc88 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::prepareBuffers(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 184
3       0x7f4098cb71d6 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeStep(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 1510
4       0x7f4098cb7abf tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeBatch(tensorrt_llm::batch_manager::ScheduledRequests const&) + 223
5       0x7f4098cc13aa tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 1802
6       0x7f4098d4df85 tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 437
7       0x7f4098d59cb6 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1206
8       0x7f43a6e215c0 /home/maximilianolevi/.cache/pypoetry/virtualenvs/tensorrt-inference-8MUMp6os-py3.10/lib/python3.10/site-packages/torch/lib/libtorch.so(+0x145c0) [0x7f43a6e215c0]
9       0x7f43aaea4ea7 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f43aaea4ea7]
10      0x7f43aafbaacf clone + 63
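For context, the CHECK_BROADCAST failure follows the standard elementwise broadcast rule: two dimensions are compatible only if they are equal or one of them is 1, and here 2 and 18 satisfy neither. A minimal NumPy sketch (not TensorRT-LLM code; which side corresponds to the lookahead config and which to the prompt table is an assumption) reproducing the same class of incompatibility:

```python
import numpy as np

# NumPy enforces the same broadcast rule TensorRT checks here:
# two dimensions are compatible only if equal, or one of them is 1.
a = np.zeros((2, 4096))   # assumed: a shape driven by the lookahead config
b = np.zeros((18, 4096))  # assumed: a shape driven by the multimodal prompt table

try:
    _ = a + b  # 2 != 18 and neither is 1 -> elementwise add cannot broadcast
    compatible = True
except ValueError as err:
    compatible = False
    print(f"broadcast error: {err}")

print(compatible)
```

A leading dimension of 1 on either operand (or equal leading dimensions) would make the addition succeed, which is why a mismatch between the lookahead-expanded token dimension and the prompt-table dimension trips this check.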

Do the max_multimodal_len or the lookahead decoding parameters need to match a specific shape in this case?

Metadata

Labels: question (Further information is requested), triaged (Issue has been triaged by maintainers)
