Closed
Labels: question (Further information is requested), triaged (Issue has been triaged by maintainers)
Description
Hi,
I get the following error when all of the following hold:
- Lookahead decoding is enabled
- The request has multimodal input (e.g. just a custom prompt table with a fake vocabulary)
- Batch size > 1
- In-flight fused batching is enabled
The model is Llama 8B.
```
[TensorRT-LLM][ERROR] IExecutionContext::inferShapes: Error Code 7: Internal Error (LLaMAForCausalLM/transformer/vocab_embedding/__add___L322/elementwise_binary_L2901/ELEMENTWISE_SUM_0: dimensions not compatible for elementwise. Broadcast has incompatible dimensions: 2 != 18 && 2 != 1 && 18 != 1. Instruction: CHECK_BROADCAST 2 18.)
[TensorRT-LLM][ERROR] Encountered an error in forwardAsync function: Invalid input shape (/home/jenkins/agent/workspace/LLM/release-0.17/L0_Test-x86_64/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:574)
1 0x7f4097fd7277 /home/maximilianolevi/.cache/pypoetry/virtualenvs/tensorrt-inference-8MUMp6os-py3.10/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x6e3277) [0x7f4097fd7277]
2 0x7f4098cadc88 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::prepareBuffers(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 184
3 0x7f4098cb71d6 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeStep(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 1510
4 0x7f4098cb7abf tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeBatch(tensorrt_llm::batch_manager::ScheduledRequests const&) + 223
5 0x7f4098cc13aa tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 1802
6 0x7f4098d4df85 tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 437
7 0x7f4098d59cb6 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1206
8 0x7f43a6e215c0 /home/maximilianolevi/.cache/pypoetry/virtualenvs/tensorrt-inference-8MUMp6os-py3.10/lib/python3.10/site-packages/torch/lib/libtorch.so(+0x145c0) [0x7f43a6e215c0]
9 0x7f43aaea4ea7 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f43aaea4ea7]
10 0x7f43aafbaacf clone + 63
```
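For context on what `CHECK_BROADCAST 2 18` means: TensorRT's elementwise broadcast rule (the same one NumPy uses) requires each pair of dimensions to be equal or for one of them to be 1. A minimal sketch of the failing check, with illustrative shapes only (the actual engine tensor shapes are not shown in the log beyond the leading dims 2 and 18):

```python
import numpy as np

# Two dimensions are broadcast-compatible only if they are equal
# or one of them is 1. Here 2 != 18 and neither is 1, so the
# elementwise sum fails -- mirroring the CHECK_BROADCAST 2 18 error.
a = np.zeros((2, 4096))   # hypothetical: embeddings for 2 positions
b = np.zeros((18, 4096))  # hypothetical: prompt-table rows for 18 positions

try:
    _ = a + b
except ValueError as e:
    print("broadcast error:", e)

# Compatible variant: a leading dimension of 1 broadcasts against 18.
ok = np.zeros((1, 4096)) + b
print(ok.shape)  # (18, 4096)
```

This only illustrates the shape rule; it does not say which side (lookahead draft length vs. prompt-table length) produced the 2 and the 18 in the engine.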
Do `max_multimodal_len` or the lookahead decoding parameters need to match a specific shape in this case?