Update TensorRT-LLM #1274

kaiyux · 2024-03-12T07:54:18Z

Model Support
- Support VILA (see “LLaVA and VILA” section in examples/multimodal/README.md)
Features
- Support loading Gemma from Hugging Face
- Add support for context chunking to work with KV cache reuse
- Support the auto parallelism planner for the high-level API and the unified builder workflow
- Enable multi-LoRA for BART LoRA
API
- [BREAKING CHANGE] Remove model parameter from gptManagerBenchmark and gptSessionBenchmark
Bug fixes
- Fix ChatGLM2-6B build failure with INT8 (issue #1239, "chatglm2-6b int8+kv8 build failed on 0.8.0 branch")
- Fix wrong relative path in Baichuan documentation (issue #1242, "Incorrect documentation in examples /baichuan/")
Performance
- Remove router tensor parallelism to improve performance for MoE models, thanks to the contribution from @megha95 in #1091 ("moe router tp removed")
Infra
- TensorRT dependency is updated to 9.3.
- Base Docker image for TensorRT-LLM is updated to nvcr.io/nvidia/pytorch:24.01-py3
- Base Docker image for TensorRT-LLM backend is updated to nvcr.io/nvidia/tritonserver:24.01-py3

Update TensorRT-LLM

ca2e9bd

Shixiaowei02 force-pushed the kaiyu/update branch 2 times, most recently from a3cbbf6 to df77e19, March 12, 2024 09:04

update

e30cf5b

Shixiaowei02 force-pushed the kaiyu/update branch from df77e19 to e30cf5b, March 12, 2024 09:05

Shixiaowei02 approved these changes Mar 12, 2024


kaiyux merged commit 4bb65f2 into main Mar 12, 2024

kaiyux deleted the kaiyu/update branch March 12, 2024 10:15

hademircii mentioned this pull request Mar 12, 2024

Import Error: ModuleNotFoundError: No module named 'tensorrt_llm.lora_manager' #1289 (open)

This was referenced Mar 18, 2024

OOM when using quantize.py to quantize llama-like model #1285 (open)

Assertion failed: Failed to deserialize cuda engine #1324 (closed)

