Description:
Currently there are no direct download links for pre-optimized models. Running TensorRT Model Optimizer myself adds extra steps and, in my experience, often fails.
It would be very helpful if you could provide downloadable, pre-optimized models for SM120 Blackwell with quantization formats such as FP8 and NVFP4.
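For context, here is a minimal sketch of the local workflow this request would replace, assuming the modelopt post-training quantization API (`mtq.quantize` with a built-in config such as `NVFP4_DEFAULT_CFG`). The base model ID and the one-sample calibration loop are placeholders, not what I actually ran:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq

# Placeholder base model; any HF causal LM would do for this sketch.
model_id = "meta-llama/Llama-3.2-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def forward_loop(m):
    # One-sample calibration pass; a real run needs a representative dataset.
    inputs = tokenizer("Calibration sample.", return_tensors="pt").to("cuda")
    m(**inputs)

# NVFP4_DEFAULT_CFG targets the NVFP4 format requested here;
# mtq.FP8_DEFAULT_CFG is the analogous FP8 config.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
# The quantized model would then be exported to a deployable checkpoint
# (export step omitted here).
```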
Why this is needed:
- Running optimization locally is fragile and often fails.
- Many users just want ready-to-run optimized checkpoints instead of debugging optimization pipelines.
- Providing pre-optimized models would save time and help ensure consistent performance.
Examples tried that did not work (a load attempt is sketched after the list):
daslab-testing/Llama-3.2-3B-Instruct-FPQuant-QAT-NVFP4-200steps
nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4-Updated
nm-testing/Llama-3.1-8B-Instruct-NVFP4A16
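As an illustration of how these checkpoints were attempted, here is a sketch assuming the TensorRT-LLM Python LLM API was used; the exact invocation in my tests may have differed:

```python
from tensorrt_llm import LLM, SamplingParams

# One of the checkpoints listed above; the others were tried the same way.
llm = LLM(model="nm-testing/Llama-3.1-8B-Instruct-NVFP4A16")
outputs = llm.generate(["Hello, Blackwell!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```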
Request:
- Please provide download links to FP8 and NVFP4 quantized models that run on SM120 Blackwell GPUs.
This would significantly improve usability and adoption for those of us testing or deploying on Blackwell hardware.
TensorRT-LLM Docker Image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc1