Please provide download links for optimized FP8 / NVFP4 quantized models (SM120 Blackwell) #265

@rnik12

Description:

Currently there are no direct download links to pre-optimized models. Running TensorRT Model Optimizer myself adds extra steps and, in most cases, does not work reliably.

It would be very helpful if you could provide downloadable, pre-optimized models for SM120 Blackwell with quantization formats such as FP8 and NVFP4.

Why this is needed:

  • Running optimization locally is fragile and often fails (a minimal sketch of the workflow I am attempting follows this list).
  • Many users just want ready-to-run optimized checkpoints instead of debugging optimization pipelines.
  • Providing pre-optimized models would save time and help ensure consistent performance.
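
For context, here is a minimal sketch of the local quantization workflow I am trying to avoid, assuming the documented `modelopt.torch.quantization` API. The model name, calibration prompts, and export directory are placeholders, and the exact config names and export signature may differ across ModelOpt releases; this is not a verified recipe:

```python
# Hedged sketch: quantize an HF checkpoint with TensorRT Model Optimizer,
# then export a TensorRT-LLM checkpoint. Model name, prompts, and paths
# are placeholders; configs/signatures may vary by ModelOpt version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

def forward_loop(m):
    # Short calibration pass over a few representative prompts.
    with torch.no_grad():
        for prompt in ["Hello, world!", "Explain NVFP4 quantization."]:
            inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
            m(**inputs)

# FP8_DEFAULT_CFG is documented; NVFP4_DEFAULT_CFG is its NVFP4 analogue
# in recent ModelOpt releases (assumption: available in your version).
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

export_tensorrt_llm_checkpoint(
    model,
    decoder_type="llama",
    dtype=torch.bfloat16,
    export_dir="./llama-3.1-8b-fp8-ckpt",  # placeholder path
)
```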

Examples tried but not working (loaded as sketched below):

  • daslab-testing/Llama-3.2-3B-Instruct-FPQuant-QAT-NVFP4-200steps
  • nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4-Updated
  • nm-testing/Llama-3.1-8B-Instruct-NVFP4A16
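
Each of these was loaded with the TensorRT-LLM Python LLM API from the container listed below; a minimal sketch of that attempt, with an arbitrary prompt and sampling settings:

```python
# Hedged sketch: attempting to serve one of the checkpoints listed above
# with the TensorRT-LLM LLM API (from the 1.1.0rc1 container).
from tensorrt_llm import LLM, SamplingParams

# Fails for me on SM120 Blackwell with the NVFP4 checkpoints above.
llm = LLM(model="nm-testing/Llama-3.1-8B-Instruct-NVFP4A16")

params = SamplingParams(max_tokens=32)
for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```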

Request:

  • Please provide download links to FP8 and NVFP4 quantized models that run on SM120 Blackwell GPUs.

This would significantly improve usability and adoption for those of us testing or deploying on Blackwell hardware.

TensorRT-LLM Docker Image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc1
