Please provide download links for optimized FP8 / NVFP4 quantized models (SM120 Blackwell) #265

@rnik12

Description:

Currently there are no direct download links to pre-optimized models. Running TensorRT Model Optimizer myself adds extra steps and, in most cases, does not work reliably.

It would be very helpful if you could provide downloadable, pre-optimized models for SM120 Blackwell with quantization formats such as FP8 and NVFP4.

Why this is needed:

  • Running optimization locally is fragile and often fails (a minimal sketch of the workflow I am attempting follows this list).
  • Many users just want ready-to-run optimized checkpoints instead of debugging optimization pipelines.
  • Providing pre-optimized models would save time and help ensure consistent performance.
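
For context, here is a minimal sketch of the local quantization workflow I am trying to avoid, assuming the documented `modelopt.torch.quantization` API. The model name, calibration prompts, and export directory are placeholders, and the exact config names and export signature may differ across ModelOpt releases; this is not a verified recipe:

```python
# Hedged sketch: quantize an HF checkpoint with TensorRT Model Optimizer,
# then export a TensorRT-LLM checkpoint. Model name, prompts, and paths
# are placeholders; configs/signatures may vary by ModelOpt version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

def forward_loop(m):
    # Short calibration pass over a few representative prompts.
    with torch.no_grad():
        for prompt in ["Hello, world!", "Explain NVFP4 quantization."]:
            inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
            m(**inputs)

# FP8_DEFAULT_CFG is documented; NVFP4_DEFAULT_CFG is its NVFP4 analogue
# in recent ModelOpt releases (assumption: available in your version).
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)

export_tensorrt_llm_checkpoint(
    model,
    decoder_type="llama",
    dtype=torch.bfloat16,
    export_dir="./llama-3.1-8b-fp8-ckpt",  # placeholder path
)
```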

Examples tried but not working (loaded as sketched below):

  • daslab-testing/Llama-3.2-3B-Instruct-FPQuant-QAT-NVFP4-200steps
  • nm-testing/TinyLlama-1.1B-Chat-v1.0-NVFP4-Updated
  • nm-testing/Llama-3.1-8B-Instruct-NVFP4A16
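
Each of these was loaded with the TensorRT-LLM Python LLM API from the container listed below; a minimal sketch of that attempt, with an arbitrary prompt and sampling settings:

```python
# Hedged sketch: attempting to serve one of the checkpoints listed above
# with the TensorRT-LLM LLM API (from the 1.1.0rc1 container).
from tensorrt_llm import LLM, SamplingParams

# Fails for me on SM120 Blackwell with the NVFP4 checkpoints above.
llm = LLM(model="nm-testing/Llama-3.1-8B-Instruct-NVFP4A16")

params = SamplingParams(max_tokens=32)
for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```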

Request:

  • Please provide download links to FP8 and NVFP4 quantized models that run on SM120 Blackwell GPUs.

This would significantly improve usability and adoption for those of us testing or deploying on Blackwell hardware.

TensorRT-LLM Docker Image: nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc1
