Problem serving nvidia/DeepSeek-V3-0324-FP4 on 8xH200

Hi,

I'm trying to deploy the Hugging Face model [nvidia/DeepSeek-V3-0324-FP4](https://huggingface.co/nvidia/DeepSeek-V3-0324-FP4) on an 8xH200 server. None of trtllm-serve, vLLM, or sglang is able to serve this model; each one fails because the GPU architecture does not support it.

I noticed that the README on Hugging Face lists the following supported hardware:
```
Inference:
Engine: TensorRT-LLM
Test Hardware: B200
```
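A minimal sketch for checking which architecture the GPUs report, under the assumption that native FP4 tensor-core support begins with Blackwell (compute capability major version 10 or higher) and that Hopper parts such as H200 report 9.0. The helper name `has_native_fp4` is hypothetical and the threshold is an assumption, not something taken from the model card or from any serving engine:

```python
# Sketch only: classify a GPU by its CUDA compute capability.
# Assumption: native FP4 tensor cores start at Blackwell (major >= 10);
# Hopper (H100/H200) reports compute capability 9.0.

def has_native_fp4(capability):
    """Return True if the (major, minor) capability is assumed to support FP4 natively."""
    major, _minor = capability
    return major >= 10


if __name__ == "__main__":
    try:
        import torch  # optional; only used to query the local GPUs
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        for idx in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(idx)
            name = torch.cuda.get_device_name(idx)
            print(f"GPU {idx}: {name}, capability {cap}, native FP4 assumed: {has_native_fp4(cap)}")
    else:
        print("No CUDA device visible; nothing to check.")
```

Under that assumption, an H200 node would report `False` for every device, which is consistent with the errors from all three engines.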

So, my questions are:
1. How can I run this model on NVIDIA H200?
2. If that is not possible, is there any documentation on converting DeepSeek V3 0324 myself so that it runs in FP4 on H200?

Thank you!

Problem serving nvidia/DeepSeek-V3-0324-FP4 on 8xH200 #6038

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Problem serving nvidia/DeepSeek-V3-0324-FP4 on 8xH200 #6038

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions