Hi,
I'm trying to deploy the Hugging Face model nvidia/DeepSeek-V3-0324-FP4 on an 8xH200 server. None of trtllm-serve, vLLM, or sglang can serve it, because the GPU architecture does not support this model.
I noticed that the README on Hugging Face lists the following supported hardware:
Inference:
Engine: TensorRT-LLM
Test Hardware: B200
So, my questions are:
- How can I run this model on NVIDIA H200?
- If that is not possible, is there any documentation on how to convert DeepSeek V3 0324 myself so that it runs in FP4 on H200?
Thank you!