Hi,
In monologue streaming generation, the model generates 80 ms of audio in about 95 ms on an NVIDIA A30 GPU with bf16 enabled.
- Is there a way to reduce the latency so that generation runs in real time?
- What is the real-time factor on an L20 GPU?
- Does the model support serving with vLLM or NVIDIA TensorRT-LLM?
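For context, the measurement above implies a real-time factor (RTF, generation time divided by audio duration) a bit above 1, which is why it falls short of real time. A quick sketch of the arithmetic:

```python
# Real-time factor: generation time / audio duration.
# RTF > 1.0 means generation is slower than real time.
# Numbers are the measurement reported above (80 ms of audio in ~95 ms on an A30).
audio_ms = 80.0
gen_ms = 95.0
rtf = gen_ms / audio_ms
print(f"RTF: {rtf:.4f}")  # prints "RTF: 1.1875", i.e. ~19% slower than real time
```

So the model would need roughly a 20% speedup on this hardware to keep up with the audio stream.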