MOVA was trained on 1024 GPUs for 42 days (~43,000 GPU-days), yet you don't bother to optimize it and get your model out there on reddit and the forums.
I'm reading the paper, and this model should be capable of more than this; even the demos you have don't reflect its true power.
"Computational Resources. All three phases run on 1024 GPUs (128 nodes, 8 GPUs per node). For 360p
training (Phases 1–2), we use CP=8, yielding effective batch size 128. For 720p fine-tuning (Phase 3), increased
sequence length requires CP=16, reducing effective batch size to 64. The complete training spans 42 days,
totaling approximately 43,000 GPU-days. "
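For what it's worth, the numbers in that excerpt check out if you assume each context-parallel (CP) group processes one sample per step, so effective batch size = GPUs / CP. That's my assumption, the excerpt doesn't spell it out. A quick sanity check in Python:

```python
# Sanity check of the training-resource numbers quoted above.
# Assumption (mine, not stated in the excerpt): each context-parallel
# group handles one sample per step, so effective batch = GPUs / CP.

total_gpus = 128 * 8  # 128 nodes x 8 GPUs per node = 1024 GPUs

for phase, cp in [("360p (Phases 1-2)", 8), ("720p (Phase 3)", 16)]:
    effective_batch = total_gpus // cp
    print(f"{phase}: CP={cp} -> effective batch size {effective_batch}")

training_days = 42
gpu_days = total_gpus * training_days
print(f"{total_gpus} GPUs x {training_days} days = {gpu_days} GPU-days")
```

This gives 128, 64, and 43,008 GPU-days, which matches the quoted 128, 64, and ~43,000.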