MOVA was trained on 1024 GPUs for 42 days (~43,000 GPU-days), yet you don't bother to optimize it and get your model out there on reddit and the forums.
I'm reading the paper, and this model should be capable of more than this; even the demos you have don't reflect its true power.
"Computational Resources. All three phases run on 1024 GPUs (128 nodes, 8 GPUs per node). For 360p
training (Phases 1–2), we use CP=8, yielding effective batch size 128. For 720p fine-tuning (Phase 3), increased
sequence length requires CP=16, reducing effective batch size to 64. The complete training spans 42 days,
totaling approximately 43,000 GPU-days. "
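For what it's worth, the numbers in that excerpt check out if you assume each context-parallel (CP) group processes one sample per step, so effective batch size = GPUs / CP. That's my assumption, the excerpt doesn't spell it out. A quick sanity check in Python:

```python
# Sanity check of the training-resource numbers quoted above.
# Assumption (mine, not stated in the excerpt): each context-parallel
# group handles one sample per step, so effective batch = GPUs / CP.

total_gpus = 128 * 8  # 128 nodes x 8 GPUs per node = 1024 GPUs

for phase, cp in [("360p (Phases 1-2)", 8), ("720p (Phase 3)", 16)]:
    effective_batch = total_gpus // cp
    print(f"{phase}: CP={cp} -> effective batch size {effective_batch}")

training_days = 42
gpu_days = total_gpus * training_days
print(f"{total_gpus} GPUs x {training_days} days = {gpu_days} GPU-days")
```

This gives 128, 64, and 43,008 GPU-days, which matches the quoted 128, 64, and ~43,000.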