
The model is underrated, are you trying to fly under the radar? #33

@mr-lab

Description


MOVA was trained on 1024 GPUs for 42 days (~43,000 GPU-days), and yet you don't bother to optimize it or get the model out there on Reddit and the forums.
I'm reading the paper, and this model should deliver more than it currently shows; even the demos you have don't reflect its true power.

"Computational Resources. All three phases run on 1024 GPUs (128 nodes, 8 GPUs per node). For 360p
training (Phases 1–2), we use CP=8, yielding effective batch size 128. For 720p fine-tuning (Phase 3), increased
sequence length requires CP=16, reducing effective batch size to 64. The complete training spans 42 days,
totaling approximately 43,000 GPU-days. "
