High failure rate in T2AV generation

Describe
When using the MOVA pipeline for pure T2V/T2AV generation, I find generation highly unstable: approximately 40% of the generated videos come out as corrupted, solid-color outputs (the entire video is a single flat color with no coherent structure or detail).

<img width="1832" height="1002" alt="Image" src="https://github.com/user-attachments/assets/20946583-f767-40bc-a007-fe59f9b0a8f1" />
![Image](https://github.com/user-attachments/assets/7461b429-a023-47cb-9b39-52f7db79d3d8)

Following the standard T2V approach for MOVA, I pass a pure white `PIL.Image` as the image condition to the pipeline. I strongly suspect the issue lies in how `pipeline_mova.py` loads, encodes, or concatenates this pure white frame in the `prepare_latents` stage, leading to latent collapse during the diffusion process.
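
To put a number on the failure rate, a check along the following lines could flag collapsed outputs automatically. This is only a rough sketch: `flat_frame_fraction`, `is_collapsed`, and both thresholds are hypothetical names and values chosen for illustration, not part of MOVA, and it assumes the decoded video is available as a `uint8` NumPy array of shape `(T, H, W, C)`.

```python
import numpy as np


def flat_frame_fraction(frames: np.ndarray, std_threshold: float = 2.0) -> float:
    """Return the fraction of frames that are (nearly) a single flat color.

    frames: decoded video as an array of shape (T, H, W, C).
    std_threshold: assumed cutoff; a frame counts as flat when the spatial
        standard deviation of every channel is below this value.
    """
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")
    # Spatial standard deviation per frame and per channel, shape (T, C).
    per_channel_std = frames.astype(np.float32).std(axis=(1, 2))
    # A frame is flat only if all of its channels are flat.
    flat = (per_channel_std < std_threshold).all(axis=1)
    return float(flat.mean())


def is_collapsed(
    frames: np.ndarray,
    std_threshold: float = 2.0,
    min_flat_fraction: float = 0.9,
) -> bool:
    """Treat a video as collapsed when most of its frames are flat."""
    return flat_frame_fraction(frames, std_threshold) >= min_flat_fraction
```

Running `is_collapsed` over a batch of generations with fixed prompts and varying seeds would give a reproducible failure rate to compare against the roughly 40% observed by eye.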
