Wan2.2 TI2V training #73

@HSID

I noticed that when DreamZero is trained with the Wan2.2 TI2V backbone, the CLIP embedding of the first video frame is injected via cross-attention. As far as I understand, this differs from the standard conditioning setup in vanilla Wan2.2 TI2V. Could you clarify the motivation behind this design choice?
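To make the question concrete, here is a minimal sketch of what "injected via cross-attention" means in general: the video tokens act as queries, and the image tokens are appended to the context they attend over. This is a toy illustration only, not DreamZero's or Wan's actual code. The function name, the identity projections, the single head, and all token values are assumptions chosen for readability.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head cross-attention with identity projections (illustrative).

    queries: list of query token vectors (e.g. video latent tokens), length d each
    context: list of conditioning token vectors used as both keys and values
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        # Output is the attention-weighted average of the context vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(d)])
    return out


# Hypothetical toy inputs: none of these values come from a real model.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[1.0, 1.0]]  # stand-in for first-frame CLIP embedding tokens
video_tokens = [[0.5, 0.5], [1.0, 0.0]]

# Context without and with the image tokens appended.
text_only = cross_attention(video_tokens, text_tokens)
text_and_image = cross_attention(video_tokens, text_tokens + image_tokens)
```

In this sketch the only difference between the two calls is whether the image tokens are part of the context, which is the kind of conditioning path I am asking about.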
