HVision-NKU/MutualForcing

MutualForcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation

🎥 Video Demo

mutualforcing.mp4

Please turn on the sound to hear the audio. MutualForcing 1 min 30 s video demo. (The video was heavily compressed due to GitHub's 10 MB upload file size limit.)

✨ Highlights

  • Fast autoregressive audio-video joint generation with only 4–8 inference steps
  • Supports streaming generation and long-duration audio-video synchronization
  • A two-stage training strategy for stable multimodal optimization
  • A unified dual-mode self-evolution framework for few-step and multi-step generation
  • No need for an extra bidirectional teacher model
  • Lower memory cost and more flexible training on long sequences
  • Matches or outperforms prior methods that require 50 steps

📚 Table of Contents

  • 🎬 Multi-Domain Generalization Result
    • 🎤 Singing
    • 🎼 BGM Music
    • 🗣️ Multi-Person Speaking
    • 🐾 Animal & 🍽️ Eating
    • 1min Long Video Generation
  • 📌 Open-source TODO

🎬 Multi-Domain Generalization Result


🎤 Singing

  • sing1.mp4: Blonde young woman wearing gold earrings and necklace, sits at white piano, singing into microphone; eyes sometimes closed, sometimes looking slightly forward; warm indoor background, mid-close low-angle shot. Clear female voice with soft piano, lyrics: 'said his mind was made up, but we both know that he lies', gentle slightly sad emotion.
  • sing2.mp4: Dark-brown long-haired woman in black sheer-sleeve top sings at microphone, eyes slightly to right, dark stage background; static mid-close shot. Clear female voice singing, lyrics: '...found me back home', focus on vocal performance.
  • play_musical_instrument.mp4: Southeast Asian elderly man, bare-chested, plays bamboo flute on porch, cheeks puffed, fingers moving skillfully, body slightly swaying with rhythm; ultra-wide fisheye shot slowly moving right and panning left. Calm and evolving flute solo melody.

🎼 BGM Music

  • video_with_bgm1.mp4: Brown long-curled-haired woman in red top, wearing a straw hat, staring at the camera by the seaside, head slightly tilted, hair blown by wind, waves behind; camera slowly moves right and pans left. Background music: female pop singing, lyrics include 'oh why, oh why' and 'feeling drunk and high, so high, so high...', creating a pensive atmosphere.
  • video_with_bgm2.mp4: Short-haired woman in gray fluffy coat walking in a park, looking thoughtful; camera slowly pans left tracking, composition shifts from right-heavy to left-heavy. Background music: slow melancholic cello, creating a calm and sad atmosphere.
  • video_with_bgm3.mp4: Blonde woman in pink short sleeve, hair blown by wind, slowly turns head and upper body from right to front, looking alert; camera slowly moves back, composition shifts from right-heavy to center. Background music: tense suspenseful, with clear wind sounds.

🗣️ Multi-Person Speaking

  • multi_person1.mp4: Two men in camouflage in mossy forest; foreground man talks to camera selfie-style holding rifle, background man wearing full-face mask uses binoculars; mostly static shot. Low-voice English dialogue: 'Ah. I think Claire or something on the other side of the river. So if we get up onto this knoll just in front of us, we might...', tense restrained atmosphere.
  • multi_person2.mp4: Glasses-wearing braided schoolgirl and boy in suit uniform sit on beige sofa; girl turns to talk to boy, ornate living room background, static mid-shot bright lighting. Light English conversation: Girl: 'Oh, Brighton, do you have a date for the seventh-grade dance? Yeah' Boy: 'I got a couple of irons in the fire, put out a few feelers.'; youthful relaxed atmosphere.
  • multi_person3.mp4: Bald gray-bearded man knitting, white-haired woman holding folder and pen sitting on brown leather sofa in living room with Christmas tree; static mid-shot, warm indoor light. Calm English dialogue: Male: 'Let me and your mom handle it.' Female: 'I didn't realize we'd come'; quiet homely atmosphere.

🐾 Animal & 🍽️ Eating

animal.mp4
eating.mp4

1min Long Video Generation

longvid1.mp4
longvid2.mp4
longvid3.mp4

📌 Open-source TODO

  • Project page
  • Paper release
  • Inference code
  • Training code
  • Checkpoints
  • Data preprocessing pipeline
  • Evaluation scripts
  • Streaming generation support
  • Long-duration generation examples
  • Reproducibility instructions
  • Hugging Face / demo integration
  • Full documentation
