🎥 Video Generation & Multimodal Large Language Models (MLLM)
🧑‍💻 Integrated M.S./Ph.D. student @CVLAB, KAIST AI
I design next-generation video generation models and build evaluation frameworks for understanding and improving video diffusion models.
Currently exploring interaction-aware video generation and multimodal understanding of videos.
- 🎬 Video Generation & Evaluation – Improving interaction fidelity and multi-instance understanding in video diffusion transformers
- 🧩 Video Object Segmentation (VOS) – Multi-granularity & referring VOS with language and temporal reasoning
- 🧠 MLLM for Video – Leveraging multimodal large language models to better understand and describe video content
- Self-Evolving Neural Radiance Fields – Wild3D Workshop @ ICCV 2025 – 🔗 Project Page
- MUG-VOS: Multi-Granularity Video Object Segmentation – AAAI 2025 – 🔗 Project Page
- Referring Video Object Segmentation via Language Aligned Track Selection – arXiv 2025 – 🔗 Project Page
- InterRVOS: Interaction-aware Referring Video Object Segmentation – Under review at AAAI 2026 – 🔗 Project Page
- MATRIX: Mask Track Alignment for Interaction-Aware Video Generation – Under review at ICLR 2026
- 🎓 Google Scholar
- 💼 LinkedIn
- 🐦 X (Twitter)
- 🌐 Personal Website
✨ “Understanding the World through Video and Multimodalities.”
📅 Last updated: September 28, 2025 | 💻 Made with ❤️ by Deep Overflow