Stars
Janus-Series: Unified Multimodal Understanding and Generation Models
ACM MM 2021 paper "I2V-GAN: Unpaired Infrared-to-Visible Video Translation"
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
Code of "3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces"
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
[CVPR-2022] Official implementation for "Knowledge Distillation with the Reused Teacher Classifier".
A PyTorch Implementation of AC-SUM-GAN from "AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization" (IEEE TCSVT 2021)
Source code for the paper "Unsupervised Video Summarization via Multi-source Features" published at ICMR 2021
The code for the ICASSP 2023 paper "MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization"
video summarization research repo
PyTorch code for the paper "Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization"
The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)
Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
[CVPR 2023] Official implementation for "CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion."
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Swift Parameter-free Attention Network for Efficient Super-Resolution