ICCV-2023-Papers

Vision and Audio

Title Repo Paper Video
Sound Source Localization is All About Cross-Modal Alignment ➖ ➖
Class-Incremental Grouping Network for Continual Audio-Visual Learning ➖
Audio-Visual Class-Incremental Learning ➖
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-Guided Speaker Embedding ➖ ➖
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion ➖
SIDGAN: High-Resolution Dubbed Video Generation via Shift-Invariant Learning ➖ ➖
On the Audio-Visual Synchronization for Lip-to-Speech Synthesis ➖ ➖
Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by Sparse Audio-Visual Samples ➖ ➖
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation ➖
Hyperbolic Audio-Visual Zero-Shot Learning ➖ ➖
AdVerb: Visually Guided Audio Dereverberation
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation ➖