Stars
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation.
MuCR is a benchmark designed to evaluate Multimodal Large Language Models' (MLLMs) ability to discern causal links across modalities.
[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
An easy-to-use debug print tool for deep learning projects in Python. PyPI: https://pypi.org/project/pydprint/