Grad student in Shanghai. Multimodal / speech / vision foundation models. Half my repos exist because debugging was easier than writing the bug report.
- Shanghai Jiao Tong University
- Shanghai, China
Popular repositories
- audio-vis-align (Public): Training and evaluation toolkit for audio-visual contrastive representation alignment (CLIP-style, but for audio + video).
Python 92
- mllm-playground (Public): A Gradio-based interactive playground for poking at multimodal LLMs: compare outputs side-by-side, swap prompts, inspect attention.
Python
