Stars
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.
High-Resolution Image Synthesis with Latent Diffusion Models
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
CCMusic, an open Chinese music database, integrates diverse datasets. It ensures data consistency via cleaning, label refinement and structure unification. A unified evaluation framework is used fo…
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Official PyTorch implementation of paper "Ultra-Resolution Adaptation with Ease".
Open, royalty-free data collection and cleaning pipeline for lyrics2song / song generation.
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)
Official implementation of the paper "MusicInfuser: Making Video Diffusion Listen and Dance"
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems
Understanding R1-Zero-Like Training: A Critical Perspective
An Open-source RL System from ByteDance Seed and Tsinghua AIR
Official data preparation scripts for the URGENT 2024 Challenge
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
[CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing"
AI-based tool for removing hard-coded subtitles and text-like watermarks from videos or pictures, producing subtitle-free and watermark-free image/video files without loss of resolution. No third-party API is required; everything runs locally.
Open-Sora: Democratizing Efficient Video Production for All