CCMusic, an open Chinese music database, integrates diverse datasets. It ensures data consistency via cleaning, label refinement and structure unification. A unified evaluation framework is used fo…

Python 19 Updated Mar 26, 2025

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook 1,568 98 Updated Mar 29, 2025

Huage001 / URAE

Official PyTorch implementation of the paper "Ultra-Resolution Adaptation with Ease".

Python 85 7 Updated Mar 27, 2025

a43992899 / openl2s

Open, royalty-free lyrics2song / song generation data collection / cleaning pipeline.

Python 5 Updated Mar 25, 2025

linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Python 2,326 179 Updated Feb 14, 2025

Alexw1111 / RefAudioEmoTagger

A script for automatic batch emotion labeling of audio, based on Emotion2Vec.

Python 325 20 Updated Mar 7, 2025

pengzhendong / g2p-mix

Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.

Python 92 12 Updated Mar 20, 2025

emova-ollm / EMOVA

Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)

Python 23 1 Updated Mar 16, 2025

canopyai / Orpheus-TTS

TTS Towards Human-Sounding Speech

Python 3,162 227 Updated Mar 27, 2025

SusungHong / MusicInfuser

Official implementation of the paper "MusicInfuser: Making Video Diffusion Listen and Dance"

Python 60 4 Updated Mar 27, 2025

ML-GSAI / Concat-ID

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

Python 26 Updated Mar 22, 2025

zihaod / MusiLingo

Python 38 4 Updated Aug 27, 2024

uniaudio666 / UniAudio

The official source code of UniAudio

Python 91 9 Updated Mar 29, 2024

jzq2000 / MoonCast

Python 88 7 Updated Mar 27, 2025

numediart / EmoV-DB

The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems

Python 263 20 Updated Oct 10, 2023

sail-sg / understand-r1-zero

Understanding R1-Zero-Like Training: A Critical Perspective

Python 721 30 Updated Mar 29, 2025

BytedTsinghua-SIA / DAPO

An Open-source RL System from ByteDance Seed and Tsinghua AIR

915 35 Updated Mar 27, 2025

urgent-challenge / urgent2024_challenge

Official data preparation scripts for the URGENT 2024 Challenge

Python 77 6 Updated Jan 9, 2025

QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,650 118 Updated Jul 5, 2024

xiaomi-research / r1-aqa

🤗 R1-AQA Model: mispeech/r1-aqa

Python 209 17 Updated Mar 28, 2025

ZZDoog / ProDubber

[CVPR 2025] Official implementation of the paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing"

Python 14 1 Updated Mar 27, 2025

LiuZH-19 / SongGen

Python 210 18 Updated Mar 18, 2025

YaoFANGUK / video-subtitle-remover

AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos, producing subtitle-free and watermark-free output files at the original resolution. Runs locally; no third-party API required.

Python 5,791 755 Updated Feb 19, 2025

ZeyueT / AudioX

326 16 Updated Mar 14, 2025

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Python 25,889 2,493 Updated Mar 27, 2025

aaronchen chenchy

Stars

bytedance / MegaTTS3

tencent-ailab / MuQ

jonflynng / qwen2-audio-finetune

CompVis / latent-diffusion

sanderwood / clamp3

monetjoe / ccmusic_eval