Python 327 20 Updated Mar 29, 2025

Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".

Python 169 6 Updated Jan 9, 2025

Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.

Jupyter Notebook 14 Updated Nov 23, 2024

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 12,598 1,593 Updated Feb 29, 2024

CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages

Python 129 3 Updated Feb 28, 2025

CCMusic, an open Chinese music database, integrates diverse datasets. It ensures data consistency via cleaning, label refinement and structure unification. A unified evaluation framework is used fo…

Python 19 Updated Mar 26, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 1,568 98 Updated Mar 29, 2025

Official PyTorch implementation of paper "Ultra-Resolution Adaptation with Ease".

Python 85 7 Updated Mar 27, 2025

Open, royalty-free lyrics2song / song generation data collection and cleaning pipeline.

Python 5 Updated Mar 25, 2025

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Python 2,326 179 Updated Feb 14, 2025

A script for automatic batch emotion labeling of audio, based on Emotion2Vec.

Python 325 20 Updated Mar 7, 2025

Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.

Python 92 12 Updated Mar 20, 2025

Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)

Python 23 1 Updated Mar 16, 2025

TTS Towards Human-Sounding Speech

Python 3,162 227 Updated Mar 27, 2025

Official implementation of the paper "MusicInfuser: Making Video Diffusion Listen and Dance"

Python 60 4 Updated Mar 27, 2025

Concat-ID: Towards Universal Identity-Preserving Video Synthesis

Python 26 Updated Mar 22, 2025
Python 38 4 Updated Aug 27, 2024

The official source code of UniAudio

Python 91 9 Updated Mar 29, 2024
Python 88 7 Updated Mar 27, 2025

The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems

Python 263 20 Updated Oct 10, 2023

Understanding R1-Zero-Like Training: A Critical Perspective

Python 721 30 Updated Mar 29, 2025

An Open-source RL System from ByteDance Seed and Tsinghua AIR

915 35 Updated Mar 27, 2025

Official data preparation scripts for the URGENT 2024 Challenge

Python 77 6 Updated Jan 9, 2025

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,650 118 Updated Jul 5, 2024

🤗 R1-AQA Model: mispeech/r1-aqa

Python 209 17 Updated Mar 28, 2025

[CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing"

Python 14 1 Updated Mar 27, 2025
Python 210 18 Updated Mar 18, 2025

AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos. It outputs subtitle-free, watermark-free files at the original resolution and runs locally, with no third-party API required.

Python 5,791 755 Updated Feb 19, 2025
326 16 Updated Mar 14, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 25,889 2,493 Updated Mar 27, 2025