macroustc

1 follower · 4 following

Stars

Plachtaa / seed-vc

zero-shot voice conversion & singing voice conversion, with real-time support

Python 2,033 228 Updated Mar 17, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 5,699 594 Updated Mar 21, 2025

SesameAILabs / csm

A Conversational Speech Generation Model

Python 10,845 813 Updated Mar 22, 2025

qi-hua / async_cosyvoice

Accelerating CosyVoice2 inference with vLLM

Jupyter Notebook 108 12 Updated Mar 11, 2025

GuijiAI / HeyGem.ai

C 4,085 750 Updated Mar 21, 2025

mannaandpoem / OpenManus

No fortress, purely open ground. OpenManus is Coming.

Python 38,769 6,400 Updated Mar 22, 2025

kijai / ComfyUI-WanVideoWrapper

Python 1,550 87 Updated Mar 22, 2025

Wan-Video / Wan2.1

Wan: Open and Advanced Large-Scale Video Generative Models

Python 8,898 949 Updated Mar 21, 2025

stepfun-ai / Step-Audio

Python 4,033 324 Updated Mar 12, 2025

Zyphra / Zonos

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …

Python 6,180 655 Updated Mar 5, 2025

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,026 1,374 Updated Mar 3, 2025

xinchen-ai / Westlake-Omni

Python 192 18 Updated Sep 24, 2024

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 10,620 1,473 Updated Mar 22, 2025

FireRedTeam / FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

Python 634 51 Updated Oct 17, 2024

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,890 646 Updated Mar 21, 2025

FerdinandZhong / punctuator

A small seq2seq punctuator tool based on DistilBERT

Python 50 8 Updated Dec 23, 2024

xieyuankun / Codecfake

This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".

Python 54 4 Updated Dec 13, 2024

robin1001 / nn-vad

Simple DNN-based VAD

C++ 70 49 Updated Dec 2, 2018

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,226 278 Updated Nov 5, 2024

yannqi / Draw-an-Audio-Code

Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.

45 2 Updated Sep 11, 2024

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 269 35 Updated Jan 15, 2025

GitYCC / g2pW

Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin or Pinyin (INTERSPEECH 2022)

Python 317 38 Updated Oct 20, 2024

hacksider / Deep-Live-Cam

Real-time face swap and one-click video deepfake with only a single image

Python 45,234 6,781 Updated Mar 22, 2025

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 12,216 1,225 Updated Mar 21, 2025

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 5,027 458 Updated Jan 8, 2025

luosiallen / Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python 178 18 Updated May 29, 2024

yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Python 104,880 8,237 Updated Mar 22, 2025

Tele-AI / TeleSpeech-ASR

Python 658 59 Updated Jun 7, 2024

bootphon / phonemizer

Simple text to phones converter for multiple languages

Python 1,351 183 Updated Sep 26, 2024

JosephPai / Awesome-Talking-Face

📖 A curated list of resources dedicated to talking face.

1,468 117 Updated Dec 23, 2024