GitHub profile: macroustc

zero-shot voice conversion & singing voice conversion, with real-time support

Python · 2,033 stars · 228 forks · Updated Mar 17, 2025

Spark-TTS Inference Code

Python · 5,699 stars · 594 forks · Updated Mar 21, 2025

A Conversational Speech Generation Model

Python · 10,845 stars · 813 forks · Updated Mar 22, 2025

Accelerating CosyVoice2 inference with vLLM

Jupyter Notebook · 108 stars · 12 forks · Updated Mar 11, 2025

No fortress, purely open ground. OpenManus is Coming.

Python · 38,769 stars · 6,400 forks · Updated Mar 22, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python · 8,898 stars · 949 forks · Updated Mar 21, 2025

Python · 4,033 stars · 324 forks · Updated Mar 12, 2025

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …

Python · 6,180 stars · 655 forks · Updated Mar 5, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python · 19,026 stars · 1,374 forks · Updated Mar 3, 2025

Python · 192 stars · 18 forks · Updated Sep 24, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python · 10,620 stars · 1,473 forks · Updated Mar 22, 2025

An Open-Sourced LLM-empowered Foundation TTS System

Python · 634 stars · 51 forks · Updated Oct 17, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python · 7,890 stars · 646 forks · Updated Mar 21, 2025

A small seq2seq punctuator tool based on DistilBERT

Python · 50 stars · 8 forks · Updated Dec 23, 2024

This is the official repo of our work titled "The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio".

Python · 54 stars · 4 forks · Updated Dec 13, 2024

Simple DNN-based VAD

C++ · 70 stars · 49 forks · Updated Dec 2, 2018

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python · 3,226 stars · 278 forks · Updated Nov 5, 2024

Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.

45 stars · 2 forks · Updated Sep 11, 2024

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python · 269 stars · 35 forks · Updated Jan 15, 2025

Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin (Bopomofo) or Pinyin (INTERSPEECH 2022)

Python · 317 stars · 38 forks · Updated Oct 20, 2024

real time face swap and one-click video deepfake with only a single image

Python · 45,234 stars · 6,781 forks · Updated Mar 22, 2025

Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.

Python · 12,216 stars · 1,225 forks · Updated Mar 21, 2025

Multilingual Voice Understanding Model

Python · 5,027 stars · 458 forks · Updated Jan 8, 2025

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python · 178 stars · 18 forks · Updated May 29, 2024

A feature-rich command-line audio/video downloader

Python · 104,880 stars · 8,237 forks · Updated Mar 22, 2025

Python · 658 stars · 59 forks · Updated Jun 7, 2024

Simple text to phones converter for multiple languages

Python · 1,351 stars · 183 forks · Updated Sep 26, 2024

📖 A curated list of resources dedicated to talking face.

1,468 stars · 117 forks · Updated Dec 23, 2024