- Chengdu, China
Stars
Instant voice cloning by MIT and MyShell. Audio foundation model.
Chinese speech pretrained models
Faster Whisper transcription with CTranslate2
🔊 Text-Prompted Generative Audio Model
Easily train a good VC model with voice data <= 10 mins!
🌐 The Internet OS! Free, Open-Source, and Self-Hostable.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
kaldi-asr/kaldi is the official location of the Kaldi project.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
[WIP] Layer Diffusion for WebUI (via Forge)
WebUI extension for ControlNet
A machine learning image inpainting task that removes watermarks from images, producing results indistinguishable from the ground-truth image
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers, in which we present a new method for separating a mixed audio sequence, in which multi…
GUI for a Vocal Remover that uses Deep Neural Networks.
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
so-vits-svc fork with realtime support, improved interface and more features.
Robust Speech Recognition via Large-Scale Weak Supervision
High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Stable Diffusion web UI