A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 9,215 938 Updated Mar 28, 2025

huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 3,806 318 Updated Jan 8, 2025

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Python 9,597 1,458 Updated Mar 29, 2025

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 13,487 2,760 Updated Mar 30, 2025

lllyasviel / sd-forge-layerdiffuse

[WIP] Layer Diffusion for WebUI (via Forge)

Python 4,005 344 Updated Aug 30, 2024

Mikubill / sd-webui-controlnet

WebUI extension for ControlNet

Python 17,478 2,003 Updated Aug 12, 2024

zuruoke / watermark-removal

A machine learning image inpainting task that removes watermarks from images, producing results indistinguishable from the ground truth image

Python 3,220 392 Updated Aug 16, 2024

yerfor / GeneFacePlusPlus

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Python 1,673 241 Updated Oct 18, 2024

TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 3,814 479 Updated Mar 28, 2025

facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 8,748 1,149 Updated Apr 24, 2024

nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)

Python 693 113 Updated Mar 28, 2025

Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs

Python 11,635 2,448 Updated Feb 10, 2025

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 20,992 2,612 Updated Mar 4, 2025

facebookresearch / svoice

We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers In which, we present a new method for separating a mixed audio sequence, in which multi…

Python 1,279 186 Updated Nov 16, 2023

Anjok07 / ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

Python 20,021 1,469 Updated Mar 13, 2025

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 38,922 4,897 Updated Aug 16, 2024

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 43,347 4,821 Updated Mar 26, 2025

voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Python 8,948 1,190 Updated Mar 16, 2025

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 79,114 9,499 Updated Jan 4, 2025

Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

C++ 9,103 772 Updated Aug 3, 2024

AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

Python 150,205 27,988 Updated Mar 4, 2025

Fifi Lyu fifilyu

Stars

espnet / espnet

myshell-ai / OpenVoice

TencentGameMate / chinese_speech_pretrain

SYSTRAN / faster-whisper

suno-ai / bark

RVC-Project / Retrieval-based-Voice-Conversion-WebUI

HeyPuter / puter

pyannote / pyannote-audio

kaldi-asr / kaldi

modelscope / FunASR