Awesome Audio / Speech

Awesome list about audio, speech, and DSP (digital signal processing)

Recognition

Deep Speech (Baidu Research)
Deep Speech 2 (Baidu Research)
Google Speech-to-Text
Amazon Transcribe
PocketSphinx (CMU Sphinx)
SpeechKit (Yandex)
DeepSpeech (Mozilla)
Wav2Letter (Facebook AI)
ESPnet: End-to-End Speech Processing Toolkit
Kaldi Speech Recognition Toolkit
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Whisper - OpenAI's robust speech recognition system.
WhisperX - An extension of OpenAI's Whisper that adds word-level timestamps and speaker diarization.
Faster Whisper - A reimplementation of Whisper on CTranslate2 for faster inference.
Distil-Whisper - Hugging Face's distilled version of Whisper.

Filtering / Denoising

Diarization

Speaker Diarization with LSTM - A paper on using LSTM networks for speaker diarization.
Fully Supervised Speaker Diarization - A novel approach to speaker diarization using fully supervised learning.
NVIDIA's Speaker Diarization - NVIDIA's advanced approach to speaker diarization.

Synthesis

Open source projects

SoX - A cross-platform audio processing tool that provides a command-line interface for converting, editing, and playing audio files.
librosa - A library for audio and music analysis in Python, providing functions for computing features, such as MFCCs, chroma, and beat-related features.
Audacity - A cross-platform audio editor and recorder that supports many formats and provides a user-friendly interface.
PulseAudio - A sound server for Linux, Unix, and Windows systems that mixes and routes audio between applications and hardware devices.
PyTorch Audio (torchaudio) - A PyTorch library for common audio operations, such as audio pre-processing and spectrogram-based feature extraction.
DeepSpeech - A speech-to-text engine developed by Mozilla Research.

KennethanCeyer/awesome-audio-speech