Awesome list about audio, speech and DSP (digital signal processing)
- Recognition
- Filtering / Denoising
- Diarization
- Synthesis
- Open source projects
- Research papers
- Blog posts
- Books

Recognition
- Deep Speech (Baidu Research)
- Deep Speech 2 (Baidu Research)
- Google Speech-to-Text
- Amazon Transcribe
- PocketSphinx (CMU Sphinx)
- SpeechKit (Yandex)
- DeepSpeech (Mozilla)
- Wav2Letter (Facebook AI)
- ESPnet: End-to-End Speech Processing Toolkit
- Kaldi Speech Recognition Toolkit
- Transformer-based Acoustic Modeling for Hybrid Speech Recognition
- Whisper - OpenAI's robust speech recognition system.
- WhisperX - An extension of OpenAI's Whisper that adds word-level timestamps and speaker diarization.
- Faster Whisper - A reimplementation of Whisper on CTranslate2 for faster, lower-memory inference.
- Distil-Whisper - Hugging Face's distilled version of Whisper.

Filtering / Denoising
- Fast Fourier Transform (FFT)
- Short-Time Fourier Transform (STFT)
- Adaptive filtering
- Least Mean Squares (LMS) algorithm
- Kalman filter
- Wiener filter
- Spectral subtraction
- Blind source separation (BSS)
- Non-negative matrix factorization (NMF)
- Infinite Impulse Response (IIR) filter
- Finite Impulse Response (FIR) filter
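
As a concrete reference point for the adaptive filtering and LMS entries above, here is a minimal pure-Python sketch of an LMS adaptive FIR filter used for system identification. The function name `lms_filter`, the step size `mu`, the tap count, and the example system `h` are illustrative assumptions, not taken from any particular library. The sketch uses the textbook update w ← w + μ·e[n]·u[n], where u[n] holds the most recent input samples.

```python
import random


def lms_filter(x, d, num_taps=4, mu=0.05):
    """Adapt FIR tap weights w so the filter output tracks the desired signal d.

    Uses the textbook LMS update w <- w + mu * e[n] * u[n], where u[n] is the
    vector of the num_taps most recent input samples (newest first).
    Returns (y, e, w): filter output, error signal, and final tap weights.
    """
    w = [0.0] * num_taps
    u = [0.0] * num_taps  # delay line, newest sample first
    y, e = [], []
    for xn, dn in zip(x, d):
        u = [xn] + u[:-1]                                # shift in the new sample
        yn = sum(wi * ui for wi, ui in zip(w, u))        # FIR filter output
        en = dn - yn                                     # estimation error
        w = [wi + mu * en * ui for wi, ui in zip(w, u)]  # LMS weight update
        y.append(yn)
        e.append(en)
    return y, e, w


# System identification: recover an "unknown" FIR filter h from its input and output.
random.seed(0)
h = [0.5, -0.3, 0.2, 0.1]  # hypothetical system to identify
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]

y, e, w = lms_filter(x, d, num_taps=len(h), mu=0.05)
print([round(wi, 3) for wi in w])  # the learned taps should be close to h
```

The step size trades convergence speed against stability: for this input (uniform on [-1, 1], variance 1/3) and 4 taps, a common conservative bound is mu < 2 / (num_taps × input variance) = 1.5, and mu = 0.05 settles within a few hundred samples. Because the example has no measurement noise, the error signal `e` decays toward zero once `w` matches `h`.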

Diarization
- Speaker Diarization with LSTM - A paper on using LSTM networks for speaker diarization.
- Fully Supervised Speaker Diarization - Google's UIS-RNN approach, which trains speaker diarization end to end with fully supervised learning.
- NVIDIA's Speaker Diarization - Speaker diarization models and pipelines from NVIDIA, provided in the NeMo toolkit.

Synthesis
- Tacotron (Google)
- Tacotron 2 (Google)
- Deep Voice (Baidu Research)
- Deep Voice 2 (Baidu Research)
- Deep Voice 3 (Baidu Research)
- VoiceLoop (Facebook AI Research)
- WaveNet (DeepMind)
- ClariNet (Baidu Research)
- SampleRNN (University of Montreal)
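- MelNet (Facebook AI Research)
- FastSpeech (Microsoft Research Asia, Zhejiang University)
- Transformer-TTS (Microsoft Research Asia, University of Electronic Science and Technology of China)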

Open source projects
- SoX - A cross-platform audio processing tool that provides a command-line interface for converting, editing, and playing audio files.
- librosa - A library for audio and music analysis in Python, providing functions for computing features, such as MFCCs, chroma, and beat-related features.
- Audacity - A cross-platform audio editor and recorder that supports many formats and provides a user-friendly interface.
- PulseAudio - A sound server for Linux and other POSIX systems (with a Windows port) that sits between applications and audio hardware, handling mixing and routing.
- torchaudio (PyTorch Audio) - PyTorch's audio library, providing audio I/O, pre-processing, and differentiable transforms such as spectrograms, mel spectrograms, and MFCCs.
- DeepSpeech - A speech-to-text engine developed by Mozilla Research.

Research papers
- WaveNet: A Generative Model for Raw Audio
- ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
- Universal Sound Separation
- Speech Recognition with Deep Recurrent Neural Networks
- SEGAN: Speech Enhancement Generative Adversarial Network

Blog posts
- Introducing Whisper
- Tacotron 2: Generating Human-like Speech from Text
- WaveNet: A generative model for raw audio
- Looking to Listen: Audio-Visual Speech Separation
- Practical Deep Learning Audio Denoising

Books
- Digital Signal Processing: Principles, Algorithms, and Applications by John G. Proakis and Dimitris K. Manolakis.
- Signals and Systems by Alan V. Oppenheim and Alan S. Willsky.
- The Scientist and Engineer's Guide to Digital Signal Processing by Steven W. Smith.
- Discrete-Time Signal Processing by Alan V. Oppenheim and Ronald W. Schafer.
- DSP First: A Multimedia Approach by James H. McClellan and Ronald W. Schafer.
- Adaptive Filter Theory by Simon Haykin.