A Python 2.7 implementation of the Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) algorithms for Automatic Speech Recognition (ASR).
- Read audio data and sampling frequency from a .wav file
- Frame signal
- Apply window function to each frame (default: Hamming)
- Calculate DFT of frame
- Calculate periodogram power spectral density estimate for each DFT bin
- Apply Mel-frequency filterbank to the power spectrum
- Sum energies within each filter and take the base 10 logarithm
- Take DCT of the log filterbank energies
- Keep coefficients `[1:13]`
- Compute DTW best path and Euclidean distance between reference vector and input vector
- Noise gate
- Pre-emphasis / Lifter
- Feature vector database
- Audio record / playback (`audio.py`)
- Multithread MFCC extraction
- Create MFCC extractor as class?
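
The extraction and matching steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the function names (`mel_filterbank`, `mfcc`, `dtw`), the 25 ms / 10 ms framing, the 512-point FFT, the 26 filters and the `2595 * log10(1 + f / 700)` mel formula are all assumptions chosen for the example. It depends only on NumPy, takes an already loaded signal (reading the .wav file is left out), and writes the DCT-II out by hand.

```python
from __future__ import division
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, rate):
    """Triangular filters evenly spaced on the mel scale (hypothetical helper)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):        # rising edge
            bank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling edge, peak of 1 at centre
            bank[i, k] = (right - k) / (right - centre)
    return bank


def mfcc(signal, rate, frame_len=0.025, frame_step=0.010, n_fft=512, n_filters=26):
    """Return one row of 12 coefficients per frame (assumes frame size <= n_fft)."""
    signal = np.asarray(signal, dtype=float)
    size, step = int(round(frame_len * rate)), int(round(frame_step * rate))
    if len(signal) < size:                   # pad very short input to one frame
        signal = np.pad(signal, (0, size - len(signal)), 'constant')
    n_frames = 1 + (len(signal) - size) // step
    # Frame the signal and apply a Hamming window to every frame.
    idx = np.arange(size)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(size)
    # DFT of each frame, then the periodogram estimate of the power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Sum the energy under each mel filter and take the base-10 logarithm.
    energies = np.dot(power, mel_filterbank(n_filters, n_fft, rate).T)
    log_energies = np.log10(np.maximum(energies, 1e-10))
    # Unnormalised DCT-II along the filter axis.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(n, n + 0.5) / n_filters)
    cepstra = np.dot(log_energies, basis.T)
    return cepstra[:, 1:13]                  # keep coefficients [1:13]


def dtw(ref, inp):
    """Return (total cost, best path) for two 2-D arrays of feature vectors."""
    ref, inp = np.asarray(ref, dtype=float), np.asarray(inp, dtype=float)
    # Euclidean distance between every reference frame and every input frame.
    dist = np.sqrt(((ref[:, None, :] - inp[None, :, :]) ** 2).sum(axis=2))
    n, m = dist.shape
    acc = np.full((n + 1, m + 1), np.inf)    # accumulated cost with a border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the last cell to the first to recover the best path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(moves, key=lambda p: acc[p])
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]
```

With these assumed defaults, one second of audio sampled at 16 kHz yields a feature array of shape `(98, 12)`, and `dtw(features, features)` returns a cost of zero along the diagonal path. Comparing an input against several stored reference arrays and picking the lowest cost is one simple way such a pair of functions could be used for recognition.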