by: José Agustín Barrachina & Matias Dwek
This program loads an audio recording of one or two speakers and runs the following audio processing algorithms:
- Phoneme identifier using LPC (Linear Predictive Coding) analysis and formant detection
- Voice recognition using an RNN-LSTM [1]
  - Network input: 13 MFCCs (Mel-Frequency Cepstral Coefficients)
  - 3 hidden layers
  - 250 LSTM cells
  - Optimizer: Adam
  - Loss: CTC (Connectionist Temporal Classification)
- Speaker recognition (up to two speakers) using a GMM (Gaussian Mixture Model) [2]
- Speed modifier
  - Uses over/undersampling to keep the voice from sounding deeper or higher-pitched
- Pitch modification using FFT
- Mouth image shows the speaker's mouth movement
- Green progress line (in sync with the audio being played)
- Built with PyQt5
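
The LPC analysis and formant detection step listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the function names (`lpc`, `envelope`, `formant_candidates`), the prediction order and the frequency grid size are all assumptions. It estimates the LPC coefficients with the autocorrelation method (Levinson-Durbin recursion) and reports the peaks of the resulting spectral envelope as formant candidates.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns a[0..order] with a[0] == 1, the coefficients of the
    prediction error filter A(z) = sum_k a[k] * z**-k.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def envelope(a, sample_rate, n_points=512):
    """Evaluate |1 / A(e^{jw})| on n_points frequencies in [0, Nyquist)."""
    freqs, mags = [], []
    for i in range(n_points):
        f = i * (sample_rate / 2) / n_points
        w = 2 * math.pi * f / sample_rate
        A = sum(a[k] * cmath.exp(-1j * w * k) for k in range(len(a)))
        freqs.append(f)
        mags.append(1.0 / max(abs(A), 1e-12))
    return freqs, mags


def formant_candidates(frame, sample_rate, order=8):
    """Return the frequencies (Hz) of the local maxima of the LPC envelope."""
    a = lpc(frame, order)
    freqs, mags = envelope(a, sample_rate)
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]
```

For a frame `x` sampled at 8 kHz, `formant_candidates(x, 8000, order=8)` returns the envelope peak frequencies in Hz. In practice a pre-emphasis filter and a window (for example Hamming) are usually applied to the frame first; both are left out here to keep the sketch short.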
[1] Alex Graves, Abdel-rahman Mohamed and Geoffrey Hinton (2012). “Speech Recognition with Deep Recurrent Neural Networks”. Department of Computer Science, University of Toronto.
[2] Vibha Tiwari (2010). “MFCC and its applications in speaker recognition”. In: International Journal on Emerging Technologies 1.1, pp. 19-22.