LipSync

by: José Agustín Barrachina & Matias Dwek

This program loads an audio recording of one or two speakers and runs the following audio processing algorithms:

Phoneme identification using LPC (Linear Predictive Coding) analysis and formant detection
Speech recognition using an RNN-LSTM [1]
- Network input: 13 MFCCs (Mel-Frequency Cepstral Coefficients)
- 3 hidden layers
- 250 LSTM cells
- Optimizer: Adam
- Loss: CTC (Connectionist Temporal Classification)
Speaker recognition (up to two speakers) using a GMM (Gaussian Mixture Model) [2]
Speed modifier
- Uses oversampling/subsampling to avoid pitch distortion (the voice sounding deeper or higher)
Pitch modification using FFT
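
To illustrate the LPC and formant detection step listed above, here is a minimal sketch in plain Python. It is not the project's actual code: the function names (`lpc`, `formant_peaks`, `synth_resonances`), the LPC order and the peak-picking approach are all assumptions made for illustration. It fits an all-pole model with the autocorrelation method (Levinson-Durbin recursion) and reads formant candidates off the peaks of the resulting spectral envelope.

```python
import cmath
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns (a, err) with a[0] == 1, where A(z) = sum(a[k] * z**-k) is the
    prediction-error filter and err is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def formant_peaks(a, sample_rate, n_points=512):
    """Frequencies (Hz) of the local maxima of the LPC envelope 1/|A(e^jw)|."""
    env = []
    for m in range(n_points):
        w = math.pi * m / n_points
        a_w = sum(a_k * cmath.exp(-1j * w * k) for k, a_k in enumerate(a))
        env.append(1.0 / abs(a_w))
    return [m * sample_rate / (2.0 * n_points)
            for m in range(1, n_points - 1)
            if env[m - 1] < env[m] >= env[m + 1]]


def synth_resonances(freqs, sample_rate, n_samples=800, radius=0.95):
    """Hypothetical test signal: a unit impulse through cascaded two-pole
    resonators, one per requested frequency (a crude stand-in for a vowel)."""
    signal = [1.0] + [0.0] * (n_samples - 1)
    for f in freqs:
        theta = 2.0 * math.pi * f / sample_rate
        b1, b2 = 2.0 * radius * math.cos(theta), radius * radius
        out = []
        for n, x in enumerate(signal):
            y = x
            if n >= 1:
                y += b1 * out[n - 1]
            if n >= 2:
                y -= b2 * out[n - 2]
            out.append(y)
        signal = out
    return signal


# Two resonances roughly where F1 and F2 of an open vowel might sit (assumed values).
frame = synth_resonances([700.0, 1200.0], sample_rate=8000)
coeffs, residual = lpc(frame, order=4)
print(formant_peaks(coeffs, sample_rate=8000))
```

On real speech one would typically window each frame and use a higher LPC order (a common rule of thumb is about two coefficients per expected formant plus a couple more). The detected formant pairs could then be compared against per-phoneme reference values to choose the mouth image.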

GUI

A mouth image shows the speaker's mouth movement
A green progress line stays in sync with the audio being played
Built with PyQt5

References

[1] Alex Graves, Abdel-rahman Mohamed and Geoffrey Hinton (2012). “Speech Recognition with Deep Recurrent Neural Networks”. Department of Computer Science, University of Toronto.

[2] Vibha Tiwari (2010). “MFCC and its applications in speaker recognition”. In: International Journal on Emerging Technologies 1.1, pp. 19-22.
