LipSync

by: José Agustín Barrachina & Matias Dwek

[GUI screenshot]

This program loads an audio recording of one or two speakers and applies the following audio processing algorithms.

  • Phoneme identification using an LPC (Linear Predictive Coding) analysis system and formant detection (sketched after this list)
  • Speech recognition using an RNN-LSTM [1] (sketched after this list)
    • Network input: 13 MFCCs (Mel-Frequency Cepstral Coefficients)
    • 3 hidden layers
    • 250 LSTM cells
    • Optimizer: Adam
    • Loss: CTC (Connectionist Temporal Classification)
  • Speaker recognition (up to two speakers) using a GMM (Gaussian Mixture Model) [2] (sketched after this list)
  • Speed modification
    • Uses over-/under-sampling to keep the voice from sounding deeper or higher pitched
  • Pitch modification using the FFT (speed and pitch changes are sketched after this list)

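The LPC/formant step can be illustrated as follows. This is a minimal sketch of LPC-based formant estimation, assuming librosa and NumPy; the frame length, LPC order, and file path are illustrative, not taken from the repository.

```python
# Minimal sketch of LPC-based formant estimation for phoneme identification.
# Assumes librosa and NumPy; window, LPC order and path are illustrative.
import numpy as np
import librosa

def formants(frame, sr, order=12):
    """Estimate formant frequencies (Hz) of one windowed audio frame via LPC."""
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)   # pole angles -> frequencies
    return np.sort(freqs)

y, sr = librosa.load("speech.wav", sr=16000)     # placeholder path
frame = y[:int(0.03 * sr)]                       # a 30 ms analysis frame
print(formants(frame, sr))                       # rough F1, F2, ... estimates
```
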
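The speech recognition network described above maps roughly onto this minimal TensorFlow/Keras sketch; the output label set and the padding handling in the CTC wrapper are assumptions, not the repository's actual code.

```python
# Minimal sketch of the RNN-LSTM acoustic model described above.
# Assumes TensorFlow/Keras; NUM_CLASSES and the padding handling are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_MFCC = 13      # 13 MFCC features per audio frame (network input)
NUM_UNITS = 250    # 250 LSTM cells
NUM_LAYERS = 3     # 3 hidden layers
NUM_CLASSES = 29   # hypothetical label set: 26 letters + space + apostrophe + CTC blank

def build_model():
    inputs = layers.Input(shape=(None, NUM_MFCC), name="mfcc_frames")
    x = inputs
    for _ in range(NUM_LAYERS):
        x = layers.LSTM(NUM_UNITS, return_sequences=True)(x)
    # Per-frame label probabilities; the last index is the CTC blank symbol.
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return Model(inputs, outputs)

def ctc_loss(y_true, y_pred):
    # Simplified CTC wrapper: assumes labels are padded to a fixed width.
    batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_len = tf.cast(tf.shape(y_pred)[1], dtype="int64") * tf.ones((batch_len, 1), dtype="int64")
    label_len = tf.cast(tf.shape(y_true)[1], dtype="int64") * tf.ones((batch_len, 1), dtype="int64")
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_len, label_len)

model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=ctc_loss)
```
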
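Speaker recognition with one GMM per speaker over MFCC frames could look like this; a minimal sketch assuming scikit-learn and librosa, with illustrative function names and parameters.

```python
# Minimal sketch of GMM-based speaker identification on MFCC frames.
# Assumes scikit-learn and librosa; function names are illustrative.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, n_mfcc=13, sr=16000):
    """Load audio and return an (n_frames, n_mfcc) MFCC matrix."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_speaker_model(paths, n_components=16):
    """Fit one GMM on all MFCC frames of one speaker's recordings."""
    feats = np.vstack([mfcc_frames(p) for p in paths])
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    return gmm.fit(feats)

def identify(segment_path, models):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    feats = mfcc_frames(segment_path)
    return max(models, key=lambda name: models[name].score(feats))
```
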
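The speed and pitch modifications can be approximated with librosa's FFT-based phase-vocoder utilities; a minimal sketch with placeholder file paths (the repository may implement these transforms by hand).

```python
# Minimal sketch of speed and pitch modification, assuming librosa and soundfile.
# librosa's time_stretch/pitch_shift use an FFT-based phase vocoder plus resampling.
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=None)   # placeholder path

# Speed modification: 1.5x faster without making the voice higher pitched.
faster = librosa.effects.time_stretch(y, rate=1.5)

# Pitch modification: shift up by 2 semitones without changing duration.
higher = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

sf.write("speech_fast.wav", faster, sr)
sf.write("speech_high.wav", higher, sr)
```
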
GUI

  • A mouth image shows the speaker's mouth movement
  • A green progress line stays in sync with the playing audio
  • Built with PyQt5 (see the sketch below)
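
A rough sketch of how the GUI pieces could be wired together with PyQt5 and QtMultimedia; the widget layout, style sheet, and file paths are illustrative, not the repository's actual code.

```python
# Minimal sketch of the GUI wiring described above, assuming PyQt5 + QtMultimedia.
# Widget names and file paths are illustrative.
import sys
from PyQt5.QtCore import QUrl
from PyQt5.QtGui import QPixmap
from PyQt5.QtWidgets import QApplication, QLabel, QProgressBar, QVBoxLayout, QWidget
from PyQt5.QtMultimedia import QMediaContent, QMediaPlayer

app = QApplication(sys.argv)
window = QWidget()
layout = QVBoxLayout(window)

mouth = QLabel()                                  # shows the current mouth image
mouth.setPixmap(QPixmap("mouths/closed.png"))     # placeholder image path
layout.addWidget(mouth)

progress = QProgressBar()                         # green progress line
progress.setTextVisible(False)
progress.setRange(0, 100)
progress.setStyleSheet("QProgressBar::chunk { background-color: green; }")
layout.addWidget(progress)

player = QMediaPlayer()
player.setMedia(QMediaContent(QUrl.fromLocalFile("speech.wav")))  # placeholder path

def on_position_changed(ms):
    # Keep the progress line in sync with the playing audio; a real
    # implementation would also swap the mouth image here, based on the
    # phoneme detected at time `ms`.
    if player.duration() > 0:
        progress.setValue(int(100 * ms / player.duration()))

player.positionChanged.connect(on_position_changed)

window.show()
player.play()
sys.exit(app.exec_())
```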

References

[1] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton (2013). "Speech Recognition with Deep Recurrent Neural Networks". Department of Computer Science, University of Toronto.

[2] Vibha Tiwari (2010). "MFCC and its applications in speaker recognition". In: International Journal on Emerging Technologies 1(1), pp. 19-22.
