
LipReadingProject

An automated lip reading algorithm that leverages a combination of convolutional neural networks (CNNs) and LSTMs to read lips based purely on visemes.

Data Collection

One of my project teammates recorded videos of himself vocalizing English phonemes at equally spaced intervals. For example, the phoneme "ah" was enunciated roughly 100 times, one second apart, to obtain an empirical distribution for that phoneme. The videos were recorded with a camcorder at 30 fps and then split into individual frames.
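A minimal sketch of the frame-splitting step using OpenCV; the file names and output layout are illustrative assumptions, not the project's actual setup.

```python
import os
import cv2

def split_video_into_frames(video_path, output_dir):
    """Read a camcorder recording and write each frame as a PNG."""
    os.makedirs(output_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    frame_index = 0
    while True:
        success, frame = capture.read()
        if not success:
            break  # end of video
        cv2.imwrite(os.path.join(output_dir, f"frame_{frame_index:05d}.png"), frame)
        frame_index += 1
    capture.release()
    return frame_index

# e.g. split_video_into_frames("phoneme_ah.mp4", "frames/ah")
```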

Data Preprocessing

In the first iteration of the project, each video was split into individual frames, and those frames were passed as raw input to our model. To speed up training and eliminate potentially erroneous data, we added a facial segmentation step to the preprocessing pipeline that isolates the lip region and uses only it as model input. With our limited compute resources, lip segmentation provided decent performance, but full facial expressions would most likely enhance model performance by providing additional, potentially helpful signals for interpreting certain phonemes.
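A hedged sketch of one way to implement the lip-isolation step, using dlib's 68-point facial landmarks (the mouth corresponds to points 48-67). The predictor file name and crop margin are assumptions, not the project's actual settings.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_lip_region(frame, margin=10):
    """Return a crop containing only the mouth region, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None  # frame discarded as potentially erroneous data
    landmarks = predictor(gray, faces[0])
    xs = [landmarks.part(i).x for i in range(48, 68)]
    ys = [landmarks.part(i).y for i in range(48, 68)]
    x0, x1 = max(min(xs) - margin, 0), max(xs) + margin
    y0, y1 = max(min(ys) - margin, 0), max(ys) + margin
    return frame[y0:y1, x0:x1]
```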

Model Development and Training

We chose to use a CNN-LSTM model. CNNs are the standard deep learning architecture for image interpretation and analysis, and the CNN functioned as an encoder in our model, transforming each frame into a latent-space representation. The LSTM then decoded the sequence of encoded frames, incorporating the time-based nature of lip movements to determine the phoneme being spoken.
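A minimal sketch of a CNN-LSTM phoneme classifier in Keras. The layer sizes, sequence length, input resolution, and number of phoneme classes are illustrative assumptions, not the values used in the project.

```python
from tensorflow.keras import layers, models

SEQUENCE_LENGTH = 30       # frames per clip (1 second at 30 fps)
FRAME_SHAPE = (64, 64, 1)  # cropped, grayscale lip images
NUM_PHONEMES = 44          # assumed number of English phoneme classes

# CNN encoder applied to every frame independently.
frame_encoder = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=FRAME_SHAPE),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),  # per-frame latent vector
])

# LSTM decodes the sequence of latent vectors into a phoneme prediction.
model = models.Sequential([
    layers.TimeDistributed(frame_encoder, input_shape=(SEQUENCE_LENGTH, *FRAME_SHAPE)),
    layers.LSTM(128),
    layers.Dense(NUM_PHONEMES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```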
