Meant for use with TensorFlow-GPU 2.1.0 on GCP
*NOTE: the weights for models 3 and 4 were too big for GitHub, so they are in the Drive folder attached to the paper
Specs of use:
- Debian Deep Learning on Linux image
- Includes a V2P model (model 3)
- Used with the GRID corpus

For data manipulation, use some of these common commands:
sudo unzip align.zip -d align_full
sudo tar -xvf s32.tar
There were too many archives for a single rename operation, so I had to extract each one in its own respective s* directory:
for i in {1..34}; do sudo mkdir s$i; sudo mv s$i.tar s$i/s$i.tar; cd s$i; sudo tar -xvf s$i.tar; cd ..; done
I had to rename the align files because my videos were split between regular .mp4 files and a *_mouth.mp4 version; it was easier to rename the aligns than the videos:
for i in {1..34}; do cd s$i/align;sudo rename 's/.align/_mouth.align/' *;cd ../..; done
sudo mkdir align
for i in {1..34}; do sudo cp s$i/align/* align; done
for i in {25..30}; do sudo rm -rf s$i; done
for i in {3..5}; do sudo rm -rf s$i; done
for i in {12..15}; do sudo rm -rf s$i; done
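If you would rather do the renaming and consolidation from Python instead of the shell loops above, here is a minimal sketch of the same steps; the base path and the s1..s34 layout are assumptions taken from the commands above, so adjust them to your setup.

```python
import os
import shutil

BASE = os.path.expanduser("~")      # assumption: the s* directories live in the home directory
DEST = os.path.join(BASE, "align")  # consolidated align folder, as in the commands above

os.makedirs(DEST, exist_ok=True)

for i in range(1, 35):              # speakers s1..s34
    align_dir = os.path.join(BASE, f"s{i}", "align")
    if not os.path.isdir(align_dir):
        continue
    for name in os.listdir(align_dir):
        if not name.endswith(".align"):
            continue
        # rename foo.align -> foo_mouth.align to match the *_mouth.mp4 videos
        new_name = name[:-len(".align")] + "_mouth.align"
        src = os.path.join(align_dir, name)
        dst = os.path.join(align_dir, new_name)
        os.rename(src, dst)
        # copy the renamed align into the consolidated folder
        shutil.copy(dst, os.path.join(DEST, new_name))
```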
python prepare.py ~/video/video/video/ ~/align_full/align/ 1000
python train.py s1
pip install --no-deps tensorflow-addons==0.5.1
sudo ~/LipNetEnv/bin/python2.7 convert_mp4_to_frames.py video *.mp4 video
Keras implementation of the method described in the paper 'LipNet: End-to-End Sentence-level Lipreading' by Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas (https://arxiv.org/abs/1611.01599).
Scenario | Epoch | CER | WER | BLEU |
---|---|---|---|---|
Unseen speakers [C] | N/A | N/A | N/A | N/A |
Unseen speakers | 178 | 6.19% | 14.19% | 88.21% |
Overlapped speakers [C] | N/A | N/A | N/A | N/A |
Overlapped speakers | 368 | 1.56% | 3.38% | 96.93% |
Notes:
- [C] means using curriculum learning.
- N/A means either the training is in progress or hasn't been performed.
- Your contribution in sharing the results of this model is highly appreciated :)
- Keras 2.0+
- Tensorflow 1.0+
- PIP (for package installation)
Plus several other libraries listed in setup.py
To use the model, first you need to clone the repository:
git clone https://github.com/rizkiarm/LipNet
Then you can install the package:
cd LipNet/
pip install -e .
Note: if you don't want to use CUDA, you need to edit setup.py and change `tensorflow-gpu` to `tensorflow`.
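For reference, the change described in the note amounts to a one-line edit of the dependency list in setup.py; the surrounding entries below are illustrative, not a copy of the actual file.

```python
# setup.py (illustrative excerpt): swap the TensorFlow dependency for CPU-only installs
install_requires = [
    'Keras>=2.0.0',
    # 'tensorflow-gpu>=1.0.0',   # original GPU requirement
    'tensorflow>=1.0.0',         # CPU-only replacement when CUDA is not available
]
```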
You're done!
Here are some ideas on what you can do next:
- Modify the package and make some improvements to it.
- Train the model using predefined training scenarios.
- Make your own training scenarios.
- Use pre-trained weights to do lipreading.
- Go crazy and experiment on other datasets by changing some hyperparameters or modifying the model.
This model uses the GRID corpus (http://spandh.dcs.shef.ac.uk/gridcorpus/)
For those of you who are having difficulties in training the model (or just want to see the end results), you can download and use the weights provided here: https://github.com/rizkiarm/LipNet/tree/master/evaluation/models.
More detail on saving and loading weights can be found in the Keras FAQ.
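As a minimal sketch of the Keras mechanics involved (the tiny model below is a placeholder, not this repository's architecture), saving and loading a weight file looks like this:

```python
from keras.models import Sequential
from keras.layers import Dense

# Placeholder model; in practice you would build the LipNet architecture
# from this repository before loading its weights.
model = Sequential([Dense(10, input_shape=(20,))])

# Save weights after training (requires h5py for the .h5 format) ...
model.save_weights('my-weights.h5')

# ... and load them back; the architecture must match the saved weights.
model.load_weights('my-weights.h5')
```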
There are five different training scenarios that are (going to be) available:
- Download all video (normal) and align from the GRID Corpus website.
- Extract all the videos and aligns.
- Create a `datasets` folder in each training scenario folder.
- Create an `align` folder inside the `datasets` folder.
- All current `train.py` scripts expect the videos to be in the form of 100x50px mouthcrop image frames. You can change this by adding `vtype = "face"` and `face_predictor_path` (which can be found in `evaluation/models`) in the instantiation of `Generator` inside `train.py` (see the sketch after this list).
- The other way would be to extract the mouthcrop images using `scripts/extract_mouth_batch.py` (usage can be found inside the script).
- Create a symlink from each `training/*/datasets/align` to your align folder.
- You can change the training parameters by modifying `train.py` inside its respective scenario folder.
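As referenced in the list above, here is a hedged sketch of what the `Generator` instantiation with `vtype = "face"` and `face_predictor_path` might look like; the class name, import path, extra keyword arguments, and predictor filename are assumptions, so check the actual `train.py` and generator code before copying it.

```python
# Sketch only: class name, import path, keyword names, and values are assumptions
# based on the note above; mirror the existing Generator call in train.py.
from lipnet.lipreading.generators import BasicGenerator  # assumed import path

lip_gen = BasicGenerator(
    dataset_path="training/unseen_speakers/datasets",  # datasets/ folder for this scenario
    minibatch_size=50,                                  # illustrative value
    img_c=3, img_w=100, img_h=50, frames_n=75,          # illustrative frame geometry
    absolute_max_string_len=32,
    vtype="face",                                       # train on full-face frames instead of mouthcrops
    face_predictor_path="evaluation/models/shape_predictor_68_face_landmarks.dat",  # assumed filename
)
```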
For the random split scenario, create a symlink from `training/random_split/datasets/video` to your video dataset folder (which contains the `s*` directories).
Train the model using the following command:
./train random_split [GPUs (optional)]
Note: You can change the validation split value by modifying the `val_split` argument inside `train.py`.
For the unseen speakers scenario, create the following folders:
training/unseen_speakers/datasets/train
training/unseen_speakers/datasets/val
Then, create symlinks from `training/unseen_speakers/datasets/[train|val]/s*` to your selection of `s*` inside the video dataset folder.
The paper used `s1`, `s2`, `s20`, and `s22` for evaluation and the remainder for training.
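If you prefer to script the symlinks rather than create them by hand, the following sketch implements the split described above; the location of your video dataset is an assumption you will need to adjust.

```python
import os

VIDEO_ROOT = os.path.expanduser("~/video")      # assumption: folder containing the s* directories
DEST_ROOT = "training/unseen_speakers/datasets"
VAL_SPEAKERS = {"s1", "s2", "s20", "s22"}       # held-out speakers, as used in the paper

for speaker in sorted(os.listdir(VIDEO_ROOT)):
    src = os.path.join(VIDEO_ROOT, speaker)
    if not (speaker.startswith("s") and os.path.isdir(src)):
        continue
    split = "val" if speaker in VAL_SPEAKERS else "train"
    dest_dir = os.path.join(DEST_ROOT, split)
    os.makedirs(dest_dir, exist_ok=True)
    link = os.path.join(dest_dir, speaker)
    if not os.path.exists(link):
        # absolute target keeps the symlink valid regardless of working directory
        os.symlink(os.path.abspath(src), link)
```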
Train the model using the following command:
./train unseen_speakers [GPUs (optional)]
Unseen speakers with curriculum learning is set up the same way as unseen speakers.
Note: You can change the curriculum by modifying the `curriculum_rules` method inside `train.py`.
./train unseen_speakers_curriculum [GPUs (optional)]
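The exact shape of `curriculum_rules` depends on the code in this repository, but as a hedged sketch it is a function of the epoch number that returns the curriculum settings to apply; the rule keys below are assumptions, so mirror the ones already present in `train.py`.

```python
# Illustrative sketch only: the rule keys (sentence_length, flip_probability,
# jitter_probability) are assumptions about what the curriculum controls.
def curriculum_rules(epoch):
    if epoch < 10:
        # start easy: short sub-sentences, no augmentation
        return {'sentence_length': 1, 'flip_probability': 0.0, 'jitter_probability': 0.0}
    if epoch < 20:
        return {'sentence_length': 2, 'flip_probability': 0.5, 'jitter_probability': 0.05}
    # full sentences with augmentation for the rest of training
    return {'sentence_length': -1, 'flip_probability': 0.5, 'jitter_probability': 0.05}
```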
For overlapped speakers training, run the preparation script:
python prepare.py [Path to video dataset] [Path to align dataset] [Number of samples]
Notes:
- `[Path to video dataset]` should be a folder with the structure: `/s{i}/[video]`
- `[Path to align dataset]` should be a folder with the structure: `/[align].align`
- `[Number of samples]` should be less than or equal to `min(len(ls '/s{i}/*'))` (see the sketch after this list)
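To make the last constraint concrete, this small sketch (the dataset path is an assumption) computes the largest `[Number of samples]` you can pass, i.e. the smallest per-speaker video count:

```python
import glob
import os

VIDEO_ROOT = os.path.expanduser("~/video")   # assumption: folder containing the s* directories

# Count videos per speaker; [Number of samples] must not exceed the minimum count.
counts = {
    os.path.basename(d): len(glob.glob(os.path.join(d, "*")))
    for d in glob.glob(os.path.join(VIDEO_ROOT, "s*"))
}
print(counts)
print("max usable [Number of samples]:", min(counts.values()))
```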
Then run training for each speaker:
python training/overlapped_speakers/train.py s{i}
Copy `prepare.py` from the `overlapped_speakers` folder to the `overlapped_speakers_curriculum` folder, and run it as previously described in the overlapped speakers training explanation.
Then run training for each speaker:
python training/overlapped_speakers_curriculum/train.py s{i}
Note: As always, you can change the curriculum by modifying the `curriculum_rules` method inside `train.py`.
To evaluate and visualize the trained model on a single video / image frames, you can execute the following command:
./predict [path to weight] [path to video]
Example:
./predict evaluation/models/overlapped-weights368.h5 evaluation/samples/id2_vcd_swwp2s.mpg
This is a work in progress. Errors are to be expected. If you find any errors in the implementation, please report them by submitting issue(s) or making PR(s). Thanks!
Some todos:
- Use ~~Stanford-CTC~~ Tensorflow CTC beam search
- Auto spelling correction
- Overlapped speakers (and its curriculum) training
- Integrate language model for beam search
- RGB normalization over the dataset.
- Validate CTC implementation in training.
- Proper documentation
- Unit tests
- (Maybe) better curriculum learning.
- (Maybe) some proper scripts to do dataset stuff.
MIT License