This repository contains an implementation of the LipNet architecture, applied to one part of the dataset. LipNet is an architecture proposed for reading lips from video and translating that information into words.
The architecture consists of convolutional (Conv) and LSTM layers; it is a CNN-RNN architecture.
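To make the CNN-RNN idea concrete, here is a minimal sketch of how a video clip's shape changes as it passes through a stack of convolutional layers and is then flattened into a per-frame sequence for the LSTM. It uses only the standard library and computes shapes, not weights. The `Conv3D` spec class, the layer sizes, and the 75-frame, 50x100 RGB input are all illustrative assumptions, not the configuration used in this repository.

```python
from dataclasses import dataclass


@dataclass
class Conv3D:
    """Hypothetical conv layer spec; tuples are ordered (time, height, width)."""
    out_channels: int
    kernel: tuple
    stride: tuple = (1, 1, 1)
    padding: tuple = (0, 0, 0)


def conv_out(size, kernel, stride, padding):
    # Standard convolution output-size formula along one dimension.
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(frames, height, width, channels, layers):
    """Return the (channels, frames, height, width) shape after each layer."""
    shapes = [(channels, frames, height, width)]
    for layer in layers:
        _, t, h, w = shapes[-1]
        dims = zip((t, h, w), layer.kernel, layer.stride, layer.padding)
        t, h, w = (conv_out(s, k, st, p) for s, k, st, p in dims)
        shapes.append((layer.out_channels, t, h, w))
    return shapes


def lstm_input_shape(conv_shape):
    """Flatten channels and spatial dims: one feature vector per frame."""
    c, t, h, w = conv_shape
    return (t, c * h * w)


# Assumed example stack: the time axis is preserved (stride 1, padding 1),
# while height and width are halved by each layer.
layers = [
    Conv3D(32, kernel=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
    Conv3D(64, kernel=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
    Conv3D(96, kernel=(3, 3, 3), stride=(1, 2, 2), padding=(1, 1, 1)),
]

shapes = trace_shapes(frames=75, height=50, width=100, channels=3, layers=layers)
seq_shape = lstm_input_shape(shapes[-1])
print(shapes[-1])  # (96, 75, 7, 13)
print(seq_shape)   # (75, 8736)
```

The point of the sketch is that the convolutional part shrinks the spatial dimensions but keeps one time step per input frame, so the LSTM receives a sequence whose length matches the number of frames in the clip.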
The repository also includes a checkpoint file for a model that has already been trained for an extended period, since training takes a huge amount of time.
The dataset can be found online by searching for LipNet.