TensorFlow implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading"
by Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, and Nando de Freitas (https://arxiv.org/abs/1611.01599).
I have made a few changes to the method used in the paper:
- Instead of using all 34 speakers in the GRID dataset, I have used only 1 to save training time and computational resources.
- The paper uses dlib for extracting the lip region. I have done it statically myself to keep it straightforward.
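
A static lip crop can be sketched roughly as below, assuming "statically" means cutting the same fixed rectangle out of every frame. This is only an illustration, not the exact code in this repo: the function name `crop_lip_region` and the crop coordinates are hypothetical, and the 100x50 output size is borrowed from the mouth crop described in the paper. With a single speaker whose head position barely moves, a fixed box like this may be good enough; the values would need to be tuned by eye for the chosen speaker.

```python
import numpy as np

# Hypothetical crop box, chosen for illustration only.
# Assumes frames of 288 x 360 pixels (height x width); tune for your speaker.
CROP_TOP, CROP_LEFT = 190, 130
CROP_HEIGHT, CROP_WIDTH = 50, 100  # 100x50 (width x height) mouth crop


def crop_lip_region(frames, top=CROP_TOP, left=CROP_LEFT,
                    height=CROP_HEIGHT, width=CROP_WIDTH):
    """Cut the same fixed rectangle out of every frame of a video.

    frames: array of shape (num_frames, frame_height, frame_width, channels).
    Returns an array of shape (num_frames, height, width, channels).
    """
    frames = np.asarray(frames)
    if frames.ndim != 4:
        raise ValueError("expected frames with shape (T, H, W, C)")

    _, frame_h, frame_w, _ = frames.shape
    # Fail loudly instead of silently returning a smaller crop.
    if top < 0 or left < 0 or top + height > frame_h or left + width > frame_w:
        raise ValueError("crop box falls outside the frame")

    return frames[:, top:top + height, left:left + width, :]
```

For example, a 75-frame clip stored as an array of shape `(75, 288, 360, 3)` would come back with shape `(75, 50, 100, 3)`, ready to be normalised and fed to the network.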