LipSpeak

UC Berkeley - MIDS Capstone Fall 2020

Mission: To improve the quality of life of people who have lost the ability to speak by helping them communicate effectively using the latest advances in data science.

Team: Lina Gurevich, Erik Hou, Avinash Chandrasekaran, Daisy Ya

[Final Project Website]

2. Running a Demo

python
>>> import requests, json, os
>>> data = {'queries': ['call an ambulance', 'difficulty breathing']}
>>> files = {"file": (os.path.basename('./demo.mp4'),
...                   open('./demo.mp4', 'rb'), 'application/octet-stream'),
...          "phrasebook": (None, json.dumps(data))}
>>> resp = requests.post("http://url-where-server-is-running.com:5000/predict", files=files)

The commands above send a demo video and a user-defined phrasebook to the server that was set up earlier.

Expected output: Prediction is "difficulty breathing"

The demo video shows a speaker mouthing the words "difficulty breathing", and the model predicts this phrase correctly. In our app, the same input would cause the app to speak the phrase "I have difficulty breathing".

3. Description

The models used in our project were trained and evaluated on the LRW and LRS datasets. The pre-trained deep lip reading model is located at models/lrs2_lip_model, and the keyword-spotting model is located at misc/pretrained_models.
When a phrase is mouthed in a video recorded through the app, the features required by the lip reading model are pre-computed and saved in data/lipspeak. config.py specifies the configuration for the lip reading model.
The KWSNet keyword-spotting model configuration is available in configs/demo/eval.json.
app.py is the Python script that initializes all models and starts the Flask server for backend inference. It exposes a REST API to the mobile app and expects a video and a user-defined phrasebook as inputs. Once the inputs are received, it extracts the visual features from the video, runs the keyword-spotting model, and computes probabilities to identify which phrase was mouthed. The result is then reported back to the app.
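
The final step described above, turning per-phrase probabilities into a single prediction, can be sketched roughly as follows. This is a minimal illustration and not the code in app.py: the function name `pick_phrase`, the shape of the `scores` dictionary, and the `threshold` parameter are all hypothetical, and the numbers are made up to stand in for the keyword-spotting model's output.

```python
from typing import Dict, Optional


def pick_phrase(scores: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the phrasebook entry with the highest score, or None.

    `scores` maps each phrasebook query to a probability-like value in [0, 1].
    Both this input shape and `threshold` are assumptions made for the sketch.
    """
    if not scores:
        return None
    # Pick the query whose score is largest.
    best = max(scores, key=scores.get)
    # Reject the match if even the best score is too low to trust.
    return best if scores[best] >= threshold else None


# Hypothetical scores for the two demo queries (not real model output).
demo_scores = {"call an ambulance": 0.12, "difficulty breathing": 0.87}
print(pick_phrase(demo_scores))
```

Returning None when no phrase clears the threshold would let the app ask the user to try again instead of speaking a low-confidence guess; whether the actual server does this is not stated here.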

4. Mobile Setup

For details regarding our mobile development, please refer to the LipSpeak App project.

5. Limitations

We would like to emphasise that this research represents work in progress and, as such, has a few limitations that we are aware of.

Homophemes - for example, the words "may", "pay", "bay" cannot be distinguished without audio, as the sounds "m", "p", "b" share the same viseme and look identical on the lips.
Accents, speed of speech and mumbling, which modify lip movements.
Variable imaging conditions such as lighting, motion and resolution, which modify the appearance of the lips.
Shorter keywords, which are harder to spot visually.
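
The homopheme limitation can be illustrated with a toy example. The phoneme-to-viseme table and the phoneme spellings below are deliberately simplified assumptions made for this sketch (real viseme inventories differ between studies), and none of these names come from the project code.

```python
# Toy phoneme-to-viseme table: the bilabials m, p, b share one lip shape.
# Simplified for illustration only, not a standard viseme inventory.
VISEME_OF = {"m": "BILABIAL", "p": "BILABIAL", "b": "BILABIAL", "ey": "EY"}

# Rough, ARPAbet-like spellings of the example words (assumed by hand).
PHONEMES = {"may": ["m", "ey"], "pay": ["p", "ey"], "bay": ["b", "ey"]}


def to_visemes(word):
    """Map a word to the sequence of lip shapes a camera would see."""
    return tuple(VISEME_OF[p] for p in PHONEMES[word])


# Group words by their viseme sequence; words in one group are homophemes.
groups = {}
for word in PHONEMES:
    groups.setdefault(to_visemes(word), []).append(word)

print(groups)  # {('BILABIAL', 'EY'): ['may', 'pay', 'bay']}
```

All three words collapse to the same viseme sequence, so a purely visual model has no signal to separate them. A phrasebook of longer, visually distinct phrases reduces the chance of such collisions.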

6. Citation

If you use this code, please cite the following:

@misc{momeni2020seeing,
    title={Seeing wake words: Audio-visual Keyword Spotting},
    author={Liliane Momeni and Triantafyllos Afouras and Themos Stafylakis
            and Samuel Albanie and Andrew Zisserman},
    year={2020},
    eprint={2009.01225},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
