This article is about a project I worked on as an undergraduate.
Since the aim of the activity was to learn machine learning techniques, there is nothing novel here.
My goal was to detect eye contact from a single face image.
By that I mean deciding whether the person in the image is looking at the camera or not.
Some people in the lab where I worked were doing research on interactive robot systems.
So I hoped my eye-contact detection work would contribute to their interactive system.
Face images vary widely, for example in size and position.
For that reason, I first detect the face with the OpenFace API and then detect eye contact.
I used the API for the following two purposes:
- Get the positions (coordinates) of the face landmarks, shown as black points in the image below.
- Crop the face as a rectangular box.
As you can see in the following image, 64 face landmarks are detected.
The face area is then cropped using the points around the eyes.
You can also get the coordinates of each landmark.
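A crop around the eyes from landmark coordinates can be sketched like this. This is a minimal numpy sketch, not the original code: the landmark indices follow the common 68-point dlib convention (which OpenFace builds on), and the margin value is a tuning assumption, so adjust both to whatever your landmark detector actually returns.

```python
import numpy as np

# Eye landmark indices in the 68-point dlib scheme (an assumption;
# adjust to the indexing your OpenFace version returns).
LEFT_EYE = list(range(36, 42))
RIGHT_EYE = list(range(42, 48))

def eye_region_box(landmarks, margin=0.2):
    """Return (x0, y0, x1, y1) of a box around both eyes.

    landmarks: (N, 2) array of (x, y) landmark coordinates.
    margin: fraction of the tight box added on every side.
    """
    pts = landmarks[LEFT_EYE + RIGHT_EYE]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    return (int(x0 - mx), int(y0 - my), int(x1 + mx), int(y1 + my))
```

The returned box can be used directly to slice the image array, e.g. `img[y0:y1, x0:x1]`.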
I tested the following three models for eye-contact detection:
- SVM (Support Vector Machine) with raw pixels as input
- SVM with SIFT features as input
- CNN (Convolutional Neural Network) with raw pixels as input
I describe each model below.
As preprocessing, I converted the face images cropped by OpenFace to grayscale and applied histogram equalization.
Then I trained an SVM with the top third of each cropped image as input.
An example is shown below.
Both the grayscale conversion and the use of only the top third serve dimensionality reduction.
Histogram equalization adds robustness to variation in contrast.
Finally, I trained the SVM on the processed vectors.
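The preprocessing for model 1 can be sketched as follows. This is a minimal numpy/scikit-learn sketch, not the original code: the 96x96 crop size, the numpy-only equalization, the alternating toy labels, and the RBF kernel are all my assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def equalize_hist(gray):
    """Histogram equalization on a uint8 grayscale image (numpy-only sketch)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255).astype(np.uint8)
    return lut[gray]

def to_feature(face_gray):
    """Equalize, keep the top third (the eye region), flatten to a vector."""
    eq = equalize_hist(face_gray)
    top = eq[: eq.shape[0] // 3]
    return top.ravel().astype(np.float32) / 255.0

# Toy usage: random arrays stand in for the cropped face images.
rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(20, 96, 96), dtype=np.uint8)
labels = np.tile([0, 1], 10)  # toy alternating labels
X = np.stack([to_feature(f) for f in faces])
clf = SVC(kernel="rbf").fit(X, labels)
```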
SIFT features are often used for image recognition.
In my case, since I knew the coordinates of the face landmarks around the eyes, I computed SIFT descriptors at those points.
I trained the SVM on the features obtained this way.
The input is the top third of the processed face image, the same as in model 1 (SVM with raw pixels).
The network architecture is shown below.
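A small CNN of this kind could be defined in Keras like this. This is a hypothetical sketch rather than my exact architecture: the layer counts, filter sizes, and the 32x96 input shape (top third of a 96x96 grayscale crop) are all assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Input: top third of a 96x96 equalized grayscale crop (shape is an assumption).
model = models.Sequential([
    layers.Input(shape=(32, 96, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # P(looking at camera)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```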
The dataset I used for the experiments contains 1,300 positive images (looking at the camera) and 600 negative images (not looking at the camera).
I evaluated with a hold-out split, dividing the images in a ratio of 9 to 1.
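A stratified 9:1 hold-out split like this can be done with scikit-learn's `train_test_split`; stratifying keeps the 1,300/600 class balance in both splits. The placeholder features and random seed here are only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in labels matching the class counts in the text.
y = np.array([1] * 1300 + [0] * 600)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)
```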
| Model | Accuracy [%] |
|---|---|
| 1. SVM (input: raw pixels) | 81.3 |
| 2. SVM (input: SIFT features) | 82.3 |
| 3. CNN | 88.7 |
The CNN outperformed the SVMs while still being easy to deploy.
I also applied the trained models to video captured by a web camera.
A blue box means the model predicts the person is looking at the camera; a red box means otherwise.
My impression was that the CNN model is more stable than the others.
As I expected, prediction seems to be difficult when the person is not facing the camera directly.
This is probably because we did not have a sufficient number of such images for training.
In addition, when applying these models to video, we should use techniques such as smoothing or sequential modeling.
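For example, one simple form of smoothing is a majority vote over a sliding window of per-frame predictions. This is a sketch; the window length is a tuning assumption.

```python
import numpy as np

def smooth_predictions(preds, window=5):
    """Majority vote over a trailing window of per-frame 0/1 predictions.

    Suppresses single-frame flicker in video output; the window size
    is a tuning assumption.
    """
    preds = np.asarray(preds)
    out = np.empty_like(preds)
    for i in range(len(preds)):
        lo = max(0, i - window + 1)
        out[i] = 1 if preds[lo : i + 1].mean() >= 0.5 else 0
    return out
```

A longer window gives more stable boxes at the cost of slower reaction when the person actually looks away.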
I'm going to clean up my code and upload it to GitHub.
Thank you for reading!