Showing 25 changed files with 2,034 additions and 0 deletions.
@@ -0,0 +1,267 @@
# Prerequisites

This code is written in [Python 3](https://www.python.org/downloads/) and requires [TensorFlow 1.7+](https://www.tensorflow.org/install/). You also need to install a few more packages to process the data. [Anaconda](https://www.anaconda.com/download/) is recommended for managing the Python environment.

First, clone our repository.

```bash
$ git clone https://github.com/PreferredAI/Tutorials.git
$ cd Tutorials/1806-image-classification
```

Run the command below to install the required packages.

```bash
$ pip3 install -r requirements.txt
```

# Tutorial #1 - Facial Expression Recognition

## Dataset

In this tutorial, we provide a dataset of 48x48 pixel grayscale images of faces. The task is to classify each face, based on the emotion shown in the facial expression, into one of two categories (0=Sad, 1=Happy).

We provide a script to download the dataset.

```bash
$ cd face-emotion
$ chmod +x download.sh
$ ./download.sh
```

The data is already split into training and test sets, with the statistics shown in the table below.

| Class       | Training (# images) | Test (# images) |
| :---------: | :-----------------: | :-------------: |
| happy (1)   | 4347                | 483             |
| sad (0)     | 4347                | 483             |
| **Total**   | 8694                | 966             |

## Model

### Multilayer Perceptron (MLP)

The MLP architecture can be viewed as:

| Layer     | Dim/Kernel | Parameters                    |
| :-------: | :--------: | ----------------------------: |
| fc1       | 512        | 48 x 48 x 512 + 1 = 1,179,649 |
| fc2       | 512        | 512 x 512 + 1 = 262,145       |
| output    | 2          | 512 x 2 + 1 = 1025            |
| **Total** |            | 1,442,819                     |

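The arithmetic in the MLP table can be reproduced with a few lines of Python. This is a sketch, not code from the repository: `dense_params` and `total_params` are hypothetical helpers. The table adds a single bias term per layer. A fully connected layer is more commonly counted with one bias per output unit, so that figure is shown for comparison, assuming the layers are standard dense layers.

```python
def dense_params(n_in, n_out, bias_terms):
    """Weights of a fully connected layer plus the given number of bias terms."""
    return n_in * n_out + bias_terms


# (name, inputs, outputs) for the MLP described in the table
MLP_LAYERS = [
    ("fc1", 48 * 48, 512),
    ("fc2", 512, 512),
    ("output", 512, 2),
]


def total_params(layers, per_unit_bias):
    """Sum the parameters of all layers under one of two bias conventions."""
    total = 0
    for _name, n_in, n_out in layers:
        # The table uses one bias term per layer; the usual convention
        # (an assumption about the model here) is one bias per output unit.
        bias_terms = n_out if per_unit_bias else 1
        total += dense_params(n_in, n_out, bias_terms)
    return total


table_total = total_params(MLP_LAYERS, per_unit_bias=False)
per_unit_total = total_params(MLP_LAYERS, per_unit_bias=True)
print(table_total)     # 1442819, the total given in the table
print(per_unit_total)  # 1443842
```

Either way, the first layer dominates the count because it connects all 2,304 input pixels to 512 units.
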
### Shallow CNN

The shallow CNN architecture can be viewed as:

| Layer     | Dim/Kernel | Parameters                         |
| :-------: | :--------: | ---------------------------------: |
| conv      | 5 x 5      | 5 x 5 x 32 + 1 = 801               |
| pool      |            | 0                                  |
| fc        | 512        | 24 x 24 x 32 x 512 + 1 = 9,437,185 |
| output    | 2          | 512 x 2 + 1 = 1025                 |
| **Total** |            | 9,439,011                          |

### Deep CNN

The deep CNN architecture can be viewed as:

| Layer     | Dim/Kernel | Parameters                        |
| :-------: | :--------: | --------------------------------: |
| conv1     | 5 x 5      | 5 x 5 x 32 + 1 = 801              |
| pool1     |            | 0                                 |
| conv2     | 3 x 3      | 3 x 3 x 64 + 1 = 577              |
| pool2     |            | 0                                 |
| conv3     | 3 x 3      | 3 x 3 x 128 + 1 = 1153            |
| conv4     | 3 x 3      | 3 x 3 x 128 + 1 = 1153            |
| pool3     |            | 0                                 |
| fc        | 512        | 6 x 6 x 128 x 512 + 1 = 2,359,297 |
| output    | 2          | 512 x 2 + 1 = 1025                |
| **Total** |            | 2,364,006                         |

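The `24 x 24 x 32` and `6 x 6 x 128` factors in the `fc` rows are the flattened feature-map sizes that reach the fully connected layer. The sketch below traces those shapes from the 48x48 input. It assumes stride-1 convolutions with "same" padding and 2x2 pooling with stride 2; the tables do not state these settings, but they are consistent with the sizes shown. `trace_shapes` is a hypothetical helper, not part of the repository.

```python
def trace_shapes(input_size, layers):
    """Follow the (height, width, channels) shape through conv and pool layers.

    Assumes stride-1 'same' convolutions (spatial size unchanged) and
    2x2 pooling with stride 2 (spatial size halved).
    """
    size, channels = input_size, 1  # square grayscale input
    shapes = []
    for kind, out_channels in layers:
        if kind == "conv":
            channels = out_channels
        elif kind == "pool":
            size //= 2
        shapes.append((size, size, channels))
    return shapes


# Filter counts are taken from the parameter formulas in the tables.
SHALLOW = [("conv", 32), ("pool", None)]
DEEP = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None),
        ("conv", 128), ("conv", 128), ("pool", None)]

shallow_out = trace_shapes(48, SHALLOW)[-1]
deep_out = trace_shapes(48, DEEP)[-1]
print(shallow_out)  # (24, 24, 32)
print(deep_out)     # (6, 6, 128)
```

Under these assumptions the deep model pools three times (48, 24, 12, 6), so its `fc` layer sees far fewer inputs than the shallow model's, which is why it has fewer parameters overall despite having more layers.
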
## Training and Evaluation

Train a model, where `[model_name]` is `mlp`, `shallow`, or `deep`:

```bash
$ cd src
$ python3 train.py --model [model_name]
```

```
optional arguments:
  -h, --help            show this help message and exit
  --data_dir DATA_DIR
                        Path to data folder (default: ../data)
  --model MODEL
                        Type of CNN model (shallow or deep)
  --checkpoint_dir CHECKPOINT_DIR
                        Path to checkpoint folder (default: ../checkpoint)
  --num_classes NUM_CLASSES
                        Number of label classes (default: 2)
  --num_checkpoints NUM_CHECKPOINTS
                        Number of checkpoints to store (default: 1)
  --log_dir LOG_DIR
                        Path to data folder (default: ../log)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 10)
  --batch_size BATCH_SIZE
                        Batch Size (default: 32)
  --dropout_rate DROPOUT_RATE
                        Probability of dropping neurons (default: 0.25)
  --learning_rate LEARNING_RATE
                        Learning rate (default: 0.001)
  --allow_soft_placement ALLOW_SOFT_PLACEMENT
                        Allow device soft device placement
```

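With the default batch size of 32 and the split sizes from the dataset table, the number of batches per epoch follows from a ceiling division. This matches the `272/272` and `31/31` counters in the progress bars of the training logs. The helper name `batches_per_epoch` is only illustrative.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to cover the dataset once (last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


train_batches = batches_per_epoch(8694, 32)
test_batches = batches_per_epoch(966, 32)
print(train_batches, test_batches)  # 272 31

# Size of the final, partially filled training batch
last_train_batch = 8694 - (train_batches - 1) * 32
print(last_train_batch)  # 22
```
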
## Results

### Multilayer Perceptron

```bash
$ python3 train.py --model mlp
```

```text
Epoch number: 1
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:10<00:00, 26.85it/s, loss=0.608]
train_loss = 0.6452, train_acc = 63.61 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 37.88it/s, loss=0.536]
test_loss = 0.5819, test_acc = 68.74 %
...
...
...
Epoch number: 10
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:09<00:00, 28.55it/s, loss=0.53]
train_loss = 0.4391, train_acc = 79.03 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 35.52it/s, loss=0.18]
test_loss = 0.5077, test_acc = 74.53 %
Saved model checkpoint to ..\checkpoints\mlp\epoch_10
Best accuracy = 74.53 %
```

### Shallow CNN

```bash
$ python3 train.py --model shallow
```

```text
Epoch number: 1
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:33<00:00, 8.11it/s, loss=0.594]
train_loss = 0.6214, train_acc = 65.38 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:01<00:00, 22.09it/s, loss=0.396]
test_loss = 0.5418, test_acc = 73.19 %
...
...
...
Epoch number: 10
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:33<00:00, 8.15it/s, loss=0.64]
train_loss = 0.2343, train_acc = 90.40 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:01<00:00, 20.50it/s, loss=0.108]
test_loss = 0.4628, test_acc = 78.67 %
Best accuracy = 78.88 %
```

### Deep CNN

```bash
$ python3 train.py --model deep
```

```text
Epoch number: 1
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [01:00<00:00, 4.47it/s, loss=0.649]
train_loss = 0.6766, train_acc = 56.68 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:02<00:00, 13.47it/s, loss=0.639]
test_loss = 0.6473, test_acc = 62.94 %
...
...
...
Epoch number: 10
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [01:01<00:00, 4.41it/s, loss=0.374]
train_loss = 0.3042, train_acc = 86.71 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:02<00:00, 12.41it/s, loss=0.586]
test_loss = 0.3569, test_acc = 84.06 %
Best accuracy = 86.13 %
```

## Test trained models with other images

To test a trained model on other images:

```bash
$ python3 test.py --model [model_name] --data_dir [path_to_image_folder]
```

Some images are already provided in the *test_images* folder for a quick test.

```bash
$ python3 test.py --model deep --data_dir ../test_images
```

# Tutorial #2 - Visual Sentiment Analysis

## Dataset and pre-trained models

Download the dataset and the pre-trained models, then move to the source folder.

```bash
$ cd vs-cnn
$ chmod +x download.sh
$ ./download.sh
$ cd src
```

## Base Model (VS-CNN)

Evaluate the pre-trained model.

```bash
$ python3 eval_base.py --dataset user
```

```text
Boston
Loading data file: ../data/user/val_Boston.txt
Testing: 100%|███████████████████████████████████████████████████████████████████| 19/19 [00:36<00:00, 1.94s/it]
Pointwise Accuracy = 0.544
Pairwise Accuracy = 0.546
Avg. Pointwise Accuracy = 0.544
Avg. Pointwise Accuracy = 0.546
```

## Factor Model (uVS-CNN)

Evaluate the pre-trained model with trained factor weights.

```bash
$ python3 eval_factor.py --dataset user
```

```text
Boston
Loading data file: ../data/user/val_Boston.txt
Testing: 100%|███████████████████████████████████████████████████████████████████| 1194/1194 [00:48<00:00, 24.41it/s]
Pointwise Accuracy = 0.664
Pairwise Accuracy = 0.720
Avg. Pointwise Accuracy = 0.664
Avg. Pointwise Accuracy = 0.720
```
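
The evaluation logs report a pointwise and a pairwise accuracy without defining them. The sketch below shows one common reading of the two terms, and it is an assumption rather than the definition used by `eval_base.py` or `eval_factor.py`: pointwise accuracy is ordinary classification accuracy, and pairwise accuracy is the fraction of (positive, negative) image pairs in which the positive image receives the higher score. The function names, the 0.5 threshold, and the toy scores are all hypothetical.

```python
from itertools import product


def pointwise_accuracy(scores, labels, threshold=0.5):
    """Fraction of images whose thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def pairwise_accuracy(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = list(product(pos, neg))
    return sum(p > n for p, n in pairs) / len(pairs)


# Toy scores for the positive class and their true labels (illustrative only)
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(pointwise_accuracy(scores, labels))  # 0.5
print(pairwise_accuracy(scores, labels))   # 0.75
```

Under this reading, a model can rank images well (high pairwise accuracy) even when its scores are poorly calibrated around the decision threshold.
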
@@ -0,0 +1,8 @@
data/
log/
checkpoints/

src/.idea/
src/__pycache__/
src/data_prepare.py
src/*.pyc
@@ -0,0 +1,17 @@
#!/bin/bash

if [ -d "data" ]; then
    echo "'data' folder already exists!"
    echo "You need to remove it before downloading again."
else
    if [ ! -f 'data.tar.gz' ]; then
        echo 'Data downloading ...'
        echo
        curl -L 'https://static.preferred.ai/tutorial/face-emotion/data.tar.gz' -o data.tar.gz
    fi

    echo "Data extracting ..."
    tar -zxf data.tar.gz
    rm data.tar.gz
    echo "Data is ready!"
fi
87 changes: 87 additions & 0 deletions
1806-image-classification/face-emotion/src/data_generator.py
@@ -0,0 +1,87 @@
import tensorflow as tf
import pandas as pd
import numpy as np

class DataGenerator(object):
    def __init__(self, train_file, test_file, batch_size, num_threads, buffer_size=10000, train_shuffle=True):
        self.batch_size = batch_size
        self.num_threads = num_threads
        self.buffer_size = buffer_size
        self.train_shuffle = train_shuffle

        # read datasets from csv files
        self.train_img_paths, self.train_labels = self._read_csv_file(train_file)
        self.test_img_paths, self.test_labels = self._read_csv_file(test_file)

        # number of batches per epoch
        self.train_batches_per_epoch = int(np.ceil(len(self.train_labels) / batch_size))
        self.test_batches_per_epoch = int(np.ceil(len(self.test_labels) / batch_size))

        # build datasets
        self._build_train_set()
        self._build_test_set()

        # create a reinitializable iterator given the dataset structure
        self.iterator = tf.data.Iterator.from_structure(self.train_set.output_types,
                                                        self.train_set.output_shapes)
        self.train_init_opt = self.iterator.make_initializer(self.train_set)
        self.test_init_opt = self.iterator.make_initializer(self.test_set)
        self.next = self.iterator.get_next()

    def load_train_set(self, session):
        session.run(self.train_init_opt)

    def load_test_set(self, session):
        session.run(self.test_init_opt)

    def get_next(self, session):
        return session.run(self.next)

    def _read_csv_file(self, data_file):
        """Read image paths (column 0) and labels (column 1) from a headerless CSV file."""
        df = pd.read_csv(data_file, header=None)
        img_paths = df[0].values
        labels = df[1].values
        return img_paths, labels

    def _build_data_set(self, img_paths, labels, map_fn, shuffle=False):
        img_paths = tf.convert_to_tensor(img_paths, dtype=tf.string)
        labels = tf.convert_to_tensor(labels, dtype=tf.int32)
        data = tf.data.Dataset.from_tensor_slices((img_paths, labels))
        if shuffle:
            data = data.shuffle(buffer_size=self.buffer_size)
        data = data.map(map_fn, num_parallel_calls=self.num_threads)
        data = data.batch(self.batch_size)
        data = data.prefetch(self.num_threads)
        return data

    def _build_train_set(self):
        self.train_set = self._build_data_set(self.train_img_paths,
                                              self.train_labels,
                                              self._parse_function_train,
                                              self.train_shuffle)

    def _build_test_set(self):
        self.test_set = self._build_data_set(self.test_img_paths,
                                             self.test_labels,
                                             self._parse_function_test)

    def _parse_function_train(self, filename, label):
        # load and preprocess the image
        img_string = tf.read_file(filename)
        img_decoded = tf.cast(tf.image.decode_jpeg(img_string), dtype=tf.float32)
        # Data augmentation (training only): random horizontal flip.
        # The image is then scaled to [0, 1] and centered to [-0.5, 0.5],
        # exactly as in the test pipeline.
        img_flipped = tf.image.random_flip_left_right(img_decoded)
        img_scaled = tf.divide(img_flipped, tf.constant(255.0, dtype=tf.float32))
        img_centered = tf.subtract(img_scaled, tf.constant(0.5, dtype=tf.float32))
        return img_centered, label

    def _parse_function_test(self, filename, label):
        # load and preprocess the image (no augmentation at test time)
        img_string = tf.read_file(filename)
        img_decoded = tf.cast(tf.image.decode_jpeg(img_string), dtype=tf.float32)
        img_scaled = tf.divide(img_decoded, tf.constant(255.0, dtype=tf.float32))
        img_centered = tf.subtract(img_scaled, tf.constant(0.5, dtype=tf.float32))
        return img_centered, label
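
The two parse functions in `data_generator.py` apply the same pixel arithmetic: divide by 255, then subtract 0.5, with a random left-right flip added for training. The plain-Python sketch below mirrors that arithmetic on a tiny nested list so the value range is easy to check without TensorFlow. `flip_left_right` and `scale_and_center` are hypothetical stand-ins, and the flip here is deterministic rather than random.

```python
def flip_left_right(image):
    """Mirror a 2-D image (list of rows) horizontally."""
    return [list(reversed(row)) for row in image]


def scale_and_center(image):
    """Map 8-bit pixel values from [0, 255] to [-0.5, 0.5]."""
    return [[pixel / 255.0 - 0.5 for pixel in row] for row in image]


# A 2x3 toy "image" with the darkest and brightest 8-bit values
image = [[0, 51, 255],
         [255, 204, 0]]

flipped = flip_left_right(image)
centered = scale_and_center(image)
print(flipped[0])                        # [255, 51, 0]
print(centered[0][0], centered[0][2])    # -0.5 0.5
```

Centering the inputs around zero is a common preprocessing choice for networks trained with gradient descent; the flip is label-preserving for this task because a mirrored face shows the same expression.
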