Add image classification tutorial
tqtg committed Jun 1, 2018
1 parent a7884e0 commit 33860fe
Showing 25 changed files with 2,034 additions and 0 deletions.
267 changes: 267 additions & 0 deletions 1806-image-classification/README.md
@@ -0,0 +1,267 @@
# Prerequisites

This code is written in [Python 3](https://www.python.org/downloads/) and requires [TensorFlow 1.7+](https://www.tensorflow.org/install/). A few additional packages are needed for data processing; [Anaconda](https://www.anaconda.com/download/) is recommended for managing the Python environment.

First, you need to clone our repository.

```bash
$ git clone https://github.com/PreferredAI/Tutorials.git
$ cd Tutorials/1806-image-classification
```

Run the command below to install the required packages.

```bash
$ pip3 install -r requirements.txt
```

# Tutorial #1 - Facial Expression Recognition

## Dataset

In this tutorial, we provide data consisting of 48x48 pixel grayscale images of faces. The task is to classify each face, based on the emotion shown in its facial expression, into one of two categories (0=Sad, 1=Happy).

We provide a script to download the dataset.

```bash
$ cd face-emotion
$ chmod +x download.sh
$ ./download.sh
```

The data is already split into training and test sets, with the statistics shown in the table below.

| Class | Training (# images) | Test (# images) |
| :---------: | :-----------------: | :-------------: |
| happy (1) | 4347 | 483 |
| sad (0) | 4347 | 483 |
| **Total** | 8694 | 966 |

## Model

### Multilayer Perceptron (MLP)

The MLP architecture can be viewed as:

| Layer | Dim/Kernel | Parameters |
| :-------: | :--------: | --------------------------: |
| fc1 | 512 | 48 x 48 x 512+1 = 1,179,649 |
| fc2 | 512 | 512 x 512+1 = 262,145 |
| output | 2 | 512 x 2+1 = 1025 |
| **Total** | | 1,442,819 |

### Shallow CNN

The shallow CNN architecture can be viewed as:

| Layer | Dim/Kernel | Parameters |
| :-------: | :--------: | -------------------------------: |
| conv | 5 x 5 | 5 x 5 x 32+1 = 801 |
| pool | | 0 |
| fc | 512 | 24 x 24 x 32 x 512+1 = 9,437,185 |
| output | 2 | 512 x 2+1 = 1025 |
| **Total** | | 9,439,011 |


### Deep CNN

The deep CNN architecture can be viewed as:

| Layer | Dim/Kernel | Parameters |
| :-------: | :--------: | --------------------------------: |
| conv1 | 5 x 5 | 5 x 5 x 32 + 1 = 801 |
| pool1 | | 0 |
| conv2 | 3 x 3 | 3 x 3 x 64 + 1 = 577 |
| pool2 | | 0 |
| conv3 | 3 x 3 | 3 x 3 x 128 + 1 = 1153 |
| conv4 | 3 x 3 | 3 x 3 x 128 + 1 = 1153 |
| pool3 | | 0 |
| fc | 512 | 6 x 6 x 128 x 512 + 1 = 2,359,297 |
| output | 2 | 512 x 2 + 1 = 1025 |
| **Total** | | 2,364,006 |
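
The counts in the tables above use one bias term per layer and, for the convolutional layers, do not multiply by the number of input channels. As a cross-check, the sketch below applies the more common convention (one bias per output unit, kernels spanning all input channels). It assumes every layer has a bias and that the first layer sees a single grayscale channel; the helper names are illustrative and are not part of this repository.

```python
def dense_params(n_in, n_out, bias=True):
    """Weights of a fully connected layer plus one bias per output unit."""
    return n_in * n_out + (n_out if bias else 0)


def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """One k_h x k_w x c_in kernel per output channel, plus one bias each."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


# MLP on a flattened 48 x 48 grayscale image (layer sizes from the MLP table).
mlp_total = (
    dense_params(48 * 48, 512)   # fc1
    + dense_params(512, 512)     # fc2
    + dense_params(512, 2)       # output
)
print(mlp_total)  # 1443842

# First convolution: 5 x 5 kernels, 1 input channel (assumed), 32 filters.
print(conv_params(5, 5, 1, 32))  # 832
```

Under this convention the MLP comes to 1,443,842 parameters rather than 1,442,819; the difference of 1,023 is only the extra bias terms.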


## Training and Evaluation

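Train a model, where `[model_name]` is one of `mlp`, `shallow`, or `deep` (the three models used in the Results section):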

```bash
$ cd src
$ python3 train.py --model [model_name]
```

```
optional arguments:
  -h, --help            show this help message and exit
  --data_dir DATA_DIR
                        Path to data folder (default: ../data)
  --model MODEL
                        Type of CNN model (shallow or deep)
  --checkpoint_dir CHECKPOINT_DIR
                        Path to checkpoint folder (default: ../checkpoint)
  --num_classes NUM_CLASSES
                        Number of label classes (default: 2)
  --num_checkpoints NUM_CHECKPOINTS
                        Number of checkpoints to store (default: 1)
  --log_dir LOG_DIR
                        Path to data folder (default: ../log)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 10)
  --batch_size BATCH_SIZE
                        Batch Size (default: 32)
  --dropout_rate DROPOUT_RATE
                        Probability of dropping neurons (default: 0.25)
  --learning_rate LEARNING_RATE
                        Learning rate (default: 0.001)
  --allow_soft_placement ALLOW_SOFT_PLACEMENT
                        Allow device soft device placement
```
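
Options like these are commonly declared with Python's standard `argparse` module. The sketch below reproduces a handful of the flags and defaults listed above; it is not the actual `train.py`, and the `build_parser` name, the `choices` list, and the default for `--model` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Declare a few of the training options shown in the help text."""
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("--data_dir", type=str, default="../data",
                        help="Path to data folder")
    # Assumed default and choices; the help text does not state them.
    parser.add_argument("--model", type=str, default="deep",
                        choices=["mlp", "shallow", "deep"],
                        help="Type of model")
    parser.add_argument("--num_epochs", type=int, default=10,
                        help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Batch size")
    parser.add_argument("--dropout_rate", type=float, default=0.25,
                        help="Probability of dropping neurons")
    parser.add_argument("--learning_rate", type=float, default=0.001,
                        help="Learning rate")
    return parser


# Equivalent to: python3 train.py --model shallow --num_epochs 20
args = build_parser().parse_args(["--model", "shallow", "--num_epochs", "20"])
print(args.model, args.num_epochs, args.batch_size)
```

Any flag that is not passed keeps its default, so `--model` is the only option needed for the runs in the Results section.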


## Results

### Multilayer Perceptron

```bash
$ python3 train.py --model mlp
```

```text
Epoch number: 1
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:10<00:00, 26.85it/s, loss=0.608]
train_loss = 0.6452, train_acc = 63.61 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 37.88it/s, loss=0.536]
test_loss = 0.5819, test_acc = 68.74 %
...
...
...
Epoch number: 10
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:09<00:00, 28.55it/s, loss=0.53]
train_loss = 0.4391, train_acc = 79.03 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 35.52it/s, loss=0.18]
test_loss = 0.5077, test_acc = 74.53 %
Saved model checkpoint to ..\checkpoints\mlp\epoch_10
Best accuracy = 74.53 %
```
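
The totals `272` and `31` in the progress bars are consistent with the number of batches per epoch at the default batch size of 32: the data generator in this commit computes it as the ceiling of the dataset size divided by the batch size. A small sketch using the dataset sizes from the table in the Dataset section (the function name is illustrative):

```python
import math


def batches_per_epoch(num_examples, batch_size):
    """Number of batches needed to cover the dataset once; the last may be partial."""
    return math.ceil(num_examples / batch_size)


print(batches_per_epoch(8694, 32))  # 272 training batches
print(batches_per_epoch(966, 32))   # 31 test batches
```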


### Shallow CNN

```bash
$ python3 train.py --model shallow
```

```text
Epoch number: 1
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:33<00:00, 8.11it/s, loss=0.594]
train_loss = 0.6214, train_acc = 65.38 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:01<00:00, 22.09it/s, loss=0.396]
test_loss = 0.5418, test_acc = 73.19 %
...
...
...
Epoch number: 10
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [00:33<00:00, 8.15it/s, loss=0.64]
train_loss = 0.2343, train_acc = 90.40 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:01<00:00, 20.50it/s, loss=0.108]
test_loss = 0.4628, test_acc = 78.67 %
Best accuracy = 78.88 %
```

### Deep CNN

```bash
$ python3 train.py --model deep
```

```text
Epoch number: 1
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [01:00<00:00, 4.47it/s, loss=0.649]
train_loss = 0.6766, train_acc = 56.68 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:02<00:00, 13.47it/s, loss=0.639]
test_loss = 0.6473, test_acc = 62.94 %
...
...
...
Epoch number: 10
Training: 100%|███████████████████████████████████████████████████████████████████| 272/272 [01:01<00:00, 4.41it/s, loss=0.374]
train_loss = 0.3042, train_acc = 86.71 %
Testing: 100%|███████████████████████████████████████████████████████████████████| 31/31 [00:02<00:00, 12.41it/s, loss=0.586]
test_loss = 0.3569, test_acc = 84.06 %
Best accuracy = 86.13 %
```

## Test trained models with other images

To test a trained model on other images:

```bash
$ python3 test.py --model [model_name] --data_dir [path_to_image_folder]
```

Some images are already provided in the *test_images* folder for a quick test.

```bash
$ python3 test.py --model deep --data_dir ../test_images
```


# Tutorial #2 - Visual Sentiment Analysis


## Dataset and pre-trained models

```bash
$ cd vs-cnn
$ chmod +x download.sh
$ ./download.sh
$ cd src
```

## Base Model (VS-CNN)

Evaluate the pre-trained model.

```bash
$ python3 eval_base.py --dataset user
```

```text
Boston
Loading data file: ../data/user/val_Boston.txt
Testing: 100%|███████████████████████████████████████████████████████████████████| 19/19 [00:36<00:00, 1.94s/it]
Pointwise Accuracy = 0.544
Pairwise Accuracy = 0.546
Avg. Pointwise Accuracy = 0.544
Avg. Pointwise Accuracy = 0.546
```


## Factor Model (uVS-CNN)

Evaluate the pre-trained model with the trained factor weights.

```bash
$ python3 eval_factor.py --dataset user
```

```text
Boston
Loading data file: ../data/user/val_Boston.txt
Testing: 100%|███████████████████████████████████████████████████████████████████| 1194/1194 [00:48<00:00, 24.41it/s]
Pointwise Accuracy = 0.664
Pairwise Accuracy = 0.720
Avg. Pointwise Accuracy = 0.664
Avg. Pointwise Accuracy = 0.720
```
8 changes: 8 additions & 0 deletions 1806-image-classification/face-emotion/.gitignore
@@ -0,0 +1,8 @@
data/
log/
checkpoints/

src/.idea/
src/__pycache__/
src/data_prepare.py
src/*.pyc
17 changes: 17 additions & 0 deletions 1806-image-classification/face-emotion/download.sh
@@ -0,0 +1,17 @@
#!/bin/bash

if [ -d "data" ]; then
    echo "'data' folder already exists!"
    echo "You need to remove it before downloading again."
else
    if [ ! -f 'data.tar.gz' ]; then
        echo 'Data downloading ...'
        echo
        curl -L 'https://static.preferred.ai/tutorial/face-emotion/data.tar.gz' -o data.tar.gz
    fi

    echo "Data extracting ..."
    tar -zxf data.tar.gz
    rm data.tar.gz
    echo "Data is ready!"
fi
87 changes: 87 additions & 0 deletions 1806-image-classification/face-emotion/src/data_generator.py
@@ -0,0 +1,87 @@
import tensorflow as tf
import pandas as pd
import numpy as np

class DataGenerator(object):
    def __init__(self, train_file, test_file, batch_size, num_threads, buffer_size=10000, train_shuffle=True):
        self.batch_size = batch_size
        self.num_threads = num_threads
        self.buffer_size = buffer_size
        self.train_shuffle = train_shuffle

        # read datasets from csv files
        self.train_img_paths, self.train_labels = self._read_csv_file(train_file)
        self.test_img_paths, self.test_labels = self._read_csv_file(test_file)

        # number of batches per epoch
        self.train_batches_per_epoch = int(np.ceil(len(self.train_labels) / batch_size))
        self.test_batches_per_epoch = int(np.ceil(len(self.test_labels) / batch_size))

        # build datasets
        self._build_train_set()
        self._build_test_set()

        # create a reinitializable iterator given the dataset structure
        self.iterator = tf.data.Iterator.from_structure(self.train_set.output_types,
                                                        self.train_set.output_shapes)
        self.train_init_opt = self.iterator.make_initializer(self.train_set)
        self.test_init_opt = self.iterator.make_initializer(self.test_set)
        self.next = self.iterator.get_next()

    def load_train_set(self, session):
        # point the shared iterator at the training set
        session.run(self.train_init_opt)

    def load_test_set(self, session):
        # point the shared iterator at the test set
        session.run(self.test_init_opt)

    def get_next(self, session):
        return session.run(self.next)

    def _read_csv_file(self, data_file):
        """Read image paths (column 0) and labels (column 1) from a header-less CSV file."""
        df = pd.read_csv(data_file, header=None)
        img_paths = df[0].values
        labels = df[1].values
        return img_paths, labels

    def _build_data_set(self, img_paths, labels, map_fn, shuffle=False):
        img_paths = tf.convert_to_tensor(img_paths, dtype=tf.string)
        labels = tf.convert_to_tensor(labels, dtype=tf.int32)
        data = tf.data.Dataset.from_tensor_slices((img_paths, labels))
        if shuffle:
            data = data.shuffle(buffer_size=self.buffer_size)
        data = data.map(map_fn, num_parallel_calls=self.num_threads)
        data = data.batch(self.batch_size)
        data = data.prefetch(self.num_threads)
        return data

    def _build_train_set(self):
        self.train_set = self._build_data_set(self.train_img_paths,
                                              self.train_labels,
                                              self._parse_function_train,
                                              self.train_shuffle)

    def _build_test_set(self):
        self.test_set = self._build_data_set(self.test_img_paths,
                                             self.test_labels,
                                             self._parse_function_test)

    def _parse_function_train(self, filename, label):
        # load and preprocess the image
        img_string = tf.read_file(filename)
        img_decoded = tf.cast(tf.image.decode_jpeg(img_string), dtype=tf.float32)
        # data augmentation goes here
        img_flipped = tf.image.random_flip_left_right(img_decoded)
        # scale pixel values to [0, 1], then center them to [-0.5, 0.5]
        img_scaled = tf.divide(img_flipped, tf.constant(255.0, dtype=tf.float32))
        img_centered = tf.subtract(img_scaled, tf.constant(0.5, dtype=tf.float32))
        return img_centered, label

    def _parse_function_test(self, filename, label):
        # load and preprocess the image (no augmentation at test time)
        img_string = tf.read_file(filename)
        img_decoded = tf.cast(tf.image.decode_jpeg(img_string), dtype=tf.float32)
        img_scaled = tf.divide(img_decoded, tf.constant(255.0, dtype=tf.float32))
        img_centered = tf.subtract(img_scaled, tf.constant(0.5, dtype=tf.float32))
        return img_centered, label
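
The pixel arithmetic in the two parse functions above maps 8-bit values from [0, 255] to [-0.5, 0.5]. The following is a pure-Python sketch of that arithmetic on a hypothetical 2 x 2 image, without TensorFlow; the function name and the explicit `flip` argument (standing in for the random horizontal flip used at training time) are illustrative only.

```python
def scale_and_center(image, flip=False):
    """Map 8-bit pixel values from [0, 255] to [-0.5, 0.5].

    `image` is a list of rows; `flip=True` mirrors each row left-to-right.
    """
    rows = [row[::-1] for row in image] if flip else image
    return [[pixel / 255.0 - 0.5 for pixel in row] for row in rows]


tiny = [[0, 255], [51, 204]]
print(scale_and_center(tiny))             # first row becomes [-0.5, 0.5]
print(scale_and_center(tiny, flip=True))  # first row becomes [0.5, -0.5]
```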
