# Keras Demo - Food Photo Classification

In this demo, we are going to build a convolutional neural network using Keras and train it to classify four types of food: Ais Kacang, Ang Ku Kueh, Apam Balik, and Asam Laksa.

<table>
<tr>
<td><img src="datasets/train/AisKacang/001.jpg" width=200 height=200 /></td>
<td><img src="datasets/train/AngKuKueh/050.jpg" width=200 height=200 /></td>
<td><img src="datasets/train/ApamBalik/015.jpg" width=200 height=200 /></td>
<td><img src="datasets/train/AsamLaksa/012.jpg" width=200 height=200 /></td>
</tr>
</table>

The dataset photos are provided by [Jack](https://github.com/jackgoh/) from the FoodTag project.

**!** The full dataset will not be made available on GitHub. You can, however, substitute another dataset such as [Dogs & Cats](https://www.kaggle.com/c/dogs-vs-cats).

References (citations are a must):
1. [Where I copied & pasted the sample code from](https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d)
2. [Goh Cheng Kee](https://github.com/jackgoh), Wong Chin Yee, [John See](https://john-see.github.io) (2017). FoodTag: Automatic Classification of Food Photos Using Deep Learning.

In [1]:
train_dir = 'datasets/train'
test_dir  = 'datasets/test'
nb_train_samples = 400
nb_test_samples = 400
img_width = 227
img_height = 227

## Design the neural network

Here, we build the convolutional neural network that will be used to classify the images. This is what a basic convnet can look like:

![LeNet](img/mylenet.png)
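
To make the building blocks concrete, here is a minimal NumPy sketch of the two operations that follow each convolution in a convnet like this one: ReLU and 2x2 max pooling. The helper names `relu` and `max_pool_2x2` and the toy feature map are made up for illustration; this is not how Keras implements these layers internally.

```python
import numpy as np

def relu(x):
    """Replace every negative value with 0; positive values pass through unchanged."""
    return np.maximum(x, 0)

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2          # drop an odd trailing row/column
    windows = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3))

# Toy 4x4 feature map (made-up numbers, not taken from the food dataset).
fmap = np.array([[ 1, -2,  3,  0],
                 [-1,  5, -4,  2],
                 [ 0, -3, -1, -6],
                 [ 7,  2, -2, -5]])

activated = relu(fmap)
pooled = max_pool_2x2(activated)

print(pooled)                    # [[5 3]
                                 #  [7 0]]
print(pooled.size / fmap.size)   # 0.25, i.e. 75% fewer values
```

Each 2x2 window collapses to a single number, which is why one pooling step keeps only a quarter of the feature map.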


In [2]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D

model = Sequential()

# Values such as nb_filter, nb_row and nb_col are mostly found by trial and error.
# There is no definite answer to what is the correct one, just keep trying.
# input_shape[0] = 3 because of the RGB channels (this assumes channels-first ordering)
model.add(Convolution2D(32, 5, 5, input_shape=(3, img_width, img_height)))

# ReLU sets every negative output value from the CONV layer to 0.
# It helps to prevent the vanishing gradient problem (failure to learn).
model.add(Activation('relu'))

# For every 2x2 "pixels" in the output feature map, pick the highest value,
# i.e. the most activated one.
# Most importantly, 2x2 pooling reduces the feature map size by 75%
model.add(MaxPooling2D(pool_size=(2,2)))

# Repeat a few times before going into fully connected (dense) layers
# You are free to comment them out or add more conv relu pool layers
# See what it does to the accuracy
model.add(Convolution2D(64, 5, 5))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Convolution2D(128, 5, 5))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Convolution2D(256, 5, 5))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

# We can do as many convolutions and poolings as we like, but it's time to actually classify.
# Again, you can add as many dense layers as you like, and use different activations and dropout %.
# It's all trial and error, and this demo code may not be optimal after all!
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

# Final layer: 4 nodes, one per class. Softmax makes sure the outputs of all 4 nodes sum to 1.
# That means each output node gives a % score for one food.
model.add(Dense(4))
model.add(Activation('softmax'))

# The architecture is there. But how to measure how close we are to perfection?
# Measure the difference from prediction to ground truth - the loss
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Using TensorFlow backend.
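
As a sanity check on the architecture above, the feature map size after each conv/pool block can be worked out by hand. The sketch below assumes unpadded ("valid") 5x5 convolutions with stride 1 and 2x2 pooling that drops any odd remainder, which is what the layers above do if left at their defaults; the helper names are hypothetical.

```python
def conv_output_size(size, kernel):
    """Output width/height of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Output width/height of non-overlapping pooling (an odd remainder is dropped)."""
    return size // pool

size = 227  # img_width / img_height from the first cell
filters_per_block = [32, 64, 128, 256]

for filters in filters_per_block:
    size = pool_output_size(conv_output_size(size, kernel=5))
    print(f"{filters:>3} filters -> {size}x{size} feature maps")

# Number of values that Flatten() hands to the first Dense layer.
flattened = size * size * filters_per_block[-1]
print("Flattened length:", flattened)   # 10 * 10 * 256 = 25600
```

Under these assumptions the side length goes 227 -> 111 -> 53 -> 24 -> 10, so removing or adding a block changes the size of the first dense layer considerably.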


## Prepare image data generator

The inputs to neural networks are tensors - think matrices of numbers. For images, the tensors are rows and columns of pixels, with each pixel quantised into red, green, and blue channels. Hence the 3 input channels of the convolutional neural network, along with its fixed width and height.
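
A minimal NumPy sketch of what such an image tensor looks like. The random pixels are a stand-in for a decoded photo (an assumption, so the example runs without the dataset), and the division by 255 mirrors the 1/255 scaling used by the generator in this demo.

```python
import numpy as np

img_width, img_height = 227, 227

# Stand-in for a decoded photo: random 8-bit pixels, laid out as (rows, columns, RGB).
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(img_height, img_width, 3), dtype=np.uint8)

# Scale 0..255 integers down to 0..1 floats.
scaled = photo.astype("float32") / 255.0

# Channels-first view, matching input_shape=(3, img_width, img_height) in the model cell.
channels_first = np.transpose(scaled, (2, 0, 1))

print(photo.shape)            # (227, 227, 3)
print(channels_first.shape)   # (3, 227, 227)
```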

Keras provides this nifty data generator function to generate tensors of images during training. There are plenty of options that can be passed into the generator to augment images in real time. However, we won't be touching those yet.

By simply passing the directory names into the generator, you save yourself from:
* Manually generating RGB matrices from images
* Resizing them to reduce processing on the neural network's end (full-size photos may not fit into your graphics card's VRAM)
* Reshuffling them in every epoch
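
To illustrate the reshuffling part, here is a rough sketch of how a generator could hand out batches in a fresh random order every epoch. `batch_indices` is a hypothetical helper, not the Keras implementation; the 400 samples and batch size of 20 match the settings used in this demo.

```python
import numpy as np

def batch_indices(n_samples, batch_size, n_epochs, seed=0):
    """Yield (epoch, indices) pairs, reshuffling the sample order every epoch."""
    rng = np.random.default_rng(seed)
    for epoch in range(n_epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            yield epoch, order[start:start + batch_size]

batches = list(batch_indices(n_samples=400, batch_size=20, n_epochs=2))
print(len(batches))   # 40: 20 batches per epoch, 2 epochs

# Within one epoch every sample index appears exactly once.
first_epoch = np.concatenate([idx for epoch, idx in batches if epoch == 0])
print(len(set(first_epoch.tolist())))   # 400
```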

See:
- [Keras Blog - Building powerful image classification models using very little data](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html)
- [ImageDataGenerator documentation](https://keras.io/preprocessing/image/)

In [3]:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)

training_generator = datagen.flow_from_directory(train_dir,
    target_size=(img_width, img_height),
    batch_size=20,
    class_mode="categorical")

testing_generator = datagen.flow_from_directory(test_dir,
    target_size=(img_width, img_height),
    batch_size=20,
    class_mode="categorical")

Found 400 images belonging to 4 classes.
Found 400 images belonging to 4 classes.
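
The "Found ... images belonging to 4 classes" lines come from the generator scanning one subfolder per class. If the counts look wrong, a small check like the one below can help. It is only a sketch: `count_images_per_class` is a hypothetical helper, the extension list is an assumption, and the demo runs on a throwaway directory rather than the real dataset.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of image file types

def count_images_per_class(root):
    """Return {class_folder_name: number_of_image_files} for a directory tree."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir)
                if f.lower().endswith(IMAGE_EXTENSIONS)
            )
    return counts

# Demo on a temporary directory so the sketch runs without the real photos.
with tempfile.TemporaryDirectory() as root:
    for food in ["AisKacang", "AngKuKueh", "ApamBalik", "AsamLaksa"]:
        os.makedirs(os.path.join(root, food))
        for i in range(3):
            open(os.path.join(root, food, f"{i:03d}.jpg"), "w").close()
    demo_counts = count_images_per_class(root)

print(demo_counts)
```

On the real data you would call `count_images_per_class(train_dir)` and expect four folders with 100 photos each.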


## LET'S TRAIN THE CONVNET!

Since there are 4 food types (classes) to classify, a neural network that doesn't work will return random results - and yield 0.25 accuracy. So if your network keeps yielding 0.25 `val_acc` after a few epochs, check your setup.
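
The same reasoning gives a reference value for the loss. A network that spreads its softmax output evenly over the 4 classes has a categorical cross-entropy of ln(4), so a loss stuck near that value is another sign that nothing is being learned. A small worked example (the one-hot label is arbitrary):

```python
import math

n_classes = 4
uniform_prediction = [1.0 / n_classes] * n_classes   # softmax output that favours no class
true_label = [0, 0, 1, 0]                             # one-hot ground truth (any class works)

# Categorical cross-entropy: -sum(y_true * log(y_pred))
loss = -sum(t * math.log(p) for t, p in zip(true_label, uniform_prediction))

print(round(loss, 4))    # 1.3863, i.e. ln(4)
print(1.0 / n_classes)   # 0.25, the accuracy expected from pure guessing
```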


In [4]:
model.fit_generator(
        training_generator,
        samples_per_epoch=nb_train_samples,
        nb_epoch=20,
        validation_data=testing_generator,
        nb_val_samples=nb_test_samples)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f06a1e0cef0>

## Let's try augmenting the image data for higher accuracy

With the previous image data generator, we simply asked it to stream from the photo folders and pass each photo at a smaller size into the convnet. We can reach about 70% validation accuracy - that means out of every 10 photos the network hasn't seen before, it gets about 7 right.

A few things we can do to _maybe_ make the training better:

* Randomly rotate the images left and right because people don't always take food photos upright?
* Randomly flip the photo horizontally?

Keras' built-in `ImageDataGenerator` can augment these images at training time. This is extremely handy when you have little training data but want your model to generalise to a wider range of inputs - to cope better with different food photos.
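
To show the idea behind one of these augmentations, here is a minimal NumPy sketch of a random horizontal flip. `random_horizontal_flip` is a hypothetical helper and the tiny array is a stand-in for a photo; rotation is left out because it needs interpolation, which the real generator handles for you.

```python
import numpy as np

def random_horizontal_flip(image, rng):
    """Mirror the image left-to-right with probability 0.5 (rows, columns, channels)."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

# Tiny stand-in "photo": 2 rows, 3 columns, RGB.
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)
flipped = image[:, ::-1, :]

print(flipped.shape == image.shape)                        # True: the shape is unchanged
print(np.array_equal(flipped[:, 0, :], image[:, -1, :]))   # True: first column is the old last column

# Each call returns either the original or the mirrored version.
rng = np.random.default_rng(42)
augmented = [random_horizontal_flip(image, rng) for _ in range(4)]
```

The network sees a slightly different version of the same photo each epoch, while the label stays the same.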

#### How to know if rotating and flipping images is working?
Look for an increase in `val_acc` and a drop in `val_loss` compared to the previous training run. That means the neural network generalises better to photos it has never seen before.
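
One way to make that comparison less manual is to keep the `History` object that `fit_generator` returns and compare its logged metrics. The sketch below assumes the metrics are available as a dict with `val_acc` and `val_loss` keys (key names can differ between Keras versions). `summarize` is a hypothetical helper, and the numbers are placeholders purely for illustration, not results from this demo.

```python
def summarize(history):
    """Return the best validation accuracy and lowest validation loss in a history dict."""
    return {
        "best_val_acc": max(history["val_acc"]),
        "best_val_loss": min(history["val_loss"]),
    }

# Placeholder values for illustration only; they are NOT results from this demo.
plain = {"val_acc": [0.25, 0.50, 0.50], "val_loss": [1.4, 1.2, 1.3]}
augmented = {"val_acc": [0.50, 0.75, 0.50], "val_loss": [1.2, 0.9, 1.0]}

before, after = summarize(plain), summarize(augmented)
print("val_acc change:", after["best_val_acc"] - before["best_val_acc"])
print("val_loss change:", after["best_val_loss"] - before["best_val_loss"])
```

With real runs you would pass in something like `history.history`, where `history = model.fit_generator(...)`.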


In [5]:
datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=50,
    horizontal_flip=True)

training_generator = datagen.flow_from_directory(train_dir,
    target_size=(img_width, img_height),
    batch_size=20,
    class_mode="categorical")

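# Note: the same augmenting `datagen` is reused here, so the validation photos
# are randomly rotated and flipped as well.
testing_generator = datagen.flow_from_directory(test_dir,
    target_size=(img_width, img_height),
    batch_size=20,
    class_mode="categorical")

# Note: this continues training the same `model` from In [4]; its weights are not reset.
model.fit_generator(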
        training_generator,
        samples_per_epoch=nb_train_samples,
        nb_epoch=20,
        validation_data=testing_generator,
        nb_val_samples=nb_test_samples)

Found 400 images belonging to 4 classes.
Found 400 images belonging to 4 classes.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f06a02820b8>

## It worked!

Our dataset is really small, with only 4 types of food and only 100 photos per category to train on. `ImageDataGenerator` really is our friend here: it rotates and flips our images so that our convolutional neural network can learn to classify foods better - with a peak validation accuracy of 80%!

## Explore more

1. [CS231n - Convolutional Neural Net](http://cs231n.github.io/convolutional-networks/#fc)
2. [Keras homepage](https://keras.io/layers)