# Recipe classification using image data

To try this notebook, you need to prepare your own data. <br>
The directory structure is assumed to be as follows (category0 and category1 can be replaced with the actual category names).

- /work/data/image/
  - /train
    - /category0/*.jpg
    - /category1/*.jpg
    - ...
  - /valid
    - /category0/*.jpg
    - /category1/*.jpg
    - ...
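
Before training, it can help to confirm that the layout matches the assumption above. The following is a minimal sketch, not part of the original notebook: `count_images` is a hypothetical helper, and the demo builds a throwaway directory tree with empty files instead of touching /work/data/image/.

```python
import glob
import os
import tempfile


def count_images(split_dir, exts=("jpg", "jpeg", "JPG", "JPEG")):
    """Return {category_name: number_of_image_files} for one split directory."""
    counts = {}
    for category in sorted(os.listdir(split_dir)):
        category_dir = os.path.join(split_dir, category)
        if not os.path.isdir(category_dir):
            continue  # skip stray files that sit next to the category folders
        paths = set()  # a set avoids double counting on case-insensitive filesystems
        for ext in exts:
            paths.update(glob.glob(os.path.join(category_dir, "*." + ext)))
        counts[category] = len(paths)
    return counts


# Demo on a temporary toy layout (hypothetical category names and file counts)
with tempfile.TemporaryDirectory() as root:
    for category, n_files in [("category0", 2), ("category1", 1)]:
        os.makedirs(os.path.join(root, "train", category))
        for i in range(n_files):
            open(os.path.join(root, "train", category, "img{}.jpg".format(i)), "w").close()
    demo_counts = count_images(os.path.join(root, "train"))

print(demo_counts)
```

On real data, the same helper could be pointed at the train directory to spot empty or badly imbalanced categories.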


We use a pretrained model that was trained on ImageNet data; Keras provides several pretrained models ( https://github.com/fchollet/deep-learning-models ).

Pretrained models are very useful: by fine-tuning them, we can build powerful models for our own problems with minimal effort.

Outline:

- **1. Prepare dataset**
- **2. Construct a fine tuning model**
- **3. Load the trained model (to restore the model)**
- **4. How can we improve the model?**

## 1. Prepare dataset

We assume you have already downloaded the data from the s3 bucket.

We use the Keras ImageDataGenerator, a data generator that offers various preprocessing methods.

If the data is too large to be loaded into memory, a generator is indispensable.
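
To illustrate why a generator helps with memory, here is a minimal sketch of lazy batch loading. It is not how ImageDataGenerator is implemented: `batch_generator` and `fake_load` are hypothetical names, and `fake_load` returns a dummy array instead of reading a file. Unlike the Keras generators, this sketch makes a single pass over the data and then stops.

```python
import numpy as np


def batch_generator(paths, batch_size, load_fn):
    """Yield one batch at a time, so only batch_size items are in memory at once."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        yield np.stack([load_fn(p) for p in chunk])


def fake_load(path):
    # Hypothetical stand-in for reading and decoding an image file
    return np.zeros((4, 4, 3), dtype=np.float32)


fake_paths = ["img{}.jpg".format(i) for i in range(10)]
batch_shapes = [batch.shape for batch in batch_generator(fake_paths, 4, fake_load)]
print(batch_shapes)
```

With 10 items and a batch size of 4, the last batch holds only the 2 remaining items.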

In [None]:
import os
import sys
import json
import glob
import pandas as pd
import numpy as np

from keras.preprocessing.image import ImageDataGenerator

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

In [None]:
DATA_DIR = "/work/data/image/"

In [None]:
# Fixed constants for dataset
SIZE = 224
BATCH_SIZE = 16

# Data dirs {train, validation}
TRAIN_DATA_DIR = os.path.normpath(os.path.join(DATA_DIR, "train"))
VALID_DATA_DIR = os.path.normpath(os.path.join(DATA_DIR, "valid"))

Parameters of ImageDataGenerator: https://keras.io/preprocessing/image/

These parameters enable data augmentation, which produces slightly different training images by applying several geometric transformations.

Data augmentation is a very important method for improving the generalization performance of a model.
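
As a rough illustration of what such transformations do, the sketch below applies a random horizontal flip and a random horizontal shift with plain NumPy. `random_flip_and_shift` is a hypothetical helper, not the Keras implementation, and the shift uses wrap-around filling (`np.roll`) rather than the nearest-pixel filling used in this notebook.

```python
import numpy as np


def random_flip_and_shift(img, rng, max_shift=0.25):
    """Return a randomly flipped and horizontally shifted copy of img (H, W, C)."""
    out = img.copy()
    if rng.rand() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    width = out.shape[1]
    limit = int(max_shift * width)
    shift = rng.randint(-limit, limit + 1)  # upper bound is exclusive
    return np.roll(out, shift, axis=1)  # wrap-around shift along the width axis


rng = np.random.RandomState(1729)
image = np.arange(4 * 8 * 3, dtype=np.float32).reshape(4, 8, 3)
augmented = random_flip_and_shift(image, rng)
print(augmented.shape)
```

The label of the image is unchanged; only the pixel arrangement differs, which is what lets the model see more varied examples of the same category.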

In [None]:
TRAIN_DATAGEN = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        channel_shift_range=0.2,
        rotation_range=15,
        width_shift_range=0.25,
        height_shift_range=0.25,
        horizontal_flip=True,
        vertical_flip=False,
        fill_mode='nearest'
)

In [None]:
TRAIN_GENERATOR = TRAIN_DATAGEN.flow_from_directory(
        directory=TRAIN_DATA_DIR,
        target_size=(SIZE, SIZE),
        class_mode='sparse',
        batch_size=BATCH_SIZE,
        shuffle=True,
        seed=1729
)

In most cases, we do NOT apply data augmentation to the validation dataset: augmented images differ slightly from the originals, which makes the validation estimate less reliable.

In [None]:
VALID_DATAGEN = ImageDataGenerator(
        rescale=1./255
)

In [None]:
VALID_GENERATOR = VALID_DATAGEN.flow_from_directory(
        directory=VALID_DATA_DIR,
        target_size=(SIZE, SIZE),
        class_mode='sparse',
        batch_size=BATCH_SIZE,
        shuffle=True,
        seed=1729
)

## 2. Construct a fine tuning model

Using the Keras InceptionV3 class with weights pretrained on the ImageNet dataset, we can easily create a fine-tuning model.

To build a fine-tuning model, the top part of the model should be replaced with a new one that matches our problem.

Here we introduce a new component.

- GlobalAveragePooling <br>
  This layer takes the average over (height, width) for each channel. <br>
  The data shape changes from (batch, height, width, channel) to (batch, channel). <br>
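
The shape change can be checked with a small NumPy sketch. This only mimics the arithmetic of global average pooling for channels-last data; the array sizes are arbitrary illustrative values.

```python
import numpy as np

# Toy feature map: batch=2, height=3, width=3, channel=4
features = np.arange(2 * 3 * 3 * 4, dtype=np.float32).reshape(2, 3, 3, 4)

# Average over the height and width axes, keeping batch and channel
pooled = features.mean(axis=(1, 2))

print(features.shape)
print(pooled.shape)
```

Each entry of `pooled` is the mean of one channel over all spatial positions of one image.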

In [None]:
BASE_MODEL_NAME = "imagenet"
# BASE_MODEL_NAME = "/work/notebooks/trained_models/classifier_image"
TRAINED_MODEL_NAME = "classifier_image"
MODEL_SAVE_PATH = os.path.join("/work/notebooks/trained_models/", TRAINED_MODEL_NAME)

In [None]:
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model, model_from_json
from keras.layers import Dense, GlobalAveragePooling2D
from keras import optimizers

def complile_model(base_model_name, only_top=False):
    '''
    input : 
        base_model_name - 'imagenet' or model_prefix of your trained model
        only_top - if true, all weights except those of the top layers are frozen
    return : 
        compiled model
    '''
    # Load ImageNet trained model as a base model
    base_model = InceptionV3(weights='imagenet', include_top=False)
    
    if base_model_name == 'imagenet':
        x = base_model.output
        x = GlobalAveragePooling2D()(x)
        x = Dense(1024, activation='relu')(x)
        predictions = Dense(TRAIN_GENERATOR.num_class, activation='softmax')(x)
        
        model = Model(inputs=base_model.input, outputs=predictions)
        
    else:
        with open("{0}.json".format(base_model_name), 'r') as f:
            model_json = json.dumps(json.load(f)) # Need to convert json to str
            model = model_from_json(model_json)
        with open("{0}-labels.json".format(base_model_name), 'r') as f:
            category_dict = json.load(f)
            
        model.load_weights("{0}.hdf5".format(base_model_name))
        model = Model(inputs=model.input, outputs=model.output)
    
    # Set which layers are trainable
    if only_top:
        for layer in model.layers[:len(base_model.layers)]:
            layer.trainable = False
        for layer in model.layers[len(base_model.layers):]:
            layer.trainable = True
    else:
        for layer in model.layers:
            layer.trainable = True
    
    # Model compile
    optimizer = optimizers.Adam(lr=0.001, decay=0.01)
    model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=["accuracy"])
    
    return model

We can use callbacks to manage the training process better, e.g., saving the model with the best validation accuracy during training, early stopping to avoid overfitting, and so on.

Callbacks are optional, so we do not use them here. (Please try them if you are interested.)
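
To make the idea of early stopping concrete, here is a simplified sketch of the patience logic in plain Python. It is an illustration only, not the Keras EarlyStopping implementation (it ignores options such as a minimum improvement threshold); `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # new best validation loss: reset the counter
            wait = 0
        else:
            wait += 1  # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71]
stop_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch)
```

Here the best loss is reached at epoch 2, and the two following epochs do not improve on it, so training would stop at epoch 4.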

In [None]:
# from keras.callbacks import ModelCheckpoint
# from keras.callbacks import EarlyStopping

# FILEPATH = MODEL_SAVE_PATH + "-{epoch:02d}-{val_acc:.3f}.hdf5"

# CHECKPOINT = ModelCheckpoint(
#     FILEPATH
#     , monitor='val_acc'
#     , verbose=1
#     , save_best_only=False
#     , mode='max'
# )

# EARLYSTOPPING = EarlyStopping(
#     monitor='val_loss'
#     , patience=5
#     , verbose=1
#     , mode='min'
# )

# CALLBACKS_LIST = [CHECKPOINT, EARLYSTOPPING]

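Define a function for model training.

Be careful about the difference between the fit_generator method and the fit method, which is a little confusing: fit takes in-memory arrays together with a batch_size, whereas fit_generator takes a generator and the number of batches to draw per epoch (steps_per_epoch).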
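
The relation between the number of images and the number of steps can be sketched with plain arithmetic. The numbers below are hypothetical; `steps_floor` corresponds to the integer division used in this notebook, and `steps_ceil` is an alternative that also covers the last partial batch.

```python
import math


def steps_floor(n_samples, batch_size):
    """Number of full batches; the remainder is not covered in that epoch."""
    return n_samples // batch_size


def steps_ceil(n_samples, batch_size):
    """Number of batches needed to cover every sample, including a partial batch."""
    return int(math.ceil(n_samples / float(batch_size)))


n_samples, batch_size = 1000, 16  # hypothetical dataset size
full_steps = steps_floor(n_samples, batch_size)
all_steps = steps_ceil(n_samples, batch_size)
leftover = n_samples - full_steps * batch_size
print(full_steps)
print(all_steps)
print(leftover)
```

With integer division, a few images per epoch may go unused when the dataset size is not a multiple of the batch size.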

In [None]:
def train_model(model):
    '''
    input : 
        keras model
    return : 
        trained model & train history
    '''
    history = model.fit_generator(
        generator=TRAIN_GENERATOR
        , steps_per_epoch= TRAIN_GENERATOR.n // BATCH_SIZE # This corresponds to use all images once for each epoch
        , epochs=5
        , verbose=1
        , validation_data=VALID_GENERATOR
        , validation_steps=VALID_GENERATOR.n // BATCH_SIZE
    )
    
    model.save_weights('{0}.hdf5'.format(MODEL_SAVE_PATH))
    with open("{0}.json".format(MODEL_SAVE_PATH), 'w') as f:
        json.dump(json.loads(model.to_json()), f) # model.to_json() is a STRING of json
    with open("{0}-labels.json".format(MODEL_SAVE_PATH), 'w') as f:
        json.dump(TRAIN_GENERATOR.class_indices, f)
    
    return model, history

In [None]:
%%time
MODEL = complile_model(BASE_MODEL_NAME, only_top=True)

In [None]:
%%time
model, history = train_model(MODEL)

In [None]:
def plot_history(history):
    # plot of loss function
    plt.figure(figsize=(13,7))
    plt.plot(history.history['loss'],"o-",label="loss",)
    plt.plot(history.history['val_loss'],"o-",label="val_loss")
    plt.title('model loss')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(loc='upper right')
    plt.show()

    # plot of accuracy
    plt.figure(figsize=(13,7))
    plt.plot(history.history['acc'],"o-",label="accuracy")
    plt.plot(history.history['val_acc'],"o-",label="val_acc")
    plt.title('model accuracy')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend(loc="lower right")
    plt.show()

plot_history(history)

We can of course draw the model architecture; however, the InceptionV3 model is too large to draw legibly.

In [None]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

SVG( model_to_dot(model, show_layer_names=True, show_shapes=True).create(prog='dot', format='svg') )

In [None]:
model.summary()

### Prediction of trained model.

In [None]:
from scipy.misc import imread
from scipy.misc import imresize
from skimage.color import gray2rgb

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

Since we preprocessed the data during training, we have to apply the same preprocessing when using the model for prediction.

In [None]:
def preprocess(img_arr, size=224):
    '''
    input : 
        image as numpy array
    return : 
        preprocessed image numpy array
    '''
    # Convert a grayscale image to a colored one
    if len(img_arr.shape) == 2:
        img_arr = gray2rgb(img_arr)

    height, width, chan = img_arr.shape
    
    # Crop the square area whose center is the center of the image
    centery = height // 2
    centerx = width // 2
    radius = min((centerx, centery))
    img_arr = img_arr[centery-radius:centery+radius, centerx-radius:centerx+radius]
    
    # Resize the image to the same shape of the model input
    img_arr = imresize(img_arr, size=(size, size), interp='bilinear')
    
    # Convert to float32
    img_arr = np.array(img_arr, dtype=np.float32)
    
    # Rescale to [0, 1] (the model weights assume this scale!)
    img_arr /= 255.
    
    return img_arr

Define a class for handling test data.

In [None]:
class TestData(object):
    '''
    Data preparation for prediction test
    '''
    def __init__(self, size):
        '''
        Set image height and width
        '''
        self.size = size
    
    def get_data_paths(self,dirs):
        '''
        Get all image paths from the given dirs (JPEG files only so far)
        '''
        file_paths = []
        for elem in glob.glob("{}/*".format(dirs)):
            paths = []
            for ext in ["jpg","jpeg","JPG","JPEG"]:
                paths.extend( glob.glob(os.path.normpath("{}/*.{}").format(elem,ext)) )
            file_paths.extend(paths)
        return file_paths
     
    def chunked(self, iterable, N):
        '''
        Split the given list into chunks of at most N items
        '''
        return [iterable[x:x + N] for x in range(0, len(iterable), N)]
    
    def preprocess_data(self, file_paths, category_dict):
        '''
        Preprocess the images from the set file_paths
            input : 
                file_paths list, category_dict as {'name': num}
            return : 
                preprocessed np arrays of the images
        '''
        test_data = []
        test_labels = []
        test_paths = []

        for file_path in file_paths:
            img = imread(file_path)
            img = preprocess(img, self.size)
            test_data.append(img)

            label = file_path.split('/')[-2]
            test_labels.append(category_dict[label])
            test_paths.append(file_path)
            
        test_data = np.array(test_data).astype(np.float32)
        test_data = test_data.transpose((0, 1, 2, 3))

        return test_data, test_labels, test_paths
    
    def get_N_sample(self, file_paths, N):
        '''
        Randomly pick up N images from the set file_paths
            input : 
                file_paths, N as number of picking images
            return : 
                picked N file paths
        '''
        import random
        index = random.sample(range(len(file_paths)), N)
        samples = [file_paths[i] for i in index]
        return samples

Instantiate TestData and define the dictionary of the target categories.

You should set the category dict as a Python dictionary: {'category0' : 0, ...} <br>
The indices must match those used during training (TRAIN_GENERATOR.class_indices).

In [None]:
testdata = TestData(SIZE)
paths = testdata.get_data_paths("/work/data/image/valid/")
category_dict = {}

Get preprocessed data.

In [None]:
test_data, test_labels, test_paths = testdata.preprocess_data(paths, category_dict)
test_data.shape

Model prediction.

In [None]:
%%time
prediction = model.predict( test_data )

In [None]:
result = pd.DataFrame({
    'prediction' : [np.argmax(elem) for elem in prediction]
    , 'answer' : test_labels
    , 'path' : test_paths
})
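
The step from softmax outputs to class indices, and from there to an overall accuracy, can be sketched with NumPy alone. The probabilities and answers below are made-up toy values, not results from this model.

```python
import numpy as np

# Toy softmax outputs for 3 images and 3 categories (made-up values)
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])
answers = np.array([0, 2, 0])  # toy ground-truth category indices

# The predicted category is the index of the largest probability in each row
predicted = probs.argmax(axis=1)

# Fraction of images whose predicted index equals the answer
accuracy = float((predicted == answers).mean())
print(predicted)
print(accuracy)
```

The same comparison applied to the 'prediction' and 'answer' columns of the result DataFrame would give the overall accuracy.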

Check the result.

In [None]:
result[0:5]

### Draw the confusion matrix.

In [None]:
def draw_cofusion_matrix(result, category_dict):
    '''
    input : prediction result as a DF and category dictionary
    output : plot of confusion matrix
    '''
    #Compute confusion matrix
    conf_arr = confusion_matrix(result['answer'],result['prediction'])
    #Get category names in the order of category values
    sorted_categories = sorted(category_dict.items(), key=lambda x:x[1])
    labels = [ elem[0] for elem in sorted_categories ]
    
    #Compute normalized confusion matrix for coloring
    norm_conf = []
    for i in conf_arr:
        a = 0
        tmp_arr = []
        a = sum(i, 0)
        for j in i:
            tmp_arr.append(float(j)/float(a))
        norm_conf.append(tmp_arr)
    
    #Draw figure
    plt.rcParams["font.size"] = 16
    fig = plt.figure()
    plt.clf()
    fig.set_size_inches(20, 10, forward=True)
    ax = fig.add_subplot(111)
    ax.set_aspect(1)
    res = ax.imshow(np.array(norm_conf), cmap=plt.cm.jet, interpolation='nearest')

    width, height = conf_arr.shape

    plt.xticks(range(len(category_dict)), labels, rotation='vertical')
    plt.yticks(range(len(category_dict)), labels)

    for x in range(width):
        for y in range(height):
            ax.annotate(str(conf_arr[x][y]), xy=(y, x), horizontalalignment='center', verticalalignment='center')

In [None]:
draw_cofusion_matrix(result,  category_dict)
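
The row normalization used for coloring can also be written in vectorized form. The sketch below is an alternative under one assumption that differs from the function above: a row whose sum is zero is left as all zeros instead of causing a division by zero. `normalize_rows` and the toy matrix are hypothetical.

```python
import numpy as np


def normalize_rows(conf):
    """Divide each row by its sum; rows that sum to zero stay all zeros."""
    conf = np.asarray(conf, dtype=np.float64)
    row_sums = conf.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)  # avoid division by zero
    return conf / safe_sums


toy_conf = [[8, 2], [1, 9]]  # rows: true category, columns: predicted category
norm = normalize_rows(toy_conf)
print(norm)
```

Each row then sums to 1, so the color of a cell shows the fraction of a true category that was assigned to each predicted category.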

### Check the misclassified images.

The i-th row shows images that the model predicted as the i-th category (e.g., images in the 0th row were predicted as gyoza, the 0th category).

In [None]:
plt.figure(figsize = (15,15))
gs = gridspec.GridSpec(TRAIN_GENERATOR.num_class, 5)
gs.update(wspace=0.025, hspace=0.05) # set the spacing between axes. 

for cat, idx in sorted(category_dict.items(), key=lambda x: x[1]):  # row idx = category index
    wrong_answers_list = list( result[ (result['prediction'] == category_dict[cat]) & (result['answer'] != category_dict[cat]) ].index )
    num = min([5,len(wrong_answers_list)])
    for i in range(num):
        ax = plt.subplot(gs[idx,i])
        path = result['path'][wrong_answers_list[i]]
        plt.imshow( imread(path) ) # plot
        plt.axis('off')

plt.show()

## 3. Load the trained model

Here we load the trained model. <br>
We can restore the model from the following files.<br>

- {model_name}.json <br>
  It stores the model structure (network architecture). <br>
- {model_name}-labels.json <br>
  It stores the label information: category names and their corresponding indices. <br>
- {model_name}.hdf5 <br>
  It stores the weight values. <br>
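
As a small illustration of the labels file, the sketch below round-trips a category dictionary through JSON and inverts it so that a predicted index can be mapped back to a category name. The dictionary contents are hypothetical, and the JSON is kept in a string here instead of a file.

```python
import json

# Hypothetical label information in the {name: index} form
class_indices = {"category0": 0, "category1": 1}

# Serialize and restore, as would happen when writing and reading the labels file
serialized = json.dumps(class_indices)
restored = json.loads(serialized)

# Invert the mapping: index -> category name
index_to_name = {index: name for name, index in restored.items()}
print(index_to_name)
```

The inverted dictionary is convenient for turning the argmax of a prediction into a readable category name.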

In [None]:
# from keras import backend as K
# from tensorflow import reset_default_graph

In [None]:
# del model
# reset_default_graph()
# K.clear_session()

Load the trained model.

In [None]:
# base_model_name = "/work/notebooks/trained_models/classifier_image"

# with open("{0}.json".format(base_model_name), 'r') as f:
#     model_json = json.dumps(json.load(f)) # Need to convert json to str
#     model = model_from_json(model_json)
# with open("{0}-labels.json".format(base_model_name), 'r') as f:
#     category_dict = json.load(f)

# model.load_weights("{0}.hdf5".format(base_model_name))
# model = Model(inputs=model.input, outputs=model.output)

Check the category information.

In [None]:
# category_dict

In [None]:
# %%time
# prediction = model.predict( test_data )
# prediction = [ np.argmax(elem) for elem in prediction ]

Consistency check by comparing the predictions.

In [None]:
# sum( prediction == result['prediction'] )

## 4. How can we improve the model?

- Data size is the most important factor
- Data cleansing
- Modification of the model
- Rethinking problem settings
- ...

**It requires your creativity!!!**

1. Please describe your ideas to improve the model performance.
2. Please implement your ideas and check the results.
3. Can you explain what the inception module is?
4. Can you explain why we can change the shape of the input even though we are using fixed trained weights?
5. Can you explain the differences among the Inception versions (V1 ~ V4)?
6. Can you guess why it is difficult to discriminate ramen from pasta?
7. Do you have any ideas for improving the computational efficiency of the fine-tuning training?
8. Can you convert the trained Keras model to a TensorFlow model?
9. Can you compare the inference performance of the Keras model with that of the TensorFlow (NOT using Keras) one?