# Notebook for training the food image classification model

# Start HACK 2024, University of St.Gallen - Team "Last Minute"

In [1]:
top_10_dishes = [
    "pizza",
    "hamburger",
    "fried_rice",
    "ice_cream",
    "french_fries",
    "chocolate_cake",
    "tacos",
    "lobster_roll_sandwich",
    "lasagna",
    "chicken_wings"
]

First, the script imports `defaultdict` from the `collections` module, three functions (`copy`, `copytree`, and `rmtree`) from the `shutil` module, and the `os` module. `defaultdict` is a dictionary subclass that calls a factory function to supply missing values, so appending to the list of a food type that has not been seen yet just works. `copy` copies a single file; `copytree` and `rmtree` copy and delete whole directory trees, respectively, but are not used by the function. The `os` module provides path handling and directory creation.

The <b>`setup_training_data`</b> function is then defined. It takes four arguments: `txtfile`, a text file that lists the food images; `source`, the folder where the images are currently stored; `destination`, the folder the selected images are copied to; and `food_list`, the list of food types to keep (by default `top_10_dishes`).

The function builds a `defaultdict` called `food_types` that maps each food type to a list of its image file names. It reads `txtfile` line by line; each line is expected to have the form `food_type/image_id`. The part before the "/" is the food type, and the part after it is the image name without its extension. If the food type is in `food_list`, the image name with ".jpg" appended is added to that food type's list.

Finally, it loops over the food types in `food_types` and prints each name. For each food type, it creates a folder with that name inside `destination` if one does not already exist, then copies every image of that food type from `source/<food_type>/` to `destination/<food_type>/`.
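The parsing step can be tried on its own. The sketch below pulls it into a hypothetical helper, `group_images_by_food`, and runs it on a few lines in the `food_type/image_id` format described above; the image IDs are invented for illustration and do not come from the real `train.txt`.

```python
from collections import defaultdict


def group_images_by_food(lines, food_list):
    """Group 'food_type/image_id' entries by food type, keeping only food_list."""
    food_types = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        food_type, image_id = line.split("/", 1)
        if food_type in food_list:
            food_types[food_type].append(image_id + ".jpg")
    return dict(food_types)


# Made-up lines: apple_pie is filtered out, the trailing newline is stripped.
sample_lines = ["pizza/1001", "apple_pie/2002", "pizza/1003\n", "tacos/3004"]
grouped = group_images_by_food(sample_lines, ["pizza", "tacos"])
print(grouped)  # {'pizza': ['1001.jpg', '1003.jpg'], 'tacos': ['3004.jpg']}
```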

In [3]:
from collections import defaultdict
from shutil import copy, copytree, rmtree  # only copy is used below
import os


def setup_training_data(txtfile, source, destination, food_list=top_10_dishes):
    """Copy every image listed in txtfile whose food type is in food_list
    from source/<food_type>/ to destination/<food_type>/."""

    # Map each selected food type to the list of its image file names.
    food_types = defaultdict(list)
    with open(txtfile, "r") as file:
        for line in file:
            # Each line has the form "food_type/image_id" (no file extension).
            food_type, _, image_id = line.strip().partition("/")
            if food_type in food_list:
                food_types[food_type].append(image_id + ".jpg")

    for food, images in food_types.items():
        print("  " + food, end="  ")
        # Create destination/<food> if it does not exist yet.
        os.makedirs(os.path.join(destination, food), exist_ok=True)
        for n in images:
            copy(os.path.join(source, food, n), os.path.join(destination, food, n))

Create the train and test folders for the 10 selected dishes.

In [4]:
setup_training_data('train.txt', 'images', 'train')
setup_training_data('test.txt', 'images', 'test')

  chicken_wings    chocolate_cake    french_fries    fried_rice    hamburger    ice_cream    lasagna    lobster_roll_sandwich    pizza    tacos    chicken_wings    chocolate_cake    french_fries    fried_rice    hamburger    ice_cream    lasagna    lobster_roll_sandwich    pizza    tacos  
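After copying, a quick sanity check is to count the images in each class folder. The hypothetical helper `count_images_per_class` below does this; the demo builds a throwaway directory tree with `tempfile` so it runs anywhere, whereas in the notebook you would call it on `'train'` and `'test'`.

```python
import os
import tempfile


def count_images_per_class(root):
    """Return {class_folder_name: number_of_.jpg_files} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            counts[name] = sum(
                1 for f in os.listdir(folder) if f.lower().endswith(".jpg")
            )
    return counts


# Demo on a throwaway tree: pizza/ gets 3 empty .jpg files, tacos/ gets 2.
with tempfile.TemporaryDirectory() as root:
    for food, n_images in [("pizza", 3), ("tacos", 2)]:
        os.makedirs(os.path.join(root, food))
        for i in range(n_images):
            open(os.path.join(root, food, f"{i}.jpg"), "w").close()
    counts = count_images_per_class(root)

print(counts)  # {'pizza': 3, 'tacos': 2}
```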

## Preprocess the data (Keras: ImageDataGenerator)

`ImageDataGenerator` loads, resizes, and rescales the images in preparation for training. It can also perform image augmentation: applying random transformations such as shifts, rotations, or brightness changes to the existing images on the fly, which increases the diversity of the training data. Augmentation helps the model recognize a particular food regardless of its orientation or lighting when it is deployed. Here we rescale the pixel values to [0, 1] (`rescale=1./255`) and apply random rotations of up to 30 degrees (`rotation_range=30`).
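To make the two preprocessing steps concrete, here is a NumPy sketch on a tiny random "image". The rescaling matches what `rescale=1./255` achieves. The horizontal flip is a simplified stand-in for augmentation, chosen because it is easy to write by hand; it is not how `ImageDataGenerator` implements `rotation_range`. The last step only shows the range from which a rotation angle is drawn when `rotation_range=30`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A fake 4x4 RGB "image" with 8-bit pixel values in [0, 255].
image = rng.integers(0, 256, size=(4, 4, 3))

# Rescaling: map pixel values from [0, 255] to [0, 1].
rescaled = image / 255.0

# Simplified augmentation stand-in: flip horizontally with probability 0.5.
# The label does not change: a flipped pizza is still a pizza.
flip = rng.random() < 0.5
augmented = rescaled[:, ::-1, :] if flip else rescaled

# With rotation_range=30, the rotation angle (degrees) is drawn uniformly from [-30, 30].
angle = rng.uniform(-30, 30)

print(rescaled.min(), rescaled.max(), augmented.shape, round(angle, 1))
```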

In [5]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
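# rescale: map pixel values from [0, 255] to [0, 1]; rotation_range: random rotations of up to 30 degrees.
# Note: validation_split only takes effect if flow_from_directory is called with subset="training"/"validation".
data_gen = ImageDataGenerator(rotation_range=30, rescale=1./255, validation_split=0.2)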



In [6]:
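image_size = (235, 235)

# Evaluation images should not be augmented, so the test set gets its own
# generator that only rescales the pixel values.
test_data_gen = ImageDataGenerator(rescale=1./255)

train_gen = data_gen.flow_from_directory('train', target_size=image_size)  # class_mode defaults to 'categorical'
test_gen = test_data_gen.flow_from_directory('test', target_size=image_size)  # batch_size defaults to 32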

Found 7500 images belonging to 10 classes.
Found 2500 images belonging to 10 classes.
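With the default `batch_size=32`, the number of batches per epoch follows from the image counts reported above: the count is divided by 32 and rounded up, and the last batch is simply smaller. A short check in plain Python (in the notebook, `len(train_gen)` should give the same batch count):

```python
import math

batch_size = 32                # flow_from_directory default
n_train, n_test = 7500, 2500   # counts reported by flow_from_directory above

train_steps = math.ceil(n_train / batch_size)
test_steps = math.ceil(n_test / batch_size)
last_train_batch = n_train - (train_steps - 1) * batch_size

print(train_steps, test_steps, last_train_batch)  # 235 79 12
```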


In [7]:
print(train_gen.class_indices)

{'chicken_wings': 0, 'chocolate_cake': 1, 'french_fries': 2, 'fried_rice': 3, 'hamburger': 4, 'ice_cream': 5, 'lasagna': 6, 'lobster_roll_sandwich': 7, 'pizza': 8, 'tacos': 9}
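`flow_from_directory` assigns class indices by sorting the subfolder names, which is why the mapping above is alphabetical rather than in the order of `top_10_dishes`. Because `class_mode` is `'categorical'`, each label is then delivered as a one-hot vector. The sketch below reproduces both steps in plain Python; `one_hot` is a hypothetical helper written for illustration.

```python
top_10_dishes = [
    "pizza", "hamburger", "fried_rice", "ice_cream", "french_fries",
    "chocolate_cake", "tacos", "lobster_roll_sandwich", "lasagna", "chicken_wings",
]

# Class indices follow the sorted folder names.
class_indices = {name: i for i, name in enumerate(sorted(top_10_dishes))}


def one_hot(index, num_classes):
    """Return a one-hot list with a 1.0 at position `index`."""
    vector = [0.0] * num_classes
    vector[index] = 1.0
    return vector


pizza_label = one_hot(class_indices["pizza"], len(class_indices))
print(class_indices["pizza"], pizza_label)
```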


# Creating the models

For computer vision tasks, convolutional neural networks are often built on a previously trained model, that is, a base with pre-trained weights. This approach is known as transfer learning.

We take a model that has already been trained on a large image dataset and use it as the base of a new model, followed by an untrained part that we call the 'head'. The base keeps its pre-trained weights, which already extract key features from images, so we don't have to train it from scratch. Instead, we connect it to new Dense layers in the head and train only those layers to classify the extracted features.

To put it concisely:

<b>Base</b> → a convolutional base that extracts features from the image.

<b>Head</b> → a dense head that determines the class of the image from those features.

We experimented with various base models, with InceptionV3 providing the best performance.

## Using InceptionV3


First, import the InceptionV3 model from Keras. Load it with the pre-trained ImageNet weights and set the input shape to the height and width of the images plus the three color channels. Passing `include_top=False` drops InceptionV3's original ImageNet classification layers, since we add our own head. Finally, freeze the weights (`trainable = False`), since we don't want to retrain the base.



In [8]:
from tensorflow.keras.applications import InceptionV3
input_shape = (235, 235, 3)  # (height, width, channels); 3 channels for RGB
inception_model = InceptionV3(weights="imagenet", input_shape=input_shape, include_top=False)
inception_model.trainable = False # freeze weights

## Creating the head

### Build a Dense Head

The InceptionV3 base outputs a 6 × 6 × 2048 feature map per image. To greatly reduce the amount of information fed into the Dense layers, a `GlobalAveragePooling2D` layer averages each of the 2048 feature maps over its 6 × 6 grid, leaving a vector of 2048 values. This vector passes through a Dense layer of 64 ReLU units, followed by a dropout layer with a rate of 30% to help prevent overfitting: during training it randomly sets 30% of its inputs to zero, so the model cannot depend too heavily on particular neurons. The head ends with a softmax Dense layer with one unit per class (10 here).
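The head's arithmetic can be reproduced with NumPy on random numbers standing in for InceptionV3's 6 × 6 × 2048 output. This is a sketch of the computations, not of the Keras layers themselves: the weights are random placeholders, and the dropout shown is "inverted" dropout, which scales the surviving values by 1 / (1 - rate) during training (as Keras's `Dropout` does) and is switched off at inference.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Stand-in for the base's output for one image: a 6 x 6 grid of 2048 feature maps.
features = rng.normal(size=(6, 6, 2048))

# Global average pooling: average each feature map over its 6 x 6 spatial grid.
pooled = features.mean(axis=(0, 1))  # shape (2048,)

# Dense(64, relu) with random placeholder weights.
w1, b1 = rng.normal(scale=0.02, size=(2048, 64)), np.zeros(64)
hidden = np.maximum(pooled @ w1 + b1, 0.0)  # shape (64,)

# Inverted dropout (training only): zero ~30% of values, scale the rest by 1 / 0.7.
rate = 0.3
keep_mask = rng.random(hidden.shape) >= rate
dropped = np.where(keep_mask, hidden / (1.0 - rate), 0.0)

# Dense(10, softmax): turn 10 scores into probabilities that sum to 1.
w2, b2 = rng.normal(scale=0.1, size=(64, 10)), np.zeros(10)
logits = dropped @ w2 + b2
exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs = exp / exp.sum()

print(pooled.shape, hidden.shape, probs.shape, round(probs.sum(), 6))
```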

In [9]:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Conv2D, MaxPool2D, GlobalAveragePooling2D

model = Sequential()
model.add(inception_model)                  # frozen convolutional base
# Optional extra layers (disabled):
#model.add(Conv2D(32, 2, activation='relu')) # filters=32, kernel_size=(2,2)
#model.add(MaxPool2D(pool_size=(2,2), strides=2))
model.add(GlobalAveragePooling2D())         # (6, 6, 2048) -> (2048,)
model.add(Dense(64, activation="relu"))     # 64 units
model.add(Dropout(0.3))                     # drop 30% of inputs during training
model.add(Dense(10, activation='softmax'))  # one output per class

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 inception_v3 (Functional)   (None, 6, 6, 2048)        21802784  
                                                                 
 global_average_pooling2d (G  (None, 2048)             0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 64)                131136    
                                                                 
 dropout (Dropout)           (None, 64)                0         
                                                                 
 dense_1 (Dense)             (None, 10)                650       
                                                                 
Total params: 21,934,570
Trainable params: 131,786
Non-trainable params: 21,802,784
_________________________________________________________________
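The parameter counts in the summary can be checked by hand: a Dense layer with n inputs and u units has n × u weights plus u biases, while the pooling and dropout layers have no parameters. `dense_params` is a small helper written for this check.

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units


base_params = 21_802_784  # frozen InceptionV3 base, from the summary
head_params = dense_params(2048, 64) + dense_params(64, 10)

print(dense_params(2048, 64), dense_params(64, 10))  # 131136 650
print(head_params, base_params + head_params)        # 131786 21934570
```

The head's 131,786 parameters are exactly the "Trainable params" line, confirming that only the head is trained.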

### Compiling our model
At this stage, we set the optimizer, loss function, and metrics for our model. We use Adam as the optimizer. Adam (Adaptive Moment Estimation) is a popular extension of stochastic gradient descent that adapts the step size of each parameter using running averages of the gradients and of their squares; it is widely used as a strong general-purpose default.
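To show what "adaptive" means, the sketch below applies the standard Adam update to a single parameter, minimizing f(x) = (x - 3)^2. `adam_minimize` is a hypothetical helper; it uses a learning rate of 0.1 so the demo converges within a few hundred steps (Keras's default is 0.001), and Keras's own implementation differs in small details such as where epsilon enters.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=500):
    """Minimize a 1-D function with the standard Adam update, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return x


# f(x) = (x - 3)^2 has gradient 2(x - 3) and its minimum at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 3))
```

Dividing by the square root of `v_hat` is what makes the step size adaptive: parameters with consistently large gradients take smaller effective steps.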

The loss function measures the difference between the model's predictions and the true labels. Since this is a multi-class classification problem and `flow_from_directory` returns one-hot encoded labels (`class_mode='categorical'`), we use categorical cross-entropy as the loss. A regression problem, by contrast, would typically use a loss such as mean absolute error.
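For a single example with a one-hot label, categorical cross-entropy reduces to minus the log of the probability the model assigns to the correct class. The plain-Python sketch below uses made-up probabilities; it skips zero-label terms (which contribute nothing) to avoid computing log(0), whereas Keras's implementation guards against that by clipping the predictions.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """-sum(y_true * log(y_pred)) for one example; zero-label terms are skipped."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


y_true = [0.0, 0.0, 1.0]     # one-hot label: the correct class is index 2
confident = [0.1, 0.2, 0.7]  # 70% on the correct class
unsure = [0.4, 0.3, 0.3]     # only 30% on the correct class

print(round(categorical_crossentropy(y_true, confident), 4))  # 0.3567
print(round(categorical_crossentropy(y_true, unsure), 4))     # 1.204
```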

In [10]:
from tensorflow.keras.optimizers import Adam
model.compile(optimizer=Adam(learning_rate=0.001),  # 0.001 is Keras's default learning rate
              loss='categorical_crossentropy', metrics=['accuracy'])

### Define Callbacks

Callbacks are objects whose methods Keras calls at specific points during training, for example at the end of each epoch. We use an early-stopping callback, which stops training when the monitored quantity (by default the validation loss) has not improved by at least `min_delta=0.001` for `patience=5` consecutive epochs. With `restore_best_weights=True`, when training stops early the model is reset to the weights from its best epoch.
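The stopping rule can be simulated in plain Python. The sketch below is a simplified re-implementation based on our reading of Keras's `EarlyStopping` (monitoring a loss, so lower is better), not Keras's own code; `simulate_early_stopping` is a hypothetical helper and the validation losses are made up.

```python
def simulate_early_stopping(val_losses, min_delta=0.001, patience=5):
    """Return (epoch at which training stops, best epoch), both 0-based."""
    best = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # improved by more than min_delta
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; best weights come from best_epoch
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping


# Made-up validation losses for illustration only.
val_losses = [0.90, 0.75, 0.70, 0.6995, 0.705, 0.71, 0.702, 0.72, 0.73, 0.68]
stopped_at, best_at = simulate_early_stopping(val_losses)
print(stopped_at, best_at)  # 7 2
```

Epoch 3 (0.6995) is better than epoch 2 but by less than `min_delta`, so it does not reset the patience counter; training stops after epoch 7 and the weights from epoch 2 are restored.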

The `ModelCheckpoint` callback saves the model to disk during training. With `save_best_only=True`, it writes a checkpoint only when the monitored quantity (again the validation loss by default) improves, and with `save_weights_only=True` it stores only the weights, overwriting `food_model.hdf5` each time. This lets us resume training or recover the best weights later.
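As we understand it, `save_best_only=True` compares each epoch's validation loss with the best value seen so far and saves on any strict improvement, without a `min_delta`. The plain-Python sketch below (hypothetical helper `checkpoint_epochs`, made-up losses) lists the epochs that would trigger a save.

```python
def checkpoint_epochs(val_losses):
    """Epochs at which save_best_only=True would write a checkpoint (lower is better)."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # strict improvement, no min_delta
            best = loss
            saved.append(epoch)
    return saved


val_losses = [0.90, 0.75, 0.70, 0.6995, 0.705]  # made-up values
print(checkpoint_epochs(val_losses))  # [0, 1, 2, 3]
```

Epoch 3 is saved even though its improvement of 0.0005 is smaller than the early-stopping `min_delta` of 0.001, so the checkpoint file and the restored best weights can come from different epochs.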

In [11]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
earlystopping = EarlyStopping(min_delta=0.001, patience=5, restore_best_weights=True)
checkpoint = ModelCheckpoint(filepath='food_model.hdf5', verbose=1, save_best_only=True, save_weights_only=True)

### Fit our model to the data
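Finally, we train the model. We fit it to the training data, use the test set as validation data, and pass in the callbacks defined above. Because the test set also decides when training stops and which weights are kept, the resulting validation metrics are a somewhat optimistic estimate of performance on truly unseen images.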

Because of the early-stopping callback, it makes sense to set a generous number of epochs (30 here). Training then ends once the validation loss stops improving, rather than being cut off prematurely by the epoch limit.

In [None]:
history = model.fit(train_gen, validation_data=test_gen, epochs=30, verbose=1, callbacks=[checkpoint, earlystopping])


## Graph Accuracy and Loss

Convert `history.history` into a pandas DataFrame so that we can use its built-in `plot()` method to quickly plot accuracy and loss over the epochs.
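`history.history` is a plain dictionary that maps each metric name to a list with one value per epoch, so the best epoch can also be read off directly. The numbers below are made-up placeholders, not results from this notebook; with the DataFrame, `history_df["val_loss"].idxmin()` gives the same 0-based index.

```python
# Made-up placeholder values with the same structure as history.history.
history_dict = {
    "loss": [1.20, 0.80, 0.65, 0.60],
    "accuracy": [0.62, 0.74, 0.79, 0.81],
    "val_loss": [0.90, 0.72, 0.70, 0.74],
    "val_accuracy": [0.70, 0.77, 0.78, 0.77],
}

# 0-based index of the epoch with the lowest validation loss.
val_losses = history_dict["val_loss"]
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
print(best_epoch + 1, history_dict["val_accuracy"][best_epoch])  # 3 0.78
```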



In [None]:
import pandas as pd

history_df = pd.DataFrame(history.history)
history_df.loc[:,['loss', 'val_loss']].plot()
history_df.loc[:,['accuracy', 'val_accuracy']].plot()

## Save the model

In [None]:
model.save("my_model_101_top_10.h5")