The goal of this project is to classify images of fast food. The dataset consists of images from 10 categories: burger, donut, hot dog, pizza, sandwich, baked potato, crispy chicken, fries, taco, and taquito.

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import seaborn as sns
import os
import math
from PIL import Image
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from keras.utils import load_img
from keras.utils import img_to_array
from keras.models import load_model
from keras.callbacks import EarlyStopping
from keras.callbacks import LearningRateScheduler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report

In [None]:
base_path = '/kaggle/input/fast-food-classification-dataset/Fast Food Classification V2'

In [None]:
os.listdir(base_path)

In [None]:
train_path = os.path.join(base_path, 'Train')
val_path = os.path.join(base_path, 'Valid')
test_path = os.path.join(base_path, 'Test')

In [None]:
labels = os.listdir(train_path)

In [None]:
labels

The data comes in three sets: training, test, and validation, containing 15,000, 1,500, and 3,500 images of varying sizes, respectively. Not all images are stored in the same file format, so they were filtered to keep only files with a single extension, "jpeg". All classes contain a very similar number of images, so class imbalance is not an issue.
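
The per-class counts and the file-format check described above can be reproduced with a small helper. This is a minimal sketch rather than the notebook's own code: `count_files_by_extension` is a hypothetical name, and the function assumes the layout used here, with one sub-directory per class inside each split.

```python
import os
from collections import Counter

def count_files_by_extension(split_path):
    """Return {class_name: Counter({extension: n_files})} for one split.

    Assumes one sub-directory per class, as in the Train/Valid/Test folders.
    """
    counts = {}
    for category in sorted(os.listdir(split_path)):
        class_dir = os.path.join(split_path, category)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        # Lower-case the extension so 'JPEG' and 'jpeg' are counted together.
        counts[category] = Counter(
            os.path.splitext(name)[1].lower().lstrip('.')
            for name in os.listdir(class_dir)
        )
    return counts
```

Calling `count_files_by_extension(train_path)` would then show, for every class, how many files of each format are present and whether the classes are balanced.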

In [None]:
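# Note: train_df (paths and labels of the training images) is built in a later cell; run that cell first.
fig = plt.figure(figsize = (10,5))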

for i in range(1, 11):    
    index = np.random.randint(len(train_df))
    
    image_path = train_df['path'][index]
    category = train_df['label'][index]
    
    image = np.asarray(Image.open(image_path))
    plt.subplot(2,5,i)
    plt.imshow(image)
    plt.axis('off')
    plt.title(category)
    
plt.show()

The images in the dataset vary widely. Some show several dishes at once, while others contain text or people. In some, the fast food item is very small and most of the image is background.

The KNN algorithm can be used for classification. To do this, each image has to be loaded separately and converted to a flat list of numbers. The categories also have to be mapped to numeric equivalents, which is what LabelEncoder() is used for.
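
The two preparation steps can be illustrated on toy data. The sketch below is an assumption-laden illustration, not the notebook's pipeline: the arrays stand in for loaded images, and `np.unique` is used only to mimic the sorted name-to-integer mapping that LabelEncoder produces.

```python
import numpy as np

# Two toy "images" of shape (height, width, channels); real ones would come from load_img.
toy_images = [np.zeros((4, 4, 3)), np.ones((4, 4, 3))]
toy_labels = ['Taco', 'Burger']

# Each image becomes one row of 4 * 4 * 3 = 48 numbers.
X = np.vstack([img.flatten() for img in toy_images])

# Sorted unique names get the integers 0..n-1, the same convention LabelEncoder uses.
classes, y = np.unique(toy_labels, return_inverse=True)

print(X.shape)   # (2, 48)
print(classes)   # ['Burger' 'Taco']
print(y)         # [1 0]
```

With the 256x256 RGB images used for KNN below, every row has 256 * 256 * 3 = 196608 values, which is one reason distance-based methods struggle here.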

In [None]:
def get_images_for_KNN(base_path, categories, n_sample):
    images = []
    labels = []
    
    for category in categories:
        path = os.path.join(base_path, category)
        files = os.listdir(path)
        
        for i in range(n_sample):
            img = os.path.join(path, files[i])
            if img[-4:] == 'jpeg':
                img = load_img(img, target_size = (256, 256))
                img = img_to_array(img)
                images.append(img.flatten())
                labels.append(category)
    
    images = np.vstack(images)
    
    return (np.array(images), np.array(labels))

In [None]:
X_train, Y_train = get_images_for_KNN(train_path, labels, 500)

In [None]:
X_train.shape

In [None]:
X_test, Y_test = get_images_for_KNN(test_path, labels, 100)

In [None]:
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(Y_train)

In [None]:
Y_test = label_encoder.transform(Y_test)

In [None]:
KNN = KNeighborsClassifier(n_neighbors = 5)

In [None]:
KNN.fit(X_train, labels)

In [None]:
preds = KNN.predict(X_test)

In [None]:
print(classification_report(Y_test, preds, target_names = label_encoder.classes_))

In [None]:
accuracy_score(Y_test, preds)

In [None]:
KNN10 = KNeighborsClassifier(n_neighbors = 10)

In [None]:
KNN10.fit(X_train, labels)

In [None]:
preds = KNN10.predict(X_test)

In [None]:
print(classification_report(Y_test, preds, target_names = label_encoder.classes_))

In [None]:
accuracy_score(Y_test, preds)

In [None]:
KNN20 = KNeighborsClassifier(n_neighbors = 20)

In [None]:
KNN20.fit(X_train, labels)

In [None]:
preds = KNN20.predict(X_test)

In [None]:
print(classification_report(Y_test, preds, target_names = label_encoder.classes_))

In [None]:
accuracy_score(Y_test, preds)

In [None]:
KNN30 = KNeighborsClassifier(n_neighbors = 30)

In [None]:
KNN30.fit(X_train, labels)

In [None]:
preds = KNN30.predict(X_test)

In [None]:
print(classification_report(Y_test, preds, target_names = label_encoder.classes_))

In [None]:
accuracy_score(Y_test, preds)

As the results show, the accuracies obtained on the test set are very low. Choosing this kind of algorithm for an image classification problem is not the best idea.

In [None]:
def get_images_for_ANN(base_path, categories, n_sample):
    images = []
    labels = []
    
    for category in categories:
        path = os.path.join(base_path, category)
        files = os.listdir(path)
        
        for i in range(n_sample):
            img = os.path.join(path, files[i])
            if img[-4:] == 'jpeg':
                img = load_img(img, target_size = (64, 64))
                img = img_to_array(img)
                images.append(img.flatten())
                labels.append(category)
    
    return (np.array(images), np.array(labels))

In [None]:
categories = os.listdir(train_path)

In [None]:
X_train, Y_train = get_images_for_ANN(train_path, categories, 500)

In [None]:
X_train.shape

In [None]:
label_encoder = LabelEncoder()
Y_train = label_encoder.fit_transform(Y_train)

In [None]:
Y_train

In [None]:
X_val, Y_val = get_images_for_ANN(val_path, categories, 200)

In [None]:
Y_val = label_encoder.transform(Y_val)

In [None]:
X_test, Y_test = get_images_for_ANN(test_path, categories, 100)

In [None]:
Y_test = label_encoder.transform(Y_test)

In [None]:
Y_test

In [None]:
ANN = tf.keras.models.Sequential([
    tf.keras.layers.Dense(12288, input_shape = (12288,)),
    
    tf.keras.layers.Dense(8192, activation = 'relu'),
    #tf.keras.layers.Dense(8192, activation = 'relu'),
    
    tf.keras.layers.Dense(4096, activation = 'relu'),
    #tf.keras.layers.Dense(4096, activation = 'relu'),
    
    tf.keras.layers.Dense(2056, activation = 'relu'),
    tf.keras.layers.Dense(1024, activation = 'relu'),
    tf.keras.layers.Dense(256, activation = 'relu'),
    tf.keras.layers.Dense(10, activation = 'softmax')
])

In [None]:
ANN.summary()

In [None]:
ANN.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(), optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
ANN.fit(X_train, Y_train, batch_size = 5, epochs = 30)

A network with this topology was unable to learn to classify the images at all. It is quite possible that the model is simply too small to solve such a complex problem.

In [None]:
es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 1, patience = 5)

In [None]:
ANN_v2 = tf.keras.models.Sequential([
    tf.keras.layers.Dense(12288, input_shape = (12288,)),
    
    tf.keras.layers.Dense(8192, activation = 'relu'),
    
    tf.keras.layers.Dense(4096, activation = 'relu'),
    tf.keras.layers.Dense(4096, activation = 'relu'),
    
    tf.keras.layers.Dense(2056, activation = 'relu'),
    tf.keras.layers.Dense(2056, activation = 'relu'),
    
    tf.keras.layers.Dense(1024, activation = 'relu'),
    tf.keras.layers.Dense(256, activation = 'relu'),
    tf.keras.layers.Dense(10, activation = 'softmax')
])

In [None]:
ANN_v2.summary()

In [None]:
ANN_v2.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(), optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = ANN_v2.fit(X_train, Y_train, batch_size = 5, validation_data = (X_val, Y_val), epochs = 30, callbacks = [es])

Adding a few layers had no effect. Sometimes changing the layers' activation function or the optimizer helps to get a better result.

In [None]:
ANN_v3 = tf.keras.models.Sequential([
    tf.keras.layers.Dense(12288, input_shape = (12288,)),
    
    tf.keras.layers.Dense(8192, activation = 'tanh'),
    
    tf.keras.layers.Dense(4096, activation = 'tanh'),
    tf.keras.layers.Dense(4096, activation = 'tanh'),
    
    tf.keras.layers.Dense(2056, activation = 'tanh'),
    tf.keras.layers.Dense(2056, activation = 'tanh'),
    
    tf.keras.layers.Dense(1024, activation = 'tanh'),
    tf.keras.layers.Dense(1024, activation = 'tanh'),
    
    tf.keras.layers.Dense(256, activation = 'tanh'),
    tf.keras.layers.Dense(64, activation = 'tanh'),
    tf.keras.layers.Dense(10, activation = 'softmax')
])

In [None]:
ANN_v3.summary()

In [None]:
ANN_v3.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(), optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = ANN_v3.fit(X_train, Y_train, batch_size = 5, validation_data = (X_val, Y_val), epochs = 30, callbacks = [es])

As can be seen, changing the activation function did not help either. The models above were trained on pixel values in the range [0, 255]. One option is to normalize the data by scaling the values to the range [0, 1].

In [None]:
X_train_norm = X_train/255
X_val_norm = X_val/255

In [None]:
ANN_NORM = tf.keras.models.Sequential([
    tf.keras.layers.Dense(12288, input_shape = (12288,)),
    
    tf.keras.layers.Dense(8192, activation = 'tanh'),
    
    tf.keras.layers.Dense(4096, activation = 'tanh'),
    tf.keras.layers.Dense(4096, activation = 'tanh'),
    
    tf.keras.layers.Dense(2056, activation = 'tanh'),
    tf.keras.layers.Dense(2056, activation = 'tanh'),
    
    tf.keras.layers.Dense(1024, activation = 'tanh'),
    tf.keras.layers.Dense(1024, activation = 'tanh'),
    
    tf.keras.layers.Dense(256, activation = 'tanh'),
    tf.keras.layers.Dense(64, activation = 'tanh'),
    tf.keras.layers.Dense(10, activation = 'softmax')
])

In [None]:
ANN_NORM.summary()

In [None]:
ANN_NORM.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(), optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = ANN_NORM.fit(X_train_norm, Y_train, batch_size = 5, validation_data = (X_val_norm, Y_val), epochs = 30, callbacks = [es])

This had no effect either. That is not surprising, because the problem is hard. Using fully connected artificial neural networks (ANN) is also not the best idea here. Loading images as 1D lists of pixel values discards, among other things, information about where the object is located in the image. It is not a good approach in terms of optimization either: training this kind of network requires a very large number of parameters. Just feeding in a 64x64x3 image requires 12288 input neurons. A much better approach is to use convolutional neural networks. They were designed to extract features from images, which makes them suitable for image classification. Their main advantage over ANNs is the reduction in the number of parameters required.
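
The difference in parameter counts can be made concrete with a little arithmetic. The sketch below compares the first Dense layer of the ANN models above with a convolutional layer of 32 filters of size 11x11 on an RGB input; the helper names are hypothetical and the formulas are the standard weight-plus-bias counts.

```python
def dense_params(n_inputs, n_units):
    # one weight per input-unit pair plus one bias per unit
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

first_dense = dense_params(12288, 12288)     # Dense(12288) on a flattened 64x64x3 image
first_conv = conv2d_params(11, 11, 3, 32)    # 32 filters of size 11x11 on an RGB image

print(first_dense)   # 151007232
print(first_conv)    # 11648
```

A single fully connected layer therefore needs roughly 151 million parameters, while the convolutional layer needs fewer than 12 thousand, because its filters are shared across all image positions.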

In [None]:
file_paths = []
labels = []

categories = os.listdir(train_path)

for category in categories:
    path = os.path.join(train_path, category)
    for file in os.listdir(path):
        if file[-4:] == 'jpeg': 
            file_paths.append(os.path.join(path, file))
            labels.append(category)
        
train_df = pd.DataFrame({'path' : file_paths, 'label' : labels})

In [None]:
train_df

In [None]:
file_paths = []
labels = []

categories = os.listdir(val_path)

for category in categories:
    path = os.path.join(val_path, category)
    for file in os.listdir(path):
        if file[-4:] == 'jpeg': 
            file_paths.append(os.path.join(path, file))
            labels.append(category)
        
val_df = pd.DataFrame({'path' : file_paths, 'label' : labels})

In [None]:
val_df

In [None]:
train_df.groupby(['label']).count().plot.barh(color = 'skyblue', edgecolor = 'black')

In [None]:
val_df.groupby(['label']).count().plot.barh(color = 'skyblue', edgecolor = 'black')

In [None]:
train_generator = ImageDataGenerator().flow_from_dataframe(train_df, 
                                                    x_col = 'path',
                                                    y_col = 'label',
                                                    target_size=(256,256),
                                                    color_mode = 'rgb',
                                                    classes = categories,
                                                    batch_size = 32, 
                                                    class_mode='categorical')

In [None]:
validation_generator = ImageDataGenerator().flow_from_dataframe(val_df,
                                                        x_col = 'path',
                                                        y_col = 'label',
                                                        target_size=(256,256),
                                                        color_mode = 'rgb',
                                                        classes = categories,
                                                        batch_size = 32, 
                                                        class_mode='categorical')

In [None]:
CNN = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN.summary()

The network topology was chosen to account for the large background area in the images, which is why the first convolutional layer uses large filters. This is meant to discard unneeded information. Subsequent layers use progressively smaller filters to extract as much detail from the image as possible, and the number of filters grows accordingly. MaxPooling layers are placed between the convolutional layers to reduce the size of the feature maps. After feature extraction, once the spatial size has been heavily reduced, the feature maps pass through a flattening layer so that the resulting vector can be fed to the fully connected layers for classification.
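
How quickly the spatial size shrinks can be traced by hand. The sketch below assumes the defaults used in the model above (stride 1 and 'valid' padding for Conv2D, a 2x2 window for MaxPooling2D); the helper names are hypothetical.

```python
def conv_out(size, kernel):
    # 'valid' padding with stride 1, the Conv2D defaults
    return size - kernel + 1

def pool_out(size, pool=2):
    # a 2x2 max pooling window with 'valid' padding floors odd sizes
    return size // pool

size = 256
trace = []
for kernel in (11, 5, 5, 3, 3):   # kernel sizes of the five Conv2D layers
    size = pool_out(conv_out(size, kernel))
    trace.append(size)

flattened = size * size * 128     # 128 filters in the last convolutional layer

print(trace)       # [123, 59, 27, 12, 5]
print(flattened)   # 3200
```

The 256x256 input is thus reduced to 5x5 feature maps, and the Flatten layer hands a vector of 3200 values to the first Dense layer, which should agree with the output of `CNN.summary()`.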

In [None]:
CNN.compile(loss = tf.keras.losses.CategoricalCrossentropy(), optimizer = tf.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = CNN.fit(train_generator, validation_data = validation_generator, epochs = 50)

In [None]:
CNN.save('fastfood_cnn.h5')

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
def get_images_to_predict(base_path, categories, rescale = False):
    images = []
    labels = []
    
    for category in categories:
        path = os.path.join(base_path, category)
        
        for img in os.listdir(path):
            img = os.path.join(path, img)
            img = load_img(img, target_size = (256, 256))
            img = img_to_array(img)
            img = np.expand_dims(img, axis=0)
            if rescale == True:
                img = img/255
            images.append(img)
            labels.append(category)
    
    images = np.vstack(images)
    
    return (images, labels)

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories)

In [None]:
CNN = load_model('fastfood_cnn.h5')

In [None]:
preds = CNN.predict(X_test)

In [None]:
def probabilities_to_labels(preds):
    # This hard-coded order must match `categories`, the class order passed to the generators.
    labels = ['Donut','Sandwich','Hot Dog','Burger','Crispy Chicken','Fries','Baked Potato','Taco','Pizza','Taquito']
    predictions = []

    for probabilities in preds:
        index = np.argmax(probabilities)
        predictions.append(labels[index])
        
    return predictions

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
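# Pass labels explicitly so rows and columns follow the same order as the heatmap tick labels.
cm = confusion_matrix(Y_test, predictions, labels = categories)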

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

In [None]:
accuracy_score(y_true = Y_test, y_pred = predictions)

The result is not satisfactory, but it is better than the KNN and ANN results. The network overfitted very quickly.

In [None]:
train_datagen_norm = ImageDataGenerator(rescale = 1./255)

train_generator_norm = train_datagen_norm.flow_from_dataframe(train_df, 
                                                    x_col = 'path',
                                                    y_col = 'label',
                                                    target_size=(256,256),
                                                    color_mode = 'rgb',
                                                    batch_size = 32,
                                                    classes = categories,
                                                    class_mode='categorical')

In [None]:
val_datagen_norm = ImageDataGenerator(rescale = 1./255)

validation_generator_norm = val_datagen_norm.flow_from_dataframe(val_df, 
                                                        x_col = 'path',
                                                        y_col = 'label',
                                                        target_size=(256,256),
                                                        color_mode = 'rgb',
                                                        batch_size = 32,
                                                        classes = categories,
                                                        class_mode='categorical')

In [None]:
CNN_Norm = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN_Norm.summary()

In [None]:
CNN_Norm.compile(loss = 'CategoricalCrossentropy', optimizer = tf.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 1, patience = 5)

In [None]:
history = CNN_Norm.fit(train_generator_norm, validation_data = validation_generator_norm, epochs = 50, callbacks = [es])

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
CNN_Norm.save('fastfood_cnn_norm.h5')

In [None]:
CNN_Norm = load_model('fastfood_cnn_norm.h5')

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories, rescale = True)

In [None]:
preds = CNN_Norm.predict(X_test)

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
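# Pass labels explicitly so rows and columns follow the same order as the heatmap tick labels.
cm = confusion_matrix(y_true = Y_test, y_pred = predictions, labels = categories)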

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

Applying normalization only made the result worse. Several techniques can be used to improve the network's accuracy and reduce overfitting. The following models rely on Dropout and BatchNormalization layers.
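
As a rough illustration of what these two layers compute during training, here is a simplified NumPy sketch. It is not the Keras implementation: the real BatchNormalization layer also learns a scale and a shift and keeps moving averages for inference, and the function names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, rate, rng):
    # Zero each activation with probability `rate` and rescale the rest,
    # so the expected value of the output matches the input.
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

def batchnorm_train(x, eps=1e-3):
    # Normalize every feature (column) with the statistics of the current batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = rng.normal(loc=5.0, scale=2.0, size=(32, 4))   # 32 samples, 4 features
dropped = dropout_train(batch, rate=0.2, rng=rng)
normalized = batchnorm_train(batch)
```

Dropout injects noise so the network cannot rely on any single activation, while batch normalization keeps each feature close to zero mean and unit variance within a batch.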

In [None]:
CNN_Dropout = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN_Dropout.summary()

In [None]:
CNN_Dropout.compile(loss = 'CategoricalCrossentropy', optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = CNN_Dropout.fit(train_generator, validation_data = validation_generator, epochs = 50, callbacks = [es])

In [None]:
CNN_Dropout.save('fastfood_cnn_dropout.h5')

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories, rescale = False)

In [None]:
preds = CNN_Dropout.predict(X_test)

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
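# Pass labels explicitly so rows and columns follow the same order as the heatmap tick labels.
cm = confusion_matrix(Y_test, predictions, labels = categories)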

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

In [None]:
accuracy_score(y_true = Y_test, y_pred = predictions)

Adding Dropout layers gained a few percentage points of accuracy.

In [None]:
CNN_Batchnorm = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN_Batchnorm.summary()

In [None]:
CNN_Batchnorm.compile(loss = 'CategoricalCrossentropy', optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = CNN_Batchnorm.fit(train_generator, validation_data = validation_generator, epochs = 50, callbacks = [es])

In [None]:
CNN_Batchnorm.save('fastfood_cnn_batchnorm.h5')

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories, rescale = False)

In [None]:
preds = CNN_Batchnorm.predict(X_test)

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
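# Pass labels explicitly so rows and columns follow the same order as the heatmap tick labels.
cm = confusion_matrix(Y_test, predictions, labels = categories)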

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

In [None]:
accuracy_score(Y_test, predictions)

Using BatchNormalization layers gave an even better result. The model's accuracy rose to 53%. Even so, overfitting is still a problem. Data augmentation can help prevent it. It consists of applying various transformations to the input images, including rotation, vertical or horizontal flips, and shifts. With augmentation the model does not see exactly the same images during training, which lowers the chance of overfitting.
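
What these transformations do to the pixel grid is easiest to see on a tiny array. The sketch below uses plain NumPy on a 3x3 toy image; it only illustrates the idea and is not how ImageDataGenerator is implemented, which applies random angles and shifts with interpolation.

```python
import numpy as np

# A tiny 3x3 single-channel "image" so the effect of each transform is easy to read.
toy_image = np.arange(9).reshape(3, 3)

horizontal_flip = np.fliplr(toy_image)   # mirror left-right
vertical_flip = np.flipud(toy_image)     # mirror top-bottom
rotated_90 = np.rot90(toy_image)         # rotate by 90 degrees counter-clockwise

# Shift one pixel to the right and repeat the edge column to fill the gap,
# similar in spirit to filling with the nearest pixel value.
shifted = np.empty_like(toy_image)
shifted[:, 1:] = toy_image[:, :-1]
shifted[:, 0] = toy_image[:, 0]

print(horizontal_flip[0])   # [2 1 0]
print(vertical_flip[0])     # [6 7 8]
print(shifted[0])           # [0 0 1]
```

Each variant keeps the class label unchanged, so a single training image effectively yields many slightly different examples.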

In [None]:
train_datagen_aug = ImageDataGenerator(rotation_range = 30,
                                       width_shift_range = 0.1,
                                       height_shift_range = 0.1,
                                       fill_mode = 'nearest',
                                       zoom_range = 0.1,
                                       horizontal_flip = True,
                                       vertical_flip = True)

train_generator_aug = train_datagen_aug.flow_from_dataframe(train_df,
                                                    x_col = 'path',
                                                    y_col = 'label',
                                                    target_size=(256,256),
                                                    color_mode = 'rgb',
                                                    classes = categories,
                                                    batch_size = 32, 
                                                    class_mode='categorical')

In [None]:
image = np.expand_dims(image, axis = 0)

In [None]:
image.shape

In [None]:
i = 0
for batch in train_datagen_aug.flow(image, batch_size = 1):
    plt.figure(i)
    imgplot = plt.imshow(tf.keras.utils.array_to_img(batch[0]))
    plt.axis('off')
    i += 1
    if i % 4 == 0:
        break
        
plt.show()

In [None]:
CNN_Batchnorm_Aug = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN_Batchnorm_Aug.summary()

In [None]:
CNN_Batchnorm_Aug.compile(loss = 'CategoricalCrossentropy', optimizer = tf.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 1, patience = 8)

In [None]:
history = CNN_Batchnorm_Aug.fit(train_generator_aug, validation_data = validation_generator, epochs = 80, callbacks = [es])

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
CNN_Batchnorm_Aug.save('fastfood_cnn_batchn_aug.h5')

In [None]:
CNN_Batchnorm_Aug = load_model('fastfood_cnn_batchn_aug.h5')

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories)

In [None]:
preds = CNN_Batchnorm_Aug.predict(X_test)

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
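# Pass labels explicitly so rows and columns follow the same order as the heatmap tick labels.
cm = confusion_matrix(y_true = Y_test, y_pred = predictions, labels = categories)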

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

In [None]:
accuracy_score(Y_test, predictions)

The plots above still indicate that the model is overfitting. Nevertheless, prediction accuracy on the test set increased considerably, to 66%. Some fast foods are classified very well. The main problems are pizza, taquito, and hot dog.
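
Which classes cause trouble can be read from a confusion matrix by computing per-class recall and precision. The sketch below uses a made-up 3-class matrix purely for illustration; the numbers are not this notebook's results, and the variable names are hypothetical.

```python
import numpy as np

# Toy confusion matrix (rows = true class, columns = predicted class).
# The numbers are invented for illustration only.
class_names = ['Burger', 'Pizza', 'Taco']
cm_toy = np.array([[45,  3,  2],
                   [10, 25, 15],
                   [ 4,  6, 40]])

recall = cm_toy.diagonal() / cm_toy.sum(axis=1)      # correct / all true samples of the class
precision = cm_toy.diagonal() / cm_toy.sum(axis=0)   # correct / all predictions of the class

for name, r, p in zip(class_names, recall, precision):
    print(f'{name}: recall={r:.2f}, precision={p:.2f}')
```

A class with low recall, like the middle row in this toy matrix, is one the model frequently mistakes for something else; the off-diagonal entries in its row show which classes it is confused with.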

In [None]:
CNN_Dropout_Aug = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 64, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    #tf.keras.layers.Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN_Dropout_Aug.summary()

In [None]:
CNN_Dropout_Aug.compile(loss = 'CategoricalCrossentropy', optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = CNN_Dropout_Aug.fit(train_generator_aug, validation_data = validation_generator, epochs = 80, callbacks = [es])

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
CNN_Dropout_Aug.save('fastfood_cnn_drop_aug.h5')

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories)

In [None]:
preds = CNN_Dropout_Aug.predict(X_test)

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
cm = confusion_matrix(Y_test, predictions)

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

In [None]:
accuracy_score(Y_test, predictions)

In this case data augmentation did not lead to better results.

To improve the model, one can try tuning its hyperparameters. One of them is the learning rate, which is varied here with LearningRateScheduler. The learning rate depends on the epoch: the network trains with one rate for several epochs, after which the rate is changed for the next several epochs.
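
A step-decay schedule of this kind can be written compactly. The symbols are notation introduced here for illustration: \eta_0 is the initial learning rate, d the drop factor, E the number of epochs between drops, and e the zero-based epoch index.

```latex
\[
\eta(e) = \eta_0 \cdot d^{\left\lfloor \frac{1 + e}{E} \right\rfloor}
\]
```

For example, with \eta_0 = 0.1, d = 0.5 and E = 5 the rate is 0.1 for epochs 0 to 3, 0.05 for epochs 4 to 8, 0.025 for epochs 9 to 13, and so on.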

In [None]:
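def step_decay(epoch):
    """Halve the learning rate every `epochs_drop` epochs (epoch is zero-based)."""
    initial_lrate = 0.1   # learning rate used in the first epochs
    drop = 0.5            # multiplicative factor applied at each step
    epochs_drop = 5.0     # number of epochs between consecutive drops
    lrate = initial_lrate * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate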

In [None]:
epochs = np.arange(50)  # Keras passes zero-based epoch indices to the scheduler
learning_rate = [step_decay(epoch) for epoch in epochs]

In [None]:
plt.plot(epochs, learning_rate, 'o')

In [None]:
np.max(learning_rate), np.min(learning_rate)

In [None]:
lr = LearningRateScheduler(step_decay)

In [None]:
history = CNN_Batchnorm_Aug.fit(train_generator_aug, validation_data = validation_generator, epochs = 50, callbacks = [lr])

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
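# The key is 'lr' in TF 2.x Keras and 'learning_rate' in Keras 3.
l_r = history.history.get('lr', history.history.get('learning_rate'))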

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc, 'blue', label = 'accuracy')
plt.plot(epochs, val_acc, 'orange', label = 'validation accuracy')
plt.plot(epochs, l_r, 'green', label = 'learning rate')
plt.legend()

In [None]:
plt.plot(epochs, loss, 'blue', label = 'loss')
plt.plot(epochs, val_loss, 'orange', label = 'validation loss')
plt.plot(epochs, l_r, 'green', label = 'learning rate')
plt.legend()

As the plots show, none of the learning rates stands out with a lower loss or a higher accuracy. This may follow from the optimizer used. Adam is an adaptive optimizer: each parameter is updated with its own effective learning rate, and the rate given when the optimizer is created acts roughly as an upper bound on that step, so adding a LearningRateScheduler does not always help. With this optimizer, the parameters usually tuned instead are beta_1 and beta_2, which control the estimates of the first and second moments of the gradient.
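
The per-parameter behaviour described above can be illustrated with a single Adam update written in NumPy. This is a minimal sketch of the textbook update rule, not the Keras implementation. The function `adam_step` and the gradient values are hypothetical; the defaults mirror the commonly used `beta_1 = 0.9` and `beta_2 = 0.999`.

```python
import numpy as np

def adam_step(grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update for a vector of parameters; returns (delta, m, v)."""
    m = beta_1 * m + (1.0 - beta_1) * grad        # first-moment (mean) estimate
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta_1 ** t)               # bias correction, t starts at 1
    v_hat = v / (1.0 - beta_2 ** t)
    delta = -lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step
    return delta, m, v

# Hypothetical gradients that differ by six orders of magnitude.
grad = np.array([1e-3, 1.0, -1e3])
delta, m, v = adam_step(grad, m=np.zeros(3), v=np.zeros(3), t=1, lr=0.1)

# On the first step every parameter moves by roughly lr, whatever the gradient scale.
print(delta)
```

In Keras the same quantities are passed to the optimizer's constructor, for example `tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)`.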

In [None]:
CNN_Batchnorm_Aug_v2 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 96, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 96, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN_Batchnorm_Aug_v2.summary()

In [None]:
CNN_Batchnorm_Aug_v2.compile(loss = 'CategoricalCrossentropy', optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
es = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose = 1, patience = 8)

In [None]:
history = CNN_Batchnorm_Aug_v2.fit(train_generator_aug, validation_data = validation_generator, epochs = 100, callbacks = [es])

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
CNN_Batchnorm_Aug_v2.save('fastfood_cnn_batchn_aug_v2.h5')

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories)

In [None]:
preds = CNN_Batchnorm_Aug_v2.predict(X_test)

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
cm = confusion_matrix(Y_test, predictions)

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

In [None]:
accuracy_score(Y_test, predictions)

In [None]:
train_datagen_aug_norm = ImageDataGenerator(rescale = 1./255,
                                       rotation_range = 30,
                                       width_shift_range = 0.1,
                                       height_shift_range = 0.1,
                                       fill_mode = 'nearest',
                                       zoom_range = 0.1,
                                       horizontal_flip = True,
                                       vertical_flip = True)

train_generator_aug_norm = train_datagen_aug_norm.flow_from_dataframe(train_df,
                                                    x_col = 'path',
                                                    y_col = 'label',
                                                    target_size=(256,256),
                                                    color_mode = 'rgb',
                                                    classes = categories,
                                                    batch_size = 32, 
                                                    class_mode='categorical')

In [None]:
CNN_Batchnorm_Aug_v3 = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters = 32, kernel_size = (11,11), activation = 'relu', input_shape = (256, 256, 3)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 96, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 96, kernel_size = (5,5), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.Conv2D(filters = 256, kernel_size = (3,3), activation = 'relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D((2,2)),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units = 1024, activation = 'relu'),
    tf.keras.layers.Dense(units = 256, activation = 'relu'),
    tf.keras.layers.Dense(units = 10, activation = 'softmax')
])

In [None]:
CNN_Batchnorm_Aug_v3.compile(loss = 'CategoricalCrossentropy', optimizer = tf.keras.optimizers.Adam(), metrics = ['accuracy'])

In [None]:
history = CNN_Batchnorm_Aug_v3.fit(train_generator_aug_norm, validation_data = validation_generator_norm, epochs = 100, callbacks = [es])

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

In [None]:
plt.plot(epochs, acc)
plt.plot(epochs, val_acc)
plt.title('Accuracy vs Validation accuracy')

In [None]:
plt.plot(epochs, loss)
plt.plot(epochs, val_loss)
plt.title('Loss vs Validation loss')

In [None]:
CNN_Batchnorm_Aug_v3.save('fastfood_cnn_batchn_aug_v3.h5')

In [None]:
X_test, Y_test = get_images_to_predict(test_path, categories, rescale = True)

In [None]:
preds = CNN_Batchnorm_Aug_v3.predict(X_test)

In [None]:
predictions = probabilities_to_labels(preds)

In [None]:
cm = confusion_matrix(Y_test, predictions)

In [None]:
sns.heatmap(cm, cmap='Blues', xticklabels = categories, yticklabels = categories, annot = True, cbar = False, fmt=".1f")

In [None]:
accuracy_score(Y_test, predictions)

Neither using more filters nor normalizing the pixel values brought significantly better results.

The best classifier turned out to be the convolutional network with batch normalization layers, trained with data augmentation. Its accuracy on the test set was about 66%. Overfitting remains a major problem: it was not eliminated despite the several techniques applied against it.
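
The degree of overfitting can also be summarized numerically from a training history, as the gap between training and validation accuracy at the epoch with the lowest validation loss. The sketch below is an illustration only. The helper `summarize_overfitting` and the curves in `example_history` are hypothetical; the dictionary merely has the same keys as the `history.history` dict used in this notebook.

```python
def summarize_overfitting(history_dict):
    """Return the epoch with the lowest validation loss and the accuracy gap there."""
    val_loss = history_dict['val_loss']
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = history_dict['accuracy'][best_epoch] - history_dict['val_accuracy'][best_epoch]
    return best_epoch, gap

# Hypothetical training curves, not results from this notebook.
example_history = {
    'accuracy':     [0.40, 0.60, 0.75, 0.88, 0.95],
    'val_accuracy': [0.38, 0.55, 0.62, 0.64, 0.63],
    'loss':         [1.70, 1.10, 0.70, 0.35, 0.15],
    'val_loss':     [1.75, 1.30, 1.10, 1.20, 1.45],
}

best_epoch, gap = summarize_overfitting(example_history)
print(best_epoch, round(gap, 2))
```

A gap that keeps growing after the best epoch, while validation loss rises, is the pattern that the accuracy and loss plots above show.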

- https://keras.io/api/
- https://www.tensorflow.org/api_docs/python/tf/all_symbols
- https://scikit-learn.org/stable/
- https://numpy.org/doc/stable/reference/index.html#reference
- https://pandas.pydata.org/docs/
- Francois Chollet 'Deep Learning. Praca z językiem Python i biblioteką Keras'
- https://gist.github.com/ritiek/5fa903f97eb6487794077cf3a10f4d3e
- https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/
- https://machinelearningmastery.com/using-learning-rate-schedules-deep-learning-models-python-keras/
- https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/
- https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
- https://neptune.ai/blog/how-to-choose-a-learning-rate-scheduler
- https://optimization.cbe.cornell.edu/index.php?title=Adam
- https://stackoverflow.com/questions/39517431/should-we-do-learning-rate-decay-for-adam-optimizer
- https://pyimagesearch.com/2016/08/08/k-nn-classifier-for-image-classification/
