<h1 align='center'>Car recognition (Stanford Cars)</h1>

Patryk Kośmider s16863 and Krzysztof Marek s16663

In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras import layers
from tensorflow.keras import models
import scipy.io as sio
from PIL import Image

We load the prepared dataset.

In [None]:
WIDTH = 64
HEIGHT = 64

def get_image_as_array(path, x1, y1, x2, y2):
    # Crop the image to the bounding box and resize it to WIDTH x HEIGHT in one step.
    with Image.open(path) as img:
        # convert('RGB') leaves RGB images unchanged and expands other modes to 3 channels.
        resized = img.convert('RGB').resize((WIDTH, HEIGHT), Image.NEAREST, box=(x1, y1, x2, y2))
        return np.asarray(resized).tolist()

def get_data():
    annos = sio.loadmat('cars_annos.mat')

    X_train = []
    X_test = []
    y_train = []
    y_test = []

    for i, record in enumerate(annos['annotations'][0]):
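        # The split flag stored in the annotations (record[-1][0][0]) is left unused;
        # instead, every fifth record goes to the test set (roughly an 80/20 split).
        if (i % 5) < 4: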
            X_train.append(get_image_as_array(record[0][0], record[1][0], record[2][0], record[3][0], record[4][0]))
            y_train.append(int(record[-2][0][0] - 1))
        else:
            X_test.append(get_image_as_array(record[0][0], record[1][0], record[2][0], record[3][0], record[4][0]))
            y_test.append(int(record[-2][0][0] - 1))

    classes = [ record[0] for record in annos['class_names'][0] ]

    return np.array(X_train), np.array(y_train), np.array(X_test), np.array(y_test), classes
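
The `i % 5` test in `get_data` sends every fifth record to the test set and the rest to the training set. A minimal sketch of that index rule in isolation, using a hypothetical helper `split_indices` that is not part of the notebook:

```python
def split_indices(n):
    """Return (train, test) index lists using the i % 5 rule from get_data."""
    train = [i for i in range(n) if (i % 5) < 4]
    test = [i for i in range(n) if (i % 5) == 4]
    return train, test

# Ten hypothetical records: indices 4 and 9 end up in the test set.
train_idx, test_idx = split_indices(10)
print(train_idx)  # [0, 1, 2, 3, 5, 6, 7, 8]
print(test_idx)   # [4, 9]
```

Because the rule depends only on the record index, the split is the same on every run.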

In [None]:
X_train, y_train, X_test, y_test, classes = get_data()

A function for displaying an image.

In [None]:
def show_image(X, y, index):
    plt.xticks([])
    plt.yticks([])
    plt.imshow(X[index], interpolation = 'nearest')
    plt.xlabel(classes[y[index]])

In [None]:
show_image(X_train, y_train, 0)

Normalizing the training and test data: pixel values are scaled from the integer range 0-255 to floats in [0, 1]. Operations on integers are faster but less precise; better results are achieved with floating-point values.

In [None]:
X_train = X_train / 255.0
X_test = X_test / 255.0
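
To illustrate what the division does, here is a small sketch on a made-up 2x2 RGB image (the `pixels` array is hypothetical, not taken from the dataset). Dividing an integer array by `255.0` produces a floating-point array with values between 0 and 1:

```python
import numpy as np

# Hypothetical 2x2 RGB image with 8-bit channel values.
pixels = np.array([[[0, 128, 255], [64, 64, 64]],
                   [[255, 255, 255], [10, 20, 30]]], dtype=np.uint8)

# True division by a float promotes the integer array to float64.
scaled = pixels / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```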

We create a convolutional neural network (CNN) model.

In [None]:
cnn = models.Sequential([
    layers.Conv2D(filters = 32, kernel_size = (3, 3), activation = 'relu', input_shape = X_train[0].shape),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(filters = 64, kernel_size = (3, 3), activation = 'relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(filters = 128, kernel_size = (3, 3), activation = 'relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),
    layers.Dense(len(classes) * 10, activation = 'relu'),
    layers.Dense(len(classes), activation = 'softmax')
])
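
It can help to trace how the spatial size shrinks through the network. The sketch below assumes the Keras defaults used above (`padding='valid'` and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`). The helpers `conv_out` and `pool_out` are hypothetical names introduced only for this illustration:

```python
def conv_out(size, kernel=3):
    # 'valid' convolution with stride 1 trims kernel - 1 pixels.
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping max pooling floors the division.
    return size // pool

size = 64  # WIDTH and HEIGHT of the input images
trace = [size]
for _ in range(3):  # three Conv2D + MaxPooling2D pairs
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

# The last convolution has 128 filters, so Flatten yields size * size * 128 values.
flat = size * size * 128
print(trace)  # [64, 62, 31, 29, 14, 12, 6]
print(flat)   # 4608
```

Under these assumptions the first `Dense` layer receives 4608 inputs per image; `cnn.summary()` can be used to check the figures against the actual model.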

We compile our model. We use `sparse_categorical_crossentropy` because each target is the index of a specific class, a single integer value, rather than a one-hot vector.

In [None]:
cnn.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = [ 'accuracy' ])
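
The difference between the sparse and one-hot forms of cross-entropy is only in how the labels are encoded. The following NumPy sketch, with made-up probabilities and labels, computes the per-sample loss both ways and shows they agree. It illustrates the formula and is not the Keras implementation:

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])  # integer class indices, as in y_train

# Sparse form: take the predicted probability of the true class directly.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels])

# One-hot form: encode the labels as vectors, then sum over classes.
one_hot = np.eye(probs.shape[1])[labels]
dense_loss = -np.sum(one_hot * np.log(probs), axis=1)

print(np.allclose(sparse_loss, dense_loss))  # True
```

With many classes, integer labels avoid building a large one-hot matrix while giving the same loss.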

We train our model.

In [None]:
cnn.fit(X_train, y_train, epochs = 10)

We check the quality of the training on the test set.

In [None]:
cnn.evaluate(X_test, y_test)

We make predictions.

In [None]:
y_pred = cnn.predict(X_test)

We pick the most probable class for each object.

In [None]:
# Index of the highest probability in each row of predictions.
y_classes = np.argmax(y_pred, axis = 1)
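
As a small illustration of this step, the sketch below applies `np.argmax` to made-up probabilities (the `probs` and `true_labels` arrays are hypothetical) and then compares the result with the true labels to get an accuracy:

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 3 classes.
probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.2, 0.6]])
true_labels = np.array([1, 2, 2])

# The most probable class is the column index of the row maximum.
predicted = np.argmax(probs, axis=1)

# Fraction of samples where the predicted class matches the label.
accuracy = np.mean(predicted == true_labels)

print(predicted)  # [1 0 2]
print(accuracy)   # 2 of 3 predictions are correct
```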

In [None]:
i = 11
show_image(X_test, y_test, i)
classes[y_classes[i]]

### Paper

3D Object Representations for Fine-Grained Categorization

Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei

4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.

https://ai.stanford.edu/~jkrause/papers/3drr13.pdf