The dataset images must be downloaded from our Google Drive, because Stanford no longer hosts them (we previously used `! wget http://ai.stanford.edu/~jkrause/car196/car_ims.tgz`).

[Download car_ims.tgz from Google Drive](https://drive.google.com/file/d/1NuxNKiw7MXEdXqVVBtBsDv38QNAh2QHT/view?usp=share_link)

Then place the archive in the same directory as the notebook. To extract it and download the CSV files, run the following commands:

In [None]:
! tar xzf car_ims.tgz

! wget "https://storage.googleapis.com/monk-public/exercice/car_classification/03_05_2022_update/test_dataset.csv" -O test_dataset.csv -nv
! wget "https://storage.googleapis.com/monk-public/exercice/car_classification/03_05_2022_update/train_dataset.csv" -O train_dataset.csv -nv
! wget "https://storage.googleapis.com/monk-public/exercice/car_classification/03_05_2022_update/validation_dataset.csv" -O validation_dataset.csv -nv
! wget "https://storage.googleapis.com/monk-public/exercice/car_classification/03_05_2022_update/class_names_dict.pkl" -O class_names_dict.pkl -nv

The annotations are split into train/validation/test sets and can be found in the exercise resources under:
- train_dataset.csv
- validation_dataset.csv
- test_dataset.csv

Annotations are saved as CSV files. You can load them with the `read_csv` function of the pandas library.

Class names are stored in the file `class_names_dict.pkl`, which you can load with the standard `pickle` library.

There are 196 car models in the dataset such as:

```
[
  'Rolls-Royce Ghost Sedan 2012',
  'BMW X6 SUV 2012',
  'Jeep Liberty SUV 2012',
  ...,
]
```

Each image is associated with a class id and a bounding box (x1,x2,y1,y2)

We will then proceed in three parts:

*   Classification using transfer learning: load a pretrained model and fine-tune it on the training data. We do so because our time is limited and we want to leverage accurate, open-source models that already exist.

*   Object detection: use the YOLO model to detect the car and its bounding box (as in the previous point, this is a well-known, high-performing model).

*   Ensemble modelling: merge both into one algorithm that, given two images, draws both bounding boxes and tells whether the two cars are the same model.




### Imports

In [None]:
import sys
!{sys.executable} -m pip install torchvision
!{sys.executable} -m pip install ultralytics

In [None]:
import os
from google.colab import drive

import cv2
import pickle
from glob import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches

from ultralytics import YOLO

from tensorflow.keras.models import load_model, Sequential, Model
from tensorflow.keras.layers import Input, Lambda, Dense, Flatten
from tensorflow.keras.applications.vgg16 import preprocess_input, VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator,load_img

I uploaded car_ims to Google Drive because downloading it in Colab was unreliable (skip the next two code cells if the download works).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import os
os.chdir('/content/drive/MyDrive/Colab Notebooks')

### Data Preparation

We create two dataframes and remap their class columns to consistent values.

In [None]:
# Load datasets
train_df = pd.read_csv('train_dataset.csv')
test_df = pd.read_csv('test_dataset.csv')

# The class ids in the training and test sets were inconsistent (some were negative); remap them to the range 0 to 195
train_df['class'] = train_df['class'].apply(lambda x: x - 1 if x > 0 else x + 255)
train_df['class'] = train_df['class'].astype(str)
test_df['class'] = test_df['class'].apply(lambda x: x - 1 if x > 0 else x + 255)
test_df['class'] = test_df['class'].astype(str)

# Save updated datasets
train_df.to_csv('updated_train.csv', index=False)
test_df.to_csv('updated_test.csv', index=False)
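
The remapping above can be checked in isolation. The sketch below assumes that the negative ids are values above 127 that wrapped around in a signed 8-bit integer; this is a guess about the cause, not something the dataset confirms. `as_int8` is a hypothetical helper used only to simulate that wrap-around.

```python
def remap_class(raw_id: int) -> int:
    """Map a raw class id from the CSV to a 0-based label."""
    # Positive ids are 1-based, so shift them down by one.
    # Negative ids are assumed to be int8 wrap-arounds (id - 256):
    # adding 256 restores the id, subtracting 1 makes it 0-based.
    return raw_id - 1 if raw_id > 0 else raw_id + 255


def as_int8(class_id: int) -> int:
    """Hypothetical helper: the value of class_id once stored as a signed 8-bit integer."""
    return class_id - 256 if class_id > 127 else class_id


# Simulate the 196 original ids (1..196) going through the assumed wrap-around.
remapped = [remap_class(as_int8(class_id)) for class_id in range(1, 197)]
print(min(remapped), max(remapped), len(set(remapped)))
```

Under that assumption, every original id maps to a distinct label between 0 and 195.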

In order to improve the robustness and generalization ability of the model, we employ data augmentation techniques on the training dataset. This helps the model learn to better handle variations and increases its ability to accurately classify unseen examples.

We apply preprocessing to both the training and test datasets for consistency and optimal model performance.


In [None]:
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

#### Load the data

In [None]:
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,
    directory=None,
    x_col='relative_im_path',
    y_col='class',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

In [None]:
test_generator = test_datagen.flow_from_dataframe(
    dataframe=test_df,
    directory=None,
    x_col='relative_im_path',
    y_col='class',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

### Classification

In this section, we leverage the power of a pretrained VGG16 model for our task. VGG16 is a convolutional neural network architecture that has been pre-trained on the ImageNet dataset, making it capable of recognizing a wide range of visual concepts.

#### Loading the Pretrained Model

We first load the VGG16 model with its pretrained ImageNet weights, without its classification head:

In [None]:
#Model config
IMAGE_SIZE = [224, 224]
vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights='imagenet', include_top=False)

#### Transfer learning

In [None]:
#Transfer Learning

#freeze the base layers:
for layer in vgg.layers:
    layer.trainable = False

# Add new layers on top of the pretrained base
x = Flatten()(vgg.output)

prediction = Dense(196, activation='softmax')(x)

model = Model(inputs=vgg.input, outputs=prediction)

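VGG16 consists of a series of convolutional and max-pooling layers followed by fully connected layers. Because we load it with `include_top=False`, the original fully connected layers are dropped; we flatten the convolutional output and add a single dense softmax layer with 196 units to adapt the classification to our problem.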
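
As a rough check of the size of the new head, the arithmetic below estimates how many parameters remain trainable once the base is frozen. It relies on two facts about the standard VGG16 architecture: five max-pooling layers, each halving the spatial size, and 512 channels in the last convolutional block.

```python
input_size = 224          # height and width of the input images
num_pool_layers = 5       # each max-pooling layer halves the spatial size
last_block_channels = 512
num_classes = 196

# Spatial size of the last feature map: 224 / 2**5
feature_map_size = input_size // 2 ** num_pool_layers

# Length of the vector produced by Flatten()
flattened_features = feature_map_size ** 2 * last_block_channels

# Dense layer: one weight per (feature, class) pair plus one bias per class
trainable_params = flattened_features * num_classes + num_classes

print(feature_map_size, flattened_features, trainable_params)
```

The trainable parameter count reported by `model.summary()` should match the last number if the model is built as described.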

Summary of the model

In [None]:
model.summary()


For training, we use the categorical cross-entropy loss, which handles multi-class classification, and the Adam optimizer, which adapts the learning rate of each parameter to help training converge efficiently.

We track accuracy because it gives a quick and clear measure of how often the model assigns an image to the correct class.
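
To make the loss and the metric concrete, here is a minimal pure-Python sketch of both on a toy batch with three classes. The probabilities and labels are made up for illustration, and the function names are hypothetical; Keras computes the same quantities internally.

```python
import math


def categorical_cross_entropy(probs, true_index):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probs[true_index])


def accuracy(batch_probs, true_indices):
    """Fraction of samples whose highest-probability class is the true class."""
    hits = sum(
        max(range(len(probs)), key=probs.__getitem__) == true_index
        for probs, true_index in zip(batch_probs, true_indices)
    )
    return hits / len(true_indices)


# Toy softmax outputs for three images and their true class indices
batch_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
true_indices = [0, 2, 1]

losses = [categorical_cross_entropy(p, t) for p, t in zip(batch_probs, true_indices)]
mean_loss = sum(losses) / len(losses)
batch_accuracy = accuracy(batch_probs, true_indices)
print(mean_loss, batch_accuracy)
```

The third sample is misclassified, so it lowers the accuracy and contributes the largest loss term.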

In [None]:
model.compile(
  loss='categorical_crossentropy',
  optimizer='adam',
  metrics=['accuracy']
)

We train the model for 30 epochs, using the test set to monitor the validation metrics.

In [None]:
# fit_generator is deprecated in TensorFlow 2; model.fit accepts generators directly
r = model.fit(
  train_generator,
  validation_data=test_generator,
  epochs=30,
  steps_per_epoch=len(train_generator),
  validation_steps=len(test_generator)
)

#### Results Analysis

We plot the loss and accuracy curves to gain insight into the model's performance and identify areas for improvement.

In [None]:
# plot the loss
plt.plot(r.history['loss'], label='train loss')
plt.plot(r.history['val_loss'], label='val loss')
plt.legend()
plt.savefig('LossVal_loss')  # save before show(), otherwise the saved figure is empty
plt.show()

# plot the accuracy
plt.plot(r.history['accuracy'], label='train acc')
plt.plot(r.history['val_accuracy'], label='val acc')
plt.legend()
plt.savefig('AccVal_acc')  # save before show(), otherwise the saved figure is empty
plt.show()

Saving the model

In [None]:
model.save('model_vgg.h5')

### Object Detection

YOLO is a state-of-the-art object detection model that localizes and classifies objects in images with high efficiency and accuracy. We use it to find the bounding box because, for this part, we do not need to distinguish specific car models; we only want to locate a car.


#### Loading the Pretrained model

In [None]:
#Model Loading
model1 = YOLO('yolov8n.pt')

#### Box Selection

The model returns one box for every object it detects, so I keep only the box with the largest area.

In [None]:
#calculate the area for a box
def calculate_box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

#choosing only the biggest box given a list of boxes by size
def filter_largest_box(boxes):
    if not boxes:
        return None

    biggest_box = boxes[0]
    max_area = calculate_box_area(biggest_box)

    for box in boxes[1:]:
        area = calculate_box_area(box)
        if area > max_area:
            max_area = area
            biggest_box = box

    return biggest_box
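
The largest box is not necessarily a car. One possible refinement, sketched here under assumptions, is to keep only detections labelled as cars before taking the largest one. The function name, the confidence threshold, and the toy detections are hypothetical; the class id 2 for "car" assumes the standard COCO label order used by the pretrained YOLOv8 weights.

```python
COCO_CAR_CLASS_ID = 2  # assumption: standard COCO label order, where 2 is "car"


def select_car_box(boxes, class_ids, confidences, min_confidence=0.25):
    """Return the largest box labelled as a car, or None if no car is detected."""
    car_boxes = [
        box
        for box, class_id, confidence in zip(boxes, class_ids, confidences)
        if int(class_id) == COCO_CAR_CLASS_ID and confidence >= min_confidence
    ]
    if not car_boxes:
        return None
    # Boxes are in (x1, y1, x2, y2) format, so the area is width * height.
    return max(car_boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


# Toy detections: a large non-car box, a car, and a low-confidence car
boxes = [[0, 0, 500, 400], [50, 60, 300, 250], [10, 10, 450, 390]]
class_ids = [7.0, 2.0, 2.0]
confidences = [0.9, 0.8, 0.1]
print(select_car_box(boxes, class_ids, confidences))
```

With Ultralytics results, the three inputs could come from `results[0].boxes.xyxy.tolist()`, `results[0].boxes.cls.tolist()`, and `results[0].boxes.conf.tolist()`.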

In [None]:
#drawing the box on the jpg
def draw_bounding_box(image_path, box, title):
    image = plt.imread(image_path)

    fig, ax = plt.subplots(1)
    ax.imshow(image)

    rect = patches.Rectangle(
        (box[0], box[1]),
        box[2] - box[0],
        box[3] - box[1],
        linewidth=2,
        edgecolor='r',
        facecolor='none'
    )


    ax.add_patch(rect)

    plt.title(title)
    plt.show()

We load the dictionary that maps car model names to class ids.

In [None]:
with open('class_names_dict.pkl', 'rb') as file:
    models = pickle.load(file)

print(list(models.keys())[list(models.values()).index(58)])
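
The lookup above scans the dictionary values on every call. A small sketch of an alternative is to invert the dictionary once, so that a name can be read directly from its id. The three entries below are a hypothetical excerpt with illustrative ids; the real mapping comes from `class_names_dict.pkl`.

```python
# Hypothetical excerpt of the name -> id dictionary (the ids are illustrative)
models_example = {
    'Rolls-Royce Ghost Sedan 2012': 1,
    'BMW X6 SUV 2012': 2,
    'Jeep Liberty SUV 2012': 3,
}

# Invert once: id -> name
id_to_name = {class_id: name for name, class_id in models_example.items()}

print(id_to_name[2])
```

This only works as intended if every id appears once in the dictionary, which is assumed here.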

#### Detection example

We run the detector on one training image and draw the largest box, titled with the name of the car model.

In [None]:
img = train_df['relative_im_path'][100]
results = model1(img, show = False)
boxes = results[0].boxes.xyxy.tolist()
largest_box = filter_largest_box(boxes)
print(boxes)
print(largest_box)

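# train_df['class'] now holds 0-based strings, while the dictionary ids run from 1 to 196
class_id = int(train_df['class'][100]) + 1
draw_bounding_box(img, largest_box, list(models.keys())[list(models.values()).index(class_id)])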

The bounding box correctly delimits the car in the image. We now merge both parts so that the algorithm also identifies the car model.

In [None]:
def load_for_classifier(img_path):
    # Match the training generator: resize to 224x224 and rescale pixels to [0, 1].
    # VGG16's preprocess_input is not applied because it was not used during training.
    img = image.load_img(img_path, target_size=(224, 224))
    return np.expand_dims(image.img_to_array(img) / 255., axis=0)


def predict_model_name(img_path):
    # flow_from_dataframe sorts the string labels, so the output index is not the class id:
    # invert class_indices to recover the 0-based label used in train_df.
    index_to_label = {index: int(label) for label, index in train_generator.class_indices.items()}
    predicted_index = int(np.argmax(model.predict(load_for_classifier(img_path)), axis=1)[0])
    # Labels were shifted to 0..195, but the dictionary ids run from 1 to 196.
    class_id = index_to_label[predicted_index] + 1
    return list(models.keys())[list(models.values()).index(class_id)]


def show_car_box(img_path, title, which):
    # YOLO receives the original image path, not the resized classifier input.
    results = model1(img_path, show=False)
    largest_box = filter_largest_box(results[0].boxes.xyxy.tolist())
    if largest_box is None:
        print(f"No object detected in the {which} image")
        return
    draw_bounding_box(img_path, largest_box, title)
    print(f"Coordinates of the box of the {which} car:")
    print(f"x1: {largest_box[0]}, y1: {largest_box[1]}")
    print(f"x2: {largest_box[2]}, y2: {largest_box[3]}")


def main(img1, img2):
    # img1 and img2 are the paths of the two images to compare.
    name1 = predict_model_name(img1)
    name2 = predict_model_name(img2)

    if name1 == name2:
        print(f'The cars are the same model: {name1}')
    else:
        print('The cars are different models')

    show_car_box(img1, f'First car model: {name1}', 'first')
    show_car_box(img2, f'Second car model: {name2}', 'second')

# **Conclusion**

---

Please provide a critical analysis of your approach and potential next steps you can think of.


I could not train the model properly because of GPU limits on Colab; I would have liked to train it for more epochs.
Moreover, the provided datasets contain less data than I would have wanted. I first tried a Vision Transformer, but this architecture is data-hungry and its results were not good enough.

For the bounding box solution, simply choosing the biggest box is not foolproof. I had little time left and it was the only idea I had at that moment, but I am sure there is a better solution. The same goes for part B of exercise 1: I did it last and did not have enough time for the last question.