# Car Classification and Generation

[Martina Cioffi](https://github.com/martinacioffi) – 3010036

[Edoardo Manieri](https://github.com/edoardomanieri) – 3084469

[Valentina Parietti](https://github.com/ValentinaParietti) – 3007385

[Edoardo Pericoli](https://github.com/Edoardopericoli) –  3001596


## Table of contents
1. [Datasets](#datasets)
    1. [Stanford Dataset](#stanford)
    2. [Our Dataset](#our)

2. [Classification](#classification)
    1. [From Scratch](#scratch)
    2. [Transfer Learning](#tranferleraning)
        1. [EfficientNet B1](#efficientnetB1)
        2. [EfficientNet B7](#efficientnetB7)
    3. [Object Detection](#objectdetection)
    
3. [Generation](#generation)

    1. [Data Preparation](#datapreparation)
    2. [StyleGAN](#stylegan)
    
4. [Profiling](#profile)

5. [Guess the Car](#game)

6. [References](#refs)

Please note that the whole code, together with a more detailed explanation of how to run it, can be found on GitHub at the following [link](https://github.com/Edoardopericoli/Car_Prediction).

## 1. Datasets <a name="datasets"></a>

The main idea of this project was to train a model to classify cars from pictures. What follows is a brief explanation of the steps we followed, both in collecting and building the datasets and in choosing the model(s) used for classification. We also generate new images of cars starting from our own pictures.

### 1.1. Stanford Dataset <a name="stanford"></a>

Our first attempt was to predict the make, model and year of a car using images from the [Stanford Dataset](https://ai.stanford.edu/~jkrause/cars/car_dataset.html). It contains slightly more than 16,000 images of which, however, only about half are labelled. Therefore, to train our initial model, we used those 8,144 labelled images.

The dataset contains 196 classes; below is a graphical representation of the distribution of the different brands. Note, however, that the graph does not imply imbalance between the classes: we predict the car's model and year rather than merely the make. Still, given that cars of the same make are inevitably more similar to each other than cars from different makes, the picture is useful in understanding the difficulty of the task.

<img src="graphic_sources/brands_stanford.png"
     alt="Distribution of brands in the Stanford dataset"
     style="float: left; margin-right: 10px;" />
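
A brand-level view like the one above can be obtained by grouping class labels by make. The snippet below is only a sketch: the labels are hypothetical examples in a "Make Model Type Year" style, and taking the first word as the make is a simplifying assumption that breaks for multi-word makes.

```python
from collections import Counter

# Hypothetical class labels in a "Make Model Type Year" style (not read from the dataset).
labels = [
    "Audi A5 Coupe 2012",
    "Audi TT Hatchback 2011",
    "BMW X5 SUV 2007",
    "Fiat 500 Convertible 2012",
]


def make_of(label: str) -> str:
    # Simplifying assumption: the make is the first word of the label.
    return label.split()[0]


# Number of classes (car models) per make.
brand_counts = Counter(make_of(label) for label in labels)
for brand, n_classes in brand_counts.most_common():
    print(f"{brand}: {n_classes}")
```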

Please see the [Classification](#classification) section for a summary of the results obtained.

### 1.2. Our Dataset <a name="our"></a>

Since the Stanford dataset has relatively few images per class and a very high number of classes, we decided to build a new dataset from scratch, containing (i) cars we were more familiar with (i.e. mostly sold in Europe rather than in the United States), and (ii) more images per class (eventually, around 200 images on average per car model).

The graph below shows the distribution of brands for our final dataset.

<img src="graphic_sources/brands_our.png"
     alt="Distribution of brands in our dataset"
     style="float: left; margin-right: 10px;" />

## 2. Classification <a name="classification"></a>

When we first approached the classification problem, we used the Stanford Dataset mentioned in the previous section. As a reminder, the dataset contains 196 classes and 8,144 labelled images.

We only provide the code for the models that we ran on our own dataset, because we consider what we did before an experimental phase. Therefore, below you will find some results from that experimental phase, followed by the explanation and the code for the actual classification models run on our own dataset.

### 2.1. From Scratch <a name="scratch"></a>

The first thing we wanted to have ready was a baseline model, to be enriched later with more tuning or with more complex architectures. Therefore, we
created a model from scratch and adjusted some parameters to increase the accuracy. An overview of the last version of the architecture is shown below:

<img src="graphic_sources/baseline_architecture.png"
     alt="Architecture of the baseline model"
     style="float: left; margin-right: 10px;" />

Trained on the 196 classes for 60 epochs, this model gave us the following performance:

<img src="graphic_sources/evaluation_baseline.png"
     alt="Performance of the baseline model"
     style="float: left; margin-right: 10px;" />

We stopped the model after 17 epochs, since after that it started to overfit. We obtained an accuracy of __3%__, against an accuracy of roughly __0.5%__ (1 in 196) for random guessing among 196 classes.
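
As a quick check of the chance level, assuming a uniform random guess over the classes, the expected accuracy is one over the number of classes. The variable names below are ours and purely illustrative.

```python
n_classes = 196          # number of classes in the Stanford dataset
baseline_accuracy = 0.03  # accuracy of the from-scratch model reported above

# Expected accuracy of a uniform random guess.
chance_accuracy = 1 / n_classes

# How many times better than chance the baseline is.
lift = baseline_accuracy / chance_accuracy

print(f"chance accuracy: {chance_accuracy:.2%}")
print(f"baseline is about {lift:.1f}x better than chance")
```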

### 2.2. Transfer Learning <a name="tranferleraning"></a>

We wanted to improve the accuracy, since the baseline architecture was not able to capture deeper patterns. We therefore decided to use __transfer learning__. The first
family of architectures we applied was EfficientNet: we took ImageNet-pretrained checkpoints and fine-tuned them on our data.
__EfficientNets__ rely on AutoML and compound scaling to achieve superior performance without compromising resource efficiency, which makes them really powerful.
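
To give an idea of what compound scaling means, the sketch below applies the scaling rule from the EfficientNet paper, where depth, width and input resolution grow together as powers of a single coefficient `phi`. The base constants are the ones reported in the paper; the function name is ours, and the released B1 to B7 models use their own hand-set multipliers, so this illustrates the rule rather than the exact configuration of the networks we trained.

```python
# Base coefficients reported in the EfficientNet paper (Tan & Le).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scaling(phi: float) -> dict:
    """Return the depth, width and resolution multipliers for a given phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


# The constants are chosen so that each unit step of phi roughly doubles the FLOPs:
# alpha * beta^2 * gamma^2 is close to 2.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    print(phi, compound_scaling(phi))
print(f"FLOPs growth per unit of phi: {flops_factor:.2f}")
```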

### 2.2.1 EfficientNet B1 <a name="efficientnetB1"></a>

#### On Stanford Dataset

The simpler architecture of this family that we used was EfficientNet B1, which we first applied to improve the model on the Stanford Dataset. After fine-tuning its last layers, we obtained the following performance:

<img src="graphic_sources/evaluation_effnet1_stanford.png"
     alt="Performance of EfficientNet B1 on the Stanford dataset"
     style="float: left; margin-right: 10px;" />

We can see that this architecture really improved the accuracy. We moved from __3%__ to __60%__. What we can also see is that after __8 epochs__ it tended to overfit.
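
Choosing the epoch at which to stop can be automated with a simple patience rule on the validation loss. The sketch below is a generic illustration, not the logic in our training script: the function name, the `patience` value and the loss history are all hypothetical.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best_loss = float("inf")
    best_idx = -1
    for idx, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_idx = loss, idx
        elif idx - best_idx >= patience:
            # No improvement for `patience` epochs: stop looking further.
            break
    return best_idx + 1


# Hypothetical validation losses, not taken from our runs.
history = [2.9, 2.1, 1.7, 1.5, 1.45, 1.5, 1.6, 1.7, 1.4]
print(best_epoch(history))
```

Note that with a patience rule the search stops as soon as the loss has failed to improve for `patience` epochs, so a later improvement (like the last value in the hypothetical history) is never reached.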

#### On our dataset

After the first experimental phase we decided to build our own dataset, with images taken from Google and more images for each model (as explained in the previous section).

We first ran EfficientNet B1 on a subset of about 4,000 images and 20 classes, and then enriched the model.

These are the performances we obtained on that subset:

<img src="graphic_sources/evaluation_effnet1_newdata.png"
     alt="Performance of EfficientNet B1 on our dataset"
     style="float: left; margin-right: 10px;" />

We obtained an accuracy of __80%__ after __3 epochs__.

You can use the following command to train the B1 model on the full new dataset (10,000 images and 40 classes). You can find the complete documentation of the parameters in the README at this [link](https://github.com/Edoardopericoli/Car_Prediction).

In [1]:
#ATTENTION! Running only on server
# !python train_main.py --username='trial' --net='EffnetB1'

### 2.2.2 EfficientNet B7 <a name="efficientnetB7"></a>

Finally, we applied to our dataset the last and largest variant presented in the EfficientNet paper, that is B7.

You can use the following command to train the B7 model on the full new dataset (10,000 images and 40 classes). You can find the complete documentation of the parameters in the README at this [link](https://github.com/Edoardopericoli/Car_Prediction).

In [12]:
#ATTENTION! Running only on server
# !python train_main.py --username='trial' --net='EffnetB7'

This is the performance we obtained:

<img src="graphic_sources/evaluation_effnet7_newdata.png"
     alt="Performance of EfficientNet B7 on our dataset"
     style="float: left; margin-right: 10px;" />

From the plot above we can see an accuracy of __90%__ after __4 epochs__. 

### 2.3. Object Detection <a name="objectdetection"></a>

You only look once ([YOLO](https://pjreddie.com/darknet/yolo/)) is a state-of-the-art, real-time object detection system.

We decided to add it to our pipeline, since we thought it might improve prediction. Our idea was to use YOLO to detect the bounding box of the biggest car in the image and use it to crop the image. By feeding the cropped image to the network, we hoped to eliminate some of the noise created by the image background and by other cars or objects in the image.

Our first approach was to implement YOLO using [Darknet](https://pjreddie.com/darknet/), but we found iterating over different images difficult and we also had issues extracting the coordinates of the bounding boxes.

As a second approach we used [ImageAI](https://imageai.readthedocs.io/en/latest/detection/index.html), which turned out to be much simpler.

An example of the reasoning that we tried to implement is shown in the following image:

<img src="graphic_sources/yolo.png"
     alt="Example of car detection and cropping with YOLO"
     style="float: left; margin-right: 10px;" />
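
The step of selecting the biggest car can be sketched as follows. This is an illustration rather than the code in our repository: the function name is hypothetical, and we assume each detection is a dictionary with a `name` and a `box_points` entry holding `[x1, y1, x2, y2]`, which is the shape we expect from the ImageAI detector.

```python
def largest_car_box(detections, label="car"):
    """Return (x1, y1, x2, y2) of the largest detection with the given label,
    or None if no such object was detected."""

    def area(box):
        x1, y1, x2, y2 = box
        return max(0, x2 - x1) * max(0, y2 - y1)

    boxes = [tuple(d["box_points"]) for d in detections if d["name"] == label]
    return max(boxes, key=area) if boxes else None


# Hypothetical detections for a single image.
detections = [
    {"name": "car", "box_points": [10, 20, 110, 80]},
    {"name": "car", "box_points": [50, 40, 400, 300]},
    {"name": "person", "box_points": [0, 0, 500, 500]},
]

crop_box = largest_car_box(detections)
print(crop_box)
```

The returned tuple follows the (left, upper, right, lower) order, so it can be passed directly to PIL's `Image.crop`. When no car is found, a reasonable fallback is to keep the original image.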

If you want to apply object detection to the images and run EfficientNet B7 on the cropped images, execute the following code:

In [15]:
#ATTENTION! Running only on server
# !python train_main.py --username='trial_YOLO' --net='EffnetB7' --crop_images=True

## 3. Image Generation <a name="generation"></a>

### 3.1. Data Preparation <a name="datapreparation"></a>

In [1]:
#Import necessary libraries

from PIL import Image
import os

In [4]:
#Read raw images 

files = os.listdir('data/raw_data/StyleGAN/StyleGAN_raw')
files.sort()
files = files[1:] #Skip the first entry of the sorted listing (presumably a hidden system file such as .DS_Store)

In [5]:
#Define a function that pads non-square images with white borders to make them square (canvas side of at least min_size)

def make_square(im, min_size=256, fill_color=(255, 255, 255)):
    x, y = im.size
    size = max(min_size, x, y)
    new_im = Image.new('RGB', (size, size), fill_color)
    #Paste the original image at the centre of the white canvas
    new_im.paste(im, ((size - x) // 2, (size - y) // 2))
    return new_im

In [6]:
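#Apply the make_square function to the raw images, rescale them to 256x256 and save them in a folder with the final images

raw_dir = 'data/raw_data/StyleGAN/StyleGAN_raw'
final_dir = 'data/raw_data/StyleGAN/StyleGAN_final'
os.makedirs(final_dir, exist_ok=True) #Create the output folder if it does not exist yet
new_size = (256, 256)

for i in files:
    im = Image.open(os.path.join(raw_dir, i))
    new_im = make_square(im).resize(new_size)
    new_im.save(os.path.join(final_dir, i))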
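
To make the effect of the padding and rescaling steps concrete, the sketch below reproduces only their arithmetic, without touching any image. The helper name `square_layout` is hypothetical and is not part of the notebook.

```python
def square_layout(width, height, min_size=256, target=256):
    """Return the side of the square canvas, the paste offset of the original
    image, and the scale factor applied by the final resize."""
    size = max(min_size, width, height)
    offset = ((size - width) // 2, (size - height) // 2)
    scale = target / size
    return size, offset, scale


# A 400x300 picture is padded vertically to 400x400, then shrunk to 256x256.
print(square_layout(400, 300))

# A picture smaller than 256 pixels is padded up to 256x256 and not rescaled.
print(square_layout(100, 60))
```

One consequence worth noting is that images smaller than 256 pixels on both sides are never upscaled: they end up surrounded by a wider white border.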

In [8]:
#Clone the repository needed to generate cars from our dataset

!git clone https://github.com/ValentinaParietti/stylegan.git #This repository has been forked from the
                                                             #original StyleGAN repository and some changes have
                                                             #been made to it in order to run StyleGAN on our dataset. 
                                                             #More on this later

Cloning into 'stylegan'...
remote: Enumerating objects: 419, done.
remote: Total 419 (delta 0), reused 0 (delta 0), pack-reused 419
Receiving objects: 100% (419/419), 20.69 MiB | 3.60 MiB/s, done.
Resolving deltas: 100% (245/245), done.


In [9]:
#Run the following command to convert the images into .tfrecords (format required by StyleGAN)

!python stylegan/dataset_tool.py create_from_images stylegan/datasets/custom_datasets data/raw_data/StyleGAN/StyleGAN_final

Loading images from "data/raw_data/StyleGAN/StyleGAN_final"
Creating dataset "stylegan/datasets/custom_datasets"
Added 2108 images.                      


In [10]:
#Zip the newly filled datasets folder

os.chdir('stylegan')
!zip -r datasets_zip datasets

  adding: datasets/ (stored 0%)
  adding: datasets/custom_datasets/ (stored 0%)
  adding: datasets/custom_datasets/custom_datasets-r04.tfrecords (deflated 38%)
  adding: datasets/custom_datasets/custom_datasets-r02.tfrecords (deflated 48%)
  adding: datasets/custom_datasets/custom_datasets-r08.tfrecords (deflated 48%)
  adding: datasets/custom_datasets/custom_datasets-r05.tfrecords (deflated 40%)
  adding: datasets/custom_datasets/custom_datasets-r03.tfrecords (deflated 41%)
  adding: datasets/custom_datasets/custom_datasets-r06.tfrecords (deflated 43%)
  adding: datasets/custom_datasets/custom_datasets-r07.tfrecords (deflated 47%)


### 3.2. StyleGAN <a name="stylegan"></a>

[StyleGAN](https://github.com/NVlabs/stylegan) is an alternative generator architecture for generative adversarial networks created by NVIDIA. It borrows from the style transfer literature and creates the artificial image gradually, starting from a very low resolution and moving to a high resolution. StyleGAN modifies the input of each level separately, which allows control over the features expressed at that level, from coarse features (e.g. orientation, shape) to details (e.g. colour), without affecting other levels. Finally, it allows for a better understanding of the generated output and produces high-resolution images that look more authentic than previously generated images.
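
The progression of resolutions is also visible in the `.tfrecords` files created during data preparation, which carry suffixes from `r02` to `r08`. Assuming, as we understand the StyleGAN dataset tool, that the suffix is the base-2 logarithm of the image side, the levels for our 256x256 images can be listed as follows.

```python
# Suffixes r02 ... r08 of the .tfrecords files produced by dataset_tool.py.
levels = range(2, 9)

# Assumption: each suffix is log2 of the image side at that level.
resolutions = [2 ** level for level in levels]

for level, side in zip(levels, resolutions):
    print(f"r{level:02d}: {side}x{side}")
```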

In order to train StyleGAN and generate new images a GPU is needed.

Therefore, the rest of the code for our generation task can be found on the Google Colab Notebook at this link: https://colab.research.google.com/drive/1FE9GBqh0qBQ8nUDDIDjWhy5R2sdgiqD0

## 4. Profiling <a name="profile"></a>

<img src="graphic_sources/heatmap.png"
     alt="Profiling heatmap of our code"
     style="float: left; margin-right: 10px;" />

As you can see from the picture above, the functions _shutil.copy_ and _shutil.rmtree_, which we use to fill and clean up the train, test and validation folders, are really computationally expensive. One possible solution would be to let the model read the data directly from the _cars_train_new_ folder. Nevertheless, we decided not to make this change, because we wanted to maintain a cleaner separation among the folders.
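
For illustration, the alternative mentioned above could be sketched by splitting a list of file paths instead of copying the files. This is not what our code does: the function name, the split fractions and the file names are hypothetical.

```python
import random


def split_paths(paths, val_frac=0.1, test_frac=0.1, seed=0):
    """Split file paths into train, validation and test lists without copying files."""
    paths = sorted(paths)               # fixed starting order for reproducibility
    random.Random(seed).shuffle(paths)  # seeded shuffle
    n_val = int(len(paths) * val_frac)
    n_test = int(len(paths) * test_frac)
    return {
        "validation": paths[:n_val],
        "test": paths[n_val:n_val + n_test],
        "train": paths[n_val + n_test:],
    }


# Hypothetical file names inside the cars_train_new folder.
paths = [f"cars_train_new/img_{i:04d}.jpg" for i in range(100)]
splits = split_paths(paths)
print({name: len(items) for name, items in splits.items()})
```

A data loader could then read each list directly from the source folder, at the cost of losing the physical separation between folders that we chose to keep.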

If you want a complete overview of the profiling of our code, you can run the following commands (please pay attention to your browser's cache if you run all of them sequentially):

`!vprof --input-file profiling/heatmap.json`

`!vprof --input-file profiling/profiler.json`

`!vprof --input-file profiling/memory.json`

## 5. Guess the Car <a name="game"></a>

In order to try and challenge our model, we developed a little game that allows users to try to guess the make and model of a randomly displayed car from our dataset, and reports failure or success, as well as the true car model and the prediction of our net.

You can play by running the following two cells.

In [None]:
#Use the %cd magic: a directory change made with !cd does not persist to the next cell
%cd guess-make

In [None]:
!python3 app.py -path static
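
The outcome of a single round of the game described above could be computed along these lines. This is a hedged sketch, not the code in `app.py`: the function name, the label format and the normalisation rule are assumptions made for the example.

```python
def score_round(true_label, user_guess, model_prediction):
    """Compare the user's guess and the model's prediction with the true label."""

    def normalize(text):
        # Ignore case and repeated whitespace when comparing labels.
        return " ".join(text.lower().split())

    truth = normalize(true_label)
    return {
        "true_label": true_label,
        "user_correct": normalize(user_guess) == truth,
        "model_correct": normalize(model_prediction) == truth,
    }


# Hypothetical round: the user is right, the model is wrong.
result = score_round("Fiat 500", "fiat  500", "Fiat Panda")
print(result)
```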

## 6. References <a name="refs"></a>

[**3D Object Representations for Fine-Grained Categorization**](https://ai.stanford.edu/~jkrause/cars/car_dataset.html). Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei.
*4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13).* Sydney, Australia. Dec. 8, 2013.

[**A Style-Based Generator Architecture for Generative Adversarial Networks**](https://arxiv.org/abs/1812.04948). Tero Karras (NVIDIA), Samuli Laine (NVIDIA), Timo Aila (NVIDIA)

[**EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks**](https://arxiv.org/abs/1905.11946). Mingxing Tan, Quoc V. Le (Google Research, Brain Team, Mountain View, CA.)

[**YOLOv3: An Incremental Improvement**](https://arxiv.org/abs/1804.02767). Joseph Redmon, Ali Farhadi. 2018. *arXiv:1804.02767*.