# A simple transfer learning project

In this project, we demonstrate the principle of transfer learning: the repurposing of existing neural networks for new projects.

Deep neural networks can be expensive and time-consuming to train: costs can reach millions of dollars, and training can span months. The primary challenge, however, lies in accessing the necessary data. Large companies, such as Google, often train their neural networks on proprietary data, which is closely guarded and constitutes the cornerstone of their flagship products, like Google's search engine.

With the advent of transfer learning, willing corporations such as Google can share fully trained models without compromising their data. Existing networks can therefore be downloaded by the little guys and be cut, twisted and modified at will, saving tons of time and money!

In this project, we will download EfficientNet B7, a convolutional network which, at the time of its release, was known for its outstanding performance on computer vision tasks. The EfficientNet we download today was trained on datasets with thousands of labels, ranging from cats and dogs to cars and buildings.

We will repurpose this instance of EfficientNet to work as a vehicle classification module.

## The core principle
The classical transfer learning method, and the one we will use, is to cut off the classification tail of a network and replace it with another model.

![Replacing the classification tail A with a neural network B.](./image/vgg-transfer-learning.png)

The new appendage only needs a few seconds to a few hours of training to be operational. In this project, we will compare two candidates for that appendage: a k-NN and a convolutional network.

**Terminology alert!**
After removing the tail of our neural network, we call the last remaining layer the "feature map" (see the above image). It is a bunch of raw neurons, each representing a high-level *feature*, or concept. For instance, a given neuron's activation can represent the presence of a fur-like texture in the image, which, together with other neurons, would indicate the presence of a cat in the picture.
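To make the idea of classifying from features more concrete, here is a minimal sketch of a k-NN working on feature vectors. It is a toy illustration, not the project's implementation: the function `knn_predict`, the three-dimensional vectors, and the labels are all invented for the example, and a real network outputs far more values per image.

```python
from collections import Counter
from math import dist


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training vectors closest to `query`."""
    # Rank the training vectors by Euclidean distance to the query vector.
    order = sorted(range(len(train_features)),
                   key=lambda i: dist(train_features[i], query))
    nearest = [train_labels[i] for i in order[:k]]
    # The most frequent label among the k nearest neighbours wins.
    return Counter(nearest).most_common(1)[0][0]


# Invented "feature vectors": each number stands for one neuron's activation.
features = [
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.1],  # bike-like activations
    [0.1, 0.9, 0.2], [0.2, 0.8, 0.1],  # car-like activations
]
labels = ['bike', 'bike', 'car', 'car']

print(knn_predict(features, labels, [0.85, 0.15, 0.05]))  # bike
```

The k-NN needs no training loop at all: it simply stores the feature vectors and compares new ones against them, which is part of why such a tail can be ready in seconds.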

Now that we are done with the theory, **let's get started!**

## Get the data
For all intents and purposes, all you need to know is that we separate our data into training, testing, and validation datasets, with `X` as the input (raw image: red, green, blue), and `y` as the label ('bike', 'car', 'truck', ...). *You may skip the rest of this section.*

For curious readers, we retrieve our data from a folder structured as follows:

![File structure for the project](./image/file-structure.png)

The data is already split into train, test and validation sets, with each set containing one folder per class. It is an unusual way to store the data. To retrieve our images from those folders we use...

`os.path.join(*args)` to create a platform-agnostic file path (works on both Linux and Windows):

![Join folder names into a full path](./image/mypath.png)
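As a small text version of the same idea, the sketch below joins three folder names into one path. The folder names follow the layout described above; the printed result depends on the operating system.

```python
import os
from os.path import join

# Build a path from its parts instead of hard-coding '/' or '\\'.
path = join('data', 'train', 'bike')

# The separator is picked for the current operating system:
# 'data/train/bike' on Linux and macOS, 'data\\train\\bike' on Windows.
print(path)
```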

`os.path.split(string)` to get the label, which is the name of each image's parent folder:

![Split path name to keep the label](./image/split-image.png)
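In code, getting the label takes two splits: one to drop the file name, and one to keep only the last folder. The file name `bike_001.jpg` below is made up for the example.

```python
from os.path import join, split

# Hypothetical image path; 'bike_001.jpg' is an invented file name.
path = join('data', 'train', 'bike', 'bike_001.jpg')

# First split: separate the file name from the folder that contains it.
folder, filename = split(path)

# Second split: the last component of that folder is the label.
label = split(folder)[1]
print(label)  # bike
```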

`glob` to search for all matching files, using the `*` wildcard:

![Finding all files in data\train\bike](./image/wildcard-glob.png)
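The sketch below shows the wildcard at work on a throwaway folder tree, so it runs without the project's data. The class folders and file names are invented; only the `train/<class>/<file>` layout mirrors the one described above.

```python
import tempfile
from glob import glob
from os import makedirs
from os.path import join, split

# Create a temporary folder tree with a few empty files standing in for images.
root = tempfile.mkdtemp()
for label, name in [('bike', 'a.jpg'), ('bike', 'b.jpg'), ('car', 'c.jpg')]:
    makedirs(join(root, 'train', label), exist_ok=True)
    open(join(root, 'train', label, name), 'w').close()

# One '*' per path level: the first matches any class folder,
# the second matches any file inside it.
# glob gives no ordering guarantee, so we sort for a stable result.
paths = sorted(glob(join(root, 'train', '*', '*')))

# Read each label back from the parent folder name.
labels = [split(split(p)[0])[1] for p in paths]
print(labels)  # ['bike', 'bike', 'car']
```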

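```python
import numpy as np
from os.path import split, join
from PIL import Image
from glob import glob
```

```python
def get_data(paths):
    """Load the images at `paths` and read each label from its parent folder name."""
    # This assumes all images share the same shape; otherwise they cannot be
    # stacked into one regular array below.
    images = [np.array(Image.open(path)) for path in paths]
    # split(path)[0] is the containing folder; its last component is the label.
    labels = [split(split(path)[0])[1] for path in paths]
    return np.array(images), np.array(labels)
```

```python
# Each pattern matches every file in every class folder of the given set.
X_train, y_train = get_data(glob(join('data', 'train', '*', '*')))
X_test, y_test = get_data(glob(join('data', 'test', '*', '*')))
X_valid, y_valid = get_data(glob(join('data', 'valid', '*', '*')))
```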