# Deep Learning for Coders - Chapter 1 [Draft]
> An even more top-down approach to the chapter

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [Deep Learning for Coders, Jupyter]
- image: images/trained_model.png
- author: Nathaniel D'Amours

This post is heavily inspired by *Deep Learning for Coders* {% cite howard2020deep %}.

## Deep Learning is for Everyone

In [81]:
#hide
from fastbook import *
from pathlib import Path

In [82]:
#hide
def save_gv(filename, data, is_graph=False):
    if not is_graph:
        graph_data = gv(data)
    else:
        graph_data = data
    my_graph = graphviz.Source(graph_data)
    my_graph.render(filename, format='png', directory="my_icons/dl_for_coders_01/", view=False)

In [83]:
#hide
data = """
digraph {
    subgraph cluster_ai {
        label="Artificial Intelligence";
        "Your calculator";
        subgraph cluster_ml {
            label="Machine Learning";
            "Linear Regression";
            subgraph cluster_backend {
                label="Neural network";
                "Convolutional neural network";
            }
        }
    }
}
"""
save_gv("ai_ml_nn", data, is_graph=True)

![](my_icons/dl_for_coders_01/ai_ml_nn.png "Fig.0 - AI vs ML vs NN")

## What is Machine Learning?

### A Machine Learning Program

As we learned earlier, machine learning is the science of writing programs that learn. Machine learning therefore lets you recognize dogs and cats without spelling out every characteristic of each animal (which is tricky), because the program can learn those characteristics itself. This learning happens through the following model training loop:

In [84]:
#hide
data = '''ordering=in
          Model[shape=box3d width=1 height=0.7]
          Inputs [shape=none, width=0.5 height=0.7]
          Inputs->Model->Predictions;
          Predictions->Performance
          Performance->Model[constraint=false label=updates]'''
save_gv("training", data)

![](my_icons/dl_for_coders_01/training.png "Fig.1 - The training loop")

Let's break down this figure:
1. The model receives **inputs**, which are the data (the images of dogs and cats).
2. The model outputs **predictions**, which look like "Dog" or "Cat".
3. The **performance** of the predictions is calculated.
4. The **model** is updated according to the performance so that it improves.
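
The four steps above can be sketched in plain Python. This is only a toy illustration under made-up assumptions, not how fastai trains a model: the "images" are replaced by a single hypothetical number (an animal's weight in kg), the model has one parameter (`threshold`), and the update rule simply nudges that threshold after each mistake.

```python
# Toy data, entirely made up: each input is a weight in kg.
inputs = [3.0, 4.5, 20.0, 30.0]
labels = ["Cat", "Cat", "Dog", "Dog"]

def model(x, threshold):
    """Predict "Dog" for heavy animals and "Cat" for light ones."""
    return "Dog" if x > threshold else "Cat"

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

threshold = 25.0  # initial (poor) value of the model's only parameter

for epoch in range(10):
    # Steps 1 and 2: the model receives the inputs and outputs predictions.
    predictions = [model(x, threshold) for x in inputs]
    # Step 3: the performance of the predictions is calculated.
    performance = accuracy(predictions, labels)
    # Step 4: the model is updated according to its mistakes.
    for p, l in zip(predictions, labels):
        if p == "Cat" and l == "Dog":
            threshold -= 2.0  # a dog was missed: lower the threshold
        elif p == "Dog" and l == "Cat":
            threshold += 2.0  # a cat was called a dog: raise the threshold

final_predictions = [model(x, threshold) for x in inputs]
print(threshold, final_predictions)
```

After a few passes the threshold settles between the heaviest cat and the lightest dog, and every toy example is classified correctly.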

### Architecture and Parameters

In [85]:
#hide
data = '''ordering=in
          Architecture[shape=box3d width=1 height=0.7]
          # Inputs [shape=none,label="" width=0.5 height=0.7]
          Inputs->Architecture->Predictions; Parameters->Architecture; Predictions->Performance
          Performance->Parameters[constraint=false label=updates]'''
save_gv("training_arch_param", data)

![](my_icons/dl_for_coders_01/training_arch_param.png "Fig.2 - Split the model into architecture and parameters")

Notice that we split *Model* into *Architecture* and *Parameters*. The architecture is the functional form of the model, and the parameters are variables that define how the architecture operates. For example, $y = ax + b$ is an architecture whose parameters $a$ and $b$ change the behavior of the function.
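
To make the distinction concrete, here is a minimal sketch of the $y = ax + b$ example in Python. The function name `architecture` is only chosen for illustration; the point is that the code of the function never changes, only the parameter values do.

```python
def architecture(x, a, b):
    """The functional form of the model: a straight line."""
    return a * x + b

# Same architecture, same input, two different sets of parameters.
print(architecture(2, a=1, b=0))   # 2
print(architecture(2, a=3, b=-1))  # 5
```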

### Labels and Loss

In [86]:
#hide
data = '''ordering=in
          Model[shape=box3d width=1 height=0.7 label=Architecture]
          Inputs->Model->Predictions; Parameters->Model;Labels->Loss; Predictions->Loss
          Loss->Parameters[constraint=false label=updates]'''
save_gv("training_labels_loss", data)

![](my_icons/dl_for_coders_01/training_labels_loss.png "Fig.3 - Split the performance into labels and loss")

*Performance* is now split into *Labels* and *Loss*. The labels are the (ground) truth. For example, if an image shows a dog, the label of the image is *Dog*. The labels and the predictions can therefore be compared to determine the performance of the model: if the prediction for an image is *Cat* and the label is *Dog*, you know that the model did badly. The loss is the measure of performance that compares the labels with the predictions so that we can update the parameters to perform better.
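
As a rough sketch of "comparing labels and predictions", the snippet below computes the fraction of wrong predictions on made-up data. This error rate is only a simple stand-in for illustration: losses used for actual training are usually smoother functions, so that small parameter changes produce small changes in the loss.

```python
def error_rate(predictions, labels):
    """Fraction of predictions that differ from the labels (0.0 is perfect)."""
    wrong = sum(p != l for p, l in zip(predictions, labels))
    return wrong / len(labels)

# Hypothetical ground truth and model output for four images.
labels      = ["Dog", "Cat", "Dog", "Cat"]
predictions = ["Dog", "Cat", "Cat", "Cat"]  # one mistake, on the third image

loss = error_rate(predictions, labels)
print(loss)  # 0.25
```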

### Trained Models

In [87]:
#hide
data = '''Model[shape=box3d width=1 height=0.7]
          Inputs->Model->Predictions'''
save_gv("trained_model", data)

![](my_icons/dl_for_coders_01/trained_model.png "Fig.4 - A trained model")

Once a model is trained, you can treat it as a regular program.
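
One way to picture this, as a sketch only: once training has fixed the parameters, they can be "frozen" into the architecture, leaving a function that just maps inputs to predictions. The parameter values below are made up, and `functools.partial` is merely a convenient way to show the idea.

```python
from functools import partial

def architecture(x, a, b):
    """The functional form of the model: a straight line."""
    return a * x + b

# Pretend that training found these parameter values (hypothetical).
trained_model = partial(architecture, a=2.0, b=1.0)

# The trained model now behaves like any other function of its input.
print(trained_model(3.0))  # 7.0
```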

### Regular Programming

In [88]:
#hide
data = '''Program[shape=box3d width=1 height=0.7]
          Inputs->Program->Results'''
save_gv("regular_program", data)

![](my_icons/dl_for_coders_01/regular_program.png "Fig.5 - A regular program")

In [89]:
def add(a, b):
    return a + b

add(2, 3)

5

As you can see, this program takes some inputs and outputs results: the inputs are $2$ and $3$, and the result is $5$.

## What is a Deep Neural Network?

As we learned earlier, a deep neural network is a kind of machine learning model, and "deep" refers to having multiple hidden layers (input layer → hidden layers → output layer). According to the [*universal approximation theorem*](https://en.wikipedia.org/wiki/Universal_approximation_theorem), such a model can, in principle, approximate any continuous function to arbitrary accuracy by varying its parameters, which is why it can be applied to so many problems. We therefore need a general "mechanism" to modify these parameters for each problem. This mechanism already exists: it is called stochastic gradient descent (SGD).

> Tip: SGD and deep neural networks sound complex, but they aren't!
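
To give a feel for how simple the idea is, here is a minimal sketch of SGD on a tiny linear model $y = ax + b$ rather than a neural network. Everything in it is an assumption made for illustration: the data are generated from $y = 2x + 1$, the loss is the squared error, and the learning rate `lr` was picked by hand.

```python
import random

random.seed(0)  # make the run reproducible

# Toy data generated from y = 2x + 1 (made up for this sketch).
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

a, b = 0.0, 0.0  # parameters, starting from an arbitrary guess
lr = 0.02        # learning rate: how big each update step is

for step in range(5000):
    x, y = random.choice(data)  # "stochastic": one random example per step
    error = (a * x + b) - y     # prediction minus label
    # Gradients of the squared error (error ** 2) with respect to a and b.
    grad_a = 2 * error * x
    grad_b = 2 * error
    # Move each parameter a small step against its gradient.
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)  # both end up very close to the true values 2 and 1
```

The same recipe (predict, measure the error, step against the gradient) is what updates the parameters of a deep neural network; only the architecture and the number of parameters change.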

## How Our Image Recognizer Works

Let's break down the first lines of code of our image recognizer:

In [90]:
from fastai.vision.all import *

This allows us to use all the tools we will need to code a variety of computer vision models.

In [91]:
PATH = untar_data(URLs.PETS)/'images'
PATH

Path('C:/Users/natha/.fastai/data/oxford-iiit-pet/images')

This line downloads a dataset from the fast.ai datasets collection (if not previously downloaded), extracts it (if not previously extracted), and returns a `Path` object pointing to the extracted location.

In [92]:
def is_cat(x):
    return x[0].isupper()

Here we define the function `is_cat` to get the label of an image. The function returns `True` if the image contains a cat, because the dataset's creators gave the cats' filenames an uppercase first letter.
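
A quick check of this labeling rule on two filenames. The filenames are illustrative, written in the dataset's naming style (capitalized breed for cats, lowercase for dogs), and the function is repeated so the snippet runs on its own.

```python
def is_cat(x):
    return x[0].isupper()

print(is_cat("Birman_12.jpg"))  # True: starts with an uppercase letter, so a cat
print(is_cat("beagle_7.jpg"))   # False: starts with a lowercase letter, so a dog
```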

In [93]:
#hide_output
dls = ImageDataLoaders.from_name_func(path=PATH,
                                      fnames=get_image_files(PATH),
                                      valid_pct=0.2,
                                      seed=42,
                                      label_func=is_cat,
                                      item_tfms=Resize(224))

Due to IPython and Windows limitation, python multiprocessing isn't available now.
So `number_workers` is changed to 0 to avoid getting stuck


Our model needs to know the kind and structure of the dataset it's working with, so we create a dataloader. Since we are using images, this is an `ImageDataLoaders`. We use `from_name_func` because we label our images using the names of the files.

Let's explain the parameters:
- `path`: where the data is stored
- `fnames`: a collection of the `Path` objects of the image files
- `valid_pct`: the fraction of the data held out at random for the validation set (here 0.2, i.e. 20%; we will talk about this later)
- `seed`: makes your code reproducible by always generating the same validation set
- `label_func`: the function used to get the label of an image
- `item_tfms`: a transformation applied to each item (in this case, each image is resized to a 224-pixel square)
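
To illustrate what `valid_pct` and `seed` do conceptually, here is a small standard-library sketch of a seeded random split. It is not fastai's implementation: the helper `random_split` and the filenames are hypothetical and only meant to show why a fixed seed always gives the same validation set.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle the items and hold out a `valid_pct` fraction for validation."""
    rng = random.Random(seed)        # a private generator controlled by the seed
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in idxs[:cut]]
    train = [items[i] for i in idxs[cut:]]
    return train, valid

fnames = [f"img_{i}.jpg" for i in range(10)]  # hypothetical filenames

train_a, valid_a = random_split(fnames, valid_pct=0.2, seed=42)
train_b, valid_b = random_split(fnames, valid_pct=0.2, seed=42)

print(len(train_a), len(valid_a))  # 8 2
print(valid_a == valid_b)          # True: same seed, same validation set
```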


### Sidebar: Datasets: Food for Models

In machine learning and deep learning, we can’t do anything without data. So, the people that create datasets for us to train our models on are the (often underappreciated) heroes. Most datasets used in this book took the creators a lot of work to build.

Some of the most useful and important datasets are those that become important *academic baselines*; that is, datasets that are widely studied by researchers and used to compare algorithmic changes, such as MNIST, CIFAR-10, and ImageNet.

The datasets used in this book have been selected because they provide great examples of the kinds of data that you are likely to encounter, and the academic literature has many examples of model results using these datasets to which you can compare your work.

## Sets

In order to evaluate the performance of our models, we need to split our data into sets to prevent "cheating" (overfitting). The split is based on how fully we want to hide each set from the model and from ourselves: the *training* data is fully exposed, the *validation* data is less exposed, and the *test* data is totally hidden.

In [94]:
#hide
data = """
    digraph {
        compound=true;
        subgraph cluster_website {
            label="Dataset";
            "Test set";
            "Validation set";
            "Training set";
        }
    }
"""
save_gv("split_dataset", data, is_graph=True)

![](my_icons/dl_for_coders_01/split_dataset.png "How to split your dataset")

### Validation Set

If we trained a model with all our data and evaluated it using that same data, we would not be able to tell how well the model performs on data it hasn’t seen, since the model already has all the answers in the training set. Indeed, it could be overfitting.

To avoid this, we split our dataset into two sets: the *training set* and the *validation set* which is used only for evaluation (and not for the training). This lets us test that the model learns lessons from the training data that generalize to new data, the validation data.

### Test Set

However, we as humans can also cheat! In realistic scenarios we are likely to explore many versions of a model by choosing various *hyperparameters* (parameters about parameters): network architecture, learning rates, data augmentation strategies, and other factors we will discuss in upcoming chapters. So, just as the automatic training process is in danger of overfitting the training data, we are in danger of overfitting the validation data through human trial and error and exploration.

The solution is to introduce another level of even more highly reserved data, the *test set*. Just as we hold back the validation data from the training process, we must hold back the test set data even from ourselves. It cannot be used to improve the model; it can only be used to evaluate the model at the very end of our efforts.

### Use Judgment in Defining Sets

A key property of the validation and test sets is that they must be **representative of the new data you will see in the future**. Therefore, you shouldn't always choose a random subset of your data.

### Exercises

For each of the following situations, how should you split the data into a training set and a validation set?

#### 1.

You are using historical data to build a model to [predict the future sales in a chain of Ecuadorian grocery stores](https://www.kaggle.com/c/favorita-grocery-sales-forecasting) as you can see below.

![](my_icons/dl_for_coders_01/timeseries1.png "A time series")

#### 2.

In the Kaggle [distracted driver competition](https://www.kaggle.com/c/state-farm-distracted-driver-detection), the independent variables are pictures of drivers at the wheel of a car, and the dependent variables are categories such as texting, eating, or safely looking ahead. Lots of pictures are of the same drivers in different positions, as we can see in this figure.

![](my_icons/dl_for_coders_01/driver.png "Two pictures from the training data, showing the same driver")

#### 3. 

You are trying to create an algorithm to distinguish dogs and cats for the [Kaggle Dogs vs. Cats competition](https://www.kaggle.com/c/dogs-vs-cats/).

#### 4.

The goal of the [Kaggle fisheries competition](https://www.kaggle.com/c/the-nature-conservancy-fisheries-monitoring) was to identify the species of fish caught by fishing boats in order to reduce illegal fishing of endangered populations. The Kaggle test set, on which you make your predictions, consisted of boats that didn't appear in the training data.

### Solutions

#### 1.

![](my_icons/dl_for_coders_01/timeseries3.png "A good training subset")

> You are using historical data to build a model to predict the future sales in a chain

Therefore, we should use the most recent part of the data as our validation set so that it is **representative of the new data you will see in the future**.

A random subset is a poor choice (too easy to fill in the gaps, and not indicative of what you'll need in production), as we can see:

![](my_icons/dl_for_coders_01/timeseries2.png "Random training subset")
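
A chronological split of this kind can be sketched as follows. The sales records, the helper `time_split`, and the cutoff date are all hypothetical; the only point is that every validation record is more recent than every training record.

```python
from datetime import date

# Hypothetical daily sales records: (date, units sold).
sales = [(date(2017, 8, day), 100 + day) for day in range(1, 11)]

def time_split(records, cutoff):
    """Train on everything strictly before `cutoff`, validate on the rest."""
    train = [r for r in records if r[0] < cutoff]
    valid = [r for r in records if r[0] >= cutoff]
    return train, valid

train, valid = time_split(sales, cutoff=date(2017, 8, 9))
print(len(train), len(valid))  # 8 2
```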

#### 2.

> Lots of pictures are of the same drivers in different positions.

The validation data should consist of images of people who don't appear in the training set, so that it is **representative of the new data you will see in the future**.

Indeed, if you used all the people when training your model, the model might overfit to the particularities of those specific people instead of just learning the states (texting, eating, etc.).
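
Splitting by person rather than by image can be sketched like this. The filenames, the driver identifiers, and the helper `split_by_group` are made up for illustration; the idea is simply that all the images of a given driver end up on the same side of the split.

```python
# Hypothetical (filename, driver_id) pairs.
images = [
    ("img_001.jpg", "driver_a"), ("img_002.jpg", "driver_a"),
    ("img_003.jpg", "driver_b"), ("img_004.jpg", "driver_b"),
    ("img_005.jpg", "driver_c"), ("img_006.jpg", "driver_c"),
]

def split_by_group(records, valid_groups):
    """Send every record whose group is in `valid_groups` to the validation set."""
    train = [r for r in records if r[1] not in valid_groups]
    valid = [r for r in records if r[1] in valid_groups]
    return train, valid

train, valid = split_by_group(images, valid_groups={"driver_c"})

train_drivers = {driver for _, driver in train}
valid_drivers = {driver for _, driver in valid}
print(train_drivers.isdisjoint(valid_drivers))  # True: no driver is in both sets
```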

#### 3.

A random split is a good answer (since it will roughly preserve the ratio between classes in the sets).

#### 4.

> The test set consisted of boats that didn't appear in the training data. 

This means that you'd want your validation set to include boats that are not in the training set in order to be **representative of the new data you will see in the future**. 

## Questionnaire

Go to this link and learn the flash cards. #TODO

<!-- Empty space -->

<!-- Empty space -->

{% bibliography --cited %}