### Table of contents
* [Data](#chapter1)
    * [Data loading](#section_1_1)
    * [Handling mini-batches](#section_1_2)
* [Pieces of a Neural Network](#chapter2)
    * [The Forward Propagation](#fwd_prop)
    * [The Cost function](#cost_fct)

# NumPy Binary Classifier 

In this notebook, I would like to show the main logic behind common deep learning frameworks. I find this interesting because I usually tweak the hyperparameters of a framework's functions without really knowing what changes in the underlying equations.

A matrix point of view gives a better overview of *how a neural network works*. Once you are familiar with these notions, you will be able to build deeper and more complex neural networks with deep learning frameworks while knowing what you are doing. It will also help you tune a trained model to improve it.

The goal of this classifier is to detect whether an image shows a Pikachu or a Jigglypuff (the English name of Rondoudou).

## Data <a class="anchor" id="chapter1"></a>
First, we need to prepare the dataset so that the neural network is fed data with the right dimensions, types, etc. For the purpose of this exercise, I didn't choose a large dataset. It contains:
- 98 Pikachu images (label 0)
- 76 Jigglypuff [Rondoudou] images (label 1)

That makes 174 images, which we have to shuffle and split into training and validation sets.

### Data loading <a class="anchor" id="section_1_1"></a>
We are going to create a function that returns the training and validation sets and their corresponding labels. This function should let us choose the size of the validation set.

In [1]:
import numpy as np
import glob
import math
from PIL import Image

In [3]:

def load_data(val_size=0.2):
    """
    Converts images to arrays and returns them shuffled and split into a
    training set and a validation set.

    Parameters
    ----------
    val_size : float, optional
        Fraction of the dataset used for validation. The default is 0.2.

    Returns
    -------
    data_train : np.array of shape (nb_train, HEIGHT, WIDTH, nb_chans)
        Training set.
    label_train : np.array of shape (nb_train, 1)
        Labels of the training set.
    data_val : np.array of shape (nb_val, HEIGHT, WIDTH, nb_chans)
        Validation set.
    label_val : np.array of shape (nb_val, 1)
        Labels of the validation set.
    classes : np.array of shape (2,)
        Class names: Pikachu / Rondoudou. They are encoded as bytes.

    """
    list_pikachu = glob.glob('../data/pikachu/*')
    list_rondoudou = glob.glob('../data/rondoudou/*')
    
    HEIGHT = 100
    WIDTH = 100
    CHANNEL = 3
    
    classes = np.array([b'Pikachu', b'Rondoudou'])
    
    # Initialisations
    size_dataset = len(list_pikachu) + len(list_rondoudou)
    dataset_arr = np.zeros((size_dataset, HEIGHT, WIDTH, CHANNEL))
    label = np.zeros((size_dataset, 1), dtype='int')
    
    # Generating a Pikachu array-type dataset
    for k in range(len(list_pikachu)):
        with Image.open(list_pikachu[k]) as im :
            # PIL's resize expects the size as (width, height)
            im = im.resize((WIDTH, HEIGHT), resample=Image.BICUBIC)
            im = im.convert("RGB")
        img_arr = np.array(im)
        dataset_arr[k] = img_arr
        
    # Generating a Rondoudou array type dataset
    i=0
    for k in range(len(list_pikachu), len(dataset_arr)):
        with Image.open(list_rondoudou[i]) as im2 :
            # PIL's resize expects the size as (width, height)
            im2 = im2.resize((WIDTH, HEIGHT), resample=Image.BICUBIC)
            im2 = im2.convert("RGB")
        img_arr = np.array(im2)
        dataset_arr[k] = img_arr
        label[k] = 1
        i+=1
    
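    # Randomizing
    n_samples = dataset_arr.shape[0]
    n_val = int(val_size * n_samples)
    n_train = n_samples - n_val
    shuffled_indices = np.random.permutation(n_samples)
    # Slicing from the front keeps the split correct even when n_val == 0
    # (with [:-n_val], n_val == 0 would give an empty training set)
    train_indices = shuffled_indices[:n_train]
    val_indices = shuffled_indices[n_train:]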

    data_train = dataset_arr[train_indices]
    label_train = label[train_indices]
    
    data_val = dataset_arr[val_indices]
    label_val = label[val_indices]
    
    return data_train, label_train, data_val, label_val, classes
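
The mini-batch and forward-pass functions further down expect inputs of shape (input size, number of examples), whereas `load_data` returns images of shape (nb_img, HEIGHT, WIDTH, nb_chans). Here is a minimal sketch of one way to bridge the two, assuming pixel values in [0, 255]. The helper name `flatten_and_scale` is hypothetical and not part of the original notebook, and the data is a synthetic stand-in for the real images:

```python
import numpy as np


def flatten_and_scale(data, label):
    """Hypothetical helper: reshape images to columns and scale to [0, 1]."""
    m = data.shape[0]
    # (m, H, W, C) -> (m, H*W*C) -> (H*W*C, m): one column per image
    X = data.reshape(m, -1).T / 255.0
    # (m, 1) -> (1, m): one label per column
    Y = label.reshape(1, m)
    return X, Y


# Synthetic stand-in for the output of load_data (5 images of 100x100x3)
rng = np.random.default_rng(0)
fake_data = rng.integers(0, 256, size=(5, 100, 100, 3)).astype(float)
fake_label = np.array([[0], [1], [0], [1], [1]])

X, Y = flatten_and_scale(fake_data, fake_label)
print(X.shape, Y.shape)  # (30000, 5) (1, 5)
```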

### Handling mini-batches <a class="anchor" id="section_1_2"></a>
Using mini-batches is an optimization technique that lets gradient descent make progress *before* it has finished processing the *entire* training set. So, we need a function that splits a training dataset into mini-batches of size `mini_batch_size`, together with the corresponding labels for each image in each mini-batch.

In [1]:
def random_mini_batches(X, Y, mini_batch_size = 64, seed = 0):
    """
    Creates a list of random minibatches from (X, Y)
    
    Arguments:
    X -- input data, of shape (input size, number of examples)
    Y -- true "label" vector (0 for Pikachu / 1 for Jigglypuff), of shape (1, number of examples)
    mini_batch_size -- size of the mini-batches, integer
    seed -- seed of the random generator, integer, so that the shuffling is reproducible
    
    Returns:
    mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
    """
    
    np.random.seed(seed)            # Fix the seed so that the "random" minibatches are reproducible
    m = X.shape[1]                  # number of training examples
    mini_batches = []
        
    # Step 1: Shuffle (X, Y)
    permutation = list(np.random.permutation(m))
    shuffled_X = X[:, permutation]
    shuffled_Y = Y[:, permutation].reshape((1, m))
    
    inc = mini_batch_size

    # Step 2: Partition (shuffled_X, shuffled_Y).
    # Complete mini-batches only, i.e. each containing mini_batch_size examples.
    num_complete_minibatches = math.floor(m / mini_batch_size) # number of mini-batches of size mini_batch_size in the partitioning
    for k in range(0, num_complete_minibatches):
        mini_batch_X = shuffled_X[:, k * inc : (k+1) * inc]
        mini_batch_Y = shuffled_Y[:, k * inc : (k+1) * inc]
    
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
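    # Handling the end case (last mini-batch smaller than mini_batch_size).
    # The slice starts right after the last complete mini-batch; relying on the
    # loop variable k here would fail when there is no complete mini-batch.
    if m % mini_batch_size != 0:
        mini_batch_X = shuffled_X[:, num_complete_minibatches * inc : ]
        mini_batch_Y = shuffled_Y[:, num_complete_minibatches * inc : ]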
        
        mini_batch = (mini_batch_X, mini_batch_Y)
        mini_batches.append(mini_batch)
    
    return mini_batches
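
To see the partitioning on the numbers of this notebook: with the default `val_size=0.2`, the 174 images leave 140 training examples, so a `mini_batch_size` of 64 gives two complete mini-batches and a last one of 12. The sketch below reproduces only the shuffle-and-slice logic on synthetic data. `iter_mini_batches` is a hypothetical compact variant, not the function above:

```python
import numpy as np


def iter_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Hypothetical compact variant: yield shuffled (X, Y) mini-batches."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    permutation = rng.permutation(m)
    # Stepping by mini_batch_size covers the complete batches and the
    # smaller last one in a single loop
    for start in range(0, m, mini_batch_size):
        idx = permutation[start:start + mini_batch_size]
        yield X[:, idx], Y[:, idx]


m = 140  # 174 images minus int(0.2 * 174) = 34 validation images
X = np.arange(3 * m, dtype=float).reshape(3, m)
Y = (np.arange(m) % 2).reshape(1, m)

batches = list(iter_mini_batches(X, Y))
print([bx.shape[1] for bx, _ in batches])  # [64, 64, 12]
```

Indexing `X` and `Y` with the same `idx` is what keeps each image paired with its label after shuffling.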

## Pieces of a Neural Network <a class="anchor" id="chapter2"></a>
In this section, we'll develop functions that are useful for building a neural network with `L` layers and any number of units in each layer.

### The Forward Propagation <a class="anchor" id="fwd_prop"></a>
A neural network computes the activations of each layer from those of the previous layer, going from the **input layer** to the last layer, called the **output layer**. This step is called the **forward pass**. Each layer has the following variables of interest:
- $W^{[l]}$, the weights of the $l^{th}$ layer
- $b^{[l]}$, the bias of the $l^{th}$ layer
- $Z^{[l]}$, the pre-activation value of the $l^{th}$ layer: it results from a linear computation and depends on $W^{[l]}$, $b^{[l]}$, and $A^{[l-1]}$. Note that for the first layer, $A^{[l-1]}$ is actually the input features $X$.
- $A^{[l]}$, the activation value of the $l^{th}$ layer, computed from $Z^{[l]}$ with a given activation function.


At layer `l`, each neuron is made of two parts: a linear part, which gives `z`, and an activation part that "activates" the neuron, which gives `a`. Using matrix operations to represent all the units of a given layer, we get the equations below:


\begin{equation}
    \begin{cases}
      Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}\\
      A^{[l]} = g(Z^{[l]})
    \end{cases}\,.
\end{equation}

Note :
- We use lowercase letters for a single neuron (or unit) and uppercase letters for a whole layer of neurons.
- `g` represents the activation function chosen for the layer.
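
A small numerical sketch of these two equations for one layer, with made-up sizes (2 inputs, 3 units, 4 examples) and a sigmoid chosen arbitrarily as `g`. It mainly shows how the shapes fit together and that the bias column is broadcast over the examples:

```python
import numpy as np

n_prev, n_l, m = 2, 3, 4  # made-up sizes for illustration

A_prev = np.ones((n_prev, m))          # activations of layer l-1 (or X)
W = np.full((n_l, n_prev), 0.5)        # weights of layer l
b = np.array([[0.0], [1.0], [-1.0]])   # bias of layer l, one value per unit

Z = W @ A_prev + b          # linear part; b is broadcast over the m columns
A = 1 / (1 + np.exp(-Z))    # activation part, here g = sigmoid

print(Z.shape, A.shape)  # (3, 4) (3, 4)
print(Z[:, 0])           # [1. 2. 0.]
```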


### Initialization
The first layer's weights and bias are initialized randomly to "initiate the learning". Here we'll use the He technique to initia....
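
The sentence above is cut off in the source, so the exact scheme used by the author is not known. As an illustration only, here is a minimal sketch assuming the intended technique is He initialization: weights drawn from a normal distribution scaled by $\sqrt{2 / n^{[l-1]}}$ and biases set to zero. The function name `initialize_parameters_he` and the `seed` argument are hypothetical. The `"W" + str(l)` key convention follows the one used by the regularized cost function later in this notebook.

```python
import numpy as np


def initialize_parameters_he(layers_dim, seed=0):
    """Hypothetical sketch of He initialization for an L-layer network.

    layers_dim lists the number of units per layer, input layer included.
    """
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layers_dim)):
        n_prev, n_l = layers_dim[l - 1], layers_dim[l]
        # Scale by sqrt(2 / n_prev) to keep the variance of Z stable with ReLU
        parameters["W" + str(l)] = rng.standard_normal((n_l, n_prev)) * np.sqrt(2 / n_prev)
        parameters["b" + str(l)] = np.zeros((n_l, 1))
    return parameters


# Made-up architecture: 100*100*3 input features, two hidden layers, 1 output
params = initialize_parameters_he([30000, 8, 4, 1])
for name, value in params.items():
    print(name, value.shape)
```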

In [None]:
def activation_function(Z, activation):
    """
    Apply the activation function

    Parameters
    ----------
    Z : np.array
        Pre-activation value.
    activation : str
        Name of the activation function: 'sigmoid' or 'relu'.

    Returns
    -------
    A : np.array of shape Z.shape
        Post-activation value.

    """
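    if activation == 'sigmoid':
        A = 1 / (1 + np.exp(-Z))
    elif activation == 'relu':
        A = np.maximum(0, Z)
    else:
        # Fail early instead of hitting an undefined A at the return
        raise ValueError(f"Unknown activation function: {activation!r}")
    return A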


def forward_pass(A_prev, W, b, activation):
    """
    Implement the forward pass
    LINEAR->ACTIVATION layer

    Parameters
    ----------
    A_prev : np.array of shape (n[l-1], m)
        Activations from previous layer (or input data).
    W : np.array of shape (n[l], n[l-1])
        Weights matrix.
    b : np.array of size (n[l], 1)
        Bias vector.
    activation : str
        Name of activation function.

    Returns
    -------
    A : np.array of shape (n[l], m)
        Post-activation value.
    cache : tuple
        Contains "linear_cache" and "activation_cache", stored to compute
        the backward pass efficiently.

    """
    Z = np.dot(W, A_prev) + b
    A = activation_function(Z, activation)

    linear_cache = (A_prev, W, b)
    activation_cache = Z
    cache = (linear_cache, activation_cache)
    return A, cache
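
To go from one layer to a full network, the same LINEAR->ACTIVATION step is repeated for every layer. Here is a minimal sketch, assuming ReLU for the hidden layers and a sigmoid for the output layer (a common choice for binary classification, not confirmed by the notebook). `forward_all_layers` and the tiny random parameters are hypothetical, and the step is re-implemented inline so that the block runs on its own:

```python
import numpy as np


def forward_all_layers(X, parameters):
    """Hypothetical sketch: chain LINEAR->ACTIVATION over all layers."""
    L = len(parameters) // 2  # one W and one b per layer
    A = X
    caches = []
    for l in range(1, L + 1):
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = np.dot(W, A) + b
        caches.append(((A, W, b), Z))  # same layout as (linear_cache, activation_cache)
        # Assumption: ReLU in hidden layers, sigmoid at the output layer
        A = 1 / (1 + np.exp(-Z)) if l == L else np.maximum(0, Z)
    return A, caches


rng = np.random.default_rng(1)
layers_dim = [5, 4, 1]  # made-up sizes: 5 features, 4 hidden units, 1 output
parameters = {}
for l in range(1, len(layers_dim)):
    parameters["W" + str(l)] = rng.standard_normal((layers_dim[l], layers_dim[l - 1])) * 0.1
    parameters["b" + str(l)] = np.zeros((layers_dim[l], 1))

X = rng.standard_normal((5, 3))  # 3 examples
AL, caches = forward_all_layers(X, parameters)
print(AL.shape, len(caches))  # (1, 3) 2
```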

### The Cost function<a class="anchor" id="cost_fct"></a>
The cost function measures how far the predictions (at the output layer) are from a given ground truth. The aim of training a neural network is to minimise the cost function by tweaking the weights and biases.

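\begin{equation}
    \begin{aligned}
      cost &= \frac{-1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(\hat{y}^{(i)}) + (1-y^{(i)})\log(1-\hat{y}^{(i)})\right) \\
      {cost}_{L2reg} &= \frac{-1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(\hat{y}^{(i)}) + (1-y^{(i)})\log(1-\hat{y}^{(i)})\right) + \frac{\lambda}{2m} \sum_{l=1}^{L} \left\|W^{[l]}\right\|_{F}^{2}
    \end{aligned}
\end{equation}

Here $y^{(i)}$ and $\hat{y}^{(i)}$ are the $i^{th}$ entries of $Y$ and $\hat{Y}$, and $\left\|W^{[l]}\right\|_{F}^{2}$ is the squared Frobenius norm, i.e. the sum of the squared entries of $W^{[l]}$.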
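
As a quick sanity check of the cross-entropy formula, here is a worked example on three made-up predictions. For each example only one of the two terms is non-zero: $\log(\hat{y})$ when the label is 1, and $\log(1-\hat{y})$ when the label is 0.

```python
import numpy as np

Y = np.array([[1, 0, 1]])             # made-up labels
Y_hat = np.array([[0.9, 0.2, 0.6]])   # made-up predicted probabilities
m = Y.shape[1]

# Per-example losses: -log(0.9), -log(1 - 0.2), -log(0.6)
losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
cost = float(losses.sum() / m)

print(losses.round(3))  # [[0.105 0.223 0.511]]
print(round(cost, 3))   # 0.28
```

The least confident prediction (0.6 for a true label of 1) contributes the largest loss, which is the behaviour we want from the cost.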

In [None]:
def compute_cost(yhat, Y, mini_batch_size=None):
    """
    Compute the cost function, i.e. the sum of the losses over the examples

    Parameters
    ----------
    yhat : np.array of shape (1, m)
        Probability vector corresponding to the label predictions. It's actually
        the activation at the layer L : AL.
    Y : np.array of shape (1, m)
        True "label" vector.
    mini_batch_size : int, optional
        If None (no mini-batches), the cost is normalized by 1/m. Otherwise
        the unnormalized sum is returned. Default is None.

    Returns
    -------
    cost : float
        Cross-entropy cost function with or without dividing by number of 
        training examples

    """
    AL = yhat
    cost = np.sum(-np.log(AL)*Y - np.log(1-AL)*(1-Y))
    
    if mini_batch_size is None:
        m = Y.shape[1]
        cost = (1/m) * cost

    return cost


def compute_cost_L2regularization(yhat, Y, layers_dim, parameters, lambd, mini_batch_size=None):
    """
    Compute the cost function, sum of the loss function, with L2 regularization.

    Parameters
    ----------
    yhat : np.array of shape (1, m)
        Probability vector corresponding to the label predictions. It's actually
        the activation at the layer L : AL.
    Y : np.array of shape (1, m)
        True "label" vector.
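    layers_dim : list
        Number of units in each layer, input layer included. Its length
        gives the number of weight matrices to sum over.
    parameters : dict
        Output of initialize_parameters_deep().
    lambd : float
        Regularization factor
    mini_batch_size : int, optional
        Passed to compute_cost(). Default is None.

    Returns
    -------
    cost : float
        Cross-entropy cost plus the L2 regularization cost.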

    """
    AL = yhat
    m = AL.shape[1]
    cross_entropy_cost = compute_cost(AL, Y, mini_batch_size)
    
    somme = 0
    for l in range(1, len(layers_dim)):
        W = parameters["W"+str(l)]
        somme = somme + np.sum(np.square(W))
    
    L2_regularization_cost = (1/m)*(lambd/2) * somme
    cost = cross_entropy_cost + L2_regularization_cost
    
    return cost
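
One practical caveat with the cost functions above: if a prediction is exactly 0 or 1, `np.log` returns `-inf` and the cost becomes `inf` or `nan`. Here is a minimal sketch of one common workaround, clipping the predictions before taking the logarithm. `compute_cost_clipped` and the `eps` value are assumptions for illustration, not part of the original notebook:

```python
import numpy as np


def compute_cost_clipped(yhat, Y, eps=1e-12):
    """Hypothetical variant of the cross-entropy cost, averaged over m."""
    m = Y.shape[1]
    # Keep predictions strictly inside (0, 1) so that log() stays finite
    AL = np.clip(yhat, eps, 1 - eps)
    return float(np.sum(-np.log(AL) * Y - np.log(1 - AL) * (1 - Y)) / m)


Y = np.array([[1, 0, 1]])
saturated = np.array([[1.0, 0.0, 0.0]])  # the last prediction is fully wrong

cost = compute_cost_clipped(saturated, Y)
print(np.isfinite(cost))  # True
```

The clipping only changes predictions that sit exactly on the boundary, so for ordinary values the result is the same as the unclipped formula.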