<h1>Cours 6 : Un réseau de neurones (TP 10-11)<span class="tocSkip"></span></h1>

Dans ce chapitre, nous allons explorer l'algorithme "vanilla" du réseau de neurones dans un objectif de classification pas à pas. # Tableau

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"></ul></div>

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

1) Charger les données de la base des femmes ayant un cancer du sein. Stocker dans X les informations sur la femme, et Y la variable cible (cancer/pas cancer). Trouver aussi le nom de toutes les variables et stocker les dans features.

In [17]:
X=load_breast_cancer()["data"]
y=load_breast_cancer()["target"]

In [18]:
features=load_breast_cancer()["feature_names"]

2) Séparer en 80% - 20% X et y.

In [19]:
from sklearn.model_selection import train_test_split

In [20]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=1)

In [21]:
X_train.shape

(455, 30)

In [22]:
X_test.shape

(114, 30)

In [23]:
y_train.shape

(455,)

3) Normaliser les données du train et du test.

In [24]:
from sklearn.preprocessing import StandardScaler

In [25]:
ss=StandardScaler()
X_train_stand=ss.fit_transform(X_train)
X_test_stand=ss.fit_transform(X_test)

4) Transposer les données pour qu'il soit de la forme (m,n), m : nombre de variables, n : nombre d'observations.

In [28]:
X_train_fin=X_train_stand.T
X_test_fin=X_test_stand.T

In [29]:
X_train_fin.shape

(30, 455)

5) Créer la fonction d'initialisation des paramètres 

In [30]:
def initialize_parameters_deep(layers,seed):
    '''
    layers: une liste qui contient le nombre de neurones par couche
    '''
    np.random.seed(seed)
    params = {}  
    print("{} Layers parameters init".format(len(layers)))

    for l in range(1, len(layers)):
        params["W" + str(l)] = np.random.randn(layers[l],layers[l-1])
        params["b" + str(l)] = np.zeros((layers[l],1))
    return params  

6) Créer les fonctions sigmoid et relu.

In [35]:
def sigmoid(z):
    '''
    element-wise sigmoid 
    '''
    sig = 1/(1+np.exp(-z))
    return sig
def relu(z):
    '''
    element-wise relu
    '''
    rel = max(0,z)
    return rel

7) Créer les fonctions dérivées de sigmoid et relu

In [36]:
def relu_deriv(z):
    if z<=0:
        return 0
    else:
        return 1
def sigmoid_deriv(z):
    x=sigmoid(z)*(1-sigmoid(z))
    return x

8) Créer une fonction qui retourne la nouvelle matrice A à partir de l'ancienne.

In [40]:
#Forward step
def forward(A, W, b, activation="relu"):
    '''
    Implement one step of forward
    cache: contains A,W,b,Z; stored for computing the backward pass efficiently
    '''
    Z_new = np.dot(W,A)+b
    if activation=="relu":
        A_new = relu(Z_new)
    if activation=="sigmoid":
        A_new = sigmoid(Z_new)
        
    cache = ((A,W,b),Z_new)
    
    return A_new, cache

9) Créer la fonction qui le fait pour toutes les couches jusqu'à l'output

In [41]:
def prop_forward(X, params, activation="relu"):
    '''
    Implement forward propagation for the same activation for all layers, and finalizing with sigmoid
    '''
    caches = []
    A = X
    L = len(params)//2 
    
    for l in range(1, L):
        A_old = A
        A, cache = forward(A_old,params["W"+str(l)],params["b"+str(l)],activation=activation)
        caches.append(cache)
        
    A_pred, cache = forward(A,params["W"+str(L)],params["b"+str(L)],activation="sigmoid")
    caches.append(cache)
    
    return A_pred, caches

10) Créer la fonction de coût

In [None]:
#Cost step
def compute_cost(A_pred,y,params,lambd=0):
    '''
    cross-entropy cost
    '''
    m = y.shape[1]
    cost = -(1/m)*(np.dot(np.log(A_pred),y.T)+np.dot(np.log(1-A_pred),(1-y).T))
    cost=np.squeeze(cost)
    return cost

11) Maintenant dans le sens inverse, à partir de l'output AL, créer une fonction pour déterminer les gradients de W et b pour chaque layer.

In [None]:
#Backward Step
def relu_dZ(dA,z_cache):
    dZ=dA*relu_deriv(z_cache)
    return dZ
def sigmoid_dZ(dA,z_cache):
    dZ=dA*sigmoid_deriv(z_cache)
    return dZ

In [48]:
def sub_back(dZ, cache,d_cache,lambd=0,keep_prob=1):
    """
    Compute DWl,dbl,DA_old based on dZl
    """
    A_old, W, b = cache
    m = A_old.shape[0]

    dW = (1/m)*np.dot(dZ,A_old.T)+(lambd/m)*W
    db = (1/m)*np.sum(dZ,axis=1,keepdims=True)
    dA_old = (np.dot(W.T,dZ)*d_cache)/keep_prob
    
    return dA_old, dW, db

In [49]:
def backward(dA, cache, activation,lambd=0,keep_prob=1):
    """
    compute one-step of backward prop
    """
    awb_cache, z_cache,d_cache = cache
    
    if activation == "relu":
        dZ = relu_dZ(dA,z_cache)
        
    elif activation == "sigmoid":
        dZ = sigmoid_dZ(dA,z_cache)
        
    dA_prev, dW, db = sub_back(dZ,awb_cache,d_cache,lambd,keep_prob)
    
    return dA_prev, dW, db

In [50]:
def prop_backward(AL, Y, caches,activation,lambd=0, keep_prob=1):
    
    grads = {}
    L = len(caches) 
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) 

    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))    
    current_cache = caches[L-1]
    grads["dA"+str(L)], grads["dW"+str(L)], grads["db"+str(L)] = backward(dAL,current_cache,"sigmoid",lambd,keep_prob)

    for l in reversed(range(L-1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = backward(grads["dA"+str(l+2)],current_cache,activation,lambd,keep_prob)
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads

12) Créer une fonction pour rafraichir les paramètres W et b par Gradient descendant.

In [51]:
def update_parameters(parameters, grads, learning_rate):  
    L = len(parameters) // 2 
    for l in range(1,L+1):
        parameters["W"+str(l)]=parameters["W"+str(l)]-learning_rate*grads["dW"+str(l)]
        parameters["b"+str(l)]=parameters["b"+str(l)]-learning_rate*grads["db"+str(l)]        
    return parameters

13) Fitter un réseau de neurones à 3 couches.

In [52]:
def Fitting(X,Y,layers,activation,learning_rate = 0.00175, num_iterations = 3000, print_cost=True,seed=10):
    """
    Implements a L-layer neural network
    """
    costs = []                         # keep track of cost
    
    params = initialize_parameters_deep(layers,seed)
    for i in range(0, num_iterations):
        #forward prop
        AL, caches = prop_forward(X,params,activation)
        cost = compute_cost(AL,Y,params)

        #backward prop
        grads = prop_backward(AL,Y,caches,activation)
 
        # Updating parameters
        params = update_parameters(params,grads,learning_rate)
                
        # Print the cost every 100 training example
        if print_cost and i % 500 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
        costs.append(cost)
    
    return params,costs

In [53]:
lr=0.001
activation="relu"
final_params,cost=Fitting(X_train,y_train,[X_train.shape[0],15,4,1],activation,learning_rate=lr,num_iterations=10000)


4 Layers parameters init


ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

14) Créer la fonction Predict, 

In [None]:
def Predict(X_test,params):
    return(y_pred_prob,y_pred_classes)

In [None]:
y_pred_prob,y_pred_classes=Predict(X_dev,final_params)

In [None]:
mse_dev=NN.compute_cost(y_pred_prob,y_dev,final_params)
print("MSE on the dev set {}".format(mse_dev))

In [None]:
acc_dev=ST.accuracy(y_pred_classes,y_dev)
print("ACCURACY on the dev set {}".format(acc_dev))

In [None]:
plt.plot(np.squeeze(cost))
plt.ylabel('cost')
plt.xlabel('iterations (per tens)')
plt.title("Learning rate =" + str(lr))
plt.show()