## MISA (2024-2025)
- Before submitting, re-run the entire notebook. To do so, restart the kernel first (in the menu bar, choose **Kernel$\rightarrow$Restart Kernel and Run All Cells**).

- Fill in only the places marked `YOUR CODE HERE` or `YOUR ANSWER HERE`. You may add new cells if needed. Remember to complete the references below if needed.

## References
* [Sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function)
* [Logistic regression (French Wikipedia)](https://fr.wikipedia.org/wiki/R%C3%A9gression_logistique)
* [Gradient descent](https://en.wikipedia.org/wiki/Gradient_descent)
* [Cross-entropy (log loss)](https://en.wikipedia.org/wiki/Cross-entropy)
* [Mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error)
* [Ridge regression](https://en.wikipedia.org/wiki/Ridge_regression)

---

In [732]:
from random import randrange
import numpy as np
from sklearn.metrics import mean_squared_error, log_loss
from sklearn.datasets import load_breast_cancer, load_diabetes


def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5, error=1e-9):
    """
    Sample a few random coordinates of x and compare the numerical
    gradient of f along those coordinates with the analytic gradient.
    """

    for i in range(num_checks):
        ix = tuple([randrange(m) for m in x.shape])

        oldval = x[ix]
        x[ix] = oldval + h  # increment by h
        fxph = f(x)  # evaluate f(x + h)
        x[ix] = oldval - h  # decrement by h
        fxmh = f(x)  # evaluate f(x - h)
        x[ix] = oldval  # reset

        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / (
            abs(grad_numerical) + abs(grad_analytic)
        )
        print(
            "numerical: %f analytic: %f, relative error: %e"
            % (grad_numerical, grad_analytic, rel_error)
        )
        assert rel_error < error

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
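
The central-difference check performed by `grad_check_sparse` can be tried on a function whose gradient is known in closed form. The sketch below is illustrative only: `numerical_grad` and the quadratic test function are hypothetical helpers, not part of the assignment.

```python
import numpy as np

def numerical_grad(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at x (x is restored afterwards)."""
    grad = np.zeros_like(x, dtype=float)
    for ix in np.ndindex(x.shape):
        oldval = x[ix]
        x[ix] = oldval + h
        fxph = f(x)      # f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)      # f(x - h)
        x[ix] = oldval   # reset
        grad[ix] = (fxph - fxmh) / (2 * h)
    return grad

# Hypothetical test function: f(x) = sum(x^2), whose gradient is 2x
x = np.array([1.0, -2.0, 0.5])
approx = numerical_grad(lambda v: np.sum(v ** 2), x)
exact = 2 * x
max_rel = np.max(np.abs(approx - exact) / (np.abs(approx) + np.abs(exact)))
print(max_rel)
```

Unlike `grad_check_sparse`, this version visits every coordinate, which is only practical for small parameter vectors.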

# Linear regression

In [733]:
data = load_diabetes()
X_train1, y_train1 = data.data, data.target
w1 = np.random.randn(X_train1.shape[1]) * 0.0001
b1 = np.random.randn(1) * 0.0001

In [734]:
def mse_loss_naive(w, b, X, y, alpha=0):
    """
    MSE loss function WITH FOR LOOPs
    
    Returns a tuple of:
    - loss 
    - gradient with respect to weights w
    - gradient with respect to bias b
    """
    loss = 0.0
    dw = np.zeros_like(w)  # same shape as w
    db = 0.0

    n = len(X)  # number of examples
    b = float(np.squeeze(b))  # accept a scalar or a 1-element array as the bias

    # Accumulate the MSE loss and the gradients over all examples
    for i in range(n):
        y_pred = 0.0
        # Prediction for example i
        for j in range(len(w)):  # X[i] . w
            y_pred += X[i][j] * w[j]
        y_pred += b

        # Loss contribution of this example
        loss += (y[i] - y_pred) ** 2

        # Gradient contributions of this example
        for j in range(len(w)):  # gradient with respect to the weights
            dw[j] += -2 * X[i][j] * (y[i] - y_pred)
        db += -2 * (y[i] - y_pred)  # gradient with respect to the bias

    # Average over all examples
    loss /= n
    dw /= n
    db /= n

    # Add L2 regularization if alpha > 0
    if alpha > 0:
        for j in range(len(w)):
            loss += alpha * w[j] ** 2  # regularization term
            dw[j] += 2 * alpha * w[j]  # gradient of the regularization term

    return loss, dw, np.array(db).reshape(1,)  # bias gradient as a 1-element array
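
For reference, the loops in `mse_loss_naive` accumulate the following loss and gradients, writing $\hat{y}_i = x_i^\top w + b$ (the symbol $\hat{y}_i$ is introduced here only for readability):

$$
L(w, b) = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j} w_j^2
$$

$$
\frac{\partial L}{\partial w_j} = -\frac{2}{n}\sum_{i=1}^{n} x_{ij}\,(y_i - \hat{y}_i) + 2\alpha w_j,
\qquad
\frac{\partial L}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)
$$

The bias $b$ is not regularized, which matches the code.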


### Naive Linear regression loss

In [735]:
loss, dw1, db1 = mse_loss_naive(w1, b1, X_train1, y_train1, alpha=0)

sk_loss = mean_squared_error(X_train1 @ w1 + b1, y_train1)
print("Loss error : ",rel_error(loss, sk_loss))
assert rel_error(loss, sk_loss) < 1e-9

print("Gradient check w")
# Check with numerical gradient w
f = lambda w1: mse_loss_naive(w1, b1, X_train1, y_train1, alpha=0)[0]
grad_numerical = grad_check_sparse(f, w1, dw1, 15,  error=1e-5)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b1: mse_loss_naive(w1, b1, X_train1, y_train1, alpha=0)[0]
grad_numerical = grad_check_sparse(f2, b1, db1, 15,  error=1e-5)

Loss error :  3.12815550455564e-16
Gradient check w
numerical: -4.296088 analytic: -4.296087, relative error: 4.037504e-08
numerical: -0.315453 analytic: -0.315453, relative error: 7.845173e-08
numerical: -1.275042 analytic: -1.275043, relative error: 5.000248e-07
numerical: -1.275042 analytic: -1.275043, relative error: 5.000248e-07
numerical: -1.275042 analytic: -1.275043, relative error: 5.000248e-07
numerical: -1.553186 analytic: -1.553188, relative error: 3.444153e-07
numerical: -4.145419 analytic: -4.145418, relative error: 1.134015e-07
numerical: -2.801912 analytic: -2.801913, relative error: 1.067256e-07
numerical: -1.275042 analytic: -1.275043, relative error: 5.000248e-07
numerical: -3.234111 analytic: -3.234109, relative error: 2.558277e-07
numerical: -3.234111 analytic: -3.234109, relative error: 2.558277e-07
numerical: -3.153315 analytic: -3.153316, relative error: 1.573529e-07
numerical: -3.234111 analytic: -3.234109, relative error: 2.558277e-07
numerical: -4.296088 anal

### Naive Ridge regression loss

In [736]:
loss, dw1, db1 = mse_loss_naive(w1, b1, X_train1, y_train1, alpha=1)

print("Gradient check w")
# Check with numerical gradient w
f = lambda w1: mse_loss_naive(w1, b1, X_train1, y_train1, alpha=1)[0]
grad_numerical = grad_check_sparse(f, w1, dw1, 15,  error=1e-5)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b1: mse_loss_naive(w1, b1, X_train1, y_train1, alpha=1)[0]
grad_numerical = grad_check_sparse(f2, b1, db1, 15,  error=1e-5)

Gradient check w


numerical: -3.234130 analytic: -3.234128, relative error: 2.714084e-07
numerical: -4.145478 analytic: -4.145478, relative error: 1.126765e-07
numerical: -0.315114 analytic: -0.315114, relative error: 4.515557e-08
numerical: -1.274901 analytic: -1.274902, relative error: 4.730617e-07
numerical: -0.315114 analytic: -0.315114, relative error: 4.515557e-08
numerical: -1.553317 analytic: -1.553318, relative error: 3.389340e-07
numerical: -1.376010 analytic: -1.376010, relative error: 3.838528e-08
numerical: -0.315114 analytic: -0.315114, relative error: 4.515557e-08
numerical: 2.891891 analytic: 2.891891, relative error: 5.103281e-08
numerical: -4.296216 analytic: -4.296216, relative error: 4.085456e-08
numerical: 2.891891 analytic: 2.891891, relative error: 5.103281e-08
numerical: -0.315114 analytic: -0.315114, relative error: 4.515557e-08
numerical: -1.553317 analytic: -1.553318, relative error: 3.389340e-07
numerical: -1.376010 analytic: -1.376010, relative error: 3.838528e-08
numerical:

In [737]:
def mse_loss_vectorized(w, b, X, y, alpha=0):
    """
    MSE loss function WITHOUT FOR LOOPs
    
    Returns a tuple of:
    - loss 
    - gradient with respect to weights w
    - gradient with respect to bias b
    """
    # Number of examples
    n = X.shape[0]

    # Predictions
    y_pred = X @ w + b

    # MSE loss
    loss = np.mean((y_pred - y) ** 2)

    # Add L2 regularization if alpha > 0
    if alpha > 0:
        loss += alpha * np.sum(w ** 2)  # regularization term

    # Gradients
    dw = 2 * X.T @ (y_pred - y) / n + 2 * alpha * w  # gradient with respect to the weights
    db = 2 * np.sum(y_pred - y) / n  # gradient with respect to the bias
    
    return loss, dw, np.array(db).reshape(1,)
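
A two-example dataset makes the vectorized formulas easy to verify by hand. The sketch below restates them in a hypothetical helper, `mse_and_grads`, so that it runs on its own. The data are made up for illustration.

```python
import numpy as np

def mse_and_grads(w, b, X, y, alpha=0.0):
    """Compact re-statement of the vectorized MSE loss and gradients (illustrative helper)."""
    n = X.shape[0]
    residual = X @ w + b - y
    loss = np.mean(residual ** 2) + alpha * np.sum(w ** 2)
    dw = 2 * X.T @ residual / n + 2 * alpha * w
    db = 2 * np.sum(residual) / n
    return loss, dw, db

# Two examples, one feature, all parameters at zero
X = np.array([[1.0], [2.0]])
y = np.array([1.0, 2.0])
loss, dw, db = mse_and_grads(np.zeros(1), 0.0, X, y)
# Residuals are (-1, -2), so loss = (1 + 4) / 2 = 2.5, dw = -5 and db = -3
print(loss, dw, db)
```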


### Vectorized Linear regression loss

In [738]:
loss, dw1, db1 = mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=0)

sk_loss = mean_squared_error(X_train1 @ w1 + b1, y_train1)
print("Loss error : ",rel_error(loss, sk_loss))
assert rel_error(loss, sk_loss) < 1e-9

print("Gradient check w")
# Check with numerical gradient w
f = lambda w1: mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=0)[0]
grad_numerical = grad_check_sparse(f, w1, dw1, 15,  error=1e-5)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b1: mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=0)[0]
grad_numerical = grad_check_sparse(f2, b1, db1, 15,  error=1e-5)

Loss error :  0.0
Gradient check w
numerical: -1.376393 analytic: -1.376393, relative error: 5.684287e-08
numerical: -1.553188 analytic: -1.553188, relative error: 6.548137e-08
numerical: -2.801913 analytic: -2.801913, relative error: 5.557325e-08
numerical: -1.553188 analytic: -1.553188, relative error: 6.548137e-08
numerical: -3.234109 analytic: -3.234109, relative error: 2.730190e-09
numerical: -0.315453 analytic: -0.315453, relative error: 7.845173e-08
numerical: -0.315453 analytic: -0.315453, relative error: 7.845173e-08
numerical: 2.892060 analytic: 2.892060, relative error: 2.015473e-08
numerical: -1.553188 analytic: -1.553188, relative error: 6.548137e-08
numerical: 2.892060 analytic: 2.892060, relative error: 2.015473e-08
numerical: -1.553188 analytic: -1.553188, relative error: 6.548137e-08
numerical: -1.553188 analytic: -1.553188, relative error: 6.548137e-08
numerical: -3.234109 analytic: -3.234109, relative error: 2.730190e-09
numerical: -3.153317 analytic: -3.153316, rela

In [739]:
loss, dw1, db1 = mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=0)

sk_loss = mean_squared_error(X_train1 @ w1 + b1, y_train1)
print("Loss error : ",rel_error(loss, sk_loss))
assert rel_error(loss, sk_loss) < 1e-9

print("Gradient check w")
# Check with numerical gradient w
f = lambda w1: mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=0)[0]
grad_numerical = grad_check_sparse(f, w1, dw1, 15,  error=1e-5)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b1: mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=0)[0]
grad_numerical = grad_check_sparse(f2, b1, db1, 15,  error=1e-5)

Loss error :  0.0
Gradient check w
numerical: 2.892060 analytic: 2.892060, relative error: 2.015473e-08
numerical: -4.296087 analytic: -4.296087, relative error: 2.313587e-08
numerical: -4.145418 analytic: -4.145418, relative error: 3.702697e-09
numerical: -4.296087 analytic: -4.296087, relative error: 2.313587e-08
numerical: -3.153317 analytic: -3.153316, relative error: 1.570198e-08
numerical: -1.275044 analytic: -1.275043, relative error: 1.419500e-07
numerical: -4.296087 analytic: -4.296087, relative error: 2.313587e-08
numerical: -4.145418 analytic: -4.145418, relative error: 3.702697e-09
numerical: -2.801913 analytic: -2.801913, relative error: 5.557325e-08
numerical: -4.296087 analytic: -4.296087, relative error: 2.313587e-08
numerical: -4.145418 analytic: -4.145418, relative error: 3.702697e-09
numerical: -1.553188 analytic: -1.553188, relative error: 6.548137e-08
numerical: -1.376393 analytic: -1.376393, relative error: 5.684287e-08
numerical: 2.892060 analytic: 2.892060, rela

### Vectorized ridge regression loss

In [740]:
loss, dw1, db1 = mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=1)

print("Gradient check w")
# Check with numerical gradient w
f = lambda w1: mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=1)[0]
grad_numerical = grad_check_sparse(f, w1, dw1, 15,  error=1e-5)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b1: mse_loss_vectorized(w1, b1, X_train1, y_train1, alpha=1)[0]
grad_numerical = grad_check_sparse(f2, b1, db1, 15,  error=1e-5)

Gradient check w
numerical: -4.296216 analytic: -4.296216, relative error: 2.265445e-08
numerical: -0.315114 analytic: -0.315114, relative error: 4.515557e-08
numerical: -1.274903 analytic: -1.274902, relative error: 1.689841e-07
numerical: -1.553319 analytic: -1.553318, relative error: 7.092821e-08
numerical: -4.145478 analytic: -4.145478, relative error: 2.979284e-09
numerical: -1.553319 analytic: -1.553318, relative error: 7.092821e-08
numerical: -3.153363 analytic: -3.153363, relative error: 1.894784e-08
numerical: -0.315114 analytic: -0.315114, relative error: 4.515557e-08
numerical: -0.315114 analytic: -0.315114, relative error: 4.515557e-08
numerical: -4.296216 analytic: -4.296216, relative error: 2.265445e-08
numerical: -4.145478 analytic: -4.145478, relative error: 2.979284e-09
numerical: -4.145478 analytic: -4.145478, relative error: 2.979284e-09
numerical: -1.274903 analytic: -1.274902, relative error: 1.689841e-07
numerical: -4.296216 analytic: -4.296216, relative error: 2.

# Logistic regression

In [741]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

data = load_breast_cancer()
X_train2, y_train2 = data.data, data.target
w2 = np.random.randn(X_train2.shape[1]) * 0.0001
b2 = np.random.randn(1) * 0.0001
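
`np.exp(-z)` overflows in float64 when `z` is a large negative number (below roughly -709) and emits a runtime warning. With the tiny initial weights used here this should not occur, but a piecewise form avoids the issue in general. A minimal sketch follows. `stable_sigmoid` is a hypothetical name, not part of the notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never exponentiates a large positive number."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so the usual formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, rewrite as exp(z) / (1 + exp(z)), where exp(z) < 1
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```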

### Naive

In [742]:
def log_loss_naive(w, b, X, y, alpha=0):
    """
    Log loss function WITH FOR LOOPs
    
    Returns a tuple of:
    - loss 
    - gradient with respect to weights w
    - gradient with respect to bias b
    """
    loss = 0.0
    dw = np.zeros_like(w)
    db = 0.0

    n = len(X)  # number of examples
    b = float(np.squeeze(b))  # accept a scalar or a 1-element array as the bias

    # Accumulate the log loss and the gradients over all examples
    for i in range(n):
        # Linear score (logit) for example i
        y_pred = 0.0
        for j in range(len(w)):  # X[i] . w
            y_pred += X[i][j] * w[j]
        y_pred += b

        # Apply the sigmoid function
        sigmoid_pred = 1 / (1 + np.exp(-y_pred))

        # Loss contribution of this example
        loss += -y[i] * np.log(sigmoid_pred) - (1 - y[i]) * np.log(1 - sigmoid_pred)

        # Gradient contributions of this example
        for j in range(len(w)):  # gradient with respect to the weights
            dw[j] += (sigmoid_pred - y[i]) * X[i][j]
        db += sigmoid_pred - y[i]  # gradient with respect to the bias

    # Average over all examples
    loss /= n
    dw /= n
    db /= n

    # Add L2 regularization if alpha > 0
    if alpha > 0:
        for j in range(len(w)):
            loss += alpha * w[j] ** 2  # regularization term added to the loss
            dw[j] += 2 * alpha * w[j]  # regularization term added to the gradient

    return loss, dw, np.array(db).reshape(1,)  # bias gradient as a 1-element array
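
For reference, with $z_i = x_i^\top w + b$ and $\sigma$ the sigmoid function, the quantities accumulated by `log_loss_naive` are:

$$
L(w, b) = -\frac{1}{n}\sum_{i=1}^{n} \Big[ y_i \log \sigma(z_i) + (1 - y_i)\log\big(1 - \sigma(z_i)\big) \Big] + \alpha \sum_{j} w_j^2
$$

Because $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the derivative of each per-example term with respect to $z_i$ simplifies to $\sigma(z_i) - y_i$, which gives:

$$
\frac{\partial L}{\partial w_j} = \frac{1}{n}\sum_{i=1}^{n} \big(\sigma(z_i) - y_i\big)\, x_{ij} + 2\alpha w_j,
\qquad
\frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} \big(\sigma(z_i) - y_i\big)
$$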


In [743]:
y_pred_0 = sigmoid(X_train2 @ w2 + b2)
y_pred = np.vstack([1-y_pred_0, y_pred_0]).T
sk_loss = log_loss(y_train2, y_pred)

loss, dw2, db2 = log_loss_naive(w2, b2, X_train2, y_train2, alpha=0)
print("Loss error : ",rel_error(loss, sk_loss))
assert rel_error(loss, sk_loss) < 1e-9

print("Gradient check w")
# Check with numerical gradient w
f = lambda w2: log_loss_naive(w2, b2, X_train2, y_train2, alpha=0)[0]
grad_numerical = grad_check_sparse(f, w2, dw2, 15, error=1e-4)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b2: log_loss_naive(w2, b2, X_train2, y_train2, alpha=0)[0]
grad_numerical = grad_check_sparse(f2, b2, db2, 15,  error=1e-5)

Loss error :  1.561596465087343e-16
Gradient check w


numerical: -0.000697 analytic: -0.000697, relative error: 5.127431e-10
numerical: -0.104564 analytic: -0.104564, relative error: 1.760468e-10
numerical: 0.574468 analytic: 0.574468, relative error: 2.666072e-09
numerical: 0.574468 analytic: 0.574468, relative error: 2.666072e-09
numerical: 0.025315 analytic: 0.025315, relative error: 3.793623e-10
numerical: 142.247903 analytic: 142.249327, relative error: 5.006893e-06
numerical: 0.007182 analytic: 0.007182, relative error: 1.661350e-09
numerical: -0.010610 analytic: -0.010610, relative error: 8.995161e-10
numerical: 9.587780 analytic: 9.587781, relative error: 4.512927e-08
numerical: -0.708580 analytic: -0.708580, relative error: 2.396063e-09
numerical: -0.010610 analytic: -0.010610, relative error: 8.995161e-10
numerical: -0.005324 analytic: -0.005324, relative error: 1.809466e-09
numerical: -0.708580 analytic: -0.708580, relative error: 2.396063e-09
numerical: 0.046509 analytic: 0.046509, relative error: 1.685480e-10
numerical: -0.00

### Naive with regularization

In [744]:
loss, dw2, db2 = log_loss_naive(w2, b2, X_train2, y_train2, alpha=1)

print("Gradient check w")
# Check with numerical gradient w
f = lambda w2: log_loss_naive(w2, b2, X_train2, y_train2, alpha=1)[0]
grad_numerical = grad_check_sparse(f, w2, dw2, 15, error=1e-4)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b2: log_loss_naive(w2, b2, X_train2, y_train2, alpha=1)[0]
grad_numerical = grad_check_sparse(f2, b2, db2, 15,  error=1e-5)

Gradient check w


numerical: 0.000999 analytic: 0.000999, relative error: 2.094554e-08
numerical: -0.734764 analytic: -0.734764, relative error: 5.414611e-09
numerical: -0.001033 analytic: -0.001033, relative error: 1.134618e-09
numerical: 0.000999 analytic: 0.000999, relative error: 2.094554e-08
numerical: -0.012104 analytic: -0.012104, relative error: 3.639682e-10
numerical: 0.025465 analytic: 0.025465, relative error: 3.681944e-10
numerical: 0.000999 analytic: 0.000999, relative error: 2.094554e-08
numerical: 0.007072 analytic: 0.007072, relative error: 1.988774e-09
numerical: 0.000821 analytic: 0.000821, relative error: 4.288783e-09
numerical: -0.001033 analytic: -0.001033, relative error: 1.134618e-09
numerical: -0.001033 analytic: -0.001033, relative error: 1.134618e-09
numerical: 0.011363 analytic: 0.011363, relative error: 1.160073e-10
numerical: 0.574561 analytic: 0.574561, relative error: 2.666015e-09
numerical: 0.151503 analytic: 0.151503, relative error: 5.871658e-09
numerical: 0.016894 anal

### Vectorized

In [745]:
def log_loss_vectorized(w, b, X, y, alpha=0):
    """
    Log loss function WITHOUT FOR LOOPS

    Arguments:
    w -- model weights (vector of shape (d,))
    b -- model bias (scalar or array of shape (1,))
    X -- input data (matrix of shape (n, d))
    y -- true target values (vector of shape (n,))
    alpha -- L2 regularization coefficient (scalar, default 0)

    Returns:
    loss -- logistic loss
    dw -- gradient of the loss with respect to the weights (vector of shape (d,))
    db -- gradient of the loss with respect to the bias (array of shape (1,))
    """

    n = X.shape[0]  # number of examples

    # Linear scores (logits)
    y_pred = X.dot(w) + b  # matrix-vector product X.w + b

    # Apply the sigmoid function
    sigmoid_pred = 1 / (1 + np.exp(-y_pred))

    # Logistic loss (log loss)
    loss = -np.mean(y * np.log(sigmoid_pred) + (1 - y) * np.log(1 - sigmoid_pred))

    # Add L2 regularization if alpha > 0
    if alpha > 0:
        loss += alpha * np.sum(w ** 2)  # regularization on the weights

    # Gradients
    dw = X.T.dot(sigmoid_pred - y) / n  # gradient with respect to the weights w

    # Add the L2 regularization term to the weight gradient
    if alpha > 0:
        dw += 2 * alpha * w  # gradient of the regularization term

    db = np.sum(sigmoid_pred - y) / n   # gradient with respect to the bias b

    # Return the loss and the gradients
    return loss, dw, np.array(db).reshape(1,)
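
A quick sanity check for a freshly initialized model: when all parameters are zero, every predicted probability is 0.5, so the log loss equals $\ln 2 \approx 0.693$ whatever the labels are. The sketch below restates the unregularized formulas in a hypothetical helper, `log_loss_and_grads`, and uses made-up data.

```python
import numpy as np

def log_loss_and_grads(w, b, X, y):
    """Unregularized log loss and gradients (illustrative re-statement)."""
    n = X.shape[0]
    p = 1 / (1 + np.exp(-(X @ w + b)))
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    dw = X.T @ (p - y) / n
    db = np.sum(p - y) / n
    return loss, dw, db

# Hypothetical toy data with balanced labels
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
loss, dw, db = log_loss_and_grads(np.zeros(2), 0.0, X, y)
print(loss, np.log(2))
```

A training run that starts from near-zero weights should therefore report an initial loss close to 0.693.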


In [746]:
y_pred_0 = sigmoid(X_train2 @ w2 + b2)
y_pred = np.vstack([1-y_pred_0, y_pred_0]).T
sk_loss = log_loss(y_train2, y_pred)

loss, dw2, db2 = log_loss_vectorized(w2, b2, X_train2, y_train2, alpha=0)
print("Loss error : ",rel_error(loss, sk_loss))
assert rel_error(loss, sk_loss) < 1e-9

print("Gradient check w")
# Check with numerical gradient w
f = lambda w2: log_loss_vectorized(w2, b2, X_train2, y_train2, alpha=0)[0]
grad_numerical = grad_check_sparse(f, w2, dw2, 15, error=1e-4)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b2: log_loss_vectorized(w2, b2, X_train2, y_train2, alpha=0)[0]
grad_numerical = grad_check_sparse(f2, b2, db2, 15,  error=1e-5)

Loss error :  0.0
Gradient check w
numerical: 0.151743 analytic: 0.151743, relative error: 5.829276e-09
numerical: -0.000218 analytic: -0.000218, relative error: 4.353361e-09
numerical: -0.708580 analytic: -0.708580, relative error: 2.356892e-09
numerical: 0.016846 analytic: 0.016846, relative error: 2.900927e-11
numerical: 0.001193 analytic: 0.001193, relative error: 9.547380e-10
numerical: -0.734952 analytic: -0.734952, relative error: 5.468686e-09
numerical: -0.734952 analytic: -0.734952, relative error: 5.468686e-09
numerical: -0.004158 analytic: -0.004158, relative error: 4.316839e-10
numerical: -0.000218 analytic: -0.000218, relative error: 4.353361e-09
numerical: -0.104564 analytic: -0.104564, relative error: 9.762140e-12
numerical: 0.340628 analytic: 0.340628, relative error: 1.038116e-10
numerical: 1.646976 analytic: 1.646977, relative error: 1.547210e-07
numerical: 0.011274 analytic: 0.011274, relative error: 1.246484e-10
numerical: 0.046509 analytic: 0.046509, relative error

### Vectorized with regularization

In [747]:
loss, dw2, db2 = log_loss_vectorized(w2, b2, X_train2, y_train2, alpha=1)

print("Gradient check w")
# Check with numerical gradient w
f = lambda w2: log_loss_vectorized(w2, b2, X_train2, y_train2, alpha=1)[0]
grad_numerical = grad_check_sparse(f, w2, dw2, 15, error=1e-4)

print("Gradient check bias")
# Check with numerical gradient b
f2 = lambda b2: log_loss_vectorized(w2, b2, X_train2, y_train2, alpha=1)[0]
grad_numerical = grad_check_sparse(f2, b2, db2, 15,  error=1e-5)

Gradient check w
numerical: -0.012104 analytic: -0.012104, relative error: 3.239534e-10
numerical: 74.653275 analytic: 74.653701, relative error: 2.858174e-06
numerical: 0.016894 analytic: 0.016894, relative error: 1.179457e-10
numerical: 0.000821 analytic: 0.000821, relative error: 2.468534e-09
numerical: -0.005174 analytic: -0.005174, relative error: 6.118390e-10
numerical: 0.046413 analytic: 0.046413, relative error: 2.175943e-12
numerical: 0.000573 analytic: 0.000573, relative error: 8.446796e-09
numerical: -0.104496 analytic: -0.104496, relative error: 6.201417e-13
numerical: 142.248015 analytic: 142.249440, relative error: 5.006889e-06
numerical: 0.021053 analytic: 0.021053, relative error: 1.738152e-10
numerical: 0.025465 analytic: 0.025465, relative error: 6.778570e-11
numerical: 0.016894 analytic: 0.016894, relative error: 1.179457e-10
numerical: 0.000999 analytic: 0.000999, relative error: 1.282120e-09
numerical: -0.005288 analytic: -0.005288, relative error: 1.629794e-10
num

# Gradient descent for Linear models

In [748]:
import numpy as np

class LinearModel():
    def __init__(self):
        self.w = None
        self.b = None

    def train(self, X, y, learning_rate=1e-3, alpha=0, num_iters=100, batch_size=200, verbose=False):
        N, d = X.shape
        
        if self.w is None:  # lazy initialization
            self.w = 0.001 * np.random.randn(d)
            self.b = 0.0

        # Run stochastic gradient descent to optimize w
        loss_history = []
        for it in range(num_iters):
            # Draw a random mini-batch
            indices = np.random.choice(N, batch_size, replace=False)
            X_batch = X[indices]
            y_batch = y[indices]

            # Evaluate the loss and gradients with the subclass's loss function
            loss, dw, db = self.loss(X_batch, y_batch, alpha)
            loss_history.append(loss)

            # Update the parameters
            self.w -= learning_rate * dw
            self.b -= learning_rate * db

            # Print the loss every 10000 iterations if verbose is True
            if verbose and it % 10000 == 0:
                print(f"iteration {it} / {num_iters}: loss {loss:.6f}")

        return loss_history

    def predict(self, X):
        """ Generic prediction method, used by the subclasses """
        return X.dot(self.w) + self.b

    def loss(self, X_batch, y_batch, alpha):
        """ To be implemented by the subclasses """
        raise NotImplementedError()


class LinearRegressor(LinearModel):
    """ Linear regression """

    def loss(self, X_batch, y_batch, alpha):
        """ Delegates to the mse_loss_vectorized function defined above """
        return mse_loss_vectorized(self.w, self.b, X_batch, y_batch, alpha)

    def predict(self, X):
        """ Prediction for linear regression """
        return X.dot(self.w) + self.b


class LogisticRegressor(LinearModel):
    """ Logistic regression """

    def loss(self, X_batch, y_batch, alpha):
        """ Delegates to the log_loss_vectorized function defined above """
        return log_loss_vectorized(self.w, self.b, X_batch, y_batch, alpha)

    def predict(self, X):
        """ Return a vector of 0/1 labels from the logistic regression model """
        y_pred = X.dot(self.w) + self.b
        sigmoid_pred = 1 / (1 + np.exp(-y_pred))
        return (sigmoid_pred >= 0.5).astype(int)
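
The update rule in `train` can be seen in isolation on a small synthetic problem. The sketch below is a stand-alone mini-batch loop for linear regression, not the class above. The data, seed, learning rate and batch size are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless data: y = 3 * x + 1
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 1.0

w = np.zeros(1)
b = 0.0
learning_rate = 0.1
batch_size = 32

for it in range(500):
    # Draw a random mini-batch without replacement
    idx = rng.choice(X.shape[0], batch_size, replace=False)
    X_batch, y_batch = X[idx], y[idx]

    # MSE gradients on the mini-batch
    residual = X_batch @ w + b - y_batch
    dw = 2 * X_batch.T @ residual / batch_size
    db = 2 * np.sum(residual) / batch_size

    # Gradient step
    w -= learning_rate * dw
    b -= learning_rate * db

print(w, b)
```

Sampling with `replace=False` requires `batch_size` to be no larger than the number of examples; NumPy raises a `ValueError` otherwise.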


## Linear regression with gradient descent

In [749]:
from sklearn.linear_model import LinearRegression

sk_model = LinearRegression(fit_intercept=True)
sk_model.fit(X_train1, y_train1)
sk_pred = sk_model.predict(X_train1)
sk_mse = mean_squared_error(sk_pred, y_train1)

model = LinearRegressor()
model.train(X_train1, y_train1, num_iters=75000, batch_size=64, learning_rate=1e-2, verbose=True)
pred = model.predict(X_train1)
mse = mean_squared_error(pred, y_train1)

print("MSE scikit-learn:", sk_mse)
print("MSE gradient descent model :", mse)
assert mse - sk_mse < 100

iteration 0 / 75000: loss 25639.985777
iteration 10000 / 75000: loss 3589.225234
iteration 20000 / 75000: loss 3085.272527
iteration 30000 / 75000: loss 2515.969871
iteration 40000 / 75000: loss 3384.985293
iteration 50000 / 75000: loss 3280.957417
iteration 60000 / 75000: loss 2418.706388
iteration 70000 / 75000: loss 2859.704411
MSE scikit-learn: 2859.6963475867506
MSE gradient descent model : 2884.254180528585
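
The small gap between the two MSE values is consistent with mini-batch gradient descent only approaching the least-squares optimum, which can also be computed directly. A minimal sketch with `np.linalg.lstsq` on hypothetical toy data follows. It makes no claim about how scikit-learn solves the problem internally.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noiseless data generated from known coefficients
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.5

# Append a column of ones so the last coefficient plays the role of the bias
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = coef[:2], coef[2]
print(w, b)
```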


## Logistic regression with gradient descent

In [750]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train2 = scaler.fit_transform(X_train2)

sk_model = LogisticRegression(fit_intercept=True)
sk_model.fit(X_train2, y_train2)
sk_pred = sk_model.predict(X_train2)
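# Note: log_loss expects (y_true, y_pred); here hard 0/1 labels are passed
# instead of probabilities, so the value only reflects label agreement.
sk_log_loss = log_loss(sk_pred, y_train2)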

model = LogisticRegressor()
model.train(X_train2, y_train2, num_iters=75000, batch_size=64, learning_rate=1e-3, verbose=True)
pred = model.predict(X_train2)
model_log_loss = log_loss(pred, y_train2)

print("Log-loss scikit-learn:", sk_log_loss)
print("Log-loss gradient descent model :", model_log_loss)
print("Error :", rel_error(sk_log_loss, model_log_loss))
assert rel_error(sk_log_loss, model_log_loss) < 1e-7

iteration 0 / 75000: loss 0.693999
iteration 10000 / 75000: loss 0.103628
iteration 20000 / 75000: loss 0.073980
iteration 30000 / 75000: loss 0.071440
iteration 40000 / 75000: loss 0.079441
iteration 50000 / 75000: loss 0.033326
iteration 60000 / 75000: loss 0.047670
iteration 70000 / 75000: loss 0.034414
Log-loss scikit-learn: 0.44341928598210933
Log-loss gradient descent model : 0.44341928598210933
Error : 0.0
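
Log loss is normally evaluated on predicted probabilities rather than hard labels. The sketch below computes it by hand for a few hypothetical probabilities. `binary_log_loss` and the clipping constant `eps` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def binary_log_loss(y_true, p, eps=1e-15):
    """Mean binary cross-entropy; p is clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true, dtype=float)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = [1, 0, 1]
soft = binary_log_loss(y_true, [0.9, 0.2, 0.6])  # probabilities
hard = binary_log_loss(y_true, [1, 0, 1])        # hard labels that happen to be correct
print(soft, hard)
```

With hard labels the value depends only on which examples are misclassified, each contributing about `-log(eps)`, so two models that predict the same labels get the same score even if their probabilities differ.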
