# Logistic regression

## 1 Tikhonov regularization

### Question 1.1

$$
f_1: (w_0, w) \mapsto \frac{1}{n}\sum_{i=1}^n\log\left(1 + e^{-y_i(x_i^Tw + w_0)}\right) + \frac{\rho}{2}\|w\|_2^2
$$
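The sketch below is separate from the notebook's `f1` in Question 1.2. It evaluates $f_1$ with `np.logaddexp(0, -m)`, which computes $\log(1 + e^{-m})$ for the margin $m_i = y_i(x_i^Tw + w_0)$ without overflowing when $m_i$ is very negative. The function name `f1_value`, the parameter layout `W = (w0, w)` and the toy data are illustrative assumptions. At $W = 0$, every term equals $\log 2$ and the penalty vanishes, so $f_1(0) = \log 2$.

```python
import numpy as np

def f1_value(W, X, y, rho):
    """Evaluate f_1 at W = (w0, w) for data X (n x p) and labels y in {-1, 1}.

    Hypothetical helper for illustration, not the notebook's f1.
    """
    w0, w = W[0], W[1:]
    margins = y * (X.dot(w) + w0)            # m_i = y_i (x_i^T w + w_0)
    # logaddexp(0, -m) = log(1 + exp(-m)), computed without overflow
    loss = np.logaddexp(0.0, -margins).mean()
    return loss + rho / 2 * w.dot(w)         # Tikhonov penalty on w only

# Made-up toy data: n = 3 points in dimension p = 2
X = np.array([[0.5, -1.0], [1.5, 2.0], [-0.3, 0.7]])
y = np.array([1.0, -1.0, 1.0])
n = X.shape[0]

print(f1_value(np.zeros(3), X, y, rho=1.0 / n))  # log(2), about 0.6931
```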

#### Gradient of $f_1$

$$
\frac{\partial f_1}{\partial w_0}(w_0, w) = \frac{1}{n}\sum_{i=1}^n\frac{-y_ie^{-y_i(x_i^Tw+w_0)}}{1 + e^{-y_i(x_i^Tw+w_0)}} = \frac{1}{n}\sum_{i=1}^n\frac{-y_i}{1 + e^{y_i(x_i^Tw+w_0)}}
$$

$$
\frac{\partial f_1}{\partial w}(w_0, w) = \frac{1}{n}\sum_{i=1}^n\frac{-y_ie^{-y_i(x_i^Tw+w_0)}}{1 + e^{-y_i(x_i^Tw+w_0)}}x_i + \rho w = \frac{1}{n}\sum_{i=1}^n\frac{-y_i}{1 + e^{y_i(x_i^Tw+w_0)}}x_i + \rho w
$$

$$
\nabla f_1(w_0, w) = \begin{pmatrix}\frac{1}{n}\sum_{i=1}^n\frac{-y_i}{1 + e^{y_i(x_i^Tw+w_0)}} \\ \frac{1}{n}\sum_{i=1}^n\frac{-y_i}{1 + e^{y_i(x_i^Tw+w_0)}}x_i + \rho w\end{pmatrix}
$$
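The two components can also be written in vector form. Let $\tilde X$ be the matrix whose $i$-th row is $(1, x_i^T)$; this is the `X1` matrix used in `f1` in Question 1.2. Then $\nabla f_1(w_0, w) = \frac{1}{n}\tilde X^T s + \begin{pmatrix}0 \\ \rho w\end{pmatrix}$, with $s_i = \frac{-y_i}{1 + e^{y_i(x_i^Tw+w_0)}}$.

The sketch below compares this vector form with central finite differences. The helper names and the random toy data are hypothetical, and the check is only a numerical sanity check of the formulas above.

```python
import numpy as np

def f1_value(W, X, y, rho):
    # Hypothetical helper: stable evaluation of f_1 at W = (w0, w)
    margins = y * (X.dot(W[1:]) + W[0])
    return np.logaddexp(0.0, -margins).mean() + rho / 2 * W[1:].dot(W[1:])

def f1_grad(W, X, y, rho):
    # Hypothetical helper: gradient of f_1 in vector form
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])    # prepend the intercept column
    s = -y / (1 + np.exp(y * X1.dot(W)))     # s_i = -y_i / (1 + e^{y_i(x_i^T w + w_0)})
    g = X1.T.dot(s) / n                      # entry 0: d/dw0, entries 1..p: loss part of d/dw
    g[1:] += rho * W[1:]                     # the penalty acts on w only
    return g

def central_diff(fun, W, h=1e-6):
    # Approximate each partial derivative by (f(W + h e_j) - f(W - h e_j)) / (2h)
    g = np.zeros_like(W)
    for j in range(W.size):
        step = np.zeros_like(W)
        step[j] = h
        g[j] = (fun(W + step) - fun(W - step)) / (2 * h)
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = np.where(rng.standard_normal(20) > 0, 1.0, -1.0)
W = rng.standard_normal(4)
rho = 1.0 / X.shape[0]

approx = central_diff(lambda V: f1_value(V, X, y, rho), W)
err = np.abs(f1_grad(W, X, y, rho) - approx).max()
print(err)  # small: only finite-difference error remains
```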

#### Hessian matrix of $f_1$

$$
\frac{\partial^2f_1}{\partial w_0^2}(w_0, w) = \frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}
$$

$$
\frac{\partial^2f_1}{\partial w^2}(w_0, w) = \frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}x_ix_i^T + \rho I_p
$$

$$
\frac{\partial^2{f_1}}{\partial w_0\partial w}(w_0, w) = \frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}x_i^T
$$

$$
\frac{\partial^2{f_1}}{\partial w\partial w_0}(w_0, w) = \frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}x_i
$$

Let $H$ denote the Hessian matrix of $f_1$:

$$
H(f_1) = \begin{bmatrix}\frac{\partial^2f_1}{\partial w_0^2} & \frac{\partial^2{f_1}}{\partial w_0\partial w} \\ \frac{\partial^2{f_1}}{\partial w\partial w_0} & \frac{\partial^2f_1}{\partial w^2}\end{bmatrix} = \begin{bmatrix}\frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2} & \frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}x_i^T \\ \frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}x_i & \frac{1}{n}\sum_{i=1}^ny_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}x_ix_i^T + \rho I_p\end{bmatrix}
$$
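The blocks above can be assembled into a single matrix product. Let $\tilde X$ be the matrix whose $i$-th row is $(1, x_i^T)$, and let $D = \mathrm{diag}(r_1, \dots, r_n)$ with $r_i = y_i^2\frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2}$. Then $H = \frac{1}{n}\tilde X^T D \tilde X + \begin{pmatrix}0 & 0 \\ 0 & \rho I_p\end{pmatrix}$.

The sketch below builds $H$ this way and compares it with central differences of the gradient. It also checks that $H$ is symmetric, that is, that the two cross blocks are transposes of each other. Helper names and the random data are hypothetical.

```python
import numpy as np

def f1_grad(W, X, y, rho):
    # Hypothetical helper: gradient of f_1 at W = (w0, w)
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])
    g = X1.T.dot(-y / (1 + np.exp(y * X1.dot(W)))) / n
    g[1:] += rho * W[1:]
    return g

def f1_hess(W, X, y, rho):
    # Hypothetical helper: H = (1/n) X1^T D X1 + diag(0, rho I_p)
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    ex = np.exp(y * X1.dot(W))
    r = y**2 * ex / (1 + ex)**2              # diagonal of D
    H = X1.T.dot(r[:, None] * X1) / n        # (1/n) sum_i r_i (1, x_i)(1, x_i)^T
    H[1:, 1:] += rho * np.eye(p)             # rho I_p on the w block only
    return H

rng = np.random.default_rng(1)
X = rng.standard_normal((15, 2))
y = np.where(rng.standard_normal(15) > 0, 1.0, -1.0)
W = rng.standard_normal(3)
rho = 1.0 / X.shape[0]

H = f1_hess(W, X, y, rho)
# Column j of H is approximated by a central difference of the gradient along e_j
h = 1e-6
H_fd = np.column_stack([
    (f1_grad(W + h * ej, X, y, rho) - f1_grad(W - h * ej, X, y, rho)) / (2 * h)
    for ej in np.eye(W.size)
])
print(np.abs(H - H_fd).max())
```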

#### Convexity of $f_1$

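$\|\cdot\|_2^2$ is convex, so $w \mapsto \frac{\rho}{2}\|w\|_2^2$ is convex for $\rho \geq 0$.  
Let $f: x \mapsto \log(1 + e^x)$. Then $f''(x) = \frac{e^x}{(1 + e^x)^2} > 0$, so $f$ is convex. Moreover, $(w_0, w) \mapsto -y_i(x_i^Tw + w_0)$ is an affine map, and the composition of a convex function with an affine map is convex. Each term $(w_0, w) \mapsto \log(1 + e^{-y_i(x_i^Tw + w_0)})$ is therefore convex.

Since a sum of convex functions with nonnegative coefficients is convex, $f_1$ is convex.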
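The Hessian gives a second view of the same fact. For $v = (a, b) \in \mathbb{R} \times \mathbb{R}^p$, $v^THv = \frac{1}{n}\sum_{i=1}^n r_i(a + x_i^Tb)^2 + \rho\|b\|_2^2 \geq 0$, where $r_i = \frac{e^{y_i(x_i^Tw+w_0)}}{(1 + e^{y_i(x_i^Tw+w_0)})^2} > 0$ (using $y_i^2 = 1$). When $\rho > 0$, this quantity is strictly positive for $v \neq 0$.

The sketch below is only a numerical sanity check on random data, not a proof. It computes the smallest Hessian eigenvalue at a few random points and tests midpoint convexity on random pairs. Helper names and data are hypothetical.

```python
import numpy as np

def f1_value(W, X, y, rho):
    # Hypothetical helper: stable evaluation of f_1 at W = (w0, w)
    margins = y * (X.dot(W[1:]) + W[0])
    return np.logaddexp(0.0, -margins).mean() + rho / 2 * W[1:].dot(W[1:])

def f1_hess(W, X, y, rho):
    # Hypothetical helper: Hessian (1/n) X1^T diag(r) X1 + diag(0, rho I_p)
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    ex = np.exp(y * X1.dot(W))
    r = ex / (1 + ex)**2                     # y_i^2 = 1 since y_i is in {-1, 1}
    H = X1.T.dot(r[:, None] * X1) / n
    H[1:, 1:] += rho * np.eye(p)
    return H

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))
y = np.where(rng.standard_normal(30) > 0, 1.0, -1.0)
rho = 1.0 / X.shape[0]

# Smallest eigenvalue of H at random points (eigvalsh assumes a symmetric matrix)
min_eigs = [np.linalg.eigvalsh(f1_hess(rng.standard_normal(5), X, y, rho)).min()
            for _ in range(5)]

# Midpoint convexity: f((a + b) / 2) <= (f(a) + f(b)) / 2 on random pairs
gaps = []
for _ in range(100):
    a, b = rng.standard_normal(5), rng.standard_normal(5)
    gaps.append((f1_value(a, X, y, rho) + f1_value(b, X, y, rho)) / 2
                - f1_value((a + b) / 2, X, y, rho))

print(min(min_eigs), min(gaps))  # both expected to be nonnegative
```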

### Question 1.2

```python
import numpy as np
from numpy.linalg import norm
from scipy.optimize import check_grad
```

```python
def load_diabetic_retinopathy(filename, minidata=False):
    """
    Read the file filename, for example
    filename = 'diabeticRetinopathy.csv'
    Return
    X : a feature matrix (each column standardized)
    y : a class vector such that y[i] = 1 if the image shows signs of
        diabetic retinopathy and y[i] = -1 otherwise

    The minidata argument is not used.

    For more information about the dataset, see
    http://archive.ics.uci.edu/ml/datasets/Diabetic+Retinopathy+Debrecen+Data+Set
    """

    data = np.loadtxt(filename, delimiter=',')

    # The last column holds the 0/1 label; map it to {-1, 1}
    y = data[:, -1] * 2 - 1
    X = data[:, :-1]

    # Standardize each feature: zero mean, unit standard deviation
    X = X - np.mean(X, axis=0)
    X = X / np.std(X, axis=0)

    return X, y
```
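Here is a quick check of the two preprocessing steps in `load_diabetic_retinopathy`. It runs on a small made-up array instead of the CSV, and the helper name `preprocess` is hypothetical. The last column, assumed to hold 0/1 labels, is mapped to $\{-1, 1\}$, and each feature column ends up with mean 0 and (population) standard deviation 1. A constant feature column would have standard deviation 0, and the division would then produce NaNs.

```python
import numpy as np

def preprocess(data):
    """Same steps as load_diabetic_retinopathy, applied to an in-memory array."""
    y = data[:, -1] * 2 - 1          # labels {0, 1} -> {-1, 1}
    X = data[:, :-1]
    X = X - np.mean(X, axis=0)       # center each feature
    X = X / np.std(X, axis=0)        # scale to unit population standard deviation
    return X, y

# Made-up data: 3 features, the last column is the 0/1 label
data = np.array([[1.0, 20.0, 0.3, 1.0],
                 [0.0, 35.0, 0.1, 0.0],
                 [1.0, 50.0, 0.7, 1.0],
                 [1.0, 10.0, 0.2, 0.0]])
X, y = preprocess(data)
print(y)                             # labels mapped to 1, -1, 1, -1
print(X.mean(axis=0), X.std(axis=0))
```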

```python
def f1(W, X, y):
    """
    Return the value of the function, the gradient, and the Hessian.
    Input:
    W: (w0, w) parameter for the regression
    X: data input
    y: data output
    Output:
    val: value of f1
    grad: gradient
    hes: Hessian
    """

    n, p = X.shape
    # Float division: under Python 2, 1/n would be 0 and disable the regularization.
    rho = 1.0 / n
    # Vectorization convenience: prepend a column of ones for the intercept w0.
    X1 = np.column_stack([np.ones(n), X])
    # Compute only once: e_i = exp(y_i (x_i^T w + w0)) and the Hessian weights r_i.
    e = np.exp(y * (W.dot(X1.T)))
    r = e / ((1 + e)**2)

    val = np.log(1 + np.exp(-y * (W.dot(X1.T)))).sum() / n + rho * norm(W[1:])**2 / 2
    grad = np.concatenate((
        [-y.dot(1 / (1 + e)) / n],
        (-y / (1 + e)).dot(X) / n + rho * W[1:]))
    # np.vstack replaces np.row_stack, which is deprecated in recent NumPy.
    hes = np.column_stack([
        np.concatenate(([(y**2).dot(r) / n], ((y**2) * r).dot(X) / n)),
        np.vstack([((y**2) * r).dot(X) / n,
                   ((y**2) * r * X.T).dot(X) / n + rho * np.eye(p)])])

    return val, grad, hes
```
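The notebook checks only the gradient, but the Hessian returned by `f1` can be checked in a similar way. Row $j$ of the Hessian is the gradient of the $j$-th gradient component, so `check_grad` can be applied coordinate by coordinate. To keep the block self-contained, it redefines a compact `f1_parts` (a hypothetical name) with the same formulas and uses random data instead of the CSV. With the notebook's `f1`, one would pass `lambda W: f1(W, X, y)[1][j]` and `lambda W: f1(W, X, y)[2][j]` instead.

```python
import numpy as np
from scipy.optimize import check_grad

def f1_parts(W, X, y):
    """Value, gradient and Hessian of f_1 with rho = 1/n (same formulas as f1)."""
    n, p = X.shape
    rho = 1.0 / n
    X1 = np.column_stack([np.ones(n), X])
    z = X1.dot(W)                            # z_i = x_i^T w + w0
    e = np.exp(y * z)
    r = e / (1 + e)**2
    val = np.log(1 + np.exp(-y * z)).mean() + rho * W[1:].dot(W[1:]) / 2
    grad = X1.T.dot(-y / (1 + e)) / n
    grad[1:] += rho * W[1:]
    hes = X1.T.dot(r[:, None] * X1) / n
    hes[1:, 1:] += rho * np.eye(p)
    return val, grad, hes

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 3))
y = np.where(rng.standard_normal(40) > 0, 1.0, -1.0)
W = np.ones(X.shape[1] + 1)

# Row j of the Hessian is the gradient of the j-th gradient component
hess_errors = [check_grad(lambda V, j=j: f1_parts(V, X, y)[1][j],
                          lambda V, j=j: f1_parts(V, X, y)[2][j], W)
               for j in range(W.size)]
print(max(hess_errors))
```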

Gradient check:

```python
from scipy.optimize import check_grad

X, y = load_diabetic_retinopathy('diabeticRetinopathy.csv')
n, p = X.shape
W = np.ones(p + 1)
print(check_grad(lambda W: f1(W, X, y)[0], lambda W: f1(W, X, y)[1], W))
```

```
8.10381131131e-08
```
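`check_grad` returns the Euclidean norm of the difference between the analytic gradient and a forward-difference estimate. That estimate is computed by `scipy.optimize.approx_fprime` with a default step of $\sqrt{\varepsilon_{\text{machine}}} \approx 1.5 \times 10^{-8}$. A value around $10^{-7}$ or $10^{-8}$ is what finite-difference error alone typically produces, whereas a mistake in the gradient formula usually gives an error comparable to the size of the gradient itself.

The toy loss below is a hypothetical example, not $f_1$. It reproduces the computation by hand and also shows a relative version of the check, which does not depend on the scale of the gradient.

```python
import numpy as np
from scipy.optimize import approx_fprime, check_grad

def loss(w):
    return np.logaddexp(0.0, -w).sum()       # sum_j log(1 + e^{-w_j})

def loss_grad(w):
    return -1.0 / (1.0 + np.exp(w))          # d/dw_j log(1 + e^{-w_j})

w = np.array([0.5, -1.0, 2.0])
eps = np.sqrt(np.finfo(float).eps)           # default step of check_grad

err = check_grad(loss, loss_grad, w)
manual = np.linalg.norm(loss_grad(w) - approx_fprime(w, loss, eps))
rel = err / np.linalg.norm(loss_grad(w))     # scale-free version of the same check
print(err, manual, rel)
```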
