In [4]:
import IPython.display
IPython.display.display_latex(IPython.display.Latex(filename="macros.tex"))

# Linear regression as an example of a model. The analytical solution and gradient descent as examples of training methods.

We create a sample of 10 objects, each with 3 features.<br>

In [2]:
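import numpy as np
# 10 objects with 3 features each, drawn from N(0, 1) and rounded to 3 decimals
X = np.around(np.random.normal(0, 1, [10, 3]), 3)
print(X)
# true parameters: the intercept plus one weight per feature
teta_real = np.random.normal(0, 1, 4)
print(teta_real)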

[[ 0.136  0.759  1.252]
 [-0.809 -0.811 -1.168]
 [ 0.788  0.614 -1.172]
 [ 1.33   0.83   0.537]
 [ 0.997  0.753 -1.171]
 [-0.77  -0.307  0.521]
 [ 0.086  1.501 -0.701]
 [-2.303 -0.902  1.811]
 [-0.424  0.212 -1.029]
 [ 1.524 -0.094  0.341]]
[ 0.78450144 -2.278559    1.96637999 -0.84678869]


For linear regression, the model looks as follows:
$$\model = \{g(x, \params)\:|\:\params\in\setParams\}$$
$$\text{where}\: g(x, \params) = \params_0 + \params_1 * x_1 + \params_2 * x_2 + \params_3 * x_3 \:\text{and}\: \params = [\params_0, \params_1, \params_2, \params_3]$$
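
As a minimal sketch, the model $g$ can be written as a plain function. The name `g` follows the formula above; the example values of `x_example` and `params_example` are made up for illustration and are not taken from the sample.

```python
import numpy as np

def g(x, params):
    """Linear model: params[0] is the intercept, params[1:] are the feature weights."""
    return params[0] + np.dot(params[1:], x)

# Hypothetical values, chosen only to make the arithmetic easy to follow.
x_example = np.array([1.0, 2.0, 3.0])
params_example = np.array([0.5, 1.0, -1.0, 2.0])

# 0.5 + 1*1 - 1*2 + 2*3 = 5.5
print(g(x_example, params_example))
```

Prepending a column of ones to the sample, as the next cell does, lets the same computation be written as a single matrix-vector product.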

In [5]:
ones = np.ones((X.shape[0], 1))
X_ones = np.append(ones, X, axis=1)
X_ones

array([[ 1.   ,  0.136,  0.759,  1.252],
       [ 1.   , -0.809, -0.811, -1.168],
       [ 1.   ,  0.788,  0.614, -1.172],
       [ 1.   ,  1.33 ,  0.83 ,  0.537],
       [ 1.   ,  0.997,  0.753, -1.171],
       [ 1.   , -0.77 , -0.307,  0.521],
       [ 1.   ,  0.086,  1.501, -0.701],
       [ 1.   , -2.303, -0.902,  1.811],
       [ 1.   , -0.424,  0.212, -1.029],
       [ 1.   ,  1.524, -0.094,  0.341]])

In [6]:
Y = np.dot(X_ones, teta_real) + np.random.normal(0, 0.1, 10)

In [7]:
X, Y

(array([[ 0.136,  0.759,  1.252],
        [-0.809, -0.811, -1.168],
        [ 0.788,  0.614, -1.172],
        [ 1.33 ,  0.83 ,  0.537],
        [ 0.997,  0.753, -1.171],
        [-0.77 , -0.307,  0.521],
        [ 0.086,  1.501, -0.701],
        [-2.303, -0.902,  1.811],
        [-0.424,  0.212, -1.029],
        [ 1.524, -0.094,  0.341]]),
 array([ 0.9354391 ,  2.0098984 ,  1.29538372, -1.12025842,  1.06426372,
         1.47306719,  4.11192593,  2.71233127,  2.86615713, -3.09155124]))

In [8]:
X.shape

(10, 3)

_We have_:<br>
$\sample * \params = \answers_{predict}$<br>
_We need_ (OLS, ordinary least squares) to minimize the error $ER$:<br>
$ER\rightarrow \min\Rightarrow (\answers_{predict} - \answers)^T * (\answers_{predict} - \answers)\rightarrow \min$<br>
$(\sample * \params - \answers)^T * (\sample * \params - \answers)\rightarrow \min$

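**analytical solution**

The system below usually has no exact solution, because there are more objects than parameters:
$$\sample*\params = \answers$$
Multiplying both sides by $\sample^T$ gives the normal equations, whose solution minimizes the squared error:
$$(\sample^T * \sample)*\params = \sample^T * \answers\Rightarrow\params = (\sample^T * \sample)^{-1}*\sample^T * \answers$$

$(\sample^T * \sample)^{-1}*\sample^T$ is the pseudoinverse of $\sample$ (assuming $\sample^T * \sample$ is invertible).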

In [12]:
X_ones_tr = np.transpose(X_ones)

# teta = (X^T X)^{-1} X^T Y
teta_analitics = np.dot(
    np.dot(
        np.linalg.inv(
            np.dot(X_ones_tr, X_ones)
        ),
        X_ones_tr),
    Y
)

# predictions of the fitted model on the training sample
Y_pred = np.dot(X_ones, teta_analitics)
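
The explicit inverse in the cell above mirrors the formula directly. As a hedged alternative sketch, NumPy's `np.linalg.pinv` and `np.linalg.lstsq` solve the same least-squares problem without forming $(X^T X)^{-1}$ by hand. The data below (`X_demo_ones`, `y_demo`, `theta_true`) is synthetic and generated with a fixed seed purely for illustration; it is not the notebook's sample.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption made for reproducibility
X_demo = rng.normal(0, 1, (10, 3))
X_demo_ones = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])
theta_true = rng.normal(0, 1, 4)
y_demo = X_demo_ones @ theta_true + rng.normal(0, 0.1, 10)

# Normal equations with an explicit inverse, as in the cell above.
theta_inv = np.linalg.inv(X_demo_ones.T @ X_demo_ones) @ X_demo_ones.T @ y_demo

# Pseudoinverse computed by NumPy.
theta_pinv = np.linalg.pinv(X_demo_ones) @ y_demo

# Least-squares solver; rcond=None selects NumPy's default cutoff.
theta_lstsq, *_ = np.linalg.lstsq(X_demo_ones, y_demo, rcond=None)

print(np.allclose(theta_inv, theta_pinv), np.allclose(theta_inv, theta_lstsq))
```

On a small, well-conditioned sample like this one the three results agree to numerical precision. The solver-based variants are generally the safer choice when $X^T X$ is close to singular.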

The analytical solution is expensive to compute when there are many features, because it requires inverting $\sample^T * \sample$.<br>
So we use **gradient descent**,<br>
an iterative method:<br>
$$\params^{(j+1)}_k:=\params^{(j)}_k - \learningRate * \frac{\partial}{\partial \params^{(j)}_k}\empericRisk$$

For OLS:<br>
$$\empericRisk = \frac{1}{2l} \sum_{i = 1}^l (\alg(x_i) - y_i)^2$$
$$\frac{\partial}{\partial \params_k}\empericRisk = \frac{1}{l} \sum_{i = 1}^l (\alg(x_i) - y_i) * x_{ik}$$
where $x_{i0} = 1$ for the intercept.<br>
Repeat until convergence:
$$\params^{(j+1)}_k:=\params^{(j)}_k - \learningRate * \frac{1}{l} \sum_{i = 1}^l (\alg(x_i) - y_i) * x_{ik}$$
In matrix form:
$$\params^{(j+1)}:=\params^{(j)} - \learningRate * \frac{1}{l}*(\sample^T*(\sample*\params^{(j)} - \answers))$$
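
One way to sanity-check the matrix-form gradient (the residual vector multiplied by $X^T$ and divided by the number of objects) is to compare it with a finite-difference approximation of the risk. The sketch below does this on synthetic data; the helper names `risk` and `risk_grad`, the seed, and the step size `h` are assumptions introduced for this example.

```python
import numpy as np

def risk(theta, X, y):
    """Empirical risk: 1/(2l) times the sum of squared residuals."""
    r = X @ theta - y
    return r @ r / (2 * len(y))

def risk_grad(theta, X, y):
    """Analytic gradient: 1/l * X^T (X theta - y)."""
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only
X_demo = np.hstack([np.ones((10, 1)), rng.normal(0, 1, (10, 3))])
y_demo = rng.normal(0, 1, 10)
theta = rng.normal(0, 1, 4)

# Central finite differences, one coordinate direction at a time.
h = 1e-6
numeric = np.array([
    (risk(theta + h * e, X_demo, y_demo) - risk(theta - h * e, X_demo, y_demo)) / (2 * h)
    for e in np.eye(4)
])

# Largest absolute difference between the numeric and the analytic gradient.
print(np.max(np.abs(numeric - risk_grad(theta, X_demo, y_demo))))
```

A difference close to machine precision suggests that the derivative and its $\frac{1}{l}$ factor are consistent with the definition of the risk.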

In [None]:
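teta_grad_start = np.zeros(4)
learning_rate = 0.001
epsilon = 0.00001  # stop when the L1 norm of the parameter update falls below this
max_iter = 100000

def step(teta):
    """One gradient descent update for the OLS risk."""
    gradient = np.dot(np.transpose(X_ones), np.dot(X_ones, teta) - Y) / len(Y)
    return teta - learning_rate * gradient

teta_GD = teta_grad_start
for i in range(max_iter):
    teta_GD_new = step(teta_GD)
    converged = np.sum(np.abs(teta_GD_new - teta_GD)) <= epsilon
    teta_GD = teta_GD_new  # keep the latest iterate, including on the final step
    if converged:
        break

print(i)
print(teta_GD)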

In [None]:
Y_predGD = np.dot(X_ones, teta_GD)

In [None]:
sum((Y_predGD - Y)**2)

In [None]:
sum((Y_pred - Y)**2) - sum((Y_predGD - Y) ** 2)

### Specifics of gradient descent

<center>**normalize features**</center>
<img src="images/GD_normal.PNG">
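
A small sketch of why normalization can help: for a quadratic risk, the speed of gradient descent depends on the condition number of $X^T X$, and features on very different scales tend to inflate it. The data (`X_raw`) and the helper `cond_with_bias` are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, for reproducibility only
# Hypothetical features on very different scales.
X_raw = np.column_stack([rng.normal(0, 1, 50), rng.normal(500, 100, 50)])

# Standardize each feature: zero mean and unit standard deviation.
mean = X_raw.mean(axis=0)
std = X_raw.std(axis=0)
X_scaled = (X_raw - mean) / std

def cond_with_bias(X):
    """Condition number of X^T X after prepending the column of ones."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.linalg.cond(X1.T @ X1)

print(cond_with_bias(X_raw), cond_with_bias(X_scaled))
```

The same `mean` and `std` computed on the training sample would also have to be applied to any new objects before prediction.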

<center>**choosing learning rate**</center>
<img src="images/GD_learningRate.PNG">
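
The effect of the learning rate can be sketched on the OLS risk itself. For this quadratic risk, gradient descent converges only if the learning rate is below $2 / \lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $X^T X / l$. The data, the helper `run_gd`, and the three factors tried below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed, for reproducibility only
X_demo = np.hstack([np.ones((10, 1)), rng.normal(0, 1, (10, 3))])
y_demo = X_demo @ rng.normal(0, 1, 4) + rng.normal(0, 0.1, 10)
l = len(y_demo)

def risk(theta):
    r = X_demo @ theta - y_demo
    return r @ r / (2 * l)

def run_gd(learning_rate, n_iter=200):
    """Run a fixed number of gradient descent steps and return the final risk."""
    theta = np.zeros(4)
    for _ in range(n_iter):
        gradient = X_demo.T @ (X_demo @ theta - y_demo) / l
        theta = theta - learning_rate * gradient
    return risk(theta)

# Largest eigenvalue of the Hessian X^T X / l.
lam_max = np.linalg.eigvalsh(X_demo.T @ X_demo / l).max()

# Too small (slow), reasonable, and too large (diverges).
for factor in (0.01, 1.0, 2.5):
    print(factor, run_gd(factor / lam_max))
```

With the smallest factor the risk is still far from its minimum after 200 steps, and with the largest factor it grows instead of shrinking.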

<center>**several minima**</center>
<img src="images/GD_severalMinimums.PNG">
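
The OLS risk is convex, so it does not have several separate minima; the issue arises for non-convex objectives. As an illustrative sketch, the hypothetical one-dimensional function $f(w) = w^4 - 3w^2 + w$ below has two local minima, and gradient descent ends up in a different one depending on the starting point. The function, the starting points, and the learning rate are assumptions made for this example.

```python
def f(w):
    """Hypothetical non-convex objective with two local minima."""
    return w**4 - 3 * w**2 + w

def f_grad(w):
    return 4 * w**3 - 6 * w + 1

def descend(w_start, learning_rate=0.01, n_iter=1000):
    """Plain gradient descent in one dimension."""
    w = w_start
    for _ in range(n_iter):
        w = w - learning_rate * f_grad(w)
    return w

w_left = descend(-2.0)   # starts to the left of both minima
w_right = descend(2.0)   # starts to the right of both minima

print(w_left, f(w_left))
print(w_right, f(w_right))
```

The two runs stop at different points with different objective values, which is why the starting point matters for non-convex problems.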