# Exploring gradient descent
Step size at iteration $k$:
$$
\eta_k = \lambda(\frac{s_0}{s_0 + k})^p
$$
You don't need to tune $s_0$ or $p$: you can keep the defaults $s_0 = 1$ and $p = 0.5$, but you should adjust $\lambda$.
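
As a rough illustration of the schedule above, the following sketch evaluates $\eta_k$ for the first few iterations. The function name `step_size` and the default `lam=1e-2` are assumptions made for this example only; they are not taken from `mlpractice`.

```python
def step_size(k, lam=1e-2, s0=1.0, p=0.5):
    """Return eta_k = lam * (s0 / (s0 + k)) ** p for iteration k >= 0.

    `lam` plays the role of lambda in the formula; its default here is
    an arbitrary illustrative value, not a recommended setting.
    """
    return lam * (s0 / (s0 + k)) ** p


# The step size starts at lam (k = 0) and decays as k grows.
etas = [step_size(k) for k in range(4)]
print(etas)
```

With $p = 0.5$ the decay is slow (proportional to $1/\sqrt{s_0 + k}$), which is why $\lambda$ is the parameter that usually needs tuning.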

In this task we use the MSE loss function:
$$
Q(w) = \frac{1}{l}\sum\limits_{i=1}^l (a_w(x_i) - y_i)^2
$$
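
A minimal NumPy sketch of this loss and its gradient, assuming a linear model $a_w(x) = \langle w, x \rangle$ (the formula above does not fix the model, so this is an assumption). The names `mse_loss` and `mse_gradient` are hypothetical helpers for illustration.

```python
import numpy as np


def mse_loss(X, y, w):
    """Mean squared error of the linear model X @ w against targets y."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))


def mse_gradient(X, y, w):
    """Gradient of mse_loss with respect to w: (2 / l) * X^T (X w - y)."""
    return 2.0 * X.T @ (X @ w - y) / X.shape[0]


# Tiny hand-checkable example: two objects, two features.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.zeros(2)
print(mse_loss(X, y, w))      # (1 + 4) / 2 = 2.5
print(mse_gradient(X, y, w))  # [-1., -2.]
```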


In [None]:
#!source<mlpractice.gradient_descent.BaseDescent>

## Gradient descent

$$
w_{k+1} = w_k - \eta_k \nabla_w Q(w_k)
$$
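
The update rule can be sketched as a plain loop. This is not the `GradientDescent` class from `mlpractice`; it is a standalone illustration that assumes a linear model with MSE loss and uses a hypothetical function name and hypothetical default hyperparameters.

```python
import numpy as np


def gradient_descent(X, y, lam=0.1, s0=1.0, p=0.5, n_iter=500):
    """Full-batch gradient descent on MSE for a linear model (sketch)."""
    w = np.zeros(X.shape[1])
    for k in range(n_iter):
        eta = lam * (s0 / (s0 + k)) ** p             # eta_k from the step formula
        grad = 2.0 * X.T @ (X @ w - y) / X.shape[0]  # gradient of Q at w_k
        w = w - eta * grad                           # w_{k+1} = w_k - eta_k * grad
    return w


# Synthetic noise-free data, so the true weights are recoverable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = gradient_descent(X, y)
print(w_hat)
```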

In [None]:
#!source<mlpractice.gradient_descent.GradientDescent>

In [None]:
# from mlpractice.tests.gradient_descent.test_gradient_descent import test_all
# GradDesc = GradientDescent() # TODO
# test_all(GradDesc)

## Stochastic Descent

$$
w_{k+1} = w_k - \eta_k \nabla_w q_{i_k}(w_k)
$$
where $\nabla_w q_{i_k}(w_k)$ is the gradient estimate computed on a batch of randomly selected objects.
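
One possible way to compute such a batch estimate is sketched below. It assumes a linear model with MSE loss and samples the batch without replacement; both choices, as well as the name `stochastic_gradient`, are assumptions for illustration rather than the behaviour of `StochasticDescent`.

```python
import numpy as np


def stochastic_gradient(X, y, w, batch_size, rng):
    """Estimate the MSE gradient from a random batch of objects (sketch)."""
    # Pick batch_size distinct row indices uniformly at random.
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    X_b, y_b = X[idx], y[idx]
    return 2.0 * X_b.T @ (X_b @ w - y_b) / batch_size


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)
w = np.ones(4)

g_batch = stochastic_gradient(X, y, w, batch_size=10, rng=rng)
# With the batch equal to the whole sample, the estimate is the full gradient.
g_full = stochastic_gradient(X, y, w, batch_size=100, rng=rng)
```

Smaller batches make each step cheaper but noisier; the estimate is unbiased because every object is equally likely to be selected.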

In [None]:
#!source<mlpractice.gradient_descent.StochasticDescent>

In [None]:
# from mlpractice.tests.gradient_descent.test_stochastic_descent import test_all
# StochDesc = StochasticDescent() # TODO
# test_all(StochDesc)

## Momentum Descent

$$
h_0 = 0, h_{k+1} = \alpha h_k + \eta_k \nabla_w Q(w_k),\\
w_{k+1} = w_k - h_{k + 1}
$$
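
The two recurrences above can be sketched as follows. The function `momentum_descent`, the callback `grad_fn`, and the default `alpha=0.9` are hypothetical and chosen for illustration; they are not the interface of `MomentumDescent`.

```python
import numpy as np


def momentum_descent(grad_fn, w0, lam=0.1, alpha=0.9, s0=1.0, p=0.5, n_iter=300):
    """Gradient descent with momentum, following the recurrences above (sketch)."""
    w = np.asarray(w0, dtype=float).copy()
    h = np.zeros_like(w)                  # h_0 = 0
    for k in range(n_iter):
        eta = lam * (s0 / (s0 + k)) ** p
        h = alpha * h + eta * grad_fn(w)  # h_{k+1} = alpha * h_k + eta_k * grad
        w = w - h                         # w_{k+1} = w_k - h_{k+1}
    return w


# Toy objective Q(w) = ||w - c||^2 with gradient 2 * (w - c); its minimum is c.
c = np.array([3.0, -1.0])
w_min = momentum_descent(lambda w: 2.0 * (w - c), w0=np.zeros(2))
print(w_min)
```

Setting `alpha=0` reduces the loop to plain gradient descent, which is a convenient sanity check for an implementation.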

In [None]:
#!source<mlpractice.gradient_descent.MomentumDescent>

In [None]:
# from mlpractice.tests.gradient_descent.test_momentum_descent import test_all
# MomDesc = MomentumDescent() # TODO
# test_all(MomDesc)

## Adagrad

$$
G_0 = 0, G_{k+1} = G_k + (\nabla_w Q(w_k))^2,\\
w_{k+1} = w_k - \frac{\eta_k}{\sqrt{\varepsilon + G_{k+1}}} \nabla_w Q(w_k)
$$
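
A single Adagrad update can be sketched as below, with the square and the square root applied elementwise. The helper name `adagrad_step` and the default `eps=1e-8` are assumptions for this example, not part of the `Adagrad` class.

```python
import numpy as np


def adagrad_step(w, G, grad, eta, eps=1e-8):
    """Perform one Adagrad update and return the new (w, G) (sketch)."""
    G_new = G + grad ** 2                         # accumulate squared gradients
    w_new = w - eta / np.sqrt(eps + G_new) * grad  # per-coordinate scaled step
    return w_new, G_new


w = np.array([1.0, 1.0])
G = np.zeros(2)                # G_0 = 0
grad = np.array([3.0, 4.0])
w_next, G_next = adagrad_step(w, G, grad, eta=0.1)
print(w_next, G_next)
```

On the first step each coordinate moves by about `eta` regardless of the gradient's magnitude (here both coordinates end up near $0.9$), because the gradient is divided by its own absolute value.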

In [None]:
#!source<mlpractice.gradient_descent.Adagrad>

In [None]:
# from mlpractice.tests.gradient_descent.test_adagrad import test_all
# Adagr = Adagrad() # TODO
# test_all(Adagr)

# Gradient Descent in action
## Linear Regression
To see how gradient descent minimizes the loss, we implement a linear regression model that is trained with gradient descent.
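
To make the idea concrete, here is a small self-contained sketch of a linear regression trained by gradient descent. The class `TinyLinearRegression`, its attributes, and its stopping rule are hypothetical; the interface of `mlpractice`'s `LinearRegression` may differ.

```python
import numpy as np


class TinyLinearRegression:
    """Hypothetical stand-in for illustration, not mlpractice's LinearRegression."""

    def __init__(self, lam=0.1, s0=1.0, p=0.5, max_iter=1000, tol=1e-8):
        self.lam, self.s0, self.p = lam, s0, p
        self.max_iter, self.tol = max_iter, tol
        self.w = None
        self.loss_history = []

    def _loss(self, X, y):
        return float(np.mean((X @ self.w - y) ** 2))

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.loss_history = [self._loss(X, y)]
        for k in range(self.max_iter):
            eta = self.lam * (self.s0 / (self.s0 + k)) ** self.p
            grad = 2.0 * X.T @ (X @ self.w - y) / X.shape[0]
            step = eta * grad
            self.w = self.w - step
            self.loss_history.append(self._loss(X, y))
            # Assumed stopping rule: stop once the squared step norm is tiny.
            if np.dot(step, step) < self.tol:
                break
        return self

    def predict(self, X):
        return X @ self.w


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
model = TinyLinearRegression().fit(X, y)
print(model.loss_history[0], model.loss_history[-1])
```

Plotting `loss_history` against the iteration number is a simple way to compare how different descent methods and values of $\lambda$ behave.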

In [None]:
#!source<mlpractice.gradient_descent.LinearRegression>

In [None]:
# from mlpractice.tests.gradient_descent.test_linear_regression import test_all
# regression = LinearRegression() # TODO
# test_all(regression)

# Regularization
$$
G(w) = \frac{1}{l}\sum\limits_{i=1}^l (a_w(x_i) - y_i)^2 + \frac{\mu}{2}||w||^2
$$
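
The penalty adds $\mu w$ to the gradient of the MSE loss. A minimal sketch, assuming a linear model and assuming that every weight is penalized (the formula above does not say whether a bias term is excluded); the function names are hypothetical:

```python
import numpy as np


def regularized_loss(X, y, w, mu):
    """MSE plus the L2 penalty (mu / 2) * ||w||^2."""
    residual = X @ w - y
    return float(np.mean(residual ** 2) + 0.5 * mu * np.dot(w, w))


def regularized_gradient(X, y, w, mu):
    """Gradient of regularized_loss: the MSE gradient plus mu * w."""
    return 2.0 * X.T @ (X @ w - y) / X.shape[0] + mu * w


# Hand-checkable example.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 1.0])
print(regularized_loss(X, y, w, mu=0.5))      # 0.5 (MSE) + 0.5 (penalty) = 1.0
print(regularized_gradient(X, y, w, mu=0.5))  # [0.5, -0.5]
```

Any of the descent methods above can be regularized by swapping in this gradient in place of the unregularized one.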

In [None]:
#!source<mlpractice.gradient_descent.GradientDescentReg>

In [None]:
# from mlpractice.tests.gradient_descent.test_gradient_reg import test_all
# GradReg = GradientDescentReg() # TODO
# test_all(GradReg)

In [None]:
#!source<mlpractice.gradient_descent.StochasticDescentReg>

In [None]:
# from mlpractice.tests.gradient_descent.test_stochastic_reg import test_all
# StochReg = StochasticDescentReg() # TODO
# test_all(StochReg)

In [None]:
#!source<mlpractice.gradient_descent.MomentumDescentReg>

In [None]:
# from mlpractice.tests.gradient_descent.test_momentum_reg import test_all
# MomReg = MomentumDescentReg() # TODO
# test_all(MomReg)

In [None]:
#!source<mlpractice.gradient_descent.AdagradReg>

In [None]:
# from mlpractice.tests.gradient_descent.test_adagrad_reg import test_all
# AdagrReg = AdagradReg() # TODO
# test_all(AdagrReg)