<center><img src="Fig/Ensimag.png" width="30%" height="30%"></center>
<center><h3>Ensimag 2A</h3></center>
<hr>
<center><h1>Optimisation Numérique</h1></center>
<center><h2>TP3: Proximal Algorithms (2x1.5h)</h2></center>

# Structure of an optimization program

An optimization program can be practically divided into three parts:
* the *run* environment, in which you test, run your program, and display results.
* the *problem* part, which contains the function oracles, problem constraints, etc.
* the *algorithmic* part, where the algorithms are coded.

The main benefit of this division is that the parts are interchangeable: for instance, the algorithms of the third part can be reused for a variety of problems. That is why such a decomposition is widely used.

In the present lab, you will use this division:
* `TP3_Proximal_algorithms.ipynb` will be the *run* environment
* `logreg.py` will be the considered *logistic regression problem* for this lab
* `prox.py` will contain the proximal *algorithms* studied in this lab


In [None]:
%load_ext autoreload
%autoreload 2

---

# Composite minimization for machine learning.

In this lab, we will study the minimization of composite functions, made of a smooth part and a non-smooth part, with the proximal gradient algorithm. The application is a practical machine learning problem: binary classification using logistic regression.

> + Read the file [`logistic_regression_2.ipynb`](logistic_regression_2.ipynb) containing the problem explanation and simulators.
>
> + Implement the proximal operator of the $\ell_1$-norm regularization (the $g$ function).
>
> + Implement the proximal gradient algorithm in the file [`src/prox.py`](src/prox.py) and test your algorithm below. Remember to tune the stepsize in the variable `step` below, according to what you learned in class about $L$-smooth functions and gradient descent algorithms. Note that the $f$ part of the loss is $L$-smooth (find $L$ in the code).
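
For reference, the proximal operator of a scaled $\ell_1$ norm has a well-known closed form, the coordinate-wise soft-thresholding operator. The sketch below is a standalone illustration of that formula only; the function name `soft_threshold` and its arguments are illustrative assumptions and are not the interface expected by the lab's `g_prox`.

```python
import numpy as np


def soft_threshold(u, threshold):
    """Prox of threshold * ||.||_1 evaluated at u (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    # Shrink every coordinate toward zero by `threshold`; coordinates whose
    # magnitude is at most `threshold` become exactly zero.
    return np.sign(u) * np.maximum(np.abs(u) - threshold, 0.0)


u = np.array([3.0, -0.5, 0.2, -2.0])
shrunk = soft_threshold(u, 1.0)
print(shrunk)
```

In a proximal gradient step with stepsize $\gamma$ and regularization weight $\lambda_1$, the threshold would be $\gamma \lambda_1$.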


## Quick recap of the proximal gradient algorithm

To minimize a function $F:\mathbb{R}^n \to \mathbb{R}$ of the form $F = f+g$, where $f$ is differentiable and the $\mathbf{prox}$ of $g$ is known, given:
* the function to minimize `F`
* a first-order oracle for $f$, `f_grad`
* a proximity operator for $g$, `g_prox`
* an initialization point `x0`
* the sought precision `PREC`
* a maximal number of iterations `ITE_MAX`
* a display boolean variable `PRINT`

the proximal gradient algorithm performs iterations of the form
$$ x_{k+1} = \mathbf{prox}_{\gamma g}\left( x_k - \gamma \nabla f(x_k) \right) := \text{arg}\min_{z}\left\lbrace \gamma g(z) + \frac{1}{2}\|z - x_k + \gamma \nabla f(x_k)\|_2^2\right\rbrace $$
where $\gamma$ is a stepsize to choose.
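
The iteration above can be sketched on a small synthetic problem. This is a minimal illustration under stated assumptions, not the lab's `Proximal` class: it uses $f(x) = \frac{1}{2}\|Ax-b\|_2^2$ (least squares rather than the logistic loss), $g = \lambda\|\cdot\|_1$, and the stepsize $\gamma = 1/L$, where $L = \|A\|_2^2$ is the Lipschitz constant of $\nabla f$ for this particular $f$. All names (`proximal_gradient`, `A`, `b`, `lam`) are hypothetical.

```python
import numpy as np


def proximal_gradient(f_grad, g_prox, x0, step, ite_max=1000, prec=1e-8):
    """Iterate x <- prox_{step*g}(x - step*grad f(x)) until the update is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(ite_max):
        x_new = g_prox(x - step * f_grad(x), step)
        if np.linalg.norm(x_new - x) <= prec:
            return x_new
        x = x_new
    return x


# Synthetic data (assumption): sparse ground truth with 2 nonzero coordinates.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -3.0]
b = A @ x_true
lam = 0.1


def f_grad(x):
    """Gradient of the smooth part f(x) = 0.5 * ||Ax - b||^2."""
    return A.T @ (A @ x - b)


def g_prox(u, step):
    """Prox of step * lam * ||.||_1: coordinate-wise soft-thresholding."""
    return np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)


def F(x):
    """Composite objective f + g."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


L = np.linalg.norm(A, 2) ** 2  # squared spectral norm = Lipschitz constant of grad f
x_hat = proximal_gradient(f_grad, g_prox, np.zeros(10), step=1.0 / L)
print(F(np.zeros(10)), F(x_hat))
```

With $\gamma \le 1/L$ the objective does not increase from one iterate to the next, which is why the choice of `step` in the cell below should be tied to the smoothness constant of $f$.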

In [None]:
from src.prox import Proximal
from src.logistic import SimulatorStudentsDataset

import numpy as np

lreg = SimulatorStudentsDataset("Logistic regression")
#### Parameters passed to the algorithm
PREC    = 1e-5                     # Sought precision
ITE_MAX = 1000                     # Max number of iterations
x0      = np.zeros(lreg.n)         # Initial point
step    = ...                      # FILL HERE

##### Proximal gradient algorithm
proxalg = Proximal(ITE_MAX, lreg, x0, step, prec=PREC)



> Investigate the decrease of the algorithm.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

rcode, x_tab = proxalg.run()
lreg.plot_loss(x_tab)

> Plot, with the following command, the support of the vector $x_k$ (i.e. one point for every nonzero coordinate of $x_k$) versus the iterations.

> What do you notice? Was it expected?

In [None]:
lreg.plot_support(x_tab, period=20)
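
As a standalone illustration of what "support" means here, the sketch below extracts the indices of the nonzero coordinates of a few hand-written iterates. The array `x_tab_demo` is made up for the example; it does not reflect the layout or the contents of the `x_tab` returned by `proxalg.run()`.

```python
import numpy as np

# Hypothetical iterates, one row per iteration (made-up values).
x_tab_demo = np.array([
    [0.5, -0.2, 0.1, 0.3],
    [0.4, 0.0, 0.0, 0.2],
    [0.4, 0.0, 0.0, 0.0],
])

# Support of each iterate: indices of its nonzero coordinates.
supports = [np.flatnonzero(x).tolist() for x in x_tab_demo]
print(supports)  # → [[0, 1, 2, 3], [0, 3], [0]]
```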

---

# Regularization path.


We saw above that the algorithm *selected* some coordinates while the others went to zero. For our machine learning task (see `logistic_regression_2.ipynb`), this means that the algorithm selects a subset of the features that will be used for the prediction step.

> Change the parameter $\lambda_1$ of the problem (`lreg.lam1`) in the code above and investigate how it influences the number of selected features.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline


#### Parameters passed to the algorithm (see algoGradient.ipynb)
PREC    = 1e-5                     # Sought precision
ITE_MAX = 500                      # Max number of iterations
step    = ...                      # FILL HERE

# FILL THERE #######
reg_l1 = ...

lreg = SimulatorStudentsDataset(f"Logistic regression $\\lambda_1={reg_l1}$")
lreg.lam1 = reg_l1                 # Apply the chosen l1 weight to the problem
x0      = np.zeros(lreg.n)         # Initial point
proxalg = Proximal(ITE_MAX, lreg, x0, step, prec=PREC)

retcode, x_tab = proxalg.run()
lreg.plot_support(x_tab)

To quantify the influence of this feature selection, let us consider the *regularization path*, that is, the support of the final point obtained by our minimization method as a function of $\lambda_1$.

> For $\lambda_1 = 2^{-12}, 2^{-11}, \ldots, 2^{1}$, run the proximal gradient algorithm on the obtained problem and store the support of the final point, the prediction performance on the *training set* (`lreg.prediction_train(...)`) and on the *testing set* (`lreg.prediction_test(...)`).
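
The shape of such a loop can be sketched on synthetic data. This is only an illustration under assumptions: it uses a least-squares $f$ and a hand-written proximal gradient loop instead of the lab's `Proximal` and `SimulatorStudentsDataset` classes, and it records only the support of the final point, not the prediction accuracies. All names below are hypothetical.

```python
import numpy as np


def ista(A, b, lam, n_iter=2000):
    """Proximal gradient for 0.5*||Ax-b||^2 + lam*||x||_1 with step 1/L."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        u = x - step * (A.T @ (A @ x - b))
        # Soft-thresholding: prox of step * lam * ||.||_1
        x = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)
    return x


# Synthetic data (assumption): 3 informative features out of 8.
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 8)) / np.sqrt(40)
x_true = np.array([1.0, 0.0, -0.5, 0.0, 0.0, 0.25, 0.0, 0.0])
b = A @ x_true + 0.01 * rng.standard_normal(40)

lambdas = 2.0 ** np.arange(-12, 2)  # 2^-12, ..., 2^1
path = np.array([ista(A, b, lam) for lam in lambdas])
support_sizes = np.count_nonzero(path, axis=1)

for lam, size in zip(lambdas, support_sizes):
    print(f"lambda_1 = {lam:.5f}: {size} selected coordinates")
```

Each row of `path` is the final point for one value of $\lambda_1$, so plotting its nonzero pattern against `lambdas` gives the regularization path for this toy problem.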

In [None]:
# FILL THERE #######

> Plot the *regularization path* and look up the meaning of each feature (file `student.txt` or `logistic_regression_2.ipynb`) to see which features of the dataset are the most important.

> (Bonus: you can do some text manipulation to put the labels on the plot as well).

In [None]:
# FILL THERE #######

> Plot the *training* and *testing* accuracies versus the value of $\lambda_1$.

In [None]:
log_lam = np.arange(-12, 2)  # exponents -12, ..., 1, matching the lambda_1 grid above
# FILL HERE ####
# ##############
plt.legend()
plt.show()

> Explore the proximal algorithm or propose ideas (cite your sources if you use pieces of literature) to change it or compare it to something else. Send the results to your favorite TA by zipping/tarballing/... your work and either:
> * sending it directly via email.
> * sending an email with an invitation to a PRIVATE repository (GitHub, GitLab, Bitbucket, etc.) containing your work.
>
> ### Guidelines:
> Write your own code; do not throw LLM nonsense at your TA.
> 
> Be original.
> 
> Write every idea you have in the notebook, as verbosely as possible.
>
> Write clean code, e.g. by following Python's PEP 8 style guide. (Note that the template code does not follow it everywhere, as some variable names mirror mathematical notation that does not comply with PEP 8.)