<a target="_blank" href="https://colab.research.google.com/github/RodrigoAVargasHdz/CHEM-4PB3/blob/w2024/Course_Notes/Week%203/Week_3_Linear_Regression.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

## Introduction to Linear Regression ##

$$
    f(\mathbf{x}) = w_0 + w_1x_1 + w_2x_2 + \ldots + w_dx_d = \begin{bmatrix}
    w_0 & w_1 & \cdots & w_d \\
    \end{bmatrix} \begin{bmatrix}
    1 \\
    x_1 \\
    \vdots \\
    x_d
      \end{bmatrix} = \mathbf{w}^\top \mathbf{x}
$$
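The following is a quick numerical check of the identity above, using made-up values for $d=3$ (the numbers are illustrative only). The explicit sum equals the dot product once a leading $1$ is prepended to $\mathbf{x}$.

```python
import numpy as np

# illustrative values (assumed, not from the notes): d = 3 features
w = np.array([0.5, -1.0, 2.0, 0.25])   # [w_0, w_1, w_2, w_3]
x = np.array([1.0, 3.0, -2.0])         # [x_1, x_2, x_3]

# explicit sum w_0 + w_1 x_1 + ... + w_d x_d
f_sum = w[0] + np.sum(w[1:] * x)

# same value as a dot product with the augmented vector [1, x_1, ..., x_d]
x_aug = np.concatenate(([1.0], x))
f_dot = w @ x_aug

print(f_sum, f_dot)  # → 5.0 5.0
```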

* $d$ -> the number of dimensions (features) of $\mathbf{x}$; a constant $1$ is prepended to $\mathbf{x}$ so that $w_0$ acts as the bias (intercept)
* $\mathbf{w}$ -> the parameters (weights) of the linear model

## Loss function ##
* The loss quantifies how well the linear model fits the $N$ training points $(\mathbf{x}_i, y_i)$ through the sum of squared errors:
  $$
    {\cal L}(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \left( f(\mathbf{x}_i) - y_i \right)^2 = \frac{1}{2} \sum_{i=1}^{N} \left( \mathbf{w}^\top \mathbf{x}_i - y_i \right)^2 
  $$
  or, in matrix form,
  $$
    {\cal L}(\mathbf{w}) = \frac{1}{2} \left (\mathbf{y} - \mathbf{X} \mathbf{w} \right)^\top \left (\mathbf{y} - \mathbf{X}\mathbf{w} \right)
  $$
What are $\mathbf{X}$ and $\mathbf{y}$? $\mathbf{X}$ is the $N \times (d+1)$ design matrix whose $i$-th row is the sample $\mathbf{x}_i^\top$. Here the superscript in $x_i^j$ indexes the $j$-th feature of the $i$-th sample, with $x_i^0 = 1$ for the bias.
$$
   \mathbf{X} = \begin{pmatrix}
x_1^0 & x_1^1 & \cdots & x_1^{d-1} & x_1^d \\
x_2^0 & x_2^1 & \cdots & x_2^{d-1} & x_2^d \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{N-1}^0 & x_{N-1}^1 & \cdots & x_{N-1}^{d-1} & x_{N-1}^d \\
x_N^0 & x_N^1 & \cdots & x_N^{d-1} & x_N^d
\end{pmatrix} =  \begin{pmatrix}
\mathbf{x}_1^\top\\
\vdots \\
\mathbf{x}_N^\top\\
\end{pmatrix}
$$

$$
   \mathbf{y} =  \begin{pmatrix}
y_1\\
\vdots \\
y_N\\
\end{pmatrix}
$$
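Both forms of the loss should give the same number. The sketch below uses small random data (the seed, sizes and variable names are illustrative assumptions). It builds $\mathbf{X}$ by prepending a column of ones to the raw features, then compares the summation form and the matrix form of ${\cal L}(\mathbf{w})$.

```python
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
N, d = 8, 3                              # N samples, d features (assumed sizes)
x_raw = rng.normal(size=(N, d))          # raw features, one row per sample
y = rng.normal(size=N)                   # targets y_i
w = rng.normal(size=d + 1)               # parameters [w_0, ..., w_d]

# design matrix: the first column x_i^0 = 1 multiplies the bias w_0
X = np.column_stack((np.ones(N), x_raw))   # shape (N, d+1)

# summation form: 1/2 sum_i (w^T x_i - y_i)^2
loss_sum = 0.5 * sum((w @ X[i] - y[i])**2 for i in range(N))

# matrix form: 1/2 (y - Xw)^T (y - Xw)
r = y - X @ w
loss_mat = 0.5 * r @ r

print(X.shape, loss_sum, loss_mat)
```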

In [None]:
# generate noisy data from f(x) = sin(x/2) + x - 1
def get_data(N,bool_biased=True):
    x = np.linspace(-1.,1.,N) # N evenly spaced points in [-1, 1]
    y = np.sin(.5*x) + x -1.
    y = y + np.random.uniform(low = 0.,high=0.5,size=x.shape) # add uniform noise in [0, 0.5) to each y value
    if bool_biased:
        X = np.column_stack((np.ones_like(x),x)) # N x 2 design matrix with a column of ones (bias)
    else:
        X = x[:,None] # N x 1 matrix, no bias column
    return X,y

X,y = get_data(10) 
print(X)
print(y)

Let's take a closer look at the term $\mathbf{X}\mathbf{w}$,
$$
 \mathbf{X}\mathbf{w}  =  \begin{pmatrix}
\mathbf{x}_1^\top\\
\vdots \\
\mathbf{x}_N^\top
\end{pmatrix} \mathbf{w} = \begin{pmatrix}
x_1^0 & \cdots & x_1^d \\
\vdots & \ddots & \vdots \\
x_N^0 & \cdots & x_N^d
\end{pmatrix} \begin{pmatrix}
w_0\\
\vdots \\
w_d
\end{pmatrix} = \begin{pmatrix}
\mathbf{x}_1^\top\mathbf{w}\\
\vdots \\
\mathbf{x}_N^\top\mathbf{w}
\end{pmatrix}
$$

In [None]:
# random parameters
def get_random_params(d):
    theta_random = np.random.uniform(low=-2., high=2., size=(d))
    return theta_random

In [None]:
w = get_random_params(2)
print('Parameters: ', w)

# what operation is Xw? A matrix-vector product: one prediction x_i^T w per row of X
Xw = X @ w
print('Xw: ', Xw)


## Exact Solution of Linear Regression ##

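* **A zero gradient marks a stationary point (a minimum, maximum, or saddle point). Because this loss is a convex quadratic in $\mathbf{w}$, its stationary point is a global minimum.**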
  
$$
    \nabla {\cal L}(\mathbf{w}) \Big\rvert_{\mathbf{w}^{*}} = \frac{1}{2} \nabla_{\mathbf{w}} \left [ \left (\mathbf{y} - \mathbf{X}\mathbf{w} \right)^\top \left (\mathbf{y} - \mathbf{X}\mathbf{w} \right) \right ]= 0
$$

To solve for $\mathbf{w}^*$, let's expand $ \left (\mathbf{y} - \mathbf{X}\mathbf{w} \right)^\top \left (\mathbf{y} - \mathbf{X}\mathbf{w} \right)$,

$$
    \left (\mathbf{y} - \mathbf{X}\mathbf{w} \right)^\top \left (\mathbf{y} - \mathbf{X}\mathbf{w} \right) ={\color{red} \mathbf{y}^\top \mathbf{y}}  - {\color{blue}\mathbf{y}^\top \mathbf{X}\mathbf{w}} -  {\color{blue}\mathbf{w}^\top\mathbf{X}^\top\mathbf{y}} +   {\color{blue}\mathbf{w}^\top\mathbf{X}^\top \mathbf{X}\mathbf{w}}
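$$
The red term does not depend on $\mathbf{w}$, and the two blue cross terms are equal scalars ($\mathbf{y}^\top\mathbf{X}\mathbf{w} = \mathbf{w}^\top\mathbf{X}^\top\mathbf{y}$), so taking the gradient gives
$$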
    \nabla_{\mathbf{w}} {\cal L}(\mathbf{w}) = \frac{1}{2}\left(  -2 \mathbf{X}^\top\mathbf{y} + 2\mathbf{X}^\top\mathbf{X}\mathbf{w} \right) = 0
$$

**Extra:**
1. Homework: prove the above equations.
2. [Equations from Sections 2.4.1 and 2.4.2](https://www2.imm.dtu.dk/pubdb/edoc/imm3274.pdf)
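To check the gradient formula numerically, which is handy when working through the homework, you can compare $\mathbf{X}^\top\mathbf{X}\mathbf{w} - \mathbf{X}^\top\mathbf{y}$ against central finite differences of the loss. The sketch below uses made-up random data. The helper names `loss` and `grad_fd` are illustrative, not part of the course code.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 20, 3                      # p = d + 1 columns, including the bias column
X = np.column_stack((np.ones(N), rng.normal(size=(N, p - 1))))
y = rng.normal(size=N)
w = rng.normal(size=p)

def loss(w):
    # L(w) = 1/2 (y - Xw)^T (y - Xw)
    r = y - X @ w
    return 0.5 * r @ r

# analytic gradient: X^T X w - X^T y (the 1/2 cancels the factor 2)
grad_exact = X.T @ X @ w - X.T @ y

def grad_fd(w, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = h
        g[j] = (loss(w + e) - loss(w - e)) / (2 * h)
    return g

print(np.max(np.abs(grad_exact - grad_fd(w))))  # should be tiny (round-off level)
```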

Solving for $\mathbf{w}$ (assuming $\mathbf{X}^\top\mathbf{X}$ is invertible, i.e. $\mathbf{X}$ has full column rank),
$$
\begin{align}
\nabla_{\mathbf{w}} {\cal L}(\mathbf{w}) &= \frac{1}{2}\left(  -2 \mathbf{X}^\top\mathbf{y} + 2{\color{blue}\mathbf{X}^\top\mathbf{X}} \mathbf{w} \right) = 0 \\
 {\color{blue}\mathbf{X}^\top\mathbf{X}} \mathbf{w}  &=  \mathbf{X}^\top\mathbf{y} \\
  \mathbf{w}^* &= \left ( {\color{blue}\mathbf{X}^\top\mathbf{X}}\right ) ^{-1} \mathbf{X}^\top\mathbf{y}
\end{align}
$$

**What is $\mathbf{X}^\top\mathbf{X}$ ?**

$$
 \mathbf{X}^\top\mathbf{X} = \begin{pmatrix}
 x_1^0 &  x_2^0 & \cdots& x_N^0  \\
  x_1^1 &  x_2^1 & \cdots& x_N^1  \\
\vdots &  \vdots & \ddots & \vdots  \\
  x_{1}^{d-1} &  x_{2}^{d-1}& \cdots& x_{N}^{d-1}  \\
    x_{1}^{d} &  x_{2}^{d}& \cdots& x_{N}^{d}
\end{pmatrix} \begin{pmatrix}
x_1^0 & x_1^1 & \cdots & x_1^{d-1} & x_1^d \\
x_2^0 & x_2^1 & \cdots & x_2^{d-1} & x_2^d \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{N-1}^0 & x_{N-1}^1 & \cdots & x_{N-1}^{d-1} & x_{N-1}^d \\
x_N^0 & x_N^1 & \cdots & x_N^{d-1} & x_N^d
\end{pmatrix} = \begin{pmatrix}
\mathbf{x}_1 & \cdots &  \mathbf{x}_N \end{pmatrix}  \begin{pmatrix}
\mathbf{x}_1^\top\\
\vdots \\
\mathbf{x}_N^\top
\end{pmatrix} = \sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i^\top
$$
This is a symmetric $(d+1) \times (d+1)$ matrix.
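The following numerical illustration uses the same kind of design matrix as `get_data`; the sample count is arbitrary. The product `X.T @ X` can be written as a sum of outer products $\sum_i \mathbf{x}_i\mathbf{x}_i^\top$. The sketch checks that this sum matches `X.T @ X`, that the matrix is symmetric, and that its eigenvalues are non-negative. Since $\mathbf{X}^\top\mathbf{X}$ is also the Hessian of ${\cal L}(\mathbf{w})$, those non-negative eigenvalues are what make the loss convex.

```python
import numpy as np

N = 6
x = np.linspace(-1., 1., N)
X = np.column_stack((np.ones_like(x), x))    # rows x_i^T = [1, x_i], as in get_data

XtX = X.T @ X                                             # (d+1) x (d+1) = 2 x 2 here
XtX_outer = sum(np.outer(X[i], X[i]) for i in range(N))   # sum_i x_i x_i^T

print(XtX)
print(np.allclose(XtX, XtX_outer))   # same matrix
print(np.allclose(XtX, XtX.T))       # symmetric
print(np.linalg.eigvalsh(XtX))       # eigenvalues >= 0 (positive semidefinite)
```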

In [None]:
# code X^T X
XtX = X.T @ X # (d+1) x (d+1) matrix; 2 x 2 for the data above
print(XtX)

## Optimal parameters ##

$$
 \mathbf{w}^* = \left ( \mathbf{X}^\top \mathbf{X} \right ) ^{-1} \mathbf{X}^\top \mathbf{y}
$$

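### Operations ###
1. $\mathbf{X}^\top$ -> matrix transpose
2. $\mathbf{X}^\top\mathbf{X}$ -> matrix-matrix multiplication
3. $\mathbf{X}^\top\mathbf{y}$ -> matrix-vector multiplication
4. $\left (\mathbf{X}^\top\mathbf{X}\right ) ^{-1}$ -> matrix inversion, followed by a matrix-vector multiplication with $\mathbf{X}^\top\mathbf{y}$. In practice, solving the linear system $\mathbf{X}^\top\mathbf{X}\mathbf{w} = \mathbf{X}^\top\mathbf{y}$ is cheaper and more numerically stable than forming the inverse.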
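The operations above map directly onto NumPy. The sketch below uses synthetic data; the "true" parameters `w_true` and the noise level are made up. It first forms $(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$ literally. It then compares that result with `np.linalg.solve` and `np.linalg.lstsq`, which reach the same $\mathbf{w}^*$ without an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
x = rng.uniform(-1., 1., size=N)
X = np.column_stack((np.ones(N), x))            # design matrix with bias column
w_true = np.array([-1.0, 2.0])                  # assumed "true" parameters
y = X @ w_true + 0.1 * rng.normal(size=N)       # noisy targets

Xt = X.T                                        # transpose
XtX = Xt @ X                                    # matrix-matrix product
Xty = Xt @ y                                    # matrix-vector product
w_inv = np.linalg.inv(XtX) @ Xty                # explicit inverse, then multiply

w_solve = np.linalg.solve(XtX, Xty)             # solves X^T X w = X^T y directly
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None) # least squares on X w ≈ y

print(w_inv, w_solve, w_lstsq)
```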

In [None]:
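# in-class exercise: one possible solution
def linear_model_solver(X,y):
    XtX = X.T @ X # (d+1) x (d+1)
    Xty = X.T @ y # (d+1) vector
    w = np.linalg.solve(XtX, Xty) # same w* as inv(XtX) @ Xty, without forming the inverse
    return w # optimal parameters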

In [None]:
# test our model
X, y = get_data(25)
w = linear_model_solver(X,y)

x_grid = np.linspace(-1.,1.,250)
X_grid = np.column_stack((np.ones_like(x_grid), x_grid))
y_pred = X_grid @ w # predictions on the grid

plt.scatter(X[:,-1], y, label='data')
plt.plot(x_grid, y_pred,label='prediction')
plt.xlabel(r'$x$', fontsize=18)
plt.ylabel(r'$f(x)$', fontsize=18)
plt.ylim(-3., 2.)
plt.legend()
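As a sanity check, the normal-equation fit of a straight line should agree with NumPy's own least-squares polynomial fit, `np.polyfit`, which returns coefficients from the highest degree to the lowest. The sketch below is self-contained. It regenerates data the same way as `get_data`, with a fixed random seed that is an assumption added for reproducibility.

```python
import numpy as np

np.random.seed(0)                                  # fixed seed (assumption)
x = np.linspace(-1., 1., 25)
y = np.sin(.5*x) + x - 1. + np.random.uniform(0., 0.5, size=x.shape)
X = np.column_stack((np.ones_like(x), x))

w_normal = np.linalg.solve(X.T @ X, X.T @ y)      # [w_0, w_1]
slope, intercept = np.polyfit(x, y, 1)             # highest degree first

print(w_normal, [intercept, slope])
```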

## Beyond Linear Models ##

Let's review polynomials:

How many terms does a third-order polynomial in $d=3$ variables have? Expanding $(1+x_1+x_2+x_3)^3$ produces every monomial of degree at most 3:
$$
\begin{align}
(1+x_1+x_2+x_3)^3 &= (1+x_1+x_2+x_3)(1+x_1+x_2+x_3)^2 \\
&= 1+3x_1+3x^2_1+x^3_1+3x_2+6x_1x_2+3x^2_1x_2 \\ 
& +3x_2^2+3x_1x_2^2+x_2^3 +3x_3+6x_1x_3+3x_1^2x_3 \\
& +6x_2x_3+6x_1x_2x_3+3x_2^2x_3+3x_3^2 \\
& +3x_1x_3^2+3x_2x_3^2+x_3^3
\end{align}
$$
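The expansion has $\binom{d+p}{p} = \binom{6}{3} = 20$ terms. One way to check this count, and the coefficients, is to enumerate the monomials. Each term of $(1+x_1+x_2+x_3)^3$ corresponds to a multiset of 3 factors drawn from $\{1, x_1, x_2, x_3\}$, and its coefficient is a multinomial coefficient. The names below are illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, factorial

d, p = 3, 3
variables = ['1', 'x1', 'x2', 'x3']      # the constant 1 plus d variables

# each distinct monomial of (1 + x1 + x2 + x3)^3 is a multiset of p factors
monomials = list(combinations_with_replacement(variables, p))
print(len(monomials), comb(d + p, p))    # → 20 20

def coefficient(m):
    # multinomial coefficient p! / (k_0! k_1! ... k_d!)
    c = factorial(p)
    for k in Counter(m).values():
        c //= factorial(k)
    return c

coeffs = {m: coefficient(m) for m in monomials}
print(coeffs[('1', 'x1', 'x2')], coeffs[('x1', 'x2', 'x3')])   # → 6 6  (6x1x2 and 6x1x2x3)
print(sum(coeffs.values()))              # → 64, i.e. (1+1+1+1)^3
```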

This is simply a new representation of $\mathbf{x}$, a basis-set expansion $\phi(\mathbf{x})$:
$$
\phi(\mathbf{x}) = [1, x_1, x_2, x_3, \cdots, x_i x_j, \cdots, x_i^{ m} x_j^{p}, \cdots, x_i^{ m} x_j^{p}x_{\ell}^{r}]
$$

**Linear models on basis-set expansion**

$$
    f(\mathbf{x},\mathbf{w}) = \sum_{i} w_i \phi_i(\mathbf{x}) = \mathbf{w}^\top \phi(\mathbf{x})
$$


* **Loss function**,
$$
    \begin{align}
    {\cal L}(\mathbf{w}) &= \frac{1}{2}\sum_{i=1}^N (y_i - f(\mathbf{x}_i,\mathbf{w}))^2 = \frac{1}{2}\sum_{i=1}^N (y_i - \mathbf{w}^\top \phi(\mathbf{x}_i))^2 \\
    &= \frac{1}{2} \left (\mathbf{y} - \Phi(\mathbf{x})\mathbf{w} \right)^\top \left (\mathbf{y} -  \Phi(\mathbf{x})\mathbf{w} \right)
    \end{align}
$$
Homework: prove the above equations.


1. What is $\Phi(\mathbf{x})$?
2. What is the form of the **optimal** parameters $\mathbf{w}^*$?
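One way to see the answers is to take a single input variable ($d=1$). Each row of $\Phi$ is then $\phi(x_i)^\top = [1, x_i, x_i^2, \ldots, x_i^p]$. The optimal parameters take the same form as before, with $\mathbf{X}$ replaced by $\Phi$: $\mathbf{w}^* = (\Phi^\top\Phi)^{-1}\Phi^\top\mathbf{y}$. The sketch below builds $\Phi$ with `np.vander`. The degree, the cubic ground truth, and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 30, 3
x = np.linspace(-1., 1., N)
y = 0.5 - x + 2. * x**3 + 0.01 * rng.normal(size=N)   # assumed cubic ground truth

# Phi: N x (p+1), row i = phi(x_i)^T = [1, x_i, x_i^2, ..., x_i^p]
Phi = np.vander(x, p + 1, increasing=True)

# same form as linear regression: w* = (Phi^T Phi)^{-1} Phi^T y
w_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

print(Phi.shape, w_star)
```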

In [None]:
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import KFold

In [None]:
def polynomial_features(X,deg):
    poly = PolynomialFeatures(deg) 
    Phi = poly.fit_transform(X)
    return Phi

In [None]:
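def polynomial_model_solver(X,y,deg):
    Phi = polynomial_features(X,deg) # N x M design matrix of polynomial features
    # least squares on Phi w = y: equals (Phi^T Phi)^{-1} Phi^T y when Phi^T Phi is invertible,
    # and returns the minimum-norm solution otherwise (e.g. when deg + 1 > N)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w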

Let's check with last lecture's example.


In [None]:
N = 10
X, y = get_data(N, bool_biased=False) # raw x only; PolynomialFeatures already adds the bias column
x_grid = np.linspace(-2.,2.,250)

fig, ax = plt.subplots(figsize=(6, 5))

w_ = [] # to save the (zero-padded) parameters of each fit
p_ = np.arange(1, 13, 1, dtype=np.int32) # polynomial degrees
for p in p_:  # loop over different degrees
    w = polynomial_model_solver(X,y,p)
    Phi_grid = polynomial_features(x_grid[:,None],p) # features of the plotting grid
    y_pred = Phi_grid @ w # prediction on x_grid
    
    plt.plot(x_grid, y_pred, label='p=%s' % p)
    w_.append(np.pad(w, (0, 13-w.shape[0]),
              mode='constant', constant_values=0))
plt.scatter(X[:,-1], y, s=75, label='data')
plt.legend(fontsize=5)
plt.xlabel(r'$x$', fontsize=18)
plt.ylabel(r'$f(x)$', fontsize=18)
# plt.savefig('Figures/polyfit_2.png',dpi=1800)

fig, ax0 = plt.subplots(1, 1)
c = ax0.pcolor(np.abs(np.array(w_)), edgecolors='k', linewidths=4)
fig.colorbar(c, ax=ax0, label=r'$|w_i|$')
ax0.set_xlabel(r'$w_i$')
ax0.set_yticks(np.arange(p_.shape[0])+0.5, p_)
ax0.set_ylabel('Poly degree')
fig.tight_layout()
plt.show()
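The weight map shows $|w_i|$ growing with the polynomial degree. A related numerical symptom is that $\Phi^\top\Phi$ for a monomial basis becomes badly conditioned as the degree increases. The sketch below uses $N=10$ points on $[-1,1]$, like the cell above, with `np.vander` standing in for `PolynomialFeatures`. Degrees are kept below $N$ so that $\Phi$ has full column rank.

```python
import numpy as np

N = 10
x = np.linspace(-1., 1., N)

conds = []
for p in range(1, 9):                           # degrees 1..8, so p + 1 <= N columns
    Phi = np.vander(x, p + 1, increasing=True)   # columns [1, x, ..., x^p]
    conds.append(np.linalg.cond(Phi.T @ Phi))    # 2-norm condition number
    print(p, f"{conds[-1]:.2e}")
```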

## Extra ##

Let's revisit the operation $\mathbf{X}\mathbf{w}$.

$$
 \mathbf{X}\mathbf{w}  =  \begin{pmatrix}
\mathbf{x}_1^\top\\
\vdots \\
\mathbf{x}_N^\top
\end{pmatrix} \mathbf{w} = \begin{pmatrix}
x_1^0 & \cdots & x_1^d \\
\vdots & \ddots & \vdots \\
x_N^0 & \cdots & x_N^d
\end{pmatrix} \begin{pmatrix}
w_0\\
\vdots \\
w_d
\end{pmatrix}
$$
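Entry $i$ of $\mathbf{X}\mathbf{w}$ is the dot product $\mathbf{x}_i^\top\mathbf{w}$. The sketch below uses random values (the sizes and seed are arbitrary). It compares an explicit row-by-row loop with the vectorized `X @ w`.

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 5, 2
X = np.column_stack((np.ones(N), rng.normal(size=(N, d))))   # rows x_i^T with a leading 1
w = rng.normal(size=d + 1)

# row by row: entry i of Xw is the dot product x_i^T w
Xw_loop = np.array([X[i] @ w for i in range(N)])

# vectorized matrix-vector product
Xw_vec = X @ w

print(Xw_loop.shape, np.allclose(Xw_loop, Xw_vec))
```

The vectorized form is the one to use in practice; the loop only spells out what each entry means.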
