# Multiple Linear Regression without Scikit-Learn

There's no better way to understand the gradient descent algorithm than to code it from scratch. What? You've heard this before? This time we are switching to gradient descent for multiple linear regression!

Don't hesitate to go back to your Machine Learning course on linear regression to refresh your memory.

Our goal is to code a multiple linear regression of the form:

$f(x) = \beta \cdot x + \beta_0 = \beta_1 x_1 + \dots + \beta_p x_p + \beta_0$
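
As a quick illustration of the formula, the sketch below computes $f(x)$ both term by term and as a dot product. The values of `beta`, `beta_0` and `x` (with $p = 3$) are made up for the example.

```python
import numpy as np

# Hypothetical parameters and one input with p = 3 features
beta = np.array([2.0, -1.0, 0.5])
beta_0 = 3.0
x = np.array([1.0, 2.0, 4.0])

# beta_1*x_1 + beta_2*x_2 + beta_3*x_3 + beta_0, written out
explicit = beta[0] * x[0] + beta[1] * x[1] + beta[2] * x[2] + beta_0
# Same thing as a dot product beta . x, then add the intercept
vectorized = beta @ x + beta_0

print(explicit, vectorized)  # 5.0 5.0
```

The vectorized form is the one we will use from here on, because it works unchanged for any number of features.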

* Import the following libraries: 
  * NumPy
  * `datasets` from scikit-learn (to load the data)
  * Plotly Express (to plot)

In [45]:
import numpy as np 
from sklearn import datasets
import plotly.express as px


* Define a `Model` class with two methods: 
  1. `__init__(self, data)`, where `data` is the dataset containing the training variables. This constructor defines an attribute $\beta_0$ (`beta_0` in your code) and an attribute $\beta$ (`beta` in your code). These attributes are the coefficients (parameters) of the model, and we will initialize them randomly using NumPy (cf. `np.random.randn`).
  `beta` must contain as many random values as there are training variables.
  2. `__call__(self, x)`, a special method that turns our class into a callable which returns $\beta \cdot x + \beta_0$ when called. 

  ⚠️ We are now working with matrices and vectors, so you will need operations that work on these objects (e.g. the matrix product `@`). ⚠️

In [46]:
bob = np.array([1, 2, 3])
marcel = np.array([1, 8, 0])
# equivalent: bob.dot(marcel)
bob @ marcel.T          # .T has no effect on a 1D array: 1*1 + 2*8 + 3*0


17

In [76]:
bob = np.array([1, 2, 3])                             # 1D vector of 3 values
marcel = np.array([[1, 8], [10,20], [100,200]])       # 3 rows, 2 columns

print(bob)
print(bob.shape)
print()

print(marcel)
print(marcel.shape)
print()

print("Dot product of the vector bob with each column of marcel")
print("Returns a 1D vector of 2 values")
print(bob @ marcel)
print((bob @ marcel).shape)

print()
print(bob.dot(marcel))
print((bob.dot(marcel)).shape)


[1 2 3]
(3,)

[[  1   8]
 [ 10  20]
 [100 200]]
(3, 2)

Dot product of the vector bob with each column of marcel
Returns a 1D vector of 2 values
[321 648]
(2,)

[321 648]
(2,)



In [48]:
class Model():
  def __init__(self,data):
    np.random.seed(42)
    
    # nb_features = len(data)
    nb_features = data.shape[1]               # not len(data), which counts rows, not features

    self.beta = np.random.randn(nb_features)
    self.beta_0 = np.random.randn(1)

    print("beta : ", self.beta)
    print(type(self.beta))
    print(self.beta.shape)

    print()
    print("beta 0: ", self.beta_0)
    print(type(self.beta_0))
    print(self.beta_0.shape)
  


  # Vectorized version of self.beta_1*x + self.beta_0
  # (see the previous exercise)
  
  # beta is an n-vector; x is either an n-vector (one sample)
  # or a matrix of 442 rows x n columns (the whole dataset).
  # For the matrix product, x is transposed so that each sample becomes a column.
  # When x is a single vector, a plain dot product also works.
  def __call__(self, x):
    return self.beta @ x.T + self.beta_0
    # return self.beta @ x.transpose() + self.beta_0
    # return self.beta.dot(x.transpose()) + self.beta_0
    
    # Dot-product notation for two vectors + beta_0 (only valid when x is a single sample)
    # return self.beta.dot(x) + self.beta_0
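
To see why `self.beta @ x.T + self.beta_0` handles both a single sample and the whole dataset, here is a small shape check on random data. The sizes (5 samples, 3 features) are hypothetical; the sketch also shows the equivalent form `X @ beta`, which needs no transpose.

```python
import numpy as np

# Hypothetical parameters for p = 3 features (not taken from the notebook)
rng = np.random.default_rng(0)
beta = rng.normal(size=3)
beta_0 = rng.normal(size=1)          # shape (1,), like Model.beta_0

X = rng.normal(size=(5, 3))          # 5 samples x 3 features

# Batch prediction: both forms give one prediction per sample
pred_notebook = beta @ X.T + beta_0  # form used in Model.__call__
pred_alt = X @ beta + beta_0         # equivalent, no transpose needed
print(pred_notebook.shape)                   # (5,)
print(np.allclose(pred_notebook, pred_alt))  # True

# Single sample: .T is a no-op on a 1D array, so beta @ x is a scalar;
# adding beta_0 of shape (1,) broadcasts it to a 1-element array
x = X[0]
print((beta @ x.T + beta_0).shape)   # (1,)
```

This broadcasting is also why `model(diabetes_data[0, :])` returns a one-element array rather than a plain number.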

* Import `sklearn.datasets`
  * Use the `load_diabetes()` function to load the diabetes dataset into an object called `diabetes`.
  * Print the `DESCR` attribute of the `diabetes` object
  * Save the content of the `data` attribute in an object named `diabetes_data`
  * Save the content of the `target` attribute in an object named `y`

In [49]:
diabetes = datasets.load_diabetes()
print(diabetes.DESCR)          # description of the dataset and its 10 features
diabetes_data = diabetes.data
y = diabetes.target


In [50]:
print(len(diabetes_data))
print(diabetes_data.shape)

442
(442, 10)


In [51]:
import pandas as pd

diabetes_df = pd.DataFrame(diabetes_data)
diabetes_df.head()


|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.00567 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.02593 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 |


In [52]:
print(type(diabetes_data))
print(diabetes_data.shape)
print(y.shape)

<class 'numpy.ndarray'>
(442, 10)
(442,)


* Create an instance of your class `Model` and display `beta_0` and `beta`

In [53]:
# Initializes beta_0 and a beta vector sized to the number of features of diabetes_data
model = Model(diabetes_data)



beta :  [ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337 -0.23413696
  1.57921282  0.76743473 -0.46947439  0.54256004]
<class 'numpy.ndarray'>
(10,)

beta 0:  [-0.46341769]
<class 'numpy.ndarray'>
(1,)


* Make a first "regression" by running `model(diabetes_data[0,:])`. 
NB: If your values differ from the ones in this notebook, that's expected: the parameters are initialized randomly. 

In [54]:

# Note: we pass only the first row (10 values)
model(diabetes_data[0, :])

array([-0.44918067])

In [55]:
# Note: we pass the whole matrix (442 rows, 10 columns)

model(diabetes_data)

array([-0.44918067, -0.45589502, -0.45771637, -0.61984002, -0.44881756,
       -0.550924  , -0.55308199, -0.31717212, -0.50309643, -0.58730845,
       -0.57130368, -0.54082745, -0.44635102, -0.48387345, -0.40174933,
       -0.45171479, -0.27968973, -0.41108348, -0.57390173, -0.53677669,
       -0.57906058, -0.49211392, -0.49011248, -0.37743941, -0.51895877,
       -0.66365533, -0.55673145, -0.45351227, -0.61043403, -0.28764662,
       -0.47033211, -0.58379631, -0.37631679, -0.30774322, -0.57850968,
       -0.38171861, -0.36152731, -0.51942106, -0.31431692, -0.60903986,
       -0.43776635, -0.66967106, -0.60605254, -0.38472769, -0.47960819,
       -0.46721094, -0.58847992, -0.70037719, -0.39755505, -0.5734833 ,
       -0.42387308, -0.42767024, -0.47684303, -0.30892654, -0.38646623,
       -0.41907887, -0.55687358, -0.52939462, -0.21300692, -0.42595264,
       -0.49266141, -0.63212634, -0.45536384, -0.65455112, -0.50557262,
       -0.46546158, -0.58509762, -0.5022422 , -0.52900267, -0.47

* These values are random predictions, since the model's parameters were initialized randomly.

* Visualize `y` against the predictions using `plotly`.

In [35]:
fig = px.scatter(x=y, y=model(diabetes_data))     # actual values (x) vs. predictions (y)
fig.add_scatter(x=y, y=y)                         # red line y = x: perfect predictions
fig.show()


* Now we need to define a cost function. For a linear regression, we can use the mean squared error (MSE): 

`np.mean((model(input) - y)**2)`

  * Create a function called `mse` (for mean squared error). This function takes two arguments, `y_pred` and `y_true`.

In [322]:
def mse(y_pred, y_true):
  return np.mean((y_pred - y_true)**2)   # use the y_true argument, not the global y

def rmse(y_pred, y_true):
  return np.sqrt(mse(y_pred, y_true))

* Test your function by passing `model(diabetes_data)` and `y` as arguments. 
* Calculate the RMSE as well

In [323]:
print(mse(model(diabetes_data), y))
print(rmse(model(diabetes_data), y))

29324.489790449898
171.24394818635167
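
To check the two functions on numbers you can verify by hand, here is a tiny made-up example. The `mse` and `rmse` below mirror the notebook's functions (with `y_true` used as the target), redefined so the snippet runs on its own.

```python
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def rmse(y_pred, y_true):
    return np.sqrt(mse(y_pred, y_true))

# Hypothetical values: only the last prediction is off, by 2
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])

# Squared errors are [0, 0, 4], so MSE = 4 / 3 and RMSE = sqrt(4 / 3)
print(mse(y_pred, y_true))   # 1.333...
print(rmse(y_pred, y_true))  # 1.154...
```

The RMSE is in the same unit as `y`, which makes it easier to interpret than the MSE.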


* We need to compute the gradients for our variable `model.beta` and our constant `model.beta_0`, i.e. the derivatives of the MSE with respect to each of them. Since the focus here is on code rather than math, here are the formulas: 
  * `derive_model_beta = 2/len(y_pred)*(x.transpose() @ (y_pred - y_true))`
  * `derive_model_beta_0 = 2/len(y_pred)*(np.sum(y_pred - y_true))`

  * If you want to know more about how these derivatives are calculated, see this article: [Gradient Descent Derivation](https://mccormickml.com/2014/03/04/gradient-descent-derivation/)


  * Using the formulas above, code a first function `derivative_mse_beta` that takes the arguments: 
    * `x` --> the feature values (the input matrix) / `y_pred` --> the values predicted by your model / `y_true` --> the values of the target variable
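
For reference, here is where these formulas come from. Write $X$ for the $n \times p$ matrix of training variables (one row $x_i$ per sample) and $\hat{y}$ for `y_pred`. Since $\hat{y}_i = \beta \cdot x_i + \beta_0$, its derivative is $x_i$ with respect to $\beta$ and $1$ with respect to $\beta_0$, and the chain rule gives:

$$
\begin{aligned}
\mathrm{MSE}(\beta, \beta_0) &= \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad \hat{y} = X\beta + \beta_0 \mathbf{1} \\
\frac{\partial\,\mathrm{MSE}}{\partial \beta} &= \frac{2}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i = \frac{2}{n} X^\top (\hat{y} - y) \\
\frac{\partial\,\mathrm{MSE}}{\partial \beta_0} &= \frac{2}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)
\end{aligned}
$$

Here $n$ is `len(y_pred)`, and $X^\top(\hat{y} - y)$ is exactly `x.transpose() @ (y_pred - y_true)`.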


In [324]:
def derivative_mse_beta(y_pred, y_true, x):
  return 2/len(y_pred)*(x.transpose() @ (y_pred - y_true))           # @ is the matrix product

In [325]:
derivative_mse_beta(model(diabetes_data), y, diabetes_data)

array([-1.37029187, -0.31058722, -4.29496067, -3.23293097, -1.55270832,
       -1.27470899,  2.89022792, -3.15134777, -4.14321887, -2.80549582])

* Test your function

* Using the formulas above, now code the `derivative_mse_beta_0` function, which takes the arguments:
    * `y_pred` --> the values predicted by your model / `y_true` --> the actual values to predict

In [326]:
def derivative_mse_beta_0(y_pred, y_true):
  return 2/len(y_pred)*(np.sum(y_pred - y_true))

* Test your function

In [327]:
derivative_mse_beta_0(model(diabetes_data), y)

-305.9129684066066
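
A common way to double-check hand-coded gradients is to compare them with finite differences of the loss. The sketch below does this on a small synthetic problem (20 samples, 4 features, all values random and hypothetical), reusing the notebook's gradient formulas; the central-difference step `eps` is an arbitrary small choice.

```python
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def derivative_mse_beta(y_pred, y_true, x):
    return 2 / len(y_pred) * (x.T @ (y_pred - y_true))

def derivative_mse_beta_0(y_pred, y_true):
    return 2 / len(y_pred) * np.sum(y_pred - y_true)

# Hypothetical small problem: 20 samples, 4 features
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
beta = rng.normal(size=4)
beta_0 = 0.3

def loss(b, b0):
    return mse(X @ b + b0, y)

eps = 1e-6
# Central differences: perturb one coordinate of beta at a time
numeric_beta = np.array([
    (loss(beta + eps * e, beta_0) - loss(beta - eps * e, beta_0)) / (2 * eps)
    for e in np.eye(4)
])
numeric_beta_0 = (loss(beta, beta_0 + eps) - loss(beta, beta_0 - eps)) / (2 * eps)

y_pred = X @ beta + beta_0
print(np.allclose(numeric_beta, derivative_mse_beta(y_pred, y, X), atol=1e-5))  # True
print(np.isclose(numeric_beta_0, derivative_mse_beta_0(y_pred, y), atol=1e-5))  # True
```

If either check printed `False`, the analytical formula (or its implementation) would be the first thing to suspect.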

* Let's see whether we can minimize our cost function using the two gradients above. To update each parameter, we subtract its gradient scaled by a learning rate: 
  * `param = param - learning_rate * gradient`

  * Set `learning_rate` to 0.1
  * Apply this formula to `model.beta` and `model.beta_0`.
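
To get a feel for the update rule before applying it to the model, here it is on a hypothetical one-dimensional function, $f(w) = (w - 3)^2$, whose gradient is $2(w - 3)$ and whose minimum is at $w = 3$.

```python
# Minimize f(w) = (w - 3)**2, whose gradient is f'(w) = 2 * (w - 3)
learning_rate = 0.1
w = 0.0
for step in range(50):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient   # param = param - learning_rate * gradient
print(round(w, 4))  # 3.0
```

With `learning_rate = 0.1`, each step multiplies the distance to the minimum by $1 - 2 \times 0.1 = 0.8$, so the parameter approaches 3 steadily; a much larger learning rate could make the steps overshoot instead.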

In [328]:
learning_rate = 0.1

# Compute both gradients from the same predictions, then update both parameters
# (model(diabetes_data) depends on model.beta_0 and model.beta)
y_pred = model(diabetes_data)
grad_beta_0 = derivative_mse_beta_0(y_pred, y)
grad_beta = derivative_mse_beta(y_pred, y, diabetes_data)

model.beta_0 -= learning_rate * grad_beta_0
model.beta -= learning_rate * grad_beta

In [329]:
model.beta

array([ 1.69861882,  1.13555971,  0.83071733,  0.12902708,  0.31098027,
       -0.46414237, -0.04682821,  0.89997688,  0.91862996, -1.58321362])

In [330]:
model.beta_0

array([29.7682968])

The values of both parameters have changed. Let's see how this affected the model's predictions. 
Visualize `y` against the model's predictions.

In [331]:
fig = px.scatter(x=y, y=model(diabetes_data))     # actual values (x) vs. predictions (y)
fig.add_scatter(x=y, y=y)                         # red line y = x: perfect predictions
fig.show()

The predictions got a little closer to the actual values.
* Recalculate your MSE

In [332]:
print(mse(model(diabetes_data), y))


20894.226165988213


* Our MSE has dropped a lot! This is good news, but gradient descent is an iterative process: you have to repeat the update many times before reaching accurate predictions. 
  * Using a loop, repeat the process above 10,000 times. 
  * Every 1,000 epochs, display the MSE, `model.beta` and `model.beta_0`.
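
The loop can also be packaged as a reusable function. Below is a minimal sketch, assuming batch gradient descent with both gradients computed from the same predictions; the name `fit`, its default arguments and the synthetic, noise-free data with known coefficients are all hypothetical, chosen so the result can be checked against the true values.

```python
import numpy as np

def fit(X, y, learning_rate=0.1, n_epochs=2000, seed=42):
    """Hypothetical helper: batch gradient descent for y ~ X @ beta + beta_0 (MSE loss)."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=X.shape[1])   # random start, like Model.__init__
    beta_0 = 0.0
    n = len(y)
    for _ in range(n_epochs):
        residual = (X @ beta + beta_0) - y
        # Both gradients use the same predictions, then both parameters move
        grad_beta = 2 / n * (X.T @ residual)
        grad_beta_0 = 2 / n * np.sum(residual)
        beta -= learning_rate * grad_beta
        beta_0 -= learning_rate * grad_beta_0
    return beta, beta_0

# Synthetic, noise-free data with known (hypothetical) coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 3.0

beta, beta_0 = fit(X, y)
print(beta.round(3), round(beta_0, 3))  # expected to be close to true_beta and 3.0
```

Because the data has no noise, gradient descent should recover the coefficients almost exactly; on real data like the diabetes dataset, it converges to the best-fitting coefficients instead.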

In [333]:
learning_rate = 0.1

for i in range(10_000):
  # Compute both gradients from the same predictions, then update both parameters
  y_pred = model(diabetes_data)
  grad_beta_0 = derivative_mse_beta_0(y_pred, y)
  grad_beta = derivative_mse_beta(y_pred, y, diabetes_data)
  model.beta_0  -= learning_rate * grad_beta_0
  model.beta    -= learning_rate * grad_beta
  if i % 1000 == 0:
    print(f"epoch : {i} mse = {mse(model(diabetes_data), y):.3f}")
    print(model.beta.round(1))
    print()

epoch : 0 mse = 15496.069
[ 1.8  1.2  1.3  0.5  0.5 -0.3 -0.3  1.2  1.3 -1.3]

epoch : 1000 mse = 3407.307
[  49.3  -32.3  259.6  180.8   36.3   10.1 -148.1  134.5  230.1  127.2]

epoch : 2000 mse = 3079.412
[  38.9  -93.8  368.4  243.7    7.2  -37.3 -186.   143.9  312.4  142.5]

epoch : 3000 mse = 2965.906
[  24.8 -141.9  428.3  274.3  -17.   -72.8 -199.1  137.9  357.2  136.5]

epoch : 4000 mse = 2919.212
[  14.1 -175.2  464.8  291.8  -32.2  -94.5 -204.5  132.3  386.7  126.6]

epoch : 5000 mse = 2899.015
[   6.8 -197.5  487.8  302.5  -41.6 -106.8 -206.9  128.3  407.6  116.6]

epoch : 6000 mse = 2889.886
[   2.  -212.1  502.5  309.3  -47.7 -113.5 -207.9  125.5  422.9  107.7]

epoch : 7000 mse = 2885.547
[  -1.2 -221.5  512.   313.7  -52.1 -116.7 -208.   123.4  434.3  100.1]

epoch : 8000 mse = 2883.356
[  -3.2 -227.6  518.1  316.5  -55.6 -117.9 -207.8  121.9  443.    93.9]

epoch : 9000 mse = 2882.163
[  -4.5 -231.4  522.   318.4  -58.6 -117.7 -207.4  120.6  449.8   88.8]



* Using `plotly`, plot the model's predictions against the actual values again

In [334]:
fig = px.scatter(x=y, y=model(diabetes_data))     # actual values (x) vs. predictions (y)
fig.add_scatter(x=y, y=y)                         # red line y = x: perfect predictions
fig.show()

**We've got a nice regression this time!** 
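
As a sanity check that stays within NumPy, gradient descent on the MSE should land on the same coefficients as the closed-form least-squares solution. The sketch below compares the two on synthetic data (the coefficients, noise level and iteration count are hypothetical choices), using `np.linalg.lstsq` with a column of ones for the intercept.

```python
import numpy as np

# Synthetic data with hypothetical coefficients and a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.7]) + 0.5 + rng.normal(scale=0.1, size=300)

# Gradient descent with the notebook's update rule (learning rate 0.1)
beta, beta_0 = np.zeros(4), 0.0
for _ in range(5000):
    residual = X @ beta + beta_0 - y
    beta -= 0.1 * 2 / len(y) * (X.T @ residual)
    beta_0 -= 0.1 * 2 / len(y) * residual.sum()

# Closed-form least squares: prepend a column of ones for the intercept
X1 = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(np.allclose(coef[0], beta_0, atol=1e-6), np.allclose(coef[1:], beta, atol=1e-6))  # True True
```

Gradient descent becomes the more practical option when the closed-form solve is too expensive, but on a small problem like this one the two should agree.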