# Linear Regression from Scratch

| Sample | Egg price | Gold price | Oil price | GDP |
|---:|:----------|:-----------|:----------|:----|
| 1 | 3 | 100 | 4 | 21 |
| 2 | 4 | 500 | 7 | 43 |

### Notations and Definitions

In [133]:
import numpy as np

#sample 1  $x^1$
x1 = np.array([3, 100, 4])
y1 = np.array([21])

#what's the idea of prediction?  What is machine learning?
#- find the weights that can bring you from x1 to y1

#first sample
#3 * w1 + 100 * w2 + 4 * w3 = 21
#3 * 1  + 100 * 1  + 4 * 1  = 107
#3 * 7  + 100 * 1  + 4 * -25  = 21

#machine learning is trying to find the `best` weights

#2nd sample
#4 * w1 + 500 * w2 + 7 * w3   = 43
#4 * 7  + 500 * 1  + 7 * -25  = 353 

#machine learning is trying to find the `best` weights ACROSS all samples....
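
The comments above can be checked numerically. This is a minimal sketch, assuming the two samples from the table and the hand-picked weights `[7, 1, -25]`; the names `x2`, `y2`, `w`, `err1`, and `err2` are introduced here only for illustration.

```python
import numpy as np

# the two samples from the table (egg, gold, oil) and their GDP labels
x1, y1 = np.array([3, 100, 4]), 21
x2, y2 = np.array([4, 500, 7]), 43

# hand-picked weights that fit sample 1 exactly
w = np.array([7, 1, -25])

err1 = x1 @ w - y1   # prediction error on sample 1
err2 = x2 @ w - y2   # prediction error on sample 2
print(err1, err2)
```

Weights that are perfect for one sample can be far off for another, which is why the search has to consider all samples at once.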


In [134]:
#Definition of terms and notations

#2 samples
#3 features - egg price, gold price, oil price
    #features are the variables used for predicting the label
    #factors, independent variables, predictors, X

#egg price - x_1 --> always a vector,  e.g., [3, 4]
#gold price - x_2 --> always a vector, e.g., [100, 500]
#oil price - x_3 --> always a vector, e.g., [4, 7]
#we call egg price + gold price + oil price - whole `feature matrix` --> \mathbf{X}
    
#1 label - gdp
    #label is the variable that we want to predict....
    #target, outcome, y
    #y_1 = y = a vector of labels, e.g., [21, 43]
    
#Tips: lowercase vs. uppercase
# lowercase bold means a vector; uppercase bold means a matrix

Math notations:

- normal $a$ --> scalar (one number)
- bold lowercase $\mathbf{a}$ --> vector (a 1D numpy array)
- bold uppercase $\mathbf{A}$ --> matrix (a 2D numpy array)
- $x_1^2$ --> feature 1, second sample (subscript = feature, superscript = sample; a single number, so not bold)
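
The notation maps onto numpy objects roughly as follows. This is a small sketch using the toy table from the start of the notebook; the variable names are illustrative assumptions, not part of the original cells.

```python
import numpy as np

a = 3.0                          # scalar: a single number
x_1 = np.array([3, 4])           # vector: feature 1 (egg price) across both samples
X = np.array([[3, 100, 4],
              [4, 500, 7]])      # matrix: rows are samples, columns are features

# feature 1 of the second sample; numpy indices start at 0,
# so sample 2 is row 1 and feature 1 is column 0
x_1_2 = X[1, 0]
print(np.ndim(a), x_1.ndim, X.ndim, x_1_2)
```

Note the off-by-one between the math notation (counting from 1) and numpy indexing (counting from 0).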

### How does the dot product work?

In [135]:
X = np.array([  [3, 100, 4] , [4, 500, 7]  ])
X.shape  #(2, 3) means 2 samples = m, 3 features = n

(2, 3)

In [136]:
#weights = theta = params
theta = np.array([7, 1, -25])
theta.shape  #theta must have as many entries as X has features, i.e. X.shape[1]

(3,)

In [137]:
# X.dot(theta)
#to be able to dot, the inner pair of dimensions must match
#(2, 3)  @ (3, ) = (2, )
#(4, 6)  @ (6, 1) = (4, 1)
#(4, 6, 1) @ (1, 2) = (4, 6, 2)
X @ theta

array([ 21, 353])

In [138]:
X[0][0] * theta[0] + X[0][1] * theta[1] + X[0][2] * theta[2]

21
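
The shape rule for `@` can be verified directly. The sketch below uses zero-filled arrays because only the shapes matter; the names `s1`, `s2`, `s3`, and `mismatch_ok` are illustrative.

```python
import numpy as np

# only the shapes matter here, so zero-filled arrays are enough
s1 = (np.zeros((2, 3)) @ np.zeros(3)).shape            # (2,)
s2 = (np.zeros((4, 6)) @ np.zeros((6, 1))).shape       # (4, 1)
s3 = (np.zeros((4, 6, 1)) @ np.zeros((1, 2))).shape    # (4, 6, 2): a stack of 4 (6, 1) @ (1, 2) products

# mismatched inner dimensions raise a ValueError
try:
    np.zeros((2, 3)) @ np.zeros(4)
    mismatch_ok = True
except ValueError:
    mismatch_ok = False
print(s1, s2, s3, mismatch_ok)
```

For arrays with more than two dimensions, `@` treats the leading dimensions as a batch and multiplies the last two, so the matched inner pair disappears from the result.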

### Steps for linear regression / gradient descent

Step 1: Initialize your weights randomly
  - weights.shape (n, )

Step 2: Use these initial weights to predict
  - you will get errors

Step 3: Find the derivative of the loss with respect to the weights

$\mathbf{X}^\top (\mathbf{\hat{y}} - \mathbf{y})$

Step 4: Change the weight

$\mathbf{w} = \mathbf{w} - \alpha * \mathbf{X}^\top (\mathbf{\hat{y}} - \mathbf{y})$

Step 5: Repeat Steps 2, 3, and 4 until either (1) you reach the maximum number of iterations, or (2) your validation loss no longer decreases
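
The five steps can be sketched end to end on synthetic data. Everything below is an assumption made for illustration: the data, the true weights `true_w`, the learning rate `alpha`, the tolerance `tol`, and the simple rule of stopping the first time the validation loss fails to improve.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic data (illustrative): y = X @ true_w + small noise
true_w = np.array([2.0, -1.0, 0.5])
X_tr, X_val = rng.normal(size=(500, 3)), rng.normal(size=(500, 3))
y_tr = X_tr @ true_w + 0.01 * rng.normal(size=500)
y_val = X_val @ true_w + 0.01 * rng.normal(size=500)

alpha, max_iter, tol = 0.001, 5000, 1e-10

w = rng.normal(size=3)                     # Step 1: random initial weights, shape (n,)
best_val = np.inf
for it in range(max_iter):                 # Step 5 (1): stop at the max iteration
    y_hat = X_tr @ w                       # Step 2: predict
    grad = X_tr.T @ (y_hat - y_tr)         # Step 3: derivative
    w = w - alpha * grad                   # Step 4: update the weights
    val_loss = ((X_val @ w - y_val) ** 2).mean()
    if best_val - val_loss < tol:          # Step 5 (2): validation loss stopped decreasing
        break
    best_val = val_loss

print(it, w)
```

In practice a patience counter (allowing a few non-improving iterations before stopping) is a common refinement of the stopping rule used here.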

### Let's code

In [139]:
## step 0. load data

In [233]:
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
X= diabetes.data
y = diabetes.target
X.shape,y.shape

((442, 10), (442,))

In [234]:
X[0],y[0]

(array([ 0.03807591,  0.05068012,  0.06169621,  0.02187235, -0.0442235 ,
        -0.03482076, -0.04340085, -0.00259226,  0.01990842, -0.01764613]),
 151.0)

In [235]:
X.ndim,y.ndim

(2, 1)

In [236]:
diabetes.feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']


In [237]:
# y is a quantitative measure of disease progression one year after baseline
m = X.shape[0]  # number of samples
n = X.shape[1]  # number of features

#### step 1 train test split

In [238]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=112)

In [239]:
X_train.shape[0] == y_train.shape[0]

True
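
In the spirit of doing things from scratch, the split can be approximated with numpy alone. This is a sketch, not scikit-learn's implementation: `simple_train_test_split` and its `seed` parameter are hypothetical names, and the shuffle will not reproduce the rows selected by `random_state=112`.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.30, seed=0):
    """Shuffle row indices, then cut them into a test part and a train part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    n_test = int(np.ceil(test_size * X.shape[0]))   # round the test count up
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_demo = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features
y_demo = np.arange(10)
Xtr, Xte, ytr, yte = simple_train_test_split(X_demo, y_demo)
print(Xtr.shape, Xte.shape)
```

Indexing `X` and `y` with the same index arrays is what keeps each sample paired with its label after shuffling.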

#### step 2 standardization

In [240]:
import numpy as np

In [241]:
from sklearn.preprocessing import StandardScaler as SS
sc = SS()
X_train_temp = X_train.copy()

In [242]:
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
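
What `fit_transform` on the training set followed by `transform` on the test set amounts to can be written out by hand. The sketch below uses toy arrays (`X_tr`, `X_te` are illustrative names) and the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])   # toy training set
X_te = np.array([[4.0, 30.0]])                              # toy test set

mu = X_tr.mean(axis=0)       # per-feature mean, from the training set only
sigma = X_tr.std(axis=0)     # per-feature standard deviation (ddof=0)

X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma   # reuse the training statistics; do not refit
print(X_tr_std.mean(axis=0), X_tr_std.std(axis=0), X_te_std)
```

Computing `mu` and `sigma` from the training rows only keeps information about the test set from leaking into the model.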

#### step 4 add intercept to X


In [243]:
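# add a column of ones (axis=1) so that theta[0] acts as the intercept;
# the shape check keeps the column from being added twice if the cell is re-run
if X_train_temp.shape[1] == X_train.shape[1]:
    X_train = np.insert(X_train, 0, 1, axis=1)
    X_test = np.insert(X_test, 0, 1, axis=1)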


In [244]:
X_train.shape

(309, 11)
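
Why a column of ones gives an intercept, and why `axis=1` is the right axis, can be seen on the toy table. The weights below are made up for illustration, with `theta[0] = 10` playing the role of the intercept.

```python
import numpy as np

X_demo = np.array([[3.0, 100.0, 4.0],
                   [4.0, 500.0, 7.0]])

X_b = np.insert(X_demo, 0, 1, axis=1)      # prepends a COLUMN of ones -> shape (2, 4)
wrong = np.insert(X_demo, 0, 1, axis=0)    # prepends a ROW of ones    -> shape (3, 3)

theta = np.array([10.0, 7.0, 1.0, -25.0])  # theta[0] acts as the intercept
with_column = X_b @ theta
explicit = theta[0] + X_demo @ theta[1:]   # same prediction written as b + X @ w
print(X_b.shape, wrong.shape, with_column, explicit)
```

Folding the intercept into `theta` means the prediction and gradient code need no special case for the bias term.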

#### step 5 Fitting!!!! gradient descent

In [284]:
theta = np.ones((n+1,))  # n features + 1 intercept; starts at ones rather than random values

In [285]:
theta.shape,theta

((11,), array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))

In [286]:
# y_hat = X_train @ theta
# y_hat[:5]

In [320]:
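def _predict(X, theta):
    # linear model: one prediction per row of X
    return X @ theta
def _grad(X, y_hat, y):
    # gradient of half the sum of squared errors with respect to theta
    return X.T @ (y_hat - y)
def _mse(y_hat, y):
    # note: despite the name, this returns the SUM of squared errors;
    # divide by len(y) to get the mean
    return ((y_hat - y) ** 2).sum()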
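
The expression `X.T @ (y_hat - y)` is the gradient of half the sum of squared errors; the factor of 2 from differentiating the square is effectively absorbed into the learning rate. A finite-difference check on random data is one way to confirm this. `half_sse` is a hypothetical helper written for this check only.

```python
import numpy as np

def half_sse(X, theta, y):
    # hypothetical helper: the loss whose gradient is X.T @ (X @ theta - y)
    return 0.5 * ((X @ theta - y) ** 2).sum()

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))
y_demo = rng.normal(size=5)
theta = rng.normal(size=3)

analytic = X_demo.T @ (X_demo @ theta - y_demo)

# central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.size):
    step = np.zeros_like(theta)
    step[j] = eps
    numeric[j] = (half_sse(X_demo, theta + step, y_demo)
                  - half_sse(X_demo, theta - step, y_demo)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))
```

The same check is a cheap safeguard whenever a gradient formula is derived by hand.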

In [321]:
import random

In [322]:
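def fit(n_iter, X, theta, y, alpha=.0001):
    # n_iter: number of gradient descent steps (renamed from n, which
    # shadowed the global feature count); alpha: learning rate
    for i in range(n_iter):
        y_hat = _predict(X, theta)
        deri = _grad(X, y_hat, y)
        # if random.randint(1,100) > 99:
        #     theta = theta - .0001 * deri
        theta = theta - alpha * deri
        if i % 100 == 0:
            # loss of the predictions made before this step's update
            print(f'{i}   {_mse(y_hat, y)}')
    return theta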


In [324]:
print(theta := fit(1000,X_train,theta,y_train))

0   8749610.83541844
100   875930.551084125
200   859934.5319026619
300   859084.3358702151
400   858359.8717008021
500   857679.5879588674
600   857038.5414813091
700   856434.2198352742
800   855864.4276162357
900   855327.14204586
[151.64401294   1.38942999  -7.77449288  26.57372282  17.1334062
 -13.69775588   3.21753493  -2.59255342   6.93919947  27.32725322
   1.03648346]
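
The printed loss is still falling slowly at iteration 900, so one way to judge how close gradient descent gets to the optimum is to compare it with a direct least-squares solve. The sketch below does this on synthetic data, not on the diabetes set; the data, the step size `0.001`, and the use of `np.linalg.lstsq` as the reference are all choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = np.insert(rng.normal(size=(200, 3)), 0, 1, axis=1)   # intercept + 3 features
y_demo = X_demo @ np.array([5.0, 2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

# gradient descent with the same update rule as fit()
theta_gd = np.ones(4)
for _ in range(2000):
    theta_gd = theta_gd - 0.001 * X_demo.T @ (X_demo @ theta_gd - y_demo)

# least-squares solution for comparison
theta_ls, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print(np.abs(theta_gd - theta_ls).max())
```

When the two solutions agree, the iterative procedure has converged; a visible gap suggests more iterations or a different learning rate.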


#### step 6 evaluate on the test set

In [325]:
yhat_test = _predict(X_test,theta)
print(_mse(yhat_test,y_test))

426741.53948540566
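
Because `_mse` sums the squared errors, the number above grows with the size of the test set. Per-sample summaries are easier to interpret. The helper below is a sketch: `summarize` is a hypothetical name, and it is demonstrated on toy values only.

```python
import numpy as np

def summarize(y_hat, y):
    """Convert raw errors into per-sample summaries."""
    sse = ((y_hat - y) ** 2).sum()               # sum of squared errors
    mse = sse / y.shape[0]                       # mean squared error
    rmse = np.sqrt(mse)                          # same units as the target
    r2 = 1 - sse / ((y - y.mean()) ** 2).sum()   # fraction of variance explained
    return mse, rmse, r2

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 2.0])
mse, rmse, r2 = summarize(y_pred, y_true)
print(mse, rmse, r2)
```

Applied to the result above, dividing 426741.54 by the 133 test samples (442 total minus 309 training rows) gives a mean squared error of roughly 3208.6, or an RMSE of about 56.6 in units of the target.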
