# Linear Regression 

### Assumptions
1) Linearity: the features and the target must exhibit a linear relationship
2) No or low multicollinearity, i.e. the features shouldn't be strongly correlated with each other
3) Homoscedasticity: the error variance must be constant
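
One rough way to screen for the multicollinearity assumption is to look at pairwise feature correlations. The sketch below is only illustrative: the three features (`x1`, `x2`, `x3`) are made up, with `x2` deliberately built to be nearly collinear with `x1`.

```python
import numpy as np

# Hypothetical features, generated only for this illustration
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)  # almost a copy of x1
x3 = rng.normal(size=200)                   # independent of x1 and x2

features = np.column_stack([x1, x2, x3])

# rowvar=False treats each column as one variable
corr = np.corrcoef(features, rowvar=False)
print(np.round(corr, 2))
```

An off-diagonal entry close to 1 or -1 (here the `x1`/`x2` pair) flags features that carry nearly the same information.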

# Mathematical Formulation
We model the relationship as:
 $$ \hat{y} = wx + b$$
 <br>
where

$y$ : Dependent variable (what we are trying to predict); $\hat{y}$ is the model's prediction of it<br>
$x$ : Independent variable (input matrix)<br>
$w$ : Weight parameter that needs to be learned (slope)<br>
$b$ : Bias parameter (intercept)<br>

# Loss Function Formulation

We use mean squared error (MSE) because, for a linear model, it gives a smooth, convex loss surface. Convexity means any minimum gradient descent reaches is a global minimum, and smoothness means the gradient is defined everywhere.

$$
MSE = L(w,b) = \frac{1}{n} \sum_{i = 1}^{n}{(y_i - \hat{y}_i)^2}
$$
<br>

Vectorized form:
$$
L(w,b) = \frac{1}{n} || y - (Xw + b)||^2
$$
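
As a quick check that the summation form and the vectorized form describe the same quantity, the sketch below evaluates both on a tiny made-up dataset. The names `X_demo`, `y_demo`, `w_demo`, and `b_demo` are placeholders introduced only for this illustration.

```python
import numpy as np

# Small made-up dataset and arbitrary parameter values
rng = np.random.default_rng(0)
X_demo = rng.random((5, 1))
y_demo = rng.random((5, 1))
w_demo = np.array([[2.0]])
b_demo = 0.5

y_hat_demo = X_demo @ w_demo + b_demo

# Summation form: average of the squared residuals
loss_sum = np.mean((y_demo - y_hat_demo) ** 2)

# Vectorized form: squared Euclidean norm of the residual vector, divided by n
loss_vec = np.linalg.norm(y_demo - y_hat_demo) ** 2 / len(y_demo)

print(loss_sum, loss_vec)
```

The two values should agree up to floating-point rounding.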

# Gradient Descent

Gradient with respect to the weights ($w$):

$$
\frac{\partial L}{\partial w} = - \frac{2}{n} X^T (y-\hat{y})
$$


Gradient with respect to the bias ($b$):

$$
\frac{\partial L}{\partial b} = - \frac{2}{n} \sum_{i = 1}^{n}{(y_i - \hat{y}_i)}
$$
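
A useful sanity check on gradient formulas like these is to compare them against finite differences of the loss. The sketch below does that on a small made-up dataset and then takes a single gradient descent step. The demo data, the starting parameters, and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
import numpy as np

# Made-up data roughly following y = 4 + 3x, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.random((20, 1))
y_demo = 4 + 3 * X_demo + 0.1 * rng.standard_normal((20, 1))
w_demo = np.array([[0.5]])
b_demo = 0.2
n = len(X_demo)

def loss(w, b):
    """Mean squared error of the linear model on the demo data."""
    return np.mean((y_demo - (X_demo @ w + b)) ** 2)

# Analytic gradients of the MSE with respect to w and b
error = y_demo - (X_demo @ w_demo + b_demo)
grad_w = -(2 / n) * X_demo.T @ error   # shape (1, 1)
grad_b = -(2 / n) * np.sum(error)      # scalar

# Central finite differences as an independent check
eps = 1e-6
num_grad_w = (loss(w_demo + eps, b_demo) - loss(w_demo - eps, b_demo)) / (2 * eps)
num_grad_b = (loss(w_demo, b_demo + eps) - loss(w_demo, b_demo - eps)) / (2 * eps)
print(grad_w.item(), num_grad_w)
print(grad_b, num_grad_b)

# One gradient descent step: move against the gradient
learning_rate = 0.1  # assumed value
w_new = w_demo - learning_rate * grad_w
b_new = b_demo - learning_rate * grad_b
print(loss(w_demo, b_demo), loss(w_new, b_new))
```

If the analytic and numerical gradients match, the sign and the transpose are in the right place, and the loss after the step should be lower than before it.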

# Coding

In [1]:
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Create synthetic dataset: y = 4 + 3x plus Gaussian noise
np.random.seed(42)

X = 2 * np.random.rand(100,1)
y = 4 + 3*X + np.random.randn(100,1)

In [3]:
# Initialize parameters
w = np.random.randn(1,1)
b = 0.0

## Define functions

### Prediction

In [4]:
def predict(X, w, b):
    # X: (n, d) inputs, w: (d, 1) weights, b: scalar bias -> (n, 1) predictions
    return X @ w + b
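
To make the shapes concrete, here is a small standalone sketch that calls the same kind of `predict` function on three hand-picked inputs. The values `w_demo = 3` and `b_demo = 4` are chosen only so the result is easy to verify by hand.

```python
import numpy as np

def predict(X, w, b):
    return X @ w + b

# Three inputs, one feature each
X_demo = np.array([[0.0], [1.0], [2.0]])
w_demo = np.array([[3.0]])  # shape (1, 1): one weight per feature
b_demo = 4.0                # scalar, broadcast across all rows

y_hat_demo = predict(X_demo, w_demo, b_demo)
print(y_hat_demo.shape)  # one prediction per row of X_demo
print(y_hat_demo.ravel())
```

Each prediction is `3 * x + 4`, so the inputs 0, 1, and 2 map to 4, 7, and 10.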

### Loss

In [6]:
def MSE(y, y_hat):
    # Square the residuals first, then average them
    return np.mean((y - y_hat) ** 2)