# Linear Regression

In linear regression we fit a line (or, with several features, a hyperplane) to the data.  
$ \hat{y}(w,b) = wx + b $ (in the case of <b>one</b> independent variable)  
or  
$ \hat{y} = w_{1}x_{1} + w_{2}x_{2} + \dots + w_{m}x_{m} + b $ (in the case of <b>multiple</b> independent variables, here $ m $ of them)
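
As a minimal sketch of the two forms above, the prediction can be computed for many samples at once with a matrix-vector product. The helper name `predict` and all numeric values here are made up for illustration.

```python
import numpy as np

def predict(X, w, b):
    """Return y_hat = X @ w + b for a 2-D feature matrix X (one row per sample)."""
    return X @ w + b

# Illustrative values only (not taken from the dataset used later).
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # two samples, two features
w = np.array([0.5, -1.0])    # one weight per feature
b = 2.0                      # bias

y_hat = predict(X, w, b)     # one prediction per row of X
```

With a single feature, `X` has one column and `w` has one entry, which reduces to $ wx + b $.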

The main idea is to <i><b>minimize the cost function</b></i> by adjusting the model parameters w and b using <i><b>gradient descent optimization</b></i>.  
The parameters at the minimum give the best-fit line.

## Cost function




Here we use the MSE (mean squared error) as the cost function:  

$ L(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2} $

Here $ y_i $ is the true value and $ \hat{y}_i $ is the predicted value for sample $ i $, and $ n $ is the number of samples, with  
  
$ \hat{y}_i = wx_i + b $
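
A small sketch of the MSE formula in NumPy, assuming `y_true` and `y_pred` are arrays of equal length. The function name `mse` and the sample values are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative values: residuals are 1, 0 and -2.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

loss = mse(y_true, y_pred)   # (1 + 0 + 4) / 3
```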

### Partial derivatives of the cost function (obtained with the chain rule)

With respect to the weight,

$ d(L)_w = -(2/n) \sum_{i=1}^{n} x_{i}(y_i - \hat{y}_i) $  

With respect to the bias,

$ d(L)_b = -(2/n) \sum_{i=1}^{n} (y_i - \hat{y}_i) $
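
One way to sanity-check the derivatives with respect to the weight and the bias is to compare them with central finite differences of the MSE. The sketch below assumes a single feature. The helper names, the data and the step `eps` are illustrative choices.

```python
import numpy as np

def mse(w, b, x, y):
    """MSE of the line w*x + b on the data (x, y)."""
    return np.mean((y - (w * x + b)) ** 2)

def analytic_grads(w, b, x, y):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(x)
    residual = y - (w * x + b)
    dw = -(2 / n) * np.sum(x * residual)
    db = -(2 / n) * np.sum(residual)
    return dw, db

# Illustrative data and starting point.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
w, b, eps = 0.5, 0.0, 1e-6

dw, db = analytic_grads(w, b, x, y)

# Central differences: (f(p + eps) - f(p - eps)) / (2 * eps)
dw_num = (mse(w + eps, b, x, y) - mse(w - eps, b, x, y)) / (2 * eps)
db_num = (mse(w, b + eps, x, y) - mse(w, b - eps, x, y)) / (2 * eps)
```

If the formulas are right, `dw` should agree with `dw_num` and `db` with `db_num` up to small rounding error.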

## Gradient descent


Gradient descent is an <b>optimization algorithm</b> used to <b>minimize the cost function</b>. It does so by repeatedly updating the model parameters.  
"Gradient" refers to the derivative (slope) of the cost function, and "descent" means moving downward along it.  
  
$ w = w - R \cdot d(L)_w $  
$ b = b - R \cdot d(L)_b $

Here w is the weight, b is the bias, R is the learning rate, $ d(L)_w $ is the derivative of the cost function with respect to the weight, and $ d(L)_b $ is the derivative with respect to the bias.  
The learning rate is a tuning parameter that sets how large a step is taken per iteration.  
The negative sign means each step moves against the gradient, that is, down the slope.
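
The update rule can be sketched as a short loop. This example assumes one feature and synthetic data generated from $ y = 2x + 1 $. The starting values, the learning rate `R = 0.05` and the iteration count are illustrative choices, not tuned settings.

```python
import numpy as np

# Synthetic data from y = 2x + 1 (an assumption for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # starting values
R = 0.05          # learning rate
n = len(x)
losses = []       # MSE recorded before each update

for _ in range(2000):
    residual = y - (w * x + b)
    losses.append(np.mean(residual ** 2))
    # Both gradients are computed from the same predictions.
    dw = -(2 / n) * np.sum(x * residual)
    db = -(2 / n) * np.sum(residual)
    # Step against the gradient.
    w -= R * dw
    b -= R * db
```

On this data the loop drives `w` towards 2 and `b` towards 1. A much larger learning rate can overshoot the minimum and diverge, while a much smaller one needs more iterations.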

## Note

Since the loss function contains the squared term $ (y - \hat{y})^2 $, its graph as a function of w and b is parabolic (bowl-shaped). After selecting initial values for the weight and bias, we therefore keep stepping downhill until we reach the minimum, which is the point where the loss is smallest and which gives the best values of w and b.
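
To illustrate the parabolic shape, the sketch below evaluates the MSE over a range of weights while holding the bias fixed, then fits a degree-2 polynomial to the result. The data, the fixed bias and the weight range are assumptions chosen for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # generated from y = 2x + 1
b = 1.0                              # bias held fixed for this 1-D slice

# MSE as a function of the weight alone.
ws = np.linspace(-2.0, 6.0, 41)
losses = np.array([np.mean((y - (w * x + b)) ** 2) for w in ws])

# A parabola is reproduced exactly by a degree-2 polynomial fit.
a2, a1, a0 = np.polyfit(ws, losses, 2)
w_min = -a1 / (2 * a2)               # vertex of the parabola
```

The leading coefficient `a2` is positive, so the curve opens upward and has a single minimum, here at the weight that generated the data.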

In [14]:
import numpy as np

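class LinearRegression:
    """Linear regression trained with batch gradient descent on the MSE cost."""

    def __init__(self, learning_rate: float, no_of_iterations: int):
        self.R = learning_rate        # step size R in the update rule
        self.n = no_of_iterations     # number of gradient descent iterations

    def fit(self, X_train: np.ndarray, Y_train: np.ndarray):
        # X_train has shape (rows, columns); Y_train has shape (rows,)
        self.rows, self.columns = X_train.shape
        # Start every weight and the bias at 1
        self.w = np.ones(self.columns)
        self.b = 1
        self.X = X_train
        self.Y = Y_train
        for i in range(self.n):
            self.update_weights()
            print(f"for iternation {i+1} weight is {self.w} and bias is {self.b}")

    def update_weights(self):
        # w <- w - R * d(L)_w, where d(L)_w = -(2/rows) * X^T (Y - Y_hat)
        self.w -= self.R * (-2 / self.rows) * self.X.T.dot(self.Y - self.predict(self.X))
        # b <- b - R * d(L)_b, where d(L)_b = -(2/rows) * sum(Y - Y_hat)
        # Note: Y_hat is recomputed here with the already-updated w
        self.b -= self.R * (-2 / self.rows) * np.sum(self.Y - self.predict(self.X))

    def predict(self, X_test: np.ndarray):
        # Y_hat = X w + b
        return X_test.dot(self.w) + self.b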

In [15]:
import pandas as pd
df = pd.read_csv("D:/Ml/part1/data/salary_data.csv")



In [16]:
df.head()

|   | YearsExperience | Salary |
|---|-----------------|--------|
| 0 | 1.1 | 39343 |
| 1 | 1.3 | 46205 |
| 2 | 1.5 | 37731 |
| 3 | 2.0 | 43525 |
| 4 | 2.2 | 39891 |


In [17]:
X = df.drop(columns=["Salary"]).values
Y = df['Salary'].values
type(X)

numpy.ndarray

In [18]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,test_size=0.2, random_state=42)

In [19]:
X_train.shape

(24, 1)

In [20]:
Y_train.shape

(24,)

In [27]:
model = LinearRegression(0.00005,1000)
model.fit(X_train=X_train, Y_train=Y_train)

for iternation 1 weight is [47.31396304] and bias is 8.396118381672135
for iternation 2 weight is [93.46101209] and bias is 15.767558369810306
for iternation 3 weight is [139.44174768] and bias is 23.114408707362582
for iternation 4 weight is [185.25676818] and bias is 30.43675781800018
for iternation 5 weight is [230.90666979] and bias is 37.73469380726617
for iternation 6 weight is [276.39204658] and bias is 45.008304463719995
for iternation 7 weight is [321.7134905] and bias is 52.25767726007794
for iternation 8 weight is [366.87159133] and bias is 59.48289935434946
for iternation 9 weight is [411.86693676] and bias is 66.68405759096943
for iternation 10 weight is [456.70011235] and bias is 73.86123850192631
for iternation 11 weight is [501.37170157] and bias is 81.01452830788625
for iternation 12 weight is [545.88228577] and bias is 88.14401291931318
for iternation 13 weight is [590.23244421] and bias is 95.24977793758487
for iternation 14 weight is [634.42275409] and bias is 102.3

In [28]:
#calculating predicted values
Y_pred = model.predict(X_test=X_test)

In [29]:
#actual answers
Y_test

array([112635,  67938, 113812,  83088,  64445,  57189])

In [30]:
mean_absolute_error = np.sum(abs(Y_pred-Y_test))/len(Y_test)
mean_absolute_error

np.float64(11427.225194686946)

In [31]:
mean_squared_error = np.sum((Y_pred-Y_test)**2)/len(Y_test)
mean_squared_error

np.float64(162336980.44131398)

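Inferences from this exercise:
- Overfitting is less likely with a simple model such as single-feature linear regression, but it is not impossible, for example when there are many features and few samples
- Adjusting the learning rate and the number of iterations can improve the results a lot
- Standardize the data for good results
- Use linear regression only when the relationship is approximately linear
- Tuning parameters such as the learning rate and the number of iterations also affect the learned model parameters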
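
The standardization point can be sketched as follows, assuming the usual z-score scaling (subtract the mean, divide by the standard deviation). The arrays are made-up values, not the salary data, and the variable names are illustrative.

```python
import numpy as np

# Made-up feature values for illustration.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_test = np.array([[6.0], [7.0]])

# Statistics come from the training set only and are reused for the test set.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
```

After scaling, each training feature has mean 0 and standard deviation 1, which usually allows a larger learning rate and fewer iterations than training on the raw values.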