<h1 align="center">Linear Regression</h1> 

If you want to start learning machine learning, linear regression is a good place to begin. Linear regression (LR) is a regression model: it takes input features and predicts a continuous output, e.g. a stock price or a salary. As the name says, it models the output as a linear function of the inputs, so it fits well only when that relationship is (roughly) linear.

## Basic Theory :
LR assigns a weight parameter $\Large \beta_i$ (beta) to each training feature $\Large X_i$. The predicted output ($\Large Y$) is a linear combination of the features $\Large X_i$, weighted by the $\Large \beta_i$ coefficients.

<img src="https://miro.medium.com/max/1400/1*GSAcN9G7stUJQbuOhu0HEg.png" width="500">
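
As a minimal NumPy sketch of that prediction step, each prediction is an intercept plus the weighted sum of one sample's features. The function name `predict`, the intercept `beta0` and the numbers are illustrative assumptions, not part of the original text:

```python
import numpy as np

def predict(X, beta0, beta):
    """Return beta0 + X @ beta: one prediction per row (sample) of X."""
    # Weighted sum of the features of each sample, shifted by the intercept
    return beta0 + X @ beta

# Hypothetical data: 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
beta0 = 0.5                      # assumed intercept
beta = np.array([2.0, -1.0])     # assumed coefficients beta_1, beta_2

y_hat = predict(X, beta0, beta)
print(y_hat)  # → [0.5 2.5 4.5]
```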


At the start of training, each $\beta_i$ is randomly initialized. During training, the $\beta_i$ of each feature is corrected step by step so that the loss (a measure of the deviation between the expected and predicted outputs) is minimized. The [gradient descent algorithm](https://en.wikipedia.org/wiki/Gradient_descent) is used to move the $\beta_i$ values in the right direction. In the diagram below, each blue dot represents a training sample and the blue line shows the fitted solution.

<img src="https://scipy-lectures.org/_images/sphx_glr_plot_linear_regression_001.png" width="500">
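
The training loop described above can be sketched with plain NumPy batch gradient descent on the mean squared error. This is an illustrative implementation under assumed settings (learning rate 0.1, 1000 iterations, synthetic noise-free data, hypothetical function name `fit_gradient_descent`), not the method used by scikit-learn later in this notebook:

```python
import numpy as np

def fit_gradient_descent(X, y, lr=0.1, n_iters=1000, seed=0):
    """Fit y ≈ b0 + X @ beta by batch gradient descent on the MSE loss."""
    n_samples, n_features = X.shape
    rng = np.random.default_rng(seed)
    # Random initialisation of the coefficients, as described above
    beta = rng.normal(size=n_features)
    b0 = 0.0
    for _ in range(n_iters):
        residuals = b0 + X @ beta - y  # predicted minus expected output
        # Gradients of MSE = mean(residuals**2) w.r.t. beta and the intercept b0
        grad_beta = 2.0 / n_samples * (X.T @ residuals)
        grad_b0 = 2.0 / n_samples * residuals.sum()
        # Step against the gradient, scaled by the learning rate
        beta -= lr * grad_beta
        b0 -= lr * grad_b0
    return b0, beta

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([2.0, -3.0])

b0, beta = fit_gradient_descent(X, y)
print(b0, beta)
```

On this data the recovered intercept and coefficients should be very close to 1 and [2, -3]. scikit-learn's `LinearRegression`, used below, solves the least-squares problem directly instead of iterating; `SGDRegressor` is its gradient-based counterpart.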

### Loss function :
In LR, we use the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) as the loss metric: the differences between the expected and predicted outputs are squared and averaged. The derivative (gradient) of this loss with respect to each $\beta_i$ is what the gradient descent algorithm uses to update the coefficients.
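
Written out for $m$ training samples and $n$ features, where $x_{ji}$ is the $i$-th feature of sample $j$, $\beta_0$ is an intercept (assumed here, not mentioned above) and $\alpha$ is the learning rate, the loss and one gradient descent update look like this (the intercept is updated the same way, with $x_{j0} = 1$):

$$
\hat{y}_j = \beta_0 + \sum_{i=1}^{n} \beta_i x_{ji}, \qquad
\text{MSE} = \frac{1}{m} \sum_{j=1}^{m} \left( \hat{y}_j - y_j \right)^2
$$

$$
\frac{\partial\, \text{MSE}}{\partial \beta_i} = \frac{2}{m} \sum_{j=1}^{m} \left( \hat{y}_j - y_j \right) x_{ji}, \qquad
\beta_i \leftarrow \beta_i - \alpha \, \frac{\partial\, \text{MSE}}{\partial \beta_i}
$$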

### Advantages :
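- Simple to implement.
- Low memory footprint: a trained model stores only one coefficient per feature (plus an intercept).
- Fast training.
- The values of the $\beta_i$ coefficients give a rough indication of feature importance, provided the features are on comparable scales.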

### Disadvantages :
- Only applicable when the relationship between the features and the output is (approximately) linear, which is often not the case in real-life problems.
- The algorithm assumes the residuals (errors) are normally distributed, which is not always satisfied.
- The algorithm assumes the input features are not strongly correlated with each other (no collinearity).

### Hyperparameters :
- Regularization parameter $(λ)$: regularization is used to avoid over-fitting the data. The higher the $(λ)$, the stronger the regularization and the more biased the solution; the lower the $(λ)$, the higher the variance of the solution. An intermediate value is usually preferable.
- Learning rate $(α)$: controls how much the $\beta_i$ values are corrected at each step of gradient descent during training. $α$ should also be moderate: too large a value can make training diverge, while too small a value makes it converge slowly.
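
To see the effect of $(λ)$ in practice, here is a sketch using scikit-learn's `Ridge`, which adds an L2 penalty (one common form of regularization; `Lasso` is the L1 variant). Note the naming clash: scikit-learn calls the regularization strength `alpha`, whereas this document uses $α$ for the learning rate. The $λ$ values below are arbitrary illustrations, not tuned:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)

lambdas = [0.01, 0.1, 1.0, 10.0]  # illustrative regularization strengths
norms = []
for lam in lambdas:
    # scikit-learn names the regularization strength `alpha`, not lambda
    model = Ridge(alpha=lam).fit(X, y)
    # L2 norm of the coefficient vector: a measure of how large the betas are
    norms.append(np.linalg.norm(model.coef_))
    print(f"lambda={lam:<5} ||beta|| = {norms[-1]:.1f}")
```

The coefficient norm shrinks as $λ$ grows, i.e. the model becomes more constrained (more bias, less variance). Plain `LinearRegression` has no learning rate because it is not trained by gradient descent; for a gradient-based model, `SGDRegressor` exposes the learning rate as `eta0` (together with `learning_rate="constant"`) and, again, uses `alpha` for the regularization strength.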

### Assumptions for LR :
- Linear relationship between the independent and dependent variables.
- The errors should be homoskedastic, meaning their variance is roughly constant across all values of the independent variables.
- Independent variables should not be collinear.

### Collinearity & Outliers :
Two features are said to be collinear when one feature can be linearly predicted from the other with substantial accuracy.
- Collinearity inflates the standard errors of the coefficients and can make some significant features appear insignificant during training. Ideally, we should measure collinearity before training and keep only one feature from each set of highly correlated features.
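
One simple way to check for collinearity before training is a pairwise correlation matrix. The sketch below uses synthetic data in which `x2` is almost a linear function of `x1`; the helper `highly_correlated_pairs` and the 0.9 threshold are illustrative assumptions. Pairwise correlations only catch two-feature relationships; the variance inflation factor (VIF) is a common check for collinearity involving several features.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x2 is nearly a linear function of x1, x3 is independent
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": 2 * x1 + rng.normal(scale=0.1, size=n),
    "x3": rng.normal(size=n),
})

def highly_correlated_pairs(df, threshold=0.9):
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    cols = corr.columns
    pairs = []
    # Only scan the upper triangle so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], round(corr.iloc[i, j], 3)))
    return pairs

pairs = highly_correlated_pairs(df)
print(pairs)  # only the (x1, x2) pair should be reported
```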

Outliers are another challenge during training. They are data points that lie far from the normal observations, and they hurt the accuracy of the model.

- Outliers inflate the error function and distort the fitted line, reducing the accuracy of linear regression. Regularization can limit their influence somewhat by not allowing the $\beta_i$ parameters to change violently, although an L1 (absolute-error) or Huber loss is a more direct remedy.
- During the exploratory data analysis phase, we should detect outliers and correct or eliminate them. A box plot can be used to identify them.
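
Box plots conventionally flag points lying more than 1.5 × the interquartile range (IQR) beyond the quartiles. The same rule can be applied numerically with pandas; the function name `iqr_outliers` and the salary data are hypothetical:

```python
import pandas as pd

def iqr_outliers(s, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], the box-plot whisker rule."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Hypothetical salaries (in thousands) with one extreme value
salaries = pd.Series([42, 45, 47, 50, 51, 53, 55, 58, 60, 400])
mask = iqr_outliers(salaries)
print(salaries[mask])
```

Here only the value 400 falls outside the whiskers and is flagged.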

```python
# Import the needed modules

import pandas as pd
import numpy as np

from sklearn import datasets
from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
```

We will use a small dataset bundled with scikit-learn. Each built-in dataset has its own `load_*` function, so you can load the one you want by name; here we import the diabetes dataset. It contains ten baseline variables (age, sex, body mass index, average blood pressure and six blood serum measurements, one of which is the blood sugar level) for 442 diabetes patients, and the target is a quantitative measure of disease progression one year after baseline.

```python
# Load the dataset
X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)

# Check the value distributions
X.describe()
```

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 |
| mean | -2.511817e-19 | 1.23079e-17 | -2.245564e-16 | -4.79757e-17 | -1.381499e-17 | 3.918434e-17 | -5.777179e-18 | -9.04254e-18 | 9.293722e-17 | 1.130318e-17 |
| std | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 | 0.04761905 |
| min | -0.1072256 | -0.04464164 | -0.0902753 | -0.1123988 | -0.1267807 | -0.1156131 | -0.1023071 | -0.0763945 | -0.1260971 | -0.1377672 |
| 25% | -0.03729927 | -0.04464164 | -0.03422907 | -0.03665608 | -0.03424784 | -0.0303584 | -0.03511716 | -0.03949338 | -0.03324559 | -0.03317903 |
| 50% | 0.00538306 | -0.04464164 | -0.007283766 | -0.005670422 | -0.004320866 | -0.003819065 | -0.006584468 | -0.002592262 | -0.001947171 | -0.001077698 |
| 75% | 0.03807591 | 0.05068012 | 0.03124802 | 0.03564379 | 0.02835801 | 0.02984439 | 0.0293115 | 0.03430886 | 0.03243232 | 0.02791705 |
| max | 0.1107267 | 0.05068012 | 0.1705552 | 0.1320436 | 0.1539137 | 0.198788 | 0.1811791 | 0.1852344 | 0.1335973 | 0.1356118 |


```python
# Check the first 5 samples of the target column

y.head()
```

```
0    151.0
1     75.0
2    141.0
3    206.0
4    135.0
Name: target, dtype: float64
```

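As we can see, all feature values are already normalized: each column is mean-centered (the means are effectively 0), has the same standard deviation, and lies in a small range (roughly -0.14 to 0.2). Scaled features are generally preferable for linear models, particularly when they are trained with gradient descent or regularized. Note that the target `y` is not scaled.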
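
The scaling can be checked directly. According to the scikit-learn documentation, each diabetes feature is mean-centered and scaled so that its sum of squares is 1; a quick sketch to confirm this:

```python
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Largest absolute column mean: should be ~0 (floating-point noise only)
print(X.mean().abs().max())
# Sum of squares of each column: should be ~1 for every feature
print((X ** 2).sum().round(6))
# The target is NOT scaled: it stays in its original units
print(y.min(), y.max())
```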

```python
# Split the dataset into train and test subsets
# (no random_state is set, so the split and the scores below change on every run)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

```python
# Create the model

model = LinearRegression()
```

```python
# Fit the model on the train subset

model.fit(X_train, y_train);
```
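
Because every feature has a fitted $\beta_i$, it can help to look at the coefficients right after fitting. The sketch below is self-contained and fixes `random_state=0` (an assumption, unlike the split above), so its numbers will differ from the scores shown in this notebook:

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
# random_state fixed here (an assumption of this sketch) for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# One fitted coefficient (beta_i) per feature, plus the intercept (beta_0)
coefs = pd.Series(model.coef_, index=X.columns)
print("intercept:", model.intercept_)
# Features ordered by the absolute size of their coefficient
print(coefs.reindex(coefs.abs().sort_values(ascending=False).index))
```

Since all diabetes features share the same scale, the absolute coefficient sizes are roughly comparable; with collinear features they should still be read with caution.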

```python
# Check the score (R^2)

model.score(X_test, y_test)
```

```
0.5153335792084508
```
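
`model.score` returns the coefficient of determination $R^2 = 1 - SS_{res} / SS_{tot}$: 1 is a perfect fit, 0 means no better than always predicting the mean of the test targets, and it can even be negative. A self-contained sketch that computes it by hand and compares it with scikit-learn (with an assumed `random_state=0`, so the value differs from the one above):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Residual sum of squares vs. total sum of squares around the mean
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# All three values should agree
print(r2_manual, r2_score(y_test, y_pred), model.score(X_test, y_test))
```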

```python
# A more understandable score: root mean squared error, in the units of the target

from sklearn.metrics import mean_squared_error

def score_rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))
```

```python
# Check the score (root mean squared error)

y_preds = model.predict(X_test)
score_rmse(y_test, y_preds)
```

```
52.34911173998902
```

The RMSE is about `52`, which means a typical prediction is off by roughly 52 units of the disease-progression score.
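
RMSE squares the errors before averaging, so it weighs large errors more heavily and is always at least as large as the mean absolute error (MAE), which is the literal average error. A self-contained sketch comparing the two (with an assumed `random_state=0`, so the values will differ from those above):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)  # the plain average absolute error
print(f"RMSE = {rmse:.1f}, MAE = {mae:.1f}")
```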