## Linear Regression
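Linear regression is a machine learning model that describes the relationship between the independent features and the dependent feature as a straight line. The model is fitted by minimizing the mean squared error (MSE) between its predictions and the true values, for example with gradient descent. scikit-learn's `LinearRegression`, used below, solves the same least-squares problem directly rather than iteratively.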

loss --> The loss is the difference between the actual output and the output predicted by the model.

loss function --> The error computed for a single data point during training, for example the squared error of that point.

cost function --> The error aggregated over the whole dataset during training, for example the MSE, which is the average of the per-point squared errors.

gradient descent --> An iterative optimization algorithm that reduces the loss/cost function by repeatedly moving the model parameters a small step in the direction of the negative gradient. For linear regression with MSE, the cost plotted against a parameter is a quadratic (bowl-shaped) curve, so gradient descent reaches its single minimum when the learning rate is small enough. Hyperparameter tuning and regularization are separate techniques that can be applied on top of it to improve the model.
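
To make these definitions concrete, here is a minimal NumPy sketch of gradient descent for a one-feature linear model. It is only an illustration: the five height/weight pairs are the rows shown by `df.head()` later in this notebook, while the function name `gradient_descent`, the learning rate, and the iteration count are assumptions chosen for the example.

```python
import numpy as np

def gradient_descent(x, y, lr=0.1, n_iters=1000):
    """Fit y ~ b0 + b1 * x by minimizing the MSE cost (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = (b0 + b1 * x) - y                # prediction minus target, per point
        grad_b0 = (2.0 / n) * error.sum()        # d(MSE)/d(b0)
        grad_b1 = (2.0 / n) * (error * x).sum()  # d(MSE)/d(b1)
        b0 -= lr * grad_b0                       # step against the gradient
        b1 -= lr * grad_b1
    return b0, b1

# First five rows of the dataset, as displayed by df.head()
height = np.array([120, 135, 123, 145, 160], dtype=float)
weight = np.array([45, 58, 48, 60, 70], dtype=float)

# Standardize the feature so one learning rate suits both parameters
x = (height - height.mean()) / height.std()

b0, b1 = gradient_descent(x, weight)

# Loss: squared error of each single data point
per_point_loss = (weight - (b0 + b1 * x)) ** 2
# Cost: the same error averaged over the whole dataset (MSE)
cost = per_point_loss.mean()

# Reference least-squares fit; polyfit returns [slope, intercept]
slope_ref, intercept_ref = np.polyfit(x, weight, deg=1)

print("gradient descent:", b0, b1, "cost:", cost)
print("least squares:   ", intercept_ref, slope_ref)
```

With a standardized feature and this learning rate, the two fits should agree closely. A learning rate that is too large would make the updates diverge instead.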


## Problem statement
We implement a simple linear regression model on a small, self-created dataset of heights and weights. The model is trained on heights and predicts the corresponding weight for each height.

#### Importing necessary libraries for operation

In [14]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import Ridge, Lasso, ElasticNet, LassoCV

%matplotlib inline

In [15]:
df = pd.read_csv('height-weight.csv')
df.head()

Unnamed: 0,Weight,Height
0,45,120
1,58,135
2,48,123
3,60,145
4,70,160


## Data preprocessing steps:
1) data cleaning --> This is done to clean noisy data, resolve inconsistent data, and handle missing values and outliers.
Outliers == Unusual data points that do not follow the general pattern of the rest of the dataset.

2) data integration --> This combines data brought in from different sources into one consistent dataset.

3) data selection --> This selects the data we want to work on and search for hidden patterns.

4) data transformation --> This converts the values into a form suited for modeling, for example scaling them into a particular range so that model training is more efficient.

5) data reduction --> This removes less important or highly correlated attributes.

What exactly does high correlation mean? Correlation measures how strongly two attributes are linearly related. The correlation coefficient ranges from -1 to 1: a value near 1 means the attributes are strongly positively correlated (they increase together), a value near -1 means they are strongly negatively correlated (one increases as the other decreases), and a value near 0 means there is little or no linear relationship.
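
As a small illustration of the correlation coefficient, the sketch below computes Pearson's r with `np.corrcoef`. The height/weight values are the five rows shown by `df.head()`; the negated and squared series are made up purely to show the negative and near-zero cases.

```python
import numpy as np

# First five rows of the dataset, as displayed by df.head()
height = np.array([120, 135, 123, 145, 160], dtype=float)
weight = np.array([45, 58, 48, 60, 70], dtype=float)

# Strong positive correlation: weight rises with height
r_pos = np.corrcoef(height, weight)[0, 1]

# Flipping the sign of one variable flips the sign of r
r_neg = np.corrcoef(height, -weight)[0, 1]

# Made-up example with no linear relationship: y = x^2 on a symmetric range
x = np.array([-2, -1, 0, 1, 2], dtype=float)
r_zero = np.corrcoef(x, x ** 2)[0, 1]

print("positive:", r_pos)
print("negative:", r_neg)
print("no linear relation:", r_zero)
```

The last case shows that a correlation near 0 only rules out a linear relationship: here `y` is fully determined by `x`, yet r is zero.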


## Current context of the working data

In our dataset there are no outliers, redundant records, inconsistent values, or missing values, so the data cleaning step is already done.

The data we created is already integrated, so the integration step is done. It is also the only data we work with, so the selection step is done.

Since we have only two attributes, we will not drop either of them, so we skip the data reduction step.
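
The statement that no cleaning is needed can be checked rather than assumed. The sketch below runs three quick checks with pandas on the five rows shown by `df.head()`; the 1.5 x IQR rule for outliers is one common convention picked for this example, not something the notebook prescribes, and on the real data the same calls would be made on the full `df`.

```python
import pandas as pd

# First five rows of the dataset, as displayed by df.head()
df_head = pd.DataFrame({
    "Weight": [45, 58, 48, 60, 70],
    "Height": [120, 135, 123, 145, 160],
})

# Missing values and fully duplicated rows
n_missing = int(df_head.isnull().sum().sum())
n_duplicates = int(df_head.duplicated().sum())

# Outliers by the 1.5 * IQR rule (assumed convention), column by column
q1 = df_head.quantile(0.25)
q3 = df_head.quantile(0.75)
iqr = q3 - q1
outlier_mask = (df_head < q1 - 1.5 * iqr) | (df_head > q3 + 1.5 * iqr)
n_outliers = int(outlier_mask.sum().sum())

print("missing:", n_missing, "duplicates:", n_duplicates, "outliers:", n_outliers)
```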

## Splitting data
We split the dataset into a training set and a test set using `train_test_split` with an 80-20 split: 80% of the data is used to train the model and 20% is used to test it.

In [16]:
train_df, test_df = train_test_split(df,test_size=0.2,random_state=42)
train_df.shape, test_df.shape

((18, 2), (5, 2))

## Scaling the dataset
Here we use standard scaling (`StandardScaler`) to scale the feature. We also separate the independent and dependent variables.

independent variables --> The input features the model uses to make predictions (here, `Height`).

dependent variables --> The target values the model learns to predict, also called labels (here, `Weight`).

In [17]:
scaler = StandardScaler()
X_train = scaler.fit_transform(train_df['Height'].values.reshape(-1, 1))
x_test = scaler.transform(test_df['Height'].values.reshape(-1, 1))
Y_train = train_df['Weight'].values
y_test = test_df['Weight'].values
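
To show what `StandardScaler` computes, the sketch below reproduces it by hand: subtract the training mean and divide by the training standard deviation. The heights come from `df.head()`, but splitting them four-to-one is an arbitrary choice for illustration, not the notebook's actual split.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative split of the five heights from df.head() (assumed, not the real split)
train_height = np.array([[120.0], [135.0], [123.0], [145.0]])
test_height = np.array([[160.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_height)  # learns mean and std from train only
test_scaled = scaler.transform(test_height)        # reuses the training statistics

# Manual equivalent: z = (x - mean) / std, with the population std (ddof=0)
mean = train_height.mean(axis=0)
std = train_height.std(axis=0)
manual_train = (train_height - mean) / std
manual_test = (test_height - mean) / std

print(np.allclose(train_scaled, manual_train), np.allclose(test_scaled, manual_test))
```

Fitting the scaler on the training set only, as the cell above does, keeps information about the test set out of training.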

## Model training using hyperparameter tuning
Basic method --> Here we use the linear regression model for training. While training the model we give it the `X_train` and `Y_train` data.

Hyperparameter method --> Two common hyperparameter search techniques are GridSearchCV and RandomizedSearchCV. GridSearchCV tries every combination in the grid, while RandomizedSearchCV samples a fixed number of combinations, which is cheaper for large search spaces or bigger datasets. For a small grid like ours, GridSearchCV is fine.

In [10]:
params = {
    'fit_intercept': [True, False],
    'copy_X': [True, False]
}

model = LinearRegression()
grid = GridSearchCV(estimator = model, param_grid = params, scoring = 'neg_mean_squared_error', cv = 5, verbose = 1, n_jobs = -1)
grid.fit(X_train, Y_train)
print("Best parameters found: ", grid.best_params_)


Fitting 5 folds for each of 4 candidates, totalling 20 fits
Best parameters found:  {'copy_X': True, 'fit_intercept': True}
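
For comparison, here is a hedged sketch of the RandomizedSearchCV alternative. `LinearRegression` has few parameters worth sampling, so this example swaps in `Ridge` and samples its `alpha`; the synthetic data, the `loguniform` range, and `n_iter=10` are all assumptions made for the illustration.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic height/weight-like data (made up for this sketch)
rng = np.random.default_rng(0)
X = rng.uniform(120, 180, size=(40, 1))
y = 0.9 * X[:, 0] - 60 + rng.normal(0, 5, size=40)

search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": loguniform(1e-3, 1e2)},  # sample alpha on a log scale
    n_iter=10,                         # number of sampled candidates
    scoring="neg_mean_squared_error",  # higher is better, so the MSE is negated
    cv=5,
    random_state=42,
)
search.fit(X, y)

print("Best alpha:", search.best_params_["alpha"])
print("Cross-validated MSE:", -search.best_score_)
```

Because the scoring is `neg_mean_squared_error`, `best_score_` is negative or zero; negating it gives the cross-validated MSE. The same applies to `grid.best_score_` in the cell above.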


In [12]:
grid

## Model prediction
To make predictions we pass the scaled test features `x_test` to the fitted model.

In [11]:
y_pred = grid.predict(x_test)
y_pred


array([92.36074777, 85.86942521, 39.50283548, 96.99740674, 75.66877547])

## Model Evaluation
Model evaluation measures the model's performance by comparing its predictions with the true `y_test` values. Common metrics are MSE (Mean Squared Error), MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and the R2 score.

In [13]:
mse = mean_squared_error(y_test,y_pred)
mae = mean_absolute_error(y_test,y_pred)
r2 = r2_score(y_test,y_pred)

mse, mae, r2

(83.23803021031198, 8.078703941181505, 0.6981504561563969)
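
The cell above reports MSE, MAE, and R2 but not RMSE. The sketch below computes all four by hand with NumPy to show what each metric measures. The target and prediction values are made-up numbers for illustration, not the notebook's test set.

```python
import numpy as np

# Made-up targets and predictions, for illustration only
y_true = np.array([50.0, 60.0, 70.0, 80.0])
y_hat = np.array([52.0, 58.0, 73.0, 79.0])

residual = y_true - y_hat

mse = np.mean(residual ** 2)     # average squared error
mae = np.mean(np.abs(residual))  # average absolute error
rmse = np.sqrt(mse)              # same units as the target

# R2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residual ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse, "MAE:", mae, "RMSE:", rmse, "R2:", r2)
```

Applied to the MSE reported above (about 83.24), the RMSE is about 9.12, expressed in the same units as the weight.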

## Regularization techniques
Apart from hyperparameter tuning, we can apply regularization. Some regularized linear models are Ridge, Lasso, LassoCV, and ElasticNet.

Ridge --> Also known as L2 regularization. It adds a penalty of alpha times the sum of the squared coefficients (beta^2) to the squared-error loss, which shrinks the coefficients toward zero.

Lasso --> Also known as L1 regularization. It adds a penalty of alpha times the sum of the absolute values of the coefficients (|beta|) to the squared-error loss, which can shrink some coefficients exactly to zero.

LassoCV --> Lasso in which alpha is selected by cross-validation.

ElasticNet --> A combination of the Ridge (L2) and Lasso (L1) penalties, with the mix controlled by `l1_ratio`.
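
With a single feature, the effect of the two penalties can be worked out by hand. The sketch below does so on the five rows from `df.head()` and compares the results with scikit-learn. The alpha values are arbitrary choices for illustration, and the formulas follow scikit-learn's objectives as I understand them: Ridge penalizes the unscaled sum of squared errors, while Lasso divides the squared error by 2n.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# First five rows of the dataset, as displayed by df.head()
x = np.array([120, 135, 123, 145, 160], dtype=float)
y = np.array([45, 58, 48, 60, 70], dtype=float)
n = len(x)

# Centering handles the intercept, which is not penalized
xc, yc = x - x.mean(), y - y.mean()
sxy, sxx = np.sum(xc * yc), np.sum(xc ** 2)

beta_ols = sxy / sxx  # unregularized slope

# Ridge: the L2 penalty adds alpha to the denominator
alpha_ridge = 100.0   # assumed value
beta_ridge = sxy / (sxx + alpha_ridge)

# Lasso: the L1 penalty soft-thresholds the numerator
alpha_lasso = 10.0    # assumed value
rho = sxy / n
beta_lasso = np.sign(rho) * max(abs(rho) - alpha_lasso, 0.0) / (sxx / n)

# Compare with scikit-learn's fitted slopes
X = x.reshape(-1, 1)
ridge_sklearn = Ridge(alpha=alpha_ridge).fit(X, y).coef_[0]
lasso_sklearn = Lasso(alpha=alpha_lasso).fit(X, y).coef_[0]

print("OLS:", beta_ols)
print("Ridge:", beta_ridge, ridge_sklearn)
print("Lasso:", beta_lasso, lasso_sklearn)
```

Both penalized slopes come out smaller in magnitude than the unregularized one. A large enough Lasso alpha would set the slope exactly to zero, which Ridge never does.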

In [43]:
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, Y_train)
ridge_pred = ridge.predict(x_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)
ridge_mae = mean_absolute_error(y_test, ridge_pred)
ridge_r2 = r2_score(y_test, ridge_pred)
print('Ridge Regression Results:')
print('MSE:', ridge_mse, 'MAE:', ridge_mae, 'R2:', ridge_r2)

print('=====================================================================')
print('\n')

lasso = Lasso(alpha=0.1)
lasso.fit(X_train, Y_train)
lasso_pred = lasso.predict(x_test)
lasso_mse = mean_squared_error(y_test, lasso_pred)
lasso_mae = mean_absolute_error(y_test, lasso_pred)
lasso_r2 = r2_score(y_test, lasso_pred)
print('Lasso Regression Results:')
print('MSE:', lasso_mse, 'MAE:', lasso_mae, 'R2:', lasso_r2)

print('=====================================================================')
print('\n')

lasso_cv = LassoCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5)
lasso_cv.fit(X_train, Y_train)
lasso_cv_pred = lasso_cv.predict(x_test)
lasso_cv_mse = mean_squared_error(y_test, lasso_cv_pred)    
lasso_cv_mae = mean_absolute_error(y_test, lasso_cv_pred)
lasso_cv_r2 = r2_score(y_test, lasso_cv_pred)
print('Lasso CV Results:')
print('MSE:', lasso_cv_mse, 'MAE:', lasso_cv_mae, 'R2:', lasso_cv_r2)

print('=====================================================================')
print('\n')

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X_train, Y_train)
elastic_net_pred = elastic_net.predict(x_test)
elastic_net_mse = mean_squared_error(y_test, elastic_net_pred)
elastic_net_mae = mean_absolute_error(y_test, elastic_net_pred)
elastic_net_r2 = r2_score(y_test, elastic_net_pred)
print('Elastic Net Results:')
print('MSE:', elastic_net_mse, 'MAE:', elastic_net_mae, 'R2:', elastic_net_r2)



Ridge Regression Results:
MSE: 71.77088406759333 MAE: 7.146491453049137 R2: 0.7397342469263369


Lasso Regression Results:
MSE: 81.81243314675805 MAE: 7.972742618046524 R2: 0.7033201583015737


Lasso CV Results:
MSE: 81.81243314675805 MAE: 7.972742618046524 R2: 0.7033201583015737


Elastic Net Results:
MSE: 72.1945095015539 MAE: 7.1848158218546185 R2: 0.7381980363303093
