
# Learning Objective

### Multiple Linear Regression on Diabetes Dataset

#### Description
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

The dataset consists of several medical predictor variables and one target variable, Outcome.

* Pregnancies: Number of times pregnant
* Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
* BloodPressure: Diastolic blood pressure (mm Hg)
* SkinThickness: Triceps skin fold thickness (mm)
* Insulin: 2-Hour serum insulin (mu U/ml)
* BMI: Body mass index (weight in kg/(height in m)^2)
* DiabetesPedigreeFunction: Diabetes pedigree function
* Age: Age (years)
* Outcome: Class variable (0 or 1)

In [None]:
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error,mean_absolute_error,r2_score
import pandas as pd
import numpy as np

In [None]:
# Loading the diabetes dataset
diabetes = pd.read_csv("diabetes.csv")

diabetes.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [None]:
diabetes_X = diabetes[["BloodPressure","Age"]]
diabetes_y = diabetes["Glucose"]

print(diabetes_X.shape, diabetes_y.shape)

(768, 2) (768,)


To fit the linear regression model on one portion of the data and evaluate its predictions on data it has not seen, we split the dataset into a training set and a test set.
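As a minimal sketch of what the split does, the example below uses a synthetic stand-in with the same number of rows as the diabetes data (768). The names `demo_X`, `demo_y`, `Xtr`, `Xte`, `ytr`, `yte` and the random values are assumptions made for illustration only. Passing `random_state` is optional; it simply makes the shuffle repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 768 rows and two feature columns (illustrative values only).
rng = np.random.default_rng(0)
demo_X = pd.DataFrame({
    "BloodPressure": rng.integers(40, 110, size=768),
    "Age": rng.integers(21, 80, size=768),
})
demo_y = pd.Series(rng.integers(60, 200, size=768), name="Glucose")

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
Xtr, Xte, ytr, yte = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=0
)

print(Xtr.shape, Xte.shape, ytr.shape, yte.shape)
```

With 768 rows and `test_size=0.2`, the held-out set has 154 rows and the training set has 614.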

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(diabetes_X, diabetes_y, test_size=0.2)

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((614, 2), (154, 2), (614,), (154,))

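There are a few ways to find the best-fit line (with two predictors, a best-fit plane). One approach is the Ordinary Least Squares (OLS) method, an intuitive mathematical method that chooses the coefficients minimizing the sum of squared differences between the observed and predicted values.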
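To make the OLS idea concrete, here is a small sketch on synthetic data (the arrays `X`, `y` and the true coefficients 3.0, 1.5, -2.0 are made up for illustration). It solves the least-squares problem directly with `np.linalg.lstsq` and compares the result with scikit-learn's `LinearRegression`, which should give matching values up to numerical precision.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends linearly on two features plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the first coefficient plays the role of the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: the beta minimizing ||A @ beta - y||^2.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# The same fit through scikit-learn.
model = LinearRegression().fit(X, y)

print("lstsq   intercept, coefs:", beta[0], beta[1:])
print("sklearn intercept, coefs:", model.intercept_, model.coef_)
```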


In [None]:
# Create a linear regression object
regr = linear_model.LinearRegression()

# Training the model using the training sets
regr.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = regr.predict(X_test)

In [None]:
# The coefficients
print('Coefficients: ', regr.coef_)

# The Intercept
print('Intercept: ', regr.intercept_)

Coefficients:  [0.1498018  0.64459259]
Intercept:  90.07204608831839
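The fitted model is just the intercept plus a weighted sum of the features, with the coefficients in the same order as the feature columns (`BloodPressure`, then `Age`). As a quick sketch, the prediction for a single hypothetical patient (blood pressure 72 mm Hg, age 50; these input values are chosen only for illustration) can be reproduced by hand from the numbers printed above:

```python
import numpy as np

# Values copied from the printed output above.
coef = np.array([0.1498018, 0.64459259])  # [BloodPressure, Age]
intercept = 90.07204608831839

# Hypothetical patient: diastolic blood pressure 72 mm Hg, age 50 years.
x = np.array([72.0, 50.0])

# Glucose estimate = intercept + coef[0] * BloodPressure + coef[1] * Age
predicted_glucose = intercept + coef @ x
print(round(predicted_glucose, 2))  # 133.09
```

This is the same arithmetic that `regr.predict` performs for each row of the test set.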


In [None]:
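# scikit-learn metrics take (y_true, y_pred); the order matters for r2_score
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Linear Regression Performance Metrics:\n\tMean squared error:\t{}\n\tRoot Mean squared error:{}\n\tMean absolute error:\t{}\n\tR2-score:\t\t{}".format(mse,rmse,mae,r2))

Linear Regression Performance Metrics:
	Mean squared error:	726.5272501431301
	Root Mean squared error:26.9541694389408
	Mean absolute error:	20.72914248757465
	R2-score:		-7.819525934539019

Note that the R2-score recorded above was produced by calling `r2_score(y_pred, y_test)`, with the arguments reversed. MSE, RMSE and MAE are symmetric in their arguments and are unaffected, but R2 is not, so that value should be regenerated with the call shown in the cell.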
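scikit-learn documents its regression metrics as taking `(y_true, y_pred)`. The sketch below, on made-up numbers (`y_true` and `y_hat` are illustrative, not taken from the dataset), computes the metrics by hand and shows how the argument order changes the R2-score when the predictions vary much less than the targets, which is typical of a weak linear fit.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Made-up targets and low-variance predictions (illustration only).
y_true = np.array([80.0, 100.0, 120.0, 140.0, 160.0])
y_hat = np.array([115.0, 118.0, 120.0, 122.0, 125.0])

# Metrics written out from their definitions.
mse_manual = np.mean((y_true - y_hat) ** 2)
mae_manual = np.mean(np.abs(y_true - y_hat))
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# MSE and MAE give the same value in either argument order.
mse_swapped = mean_squared_error(y_hat, y_true)
mae_swapped = mean_absolute_error(y_hat, y_true)

# R2 does not: the denominator is the variance of the FIRST argument.
r2_correct = r2_score(y_true, y_hat)
r2_swapped = r2_score(y_hat, y_true)

print("MSE:", mse_manual, mse_swapped)
print("MAE:", mae_manual, mae_swapped)
print("R2 (y_true, y_pred):", r2_correct)
print("R2 (y_pred, y_true):", r2_swapped)
```

In this toy case the correctly ordered call gives a small positive R2, while the swapped call gives a large negative one, because the residuals are divided by the much smaller spread of the predictions.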


## Elastic Net Regression

In [None]:
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

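# Candidate hyperparameter values for the grid search
alpha = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100]  # overall regularization strength
max_iter = [1000, 10000]                        # cap on coordinate-descent iterations
l1_ratio = np.arange(0.0, 1.0, 0.1)             # L1/L2 mix: 0.0 to 0.9 (stop value 1.0 is excluded)
tol = [0.5]                                     # convergence tolerance, much looser than the default 1e-4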

elasticnet_gscv = GridSearchCV(estimator=ElasticNet(),
                                param_grid={'alpha': alpha,
                                            'max_iter': max_iter,
                                            'l1_ratio': l1_ratio,
                                            'tol':tol},
                                scoring='r2',
                                cv=5)

In [None]:
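# Run the grid search; fit() returns the fitted search object, kept as `en` for the cells below
en = elasticnet_gscv.fit(X_train, y_train)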

In [None]:
en_y_pred=en.predict(X_test)
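Before scoring the predictions, it can help to check which hyperparameters the grid search selected. The sketch below runs a smaller grid on synthetic data (the arrays `X_demo`, `y_demo`, the reduced `param_grid`, and the name `search` are assumptions for illustration) and reads the standard `best_params_`, `best_score_` and `best_estimator_` attributes of a fitted `GridSearchCV`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem with two features (illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 5.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# A reduced grid, just to keep the example quick.
param_grid = {
    "alpha": [0.001, 0.01, 0.1, 1, 10],
    "l1_ratio": [0.1, 0.5, 0.9],
}

search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, scoring="r2", cv=5)
search.fit(X_demo, y_demo)

# Best combination found, its mean cross-validated R2, and the refitted model.
print("Best parameters:", search.best_params_)
print("Best CV R2:", search.best_score_)
print("Coefficients:", search.best_estimator_.coef_)
print("Intercept:", search.best_estimator_.intercept_)
```

Calling `predict` on the fitted search object uses `best_estimator_`, which is refit on the full training data by default.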

In [None]:
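# scikit-learn metrics take (y_true, y_pred); the order matters for r2_score
en_mse = mean_squared_error(y_test, en_y_pred)
en_rmse = np.sqrt(en_mse)
en_mae = mean_absolute_error(y_test, en_y_pred)
en_r2 = r2_score(y_test, en_y_pred)
print("ElasticNet Performance Metrics:\n\tMean squared error:\t{}\n\tRoot Mean squared error:{}\n\tMean absolute error:\t{}\n\tR2-score:\t\t{}".format(en_mse,en_rmse,en_mae,en_r2))

ElasticNet Performance Metrics:
	Mean squared error:	726.675685557829
	Root Mean squared error:26.956922776122443
	Mean absolute error:	20.74043514382057
	R2-score:		-7.966821504030607

Note that the R2-score recorded above was produced by calling `r2_score(en_y_pred, y_test)`, with the arguments reversed. MSE, RMSE and MAE are symmetric in their arguments and are unaffected, but R2 is not, so that value should be regenerated with the call shown in the cell.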
