# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint

## Learning Objective 

At the end of the experiment, you will be able to understand:

* Linear Regression
* Diabetes dataset

## Dataset

### Description

The dataset chosen for this experiment is the diabetes dataset that ships with scikit-learn. It contains 442 records of diabetes patients. Each record has 10 numeric predictive features (listed below) and one target value: a quantitative measure of disease progression one year after baseline.


#### Attribute Information

* Age
* Sex
* Body mass index (bmi)
* Average blood pressure (bp)
* S1
* S2
* S3
* S4
* S5
* S6


**Note:** Each of these 10 feature variables has been mean centered and scaled by the standard deviation times the square root of n_samples (i.e. the sum of squares of each column totals 1).
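
The scaling described in the note can be checked directly. The following is a minimal sketch, assuming the default (already scaled) output of `load_diabetes`; the names `col_means` and `col_sum_sq` are introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets

# Feature matrix of the diabetes dataset: one row per patient, one column per feature
X = datasets.load_diabetes().data

# Mean of each column (expected to be approximately 0 after mean centering)
col_means = X.mean(axis=0)

# Sum of squares of each column (expected to be approximately 1 after scaling)
col_sum_sq = (X ** 2).sum(axis=0)

print("Shape:", X.shape)
print("Columns are mean centered:", np.allclose(col_means, 0.0, atol=1e-6))
print("Column sums of squares are 1:", np.allclose(col_sum_sq, 1.0, atol=1e-6))
```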

#### Expected time to complete this experiment: 60 min

### Setup Steps

#### Importing required Packages 

In [0]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

In [0]:
# Load the diabetes dataset from sklearn datasets package
diabetes = datasets.load_diabetes()


# Use only one feature: column index 2 (body mass index), kept as a 2-D array
diabetes_X = diabetes.data[:, np.newaxis, 2]
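
The indexing expression above selects a single column while keeping a two-dimensional shape, which is what scikit-learn estimators expect for their input. A small sketch of what it produces; the name `same_feature` is introduced here only for the comparison.

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()

# Select column 2 and insert a new axis so the result is 2-D: (n_samples, 1)
diabetes_X = diabetes.data[:, np.newaxis, 2]

# An equivalent way to write the same selection using a list of column indices
same_feature = diabetes.data[:, [2]]

print("Selected feature:", diabetes.feature_names[2])
print("Shape:", diabetes_X.shape)
print("Both selections match:", np.array_equal(diabetes_X, same_feature))
```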

To learn the linear regression model from the training data and predict values for the test data, we first perform a train-test split.

In [0]:
from sklearn.model_selection import train_test_split

In [0]:
# Hold out 33% of the records for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    diabetes_X, diabetes.target, test_size=0.33, random_state=42)
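
To see what the split produces, the sizes of the resulting arrays can be inspected. This is a self-contained sketch that repeats the loading and splitting steps with the same parameters as above.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Same split parameters as in the experiment
X_train, X_test, y_train, y_test = train_test_split(
    diabetes_X, diabetes.target, test_size=0.33, random_state=42)

# The training and test sets together cover every record exactly once
print("Training records:", X_train.shape[0])
print("Test records:", X_test.shape[0])
print("Total records:", X_train.shape[0] + X_test.shape[0])
```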

There are a few ways to find the best-fit line. One approach is the Ordinary Least Squares (OLS) method, an intuitive mathematical method that chooses the line minimizing the sum of squared differences between the observed and predicted target values.

Later you will learn another approach called 'Gradient Descent'. The following article explains both approaches:
https://towardsdatascience.com/linear-regression-simplified-ordinary-least-square-vs-gradient-descent-48145de2cf76
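
For a single feature, the OLS line has a simple closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below computes it by hand with NumPy, assuming the same feature and split used in this experiment; the names `slope` and `intercept` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
x = diabetes.data[:, 2]  # body mass index as a 1-D array

x_train, x_test, y_train, y_test = train_test_split(
    x, diabetes.target, test_size=0.33, random_state=42)

# Closed-form OLS for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = x_train.mean(), y_train.mean()
slope = np.sum((x_train - x_mean) * (y_train - y_mean)) / np.sum((x_train - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print("Slope: %.2f" % slope)
print("Intercept: %.2f" % intercept)
```

A `LinearRegression` model fitted on the same training data should report matching values in `coef_` and `intercept_`.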

In [0]:

# Create a linear regression object
regr = linear_model.LinearRegression()

# Training the model using the training sets
regr.fit(X_train, y_train)


In [0]:
# Make predictions using the testing set
diabetes_y_pred = regr.predict(X_test)

In [0]:
# The coefficients
print('Coefficients: \n', regr.coef_)
# Calculating the mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, diabetes_y_pred))
# Coefficient of determination (R^2): 1 is perfect prediction
print('R^2 score: %.2f' % r2_score(y_test, diabetes_y_pred))
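
To make the two metrics printed above less of a black box, they can be recomputed from their definitions. This is a minimal sketch that repeats the experiment end to end; `mse_manual` and `r2_manual` are names introduced here for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
X_train, X_test, y_train, y_test = train_test_split(
    diabetes_X, diabetes.target, test_size=0.33, random_state=42)

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
diabetes_y_pred = regr.predict(X_test)

# Mean squared error: the average of the squared residuals
residuals = y_test - diabetes_y_pred
mse_manual = np.mean(residuals ** 2)

# R^2: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print("MSE matches sklearn:", np.isclose(mse_manual, mean_squared_error(y_test, diabetes_y_pred)))
print("R^2 matches sklearn:", np.isclose(r2_manual, r2_score(y_test, diabetes_y_pred)))
```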



In [0]:
# Plotting the test data
plt.scatter(X_test, y_test, color='black')

# Plotting the predicted line
plt.plot(X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.show()