# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint
## Not for grading

## Learning Objective 

At the end of the experiment, you will be able to understand:

*  how to implement linear regression for multiple variables using Scikit-Learn

## Dataset

#### Description
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

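The dataset consists of several medical predictor variables and one target variable, Outcome. In this experiment, Glucose is used as the regression target, with BloodPressure and Age as the predictors.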

* Pregnancies: Number of times pregnant
* Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
* BloodPressure: Diastolic blood pressure (mm Hg)
* SkinThickness: Triceps skin fold thickness (mm)
* Insulin: 2-Hour serum insulin (mu U/ml)
* BMI: Body mass index (weight in kg/(height in m)^2)
* DiabetesPedigreeFunction: Diabetes pedigree function
* Age: Age (years)
* Outcome: Class variable (0 or 1)

In [None]:
!wget https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/diabetes.csv

Importing required Packages 

In [None]:
from sklearn import linear_model   #LinearRegression
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

In [None]:
!pip install -q gradio


Load the Dataset

In [None]:
# Loading the diabetes dataset
diabetes = pd.read_csv("diabetes.csv")

diabetes.head()

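| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |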


In [None]:
diabetes_X = diabetes[["BloodPressure", "Age"]]  # Features; the iloc equivalent is diabetes.iloc[:, [2, 7]]
diabetes_y = diabetes["Glucose"]                # Target; the iloc equivalent is diabetes.iloc[:, 1]

print(diabetes_X.shape, diabetes_y.shape)

(768, 2) (768,)
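The comments in the cell above mention that the same columns can be selected by position with `iloc`. A minimal sketch of that equivalence follows, using only the first three rows shown by `diabetes.head()`. The frame `sample` and the other variable names are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical three-row frame with the same column order as diabetes.csv
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
rows = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
]
sample = pd.DataFrame(rows, columns=columns)

# Label-based selection, as used in the experiment
X_by_name = sample[["BloodPressure", "Age"]]
y_by_name = sample["Glucose"]

# Position-based selection: Glucose is column 1, BloodPressure is 2, Age is 7
X_by_pos = sample.iloc[:, [2, 7]]
y_by_pos = sample.iloc[:, 1]

# Both approaches return the same data
print(X_by_name.equals(X_by_pos), y_by_name.equals(y_by_pos))
```

Selecting by name is usually easier to read and keeps working if the column order of the file changes.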


To learn the linear regression model from the training data and predict values for the test data, we first perform a train-test split.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(diabetes_X, diabetes_y, test_size=0.2)

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((614, 2), (154, 2), (614,), (154,))
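Because `train_test_split` shuffles the rows randomly, the coefficients and errors reported in this experiment can change from run to run. The sketch below shows one way to make the split repeatable by passing a `random_state`. The arrays are hypothetical stand-ins with the same shapes as the diabetes data, and the seed value 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 768 rows and 2 features, matching the shapes in this experiment
X = np.arange(768 * 2).reshape(768, 2)
y = np.arange(768)

# random_state=42 is an arbitrary seed chosen for illustration
split_a = train_test_split(X, y, test_size=0.2, random_state=42)
split_b = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_test, y_train, y_test = split_a
print(X_train.shape, X_test.shape)  # (614, 2) (154, 2)

# The same seed reproduces the same test rows
print(np.array_equal(split_a[1], split_b[1]))
```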

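There are several ways to find the best-fit line (with two features, a best-fit plane). One approach is the Ordinary Least Squares (OLS) method, an intuitive mathematical method that minimizes the sum of squared differences between the actual and predicted values.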

**Note:** Refer to the following [link](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) for Linear Regression from sklearn.
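To make the OLS idea concrete, here is a minimal sketch that solves a least-squares problem directly with NumPy and compares the result with `LinearRegression`. The data is hypothetical and noise-free, generated from y = 3*x1 - 2*x2 + 5, so both approaches should recover those values up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

# OLS by hand: append a column of ones for the intercept, then
# solve the least-squares problem min ||A w - y||^2
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# The same fit through scikit-learn
model = LinearRegression().fit(X, y)

print(w)                              # approximately [ 3. -2.  5.]
print(model.coef_, model.intercept_)  # approximately [ 3. -2.] 5.0
```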


In [None]:
# Create a linear regression object
regr = linear_model.LinearRegression()

# Training the model using the training sets
regr.fit(X_train, y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(X_test)   #y^

In [None]:
# For retrieving the slope use 'regr.coef_'. As we have taken two variables, we will get two coefficients m1 and m2
print('Coefficients: ', regr.coef_)

# To retrieve the intercept
print('Intercept: ', regr.intercept_)

# Calculating the root mean squared error
print("Root mean squared error: %.2f" % np.sqrt(mean_squared_error(y_test, diabetes_y_pred)))

Coefficients:  [0.16307013 0.58444931]
Intercept:  89.94430557780285
Root mean squared error: 31.60
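The root mean squared error is the square root of the average squared difference between actual and predicted values, so it is expressed in the same units as the target (here, glucose concentration). A small sketch with hypothetical numbers, chosen so the arithmetic is easy to follow:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values; every prediction is off by 10
y_true = np.array([100.0, 120.0, 140.0, 160.0])
y_hat = np.array([110.0, 110.0, 150.0, 150.0])

# RMSE = sqrt(mean((y - y_hat)^2))
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))

print(rmse_manual, rmse_sklearn)  # 10.0 10.0
```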


In [None]:
print(regr.score(X_test, y_test))

0.028635023658051484
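For a regressor, `score` returns the coefficient of determination R², so the value above suggests that BloodPressure and Age explain only about 3% of the variance in Glucose on this test split. The sketch below computes R² by hand on hypothetical numbers and checks it against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values
y_true = np.array([100.0, 120.0, 140.0, 160.0])
y_hat = np.array([110.0, 110.0, 150.0, 150.0])

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)          # 400.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 2000.0
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))  # both approximately 0.8
```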


In [None]:
# Passing a DataFrame with the training column names avoids the feature-name warning
ypred = regr.predict(pd.DataFrame([[72, 50]], columns=["BloodPressure", "Age"]))
print(ypred)

[130.90782034]
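The prediction for BloodPressure 72 and Age 50 can be reproduced by hand from the printed coefficients and intercept. This sketch copies those rounded values from the output earlier in the experiment; they depend on the random train-test split, so a fresh run will give different numbers.

```python
import numpy as np

# Values copied from the printed output; they vary with the train-test split
coef = np.array([0.16307013, 0.58444931])
intercept = 89.94430557780285

features = np.array([72, 50])  # BloodPressure, Age

# Prediction = m1 * BloodPressure + m2 * Age + c
manual_prediction = float(coef @ features + intercept)
print(round(manual_prediction, 4))  # 130.9078
```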


In [None]:
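def pred_value(bp, age):
  # Gradio textboxes pass strings, so convert the inputs to numbers first
  features = pd.DataFrame([[float(bp), float(age)]], columns=["BloodPressure", "Age"])
  p = regr.predict(features)
  return {'Glucose is ': p[0]}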

In [None]:
import gradio as gr
bp = gr.inputs.Textbox(placeholder = 'Enter BP')
age = gr.inputs.Textbox(placeholder = 'Enter age')
iface = gr.Interface(fn=pred_value,inputs = [bp,age],outputs='text')
iface.launch()

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Running on public URL: https://16443.gradio.app

This share link will expire in 72 hours. To get longer links, send an email to: support@gradio.app


(<Flask 'gradio.networking'>,
 'http://127.0.0.1:7861/',
 'https://16443.gradio.app')

In [None]:
df1 = pd.DataFrame({'Actual Values': y_test, 'Predicted Values': diabetes_y_pred})
df1

| | Actual Values | Predicted Values |
|---|---|---|
| 666 | 145 | 144.227508 |
| 127 | 118 | 112.844707 |
| 663 | 145 | 126.367888 |
| 668 | 98 | 124.533693 |
| 118 | 97 | 112.586398 |
| ... | ... | ... |
| 438 | 97 | 113.632650 |
| 27 | 97 | 113.564819 |
| 164 | 131 | 122.996855 |
| 103 | 81 | 115.712138 |
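A table of actual and predicted values is easier to interpret with a residual column. The sketch below rebuilds the first five rows of the table above and adds the difference between actual and predicted values. The frame name `df` and the `Residual` column are additions made for this illustration.

```python
import pandas as pd

# First five rows of the actual-versus-predicted table
df = pd.DataFrame(
    {
        "Actual Values": [145, 118, 145, 98, 97],
        "Predicted Values": [144.227508, 112.844707, 126.367888,
                             124.533693, 112.586398],
    },
    index=[666, 127, 663, 668, 118],
)

# Residual = actual - predicted; a positive value means the model under-predicted
df["Residual"] = df["Actual Values"] - df["Predicted Values"]

print(df.round(2))
print("Largest absolute error at row", df["Residual"].abs().idxmax())
```

Large residuals such as the one at row 668 are consistent with the low R² score reported earlier.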


### Please answer the questions below to complete the experiment:

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was very easy, what more would you have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "" #@param {type:"string"}


In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "" #@param ["","Yes", "No"]


In [None]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook  { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Please answer Complexity Question
