# Objective
* Describe when linear regression is the appropriate analysis technique
* Use scikit-learn to perform Linear Regression and Multiple Linear Regression

***

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# regression module
from sklearn.linear_model import LinearRegression
from sklearn import datasets

## 1. Gather and Prepare Data
We will be using sklearn's `diabetes` dataset for this example. In order to load an sklearn dataset, we imported the `datasets` module from the `sklearn` library. We can then use the [`load_diabetes`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html#sklearn.datasets.load_diabetes) function to load the diabetes data. There are plenty more datasets we can play around with using [`sklearn.datasets`](https://scikit-learn.org/stable/datasets/toy_dataset.html).

In [1]:
# Load the diabetes dataset

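A minimal sketch of the loading cell. It assumes a scikit-learn version recent enough to support the `as_frame` parameter, which makes `data` a pandas DataFrame.

```python
from sklearn import datasets

# Load the diabetes data; as_frame=True returns pandas objects and fills the `frame` attribute
diabetes = datasets.load_diabetes(as_frame=True)

print(type(diabetes).__name__)  # Bunch
print(diabetes.data.shape)      # (442, 10): 442 patients, 10 features
print(diabetes.target.shape)    # (442,)
```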

The `load_diabetes` function returns the dataset as a `Bunch` object, a dictionary-like container whose keys can also be accessed as attributes. Let's see what that object contains!

It looks like there are a lot of attributes we can inspect. Let's start with the `feature_names` attribute.
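
One way to inspect the `Bunch`, as a sketch: because it is a `dict` subclass, `keys()` lists everything it holds, and `feature_names` gives the ten column names.

```python
from sklearn import datasets

diabetes = datasets.load_diabetes(as_frame=True)

# Every key is also reachable as an attribute, e.g. diabetes.DESCR
print(list(diabetes.keys()))

# The ten feature names
print(diabetes.feature_names)
# ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
```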

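Now, let's take a look at the DataFrame using the `frame` attribute. This attribute is only populated when the data is loaded with `load_diabetes(as_frame=True)`; otherwise it is `None`.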
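
A short sketch of the `frame` attribute. The combined DataFrame holds the ten features plus a `target` column; with the default `as_frame=False`, `frame` is `None`.

```python
from sklearn import datasets

diabetes = datasets.load_diabetes(as_frame=True)
df = diabetes.frame

# 10 feature columns + 1 target column
print(df.shape)  # (442, 11)
print(df.head())

# Loading without as_frame leaves `frame` empty
print(datasets.load_diabetes().frame)  # None
```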

Let's set the features (`X`) to the `data` attribute and the target (`y`) to the `target` attribute.
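
A minimal sketch of this step, assuming the data was loaded with `as_frame=True` so that `X` is a DataFrame and `y` a Series.

```python
from sklearn import datasets

diabetes = datasets.load_diabetes(as_frame=True)

# Features: one column per measurement
X = diabetes.data
# Target: one disease-progression value per patient
y = diabetes.target

print(X.shape, y.shape)  # (442, 10) (442,)
```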

I want to see if there is a relationship between BMI and the target (a quantitative measure of diabetes progression one year after baseline). Make `X` contain only the `bmi` column.

In [2]:
# Use only one feature

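One way to fill in the cell above, assuming `X` is the DataFrame from `diabetes.data`. Single brackets give a 1-D Series; double brackets would give a 2-D, one-column DataFrame, which scikit-learn would also accept directly.

```python
from sklearn import datasets

diabetes = datasets.load_diabetes(as_frame=True)
X = diabetes.data
y = diabetes.target

# Single brackets: a 1-D Series
X = X["bmi"]
print(type(X).__name__, X.shape)  # Series (442,)

# Double brackets: a 2-D DataFrame with one column
print(diabetes.data[["bmi"]].shape)  # (442, 1)
```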

Right now, `X` is a single column of our data. scikit-learn's `LinearRegression` expects a two-dimensional feature matrix of shape `(n_samples, n_features)`, so we must reshape our BMI column into a two-dimensional array with one column using the code below.

In [129]:
# -1 lets NumPy infer the number of rows; 1 gives a single column
X_matrix = X.values.reshape(-1, 1)
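
A quick sketch of why the reshape matters: it turns shape `(442,)` into `(442, 1)`, and scikit-learn rejects the 1-D version with a `ValueError`.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes(as_frame=True)
X = diabetes.data["bmi"]  # 1-D Series
y = diabetes.target

# -1 lets NumPy infer the number of rows; 1 fixes a single column
X_matrix = X.values.reshape(-1, 1)
print(X.values.shape, "->", X_matrix.shape)  # (442,) -> (442, 1)

# Passing the 1-D array directly is rejected
rejected = False
try:
    LinearRegression().fit(X.values, y)
except ValueError:
    rejected = True
print(rejected)  # True
```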

## 2. Choose Model
Create an instance of the `LinearRegression` class and store it in a variable named `reg`.

In [3]:
# linear regression 

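A minimal sketch of this step. Creating the estimator only records its settings; the learned attributes such as `coef_` do not exist until `.fit()` runs.

```python
from sklearn.linear_model import LinearRegression

# Nothing is learned yet; the object only stores hyperparameters
reg = LinearRegression()

print(reg.get_params()["fit_intercept"])  # True: an intercept is estimated by default
print(hasattr(reg, "coef_"))              # False until .fit() is called
```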

## 3. Train Model
Fit the model using our `X_matrix` and `y`.
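
A sketch of the fitting step. `fit` estimates the intercept and slope by ordinary least squares and returns the estimator itself, so calls can be chained.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes(as_frame=True)
X_matrix = diabetes.data["bmi"].values.reshape(-1, 1)
y = diabetes.target

reg = LinearRegression()
fitted = reg.fit(X_matrix, y)

print(fitted is reg)    # True: fit() returns the same object
print(reg.coef_.shape)  # (1,): one slope per feature
```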

## 4. Evaluate Model
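Let's check the score of our model. The `score` method of our `reg` (`LinearRegression`) object returns the coefficient of determination, R², which is at most 1. On the training data of a least-squares model with an intercept it falls between 0 and 1, but on new data it can be negative. We should also look at the intercept (`intercept_`) and coefficient (`coef_`) attributes, which give the formula of our line.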

In [4]:
# Check score

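To make the score less of a black box, here is a sketch that recomputes R² by hand as `1 - SS_res / SS_tot` and compares it with `reg.score` and `sklearn.metrics.r2_score`.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

diabetes = datasets.load_diabetes(as_frame=True)
X_matrix = diabetes.data["bmi"].values.reshape(-1, 1)
y = diabetes.target.values

reg = LinearRegression().fit(X_matrix, y)
y_pred = reg.predict(X_matrix)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(reg.score(X_matrix, y))
print(r2_manual, r2_score(y, y_pred))
```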

In [5]:
# Check intercept


In [6]:
# Check coefficient

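A sketch that turns `intercept_` and `coef_` into the equation of the line and checks it against `predict`. Because sklearn's diabetes features are mean-centered, the fitted intercept should equal the mean of `y` up to rounding.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes(as_frame=True)
X_matrix = diabetes.data["bmi"].values.reshape(-1, 1)
y = diabetes.target

reg = LinearRegression().fit(X_matrix, y)

b0 = reg.intercept_  # a float when y is 1-D
b1 = reg.coef_[0]    # slope for the single feature, bmi
print(f"y = {b0:.2f} + {b1:.2f} * bmi")

# The line b0 + b1*x reproduces the model's predictions
manual = b0 + b1 * X_matrix[:, 0]
print(np.allclose(manual, reg.predict(X_matrix)))  # True

# With a centered feature, the intercept is the mean target
print(np.isclose(b0, y.mean()))  # True
```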

## 5. Make Predictions
Let's make predictions and plot the line!

In [7]:
# plot the data


# using the parameters from the model to create the regression line (y = b0 + b1*x)


# plotting regression line


# labelling axes

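One possible way to fill in the plotting cell, following its comments: scatter the data, build the line from the fitted parameters, then label the axes. The axis labels are illustrative choices. In a notebook the figure renders inline at the end of the cell; in a script, call `plt.show()`.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes(as_frame=True)
X_matrix = diabetes.data["bmi"].values.reshape(-1, 1)
y = diabetes.target

reg = LinearRegression().fit(X_matrix, y)

# plot the data
fig, ax = plt.subplots()
ax.scatter(X_matrix[:, 0], y, alpha=0.5, label="observed")

# regression line y = b0 + b1*x across the observed bmi range
x_line = np.linspace(X_matrix.min(), X_matrix.max(), 100)
y_line = reg.intercept_ + reg.coef_[0] * x_line
ax.plot(x_line, y_line, color="red", label="fitted line")

# label axes
ax.set_xlabel("bmi (scaled)")
ax.set_ylabel("disease progression")
ax.legend()
```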

## Multiple Linear Regression
We are going to go through the same steps as before, but this time we will be using **all** of the features instead of just using `bmi`.

In [8]:
# Load the diabetes dataset


In [9]:
# Linear Regression and fit the model

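A minimal sketch of the multiple regression fit. `return_X_y=True` skips the `Bunch` and returns the features and target directly. No reshape is needed because `X` is already two-dimensional.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression

X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)

# All ten feature columns go into the model
reg_all = LinearRegression().fit(X, y)

print(reg_all.coef_.shape)  # (10,): one coefficient per feature
```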

In [10]:
# Check score


In [11]:
# Check intercept


In [12]:
# Check coefficient

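A sketch comparing the two models and pairing each coefficient with its feature name. Training-set R² for least squares can only stay the same or rise when features are added, so a held-out evaluation would be needed before concluding that the extra features genuinely help.

```python
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LinearRegression

X, y = datasets.load_diabetes(return_X_y=True, as_frame=True)

reg_bmi = LinearRegression().fit(X[["bmi"]], y)
reg_all = LinearRegression().fit(X, y)

print("bmi only:", reg_bmi.score(X[["bmi"]], y))
print("all features:", reg_all.score(X, y))

# Label each coefficient with its column name
coefs = pd.Series(reg_all.coef_, index=X.columns)
print(reg_all.intercept_)
print(coefs.sort_values())
```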

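This is what the data looks like in its original state. The ten features returned by `load_diabetes` are mean-centered and scaled versions of these columns, which is why their values look so different.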

In [141]:
# Raw (unscaled) diabetes data, tab-separated
pd.read_csv('https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data', delimiter='\t')

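| | AGE | SEX | BMI | BP | S1 | S2 | S3 | S4 | S5 | S6 | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59 | 2 | 32.1 | 101.00 | 157 | 93.2 | 38.0 | 4.00 | 4.8598 | 87 | 151 |
| 1 | 48 | 1 | 21.6 | 87.00 | 183 | 103.2 | 70.0 | 3.00 | 3.8918 | 69 | 75 |
| 2 | 72 | 2 | 30.5 | 93.00 | 156 | 93.6 | 41.0 | 4.00 | 4.6728 | 85 | 141 |
| 3 | 24 | 1 | 25.3 | 84.00 | 198 | 131.4 | 40.0 | 5.00 | 4.8903 | 89 | 206 |
| 4 | 50 | 1 | 23.0 | 101.00 | 192 | 125.4 | 52.0 | 4.00 | 4.2905 | 80 | 135 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 437 | 60 | 2 | 28.2 | 112.00 | 185 | 113.8 | 42.0 | 4.00 | 4.9836 | 93 | 178 |
| 438 | 47 | 2 | 24.9 | 75.00 | 225 | 166.0 | 42.0 | 5.00 | 4.4427 | 102 | 104 |
| 439 | 60 | 2 | 24.9 | 99.67 | 162 | 106.6 | 43.0 | 3.77 | 4.1271 | 95 | 132 |
| 440 | 36 | 1 | 30.0 | 95.00 | 201 | 125.2 | 42.0 | 4.79 | 5.1299 | 85 | 220 |
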
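
The `DESCR` text shipped with `load_diabetes` states that each of the ten features has been mean-centered and scaled by its standard deviation times the square root of `n_samples`, so that every column's sum of squares is 1. This explains why the sklearn values look nothing like the raw table above. The sketch below checks that claim numerically.

```python
import numpy as np
from sklearn import datasets

X = datasets.load_diabetes().data  # NumPy array with shape (442, 10)

col_means = X.mean(axis=0)
col_sumsq = (X ** 2).sum(axis=0)

# Per DESCR: each mean should be ~0 and each column's sum of squares ~1
print(np.round(col_means, 6))
print(np.round(col_sumsq, 6))
```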
