# Regression and Prediction

## Simple Linear Regression

Simple linear regression (SLR) models the relationship between the magnitude of one variable, X, and that of a second variable, Y. Correlation measures the strength of the association between two variables; regression goes further and quantifies the nature of that relationship.
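To make the distinction concrete, here is a small sketch with made-up numbers (the arrays and the helper names `correlation` and `slope` are illustrative, not taken from the sleep data). Rescaling Y should leave the correlation unchanged while the regression slope scales with it, because the slope is expressed in units of Y per unit of X.

```python
import numpy as np

# Made-up illustrative data (not from the sleep data set)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def correlation(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

def slope(a, b):
    """Least-squares slope of b regressed on a."""
    return np.polyfit(a, b, 1)[0]

# Compare the original Y with a copy of Y multiplied by 10
r_original = correlation(x, y)
r_rescaled = correlation(x, 10 * y)
b1_original = slope(x, y)
b1_rescaled = slope(x, 10 * y)

print("correlation:", r_original, "->", r_rescaled)
print("slope:      ", b1_original, "->", b1_rescaled)
```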

## Regression equation

Simple linear regression estimates how much Y will change when X changes. We are trying to predict the Y variable from X using a linear relationship.

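$$Y = b_0 + b_1 X$$

Here b0 is the intercept (the predicted Y when X = 0) and b1 is the slope (the change in Y for a one-unit change in X).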
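For a single predictor, the least-squares estimates have a closed form: b1 is the sum of the products of the centered X and Y values divided by the sum of squared centered X values, and b0 is mean(Y) minus b1 times mean(X). The sketch below applies that formula to made-up points; the function name `fit_simple_linear_regression` is hypothetical and only for illustration.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (b0, b1) minimising the squared error of y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # Slope: co-variation of x and y divided by the variation of x
    b1 = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    # Intercept: forces the fitted line through the point of means
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Made-up points lying exactly on the line y = 1 + 2x
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
b0, b1 = fit_simple_linear_regression(x, y)
print("b0 =", b0, "b1 =", b1)
```

Because the points lie exactly on a line, the estimates recover the intercept 1 and the slope 2.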

### Using a set of data to build a linear regression model


In [35]:
# Load the Pandas libraries with alias 'pd' 
import pandas as pd 

# Read data from the file 'sleepdata.csv'
# (in the same directory that your Python process is based)
# read_csv also has options to control delimiters, rows, and column names
df = pd.read_csv("sleepdata.csv")

# Preview the first 5 lines of the loaded data 
df.head()

Unnamed: 0,Start,End,Sleep quality,Time in bed,Wake up,Sleep Notes,Heart rate,Activity (steps)
0,2017-11-27 23:11:00,2017-11-28 06:09:09,62,6:58,,,0,1564
1,2017-11-29 00:41:04,2017-11-29 07:09:51,59,6:28,,Aerial:Worked out,0,8144
2,2017-11-29 23:53:11,2017-11-30 07:27:05,91,7:33,,,0,6707
3,2017-11-30 23:07:29,2017-12-01 05:58:18,79,6:50,,,0,4778
4,2017-12-01 23:19:21,2017-12-02 05:58:20,77,6:38,,Pole fitness:Worked out,0,7111


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 339 entries, 0 to 338
Data columns (total 8 columns):
Start                339 non-null object
 End                 339 non-null object
 Sleep quality       339 non-null int64
 Time in bed         339 non-null object
 Wake up             339 non-null object
 Sleep Notes         339 non-null object
 Heart rate          339 non-null int64
 Activity (steps)    339 non-null int64
dtypes: int64(3), object(5)
memory usage: 21.3+ KB
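The df.info() output shows that every column name except Start begins with a space (for example ' End'), which is why the later cells have to quote the names with that space. One optional clean-up, sketched here on a tiny stand-in frame (the values are copied from the first preview row; the names `raw` and `clean` are illustrative), is to strip whitespace from the column names right after loading. The rest of this notebook keeps the original names.

```python
import pandas as pd

# Stand-in frame: one row from the preview, with the original column names
raw = pd.DataFrame({
    "Start": ["2017-11-27 23:11:00"],
    " End": ["2017-11-28 06:09:09"],
    " Sleep quality": [62],
    " Activity (steps)": [1564],
})

# Strip leading/trailing whitespace from every column name
clean = raw.rename(columns=str.strip)

print(list(raw.columns))
print(list(clean.columns))
```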


## Interpreting my sleep data 

I want to know what my sleep quality is related to. First, I import LinearRegression from the scikit-learn module. Then I drop the Sleep quality column, since I want only the predictor variables as my X values. I store the linear regression object in a variable called lm.

In [43]:
from sklearn.linear_model import LinearRegression

#This creates a LinearRegression object
lm = LinearRegression()

# Since I want to predict my sleep quality, I remove it from X, along with
# all of the variables not useful to me. Only ' Activity (steps)' remains.
X = df.drop([' Sleep quality', 'Start', ' End', ' Wake up', ' Sleep Notes', ' Heart rate', ' Time in bed'], axis=1)
# Note: Y keeps two columns, ' Sleep quality' and ' Activity (steps)'.
Y = df.drop(['Start', ' End', ' Wake up', ' Sleep Notes', ' Heart rate', ' Time in bed'], axis=1)

X.head()



Unnamed: 0,Activity (steps)
0,1564
1,8144
2,6707
3,4778
4,7111
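The drop() call that builds Y leaves both Sleep quality and Activity (steps) in place. If the aim is to predict Sleep quality alone, one alternative is to select the columns directly instead of dropping the others. A minimal sketch, using the five preview rows rebuilt by hand (the names `sample`, `X_single`, and `y_single` are illustrative):

```python
import pandas as pd

# First five rows from the preview, rebuilt by hand for illustration
sample = pd.DataFrame({
    " Sleep quality": [62, 59, 91, 79, 77],
    " Activity (steps)": [1564, 8144, 6707, 4778, 7111],
})

# Double brackets keep X two-dimensional (rows x features)
X_single = sample[[" Activity (steps)"]]
# Single brackets give a one-dimensional target
y_single = sample[" Sleep quality"]

print(X_single.shape, y_single.shape)
```

With a one-dimensional target, scikit-learn fits a single regression and reports one intercept.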


### Inside the linear regression object

**lm.fit()** -> Fits a linear model

**lm.predict()** -> Predicts Y using the linear model with estimated coefficients

**lm.score()** -> Returns the coefficient of determination (R^2)

**lm.coef_** -> Estimated coefficients

**lm.intercept_** -> Estimated intercept
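The members listed above can be tried out on a toy problem before touching the real data. The sketch below uses made-up points that lie exactly on y = 1 + 2x, so the expected answers are easy to check; the variable names (`demo`, `x_demo`, `y_demo`) are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 1 + 2x
x_demo = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (n_samples, n_features)
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

demo = LinearRegression()
demo.fit(x_demo, y_demo)                      # estimate intercept and slope

predicted = demo.predict(np.array([[4.0]]))   # predict Y for a new X
r_squared = demo.score(x_demo, y_demo)        # coefficient of determination

print("intercept:", demo.intercept_)
print("coefficients:", demo.coef_)
print("prediction for x=4:", predicted)
print("R^2:", r_squared)
```

Note that coef_ and intercept_ are attributes rather than methods, and they only exist once fit() has been called.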

## Fitting a Linear Model


In [38]:
lm.fit(X,Y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

I am going to print the intercept and the number of coefficients.

In [40]:
print('Estimated intercept coefficient:', lm.intercept_)
print('Number of coefficients:', len(lm.coef_))

Estimated intercept coefficient: [ 7.65484750e+01 -2.72848411e-12]
Number of coefficients: 2
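Two intercepts appear because Y has two columns, Sleep quality and Activity (steps), so scikit-learn fits one regression per column. The first value (about 76.5) belongs to Sleep quality; the second is essentially zero because Activity (steps) is being predicted from itself. For the same reason, len(lm.coef_) counts target columns here, not features. The sketch below reproduces the effect on made-up data (all names and numbers are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: one feature and two targets; the second target is the feature itself
feature = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
target_a = np.array([3.0, 5.0, 4.0, 6.0, 8.0])
targets = np.column_stack([target_a, feature[:, 0]])

multi = LinearRegression().fit(feature, targets)

print(multi.intercept_.shape)  # one intercept per target column
print(multi.coef_.shape)       # (n_targets, n_features)
print(len(multi.coef_))        # counts targets, not features
print(multi.intercept_[1], multi.coef_[1, 0])  # close to 0 and 1: feature predicts itself
```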


I then construct a data frame that contains the features and their estimated coefficients.

In [46]:
pd.DataFrame(zip(X.columns, lm.coef_),columns = ['features', 'estimatedCoefficients'])

AttributeError: 'LinearRegression' object has no attribute 'coef_'
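The AttributeError means that lm had not been fitted when this cell ran: coef_ is only created by fit(). The cell numbers suggest why. The object was re-created with lm = LinearRegression() in cell 43, after the fit in cell 38, so the fit has to be run again first. A sketch of the working order follows, using the five preview rows as a stand-in for the full file (so the numbers will differ from a fit on all 339 rows) and a single-column target so that coef_ holds one value per feature. The names `preview`, `model`, and `coef_table` are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows from the preview, standing in for the full data set
preview = pd.DataFrame({
    " Sleep quality": [62, 59, 91, 79, 77],
    " Activity (steps)": [1564, 8144, 6707, 4778, 7111],
})
X_preview = preview[[" Activity (steps)"]]
y_preview = preview[" Sleep quality"]

model = LinearRegression()
model.fit(X_preview, y_preview)  # coef_ and intercept_ exist only after this call

# One row per feature, paired with its estimated coefficient
coef_table = pd.DataFrame({
    "features": X_preview.columns,
    "estimatedCoefficients": model.coef_,
})
print(coef_table)
```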