# Chapter 3: Linear Regression

## Data

The Boston Housing data is available at https://archive.ics.uci.edu/ml/datasets/Housing

### Import packages

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
import seaborn as sns

from sklearn.preprocessing import scale
import sklearn.linear_model as skl_lm
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
import statsmodels.formula.api as smf

pd.set_option('display.notebook_repr_html', False)

%matplotlib inline
plt.style.use('seaborn-white')

### Load the housing data

In [6]:
housing = pd.read_csv('Data/Boston.csv', names =["crim", "zn", "indus", "chas", "nox", "rm", "age", "dis", "rad", "tax", "ptratio", "black", "lstat","medv"])
housing.to_csv('Data/Boston_modified.csv', mode = 'w', index=False)

housing.info()
housing[:5]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
crim       506 non-null float64
zn         506 non-null float64
indus      506 non-null float64
chas       506 non-null int64
nox        506 non-null float64
rm         506 non-null float64
age        506 non-null float64
dis        506 non-null float64
rad        506 non-null int64
tax        506 non-null int64
ptratio    506 non-null float64
black      506 non-null float64
lstat      506 non-null float64
medv       506 non-null float64
dtypes: float64(11), int64(3)
memory usage: 55.4 KB


      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3   
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8   
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7   
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   

    black  lstat  medv  
0  396.90   4.98  24.0  
1  396.90   9.14  21.6  
2  392.83   4.03  34.7  
3  394.63   2.94  33.4  
4  396.90   5.33  36.2  

## Lab

### 3.6.1 - Libraries

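`sklearn.datasets` contains the Boston housing data and various other datasets.  
In scikit-learn versions before 1.2 you can use it by importing the loader, e.g.  
`from sklearn.datasets import load_boston`  
`boston = load_boston()`  
`load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so on newer versions read the data from the CSV file as in the Data section.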

### 3.6.2 - Simple Linear Regression

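The LinearRegression object supports several methods:
1. fit(): fits a linear model
2. predict(): predicts Y using the linear model's estimated coefficients
3. score(): returns the coefficient of determination R^2
4. get_params(): returns the estimator's parameters as a dict
5. set_params(): sets the estimator's parameters

mro() and register() also show up in tab completion on the class, but they come from Python's class machinery (`type` and `abc.ABCMeta`) and are not part of the regression API.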
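
The estimator methods listed above can be sketched on a tiny example. This is a minimal illustration, assuming scikit-learn and NumPy are installed; the data is made up and noise-free (y = 3 - 2x), not the housing data, and the variable names are chosen here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration): y = 3 - 2x
x = np.arange(10, dtype=float).reshape(-1, 1)  # sklearn expects a 2-D feature array
y = 3.0 - 2.0 * x.ravel()

model = LinearRegression()
model.fit(x, y)                            # estimate intercept and slope
pred = model.predict(np.array([[10.0]]))   # predict y at x = 10
r2 = model.score(x, y)                     # R^2 on the training data
print(model.intercept_, model.coef_, pred, r2)

params = model.get_params()                # dict of constructor parameters
print(params["fit_intercept"])

# set_params() changes a constructor parameter; it takes effect on the next fit()
model.set_params(fit_intercept=False)
print(model.get_params()["fit_intercept"])
```

Because the data lies exactly on a line, the fitted intercept and slope should recover 3 and -2 up to floating-point error, and R^2 should be essentially 1.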

### Least Squares Fit

In [21]:
# sns.regplot(housing.lstat, housing.medv, order=1, ci=None, scatter_kws={'color':'r'})

In [17]:
# Regression coefficients (ordinary least squares)
regr = skl_lm.LinearRegression()
# Center lstat (subtract its mean, no rescaling) and reshape to the 2-D array sklearn expects
x = scale(housing.lstat, with_mean=True, with_std=False).reshape(-1,1)
y = housing.medv

regr.fit(x,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [24]:
print("Estimated intercept coeff: ", regr.intercept_)
print("Number of coeffs: ", len(regr.coef_))
print("Coeffs = ", regr.coef_)

Estimated intercept coeff:  22.5328063241
Number of coeffs:  1
Coeffs =  [-0.95004935]
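
Because `lstat` was mean-centered before fitting, the intercept printed above is the sample mean of `medv`: with a centered predictor, the least-squares intercept equals the mean of the response, while the slope is unchanged by centering. A minimal sketch in plain Python follows. The small dataset and the helper name `ols_simple` are assumptions for illustration and have nothing to do with the housing data.

```python
from statistics import mean

def ols_simple(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.0, 8.5, 6.0, 5.5, 3.0]

# Center the predictor by subtracting its mean
x_centered = [xi - mean(x) for xi in x]

b0_raw, b1_raw = ols_simple(x, y)
b0_cen, b1_cen = ols_simple(x_centered, y)

print("raw:     ", b0_raw, b1_raw)
print("centered:", b0_cen, b1_cen, "mean(y) =", mean(y))
```

The two fits share the same slope. Only the intercept moves: from the predicted response at x = 0 to the predicted response at the mean of x, which is the mean of y.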
