# Ch2. Linear Regression

## Overview
[TOC]
- Simple linear regression
- Evaluating the model
- Multiple linear regression
- Polynomial regression
- Regularization
- Applying linear regression
- Fitting models with gradient descent
- Summary

## Simple linear regression

Training Data 
- used to estimate the **parameters** of a model. 
- Past observations of **explanatory variables** and **response variables**

Prediction (using model)
- using explanatory variables (values that have not been observed before)
- estimate response variable

Goal of a regression problem:
- predict the value of a continuous response variable

Simple Linear Regression
- linear relationship betw. 1 response variable and 1 explanatory variable

### Ex: Pizza

```python
import matplotlib.pyplot as plt
X = [[6], [8], [10], [14],   [18]]
y = [[7], [9], [13], [17.5], [18]]
plt.figure()
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.plot(X, y, 'k.')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()
```

### Actual Linear Regression Example

```python
from sklearn.linear_model import LinearRegression
# Training data
X = [[6], [8], [10], [14],   [18]]
y = [[7], [9], [13], [17.5], [18]]
# Create and fit the model
model = LinearRegression()
model.fit(X, y)
# predict() expects a 2-D input (one row per instance); y is 2-D, so the result is too
print('A 12" pizza should cost: $%.2f' % model.predict([[12]])[0][0])
```

```
A 12" pizza should cost: $13.68
```


### Hyperplane
- assumption: linear relationship exists betw. response var. and explanatory var.
- this relationship is modeled as a linear surface, **hyperplane**
- subspace that's 1 dim. less than the ambient space that contains it

### Estimator
- eg: **`sklearn.linear_model.LinearRegression`** class
- predicts a value based on the *observed* data
- important methods (shared by scikit-learn regressors such as `LinearRegression`)

name | role
-|-
`fit()` | learns the parameters of a model
`predict()` | predicts the value of the response variable using the learned parameters

```
    y = α + βx
```

name | desc.
-|-
y | predicted value of the response var. 
x | explanatory variable
α, β | intercept term and coefficient: the parameters of the model, learned by the learning algorithm
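After `fit()`, the learned α and β can be read off a fitted `LinearRegression` through its `intercept_` and `coef_` attributes. The following is a minimal sketch on the pizza data. It assumes a 1-D target so that `coef_` is a flat array. Plugging the values into y = α + βx reproduces `predict()`.

```python
from sklearn.linear_model import LinearRegression

# Pizza training data: diameter (inches) -> price (dollars)
X = [[6], [8], [10], [14], [18]]
y = [7, 9, 13, 17.5, 18]  # 1-D target, so coef_ is a 1-D array

model = LinearRegression()
model.fit(X, y)

alpha = model.intercept_  # α, the intercept term
beta = model.coef_[0]     # β, the coefficient of the single explanatory variable
print('alpha = %.4f, beta = %.4f' % (alpha, beta))  # alpha = 1.9655, beta = 0.9763

# y = α + βx for a 12-inch pizza, matching model.predict([[12]])
print('%.2f' % (alpha + beta * 12))  # 13.68
```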


```python
plt.xlabel('Diameter in inches'); plt.ylabel('Price in dollars');
X2 = [[0], [10], [14], [25]]
plt.plot(X2, model.predict(X2), color='blue', linewidth=2)
plt.plot(X,y, 'k.')
plt.grid(True)
plt.show()
```

### Ordinary Least Squares (or Linear Least Squares)
- method that estimates the model's parameters by minimizing the sum of squared residuals, giving the best-fitting line

**How do we measure how well a model fits the training data?**

### Evaluating the fitness of a model with a cost function

**What are the criteria for a best-fitting regression?**

#### Cost Function (aka Loss Function)

Used to define and measure the error of a model

- **residuals** (or training errors): differences between the predicted values and the observed values in the training set
- **prediction errors / test errors**: the same differences, measured on the test set

**Residual sum of squares** cost function

<img src="http://ww2.tnstate.edu/ganter/BIO-311-Ch12-Eq5a.gif" width=200></img>


```python
import numpy as np
# np.mean averages the squared residuals, so this is the mean squared error (RSS / n)
print('Mean squared error: %.2f' % np.mean((model.predict(X) - y) ** 2))
```

```
Mean squared error: 1.75
```
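The residual sum of squares is the *sum* of the squared residuals. Taking their *mean* with `np.mean`, as the cell above does, gives the mean squared error, which equals RSS divided by the number of training instances. The following is a minimal sketch that computes both on the pizza training data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[6], [8], [10], [14], [18]])
y = np.array([7, 9, 13, 17.5, 18])

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)  # observed minus predicted, one per training instance

rss = np.sum(residuals ** 2)      # residual sum of squares
mse = rss / len(y)                # mean squared error = RSS / n
print('RSS: %.2f' % rss)  # RSS: 8.75
print('MSE: %.2f' % mse)  # MSE: 1.75
```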


### Solving ordinary least squares for simple linear regression

#### β: calculate the **variance** of x and the **covariance** of x and y

<img src='https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSgqtL6JZb3CgHdXki2I-JimIc3hv5JH9jwN7KYEpYUiqrdN4tUDQ' width=200></img>


<img src='http://education.howthemarketworks.com/wp-content/uploads/2013/09/Covariance-Formula.jpg' width=250></img>

```
β = cov(x,y) / var(x)
```
#### α

```
α = mean(y) - β mean(x)
```
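The following is a minimal NumPy sketch of the two formulas on the pizza data. `np.var(..., ddof=1)` and `np.cov` both use the sample (n − 1) denominator, so that factor cancels in the ratio for β. The resulting α and β should match what `LinearRegression` learns.

```python
import numpy as np

x = np.array([6, 8, 10, 14, 18])
y = np.array([7, 9, 13, 17.5, 18])

# Sample variance of x (ddof=1 divides by n - 1)
var_x = np.var(x, ddof=1)
# Sample covariance of x and y: off-diagonal entry of the 2x2 covariance matrix
cov_xy = np.cov(x, y)[0][1]

beta = cov_xy / var_x
alpha = np.mean(y) - beta * np.mean(x)
print('var(x) = %.2f, cov(x, y) = %.2f' % (var_x, cov_xy))  # var(x) = 23.20, cov(x, y) = 22.65
print('alpha = %.4f, beta = %.4f' % (alpha, beta))          # alpha = 1.9655, beta = 0.9763
```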

## Evaluating the Model

### r-squared

how well the observed values of the response variable are predicted by the model
- proportion of the variance in the response variable that is explained by the model

<img src='https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQzfE5dS4wf2D645dxy8qrgQfrXkzijU0oGUa0iosrGlqMel2_oTg' width=200></img>



```python
from sklearn.linear_model import LinearRegression

X = [[6], [8], [10], [14], [18]]; y = [[7], [9], [13], [17.5], [18]]
X_test = [[8],  [9],  [11], [16], [12]]; y_test = [[11], [8.5], [15], [18], [11]]

model = LinearRegression()
model.fit(X, y)
# score() returns the coefficient of determination (R-squared) on the given data
print('R-squared: %.4f' % model.score(X_test, y_test))
```

```
R-squared: 0.6620
```
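`model.score` computes R-squared as 1 − SS_res / SS_tot. SS_res is the sum of squared residuals on the test set, and SS_tot is the sum of squared deviations of the test targets from their mean. The following is a minimal sketch that reproduces the 0.6620 by hand. It uses 1-D targets for convenience.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = [[6], [8], [10], [14], [18]]; y = [7, 9, 13, 17.5, 18]
X_test = [[8], [9], [11], [16], [12]]; y_test = np.array([11, 8.5, 15, 18, 11])

model = LinearRegression().fit(X, y)
predictions = model.predict(X_test)

ss_res = np.sum((y_test - predictions) ** 2)      # residual sum of squares on the test set
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)  # total sum of squares around the test mean
r_squared = 1 - ss_res / ss_tot
print('R-squared: %.4f' % r_squared)  # R-squared: 0.6620, same as model.score(X_test, y_test)
```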


## Multiple Linear Regression

### Formal Definition

> multiple linear regression uses a coefficient for **each** of an arbitrary number of **explanatory variables**

<img src='http://www.saedsayad.com/images/MLR_1b.png' width=500></img>

in matrix form: 

<img src='http://wiki.stat.ucla.edu/socr/uploads/math/3/0/f/30fe0ab6589c9992f21487bd912085e7.png' height=18></img>

<img src='http://wiki.stat.ucla.edu/socr/uploads/math/8/9/7/89764b9d5fee46cedb3301257d28aa51.png' width=400></img>

### estimation of β (using matrix notation)
<img src='http://wiki.stat.ucla.edu/socr/uploads/math/2/d/7/2d74063fcebd6bbebe99ace856d0207b.png' height=18/>
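Written in LaTeX, the matrix form of the model and the normal-equation estimate of β that the next cell computes are shown below. X is assumed to carry a leading column of ones for the intercept. ε is the error term, a symbol added here for completeness.

```latex
Y = X\beta + \varepsilon, \qquad \hat{\beta} = \left(X^{T}X\right)^{-1} X^{T} Y
```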

```python
from numpy import dot, transpose
from numpy.linalg import inv
# The leading column of ones lets the first parameter act as the intercept
X = [[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]]
y = [[7], [9], [13], [17.5], [18]]
# Normal equation: beta = (X^T X)^-1 X^T y
print(dot(inv(dot(transpose(X), X)), dot(transpose(X), y)))
```

```
[[ 1.1875    ]
 [ 1.01041667]
 [ 0.39583333]]
```


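Alternatively, use the least-squares solver that `NumPy` provides:

```python
from numpy.linalg import lstsq
# rcond=None uses the machine-precision cutoff; older NumPy versions warn when it is omitted
print(lstsq(X, y, rcond=None)[0])
```

```
[[ 1.1875    ]
 [ 1.01041667]
 [ 0.39583333]]
```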


Let's predict `y` using the second explanatory variable as well!

```python
from sklearn.linear_model import LinearRegression
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]; y = [[7],    [9],    [13],    [17.5],  [18]]
model = LinearRegression()
model.fit(X, y)
X_test = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]; y_test = [[11],   [8.5],  [15],    [18],    [11]]
predictions = model.predict(X_test)

for i, prediction in enumerate(predictions):
    print('Predicted: %s, Target: %s' % (prediction, y_test[i]))
print('R-squared: %.2f' % model.score(X_test, y_test))
```

```
Predicted: [ 10.0625], Target: [11]
Predicted: [ 10.28125], Target: [8.5]
Predicted: [ 13.09375], Target: [15]
Predicted: [ 18.14583333], Target: [18]
Predicted: [ 13.3125], Target: [11]
R-squared: 0.77
```


### Analysis

- the additional explanatory variable improved the **performance** of the model
- the multiple LR model scores a **higher R-squared** on the test set than the simple LR model (0.77 vs. 0.66)


## Polynomial Regression 

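### Quadratic Regression

models the response variable as a second-degree polynomial of the explanatory variable: y = α + β₁x + β₂x²

cf. the `PolynomialFeatures` transformer can be used to add polynomial features to a feature representation

cf. **Feature**: tuple of explanatory variables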
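The following is a minimal sketch of what `PolynomialFeatures` adds when there are two explanatory variables. The input values a = 2 and b = 3 are arbitrary. With `degree=2`, each instance [a, b] becomes [1, a, b, a², ab, b²].

```python
from sklearn.preprocessing import PolynomialFeatures

# One instance with two hypothetical explanatory variables, a = 2 and b = 3
X = [[2, 3]]
featurizer = PolynomialFeatures(degree=2)
X_poly = featurizer.fit_transform(X)
# Columns: 1 (bias), a, b, a^2, a*b, b^2  ->  values 1, 2, 3, 4, 6, 9
print(X_poly)
```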

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [[7],    [9],    [13],    [17.5],  [18]]
X_test = [[6],  [8],   [11], [16]]
y_test = [[8],   [12],  [15], [18]]

# Simple linear regression, plotted over a grid of diameters
regressor = LinearRegression()
regressor.fit(X_train, y_train)
xx = np.linspace(0, 26, 100)
yy = regressor.predict(xx.reshape(xx.shape[0], 1))
plt.plot(xx, yy)

# Quadratic regression: each x becomes [1, x, x^2]
featurizer = PolynomialFeatures(degree=2)
X_train_quad = featurizer.fit_transform(X_train)
X_test_quad = featurizer.transform(X_test)

regressor_quad = LinearRegression()
regressor_quad.fit(X_train_quad, y_train)
xx_quad = featurizer.transform(xx.reshape(xx.shape[0], 1))

plt.plot(xx, regressor_quad.predict(xx_quad), c='r', linestyle='--')
plt.title('Pizza price regressed on diameter (linear, quadratic)')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.scatter(X_train, y_train)
plt.show()

print(X_train)
print(X_train_quad)
print(X_test)
print(X_test_quad)

print('Simple LR R^2', regressor.score(X_test, y_test))
print('Quadratic Regression R^2', regressor_quad.score(X_test_quad, y_test))
```

```
[[6], [8], [10], [14], [18]]
[[  1   6  36]
 [  1   8  64]
 [  1  10 100]
 [  1  14 196]
 [  1  18 324]]
[[6], [8], [11], [16]]
[[  1   6  36]
 [  1   8  64]
 [  1  11 121]
 [  1  16 256]]
Simple LR R^2 0.809726797708
Quadratic Regression R^2 0.867544365635
```


### Overfitting
- instead of inducing a general rule,
- the model memorizes the inputs and outputs from the training data
- so it **performs poorly** on **test data**
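One way to see memorization on the pizza data is a degree-4 polynomial. It has four coefficients plus an intercept, which is one parameter per training instance, so it can pass through every training point exactly. The degree is a hypothetical choice for illustration. The sketch prints the training R-squared, which is essentially perfect, next to the test R-squared for comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = [7, 9, 13, 17.5, 18]
X_test = [[6], [8], [11], [16]]
y_test = [8, 12, 15, 18]

# Hypothetical degree 4: x, x^2, x^3, x^4 plus the intercept = 5 parameters for 5 points
featurizer = PolynomialFeatures(degree=4, include_bias=False)
X_train_poly = featurizer.fit_transform(X_train)
X_test_poly = featurizer.transform(X_test)

regressor = LinearRegression().fit(X_train_poly, y_train)
train_r2 = regressor.score(X_train_poly, y_train)
test_r2 = regressor.score(X_test_poly, y_test)
print('Training R^2: %.4f' % train_r2)  # 1.0000: the curve passes through every training point
print('Test R^2: %.4f' % test_r2)
```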

## Regularization

Collection of techniques that can be used to **prevent overfitting**

adds a **penalty** against model **complexity** to the cost function


### Ridge Regression 

penalizes the squared L2 norm of the coefficients

### Least Absolute Shrinkage and Selection Operator (LASSO)

penalizes the L1 norm of the coefficients

### Elastic Net Regularization

linearly combines the L1 and L2 penalties (norms)
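scikit-learn provides `Ridge`, `Lasso`, and `ElasticNet` estimators for these three penalties. In each, `alpha` sets the penalty strength, and `ElasticNet`'s `l1_ratio` sets the L1/L2 mix. The sketch below fits all three and plain OLS on the two-variable pizza data. The `alpha` values are arbitrary assumptions chosen for illustration. Because each regularized fit trades some residual error for a smaller penalty, its penalized norm can be no larger than that of the OLS coefficients.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Pizza data with two explanatory variables, as in the multiple linear regression example
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [7, 9, 13, 17.5, 18]

# alpha values are arbitrary, chosen only for illustration
models = {
    'OLS': LinearRegression(),
    'Ridge (L2)': Ridge(alpha=1.0),
    'LASSO (L1)': Lasso(alpha=0.5),
    'Elastic Net (L1 + L2)': ElasticNet(alpha=0.5, l1_ratio=0.5),
}
coefs = {}
for name, model in models.items():
    model.fit(X, y)
    coefs[name] = model.coef_
    print('%-22s %s' % (name, np.round(model.coef_, 4)))

# Each penalty shrinks its own norm of the coefficient vector relative to OLS
ols_l2 = np.sum(coefs['OLS'] ** 2)
ols_l1 = np.sum(np.abs(coefs['OLS']))
ridge_l2 = np.sum(coefs['Ridge (L2)'] ** 2)
lasso_l1 = np.sum(np.abs(coefs['LASSO (L1)']))
print('squared L2 norm, OLS vs Ridge: %.4f vs %.4f' % (ols_l2, ridge_l2))
print('L1 norm, OLS vs LASSO: %.4f vs %.4f' % (ols_l1, lasso_l1))
```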

## Applying LR

## Fitting Models with Gradient Descent

### Gradient Descent

an optimization algorithm that can be used to estimate the local minimum of a function

**learning rate**: controls the size of the steps
- too small: convergence takes a long time
- too large: the steps overshoot the minimum and can oscillate around the optimal values (or diverge)

type | training data used per update | result
-|-|-
Batch GD | all training instances | deterministic
Stochastic GD (SGD) | one training instance per iteration | random (may differ between runs)
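The following is a from-scratch sketch of batch gradient descent for simple linear regression on the pizza data, in which every update uses all training instances. It minimizes half the mean squared error. The learning rates and iteration counts are hand-picked assumptions for this data. With `learning_rate=0.01`, the parameters approach the least-squares values α ≈ 1.9655 and β ≈ 0.9763. With `0.02`, each step overshoots and the parameters blow up, which illustrates the "too large" case above.

```python
import numpy as np

def batch_gradient_descent(x, y, learning_rate, n_iter):
    """Fit y = alpha + beta * x by minimizing (1 / (2n)) * sum((alpha + beta * x - y) ** 2)."""
    alpha, beta = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        errors = alpha + beta * x - y        # prediction errors over ALL training instances
        grad_alpha = np.sum(errors) / n      # partial derivative of the cost w.r.t. alpha
        grad_beta = np.sum(errors * x) / n   # partial derivative of the cost w.r.t. beta
        alpha -= learning_rate * grad_alpha  # step against the gradient
        beta -= learning_rate * grad_beta
    return alpha, beta

x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18])

# Small enough learning rate: converges to the least-squares solution
alpha, beta = batch_gradient_descent(x, y, learning_rate=0.01, n_iter=20000)
print('alpha = %.4f, beta = %.4f' % (alpha, beta))  # alpha = 1.9655, beta = 0.9763

# Too large a learning rate: every step overshoots and the parameters diverge
alpha_big, beta_big = batch_gradient_descent(x, y, learning_rate=0.02, n_iter=100)
print('beta after 100 steps with learning_rate=0.02: %.3g' % beta_big)
```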

```python
import numpy as np
# load_boston was removed in scikit-learn 1.2, so this cell assumes scikit-learn 1.0 or 1.1
from sklearn.datasets import load_boston
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
data = load_boston()
# No random_state is set, so the split (and the scores below) vary between runs
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)
print('length of training set: ', len(y_train))
print('length of test set: ', len(y_test))  # train:test = 3:1
# Fit the scalers on the training set only, then apply them to the test set
X_scaler = StandardScaler()
y_scaler = StandardScaler()
X_train = X_scaler.fit_transform(X_train)
X_test = X_scaler.transform(X_test)
# StandardScaler expects 2-D input, so reshape y and flatten it back for SGDRegressor
y_train = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test = y_scaler.transform(y_test.reshape(-1, 1)).ravel()
regressor = SGDRegressor(loss='squared_error')  # named 'squared_loss' before scikit-learn 1.0
scores = cross_val_score(regressor, X_train, y_train, cv=5)
print('Cross validation r-squared scores:', scores)
print('Average cross validation r-squared score:', np.mean(scores))
regressor.fit(X_train, y_train)
print('Test set r-squared score', regressor.score(X_test, y_test))
```

```
length of training set:  379
length of test set:  127
Cross validation r-squared scores: [ 0.67522661  0.69081592  0.76359336  0.63759407  0.85035132]
Average cross validation r-squared score: 0.723516256254
Test set r-squared score 0.623785816774
```


## Summary

### 3 Cases of LR
- Simple LR
- Multiple LR
- Polynomial Regression

### Generalized Linear Model
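- framework that generalizes linear regression: a linear combination of the explanatory variables is related to the response variable through a link function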

### How to minimize the cost function (solve the model, find parameters, ...)
- solve analytically (using Linear Algebra)
- use gradient descent 

### Next Chapter
- how to create features for different types of explanatory variables
- especially categorical variables, text, and images