# Linear Regression

In this notebook we will learn how to use linear models for regression problems:

- Simple linear regression: one response variable and a single explanatory variable
- Multiple linear regression: supports more than one explanatory variable
- Simple polynomial regression: models a nonlinear relationship using one variable
- Multiple polynomial regression: models a nonlinear relationship using multiple variables

## Simple Linear Regression
Simple linear regression models a linear relationship between one response variable and one explanatory variable.

### Let's predict the cost of the Pizza 

<img src="Images/Pizza.jpg" width="70%">

Suppose you want to design an application that predicts the cost of a pizza from its size. Our initial impression is that the larger the pizza, the higher the cost. But what if someone asks you to predict the exact cost of a pizza given its exact size? Because we are interested in predicting a continuous value, this is an example of regression analysis.

#### Supervised Learning 
Regression analysis is a supervised machine learning technique, so we need training data. Let's assume we have the following toy data:
<img src="Images/Toy_data.jpg" width="70%">

In [None]:
import warnings
warnings.filterwarnings('ignore')

Because the data set is very small, let's hardcode the values of X and y.

In [None]:
X = [[6], [8], [10], [14], [18]]
y = [[7], [9], [13], [17.5], [18]]

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
plt.figure()
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.plot(X, y, 'k.')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

# 1. Simple Linear Regression with one variable 

In [None]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

### Let's train the model

In [None]:
model.fit(X, y)
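
For a single explanatory variable, ordinary least squares has a closed-form solution: the slope is the covariance of the inputs and the targets divided by the variance of the inputs, and the intercept follows from the two means. The sketch below recomputes both with NumPy for the toy data. The names `x`, `y_flat`, `slope` and `intercept` are illustrative and not part of the original notebook; the results should agree with `model.coef_` and `model.intercept_` of the fitted model.

```python
import numpy as np

# Toy training data from above, flattened to 1-D arrays
x = np.array([6, 8, 10, 14, 18], dtype=float)
y_flat = np.array([7, 9, 13, 17.5, 18], dtype=float)

# Slope = covariance(x, y) / variance(x); the intercept follows from the means
slope = np.sum((x - x.mean()) * (y_flat - y_flat.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y_flat.mean() - slope * x.mean()

print('slope: %.4f, intercept: %.4f' % (slope, intercept))
```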

### Plotting Regression Line 

In [None]:
plt.figure()
plt.title('Pizza price plotted against diameter')
plt.xlabel('Diameter in inches')
plt.ylabel('Price in dollars')
plt.plot(X, y, 'k.')
plt.plot(X, model.predict(X),'g')
plt.axis([0, 25, 0, 25])
plt.grid(True)
plt.show()

## Your first Machine Learning App is ready..!  

In [None]:
# The widgets moved out of IPython.html into the separate ipywidgets package
from ipywidgets import interact
from IPython.display import display
import warnings
warnings.filterwarnings('ignore')

In [None]:
def PrzzaPricePredictor(dia):
    # predict expects a 2-D input (one row per sample) and returns a 2-D array here
    print ('A %d " pizza should cost: $%.2f'%(dia, model.predict([[dia]])[0][0]))

In [None]:
i = interact(PrzzaPricePredictor, dia=(0,100))

## How good is your model?

<img src="Images/estimating_coefficients.png" width="80%">





In [None]:
import numpy as np
# np.mean gives the mean of the squared residuals (MSE), not their sum
print ('Mean squared error: %.2f' % np.mean((model.predict(X) - y) ** 2))
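
The quantity printed above is an average of squared residuals. As a minimal sketch, the block below separates the residual sum of squares from the mean squared error on the training data, so the two are not confused. The names `X_train`, `y_train`, `reg`, `rss` and `mse` are illustrative assumptions, not names used elsewhere in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy training data as above
X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]

reg = LinearRegression().fit(X_train, y_train)
residuals = np.asarray(y_train) - reg.predict(X_train)

rss = np.sum(residuals ** 2)   # residual sum of squares
mse = np.mean(residuals ** 2)  # mean squared error = RSS / number of samples

print('RSS: %.2f, MSE: %.2f' % (rss, mse))
```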

### Evaluating the model on Test Dataset 

Suppose that you applied this model to a test dataset as shown below:
<img src="Images/Toy_data_test.jpg" width="70%">



In [None]:
X_test = [[8], [9], [11], [16], [12]]
y_test = [[11], [8.5], [15], [18], [11]]


In [None]:
print ('Mean squared error: %.2f' % np.mean((model.predict(X_test) - y_test) ** 2))
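
A squared-error value is hard to interpret on its own because it depends on the scale of the prices. One common complement, shown here only as a sketch, is the coefficient of determination (R-squared), which compares the model's squared error with that of always predicting the mean price. The example uses `r2_score` from `sklearn.metrics`; the variable names are illustrative.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Toy training and test data from above
X_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
X_te = [[8], [9], [11], [16], [12]]
y_te = [[11], [8.5], [15], [18], [11]]

reg = LinearRegression().fit(X_train, y_train)

# R-squared = 1 - (residual sum of squares / total sum of squares)
r2 = r2_score(y_te, reg.predict(X_te))
print('R-squared on the test set: %.4f' % r2)
```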

# 2. Multiple linear regression
From personal experience, you might suspect that the cost of a pizza depends not only on its size but also on other factors, such as the number of toppings. So let's assume that you asked for more data.

<img src="Images/Toy_data2.jpg" width="80%">

Now X has two columns.

In [None]:
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]

### Here we use the same model as before; the only difference is that X now has two columns

In [None]:
model.fit(X, y)
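
With two explanatory variables, the fitted model is still a weighted sum: an intercept plus one coefficient per column of X. The sketch below checks this by reproducing one prediction by hand. The 12-inch, one-topping pizza is a hypothetical sample, and the names `X_multi`, `y_multi`, `reg` and `sample` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Diameter and number of toppings for each training pizza
X_multi = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y_multi = [7, 9, 13, 17.5, 18]

reg = LinearRegression().fit(X_multi, y_multi)

# Hypothetical sample: a 12-inch pizza with one topping
sample = np.array([12, 1])

# Prediction by hand: intercept + coefficient-weighted sum of the features
manual = reg.intercept_ + np.dot(reg.coef_, sample)
predicted = reg.predict([sample])[0]

print('manual: %.4f, predict(): %.4f' % (manual, predicted))
```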

In [None]:
def PrzzaTopPricePredictor(dia,Top):
    # predict expects a 2-D input: one row holding both features
    print ('A %d " with %d Topping pizza should cost: $%.2f'%(dia, Top, model.predict([[dia, Top]])[0][0]))

### The new app looks like this

In [None]:
i = interact(PrzzaTopPricePredictor, dia=(0,100), Top=(0,5))

### Evaluating the fitness of a model with a cost function

In [None]:
print ('Mean squared error: %.2f' % np.mean((model.predict(X) - y) ** 2))

# 3. Polynomial regression with One Variable 

In [None]:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

In [None]:
def PolynomialRegression(degree):
    
    X = [[6], [8], [10], [14], [18]]
    y = [[7], [9], [13], [17.5], [18]]
    
    # Simple linear regression first
    regressor = LinearRegression()
    regressor.fit(X, y)
    xx = np.linspace(0, 26, 100)
    yy = regressor.predict(xx.reshape(xx.shape[0], 1))
    
    quadratic_featurizer = PolynomialFeatures(degree)
    X_quadratic = quadratic_featurizer.fit_transform(X)
    
    regressor_quadratic = LinearRegression()
    regressor_quadratic.fit(X_quadratic, y)
    xx_quadratic = quadratic_featurizer.transform(xx.reshape(xx.shape[0], 1))
    
    print ('Mean squared error: %.2f' % np.mean((regressor_quadratic.predict(X_quadratic) - y) ** 2))
    
    plt.plot(xx, yy)
    plt.plot(xx, regressor_quadratic.predict(xx_quadratic), c='r',linestyle='--')
    plt.title('Pizza price regressed on diameter')
    plt.xlabel('Diameter in inches')
    plt.ylabel('Price in dollars')
    plt.axis([0, 25, 0, 25])
    plt.grid(True)
    plt.scatter(X,y)
    plt.show()

    
    print(X_quadratic)

In [None]:
i = interact(PolynomialRegression, degree=(0,10))
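
Polynomial regression here is ordinary linear regression applied to expanded features. As a small illustration, the sketch below shows what `PolynomialFeatures` produces for degree 2 with a single input column: a constant term, the original value, and its square. The names `featurizer`, `X_demo` and `X_poly` are illustrative.

```python
from sklearn.preprocessing import PolynomialFeatures

featurizer = PolynomialFeatures(degree=2)

# Two sample diameters from the toy data
X_demo = [[6], [8]]

# Each row becomes [1, x, x^2]
X_poly = featurizer.fit_transform(X_demo)
print(X_poly)
```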

# Test Error 

In [None]:
def PolyTestError(degree):
    
    X = [[6], [8], [10], [14], [18]]
    y = [[7], [9], [13], [17.5], [18]]
    
    quadratic_featurizer = PolynomialFeatures(degree)
    X_quadratic = quadratic_featurizer.fit_transform(X)
    
    regressor_quadratic = LinearRegression()
    regressor_quadratic.fit(X_quadratic, y)
    
    
    X_test = [[8], [9], [11], [16], [12]]
    y_test = [[11], [8.5], [15], [18], [11]]
    # Reuse the featurizer fitted on the training data to transform the test set
    X_quadratic_test = quadratic_featurizer.transform(X_test)
    print ('Mean squared error: %.2f' % np.mean((regressor_quadratic.predict(X_quadratic_test) - y_test) ** 2))
    

In [None]:
i = interact(PolyTestError, degree=(0,5))
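
To compare degrees without the slider, the following sketch loops over degrees 1 to 4 and records the mean squared error on both the training and the test data. With only five training points, a degree-4 polynomial can pass through every one of them, so its training error is close to zero; that alone says nothing about how well it predicts unseen pizzas. The dictionaries `train_mse` and `test_mse` and the other names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X_train = [[6], [8], [10], [14], [18]]
y_train = np.array([[7], [9], [13], [17.5], [18]])
X_te = [[8], [9], [11], [16], [12]]
y_te = np.array([[11], [8.5], [15], [18], [11]])

train_mse, test_mse = {}, {}
for degree in range(1, 5):
    # Fit the feature expansion on the training data only
    featurizer = PolynomialFeatures(degree)
    reg = LinearRegression().fit(featurizer.fit_transform(X_train), y_train)

    # Apply the same expansion to both data sets before predicting
    train_mse[degree] = np.mean((reg.predict(featurizer.transform(X_train)) - y_train) ** 2)
    test_mse[degree] = np.mean((reg.predict(featurizer.transform(X_te)) - y_te) ** 2)
    print('degree %d: train MSE %.2f, test MSE %.2f'
          % (degree, train_mse[degree], test_mse[degree]))
```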

# 4. Polynomial Regression with Multiple Variables 

In [None]:
X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
y = [[7], [9], [13], [17.5], [18]]

In [None]:
quadratic_features = PolynomialFeatures(2)
X_quadratic=quadratic_features.fit_transform(X)

In [None]:
X_quadratic[0]
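
With two input columns, a degree-2 expansion also includes the interaction term between them. The sketch below expands the first training row, a 6-inch pizza with 2 toppings, so the column order can be read off directly. The name `row` is illustrative.

```python
from sklearn.preprocessing import PolynomialFeatures

# Expand one sample with inputs a = 6 (diameter) and b = 2 (toppings)
row = PolynomialFeatures(degree=2).fit_transform([[6, 2]])[0]

# Column order for two inputs a, b: 1, a, b, a^2, a*b, b^2
print(row)
```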

In [None]:
model=LinearRegression()
model.fit(X_quadratic,y)

# Train Error 

In [None]:
print ('Mean squared error: %.2f' % np.mean((model.predict(X_quadratic) - y) ** 2))

# Test Error

In [None]:
X_test = [[8,0], [9,0], [11,0], [16,0], [12,0]]
y_test = [[11], [8.5], [15], [18], [11]]

In [None]:
# Reuse the featurizer already fitted on the training data
X_quadratic_test = quadratic_features.transform(X_test)

In [None]:
print ('Mean squared error: %.2f' % np.mean((model.predict(X_quadratic_test) - y_test) ** 2))

# Which Model to Use?

<img src="Images/Stay_Tuned.jpg" width="80%">