<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practicing loss functions and regression metrics

_Instructor: Aymeric Flaisler_

---


In this notebook you will fit regressions under different loss functions and compute the regression metrics that correspond to each loss.

## 1. Load packages

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

import matplotlib as mpl
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

%matplotlib inline



## 2. Plotting functions

Here are functions to make visualizing your regressions easier:

In [2]:
def plot_regression(x, y, model):
    plt.figure(figsize=(6,4))
    axes = plt.gca()
    
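    # np.asarray handles params stored as either an ndarray or a pandas Series
    intercept, slope = np.asarray(model.params)[:2]

    for x_, y_ in zip(x, y):
        # dashed vertical segment from each observation to the fitted line
        plt.plot((x_, x_), (y_, x_*slope + intercept), c='k', ls='dashed', lw=1)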
        
    plt.scatter(x, y, s=70, c='steelblue')
    
    x_points = np.linspace(axes.get_xlim()[0], axes.get_xlim()[1], 100)
    
    regline_x = x_points
    regline_y = x_points*slope + intercept

    plt.plot(regline_x, regline_y, c='darkred', lw=3.5)

    plt.show()
    
def plot_residuals(model,loss='leastsquares'):
    
    resids = model.resid
    
    resid_lim = np.max([abs(np.min(resids)), abs(np.max(resids))]) + 1
    
    resid_points = np.linspace(-resid_lim, resid_lim, 200)
    
    plt.figure(figsize=(6,4))

    if loss == 'leastsquares':
        for r in resids:

            plt.plot((r, r), (0, r**2), c='k', ls='dashed', lw=2)
        
        plt.plot(resid_points, resid_points**2, c='gold', alpha=0.7)
        plt.show()
        
    elif loss == 'lad':
        for r in resids:

            plt.plot((r, r), (0, np.abs(r)), c='k', ls='dashed', lw=2)
        
        plt.plot(resid_points, np.abs(resid_points), c='green', alpha=0.7)
        plt.show()
    
    elif loss == 'both':
        for r in resids:
            
            plt.plot((r, r), (0, r**2), c='k', ls='dashed', lw=2)
            plt.plot((r, r), (0, np.abs(r)), c='k', ls='dashed', lw=2)
            
        plt.plot(resid_points, resid_points**2, c='gold', alpha=0.7)
        plt.plot(resid_points, np.abs(resid_points), c='green', alpha=0.7)
        plt.show()
    
    else:
        print('Options are leastsquares, lad, or both.')

## 3. Load the autostats dataset

In [1]:
df = pd.read_csv('./datasets/Auto.csv')

In [2]:
df.shape

(397, 9)

In [3]:
df.head()

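|   | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
|---|-----|-----------|--------------|------------|--------|--------------|------|--------|------|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |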


## 4. Choose a continuous response variable and predictor variable from the dataset

You might have to clean the data.
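
As a hedged sketch of what cleaning might involve: a numeric column sometimes loads as strings because missing values are coded with a placeholder. The mini-frame below is hypothetical (its values and the `'?'` placeholder are assumptions for illustration, not taken from `Auto.csv`), so check `df.dtypes` on the real data first.

```python
import pandas as pd

# Hypothetical mini-frame: values are illustrative, not read from Auto.csv.
# The '?' placeholder is an assumption about how a missing value might be coded.
toy = pd.DataFrame({
    'mpg': [18.0, 15.0, 25.0, 16.0],
    'horsepower': ['130', '165', '?', '150'],
})

# Coerce to numeric; anything unparseable (such as '?') becomes NaN.
toy['horsepower'] = pd.to_numeric(toy['horsepower'], errors='coerce')

# Drop rows with missing values in the columns you plan to model.
clean = toy.dropna(subset=['mpg', 'horsepower']).reset_index(drop=True)

print(clean.dtypes)
print(clean.shape)
```

The same two steps (`pd.to_numeric` with `errors='coerce'`, then `dropna`) can be applied to whichever response and predictor columns you choose.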

## 5. Build a least squares regression model predicting your response from your predictors

Hint: use the code from the lesson

## 6. Plot the least squares regression

You can use the ```plot_regression(x, y, model)``` function provided.

## 7. Build a least absolute deviation quantreg model on the same sample

LAD regression is quantile regression at the median (q = 0.5). For background on quantile regression, see: http://data.library.virginia.edu/getting-started-with-quantile-regression/

## 8. Plot the LAD regression

## 9. Calculate the RMSE and the MAE between your response and the predicted response

**RMSE** is the **root mean squared error**. It is the metric for regression performance that corresponds to the least squares loss.

$$\text{RMSE} = \sqrt{\frac{\sum_{i}{\left(\hat{y}_i - y_i \right)^2}}{n}}$$

https://en.wikipedia.org/wiki/Root-mean-square_deviation

**MAE** is the **mean absolute error**. It is the metric for regression performance that corresponds to the least absolute deviation loss.

$$\text{MAE} = \frac{\sum_{i}{|\hat{y}_i - y_i |}}{n}$$

https://en.wikipedia.org/wiki/Mean_absolute_error

In [6]:
from sklearn.metrics import mean_squared_error, mean_absolute_error
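
As a sanity check on the two formulas, here is a small sketch that computes both metrics directly with NumPy. The `y_true` and `y_pred` arrays are made-up illustration values, not model output.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_pred - y_true

# RMSE: square the errors, average them, then take the square root
rmse = np.sqrt(np.mean(errors ** 2))

# MAE: average of the absolute errors
mae = np.mean(np.abs(errors))

print('RMSE:', rmse)
print('MAE:', mae)
```

With scikit-learn, `mean_absolute_error(y_true, y_pred)` gives the MAE directly, while `mean_squared_error(y_true, y_pred)` returns the MSE, so wrap it in `np.sqrt` to get the RMSE.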

## 10. [BONUS] Create a quantile regression with q = 0.25 and plot it. What exactly is this regression predicting?

When q=0.25, the quantile regression is **predicting the 25th percentile of the response variable given the predictors**.
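
One way to see why: quantile regression minimizes the asymmetric "pinball" (check) loss. The sketch below is a simplified illustration with no predictors. It searches over constant predictions on made-up data and shows that the constant minimizing the q = 0.25 pinball loss is the sample's 25th percentile. The function name `pinball_loss` is a placeholder, not a library API.

```python
import numpy as np

def pinball_loss(y, pred, q):
    """Mean pinball (check) loss of a prediction at quantile q."""
    resid = y - pred
    # Under-predictions (resid >= 0) are weighted by q,
    # over-predictions (resid < 0) by (1 - q).
    return np.mean(np.where(resid >= 0, q * resid, (q - 1) * resid))

# Made-up sample: the values 1, 2, ..., 101
y = np.arange(1.0, 102.0)

# Try every observed value as a constant prediction
losses = [pinball_loss(y, c, 0.25) for c in y]
best = y[np.argmin(losses)]

print('Constant minimizing the q=0.25 pinball loss:', best)
print('25th percentile of y:', np.percentile(y, 25))
```

At q = 0.25, predicting too low costs one third as much as predicting too high, which pushes the fitted line down until roughly a quarter of the observations lie below it.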