# Demo on Evaluation Metrics for Regression and Classification Modeling


## Part I. Regression Metrics

Three common metrics for evaluating regression models are:
- **Mean Absolute Error (MAE)**
- **Mean Squared Error (MSE)**
- **R² Score** (coefficient of determination)
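For reference, these are the standard textbook definitions for $n$ test samples with true values $y_i$, predictions $\hat{y}_i$ and mean true value $\bar{y}$. sklearn's versions compute the same quantities and additionally accept sample weights and multi-output targets.

$$
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

Lower is better for MAE and MSE; $R^2$ is 1 for a perfect fit, 0 for a model that always predicts $\bar{y}$, and can be negative for a model worse than that.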

sklearn implements all three in `sklearn.metrics`:

`from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error`
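As a quick sanity check before using the Boston data, the sketch below calls the three functions on a tiny hand-made example (the four values are made up for illustration). All three take `(y_true, y_pred)` in that order; the order matters for `r2_score`, because $\bar{y}$ is computed from the first argument.

```python
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Tiny illustrative example (values chosen by hand, not from the dataset)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0 + 1) / 4
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0 + 1) / 4
r2 = r2_score(y_true, y_pred)              # 1 - 1.5 / 29.1875

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, R^2 = {r2:.3f}")
# prints: MAE = 0.500, MSE = 0.375, R^2 = 0.949
```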

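For demonstration purposes, I will use the Boston Housing dataset that ships with sklearn (`load_boston`). Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older scikit-learn version to run as written. Follow the **10 STEPS** below.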


**STEP 1: Read in the dataset and set up the training and testing data that will be used for the rest of this task.**

```python
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

boston = load_boston()
y = boston.target
X = boston.data

X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.33, random_state=42)
```
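`test_size=0.33` holds out about a third of the rows for testing, and `random_state=42` fixes the shuffle so the split is the same on every run. The sketch below assumes a random synthetic array with the same shape as the Boston data (506 rows, 13 features) in place of `boston.data`, so it also runs on scikit-learn versions without `load_boston`; only the resulting shapes are meaningful.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the Boston data's shape (values are random)
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# The test set gets ceil(0.33 * 506) = 167 rows; the other 339 go to training
print(X_train.shape, X_test.shape)  # (339, 13) (167, 13)

# Same random_state -> identical split on a second call
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.33, random_state=42)
print(np.array_equal(X_test, X_test2))  # True
```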

```python
# Take a quick look at the dataset description
print(boston['DESCR'])
```

```text
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu
```

**STEP 2: Import FOUR regression models from sklearn**

Here, we choose:
- `RandomForestRegressor` and `AdaBoostRegressor` from `sklearn.ensemble`;
- `LinearRegression` from `sklearn.linear_model`;
- `DecisionTreeRegressor` from `sklearn.tree`.

```python
# Notice: be sure to choose the regressor version (not the classifier version)
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Import the built-in metrics for regression models in sklearn
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
```
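One way to double-check that you imported the regressor rather than the classifier version is `sklearn.base.is_regressor`, which inspects the estimator's declared type. This check is an optional addition, not part of the original steps; `RandomForestClassifier` is included only as a counterexample.

```python
from sklearn.base import is_regressor
from sklearn.ensemble import (AdaBoostRegressor, RandomForestClassifier,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

models = [DecisionTreeRegressor(), RandomForestRegressor(),
          AdaBoostRegressor(), LinearRegression()]

# All four chosen models should report True
for m in models:
    print(type(m).__name__, is_regressor(m))

# The classifier counterpart is not a regressor
print(is_regressor(RandomForestClassifier()))  # False
```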

**STEP 3: Instantiate each of the four regressors and use the defaults for all the hyperparameters.**

```python
tree_mod = DecisionTreeRegressor()
rf_mod = RandomForestRegressor()
ada_mod = AdaBoostRegressor()
reg_mod = LinearRegression()
```
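To see what "the defaults" actually are, `get_params()` returns each estimator's constructor arguments. One default worth knowing: the three tree-based models have `random_state=None`, so unless NumPy's global random state is seeded, their scores can vary slightly between runs. Passing something like `random_state=42` (a suggestion, not part of the original steps) would make them reproducible.

```python
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

tree_mod = DecisionTreeRegressor()
rf_mod = RandomForestRegressor()
ada_mod = AdaBoostRegressor()
reg_mod = LinearRegression()

print(rf_mod.get_params()["n_estimators"])   # 100 trees (default since scikit-learn 0.22)
print(ada_mod.get_params()["n_estimators"])  # 50 boosting rounds
print(tree_mod.get_params()["max_depth"])    # None: no depth limit on the tree
print(reg_mod.get_params()["fit_intercept"])  # True

# random_state=None for the three randomized models
for m in (tree_mod, rf_mod, ada_mod):
    print(type(m).__name__, m.get_params()["random_state"])
```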

**STEP 4: Fit the instantiated models on the training data.**

```python
tree_mod.fit(X_train, y_train)
rf_mod.fit(X_train, y_train)
ada_mod.fit(X_train, y_train)
reg_mod.fit(X_train, y_train)
```

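In a notebook, a cell ending with `reg_mod.fit(...)` displays `LinearRegression()` because `fit` returns the estimator itself, now trained in place. The sketch below shows this on synthetic data from `make_regression` (an assumption standing in for `X_train` and `y_train`).

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic regression data with 13 features, standing in for the Boston training set
X_train, y_train = make_regression(n_samples=100, n_features=13,
                                   noise=1.0, random_state=0)

reg_mod = LinearRegression()
fitted = reg_mod.fit(X_train, y_train)

print(fitted is reg_mod)    # True: fit() returns the same object
print(reg_mod.coef_.shape)  # (13,): one learned coefficient per feature
```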

**STEP 5: Use each of the fitted models to predict on the testing data.**

```python
preds_tree = tree_mod.predict(X_test)
preds_rf = rf_mod.predict(X_test)
preds_ada = ada_mod.predict(X_test)
preds_reg = reg_mod.predict(X_test)
```
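Because all four models share the same `fit`/`predict` interface, Steps 3 to 5 can also be written as a loop over a dictionary. This is a compact alternative sketch, not the notebook's code: it uses synthetic `make_regression` data and sets `random_state=42` on the randomized models (both assumptions). Each `predict` call returns a 1-D array with one prediction per test row, which is exactly what the metric functions expect as `y_pred`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the Boston arrays
X, y = make_regression(n_samples=300, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

models = {
    "tree": DecisionTreeRegressor(random_state=42),
    "rf": RandomForestRegressor(random_state=42),
    "ada": AdaBoostRegressor(random_state=42),
    "reg": LinearRegression(),
}

# fit() returns the model, so fit and predict can be chained
preds = {name: m.fit(X_train, y_train).predict(X_test)
         for name, m in models.items()}

for name, p in preds.items():
    print(name, p.shape, y_test.shape)  # shapes match: one prediction per test row
```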

**STEP 6: Implement these THREE metrics yourself to understand how sklearn computes them under the hood.**