# Exercise: Follow Up on Advertising Data Set; Cross Validation and Parameter Optimization

We have discussed the importance of cross-validation (CV) and parameter optimization
for model evaluation and tuning.

The goal of this exercise is to familiarize yourself with the corresponding utilities provided 
in scikit-learn.

Answer the questions below.
The code snippets already contained in the notebook will provide you with hints.

- **For each question, give the answer by adding it to this cell.**
- **Submit your answers through this [form](https://forms.gle/8aXAk1oMB4Kn4tDb8).**


## Questions
### Advertising Follow Up
1. Why did we decide to scale the inputs?
1. We observed a positive coefficient for TV but a negative one for TV^2.
   What does this mean?
1. Using one of the models, describe and interpret the effect on sales of spending
   - no money
   - an ever-increasing amount of money

   Is this behaviour reasonable?

### Parameter Optimization I
1. What is the optimal degree you obtain after performing a cross-validated grid search?
1. What coefficients do you obtain after performing a cross-validated grid search?

### Parameter Optimization II
1. What does the decision region plot tell you?
1. What is the optimal parameter setting you obtain for the `DecisionTreeClassifier` after performing a cross-validated grid search?
1. How do the decision regions differ between the default and the cross-validated model?
1. Bonus: How does the decision region differ when using a `RandomForestClassifier`? Any ideas why it looks different?

## Answers
### Advertising Follow Up
1. TBA
1. TBA
1. TBA

### Parameter Optimization I
1. TBA
1. TBA

### Parameter Optimization II
1. TBA
1. TBA
1. TBA
1. TBA

## Examples

In [None]:
from sklearn.datasets import make_moons
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import numpy as np

%matplotlib inline

# set the default figure size after activating the inline backend
plt.rcParams['figure.figsize'] = (10, 6)

### Parameter Optimization I
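
As a hint for the cells below, here is a minimal sketch of a cross-validated grid search over the polynomial degree. It runs on hypothetical synthetic data (a noisy cubic invented for illustration, not the exercise data), and all `demo_*` names are assumptions made for this sketch. With a regressor and no `scoring` argument, `GridSearchCV` ranks the candidates by the estimator's default score, which is R² for `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# hypothetical demo data: a noisy cubic, not the exercise data
rng = np.random.default_rng(0)
demo_x = rng.uniform(-2, 2, size=80).reshape(-1, 1)
demo_noise = rng.normal(scale=0.3, size=80)
demo_y = 1.0 - 2.0 * demo_x.ravel() + 0.5 * demo_x.ravel() ** 3 + demo_noise

demo_model = Pipeline([('polynomial_features', PolynomialFeatures()),
                       ('regression', LinearRegression(fit_intercept=False))])

# '<step name>__<parameter name>' addresses a parameter of a pipeline step
demo_grid = {'polynomial_features__degree': [1, 2, 3, 4, 5]}

# the estimator and the parameter grid are the first two arguments
demo_search = GridSearchCV(demo_model, demo_grid, cv=5, refit=True)
# nothing is evaluated until fit is called
demo_search.fit(demo_x, demo_y)

print(demo_search.best_params_)
print(demo_search.best_estimator_.named_steps.regression.coef_)
```

Since `PolynomialFeatures` adds a bias column by default and the regression is fit without an intercept, the coefficient array has one entry per power of the input, starting with the constant term.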

In [None]:
# given data
x = np.array([-1.76128841, -1.63158024, -1.53752642, -1.51748534, -1.29819298,
       -1.27003308, -1.09259419, -1.08694708, -0.99817854, -0.85544266,
       -0.82514381, -0.78351684, -0.75095511, -0.73085807, -0.70816434,
       -0.62894466, -0.62728794, -0.55284538, -0.43152993, -0.40782298,
       -0.34069515, -0.33191116, -0.30757416, -0.29667884, -0.29459477,
       -0.27654895, -0.26519531, -0.24571102, -0.07627239, -0.06786294,
       -0.02525961,  0.0073467 ,  0.04168935,  0.07794048,  0.1262055 ,
        0.12731035,  0.18027203,  0.20525908,  0.41224051,  0.44409404,
        0.4515781 ,  0.49581181,  0.5239045 ,  0.53760383,  0.72520306,
        0.73931895,  0.78587674,  0.87787588,  0.88977353,  0.8978213 ,
        0.91619883,  0.95198162,  1.3053632 ,  1.39772718,  1.46523663,
        1.50182737,  1.57355665,  1.77664007,  1.92305679,  1.94223914]).reshape((-1, 1))

y = np.array([12.52002464, 10.19783868,  9.30437687,  9.37500655,  7.94905528,
        3.81413555,  5.74304708,  3.4380946 ,  2.76820418,  3.46115356,
        1.73343419,  4.08605565,  3.12061054,  3.3085446 ,  1.19416302,
        2.51087828,  2.41009238,  1.27328361,  1.09926486,  1.96093725,
        1.24933231,  1.7335095 ,  1.49818064,  1.27902629,  1.90791481,
        1.75213245,  1.20290468,  0.87234103,  1.02297036,  1.04514318,
        1.04662249,  1.00726388,  0.93594893,  1.00951318,  0.93474991,
        0.98445663,  0.59479965,  0.74519815,  0.26032586,  0.38746046,
        0.68397116,  0.95859012, -0.21909301, -0.19769223, -0.09708284,
        0.88095766,  1.435611  ,  0.40325439,  1.45902219, -0.22903704,
       -0.51728928, -0.46341346, -2.54312459,  0.6652953 , -2.40960325,
       -1.31908844, -1.49930838, -3.07503961, -8.16209329, -0.7027068 ])

In [None]:
# optimize the degree parameter in PolynomialFeatures
model = Pipeline([('polynomial_features', PolynomialFeatures()), 
                  ('regression', LinearRegression(fit_intercept=False))])
# by evaluating the grid
param_grid = {'polynomial_features__degree': [1, 2, 3, 4, 5]}
# using a cross-validated search

grid_search_cv = GridSearchCV(#TODO, 
                              #TODO, 
                              cv=5, 
                              refit=True)
# some missing pieces are marked as #TODO, others you have to fill in on your own

In [None]:
grid_search_cv.best_params_

In [None]:
best_model = grid_search_cv.best_estimator_
best_model.named_steps.regression.coef_

### Parameter Optimization II
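
The same pattern carries over to classifiers. The sketch below tunes `max_depth` and `min_samples_split` of a `DecisionTreeClassifier` on hypothetical demo data (`make_circles`, chosen only for illustration, not the exercise data). The grid values and all `demo_*` names are assumptions for this sketch rather than recommended settings. For a classifier, `GridSearchCV` with an integer `cv` uses stratified folds and, without a `scoring` argument, ranks the candidates by accuracy.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# hypothetical demo data, not the exercise data
demo_X, demo_y = make_circles(n_samples=300, noise=0.2, factor=0.5, random_state=0)
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, random_state=0)

# illustrative grid; the values are assumptions, not recommended settings
demo_grid = {'max_depth': [1, 2, 3, 5, None],
             'min_samples_split': [2, 5, 10, 20]}

demo_search = GridSearchCV(DecisionTreeClassifier(random_state=0), demo_grid, cv=5)
# the search only sees the training data; the test split stays untouched
demo_search.fit(demo_X_train, demo_y_train)

print(demo_search.best_params_)
# mean cross-validated accuracy of the best parameter combination
print(demo_search.best_score_)
# accuracy of the refit best model on the held-out test data
print(demo_search.score(demo_X_test, demo_y_test))
```

Because `refit=True` is the default, `demo_search.best_estimator_` is already refit on the whole training split with the best parameters, so it can be passed to `plot_decision_regions` in the same way as the default model.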

In [None]:
# given data
X, y = make_moons(n_samples=200, noise=0.3, random_state=123)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y)
ax.set_xlabel('x1')
ax.set_ylabel('x2')

In [None]:
# train test split for demo purposes
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [None]:
# model fit for demo purposes
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

In [None]:
# example
fig, ax = plt.subplots()
plot_decision_regions(X_test, y_test, model, ax=ax)
ax.set_xlabel('x1')
ax.set_ylabel('x2')

In [None]:
# similar to above
# use GridSearchCV to optimize the model parameters max_depth and min_samples_split
grid_search_cv = GridSearchCV(#TODO, 
                              #TODO, 
                              cv=5)

In [None]:
# retrieve best parameter combination as
grid_search_cv.best_params_

In [None]:
# with refit=True (the default), the best estimator has already been refit
# on the data passed to grid_search_cv.fit, using the best parameters from above
best_model = grid_search_cv.best_estimator_