# Learning Unit 6 - Hyper-parameters - Example

Almost done, keep up your courage! I hear there is beer after this. Anyway. 

You might have noticed that we keep printing `cross val scores` for each set of hyper-parameters we try. This has probably raised a certain twitch in the engineering-minded among us: _"Why am I clicking away at things? I can search this space programmatically!"_

Yes. Yes you can. Let's go do that then. 

In [None]:
cd ..

In [None]:
from sklearn.tree import DecisionTreeClassifier
from bokeh.plotting import figure, output_notebook
from utils import load_data, visualizations
from sklearn.model_selection import GridSearchCV
output_notebook()

Again, let's start with our ``Ying-Yang``:

In [None]:
data = load_data.get_ying_yang(n_points=1000)

This time, let's fit a tree to it. Let's choose some parameters without thinking too much about it.

In [None]:
t = DecisionTreeClassifier(max_depth=2, min_samples_split=6)
visualizations.plot_data(model=t, 
                         data=data, 
                         feature1='a', 
                         feature2='b', 
                         target='c', 
                         out_of_sample=True)

Well, we didn't do a particularly great job. We can definitely do better than this. 

Let's create a parameter grid of ``max_depth`` and ``min_samples_split``:

In [None]:
param_grid = {
    'max_depth': [1, 2, 4, 8, 16, 32, 64], 
    'min_samples_split': [2, 4, 8, 16, 32, 64, 128, 256]
}
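
To get a feel for how big this search space is, here is a minimal sketch that expands the grid into every combination using only the standard library. The variable names `names` and `combinations` are made up for illustration; this is not how the grid search is implemented internally, just a way to see what "every combination" means.

```python
from itertools import product

param_grid = {
    'max_depth': [1, 2, 4, 8, 16, 32, 64],
    'min_samples_split': [2, 4, 8, 16, 32, 64, 128, 256],
}

# Cartesian product of all value lists: one dict per candidate setting
names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*param_grid.values())]

print(len(combinations))   # 7 * 8 = 56 candidates
print(combinations[0])     # the first candidate in the grid
```

Each of these candidates gets cross-validated, so the cost grows quickly as you add parameters or values.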

Now, we can [Grid Search](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) it. There are definitely smarter ways to do this (following gradients, etc.), but let's go for the more intuitive version: brute force search.

In [None]:
gs = GridSearchCV(estimator=t,              # <-- gs will behave as if it were a classifier
                  param_grid=param_grid)

gs.fit(data[['a', 'b']], data['c']);        # <-- here it runs on our grid of possible hyper-parameters
                                            # and stores the cross-val scores
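
If you are wondering what "brute force" means here, the following is a rough sketch of the same idea written as a plain loop: cross-validate every combination and remember the best one. Note the assumptions: it uses scikit-learn's `make_moons` as stand-in data (not our Ying-Yang set), fixes `random_state=0` and `cv=5` so the runs are comparable, and only approximates what `GridSearchCV` does for us.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): two interleaved half-moons, NOT the Ying-Yang set
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

param_grid = {
    'max_depth': [1, 2, 4, 8, 16, 32, 64],
    'min_samples_split': [2, 4, 8, 16, 32, 64, 128, 256],
}

# Manual brute force: score every combination, keep the best mean CV score
best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):
    tree = DecisionTreeClassifier(random_state=0, **params)
    score = cross_val_score(tree, X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

# The same search, delegated to GridSearchCV
gs = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
gs.fit(X, y)

print('manual:      ', best_params, round(best_score, 3))
print('GridSearchCV:', gs.best_params_, round(gs.best_score_, 3))
```

Both approaches should land on the same best score; the grid search object simply saves us the bookkeeping.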

Let's look at the result:

In [None]:
visualizations.plot_hyper_parameters(gs)

Looks as though the best results are around ``max_depth==8`` and ``min_samples_split==32``.

Let's try that: 

In [None]:
t = DecisionTreeClassifier(max_depth=8, min_samples_split=32)
visualizations.plot_data(model=t, 
                         data=data, 
                         feature1='a', 
                         feature2='b', 
                         target='c', 
                         out_of_sample=True, 
                         probabilities=False)

It's easy to underestimate the importance of observing our results. After all, couldn't we just programmatically use the best estimator based on a certain metric? While that is possible, the combination of human "this doesn't look right" intuition with hard optimization metrics generally works better than either on its own.
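
For completeness, here is a small sketch of the "just use the best estimator" route. It again assumes stand-in data from `make_moons` rather than our Ying-Yang set, and a fixed `random_state`; the attributes `best_params_`, `best_score_`, `best_estimator_` and `cv_results_` are what a fitted `GridSearchCV` exposes.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): NOT the Ying-Yang set used above
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

param_grid = {
    'max_depth': [1, 2, 4, 8, 16, 32, 64],
    'min_samples_split': [2, 4, 8, 16, 32, 64, 128, 256],
}

gs = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
gs.fit(X, y)

# The winner according to mean cross-validation score
print(gs.best_params_, round(gs.best_score_, 3))

# With the default refit=True, the best setting is refit on all the data
best_tree = gs.best_estimator_
predictions = best_tree.predict(X[:5])

# Peek at the top three candidates, so close runners-up are visible too
order = np.argsort(gs.cv_results_['rank_test_score'])[:3]
for i in order:
    print(gs.cv_results_['params'][i],
          round(gs.cv_results_['mean_test_score'][i], 3))
```

Looking at the runners-up is a cheap sanity check: if several quite different settings score almost the same, the "best" one may not mean much, which is one more reason to also look at the plots.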

And now, to our final exercise, and then beers!