<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Tweaking-&amp;-Adjusting-Your-Model" data-toc-modified-id="Tweaking-&amp;-Adjusting-Your-Model-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Tweaking &amp; Adjusting Your Model</a></span></li><li><span><a href="#Grid-Search:-Find-the-best-for-us!" data-toc-modified-id="Grid-Search:-Find-the-best-for-us!-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Grid Search: Find the best for us!</a></span></li><li><span><a href="#Basic-Grid-Search" data-toc-modified-id="Basic-Grid-Search-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Basic Grid Search</a></span></li><li><span><a href="#BAD-Grid-Search!" data-toc-modified-id="BAD-Grid-Search!-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>BAD Grid Search!</a></span><ul class="toc-item"><li><span><a href="#Example-of-leaking-information" data-toc-modified-id="Example-of-leaking-information-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Example of leaking information</a></span></li><li><span><a href="#Example-of-Grid-Search-with-no-leakage" data-toc-modified-id="Example-of-Grid-Search-with-no-leakage-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Example of Grid Search with no leakage</a></span></li></ul></li></ul></div>

# Tweaking & Adjusting Your Model

Whether a machine learning model performs well often comes down to a lot of tweaking...

![Pile of data to stir (https://xkcd.com/1838/)](images/machine_learning_xkcd.png)

You can think of hyperparameters as little dials: settings we choose before training that make it easier (or harder) for the machine learning model to learn.

![](images/dials.png)

But how do we know what to adjust them to?!

# Grid Search: Find the best for us!

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

Grid search tries every combination of the hyperparameter values we list for the given model(s) and keeps the combination with the best cross-validated score.
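To get a feel for what "searching over multiple hyperparameters" means, here is a minimal sketch that only enumerates the candidates, using the standard library. `expand_grid` is a hypothetical helper written for this illustration and is not part of scikit-learn; `GridSearchCV` additionally cross-validates each candidate.

```python
from itertools import product

# Hyperparameter name -> candidate values to try
parameters = {
    'kernel': ['linear', 'rbf'],
    'C': [1, 10, 50],
}

def expand_grid(grid):
    """Return every combination of the candidate values as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

candidates = expand_grid(parameters)
for candidate in candidates:
    print(candidate)
print(len(candidates), "candidates to evaluate")
```

The number of candidates is the product of the list lengths (2 x 3 here), so adding more dials or more values per dial makes the search grow quickly.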

# Basic Grid Search

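In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate values to try for each hyperparameter
parameters = {
    'kernel': ['linear', 'rbf'],
    'C': [1, 10, 50]
}

clf_sv = SVC()
# Cross-validates every combination, then refits the best one on all of X_train
clf = GridSearchCV(clf_sv, parameters)
clf.fit(X_train, y_train)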
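
Once a search has been fitted, we can ask it what it found. The sketch below is self-contained, so it uses the iris data set as a stand-in (an assumption, since this notebook's own `X_train` and `y_train` are defined elsewhere) and the variable name `search` instead of `clf`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data (an assumption; swap in your own X_train / y_train)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

parameters = {'kernel': ['linear', 'rbf'], 'C': [1, 10, 50]}
search = GridSearchCV(SVC(), parameters)
search.fit(X_train, y_train)

print(search.best_params_)                 # the winning combination
print(search.best_score_)                  # its mean cross-validated score
print(len(search.cv_results_['params']))   # how many combinations were tried
```

`cv_results_` holds the per-combination scores, which is handy when two settings score almost the same and the simpler one is preferable.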

# BAD Grid Search!

Note we still have to be careful when performing a grid search!

We can accidentally "leak" information by fitting transformations on the **whole data set**, instead of just the **training set**! During cross-validation, that means fitting them only on the training portion of each split.
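A tiny sketch of why this matters, using made-up numbers and only the standard library: the mean and standard deviation a scaler would learn change once the held-out point is included, so the "unseen" data has already influenced the preprocessing.

```python
from statistics import mean, pstdev

# Toy single-feature data (made up for illustration)
values = [1.0, 2.0, 3.0, 4.0, 100.0]

# Pretend the last point is the validation fold in one cross-validation split
train_fold, validation_fold = values[:4], values[4:]

# Leaky: statistics computed on everything, validation point included
leaky_mean, leaky_std = mean(values), pstdev(values)

# Clean: statistics computed on the training fold only
clean_mean, clean_std = mean(train_fold), pstdev(train_fold)

print(f"leaky mean={leaky_mean:.2f}, std={leaky_std:.2f}")
print(f"clean mean={clean_mean:.2f}, std={clean_std:.2f}")
```

The gap is exaggerated here by an extreme value; on real data it is usually smaller, but it can still make cross-validated scores look better than they should.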

## Example of leaking information

This will leak information when doing **cross-validation**:

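In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

scaler = StandardScaler()
# Scales over all of the X_train data! (every validation fold is included in the scaling)
scaled_data = scaler.fit_transform(X_train)

parameters = {
    'kernel': ['linear', 'rbf'],
    'C': [1, 10]
}

clf_sv = SVC()
clf = GridSearchCV(clf_sv, parameters)
# The folds are cut from data that was already scaled using all of it
clf.fit(scaled_data, y_train)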

## Example of Grid Search with no leakage

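We can help prevent leaking by using a **pipeline** that chains a _Transformer_ and a _Predictor_ into a single new *Estimator*. Grid search then refits the whole pipeline on the training portion of each split, so the scaler never sees the validation fold.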

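In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', SVC())
])

# Keys follow the pattern <step name>__<parameter name>
parameters = {
    'scaler__with_mean': [True, False],
    'clf__kernel': ['linear', 'rbf'],
    'clf__C': [1, 10]
}

cv = GridSearchCV(pipeline, param_grid=parameters)

# The scaler is refit inside each split, on that split's training portion only
cv.fit(X_train, y_train)
# predict() uses the best pipeline, refit on all of X_train
y_pred = cv.predict(X_test)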
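
If you are unsure which names are valid in the parameter grid, the pipeline can list them for you. A minimal sketch, rebuilding the same two-step pipeline so that it runs on its own (`param_names` is just a variable name chosen for this example):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', SVC()),
])

# Every key returned by get_params() can be used in the parameter grid
param_names = sorted(pipeline.get_params().keys())

# Show only the per-step parameters, named <step name>__<parameter name>
print([name for name in param_names if name.startswith(('scaler__', 'clf__'))])
```

A misspelled key, or one missing the step prefix, makes the search raise an error, so checking the names first can save a confusing traceback.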