## Pipeline: Tune hyperparameters

This notebook uses the Titanic dataset from [this Kaggle competition](https://www.kaggle.com/c/titanic/overview).

In this section, we will tune the hyperparameters for the basic model we fit in the last section.

### Read in data & create train/validation/test sets

![Tune Hyperparameters](../../img/tune_hyperparameters.png)

In [None]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

titanic = pd.read_csv('../../../titanic_cleaned.csv')

features = titanic.drop('Survived', axis=1)
labels = titanic['Survived']

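# First split: keep 60% for training and hold out 40%.
X_train, X_temp, y_train, y_temp = train_test_split(features, labels, test_size=0.4, random_state=42)
# Second split: divide the held-out 40% evenly into test and validation sets (20% each).
X_test, X_val, y_test, y_val = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)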
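
The two-step split above should leave roughly 60% of the rows for training and 20% each for validation and testing. A minimal sketch for checking that, assuming a synthetic 100-row frame in place of `titanic_cleaned.csv` (the names `demo_features`, `demo_labels`, and the column names are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Titanic features/labels (assumption: 100 rows, 3 columns).
rng = np.random.default_rng(0)
demo_features = pd.DataFrame(rng.normal(size=(100, 3)), columns=['a', 'b', 'c'])
demo_labels = pd.Series(rng.integers(0, 2, size=100), name='Survived')

# First split: 60% train, 40% held out.
X_tr, X_hold, y_tr, y_hold = train_test_split(
    demo_features, demo_labels, test_size=0.4, random_state=42)
# Second split: halve the held-out rows into test and validation.
X_te, X_va, y_te, y_va = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=42)

# Fraction of the full dataset that ended up in each split.
split_fractions = {
    name: round(len(part) / len(demo_labels), 2)
    for name, part in [('train', y_tr), ('val', y_va), ('test', y_te)]
}
print(split_fractions)
```

The same loop can be run on the notebook's own `y_train`, `y_val`, and `y_test` against `labels` to confirm the proportions on the real data.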

### Hyperparameter tuning

![Hyperparameters](../../img/hyperparameters.png)

In [None]:
def print_results(results):
    """Print the best parameters, then the mean CV score (+/- 2 std) for each combination tried."""
    print('BEST PARAMS: {}\n'.format(results.best_params_))

    means = results.cv_results_['mean_test_score']
    stds = results.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, results.cv_results_['params']):
        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))

In [None]:
rf = RandomForestClassifier()
# Fill in candidate values below; GridSearchCV requires every list to be non-empty.
parameters = {
    'n_estimators': [],
    'max_depth': []
}
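
Once the grid is filled in, it can be passed to `GridSearchCV` together with the model. The following is only a sketch under stated assumptions: the data is synthetic (`make_classification` rather than the Titanic training set), and the candidate values, the `cv=5` setting, and the names `X_demo`, `y_demo`, `demo_parameters`, and `demo_cv` are illustrative choices, not values taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for X_train / y_train (assumption, not the Titanic data).
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

# Illustrative candidate values only; pick your own for the real grid.
demo_parameters = {
    'n_estimators': [5, 50],
    'max_depth': [2, None],
}

# 5-fold cross-validation over every combination in the grid (2 x 2 = 4 candidates).
demo_cv = GridSearchCV(RandomForestClassifier(random_state=42), demo_parameters, cv=5)
demo_cv.fit(X_demo, y_demo)

print(demo_cv.best_params_)
print(len(demo_cv.cv_results_['params']))
```

A fitted search object like `demo_cv` exposes `best_params_` and `cv_results_`, which is what `print_results` reads, so `print_results(demo_cv)` would list the score for each combination.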