# HyperparameterHunter Extended Example

In this example, we'll simulate a miniature project to show what to expect when starting out with HyperparameterHunter, along with some of the settings you might want to adjust.

```python
import warnings
warnings.filterwarnings('ignore')
# Uncomment to silence TensorFlow's C++ logging
# import os
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

from hyperparameter_hunter import Environment, CrossValidationExperiment
from hyperparameter_hunter.utils.learning_utils import get_breast_cancer_data
from hyperparameter_hunter.utils.file_utils import print_tree
```



Above, we import the cross-validation scheme we want to start with and the first algorithm we'll test (`StratifiedKFold` and `XGBClassifier`, respectively). Then, from `hyperparameter_hunter`, we import `Environment` and `CrossValidationExperiment`, which are at the core of any project. We also import two utility functions: `get_breast_cancer_data`, which puts the result of `sklearn.datasets.load_breast_cancer` into a DataFrame, and `print_tree`, which prints a directory's contents so we can easily see which files completed Experiments create.

Below, we declare the directory in which hyperparameter_hunter will store its results and print its contents. Astute readers will notice that nothing is printed, because that directory doesn't exist yet (unless you're running this with an already populated directory, in which case the rest of this example won't make much sense).

```python
example_assets = 'HyperparameterHunterAssets/'
print_tree(example_assets)
```
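The exact output format of `print_tree` isn't shown in this example. As a rough, hypothetical stand-in, the standard-library sketch below walks a directory with `os.walk` and prints nothing when the directory doesn't exist, which matches the behavior described above. The name `print_tree_sketch` and its indentation format are assumptions for illustration, not HyperparameterHunter's implementation.

```python
import os
from typing import List


def print_tree_sketch(start_path: str, indent: str = "    ") -> List[str]:
    """Hypothetical stand-in for print_tree: print a directory's contents recursively.

    Returns the printed lines; returns (and prints) nothing if start_path doesn't exist.
    """
    lines: List[str] = []
    if not os.path.isdir(start_path):
        return lines  # Directory doesn't exist yet: nothing to print

    for root, dirs, files in os.walk(start_path):
        dirs.sort()  # Sorting in place makes os.walk visit subdirectories alphabetically
        rel = os.path.relpath(root, start_path)
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1
        # normpath strips a trailing slash so basename returns the directory's name
        lines.append(f"{indent * level}{os.path.basename(os.path.normpath(root))}/")
        for name in sorted(files):
            lines.append(f"{indent * (level + 1)}{name}")

    for line in lines:
        print(line)
    return lines
```

Before any Experiment has run, `print_tree_sketch('HyperparameterHunterAssets/')` returns an empty list and prints nothing.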

### Environment

```python
env = Environment(
    train_dataset=get_breast_cancer_data(),
    root_results_path=example_assets,
    metrics_map=['roc_auc_score'],
    cross_validation_type=StratifiedKFold,
    cross_validation_params=dict(n_splits=3, shuffle=True, random_state=32),
)
```

```
Cross-Experiment Key: PgKvtmL_kiUe-_uoOzS8ar6x5CgOHDB8_iGryURR_EI=
```


We begin by instantiating an `Environment` and giving it the following:
* Our `train_dataset`,
* Our `root_results_path` declared above,
* A `metrics_map` containing 'roc_auc_score', because the Wisconsin Breast Cancer dataset poses a binary classification problem,
* Our `cross_validation_type`, the `StratifiedKFold` class imported earlier,
* And `cross_validation_params`, a dict containing all the arguments to pass to `cross_validation_type`.
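The last bullet implies that `cross_validation_params` is forwarded to `cross_validation_type` as keyword arguments. The sketch below illustrates that pattern under that assumption; it is not HyperparameterHunter's internal code. To keep it runnable with only the standard library, a hypothetical `ToyKFold` class stands in for `StratifiedKFold`, and unlike the real class it neither shuffles nor stratifies.

```python
from typing import Any, Iterator, List, Tuple


class ToyKFold:
    """Hypothetical stand-in for a CV class like StratifiedKFold.

    Assigns every n_splits-th sample to the same validation fold. It does NOT
    stratify by class, and it ignores `shuffle`/`random_state`, which are
    accepted only so the same params dict can be passed in.
    """

    def __init__(self, n_splits: int = 5, shuffle: bool = False, random_state: Any = None) -> None:
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state

    def split(self, n_samples: int) -> Iterator[Tuple[List[int], List[int]]]:
        # Yield (train_indices, validation_indices) for each fold
        for k in range(self.n_splits):
            validation = list(range(k, n_samples, self.n_splits))
            train = [i for i in range(n_samples) if i % self.n_splits != k]
            yield train, validation


# The same two values handed to Environment above, with ToyKFold swapped in
cross_validation_type = ToyKFold
cross_validation_params = dict(n_splits=3, shuffle=True, random_state=32)

# ** unpacks the dict into keyword arguments, i.e.
# ToyKFold(n_splits=3, shuffle=True, random_state=32)
cv = cross_validation_type(**cross_validation_params)

for train_idx, val_idx in cv.split(9):
    print(train_idx, val_idx)
```

Keeping the class and its arguments separate means the CV scheme can be swapped by changing `cross_validation_type` alone, as long as the new class accepts the keys in `cross_validation_params`.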

Notice that upon instantiation, our `Environment` logs the cross-experiment key produced by the provided parameters. This is important because it determines when two `Experiment`s can be properly compared; we'll go over this more later.
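One way to picture why the key matters: if it is derived deterministically from the settings that define how Experiments are evaluated (dataset, metrics, CV scheme), then two Experiments sharing a key were scored under the same conditions. The sketch below assumes a SHA-256 hash of JSON-serialized settings, encoded as URL-safe base64, which yields a 44-character string similar in form to the key logged above. `make_cross_experiment_key` and the `dataset` placeholder are hypothetical, and HyperparameterHunter's actual procedure (including how it hashes the dataset itself) may differ.

```python
import base64
import hashlib
import json
from typing import Any, Dict


def make_cross_experiment_key(settings: Dict[str, Any]) -> str:
    """Hypothetical: hash environment settings into a URL-safe key.

    sort_keys=True (applied recursively by json.dumps) makes the key
    independent of dict insertion order.
    """
    serialized = json.dumps(settings, sort_keys=True)
    digest = hashlib.sha256(serialized.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


base = dict(
    dataset="breast_cancer",  # Placeholder; a real key would need to reflect the data itself
    metrics_map=["roc_auc_score"],
    cross_validation_type="StratifiedKFold",
    cross_validation_params=dict(n_splits=3, shuffle=True, random_state=32),
)
changed = dict(base, cross_validation_params=dict(n_splits=5, shuffle=True, random_state=32))

key_a = make_cross_experiment_key(base)
key_b = make_cross_experiment_key(dict(reversed(list(base.items()))))  # Same settings, different order
key_c = make_cross_experiment_key(changed)  # Different number of folds

print(key_a == key_b)  # True: identical settings, so results are comparable
print(key_a == key_c)  # False: different CV scheme, so results are not comparable
```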

### First Experiment