# Setting MLPipeline Hyperparameters

In this short guide we will see how to change the hyperparameters
of an MLPipeline in order to alter its behavior or performance.

Note that, for simplicity, some steps are not explained here. Full details
about them can be found in the previous parts of the tutorial.

We will:

1. Load a dataset and a Pipeline.
2. Explore the pipeline hyperparameters.
3. Reload the pipeline with different hyperparameters.
4. Evaluate the pipeline performance on the dataset.
5. Set different pipeline hyperparameters.
6. Re-evaluate the pipeline performance on the dataset.

## Load the Dataset and the Pipeline

The first step will be to load the dataset and the pipeline that we will be using.

In [1]:
from mlprimitives.datasets import load_dataset

dataset = load_dataset('census')
X_train, X_test, y_train, y_test = dataset.get_splits(1)

In [2]:
from mlblocks import MLPipeline

primitives = [
    'mlprimitives.custom.preprocessing.ClassEncoder',
    'mlprimitives.custom.feature_extraction.CategoricalEncoder',
    'sklearn.impute.SimpleImputer',
    'xgboost.XGBClassifier',
    'mlprimitives.custom.preprocessing.ClassDecoder'
]
pipeline = MLPipeline(primitives)

## Explore the Pipeline Hyperparameters

Once we have loaded the pipeline, we can see the hyperparameters that it is using by
calling its `get_hyperparameters` method.

In [3]:
pipeline.get_hyperparameters()

{'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
 'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {'keep': False,
  'copy': True,
  'features': 'auto',
  'max_unique_ratio': 0,
  'max_labels': 0},
 'sklearn.impute.SimpleImputer#1': {'missing_values': nan,
  'fill_value': None,
  'verbose': False,
  'copy': True,
  'strategy': 'mean'},
 'xgboost.XGBClassifier#1': {'n_jobs': -1,
  'n_estimators': 100,
  'max_depth': 3,
  'learning_rate': 0.1,
  'gamma': 0,
  'min_child_weight': 1},
 'mlprimitives.custom.preprocessing.ClassDecoder#1': {}}

This returns a dictionary that contains one entry for each step in the pipeline.
Each entry is itself a dictionary that maps the hyperparameter names of that step to their values.

**NOTE** that the keys are the names of the pipeline steps, which are the primitive names followed by a numerical suffix that lets us tell apart multiple steps that use the same primitive.
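
To make the naming scheme concrete, here is a minimal sketch of how such step names could be derived from a list of primitives. The helper `name_steps` is hypothetical and is not part of MLBlocks; it simply assumes that the suffix counts the occurrences of each primitive separately, which matches the `#1` suffixes shown above.

```python
from collections import Counter


def name_steps(primitives):
    """Append '#<n>' to each primitive name, counting repeats per primitive.

    Hypothetical helper for illustration only, not the MLBlocks implementation.
    """
    counts = Counter()
    names = []
    for primitive in primitives:
        counts[primitive] += 1
        names.append('{}#{}'.format(primitive, counts[primitive]))
    return names


# A made-up pipeline that uses the same imputer primitive twice.
step_names = name_steps([
    'sklearn.impute.SimpleImputer',
    'xgboost.XGBClassifier',
    'sklearn.impute.SimpleImputer',
])
print(step_names)
```

Under this assumption, the second imputer would get the name `sklearn.impute.SimpleImputer#2`, so its hyperparameters could be addressed independently of the first one.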

Alternatively, for better compatibility with tuning systems like [BTB](https://github.com/MLBazaar/BTB)
that work with flat, one-level dictionaries, the argument `flat=True` can be passed.

In [4]:
pipeline.get_hyperparameters(flat=True)

{('mlprimitives.custom.feature_extraction.CategoricalEncoder#1',
  'keep'): False,
 ('mlprimitives.custom.feature_extraction.CategoricalEncoder#1', 'copy'): True,
 ('mlprimitives.custom.feature_extraction.CategoricalEncoder#1',
  'features'): 'auto',
 ('mlprimitives.custom.feature_extraction.CategoricalEncoder#1',
  'max_unique_ratio'): 0,
 ('mlprimitives.custom.feature_extraction.CategoricalEncoder#1',
  'max_labels'): 0,
 ('sklearn.impute.SimpleImputer#1', 'missing_values'): nan,
 ('sklearn.impute.SimpleImputer#1', 'fill_value'): None,
 ('sklearn.impute.SimpleImputer#1', 'verbose'): False,
 ('sklearn.impute.SimpleImputer#1', 'copy'): True,
 ('sklearn.impute.SimpleImputer#1', 'strategy'): 'mean',
 ('xgboost.XGBClassifier#1', 'n_jobs'): -1,
 ('xgboost.XGBClassifier#1', 'n_estimators'): 100,
 ('xgboost.XGBClassifier#1', 'max_depth'): 3,
 ('xgboost.XGBClassifier#1', 'learning_rate'): 0.1,
 ('xgboost.XGBClassifier#1', 'gamma'): 0,
 ('xgboost.XGBClassifier#1', 'min_child_weight'): 1}

This returns the same information as before, but organized as a single one-level
dictionary where each key is a `tuple` containing the name of the step and the name of the hyperparameter.
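
The relationship between the two formats can be sketched in a few lines of plain Python. The function `flatten_hyperparameters` below is a hypothetical helper written for this guide, not an MLBlocks function; it only illustrates how a nested dictionary maps onto tuple keys.

```python
def flatten_hyperparameters(nested):
    """Turn {step: {name: value}} into {(step, name): value}.

    Hypothetical helper for illustration only.
    """
    return {
        (step, name): value
        for step, params in nested.items()
        for name, value in params.items()
    }


# A reduced version of the nested output shown earlier.
nested = {
    'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
    'xgboost.XGBClassifier#1': {'max_depth': 3, 'learning_rate': 0.1},
}
flat = flatten_hyperparameters(nested)
print(flat)
```

Note that a step without hyperparameters, such as `ClassEncoder#1`, contributes no keys at all, which is why it does not appear in the flat output above.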

## Setting Pipeline hyperparameter values

We can set some different hyperparameter values when loading the pipeline by adding the
`init_params` argument to `MLPipeline`.

The `init_params` argument has to be a dictionary where each key is the name of one of the
pipeline steps and each value is another dictionary with the hyperparameter values that we
want to use on that step.

As an example, we will set a different imputer strategy and a different xgboost `max_depth`.

In [5]:
init_params = {
    'sklearn.impute.SimpleImputer#1': {
        'strategy': 'median'
    },
    'xgboost.XGBClassifier#1': {
        'max_depth': 4
    }
}
pipeline = MLPipeline(
    primitives,
    init_params=init_params
)

We can now see that the hyperparameters differ from the previous ones.

In [6]:
pipeline.get_hyperparameters()

{'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
 'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {'keep': False,
  'copy': True,
  'features': 'auto',
  'max_unique_ratio': 0,
  'max_labels': 0},
 'sklearn.impute.SimpleImputer#1': {'missing_values': nan,
  'fill_value': None,
  'verbose': False,
  'copy': True,
  'strategy': 'median'},
 'xgboost.XGBClassifier#1': {'n_jobs': -1,
  'max_depth': 4,
  'n_estimators': 100,
  'learning_rate': 0.1,
  'gamma': 0,
  'min_child_weight': 1},
 'mlprimitives.custom.preprocessing.ClassDecoder#1': {}}
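
Spotting the changed values by eye gets harder as pipelines grow, so it can help to compare two nested dictionaries programmatically. The following is a minimal sketch with a hypothetical helper, `diff_hyperparameters`, which is not part of MLBlocks. It treats two NaN values as equal, because otherwise the imputer's `missing_values` would always be reported as changed.

```python
import math


def _same(a, b):
    """Equality check that also considers two float NaN values equal."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def diff_hyperparameters(before, after):
    """Return {(step, name): (old, new)} for every value that changed.

    Hypothetical helper; names missing from `before` are reported with None
    as their old value.
    """
    changes = {}
    for step, params in after.items():
        old_params = before.get(step, {})
        for name, new_value in params.items():
            old_value = old_params.get(name)
            if not _same(old_value, new_value):
                changes[(step, name)] = (old_value, new_value)
    return changes


# Reduced copies of the two outputs shown in this guide.
before = {
    'sklearn.impute.SimpleImputer#1': {'missing_values': float('nan'), 'strategy': 'mean'},
    'xgboost.XGBClassifier#1': {'max_depth': 3, 'learning_rate': 0.1},
}
after = {
    'sklearn.impute.SimpleImputer#1': {'missing_values': float('nan'), 'strategy': 'median'},
    'xgboost.XGBClassifier#1': {'max_depth': 4, 'learning_rate': 0.1},
}
changes = diff_hyperparameters(before, after)
print(changes)
```

With a real pipeline, the two arguments would be the results of calling `get_hyperparameters()` before and after the change.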

## Evaluate the Pipeline performance

We can now evaluate the pipeline performance to see what results these
hyperparameters produce.

In [7]:
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

dataset.score(y_test, y_pred)

0.8647586291610367

## Setting hyperparameter values

Another way of setting the pipeline hyperparameters, without having to recreate the pipeline
from scratch, is to use its `set_hyperparameters` method.

In this case, we will change the CategoricalEncoder `max_labels` and the xgboost `learning_rate`.

In [8]:
hyperparameters = {
    'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {
        'max_labels': 10
    },
    'xgboost.XGBClassifier#1': {
        'learning_rate': 0.3
    }
}
pipeline.set_hyperparameters(hyperparameters)

Alternatively, the hyperparameters can be set using the `flat` format:

In [9]:
hyperparameters = {
    ('mlprimitives.custom.feature_extraction.CategoricalEncoder#1', 'max_labels'): 10,
    ('xgboost.XGBClassifier#1', 'learning_rate'): 0.3
}
pipeline.set_hyperparameters(hyperparameters)

We can now see that these hyperparameters differ from the previous ones:

In [10]:
pipeline.get_hyperparameters()

{'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
 'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {'keep': False,
  'copy': True,
  'features': 'auto',
  'max_unique_ratio': 0,
  'max_labels': 10},
 'sklearn.impute.SimpleImputer#1': {'missing_values': nan,
  'fill_value': None,
  'verbose': False,
  'copy': True,
  'strategy': 'median'},
 'xgboost.XGBClassifier#1': {'n_jobs': -1,
  'max_depth': 4,
  'n_estimators': 100,
  'learning_rate': 0.3,
  'gamma': 0,
  'min_child_weight': 1},
 'mlprimitives.custom.preprocessing.ClassDecoder#1': {}}
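
Since both the nested and the flat format describe the same update, one can be converted into the other before it is applied. The sketch below uses a hypothetical helper, `to_nested`, to show that the two dictionaries passed to `set_hyperparameters` above are equivalent. It is an illustration in plain Python and makes no claim about how MLBlocks handles the two formats internally.

```python
def to_nested(hyperparameters):
    """Accept either format and return {step: {name: value}}.

    Hypothetical helper: tuple keys are read as (step, name) pairs, any other
    key is read as a step name whose value is a dictionary.
    """
    nested = {}
    for key, value in hyperparameters.items():
        if isinstance(key, tuple):
            step, name = key
            nested.setdefault(step, {})[name] = value
        else:
            nested.setdefault(key, {}).update(value)
    return nested


nested_format = {
    'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {
        'max_labels': 10
    },
    'xgboost.XGBClassifier#1': {
        'learning_rate': 0.3
    }
}
flat_format = {
    ('mlprimitives.custom.feature_extraction.CategoricalEncoder#1', 'max_labels'): 10,
    ('xgboost.XGBClassifier#1', 'learning_rate'): 0.3
}

# Both formats normalize to the same nested dictionary.
print(to_nested(flat_format) == to_nested(nested_format))
```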

## Re-evaluate the Pipeline performance

We can now evaluate the pipeline again and see how the hyperparameter
changes affected its performance.

In [11]:
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

dataset.score(y_test, y_pred)

0.870531875690947
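
To put the two results side by side, the short snippet below computes the absolute and relative difference between the two scores reported in this guide. It is only arithmetic on those two numbers. Keep in mind that they come from a single train/test split and that two hyperparameters (`max_labels` and `learning_rate`) were changed at once, so the difference cannot be attributed to either of them individually.

```python
# Scores copied from the two evaluations above.
score_before = 0.8647586291610367
score_after = 0.870531875690947

absolute_gain = score_after - score_before
relative_gain = absolute_gain / score_before

print('absolute gain: {:.4f}'.format(absolute_gain))
print('relative gain: {:.2%}'.format(relative_gain))
```

The gain is a bit under 0.006 in absolute terms, which is small enough that it would be worth confirming on additional splits before drawing conclusions.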