# Hyperparameter tuning

* Two experimental hyperparameter optimizers were introduced in scikit-learn 0.24
  * `HalvingGridSearchCV`
  * `HalvingRandomSearchCV`
* Both use a technique called successive halving
  * In the first iteration, every parameter combination is trained on only a small subset of the data
  * The worst-performing candidates are filtered out after each iteration
  * After each iteration, the number of training samples increases by some factor
  * The number of candidates decreases by the same factor
  * Faster evaluation time (up to 10x) than GridSearch or RandomizedSearch
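
Because both classes are experimental, they have to be enabled explicitly before they can be imported. The following is a minimal sketch, assuming scikit-learn 0.24 or later; the synthetic dataset, the decision tree, and the small grid are illustrative choices, not part of the original notes.

```python
# The experimental halving searches must be enabled before they are imported.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

search = HalvingGridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]},
    factor=3,          # keep roughly 1/3 of the candidates each iteration
    cv=3,
    random_state=0,    # controls the subsampling of the training data
)
search.fit(X, y)

print(search.best_params_)
print(search.n_candidates_)  # number of candidates evaluated in each iteration
print(search.n_resources_)   # number of samples used in each iteration
```

The fitted attributes `n_candidates_` and `n_resources_` show the shrinking candidate pool and the growing sample budget side by side.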


### Terminology

#### Hyperparameter

* A model setting chosen by the user before training
* The model cannot learn it from the training data
* Example: `learning_rate` in XGBoost
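
To make the distinction concrete, here is a small sketch that contrasts a hyperparameter with learned parameters. It uses scikit-learn's `LogisticRegression` rather than XGBoost only to avoid an extra dependency; the data is synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C (regularization strength) is a hyperparameter: fixed by the user up front.
model = LogisticRegression(C=0.5, max_iter=1000)
print(model.get_params()["C"])

# The coefficients are parameters: they only exist after fitting on data.
model.fit(X, y)
print(model.coef_.shape)
```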


#### Parameter Grid

* Dictionary
  * Keys - parameter names
  * Values - list of values to try for that hyperparameter

Example:
```python
param_grid = {
    "max_depth": [3, 4, 5, 7],
    "gamma": [0, 0.25, 1],
    "scale_pos_weight": [1, 3, 5]
}
```


#### Candidate

* A single combination of hyperparameter values, one value per key, drawn from the parameter grid
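
As a quick illustration, the candidates of the example grid above can be enumerated with scikit-learn's `ParameterGrid`. This is only a sketch for counting candidates; the search classes build the same combinations internally.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "max_depth": [3, 4, 5, 7],
    "gamma": [0, 0.25, 1],
    "scale_pos_weight": [1, 3, 5],
}

# Each element is one candidate: a dict with one value per hyperparameter.
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 4 * 3 * 3 = 36 candidates
print(candidates[0])
```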


#### Resources or samples

* One sample refers to a single row of training data


#### Iteration

* Any single round in which a single set of hyperparameters is used on the training data


### GridSearch

* Exhaustive, brute-force estimator
* All combinations of hyperparameters will be trained using cross-validation
* If there are 100 candidates and you perform 5-fold CV, the model is trained 500 times (500 iterations)
* This can be very slow for large datasets
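
The fit count can be checked on a toy problem. The sketch below assumes a small synthetic dataset and a decision tree, both chosen only to keep the example fast; the point is the arithmetic of candidates times folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}
cv_folds = 5

grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=cv_folds)
grid.fit(X, y)

# Every candidate is fitted once per fold.
n_candidates = len(grid.cv_results_["params"])
n_fits = n_candidates * cv_folds  # the default refit on all data adds one more fit
print(n_candidates, n_fits)
```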


### RandomizedSearch

* Not all parameter combinations are tried out
* A fixed number of parameter settings `n_iter` is sampled from the specified distributions
* Compared with Grid search:
  * Shorter training time 
  * Optimal hyperparameters may not be found
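
A minimal sketch of the same idea in code, assuming SciPy is available for the sampling distributions. The estimator, the distributions, and `n_iter=8` are illustrative values only.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Distributions to sample from; randint(low, high) excludes `high`.
param_distributions = {
    "max_depth": randint(2, 10),
    "min_samples_leaf": randint(1, 20),
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions,
    n_iter=8,        # only 8 sampled candidates are evaluated
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(len(search.cv_results_["params"]))
print(search.best_params_)
```

The cost is fixed by `n_iter` regardless of how large the search space is, which is why the best combination may simply never be sampled.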


### Successive Halving

* HalvingGridSearch (HGS) is like a competition among all candidates
  * First iteration, HGS trains all candidates on a small proportion of the training data
  * Second iteration, best performing candidates are given more training samples to compete
  * With each iteration, surviving candidates are given more training samples
  * Process stops when a single set of best hyperparameters remains
* Two arguments control the speed of convergence:
  * `min_resources` - number of data samples to use in the first iteration
  * `factor` - in each following iteration, number of samples = previous samples * factor, and number of candidates = previous candidates / factor
* How do you set these parameters?
  * If not careful, you can end up without enough data or with too many candidates in the last iteration. The remaining candidates are then all trained on all the data, which is no better than GridSearch
  * Set `min_resources='exhaust'` (the default for `HalvingGridSearchCV`)
  * The algorithm then picks `min_resources` automatically so that, for the given `factor`, the last iteration uses as much of the data as possible
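
The interaction between `min_resources` and `factor` is easier to see with numbers. The sketch below is a simplified, pure-Python model of the schedule and not scikit-learn's actual implementation: `int_log` and `halving_schedule` are hypothetical helpers, and the `"exhaust"` rule is an assumption about how the starting sample count is chosen. On a fitted search, the real schedule can be read from `n_candidates_` and `n_resources_`.

```python
import math


def int_log(n, base):
    """Largest k such that base**k <= n (integer arithmetic, no float rounding)."""
    k = 0
    while base ** (k + 1) <= n:
        k += 1
    return k


def halving_schedule(n_candidates, max_resources, factor, min_resources="exhaust"):
    """Return [(n_candidates, n_samples), ...] per iteration (simplified model)."""
    # Iterations needed to shrink the pool to `factor` or fewer candidates.
    n_required = 1 + int_log(n_candidates, factor)
    if min_resources == "exhaust":
        # Assumption: start small enough that the last iteration uses
        # (almost) all available samples.
        min_resources = max(1, max_resources // factor ** (n_required - 1))
    # Iterations that fit before the sample budget runs out.
    n_possible = 1 + int_log(max_resources // min_resources, factor)

    schedule = []
    for i in range(min(n_required, n_possible)):
        schedule.append((n_candidates, min_resources * factor ** i))
        n_candidates = math.ceil(n_candidates / factor)
    return schedule


# 36 candidates (the example grid), 1000 samples, factor 3:
print(halving_schedule(36, 1000, 3))
# [(36, 37), (12, 111), (4, 333), (2, 999)]

# A badly chosen min_resources runs out of data with 12 candidates left:
print(halving_schedule(36, 1000, 3, min_resources=200))
# [(36, 200), (12, 600)]
```

In the second call the data is used up after two iterations, so many candidates are still alive at the end, which is the failure mode described above.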


### Compare HalvingGridSearchCV with GridSearchCV

In [1]:
import pandas as pd