# Hyperparameter tuning

* Two experimental hyperparameter optimizers were introduced in scikit-learn 0.24
  * `HalvingGridSearchCV`
  * `HalvingRandomSearchCV`
* Both use a technique called successive halving
  * In the first iteration, every parameter combination is trained on only a small subset of the data
  * The worst-performing candidates are filtered out after each iteration
  * After each iteration, the number of training samples increases by some factor
  * The number of candidates decreases by the same factor
  * Evaluation can be faster (up to 10x) than GridSearch or RandomizedSearch
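A minimal usage sketch, assuming a synthetic dataset and a `DecisionTreeClassifier` (neither comes from these notes; they only keep the example small). Because the halving estimators are experimental, the `enable_halving_search_cv` import has to run before they can be imported.

```python
# The experimental flag must be imported before HalvingGridSearchCV
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, not the weather data used later)
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=0)

search = HalvingGridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 5, 10]},  # 12 candidates
    factor=3,        # samples grow and candidates shrink by this factor
    cv=5,
    random_state=0,  # controls how the data subsets are drawn
)
search.fit(X_demo, y_demo)

print(search.n_candidates_)  # candidates evaluated in each iteration
print(search.n_resources_)   # training samples used in each iteration
print(search.best_params_)
```

The `n_candidates_` and `n_resources_` attributes make the trade-off visible: the first list shrinks while the second grows.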


### Terminology

#### Hyperparameter

* A model setting chosen by the user before training
* The model cannot learn it from the training data
* Example: `learning_rate` in XGBoost


#### Parameter Grid

* Dictionary
  * Keys - hyperparameter names
  * Values - list of values to try for that hyperparameter

Example:
```python
param_grid = {
    "max_depth": [3, 4, 5, 7],
    "gamma": [0, 0.25, 1],
    "scale_pos_weight": [1, 3, 5]
}
```


#### Candidate

* A single combination of hyperparameter values, one value per key in the parameter grid
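To make the definition concrete, the sketch below enumerates every candidate in the example grid with `itertools.product`. It is only an illustration of the idea; it does not show how any particular search class builds its candidates.

```python
from itertools import product

param_grid = {
    "max_depth": [3, 4, 5, 7],
    "gamma": [0, 0.25, 1],
    "scale_pos_weight": [1, 3, 5],
}

# One candidate = one value picked for every key in the grid
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(candidates))  # 4 * 3 * 3 = 36
print(candidates[0])    # {'max_depth': 3, 'gamma': 0, 'scale_pos_weight': 1}
```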


#### Resources or samples

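* The resource is what surviving candidates receive more of at each iteration; here it is the number of training samples
* One sample refers to a single row of training data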


#### Iteration

* Any single round in which a single set of hyperparameters is used on the training data


### GridSearch

* Exhaustive, brute-force estimator
* All combinations of hyperparameters will be trained using cross-validation
* If there are 100 possible candidates and you perform 5-fold CV, the model will undergo 500 iterations
* This can be very slow for large datasets
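The cost arithmetic can be sketched in a few lines. The function name `cv_fits` and the grid keys `a` and `b` are made up for illustration; the count covers only the cross-validation fits.

```python
import math

def cv_fits(param_grid, n_folds):
    """Number of model fits an exhaustive grid search performs during cross-validation."""
    n_candidates = math.prod(len(values) for values in param_grid.values())
    return n_candidates * n_folds

# Hypothetical grid with 10 x 10 = 100 candidates
grid = {"a": range(10), "b": range(10)}
print(cv_fits(grid, n_folds=5))  # 500

# Adding one more hyperparameter with 10 values multiplies the cost by 10
grid["c"] = range(10)
print(cv_fits(grid, n_folds=5))  # 5000
```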


### RandomizedSearch

* Not all parameter combinations are tried out
* A fixed number of parameter settings `n_iter` is sampled from the specified distributions
* Compared with Grid search:
  * Shorter training time 
  * Optimal hyperparameters may not be found
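A small sketch of how `n_iter` caps the work, assuming synthetic data and a `DecisionTreeClassifier` chosen only to keep the run short. The distributions come from `scipy.stats`; their ranges are arbitrary.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

# Distributions to sample from, instead of fixed lists of values
param_distributions = {
    "max_depth": randint(2, 10),         # integers from 2 to 9
    "min_samples_leaf": randint(1, 20),  # integers from 1 to 19
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions,
    n_iter=8,        # only 8 parameter settings are sampled and evaluated
    cv=5,
    random_state=0,
)
search.fit(X_demo, y_demo)

print(len(search.cv_results_["params"]))  # number of settings actually tried
print(search.best_params_)
```

The search evaluates exactly `n_iter` settings regardless of how large the underlying space is, which is why the best combination can be missed.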


### Successive Halving

* HalvingGridSearch (HGS) is like a competition among all candidates
  * First iteration, HGS trains all candidates on a small proportion of the training data
  * Second iteration, best performing candidates are given more training samples to compete
  * With each iteration, surviving candidates are given more training samples
  * The process stops when a single best set of hyperparameters remains
* The speed of convergence is controlled by two arguments:
  * `min_resources` - number of data samples to use in the first iteration
  * `factor` - at each new iteration, number of samples = previous samples * factor and number of candidates = previous candidates / factor
* How do you set these parameters?
  * If you are not careful, you can run out of data while too many candidates remain. The remaining candidates are then all trained on the full data, which is no better than GridSearch
  * Set `min_resources='exhaust'` (the default for `HalvingGridSearchCV`)
  * The algorithm then picks `min_resources` so that, given `factor`, the last iteration uses as much of the data as possible
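The interaction between `min_resources` and `factor` is easier to see with numbers. The sketch below is a simplified model of the schedule, not scikit-learn's implementation; the function names and the dataset size of 1000 samples are assumptions made for the example.

```python
import math

def required_iterations(n_candidates, factor):
    """Iterations needed until at most `factor` candidates are left in the last round."""
    k = 0
    while factor ** (k + 1) <= n_candidates:
        k += 1
    return k + 1

def halving_schedule(n_candidates, min_resources, factor, max_resources):
    """List of (iteration, n_candidates, n_samples); stops early if the data runs out."""
    schedule = []
    for i in range(required_iterations(n_candidates, factor)):
        n_samples = min_resources * factor ** i
        if n_samples > max_resources:
            break  # not enough data for another round
        schedule.append((i, n_candidates, n_samples))
        n_candidates = math.ceil(n_candidates / factor)
    return schedule

def exhaust_min_resources(n_candidates, factor, max_resources):
    """Largest starting size that still lets the last round fit within the data."""
    last_iteration = required_iterations(n_candidates, factor) - 1
    return max_resources // factor ** last_iteration

# 36 candidates, factor 3, 1000 samples available (assumed)
start = exhaust_min_resources(36, factor=3, max_resources=1000)
print(start)                                 # 37
print(halving_schedule(36, start, 3, 1000))
# [(0, 36, 37), (1, 12, 111), (2, 4, 333), (3, 2, 999)]

# A badly chosen min_resources: every candidate is trained on all the data
print(halving_schedule(36, 1000, 3, 1000))   # [(0, 36, 1000)]
```

The last call shows the failure mode described above: with `min_resources` too large, no candidate is eliminated before the data runs out, so the cost matches an exhaustive grid search.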


### Compare HalvingGridSearchCV with GridSearchCV

```python
import pandas as pd

# preprocess is defined in the accompanying prep_rain_australia module
from prep_rain_australia import preprocess
```

### Load Australian weather data

```python
# Read Australian weather data
rain = pd.read_csv("data/weatherAUS.csv")
rain.head()
```

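|   | Date | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | ... | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RainTomorrow |
|---|------|----------|---------|---------|----------|-------------|----------|-------------|---------------|------------|-----|-------------|-------------|-------------|-------------|----------|----------|---------|---------|-----------|--------------|
| 0 | 2008-12-01 | Albury | 13.4 | 22.9 | 0.6 | NaN | NaN | W | 44.0 | W | ... | 71.0 | 22.0 | 1007.7 | 1007.1 | 8.0 | NaN | 16.9 | 21.8 | No | No |
| 1 | 2008-12-02 | Albury | 7.4 | 25.1 | 0.0 | NaN | NaN | WNW | 44.0 | NNW | ... | 44.0 | 25.0 | 1010.6 | 1007.8 | NaN | NaN | 17.2 | 24.3 | No | No |
| 2 | 2008-12-03 | Albury | 12.9 | 25.7 | 0.0 | NaN | NaN | WSW | 46.0 | W | ... | 38.0 | 30.0 | 1007.6 | 1008.7 | NaN | 2.0 | 21.0 | 23.2 | No | No |
| 3 | 2008-12-04 | Albury | 9.2 | 28.0 | 0.0 | NaN | NaN | NE | 24.0 | SE | ... | 45.0 | 16.0 | 1017.6 | 1012.8 | NaN | NaN | 18.1 | 26.5 | No | No |
| 4 | 2008-12-05 | Albury | 17.5 | 32.3 | 1.0 | NaN | NaN | W | 41.0 | ENE | ... | 82.0 | 33.0 | 1010.8 | 1006.0 | 7.0 | 8.0 | 17.8 | 29.7 | No | No |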


```python
rain.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   Date           145460 non-null  object 
 1   Location       145460 non-null  object 
 2   MinTemp        143975 non-null  float64
 3   MaxTemp        144199 non-null  float64
 4   Rainfall       142199 non-null  float64
 5   Evaporation    82670 non-null   float64
 6   Sunshine       75625 non-null   float64
 7   WindGustDir    135134 non-null  object 
 8   WindGustSpeed  135197 non-null  float64
 9   WindDir9am     134894 non-null  object 
 10  WindDir3pm     141232 non-null  object 
 11  WindSpeed9am   143693 non-null  float64
 12  WindSpeed3pm   142398 non-null  float64
 13  Humidity9am    142806 non-null  float64
 14  Humidity3pm    140953 non-null  float64
 15  Pressure9am    130395 non-null  float64
 16  Pressure3pm    130432 non-null  float64
 17  Cloud9am       89572 non-null
```

### Preprocess data

* Handle missing values
* Encode categorical features
* Rescale numeric features
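The `preprocess` helper itself is not shown in these notes. As a rough idea of what the three steps above could look like, here is a hypothetical `preprocess_sketch` written with pandas only. The imputation strategies, the one-hot encoding, and the toy data are all assumptions, and the sketch gives no special treatment to columns such as `Date`.

```python
import numpy as np
import pandas as pd

def preprocess_sketch(df, target="RainTomorrow"):
    """Hypothetical stand-in for `preprocess`: returns (X, y) as NumPy arrays."""
    df = df.dropna(subset=[target])  # rows without a label cannot be used
    y = (df[target] == "Yes").astype(int).to_numpy()
    features = df.drop(columns=[target])

    numeric = features.select_dtypes(include="number").columns
    categorical = features.columns.difference(numeric)

    # 1. Handle missing values: median for numeric, most frequent value otherwise
    features[numeric] = features[numeric].fillna(features[numeric].median())
    for col in categorical:
        features[col] = features[col].fillna(features[col].mode().iloc[0])

    # 2. Encode categorical features as one-hot indicator columns
    features = pd.get_dummies(features, columns=list(categorical))

    # 3. Rescale numeric features to zero mean and unit standard deviation
    features[numeric] = (features[numeric] - features[numeric].mean()) / features[numeric].std()

    return features.to_numpy(dtype=float), y

# Toy frame with the same kind of columns as the weather data (made-up rows)
toy = pd.DataFrame({
    "MinTemp": [13.4, 7.4, np.nan, 9.2],
    "WindGustDir": ["W", "WNW", np.nan, "NE"],
    "RainTomorrow": ["No", "Yes", "No", "Yes"],
})
X_toy, y_toy = preprocess_sketch(toy)
print(X_toy.shape, y_toy)
```

Computing the statistics on the full dataset keeps the sketch short; in a real comparison they would normally be fit on the training split only.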

```python
# Get the preprocessed feature and target arrays
X, y = preprocess(rain)
```

