# Hyperparameter Optimization

For this exercise, we will look at hyperparameter optimization: instead of
just choosing the best type of machine learning model, we also want to choose
the best hyperparameter setting for a task. The end result (i.e. the
predictive performance) is again not important; how you get there is.

Your deliverable will be a report, written in a style suitable for inclusion
in an academic paper as the "Experimental Setup" section or similar. If
unsure, check an academic paper of your choice,
for example [this one](https://www.eecs.uwyo.edu/~larsko/papers/pulatov_opening_2022-1.pdf). The
level of detail should be higher than in a typical academic paper though. Your
report should be at most five pages, including references and figures but
excluding appendices. It should have the following structure:
- Introduction: What problem are you solving, how are you going to solve it.
- Dataset Description: Describe the data you're using, e.g. how many features and observations, what are you predicting, any missing values, etc.
- Experimental Setup: What specifically are you doing to solve the problem, i.e. what programming languages and libraries you use, how you process the data, what machine learning algorithms you consider with which hyperparameters and value ranges, what measures you use to evaluate them, what hyperparameter optimization method you chose, etc.
- Results: Description of what you observed, including plots. Compare
  performance before and after tuning, and show the best configuration.
- Code: Add the code you've used as a separate file.

Your report must contain enough detail to reproduce what you did without the
code. If in doubt, include more detail.

There is no required format for the report. You could, for example, use a
Jupyter (IPython) notebook.

## Data and Setup

We will have a look at the [Wine Quality
dataset](https://archive-beta.ics.uci.edu/dataset/186/wine+quality). Choose the
one that corresponds to your preference in wine. You may also use a dataset of
your choice, for example one that's relevant to your research.

Choose a small number of different machine learning algorithms and
hyperparameters, along with value ranges, for each. You can use implementations
of AutoML systems (e.g. auto-sklearn), scientific papers, or the documentation
of the library you are using to determine the hyperparameters to tune and the
value ranges. There is no single correct way to do this, but define a
reasonable space (e.g. don't include whether to turn on debug output or random
forests with 1,000,000 trees, and don't tune the loss function). Your
hyperparameter search space should be too large for a simple grid search.
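As an illustration of what such a search space can look like, here is a minimal sketch using only the Python standard library. The algorithm names, hyperparameters, and ranges are assumptions chosen for the example, not recommended values. It samples the algorithm first and then only the hyperparameters that belong to it, and it draws `C` on a log scale so that every order of magnitude is equally likely.

```python
import math
import random

# Hypothetical joint search space: one entry per algorithm, each with its own
# hyperparameters. All names and ranges are illustrative assumptions.
SEARCH_SPACE = {
    "logistic_regression": {
        "C": ("log_float", 1e-3, 1e3),
        "penalty": ("choice", ["l1", "l2"]),
    },
    "random_forest": {
        "n_estimators": ("int", 50, 500),
        "max_depth": ("int", 2, 20),
        "max_features": ("float", 0.1, 1.0),
    },
}


def sample_value(spec, rng):
    """Draw one value according to a (kind, ...) specification."""
    kind = spec[0]
    if kind == "choice":
        return rng.choice(spec[1])
    low, high = spec[1], spec[2]
    if kind == "int":
        return rng.randint(low, high)  # inclusive on both ends
    if kind == "float":
        return rng.uniform(low, high)
    if kind == "log_float":
        # Uniform in log space, so each order of magnitude is equally likely.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unknown hyperparameter kind: {kind}")


def sample_configuration(rng):
    """Sample the algorithm first, then only that algorithm's hyperparameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    params = {
        name: sample_value(spec, rng)
        for name, spec in SEARCH_SPACE[algorithm].items()
    }
    return algorithm, params


rng = random.Random(0)
samples = [sample_configuration(rng) for _ in range(5)]
for algorithm, params in samples:
    print(algorithm, params)
```

Because two of the dimensions are continuous, this space cannot be enumerated exhaustively, which is the property the assignment asks for.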

Determine the best machine learning algorithm and hyperparameter setting for
your dataset. Make sure to optimize both the type of machine learning algorithm
and the hyperparameters at the same time (do not first choose the best ML
algorithm and then optimize its hyperparameters). Choose a suitable
hyperparameter optimizer; you could also use several and e.g. compare the
results achieved by random search and Bayesian optimization. Make sure that the
way you evaluate model performance avoids bias and overfitting. You could use
statistical tests to make this determination.
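One common way to meet both requirements is nested cross-validation over a joint search space. The sketch below assumes scikit-learn and SciPy are available and uses synthetic data as a stand-in for the real dataset. The two algorithms, their ranges, and the small iteration counts are illustrative assumptions. The classifier step of the pipeline is itself treated as a hyperparameter, so algorithm and settings are searched together, and the outer loop scores the whole tuning procedure on data the search never saw.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RandomizedSearchCV,
    StratifiedKFold,
    cross_val_score,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the real features/labels.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=11, n_informative=5, random_state=0
)

pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# One dict per algorithm: the "clf" step is part of the search space, so the
# algorithm and its hyperparameters are optimized jointly.
search_space = [
    {
        "clf": [LogisticRegression(max_iter=2000)],
        "clf__C": loguniform(1e-3, 1e3),
    },
    {
        "clf": [RandomForestClassifier(random_state=0)],
        "clf__n_estimators": randint(20, 100),
        "clf__max_depth": randint(2, 10),
    },
]

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = RandomizedSearchCV(
    pipeline,
    search_space,
    n_iter=6,
    cv=inner_cv,
    scoring="accuracy",
    random_state=0,
)

# Each outer fold reruns the full search on its training part only.
outer_scores = cross_val_score(search, X_demo, y_demo, cv=outer_cv, scoring="accuracy")
print(outer_scores.mean(), outer_scores.std())
```

The outer scores estimate how well the tuning procedure generalizes. The single best configuration can then be obtained by fitting `search` once on all of the data.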

## Submission

Add your report and code to this repository. Bonus points if you can set up a
GitHub action to automatically run the code and generate the report!

## Useful Resources
- "*Basics of HPO - Example and Practical Hints*", from the AutoML Course Videos
- https://www.youtube.com/watch?v=Gol_qOgRqfA
- https://www.youtube.com/watch?v=0wUF_Ov8b0A&t=1058s

## Importing the Dataset as a Pandas Dataframe

In [None]:
import pandas as pd
import numpy as np

In [None]:
red_wine_df = pd.read_csv('winequality-red.csv', delimiter=';')

In [None]:
red_wine_df.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |


In [None]:
X = red_wine_df.iloc[:, :-1]
y = red_wine_df['quality']

X.shape, y.shape

((1599, 11), (1599,))

## Importing our Models (Logistic Regression)

### Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression

logistic_regression_model = LogisticRegression(solver='liblinear')

In [None]:
logistic_regression_model.get_params()

{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 100,
 'multi_class': 'auto',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'liblinear',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

## Helpful Data Scaling for Faster Convergence

In [None]:
# The solvers converged slowly on the raw features.
# Standardizing the features (zero mean, unit variance) is a commonly
# suggested remedy, so scale the data before fitting logistic regression:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)
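Fitting the scaler on the full dataset before cross-validation lets the mean and variance of the held-out folds influence the training folds. The effect is often small, but it can be avoided by putting the scaler inside a pipeline so it is refit on each training fold. The following is a minimal sketch assuming scikit-learn, with synthetic data standing in for the wine features. The names `X_demo`, `y_demo`, and `scaled_model` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the real X and y.
X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=0)

# The scaler is a pipeline step, so each cross-validation split fits it on the
# training part only and merely applies it to the held-out part.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(scaled_model, X_demo, y_demo, cv=5, scoring="accuracy")
print(scores.mean())
```

When such a pipeline is tuned, hyperparameter names gain a step prefix, for example `logisticregression__C` for the pipeline built by `make_pipeline` above.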

## Hyperparameter Optimization

Methods used:
- Bayesian Optimization with Hyperband (BOHB)
- Bayesian Optimization
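Hyperband is built on successive halving: many configurations are evaluated on a small budget, and only the best fraction is promoted to a larger budget. The sketch below shows that core idea in plain Python under simplifying assumptions. It runs a single bracket with randomly drawn candidates, whereas Hyperband runs several brackets and BOHB additionally replaces random sampling with a model-based sampler. The scoring function `toy_evaluate` is hypothetical and stands in for training a model with a given budget.

```python
import math
import random


def successive_halving(configs, evaluate, min_budget, max_budget, eta=3):
    """Keep the best 1/eta of the configurations at each budget level."""
    budget = min_budget
    survivors = list(configs)
    while True:
        # Evaluate every surviving configuration at the current budget,
        # best score first.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        if budget >= max_budget or len(scored) == 1:
            return scored[0]
        survivors = scored[: max(1, len(scored) // eta)]
        budget = min(max_budget, budget * eta)


def toy_evaluate(config, budget):
    """Hypothetical score: best near C = 1, less noisy at larger budgets."""
    rng = random.Random(hash((round(config["C"], 6), budget)))
    noise = rng.gauss(0, 1.0 / budget)
    return -abs(math.log10(config["C"])) + noise


rng = random.Random(0)
candidates = [{"C": 10 ** rng.uniform(-3, 3)} for _ in range(27)]

best = successive_halving(candidates, toy_evaluate, min_budget=1, max_budget=27)
print(best)
```

With 27 candidates and `eta=3`, the bracket evaluates 27, 9, 3, and finally 1 configuration at budgets 1, 3, 9, and 27.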

### Bayesian Optimization with Hyperband (BOHB)

In [None]:
# Installs the package required for this HPO method; comment out this line if it is already installed:
!pip install hpbandster-sklearn

Collecting hpbandster-sklearn
  Downloading hpbandster_sklearn-2.0.2-py3-none-any.whl (27 kB)
Collecting hpbandster (from hpbandster-sklearn)
  Downloading hpbandster-0.7.4.tar.gz (51 kB)
  Preparing metadata (setup.py) ... done
Collecting ConfigSpace (from hpbandster-sklearn)
  Downloading ConfigSpace-0.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.3 MB)
Collecting Pyro4 (from hpbandster->hpbandster-sklearn)
  Downloading Pyro4-4.82-py2.py3-none-any.whl (89 kB)
Collecting serpent (from hpbandster->hpbandster-sklearn)
  Downloading serpent-1.41-py3-none-any.whl (9.6 kB)
Collecting netifaces

In [None]:
from hpbandster_sklearn import HpBandSterSearchCV
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH

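# Construct one hyperparameter distribution per solver, because each solver
# supports a different set of penalties.
# NOTE: "None" below is a string, not Python's None. scikit-learn may reject
# it, in which case configurations that sample it fail to fit.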

lbfgs_hpo_distribution = {
    "penalty" : CS.Categorical("penalty", ["l2", "None"]),
    "C" : CS.Float("C", bounds=(0.1, 1000)),
    "solver" : CS.Categorical("solver", ["lbfgs"]),
    "max_iter" : CS.Integer("max_iter", bounds=(1000, 5000)),
}

liblinear_hpo_distribution = {
    "penalty" : CS.Categorical("penalty", ["l1", "l2"]),
    "C" : CS.Float("C", bounds=(0.1, 1000)),
    "solver" : CS.Categorical("solver", ["liblinear"]),
    "max_iter" : CS.Integer("max_iter", bounds=(1000, 5000)),
}

newtoncg_hpo_distribution = {
    "penalty" : CS.Categorical("penalty", ["l2", "None"]),
    "C" : CS.Float("C", bounds=(0.1, 1000)),
    "solver" : CS.Categorical("solver", ["newton-cg"]),
    "max_iter" : CS.Integer("max_iter", bounds=(1000, 5000)),
}

newtoncholesky_hpo_distribution = {
    "penalty" : CS.Categorical("penalty", ["l2", "None"]),
    "C" : CS.Float("C", bounds=(0.1, 1000)),
    "solver" : CS.Categorical("solver", ["newton-cholesky"]),
    "max_iter" : CS.Integer("max_iter", bounds=(1000, 5000)),
}

sag_hpo_distribution = {
    "penalty" : CS.Categorical("penalty", ["l2", "None"]),
    "C" : CS.Float("C", bounds=(0.1, 1000)),
    "solver" : CS.Categorical("solver", ["sag"]),
    "max_iter" : CS.Integer("max_iter", bounds=(1000, 5000)),
}

saga_hpo_distribution = {
    "penalty" : CS.Categorical("penalty", ["l1", "l2", "elasticnet", "None"]),
    "C" : CS.Float("C", bounds=(0.1, 1000)),
    "solver" : CS.Categorical("solver", ["saga"]),
    "max_iter" : CS.Integer("max_iter", bounds=(1000, 5000)),
}
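Two details of these spaces are worth handling before a sampled configuration reaches `LogisticRegression`. The string `"None"` is not the same as Python's `None`, which is what scikit-learn 1.2 and later expect for an unpenalized model. The `"elasticnet"` penalty also needs an `l1_ratio`, which none of the spaces define. The helper below is a hypothetical sketch of a normalization step, not part of any of the libraries used here, and the default mixing weight of 0.5 is an arbitrary assumption.

```python
def to_sklearn_kwargs(config, default_l1_ratio=0.5):
    """Turn a sampled configuration into LogisticRegression keyword arguments.

    Hypothetical helper: returns a new dict and leaves `config` untouched.
    """
    kwargs = dict(config)
    # Search libraries often store "no penalty" as a string; scikit-learn
    # (1.2 and later) expects the Python object None instead.
    if kwargs.get("penalty") in ("None", "none"):
        kwargs["penalty"] = None
    # The elastic-net penalty needs a mixing weight between L1 and L2.
    if kwargs.get("penalty") == "elasticnet":
        kwargs.setdefault("l1_ratio", default_l1_ratio)
    return kwargs


print(to_sklearn_kwargs({"penalty": "None", "C": 1.0, "solver": "lbfgs"}))
print(to_sklearn_kwargs({"penalty": "elasticnet", "C": 1.0, "solver": "saga"}))
```

A more complete treatment would make `l1_ratio` a tunable hyperparameter that is only active when the penalty is `"elasticnet"`, rather than fixing it to a default.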

In [None]:
RANDOM_STATE = 90

lbfgs_param_distributions = CS.ConfigurationSpace(
    seed=RANDOM_STATE,
    space = lbfgs_hpo_distribution,
)

liblinear_param_distributions = CS.ConfigurationSpace(
    seed=RANDOM_STATE,
    space = liblinear_hpo_distribution,
)

newtoncg_param_distributions = CS.ConfigurationSpace(
    seed=RANDOM_STATE,
    space = newtoncg_hpo_distribution,
)

newtoncholesky_param_distributions = CS.ConfigurationSpace(
    seed=RANDOM_STATE,
    space = newtoncholesky_hpo_distribution,
)

sag_param_distributions = CS.ConfigurationSpace(
    seed=RANDOM_STATE,
    space = sag_hpo_distribution,
)

saga_param_distributions = CS.ConfigurationSpace(
    seed=RANDOM_STATE,
    space = saga_hpo_distribution,
)

In [None]:
lbfgs_bohb_search = HpBandSterSearchCV(
    logistic_regression_model,
    lbfgs_param_distributions,
    scoring='accuracy',
    cv=10,
    optimizer='bohb',
    random_state=RANDOM_STATE,
    n_jobs=1,
    n_iter=15,
    verbose=1,
)

liblinear_bohb_search = HpBandSterSearchCV(
    logistic_regression_model,
    liblinear_param_distributions,
    scoring='accuracy',
    cv=10,
    optimizer='bohb',
    random_state=RANDOM_STATE,
    n_jobs=1,
    n_iter=15,
    verbose=1,
)

newtoncg_bohb_search = HpBandSterSearchCV(
    logistic_regression_model,
    newtoncg_param_distributions,
    scoring='accuracy',
    cv=10,
    optimizer='bohb',
    random_state=RANDOM_STATE,
    n_jobs=1,
    n_iter=15,
    verbose=1,
)

newtoncholesky_bohb_search = HpBandSterSearchCV(
    logistic_regression_model,
    newtoncholesky_param_distributions,
    scoring='accuracy',
    cv=10,
    optimizer='bohb',
    random_state=RANDOM_STATE,
    n_jobs=1,
    n_iter=15,
    verbose=1,
)

sag_bohb_search = HpBandSterSearchCV(
    logistic_regression_model,
    sag_param_distributions,
    scoring='accuracy',
    cv=10,
    optimizer='bohb',
    random_state=RANDOM_STATE,
    n_jobs=1,
    n_iter=15,
    verbose=1,
)

saga_bohb_search = HpBandSterSearchCV(
    logistic_regression_model,
    saga_param_distributions,
    scoring='accuracy',
    cv=10,
    optimizer='bohb',
    random_state=RANDOM_STATE,
    n_jobs=1,
    n_iter=15,
    verbose=1,
)

In [None]:
lbfgs_bohb_search.fit(X_scaled, y)
liblinear_bohb_search.fit(X_scaled, y)
newtoncg_bohb_search.fit(X_scaled, y)
newtoncholesky_bohb_search.fit(X_scaled, y)
sag_bohb_search.fit(X_scaled, y)
saga_bohb_search.fit(X_scaled, y)

WORKER: start listening for jobs
HBMASTER: adjusted queue size to (0, 1)
HBMASTER: starting run at 1711264867.488352
WORKER: start processing job (0, 0, 0)
WORKER: registered result for job (0, 0, 0) with dispatcher
WORKER: start processing job (0, 0, 1)
WORKER: registered result for job (0, 0, 1) with dispatcher
WORKER: start processing job (0, 0, 2)
INFO:hpbandster_sklearn.HpBand

In [None]:
lbfgs_bohb_search.best_score_ , lbfgs_bohb_search.best_params_

(0.5903655660377358,
 {'C': 1.3110370609184976,
  'max_iter': 4101,
  'penalty': 'l2',
  'solver': 'lbfgs'})

In [None]:
liblinear_bohb_search.best_score_ , liblinear_bohb_search.best_params_

(0.5841155660377357,
 {'C': 153.13889334627027,
  'max_iter': 1630,
  'penalty': 'l2',
  'solver': 'liblinear'})

In [None]:
newtoncg_bohb_search.best_score_ , newtoncg_bohb_search.best_params_

(0.5903655660377358,
 {'C': 1.3110370609184976,
  'max_iter': 4101,
  'penalty': 'l2',
  'solver': 'newton-cg'})

In [None]:
newtoncholesky_bohb_search.best_score_ , newtoncholesky_bohb_search.best_params_

(0.5841155660377357,
 {'C': 186.15965484077427,
  'max_iter': 2071,
  'penalty': 'l2',
  'solver': 'newton-cholesky'})

In [None]:
sag_bohb_search.best_score_ , sag_bohb_search.best_params_

(0.5878655660377358,
 {'C': 186.15965484077427, 'max_iter': 2071, 'penalty': 'l2', 'solver': 'sag'})

In [None]:
saga_bohb_search.best_score_ , saga_bohb_search.best_params_

(0.5884905660377358,
 {'C': 3.537424765457583, 'max_iter': 1016, 'penalty': 'l2', 'solver': 'saga'})

### Bayesian Optimization

In [None]:
# Installs the library needed for Bayesian Optimization; comment out this line if it is already installed:
!pip install baytune

Collecting baytune
  Downloading baytune-0.5.0-py3-none-any.whl (75 kB)
Collecting copulas>=0.3.2 (from baytune)
  Downloading copulas-0.10.1-py3-none-any.whl (51 kB)
Installing collected packages: copulas, baytune
Successfully installed baytune-0.5.0 copulas-0.10.1


In [None]:
models = {
    'LR_LBFGS': LogisticRegression,
    'LR_LIBLINEAR': LogisticRegression,
    'LR_NEWTONCG': LogisticRegression,
    'LR_NEWTONCHOLESKY': LogisticRegression,
    'LR_SAG': LogisticRegression,
    'LR_SAGA': LogisticRegression,
}

In [None]:
from sklearn.model_selection import cross_val_score

def scoring_function(model_name, hyperparameter_values):
    """Return the mean 10-fold cross-validated accuracy of one configuration."""
    # Look up the model class and instantiate it with the proposed settings.
    model_class = models[model_name]
    model_instance = model_class(**hyperparameter_values)
    scores = cross_val_score(
        estimator=model_instance,
        X=X_scaled,
        y=y,
        cv=10,
        scoring='accuracy',
    )

    return scores.mean()

In [None]:
from baytune.tuning import Tunable
from baytune.tuning import hyperparams as hp

tunables = {
    'LR_LBFGS': Tunable({
        'penalty': hp.CategoricalHyperParam(["l2", None]),
        'C': hp.FloatHyperParam(min=0.1, max=1000),
        'solver': hp.CategoricalHyperParam(["lbfgs"]),
        'max_iter' : hp.IntHyperParam(min=1000, max=5000),
    }),
    'LR_LIBLINEAR': Tunable({
        'penalty': hp.CategoricalHyperParam(["l1", "l2"]),
        'C': hp.FloatHyperParam(min=0.1, max=1000),
        'solver': hp.CategoricalHyperParam(["liblinear"]),
        'max_iter' : hp.IntHyperParam(min=1000, max=5000),
    }),
    'LR_NEWTONCG': Tunable({
        'penalty': hp.CategoricalHyperParam(["l2", None]),
        'C': hp.FloatHyperParam(min=0.1, max=1000),
        'solver': hp.CategoricalHyperParam(["newton-cg"]),
        'max_iter' : hp.IntHyperParam(min=1000, max=5000),
    }),
    'LR_NEWTONCHOLESKY': Tunable({
        'penalty': hp.CategoricalHyperParam(["l2", None]),
        'C': hp.FloatHyperParam(min=0.1, max=1000),
        'solver': hp.CategoricalHyperParam(["newton-cholesky"]),
        'max_iter' : hp.IntHyperParam(min=1000, max=5000),
    }),
    'LR_SAG': Tunable({
        'penalty': hp.CategoricalHyperParam(["l2", None]),
        'C': hp.FloatHyperParam(min=0.1, max=1000),
        'solver': hp.CategoricalHyperParam(["sag"]),
        'max_iter' : hp.IntHyperParam(min=1000, max=5000),
    }),
    'LR_SAGA': Tunable({
        'penalty': hp.CategoricalHyperParam(["l1", "l2", "elasticnet", None]),
        'C': hp.FloatHyperParam(min=0.1, max=1000),
        'solver': hp.CategoricalHyperParam(["saga"]),
        'max_iter' : hp.IntHyperParam(min=1000, max=5000),
    }),
}

In [None]:
from baytune import BTBSession

session = BTBSession(
    tunables=tunables,
    scorer=scoring_function,
    verbose=True,
)

In [None]:
best_result = session.run(50)

  0%|          | 0/50 [00:00<?, ?it/s]

ERROR:baytune.session:Proposal 13 - LR_SAGA crashed with the following configuration: penalty: elasticnet
C: 553.2043671843808
solver: saga
max_iter: 4115
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/baytune/session.py", line 364, in run
    score = self._scorer(tunable_name, config)
  File "<ipython-input-21-774259d254ad>", line 7, in scoring_function
    scores = cross_val_score(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_validation.py", line 515, in cross_val_score
    cv_results = cross_validate(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_validation.py", line 285, in cross_validate
    _warn_or_raise_about_fit_failures(results, error_score)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_validation.py", line 367, in _warn_or_raise_about_fit_failures
    raise ValueError(all_fits_failed_message)
ValueError: 
All the 10 fits failed.
It is very likely that your mo

In [None]:
best_result

{'id': '6863f987a4931d7b2ebc32bd878099c1',
 'name': 'LR_SAGA',
 'config': {'penalty': 'l1', 'C': 0.1, 'solver': 'saga', 'max_iter': 1000},
 'score': 0.5878655660377359}
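To put a tuned score in context, it helps to evaluate the untuned default on exactly the same folds and look at the per-fold differences. The sketch below assumes scikit-learn and SciPy and uses synthetic data, so its numbers say nothing about the wine dataset. The "tuned" settings mirror the configuration reported above and are used purely for illustration. A paired t-test on fold scores is only a rough indication, because the folds share training data and are not independent.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the real X and y.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=11, n_informative=5, random_state=0
)

# A fixed, seeded splitter gives both models identical folds.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

default_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
tuned_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", C=0.1, solver="saga", max_iter=1000),
)

default_scores = cross_val_score(default_model, X_demo, y_demo, cv=folds, scoring="accuracy")
tuned_scores = cross_val_score(tuned_model, X_demo, y_demo, cv=folds, scoring="accuracy")

# Paired comparison: the same fold is scored by both models.
result = ttest_rel(tuned_scores, default_scores)
print("default mean:", default_scores.mean())
print("tuned mean:  ", tuned_scores.mean())
print("p-value:     ", result.pvalue)
```

If the tuned configuration was selected using the same data, its score is optimistic, so a fair comparison uses folds or a test set that played no part in the search.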