In [16]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/default-of-credit-card-clients-dataset/UCI_Credit_Card.csv


### Bayesian Hyperparameter tuning with Hyperopt

In this example you will set up and run a Bayesian hyperparameter optimization process using the Hyperopt package. First you set up the domain (similar to setting up the grid for a grid search), then the objective function, and finally you run the optimizer. The exercise calls for 20 iterations; the code cell below runs 100 (`max_evals=100`).
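
The three pieces fit together as a loop: the optimizer draws a candidate from the domain, scores it with the objective, and keeps the candidate with the lowest loss. The plain-Python sketch below shows only that structure. It deliberately swaps in random search for Hyperopt's TPE algorithm, and `toy_objective` and `random_search` are hypothetical names; TPE differs in that it uses the losses seen so far to decide where to sample next.

```python
import random

def toy_objective(params):
    # Hypothetical stand-in for the real objective (1 - CV accuracy).
    # Its minimum is at learning_rate = 0.1.
    return (params["learning_rate"] - 0.1) ** 2

def random_search(objective, n_trials, seed=42):
    """Minimize `objective` over a one-parameter domain.

    Random search stands in for TPE here: each candidate is drawn
    independently, whereas TPE uses past losses to pick the next one.
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        # 1. Draw a candidate from the domain: learning_rate ~ Uniform(0.001, 0.9).
        params = {"learning_rate": rng.uniform(0.001, 0.9)}
        # 2. Score it with the objective (lower is better).
        loss = objective(params)
        # 3. Keep the best candidate seen so far.
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = random_search(toy_objective, n_trials=20)
print(best_params, best_loss)
```

With the same seed, running more trials can only keep or lower the best loss, since the first 20 draws are identical.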

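The exercise specifies the domain as:

- `max_depth` using a `quniform` distribution (between 2 and 10, in steps of 2)
- `learning_rate` using a `uniform` distribution (0.001 to 0.9)

The code cell below uses a wider domain: `max_depth` from 1 to 60 in steps of 2, and `learning_rate` from 0.001 to 2.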
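
Hyperopt documents `hp.quniform(label, low, high, q)` as returning a value like `round(uniform(low, high) / q) * q`, so the values are quantized floats rather than ints, which is why the objective casts `max_depth` with `int()`. The sketch below mimics both distributions in plain Python for the exercise's ranges to show which values the optimizer can propose; it does not call Hyperopt, and the helper names are hypothetical. With the wider setting `hp.quniform('max_depth', 1, 60, 2)`, the same formula gives even depths from 2 up to 60.

```python
import random

def quniform(rng, low, high, q):
    # Mimics hp.quniform: round(uniform(low, high) / q) * q, returned as a float.
    return float(round(rng.uniform(low, high) / q) * q)

def uniform(rng, low, high):
    # Mimics hp.uniform: a float drawn uniformly from [low, high].
    return rng.uniform(low, high)

rng = random.Random(0)
depths = sorted({quniform(rng, 2, 10, 2) for _ in range(1000)})
rates = [uniform(rng, 0.001, 0.9) for _ in range(1000)]
print(depths)                  # [2.0, 4.0, 6.0, 8.0, 10.0]
print(min(rates), max(rates))  # both within [0.001, 0.9]
```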

Note that, for the purpose of the original exercise, the data sample size and the number of Hyperopt and GBM iterations were kept small. If you try this method on your own machine, use a larger search space, more trials, more cross-validation folds, and a larger dataset to really see it in action!
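
One way to keep tuning runs short is to tune on a random fraction of the training rows and keep the full split for the final fit. Below is a minimal pandas sketch, assuming a 20% sample (`frac=0.2` is an arbitrary, hypothetical choice) and a small synthetic frame in place of the credit-card data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for X_train / y_train (hypothetical data and column names).
rng = np.random.default_rng(0)
X_train = pd.DataFrame({"feature_1": rng.integers(10_000, 500_000, size=1000),
                        "feature_2": rng.integers(21, 75, size=1000)})
y_train = pd.Series(rng.integers(0, 2, size=1000), name="target")

# Draw 20% of the rows reproducibly, then align the labels on the same index.
X_small = X_train.sample(frac=0.2, random_state=42)
y_small = y_train.loc[X_small.index]

print(X_small.shape, y_small.shape)  # (200, 2) (200,)
```

Inside the objective, `X_small` and `y_small` would replace `X_train` and `y_train` in the `cross_val_score` call.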

In [17]:
from sklearn.model_selection import train_test_split

data = pd.read_csv('/kaggle/input/default-of-credit-card-clients-dataset/UCI_Credit_Card.csv')

X = data.drop('default.payment.next.month', axis=1)
y = data['default.payment.next.month']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [18]:
from hyperopt import hp, fmin, tpe
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

In [19]:
# Set up the search space dictionary with the hyperparameters to tune
space = {
    'max_depth': hp.quniform('max_depth', 1, 60, 2),
    'learning_rate': hp.uniform('learning_rate', 0.001, 2),
}

# Objective: Hyperopt minimizes the returned value, so return 1 - accuracy
def objective(params):
    # quniform yields floats, but GradientBoostingClassifier needs an int max_depth
    params = {'max_depth': int(params['max_depth']), 'learning_rate': params['learning_rate']}
    gbm_clf = GradientBoostingClassifier(n_estimators=500, **params)
    # Mean accuracy over 3 cross-validation folds on the training split
    mean_accuracy = cross_val_score(gbm_clf, X_train, y_train, scoring='accuracy', cv=3, n_jobs=4).mean()
    loss = 1 - mean_accuracy
    return loss

# Run the TPE algorithm for 100 evaluations with a fixed seed
best = fmin(fn=objective, space=space, max_evals=100, rstate=np.random.default_rng(42), algo=tpe.suggest)
print(best)

100%|██████████| 100/100 [1:43:39<00:00, 62.19s/trial, best loss: 0.17935323383084578]
{'learning_rate': 0.02718826730513205, 'max_depth': 2.0}
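
`fmin` returns the best point as raw values from the search space, so `max_depth` comes back as the float `2.0`, and the best loss on the progress bar is `1 - mean cross-validated accuracy`. A small helper like the hypothetical `to_gbm_params` below casts the values back into the types `GradientBoostingClassifier` expects; the numbers are the ones printed above.

```python
def to_gbm_params(best):
    # Hypothetical helper: quniform yields floats, but max_depth must be an int.
    return {"max_depth": int(best["max_depth"]),
            "learning_rate": float(best["learning_rate"])}

best = {'learning_rate': 0.02718826730513205, 'max_depth': 2.0}  # printed by fmin above
best_loss = 0.17935323383084578                                   # from the progress bar

params = to_gbm_params(best)
best_cv_accuracy = 1 - best_loss
print(params)                      # {'max_depth': 2, 'learning_rate': 0.02718826730513205}
print(round(best_cv_accuracy, 4))  # 0.8206

# To check the held-out split, one could refit and score, e.g.:
# GradientBoostingClassifier(n_estimators=500, **params).fit(X_train, y_train).score(X_test, y_test)
```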


Excellent! You successfully built your first Bayesian hyperparameter tuning algorithm. This will be a powerful tool for your machine learning modeling in the future. Bayesian hyperparameter tuning is a popular alternative to grid and random search, so this first taste is valuable experience. You are highly encouraged to extend this example on your own!

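Outputs from other runs of the optimizer (the settings used for these runs are not shown in this notebook):

{'learning_rate': 0.038093061276450534, 'max_depth': 2.0}

100%|██████████| 20/20 [11:43<00:00, 35.16s/trial, best loss: 0.18422885572139303]

{'learning_rate': 0.08347945438445452, 'max_depth': 4.0}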
