## Environment setup:

Let's ensure we're using the latest version of the Vertex AI library

In [None]:
! pip3 install --upgrade google-cloud-aiplatform --user -q --no-warn-script-location

## Restart the kernel after install

*After restart, continue to the cells below*

In [None]:
import os
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

# Overview

Using Vertex AI Vizier for hyperparameter tuning involves several steps. First, we'll need to create a training application, which will consists of a Python script that trains our model with given hyperparameters and then saves the trained model. This script must also report the performance of the model on the validation set, so Vertex AI Vizier can determine the best hyperparameters.

Next, we need to create a configuration file for the hyperparameter tuning job, which specifies the hyperparameters to tune and their possible values, as well as the metric to optimize.

Finally, we'll use the Vertex AI Vizier client library to submit the hyperparameter tuning job, which will run our training application with different sets of hyperparameters, and find the best ones.

# Preparation

Let's set some inmitial variables and put our dataset in place.

Set variables related to our environment.

* Replace **YOUR_PROJECT_ID** with your project ID.
* Replace **YOUR_PREFERRED_REGION** with your preferred region.
* Replace **YOUR_BUCKET_NAME** with your bucket name. (Hint: you can use any bucket you created already in this book.)

In [1]:
PROJECT_ID="YOUR_PROJECT_ID"
REGION="YOUR_PREFERRED_REGION"
BUCKET="YOUR_BUCKET_NAME"
BUCKET_URI=f"gs://{BUCKET}"
APP_NAME="fraud-detect"
APPLICATION_DIR = "vizier"
TRAINER_DIR = f"{APPLICATION_DIR}/trainer"

Copy the dataset to GCS so our code can access it later:

In [2]:
!gsutil cp ./data/creditcard.csv $BUCKET_URI/creditcard.csv

Copying file://./data/creditcard.csv [Content-Type=text/csv]...
| [1 files][143.8 MiB/143.8 MiB]                                                
Operation completed over 1 objects/143.8 MiB.                                    


## Containerize the training application code

Before we can run a hyperparameter tuning job, we need to create a source code file (training script) and a Dockerfile. The source code trains a model using XGBoost, and the Dockerfile will include all the commands needed to run the container image.

It will install all of the libraries required by our training script, and set up the entry point for the training code.

First, let's create a couple of directories that we'll use, and import and initialize the Google Cloud AI Platfrom client library.

In [3]:
!mkdir -p $APPLICATION_DIR
!mkdir -p $TRAINER_DIR

In [4]:
import google.cloud.aiplatform as aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

In [5]:
# Initialize the AI Platform client
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

### Create the training application (train.py)

The code in the next cell will create our training script.

**Important notes for our training code:**

*Notes related to XGBoost:*

* DMatrix is a data structure used by XGBoost that is optimized for both memory efficiency and training speed. We will convert our training, validation, and test datasets into DMatrix format before training the model.

* The param dictionary contains the parameters for the XGBoost model. eta is the learning rate, max_depth is the maximum depth of the trees, objective is the loss function to be minimized, and random_state is a seed for the random number generator for reproducibility.

* num_round is the number of rounds of training, equivalent to the number of trees in the model.

* The train function trains the model, and the predict function generates predictions. The predictions are probabilities of the positive class (fraudulent transactions), so they are between 0 and 1. We can convert these to class labels (0 or 1) by rounding them to the nearest integer (in reality, we could choose a different threshold depending on the business requirements).

*Notes related to training and tuning with Vertex AI Vizier:*

* We use the [cloudml-hypertune](https://github.com/GoogleCloudPlatform/cloudml-hypertune) Python package to pass metrics to Vertex AI. To learn more about this process, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/training/code-requirements#hp-tuning-metric).

* For hyperparameter tuning, Vertex AI runs our training code multiple times, with different command-line arguments each time. Our training code must parse these command-line arguments and use them as hyperparameters for training.. To learn more about this process, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/training/code-requirements#command-line-arguments).

**IMPORTANT:** Replace **YOUR_BUCKET_NAME** with your bucket name. 
This is because *writefile* will write the contents of this cell directly to file; it will not parse variables from earlier in this notebook.

In [6]:
%%writefile {TRAINER_DIR}/train.py

import argparse
import pandas as pd
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from google.cloud import storage
from hypertune import HyperTune

data_location='gs://YOUR_BUCKET_NAME/creditcard.csv'

def train_model(data, max_depth, eta, gamma):
    X = data.iloc[:,:-1]
    y = data.iloc[:,-1]
    
    # Split the data into training and test sets
    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Split the non-training data into validation and test sets
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)
        
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    dtest = xgb.DMatrix(X_test, label=y_test)

    params = {
        'max_depth': max_depth,
        'eta': eta,
        'gamma': gamma,
        'objective': 'binary:logistic',
        'nthread': 4,
        'eval_metric': 'auc'
    }
    
    evallist = [(dval, 'eval')]

    num_round = 10
    model = xgb.train(params, dtrain, num_round, evallist)
    
    preds = model.predict(dtest)
    auc = roc_auc_score(y_test, preds)

    hpt = HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag='auc',
        metric_value=auc,
        global_step=1000)

    return model

def get_args():
    parser = argparse.ArgumentParser(description='XGBoost Hyperparameter Tuning')
    parser.add_argument('--max_depth', type=int, default=3)
    parser.add_argument('--eta', type=float, default=0.3)
    parser.add_argument('--gamma', type=float, default=0)
    args = parser.parse_args()
    return args

def main():
    args = get_args()
    data = pd.read_csv(data_location)
    model = train_model(data, args.max_depth, args.eta, args.gamma)

if __name__ == "__main__":
    main()


Writing vizier5/trainer/train.py


### Write Dockerfile and requirements.txt

After writing our training code, we will next create a Dockerfile, which will include all the commands needed to run our container image, and the requirements.txt file, which specifies all the necessary libraries to install. The Dockerfile will also be used to specify and set up the entry point for the training code.

In [7]:
%%writefile {APPLICATION_DIR}/requirements.txt

numpy==1.19.2
pandas==1.1.3
scikit-learn==0.24.2
xgboost==1.4.2
tensorflow==2.6.0
google-cloud-aiplatform==1.0.1
cloudml-hypertune==0.1.0.dev6
fsspec==2023.5.0
gcsfs==2023.5.0

Writing vizier5/requirements.txt


In [8]:
%%writefile {APPLICATION_DIR}/Dockerfile

# Use an official Python runtime as a parent image
FROM python:3.8-slim-buster

WORKDIR /

COPY requirements.txt /requirements.txt

# Install any needed packages specified in requirements.txt
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

# Copies the trainer code to the Docker image.
COPY trainer /trainer

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python", "-m", "trainer.train"]


Writing vizier5/Dockerfile


### Build the container and put it in Artifact Registry
Next, we'll create a Docker repository in Artifact Registry, build our container, and push it to the newly-created repository.

In [9]:
REPO_NAME=f'{APP_NAME}-app'

!gcloud artifacts repositories create $REPO_NAME --repository-format=docker \
--location=$REGION --description="Docker repository"

! gcloud auth configure-docker $REGION.pkg.dev --quiet

Create request issued for: [fraud-detect2-app]
Waiting for operation [projects/still-sight-352221/locations/us-central1/operat
ions/7e73e876-1844-473d-970a-90e72d6d37fe] to complete...done.                 
Created repository [fraud-detect2-app].

{
  "credHelpers": {
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: .pkg.dev
gcloud credential helpers already registered correctly.


In [10]:
IMAGE_URI = (
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPO_NAME}/{APP_NAME}:latest"
)

In [11]:
cd $APPLICATION_DIR

/home/jupyter/vizier5


In [12]:
! docker build ./ -t $IMAGE_URI

Sending build context to Docker daemon  6.144kB
Step 1/7 : FROM python:3.8-slim-buster
 ---> addd6962740a
Step 2/7 : WORKDIR /
 ---> Using cache
 ---> 2365821df1bb
Step 3/7 : COPY requirements.txt /requirements.txt
 ---> Using cache
 ---> a19cbd165aff
Step 4/7 : RUN pip install --upgrade pip
 ---> Using cache
 ---> 78fc604555b2
Step 5/7 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Using cache
 ---> ac640a23924e
Step 6/7 : COPY trainer /trainer
 ---> Using cache
 ---> 989379d7f2ad
Step 7/7 : ENTRYPOINT ["python", "-m", "trainer.train"]
 ---> Using cache
 ---> d44444d5ba87
Successfully built d44444d5ba87
Successfully tagged us-central1-docker.pkg.dev/still-sight-352221/fraud-detect2-app/fraud-detect2:latest


In [13]:
! docker push $IMAGE_URI

The push refers to repository [us-central1-docker.pkg.dev/still-sight-352221/fraud-detect2-app/fraud-detect2]

[1Bf4748cb5: Preparing 
[1B09d09293: Preparing 
[1B564bdc85: Preparing 
[1B0452241d: Preparing 
[1B004ee77f: Preparing 
[1B8e79e84f: Preparing 
[1B512b6f71: Preparing 
[1B55769c5e: Preparing 
[1B8a51359d: Mounted from still-sight-352221/fraud-detect-app/fraud-detect [7A[2K[6A[2K[4A[2K[3A[2K[2A[2K[1A[2Klatest: digest: sha256:8c2dc0f0fd3751e39ae9f28d1f6dc039167d7a34f08e73733b597ced374a3145 size: 2208


## Configure a hyperparameter tuning job
Now that our training application code is containerized, it's time to specify and run the hyperparameter tuning job.

To create the hyperparameter tuning job, we need to first define the worker_pool_specs, which specifies the machine type and Docker image to use. The following spec includes one n1-standard-4 machine. (For more details, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#WorkerPoolSpec).)

In [14]:
# The spec for the worker pools, including machine type and Docker image

worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPO_NAME}/{APP_NAME}:latest"
        },
    }
]

## Define our custom job spec and hyperparameter tuning spec.
Next, we define our custom job spec (referencing the worker pool specs we just created), and our hyperparameter tuning spec, which includes details such as the hyperparameters and the metrics we want to optimize. (For more details, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#aiplatform_create_hyperparameter_tuning_job_python_package_sample-python).)

**IMPORTANT:** If you named your service account anything other than **ai-ml-sa** when you created it at the beginning of Chapter 8 then you will need to replace it in this code cell. (If you followed the recommended naming then you do not need to make a change here.)

In [15]:
# Define custom job
custom_job = aiplatform.CustomJob(
    display_name="xgboost_train",
    worker_pool_specs=worker_pool_specs
)

# Specify service account
service_account_email = f"ai-ml-sa@{PROJECT_ID}.iam.gserviceaccount.com"

# Set the custom service account in the job config
custom_job.service_account_email = service_account_email

In [16]:
# Define the hyperparameter tuning spec
hpt_job = aiplatform.HyperparameterTuningJob(
    display_name="xgboost_hpt",
    custom_job=custom_job,
    metric_spec={
        "auc": "maximize",
    },
    parameter_spec={
        "eta": aiplatform.hyperparameter_tuning.DoubleParameterSpec(min=0.01, max=0.3, scale='unit'),
        "max_depth": aiplatform.hyperparameter_tuning.IntegerParameterSpec(min=3, max=10, scale='unit'),
        "gamma": aiplatform.hyperparameter_tuning.DoubleParameterSpec(min=0, max=1, scale='unit'),
    },
    max_trial_count=20,
    parallel_trial_count=5,
)

# Run the hyperparameter tuning job

The following cell will run our job. Considering that the tuning job will include many trials, it may run for a long time (perhaps an hour or two). The output of this cell will display a link that will enable you to view the status of the tuning job in the Google Cloud console. The output of this cell will also repetitively display the current status of the tuning job every few seconds here in this notebook. Wait until the current status says "JOB_STATE_SUCCEEDED HyperparameterTuningJob run completed", and then we will inspect the optimized hyperparameters.

In [17]:
# Run the hyperparameter tuning job
hpt_job.run()

Creating HyperparameterTuningJob
HyperparameterTuningJob created. Resource name: projects/96449483013/locations/us-central1/hyperparameterTuningJobs/8611172621163167744
To use this HyperparameterTuningJob in another session:
hpt_job = aiplatform.HyperparameterTuningJob.get('projects/96449483013/locations/us-central1/hyperparameterTuningJobs/8611172621163167744')
View HyperparameterTuningJob:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/8611172621163167744?project=96449483013
HyperparameterTuningJob projects/96449483013/locations/us-central1/hyperparameterTuningJobs/8611172621163167744 current state:
JobState.JOB_STATE_PENDING
HyperparameterTuningJob projects/96449483013/locations/us-central1/hyperparameterTuningJobs/8611172621163167744 current state:
JobState.JOB_STATE_RUNNING
HyperparameterTuningJob projects/96449483013/locations/us-central1/hyperparameterTuningJobs/8611172621163167744 current state:
JobState.JOB_STATE_RUNNING
HyperparameterTuningJob pro

# Extract the best hyperparameters

In the next cell, we will get a list of all of the trials from our tuning job, then find the best-performing trial, and extract its hyperparameters.

In [18]:
client = aiplatform.gapic.JobServiceClient(client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"})

parent = f"projects/{PROJECT_ID}/locations/{REGION}"
response = client.list_hyperparameter_tuning_jobs(parent=parent)

# The best_trial variable will hold the optimal trial
best_trial = None
best_trial_value = None
current_trial_value = None

for hpt_job in response:
    if hpt_job.display_name == 'xgboost_hpt':
        # Assume the first trial is the best for initialization
        best_trial = hpt_job.trials[0]
        
        # Iterate over trials to find the best one
        for trial in hpt_job.trials:
            for metric in trial.final_measurement.metrics:
                if metric.metric_id == 'auc': 
                    current_trial_value = metric.value
            for metric in best_trial.final_measurement.metrics:
                if metric.metric_id == 'auc':  
                    best_trial_value = metric.value
                    
            if current_trial_value < best_trial_value:
                best_trial = trial

# Extract hyperparameters of the best trial
best_hyperparameters = best_trial.parameters

print(f"Best hyperparameters: {best_hyperparameters}")

Best hyperparameters: [parameter_id: "eta"
value {
  number_value: 0.155
}
, parameter_id: "gamma"
value {
  number_value: 0.5
}
, parameter_id: "max_depth"
value {
  number_value: 7.0
}
]


# Train a model with the best hyperparameters

Now, let's train a model with the best hyperparameters that were produced by our tuning job.

Note that **num_boost_round** is not a parameter of the model, but rather a parameter of the training function, so we will handle it separately. We will also convert it to an integer type here.

In [19]:
best_hyperparameters = best_trial.parameters

best_params = {}
for param in best_hyperparameters:
    param_id = param.parameter_id
    if param_id == 'num_boost_round':
        best_params[param_id] = int(param.value)
    else:
        best_params[param_id] = param.value

## Install XGboost

Let's install XGBoost so we can train a model directly here in our notebook (remember that our previous training jobs happened in a Docker container that we had created.)

In [20]:
!pip install xgboost

[0m

## Train the model

We will use a modified version of our earlier training code. In this case, we will directly provide the "best_params" to the training job.

The ouput of this cell will show us the ROC-AUC score achieved against the **validation** dataset for each training round (specified by num_round).

Finally, we will evaluate our model against the **test** dataset, and print the resulting ROC-AUC score for that.

In [21]:
import pandas as pd
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data_location=f'{BUCKET_URI}/creditcard.csv'

def train_model(data, hyperparameters):
    X = data.iloc[:,:-1]
    y = data.iloc[:,-1]
    
    # Split the data into training and test sets
    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Split the non-training data into validation and test sets
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)
        
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    dtest = xgb.DMatrix(X_test, label=y_test)

    # Convert max_depth to int (xgboost expects it as an int)
    hyperparameters['max_depth'] = int(hyperparameters['max_depth'])

    hyperparameters.update({
        'objective': 'binary:logistic',
        'nthread': 4,
        'eval_metric': 'auc'
    })
    
    evallist = [(dval, 'eval')]

    num_round = 10
    model = xgb.train(hyperparameters, dtrain, num_round, evals=evallist)
    
    preds = model.predict(dtest)
    auc = roc_auc_score(y_test, preds)

    print(f'ROC-AUC Score on Test Set: {auc:.4f}')

    return model

def main():
    data = pd.read_csv(data_location)
    model = train_model(data, best_params)

if __name__ == "__main__":
    main()

[0]	eval-auc:0.90520
[1]	eval-auc:0.90520
[2]	eval-auc:0.90521
[3]	eval-auc:0.90522
[4]	eval-auc:0.90520
[5]	eval-auc:0.90520
[6]	eval-auc:0.90520
[7]	eval-auc:0.90520
[8]	eval-auc:0.90520
[9]	eval-auc:0.90520
ROC-AUC Score on Test Set: 0.9188


**Note:** when I ran this, I got an ROC-AUC score of 0.9188, which is pretty good!