# Vertex AI Model Packaging

## 1. Quickdraw source distribution

Example directory tree structure:

In [None]:
quickdraw_classifier/
├── quickdraw_classifier/
│   ├── __init__.py
│   ├── io_handler.py
│   ├── model.py
│   ├── training.py
│   └── utils.py
└── setup.py
submit_vertex_custom_training_job.sh
submit_vertex_hp_tuning_job.sh

`setup.py` defines how to build the source distribution.
For details, see [Create a source distribution](https://cloud.google.com/vertex-ai/docs/training/create-python-pre-built-container#create_a_source_distribution).

Choose a USERNAME to use for all your resources. <br>
Then create a REGIONAL bucket in europe-west1 for your experiments. <br>
Bucket name: **$USERNAME-devoxx_quickdraw**

Upload the generated package to:
*gs://< USERNAME >-devoxx_quickdraw/vertex_job_code/quickdraw_classifier-0.0.1.tar.gz*
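
The bucket creation, packaging and upload steps above could be scripted roughly as follows. This is a minimal sketch, assuming the Google Cloud SDK (`gsutil`) is installed and authenticated and that `setup.py` declares version 0.0.1. The username `jdoe` and the two function names are placeholders, not part of the workshop material.

```shell
#!/bin/bash
# Hypothetical values for illustration; use your own USERNAME.
USERNAME='jdoe'
REGION='europe-west1'
BUCKET="gs://${USERNAME}-devoxx_quickdraw"
PACKAGE_FILE='quickdraw_classifier-0.0.1.tar.gz'
PACKAGE_URI="${BUCKET}/vertex_job_code/${PACKAGE_FILE}"

# Create the regional bucket (run once).
create_bucket() {
  gsutil mb -l "${REGION}" "${BUCKET}"
}

# Build the source distribution and copy it to the bucket.
# Run from the directory that contains setup.py.
build_and_upload() {
  python setup.py sdist --formats=gztar || return 1
  gsutil cp "dist/${PACKAGE_FILE}" "${PACKAGE_URI}"
}

echo "Package destination: ${PACKAGE_URI}"
```

Call `create_bucket` once, then `build_and_upload` from the outer `quickdraw_classifier/` directory each time the package changes.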

## 2. Custom Training Job

A training pipeline wraps a CustomJob: it runs the training application and then uploads the resulting model as a Vertex AI model resource, which makes deployment easier. More on creating and submitting a CustomJob with model upload: [CustomJob and model upload](https://cloud.google.com/vertex-ai/docs/training/create-training-pipeline#custom-job-model-upload)

Create **submit_vertex_custom_training_job.sh**

Let's start by defining all the variables.<br>
**!! PLEASE do not change the machine and accelerator types. !!**

In [None]:
#!/bin/bash
REGION='europe-west1'
PROJECT_NAME='par-devoxx-sfeir'
SERVICE_ACCOUNT='sa-vertex@par-devoxx-sfeir.iam.gserviceaccount.com'

USERNAME='<TO DEFINE>'

DATE=$(date +"%Y%m%d_%H%M%S")
TRAINING_JOB_NAME="$USERNAME-quickdraw_training_$DATE"
# Double quotes are required so that $USERNAME is expanded.
MODEL_DISPLAY_NAME="$USERNAME-quickdraw_model_v01"

REPLICA_COUNT='1'
TRAINING_MACHINE_TYPE='n1-standard-8'
ACCELERATOR_TYPE='NVIDIA_TESLA_K80'
ACCELERATOR_COUNT='1'

TRAINING_EXECUTOR_IMAGE_URI='europe-docker.pkg.dev/vertex-ai/training/tf-gpu.2-8:latest'
PREDICTION_EXECUTOR_IMAGE_URI='europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest'
PYTHON_MODULE='quickdraw_classifier.training'


GCS_TRAINING_DATA='gs://devoxx_quickdraw/tfrecord_data/training_data/'
GCS_VALIDATION_DATA='gs://devoxx_quickdraw/tfrecord_data/validation_data/'
GCS_MODEL_DATA_PATH="gs://$USERNAME-devoxx_quickdraw/gcs_model_data/quickdraw_classifier_$DATE/"
# Double quotes are required so that $USERNAME is expanded.
PACKAGE_URI="gs://$USERNAME-devoxx_quickdraw/vertex_job_code/quickdraw_classifier-0.0.1.tar.gz"
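
Before submitting anything, it can help to check that every variable was actually filled in and expanded. The helper below is a hypothetical addition, not part of the workshop scripts. It flags values that are empty, still contain the `<TO DEFINE>` placeholder, or contain a literal `$`, which is the usual symptom of a variable reference wrapped in single quotes (bash does not expand those).

```shell
#!/bin/bash
# check_vars NAME...: return non-zero if any named variable looks unset.
check_vars() {
  rc=0
  for name in "$@"; do
    # Read the value of the variable whose name is stored in $name.
    eval "value=\${$name}"
    case "$value" in
      ''|*'<TO DEFINE>'*|*'$'*)
        echo "Variable $name is not set correctly: '$value'" >&2
        rc=1
        ;;
    esac
  done
  return "$rc"
}

# Example values (hypothetical).
USERNAME='jdoe'
GOOD_URI="gs://$USERNAME-devoxx_quickdraw/vertex_job_code/"   # expanded
BAD_URI='gs://$USERNAME-devoxx_quickdraw/vertex_job_code/'    # literal $USERNAME

check_vars USERNAME GOOD_URI && echo "all variables look fine"
check_vars BAD_URI || echo "BAD_URI needs double quotes"
```

In the real script, a call such as `check_vars USERNAME MODEL_DISPLAY_NAME PACKAGE_URI || exit 1` placed right after the variable definitions would stop the submission early.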

Pass the following arguments to the training application:
- batch_size = 50
- validation_batch_size = 20
- validation_ds_size = 5000
- img_height = 64
- img_width = 64
- nb_classes = 5

Also set the following environment variables:
- GCS_TRAINING_DATA
- GCS_VALIDATION_DATA
- GCS_MODEL_DATA_PATH
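
One possible way to prepare the `args` and `env` entries is sketched below, in the same single-quoted style as the request body that follows. The `--name=value` flag format is an assumption about how `quickdraw_classifier.training` parses its command line, so adapt it to the actual argument parser. Each `env` entry is an object with `name` and `value` keys. The variable names `TRAINING_ARGS` and `TRAINING_ENV` and the sample model path are hypothetical.

```shell
#!/bin/bash
# Hypothetical values; in the real script these come from the variable definitions.
GCS_TRAINING_DATA='gs://devoxx_quickdraw/tfrecord_data/training_data/'
GCS_VALIDATION_DATA='gs://devoxx_quickdraw/tfrecord_data/validation_data/'
GCS_MODEL_DATA_PATH='gs://jdoe-devoxx_quickdraw/gcs_model_data/quickdraw_classifier_20220101_120000/'

# Command-line arguments (the flag format is an assumption).
TRAINING_ARGS="
  '--batch_size=50',
  '--validation_batch_size=20',
  '--validation_ds_size=5000',
  '--img_height=64',
  '--img_width=64',
  '--nb_classes=5'"

# Environment variables as name/value objects.
TRAINING_ENV="
  {'name': 'GCS_TRAINING_DATA', 'value': '$GCS_TRAINING_DATA'},
  {'name': 'GCS_VALIDATION_DATA', 'value': '$GCS_VALIDATION_DATA'},
  {'name': 'GCS_MODEL_DATA_PATH', 'value': '$GCS_MODEL_DATA_PATH'}"

# Print the two fragments as they would appear in the request body.
echo "'args': [$TRAINING_ARGS
],"
echo "'env': [$TRAINING_ENV
]"
```

The two variables could then be referenced inside the request body as `'args': [ $TRAINING_ARGS ]` and `'env': [ $TRAINING_ENV ]`.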

In [None]:
#!/bin/bash
TRAINING_PIPELINE_REQUEST=\
"{
  'displayName': '$TRAINING_JOB_NAME',
  'trainingTaskDefinition': 'gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml',
  'trainingTaskInputs': {
    'serviceAccount': '$SERVICE_ACCOUNT',
    'baseOutputDirectory': {
      'outputUriPrefix': '$GCS_MODEL_DATA_PATH',
    },
    'workerPoolSpecs': [
      {
        'machineSpec': {
          'machineType': '$TRAINING_MACHINE_TYPE',
          <TO DEFINE>
        },
        'replicaCount': '$REPLICA_COUNT',
        'pythonPackageSpec': {
          'executorImageUri': '$TRAINING_EXECUTOR_IMAGE_URI',
          'packageUris': ['$PACKAGE_URI'],
          'pythonModule': '$PYTHON_MODULE',
          'args': [
            <TO DEFINE>
          ],
          'env': [
            <TO DEFINE>
          ]
        }
      }
    ],
  },
  'modelToUpload': {
    'displayName': '$MODEL_DISPLAY_NAME',
    'containerSpec': {
      'imageUri': '$PREDICTION_EXECUTOR_IMAGE_URI',
    },
  },
}"

The last step is to save the request body in a *training_request.json* file and make a POST request.

In [None]:
#!/bin/bash
# Quote the variable so that line breaks are kept and no globbing occurs.
echo "$TRAINING_PIPELINE_REQUEST" > training_request.json

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @training_request.json \
"https://$REGION-aiplatform.googleapis.com/v1/projects/$PROJECT_NAME/locations/$REGION/trainingPipelines"

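The response to the POST call contains the resource `name` of the new training pipeline (of the form `projects/.../locations/.../trainingPipelines/...`) and its `state`. A minimal sketch for checking progress afterwards, assuming you copy that `name` from the response; the helper names `resource_url` and `get_resource` are hypothetical:

```shell
#!/bin/bash
REGION='europe-west1'

# Build the REST URL for a resource name returned by the create call.
resource_url() {
  echo "https://${REGION}-aiplatform.googleapis.com/v1/$1"
}

# Fetch the current description of the resource (it includes its "state").
get_resource() {
  curl -s \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "$(resource_url "$1")"
}

# Usage, with the name copied from the POST response:
# get_resource "projects/<PROJECT_NUMBER>/locations/europe-west1/trainingPipelines/<PIPELINE_ID>"
```

A pipeline that finished successfully reports `PIPELINE_STATE_SUCCEEDED` in the `state` field.
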
## 3. Hyperparameter Tuning Job

A hyperparameter tuning job runs several trials of your training job, each with a different set of hyperparameters. More on creating and submitting a hyperparameter tuning job on Vertex AI: [Create a hyperparameter tuning job](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#create)

Create **submit_vertex_hp_tuning_job.sh**

Start with the same set of variables, but don't forget to change the job name.

In [None]:
#!/bin/bash
REGION="europe-west1"
PROJECT_NAME="par-devoxx-sfeir"
SERVICE_ACCOUNT="sa-vertex@par-devoxx-sfeir.iam.gserviceaccount.com"

USERNAME="<TO DEFINE>"

DATE=$(date +"%Y%m%d_%H%M%S")
TRAINING_JOB_NAME="$USERNAME-quickdraw_hp_tuning_$DATE"

REPLICA_COUNT="1"
TRAINING_MACHINE_TYPE="n1-standard-8"
ACCELERATOR_TYPE="NVIDIA_TESLA_K80"
ACCELERATOR_COUNT="1"

TRAINING_EXECUTOR_IMAGE_URI="europe-docker.pkg.dev/vertex-ai/training/tf-gpu.2-8:latest"
PREDICTION_EXECUTOR_IMAGE_URI="europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest"
PYTHON_MODULE="quickdraw_classifier.training"

GCS_TRAINING_DATA="gs://devoxx_quickdraw/tfrecord_data/training_data/"
GCS_VALIDATION_DATA="gs://devoxx_quickdraw/tfrecord_data/validation_data/"
GCS_MODEL_DATA_PATH="gs://$USERNAME-devoxx_quickdraw/gcs_model_data/quickdraw_classifier_$DATE/"
PACKAGE_URI="gs://$USERNAME-devoxx_quickdraw/vertex_job_code/quickdraw_classifier-0.0.1.tar.gz"

Define all the necessary parameters of the tuning job. <br>
Set the same arguments and environment variables as in the custom training job. <br>
In this job we are going to tune:
- learning rate (lr)
- batch size (batch_size)

**metricId** must match the name of the metric reported during training with the **hypertune** library. Don't forget to uncomment the corresponding code.
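
As a rough illustration, the `parameters` list of the `studySpec` could look like the fragment below. The field names (`parameterId`, `doubleValueSpec`, `discreteValueSpec`, `scaleType`) come from the Vertex AI StudySpec API, but the ranges and values are arbitrary placeholders rather than recommended settings, and the variable name `HP_PARAMETERS` is hypothetical.

```shell
#!/bin/bash
# Hypothetical search space; the ranges are placeholders to adapt.
HP_PARAMETERS="
  {
    'parameterId': 'lr',
    'doubleValueSpec': {'minValue': 0.0001, 'maxValue': 0.01},
    'scaleType': 'UNIT_LOG_SCALE'
  },
  {
    'parameterId': 'batch_size',
    'discreteValueSpec': {'values': [32, 64, 128]},
    'scaleType': 'UNIT_LINEAR_SCALE'
  }"

# Print the fragment as it would appear inside 'studySpec'.
echo "'parameters': [$HP_PARAMETERS
]"
```

Vertex AI passes each tuned hyperparameter to the training module as a command-line argument named after its `parameterId` (for example `--lr=0.001`), so it is probably best not to hard-code `batch_size` in `args` as well for this job.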

In [None]:
#!/bin/bash
HP_TUNING_REQUEST=\
"{
  'displayName': '$TRAINING_JOB_NAME',
  'studySpec': {
    'metrics': [
      {
        'metricId': 'val_accuracy',
        'goal': 'MAXIMIZE'
      }
    ],
    'parameters': [
      <TO DEFINE>
    ]
  },
  'maxTrialCount': 8,
  'parallelTrialCount': 2,
  'maxFailedTrialCount': 3,
  'trialJobSpec': {
    'serviceAccount': '$SERVICE_ACCOUNT',
    'workerPoolSpecs': [
      {
        'machineSpec': {
          'machineType': '$TRAINING_MACHINE_TYPE',
          'acceleratorType': '$ACCELERATOR_TYPE',
          'acceleratorCount': '$ACCELERATOR_COUNT'
        },
        'replicaCount': '$REPLICA_COUNT',
        'pythonPackageSpec': {
          'executorImageUri': '$TRAINING_EXECUTOR_IMAGE_URI',
          'packageUris': ['$PACKAGE_URI'],
          'pythonModule': '$PYTHON_MODULE',
          'args': [
            <TO DEFINE>
          ],
          'env': [
            <TO DEFINE>
          ]
        }
      }
    ],
  }
}"

The last step is to save the request body in a *hp_tuning_request.json* file and make a POST request.

In [None]:
#!/bin/bash
# Quote the variable so that line breaks are kept and no globbing occurs.
echo "$HP_TUNING_REQUEST" > hp_tuning_request.json

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @hp_tuning_request.json \
"https://$REGION-aiplatform.googleapis.com/v1/projects/$PROJECT_NAME/locations/$REGION/hyperparameterTuningJobs"