# Train a Model using Vertex AI

In this notebook you will train a model on the iris data using Vertex AI. The steps are:
1. Fetch the data from BigQuery and create a tabular dataset.
2. Write a training script.
3. Create and submit a training job using Vertex AI.

Bonus: After the training is done, can you fetch the results of the training using the Vertex AI API?

**Note**: You will need to finish the cells marked with TODO.

In [None]:
! pip3 install --upgrade google-cloud-aiplatform --user -q

#### Restart the kernel

After you install the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # do_shutdown(True) shuts the kernel down and requests a restart
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

#### Setting the project ID and region

In [None]:
shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
PROJECT_ID = shell_output[0]
print("Project ID:", PROJECT_ID)

REGION = "us-central1"

In [None]:
! gcloud config set project $PROJECT_ID

#### UUID

Some resources, such as the Cloud Storage bucket, need a unique name. An easy way to get one is to append a UUID to the name.

In [None]:
import random
import string


# Generate a uuid of a specified length (default=8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


UUID = generate_uuid()

#### Create a Cloud Storage bucket

In [None]:
BUCKET_NAME = "[your-bucket-name]"
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
# Fall back to a generated name if the placeholder was left unchanged
if not BUCKET_NAME or BUCKET_NAME == "[your-bucket-name]":
    BUCKET_NAME = PROJECT_ID + "-aip-" + UUID
    BUCKET_URI = f"gs://{BUCKET_NAME}"
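Before creating the bucket, it can help to check the generated name locally. The following is a minimal sketch, not part of the original notebook: `is_valid_bucket_name` is a hypothetical helper that covers only the basic Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, hyphens, underscores, and dots; starting and ending with a letter or digit). It does not cover every restriction, such as reserved prefixes.

```python
import re

# Basic Cloud Storage naming rules covered by this sketch:
# 3-63 characters, only lowercase letters, digits, hyphens, underscores,
# and dots, starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: check a name against the basic naming rules."""
    return bool(_BUCKET_NAME_RE.fullmatch(name))
```

In the notebook you could call `is_valid_bucket_name(BUCKET_NAME)` and pick a different name if it returns `False`.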

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

### Import libraries

In [None]:
from google.cloud import bigquery

import google.cloud.aiplatform as aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

### Initialize the Vertex AI SDK and BigQuery client for Python

Create a BigQuery client, then initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
bq_client = bigquery.Client(project=PROJECT_ID)

In [None]:
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

### TODO: Put Data into bucket

In the cells below, write code that fetches the iris data from BigQuery, then create a dataset in Vertex AI that you can use for training.

In [None]:
#TODO

BQ_SOURCE = "bigquery-public-data.ml_datasets.iris"
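One possible way to approach this TODO is sketched below. It assumes that `aiplatform.init(...)` has already been called and that a tabular dataset backed directly by the BigQuery table is acceptable. `to_bq_uri` and `create_iris_dataset` are hypothetical helper names, and the display name is an arbitrary choice.

```python
def to_bq_uri(table: str) -> str:
    """Hypothetical helper: prefix a `project.dataset.table` name with `bq://`."""
    return table if table.startswith("bq://") else f"bq://{table}"


def create_iris_dataset(bq_source: str, display_name: str = "iris-dataset"):
    """Create a Vertex AI tabular dataset backed by a BigQuery table.

    Assumes `aiplatform.init(...)` has already been called in the notebook.
    """
    # Imported here so that defining the function does not require the SDK.
    from google.cloud import aiplatform

    return aiplatform.TabularDataset.create(
        display_name=display_name,
        bq_source=to_bq_uri(bq_source),
    )
```

With the variable defined above, this could be used as `dataset = create_iris_dataset(BQ_SOURCE)`.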

### TODO: Write the training script

In the cell below, write a script to train a model on the iris data.

Remember that the training, validation, and test data must be read from the locations given in the `AIP_TRAINING_DATA_URI`, `AIP_VALIDATION_DATA_URI`, and `AIP_TEST_DATA_URI` environment variables, respectively. The trained model must be saved to the location given in `AIP_MODEL_DIR`.
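The values in these variables are URIs, so the training script usually has to unpack them before use. The sketch below assumes that the data splits are exported to BigQuery (so the data URIs look like `bq://project.dataset.table`) and that the model directory is a `gs://` path. Both helper names are hypothetical, and the formats should be checked against the values your job actually receives.

```python
from typing import Tuple


def bq_uri_to_table_id(uri: str) -> str:
    """Strip the `bq://` scheme so the result can be passed to a BigQuery client."""
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected a bq:// URI, got: {uri!r}")
    return uri[len(prefix):]


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split `gs://bucket/some/path/` into (bucket, path without outer slashes)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected a gs:// URI, got: {uri!r}")
    bucket, _, path = uri[len(prefix):].partition("/")
    return bucket, path.strip("/")
```

Inside the training script, `bq_uri_to_table_id(training_data_uri)` would then give a table ID to query, and `split_gcs_uri(model_save_dir)` the bucket and prefix to upload the saved model to.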


**Note**: The `%%writefile` magic command writes the contents of the cell to a file.

In [None]:
%%writefile iris_training.py

import os

# Read the environment variables set by Vertex AI
training_data_uri = os.getenv("AIP_TRAINING_DATA_URI")
validation_data_uri = os.getenv("AIP_VALIDATION_DATA_URI")
test_data_uri = os.getenv("AIP_TEST_DATA_URI")
model_save_dir = os.getenv("AIP_MODEL_DIR")

# TODO: load the data, train the model, and save it to model_save_dir

### TODO: Create and Submit a Training Job

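Below, create a training job and submit it for training. Store the job in a variable named `job` and the resulting model in `model`, since the cleanup cell at the end refers to those names.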
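A rough sketch of one way to do this with `aiplatform.CustomTrainingJob` follows. Everything specific in it is an assumption rather than part of the original exercise: the display names, the machine type, the `requirements` list, and the helper names `build_run_kwargs` and `submit_training_job`. The training and serving container image URIs are left as parameters; pick them from the list of Vertex AI prebuilt containers that matches your framework.

```python
def build_run_kwargs(project_id: str, model_display_name: str = "iris-model") -> dict:
    """Hypothetical helper: collect keyword arguments for `CustomTrainingJob.run`."""
    return {
        "model_display_name": model_display_name,
        "replica_count": 1,
        "machine_type": "n1-standard-4",  # assumed machine type
        # Ask Vertex AI to export the data splits to BigQuery in this project.
        "bigquery_destination": f"bq://{project_id}",
    }


def submit_training_job(
    dataset,
    project_id: str,
    training_image_uri: str,
    serving_image_uri: str,
    script_path: str = "iris_training.py",
):
    """Create a custom training job from a local script and run it."""
    # Imported here so that defining the function does not require the SDK.
    from google.cloud import aiplatform

    job = aiplatform.CustomTrainingJob(
        display_name="iris-training-job",
        script_path=script_path,
        container_uri=training_image_uri,
        # Assumed dependencies of the training script; adjust to your script.
        requirements=["google-cloud-bigquery", "pandas"],
        model_serving_container_image_uri=serving_image_uri,
    )
    # run() blocks until training finishes and returns the uploaded model.
    model = job.run(dataset=dataset, **build_run_kwargs(project_id))
    return job, model
```

A call such as `job, model = submit_training_job(dataset, PROJECT_ID, training_image_uri, serving_image_uri)` would leave `job` and `model` available for the cleanup step.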

### Remember to delete the jobs and models you created to save training costs

In [None]:
# Delete the training job
job.delete()

# Delete the model
model.delete()