### Setup

inputs:

In [3]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'osn-smartcapex-404-sbx'

In [4]:
REGION = 'europe-west1'
DATANAME = 'component-name'
NOTEBOOK = 'component-name'

BASE_IMAGE = 'python:3.7-slim-buster'
TRAIN_COMPUTE = 'n1-standard-4' ## Choose wisly the compute_machine depending to the task (ex : 64 vCPUs, 240 GB RAM)

packages:

In [5]:
from google.cloud import bigquery
from google.cloud import aiplatform

import matplotlib.pyplot as plt
from datetime import datetime
import json

clients:

In [6]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

# helper function for queries
def bq_runner(query):
    return bq.query(query = query)


parameters:

In [7]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = "osn-smartcapex-data-uploaded-sbx"
URI = f"gs://{BUCKET}/{DATANAME}/{NOTEBOOK}"
DIR = f"temp/{NOTEBOOK}"

In [8]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'989544951348-compute@developer.gserviceaccount.com'

List the service accounts current roles:

In [9]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/aiplatform.admin
roles/bigquery.admin
roles/bigquery.jobUser
roles/dataflow.admin
roles/dataflow.worker
roles/editor
roles/iam.serviceAccountUser
roles/ml.admin
roles/storage.admin
roles/storage.objectAdmin
roles/storage.objectViewer
roles/workflows.admin


environment:

In [10]:
!rm -rf {DIR}
!mkdir -p {DIR}

### Training

#### Assemble Python File for Training

Create the main python trainer file as /train.py:

In [11]:
!mkdir -p {DIR}/src

In [13]:
%%writefile  {DIR}/src/mycode.py
# add here your code 

Overwriting temp/component-name/src/mycode.py


In [14]:
%%writefile  {DIR}/src/run_component.py


from google.cloud import bigquery
import pandas as pd
from multiprocessing import Pool, cpu_count
from tqdm import tqdm
import argparse
from .mycode import myfunction # import your module

# import parameters
parser = argparse.ArgumentParser()
parser.add_argument('--PROJECT_ID', dest = 'PROJECT_ID', type = str)
parser.add_argument('--DATANAME', dest = 'DATANAME', type = str)
parser.add_argument('--NOTEBOOK', dest = 'NOTEBOOK', type = str)

parser.add_argument('--my_arg', dest = 'my_arg', type = str) # all all your arguments


args = parser.parse_args()
PROJECT_ID = args.PROJECT_ID
DATANAME = args.DATANAME
NOTEBOOK = args.NOTEBOOK

my_arg = args.my_arg
print(PROJECT_ID, DATANAME, NOTEBOOK)

# client for BQ
bq = bigquery.Client(project = PROJECT_ID)

query = f"SELECT * FROM `{PROJECT_ID}.{DATANAME}.source` ORDER by cell_name, date" 

source = bq.query(query = query).to_dataframe()

output = myfunction(my_arg)

# output data - to BQ
output.to_gbq(f"{PROJECT_ID}.{DATANAME}.{NOTEBOOK}", f'{PROJECT_ID}', if_exists = 'replace')



Writing temp/component-name/src/run_component.py


### Create Custom Container

* https://cloud.google.com/vertex-ai/docs/training/create-custom-container
* https://cloud.google.com/vertex-ai/docs/training/pre-built-containers
* https://cloud.google.com/vertex-ai/docs/general/deep-learning
* https://cloud.google.com/deep-learning-containers/docs/choosing-container

Choose a Base Image

In [15]:
BASE_IMAGE # Defined above in Setup

'python:3.7-slim-buster'

#### Create the Dockerfile

A basic dockerfile thats take the base image and copies the code in and define an entrypoint - what python script to run first in this case. Add RUN entries to pip install additional packages.

In [17]:
requirements = f"""
pandas
pystan==2.19.1.1
holidays==0.24
prophet==1.1.1
pandas-gbq
google-cloud-bigquery
pyarrow
scipy
tqdm
"""
with open(f'{DIR}/requirements.txt', 'w') as f:
    f.write(requirements)

In [18]:
dockerfile = f"""
FROM {BASE_IMAGE}
WORKDIR /
## Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY fit /fit
## Sets up the entry point to invoke the trainer
ENTRYPOINT ["python", "-m", "fit.prophet"]
"""
with open(f'{DIR}/Dockerfile', 'w') as f:
    f.write(dockerfile)

#### Setup Artifact Registry

The container will need to be stored in Artifact Registry, Container Registry or Docker Hub in order to be used by Vertex AI Training jobs. This notebook will setup Artifact registry and push a local (to this notebook) built container to it.

* https://cloud.google.com/artifact-registry/docs/docker/store-docker-container-images#gcloud

Enable Artifact Registry API:

Check to see if the api is enabled, if not then enable it:

In [38]:
services = !gcloud services list --format="json" --available --filter=name:artifactregistry.googleapis.com
services = json.loads("".join(services))

if (services[0]['config']['name'] == 'artifactregistry.googleapis.com') & (services[0]['state'] == 'ENABLED'):
    print(f"Artifact Registry is Enabled for This Project: {PROJECT_ID}")
else:
    print(f"Enabeling Artifact Registry for this Project: {PROJECT_ID}")
    !gcloud services enable artifactregistry.googleapis.com

Artifact Registry is Enabled for This Project: osn-smartcapex-404-sbx


Create A Repository

Check to see if the registry is already created, if not then create it

In [39]:
check_for_repo = !gcloud artifacts repositories describe {PROJECT_ID} --location={REGION}

if check_for_repo[0].startswith('ERROR'):
    print(f'Creating a repository named {PROJECT_ID}')
    !gcloud  artifacts repositories create {PROJECT_ID} --repository-format=docker --location={REGION} --description="Vertex AI Training Custom Containers"
else:
    print(f'There is already a repository named {PROJECT_ID}')

There is already a repository named osn-smartcapex-404-sbx


Configure Local Docker to Use GCLOUD CLI

In [40]:
!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "europe-west1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: europe-west1-docker.pkg.dev
gcloud credential helpers already registered correctly.


Build The Custom Container (local to notebook)

In [19]:
IMAGE_URI=f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{PROJECT_ID}/{NOTEBOOK}:latest"
IMAGE_URI

'europe-west1-docker.pkg.dev/osn-smartcapex-404-sbx/osn-smartcapex-404-sbx/component-name:latest'

In [42]:
!docker build {DIR}/model/. -t $IMAGE_URI

Sending build context to Docker daemon  27.14kB
Step 1/6 : FROM python:3.7-slim-buster
 ---> 099f4583c701
Step 2/6 : WORKDIR /
 ---> Using cache
 ---> 4d612ffb5af4
Step 3/6 : COPY requirements.txt .
 ---> Using cache
 ---> 4df0510656ec
Step 4/6 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> 97c2332f86f9
Step 5/6 : COPY fit /fit
 ---> 25d4df75b8a2
Step 6/6 : ENTRYPOINT ["python", "-m", "fit.prophet"]
 ---> Running in 1fe82d6c888c
Removing intermediate container 1fe82d6c888c
 ---> 9576406c5899
Successfully built 9576406c5899
Successfully tagged europe-west1-docker.pkg.dev/osn-smartcapex-404-sbx/osn-smartcapex-404-sbx/fbprophet_forcast:latest


Test The Custom Container (local to notebook)

In [None]:
!docker run {IMAGE_URI} --PROJECT_ID {PROJECT_ID} --DATANAME {DATANAME} --NOTEBOOK {NOTEBOOK} # add all your argument

Importing plotly failed. Interactive plots will not work.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ) + 1
osn-smartcapex-404-sbx forcasting_oss 04f
train prepared!
14:32:33 - cmdstanpy - INFO - Chain [1] start processing
14:32:33 - cmdstanpy - INFO - Chain [1] start processing
14:32:33 - cmdstanpy - INFO - Chain [1] start processing
14:32:33 - cmdstanpy - INFO - Chain [1] start processing
14:32:34 - cmdstanpy - INFO - Chain [1] start processing
14:32:34 - cmdstanpy - INFO - Chain [1] start processing
14:32:34 - cmdstanpy - INFO - Chain [1] start processing
14:32:34 - cmdstanpy - INFO - Chain [1] start processing
14:32:34 - cmdstanpy - INFO - Chain [1] start processing
14:32:34 - cmdstanpy - INFO - Chain [1] start processing
14:32:34 - cmdstanpy - INFO - Chain [1] sta

Push The Custom Container To Artifact Registry

In [43]:
!docker push $IMAGE_URI

The push refers to repository [europe-west1-docker.pkg.dev/osn-smartcapex-404-sbx/osn-smartcapex-404-sbx/fbprophet_forcast]

[1Ba2daa481: Preparing 
[1B87f3f6d2: Preparing 
[1Bbec44759: Preparing 
[1B8d012914: Preparing 
[1Bd30bdfa9: Preparing 
[1B9f968310: Preparing 
[1B55769c5e: Preparing 
[8Ba2daa481: Pushed lready exists 6kB[4A[2K[8A[2Klatest: digest: sha256:4150f28d02e38652f6c008509ea4940a3925c2b9075c8b9347a76974184232ed size: 1998


#### Setup Training Job

In [44]:
CMDARGS = [
    "--PROJECT_ID=" + PROJECT_ID,
    "--DATANAME=" + DATANAME,
    "--NOTEBOOK=" + NOTEBOOK
] # add your arg here

MACHINE_SPEC = {
    "machine_type": TRAIN_COMPUTE,
    "accelerator_count": 0
}

WORKER_POOL_SPEC = [
    {
        "replica_count": 1,
        "machine_spec": MACHINE_SPEC,
        "container_spec": {
            "image_uri": IMAGE_URI,
            "command": [],
            "args": CMDARGS
        }
    }
]

In [45]:
customJob = aiplatform.CustomJob(
    location= 'europe-west1',
    display_name = f'{NOTEBOOK}_{DATANAME}',
    worker_pool_specs = WORKER_POOL_SPEC,
    base_output_dir = f"{URI}/{TIMESTAMP}",
    staging_bucket = f"{URI}/{TIMESTAMP}",
    labels = {'notebook':f'{NOTEBOOK}'}
)

#### Run Training Job

In [46]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    sync = False
)

Creating CustomJob
CustomJob created. Resource name: projects/989544951348/locations/europe-west1/customJobs/1210320684501172224
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/989544951348/locations/europe-west1/customJobs/1210320684501172224')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/europe-west1/training/1210320684501172224?project=989544951348
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs

In [47]:
customJob.display_name

'fbprophet_forcast_Traffic_forcasting'

CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
JobState.JOB_STATE_RUNNING
CustomJob projects/989544951348/locations/europe-west1/customJobs/1210320684501172224 current state:
Job