# Vertex SDK for Python: Vertex Forecast Model Training Example

# Install Vertex SDK for Python, Authenticate, and Upload a Dataset to Your GCS Bucket


After the SDK is installed, the kernel restarts automatically. You may see the error message `Your session crashed for an unknown reason`; this is expected.

In [None]:
!pip3 uninstall -y google-cloud-aiplatform
!pip3 install google-cloud-aiplatform
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

In [1]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Enter your project and GCS bucket

Enter your project ID and staging bucket in the cell below, then run it. Later cells read these two variables when they copy data and initialize the SDK.

In [2]:
MY_PROJECT = "kubeflow-on-gcp-123"
MY_STAGING_BUCKET = "gs://aiplatform-custom"  # bucket should be in same region as Vertex AI
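Later cells build paths by appending to `MY_STAGING_BUCKET`, so a malformed value (a missing `gs://` prefix or a trailing slash) tends to surface only when a copy command fails. As a minimal sketch, the hypothetical helper below (`normalize_bucket_uri` is not part of the Vertex SDK) checks the prefix and strips a trailing slash before the value is used.

```python
def normalize_bucket_uri(uri: str) -> str:
    """Return a gs:// bucket URI without a trailing slash, or raise ValueError.

    Hypothetical helper for this notebook; not part of the Vertex SDK.
    """
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected a URI starting with {prefix!r}, got {uri!r}")
    cleaned = uri.rstrip("/")
    if len(cleaned) <= len(prefix):
        raise ValueError("Bucket name is empty")
    return cleaned


print(normalize_bucket_uri("gs://aiplatform-custom/"))
```

Running the check once, right after the variables are set, keeps path errors close to their cause.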

In [4]:
!gsutil cp gs://automl-demo-240614-lcm/iowa_liquor/iowa_daily.csv iowa_daily.csv

Copying gs://automl-demo-240614-lcm/iowa_liquor/iowa_daily.csv...
/ [1 files][ 62.0 KiB/ 62.0 KiB]                                                
Operation completed over 1 objects/62.0 KiB.                                     


In [5]:
TRAIN_FILE_NAME = "iowa_daily.csv"
!gsutil cp {TRAIN_FILE_NAME} {MY_STAGING_BUCKET}/data/

gcs_csv_path = f"{MY_STAGING_BUCKET}/data/{TRAIN_FILE_NAME}"

Copying file://iowa_daily.csv [Content-Type=text/csv]...
/ [1 files][ 62.0 KiB/ 62.0 KiB]                                                
Operation completed over 1 objects/62.0 KiB.                                     
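The training cell later in this notebook refers to four columns by name: `ds`, `y`, `holiday`, and `id`. As a hedged sketch, the standard-library check below reads only the header row of a CSV and reports any of those names that are missing. The helper name `missing_columns` and the in-memory sample are made up for illustration; point the helper at `iowa_daily.csv` to check the real file.

```python
import csv
import io

REQUIRED_COLUMNS = ["ds", "y", "holiday", "id"]  # names used by the training cell


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Illustrative in-memory sample (not real data from iowa_daily.csv)
sample = io.StringIO("ds,y,holiday\n2020-01-01,10,1\n")
print(missing_columns(sample))  # the sample lacks the "id" column

# To check the downloaded file instead:
# with open("iowa_daily.csv", newline="") as f:
#     print(missing_columns(f))
```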


# Initialize Vertex SDK for Python

Initialize the Vertex AI *client* with your project and staging bucket.

In [6]:
from google.cloud import aiplatform

aiplatform.init(project=MY_PROJECT, staging_bucket=MY_STAGING_BUCKET)

# Create a Managed Time Series Dataset from CSV

This section creates a managed time series dataset from the CSV file stored in your GCS bucket.

In [7]:
ds = aiplatform.TimeSeriesDataset.create(display_name="iowa_daily", gcs_source=[gcs_csv_path])

ds.resource_name

INFO:google.cloud.aiplatform.datasets.dataset:Creating TimeSeriesDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TimeSeriesDataset backing LRO: projects/306016756844/locations/us-central1/datasets/4995978526474633216/operations/1448751980007653376
INFO:google.cloud.aiplatform.datasets.dataset:TimeSeriesDataset created. Resource name: projects/306016756844/locations/us-central1/datasets/4995978526474633216
INFO:google.cloud.aiplatform.datasets.dataset:To use this TimeSeriesDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TimeSeriesDataset('projects/306016756844/locations/us-central1/datasets/4995978526474633216')


'projects/306016756844/locations/us-central1/datasets/4995978526474633216'

# Launch a Training Job to Create a Model

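With the dataset created, define an AutoML forecasting training job and run it to create a model. AutoML needs no custom training script; the job is configured entirely through its arguments.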

In [9]:
job = aiplatform.AutoMLForecastingTrainingJob(
    display_name="forecast-iowa-daily",
    optimization_objective="minimize-mae",
    column_transformations=[
        {"timestamp": {"column_name": "ds"}},
        {"categorical": {"column_name": "holiday"}},
        {"numeric": {"column_name": "y"}}
    ]
)

# This will take around an hour to run
model = job.run(
    dataset=ds,
    target_column="y",
    time_column="ds",
    time_series_identifier_column="id",
    unavailable_at_forecast_columns=["y"],
    available_at_forecast_columns=["ds", "holiday"],
    forecast_horizon=7,
    data_granularity_unit="day",
    data_granularity_count=1,
    context_window=30,
    export_evaluated_data_items=True,
    validation_options="fail-pipeline",
    budget_milli_node_hours=1000,
    model_display_name="forecast-iowa-daily-model",
)

INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/7042789790323834880?project=306016756844
INFO:google.cloud.aiplatform.training_jobs:AutoMLForecastingTrainingJob projects/306016756844/locations/us-central1/trainingPipelines/7042789790323834880 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLForecastingTrainingJob projects/306016756844/locations/us-central1/trainingPipelines/7042789790323834880 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLForecastingTrainingJob projects/306016756844/locations/us-central1/trainingPipelines/7042789790323834880 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLForecastingTrainingJob projects/306016756844/locations/us-central1/trainingPipelines/7042789790323834880 current state:
PipelineState.PIPELINE_STATE_RUNNING
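The run above sets `context_window=30`, `forecast_horizon=7`, daily granularity, and `budget_milli_node_hours=1000`. To make those numbers concrete, the sketch below converts them into date ranges and a node-hour figure. The helper names and the example start date are hypothetical. The sketch assumes that the context window counts periods before the first forecast period, that the horizon counts periods starting from it, and that 1,000 milli node hours equal one node hour.

```python
from datetime import date, timedelta


def forecast_windows(first_forecast_day, context_window, forecast_horizon):
    """Return ((context_start, context_end), (horizon_start, horizon_end)).

    All bounds are inclusive and assume one row per day (daily granularity).
    """
    context_start = first_forecast_day - timedelta(days=context_window)
    context_end = first_forecast_day - timedelta(days=1)
    horizon_end = first_forecast_day + timedelta(days=forecast_horizon - 1)
    return (context_start, context_end), (first_forecast_day, horizon_end)


def milli_node_hours_to_node_hours(milli_node_hours):
    """Convert a training budget from milli node hours to node hours."""
    return milli_node_hours / 1000


# Example start date chosen only for illustration
context, horizon = forecast_windows(date(2021, 3, 1), context_window=30, forecast_horizon=7)
print("context:", context[0], "to", context[1])
print("horizon:", horizon[0], "to", horizon[1])
print("budget:", milli_node_hours_to_node_hours(1000), "node hour(s)")
```

A budget of one node hour is consistent with the "around an hour" estimate in the training cell's comment.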

In [13]:
from datetime import datetime

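# Timestamp suffix that keeps batch prediction job names unique
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

# CSV of instances to forecast, read by the batch prediction job
BATCH_PREDICTION_GCS_SOURCE = "gs://automl-demo-240614-lcm/iowa_liquor/iowa_daily_automl_predict.csv"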

# Batch Prediction

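Submit the instances in `BATCH_PREDICTION_GCS_SOURCE` to the trained model for batch prediction. Results are written in CSV format under a folder in the staging bucket.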


In [14]:
MIN_NODES = 1
MAX_NODES = 1

# The name of the job
BATCH_PREDICTION_JOB_NAME = "forecast-iowa-daily-" + TIMESTAMP

# Folder in the bucket to write results to
DESTINATION_FOLDER = "batch_prediction_results"

# Cloud Storage prefix (bucket plus folder) to write results to
BATCH_PREDICTION_GCS_DEST_PREFIX = MY_STAGING_BUCKET + "/" + DESTINATION_FOLDER

# Make SDK batch_predict method call
batch_prediction_job = model.batch_predict(
    instances_format="csv",
    predictions_format="csv",
    job_display_name=BATCH_PREDICTION_JOB_NAME,
    gcs_source=BATCH_PREDICTION_GCS_SOURCE,
    gcs_destination_prefix=BATCH_PREDICTION_GCS_DEST_PREFIX,
    model_parameters=None,
    starting_replica_count=MIN_NODES,
    max_replica_count=MAX_NODES,
    sync=True,
)

INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/306016756844/locations/us-central1/batchPredictionJobs/86417215896682496
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/306016756844/locations/us-central1/batchPredictionJobs/86417215896682496')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/86417215896682496?project=306016756844
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/306016756844/locations/us-central1/batchPredictionJobs/86417215896682496 current state:
JobState.JOB_STATE_RUNNING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/306016756844/locations/us-central1/batchPredictionJobs/86417215896682496 current state:
JobState.JOB_STATE_RUNNING
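Once the job finishes successfully, the result files can be located under the destination prefix. The following is a minimal sketch, assuming the job object exposes `iter_outputs()` (which `aiplatform.BatchPredictionJob` provides for completed jobs, yielding Cloud Storage blobs when the destination is GCS). The helper `prediction_file_names` is hypothetical, and the commented usage cannot run without a finished job.

```python
def prediction_file_names(job, suffix=".csv"):
    """Return sorted names of output objects from a finished batch prediction job.

    `job` is expected to behave like aiplatform.BatchPredictionJob: its
    iter_outputs() yields objects that have a `name` attribute.
    """
    return sorted(
        blob.name for blob in job.iter_outputs() if blob.name.endswith(suffix)
    )


# Usage in this notebook, after batch_predict has returned:
# for name in prediction_file_names(batch_prediction_job):
#     print(name)
```

Filtering by suffix is a design choice for this sketch; pass `suffix=""` to list every output object instead.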
