In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# AutoMLOps - Introduction Training Example

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/automlops/blob/main/examples/training/00_introduction_training_example.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/automlops/blob/main/examples/training/00_introduction_training_example.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/automlops/main/examples/training/00_introduction_training_example.ipynb">
        <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

# Overview

In this tutorial, you will build two [Vertex AI](https://cloud.google.com/vertex-ai) pipelines, complete with an integrated CI/CD pipeline. This tutorial walks you through how to use AutoMLOps to define, create, and run pipelines, as well as how to monitor deployed models.

# Objective
In this tutorial, you will learn how to create and run MLOps pipelines integrated with CI/CD. This tutorial goes through an example Kubeflow pipeline that is defined using AutoMLOps. The example pipeline builds and deploys a classification model, and it follows a very basic workflow:
1. create_dataset: A custom component that exports the dataset from BigQuery (BQ) to Google Cloud Storage (GCS) as a CSV file, label-encoding the target column along the way.
2. train_model: A custom component that trains a decision tree classifier on the training data.
3. deploy_model: A custom component that uploads the saved model to Vertex AI Model Registry and deploys it to an endpoint.

# Prerequisites

In order to use AutoMLOps, the following are required:

- Python 3.7 - 3.10
- [Google Cloud SDK 407.0.0](https://cloud.google.com/sdk/gcloud/reference)
- [gcloud beta 2022.10.21](https://cloud.google.com/sdk/gcloud/reference/beta)
- `git` installed
- `git` configured with your user email and name:
```
  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"
```
- [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/provide-credentials-adc) are set up. This can be done with the following commands:
```
gcloud auth application-default login
gcloud config set account <account@example.com>
```

# APIs & IAM
Depending on the options you select, AutoMLOps will enable some or all of the following APIs during the provision step:
- [aiplatform.googleapis.com](https://cloud.google.com/vertex-ai/docs/reference/rest)
- [artifactregistry.googleapis.com](https://cloud.google.com/artifact-registry/docs/reference/rest)
- [cloudbuild.googleapis.com](https://cloud.google.com/build/docs/api/reference/rest)
- [cloudfunctions.googleapis.com](https://cloud.google.com/functions/docs/reference/rest)
- [cloudresourcemanager.googleapis.com](https://cloud.google.com/resource-manager/reference/rest)
- [cloudscheduler.googleapis.com](https://cloud.google.com/scheduler/docs/reference/rest)
- [compute.googleapis.com](https://cloud.google.com/compute/docs/reference/rest/v1)
- [iam.googleapis.com](https://cloud.google.com/iam/docs/reference/rest)
- [iamcredentials.googleapis.com](https://cloud.google.com/iam/docs/reference/credentials/rest)
- [logging.googleapis.com](https://cloud.google.com/logging/docs/reference/v2/rest)
- [pubsub.googleapis.com](https://cloud.google.com/pubsub/docs/reference/rest)
- [run.googleapis.com](https://cloud.google.com/run/docs/reference/rest)
- [storage.googleapis.com](https://cloud.google.com/storage/docs/apis)
- [sourcerepo.googleapis.com](https://cloud.google.com/source-repositories/docs/reference/rest)


AutoMLOps will create the following service account and update [IAM permissions](https://cloud.google.com/iam/docs/understanding-roles) during the provision step:
1. Pipeline Runner Service Account (defaults to: vertex-pipelines@PROJECT_ID.iam.gserviceaccount.com). Roles added:
- roles/aiplatform.user
- roles/artifactregistry.reader
- roles/bigquery.user
- roles/bigquery.dataEditor
- roles/iam.serviceAccountUser
- roles/storage.admin
- roles/cloudfunctions.admin

# User Guide

For a user guide, please view these [slides](../../AutoMLOps_User_Guide.pdf).

# Costs

This tutorial uses billable components of Google Cloud:
- Vertex AI
- Artifact Registry
- Cloud Storage
- Cloud Source Repository
- Cloud Build
- Cloud Run
- Cloud Scheduler
- Cloud Pub/Sub

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

# Ground-rules for using AutoMLOps
1. Do not use variables, functions, or other code that is defined outside the scope of a custom component. Each custom component becomes its own container and has no reference to out-of-scope code.
2. Import statements and helper functions must be added inside the function. Provide parameter type hints.
3. Test each of your components for accuracy and correctness before running them using AutoMLOps. We cannot fix bugs automatically, and bugs are much more difficult to fix once they are made into pipelines.
4. If you are using Kubeflow, be sure to list every package needed to run the custom component in `packages_to_install`. It is easy to leave out a package, which will cause the container to fail when it runs within a pipeline.
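Ground rules 1 and 2 can be checked roughly before you decorate a function. The sketch below is an approximation, not the mechanism AutoMLOps uses: it rebinds a function's code to a namespace that contains only Python's builtins, so any reference to a notebook-level import or variable raises `NameError`, much as it would inside the component's container. The helper `is_self_contained` and the two example functions are hypothetical.

```python
import builtins
import json  # module-level import: NOT visible inside a component container
import types

SCALE = 2.0  # module-level constant: also NOT visible inside a component container


def is_self_contained(fn, *args, **kwargs) -> bool:
    """Hypothetical helper: call `fn` without access to module-level names.

    Rebinds the function's code object to a fresh globals dict holding only
    the builtins, roughly mimicking a component body running in its own
    container. Returns False if the call raises NameError (an out-of-scope
    reference). Works only for top-level functions without closures.
    """
    isolated = types.FunctionType(
        fn.__code__, {'__builtins__': builtins}, fn.__name__, fn.__defaults__)
    try:
        isolated(*args, **kwargs)
    except NameError:
        return False
    return True


def bad_component(x: float) -> str:
    # Breaks rule 1: relies on the module-level `json` import and `SCALE`.
    return json.dumps({'y': x * SCALE})


def good_component(x: float) -> str:
    # Follows rules 1 and 2: the import, constant, and helper live in the body.
    import json
    scale = 2.0

    def double(v: float) -> float:
        return v * scale

    return json.dumps({'y': double(x)})


print(is_self_contained(bad_component, 3.0))   # False
print(is_self_contained(good_component, 3.0))  # True
```

This only catches missing names at call time; it does not check that `packages_to_install` lists every third-party package (rule 4).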


# Dataset
For training data, we are using the [dry beans dataset](https://archive.ics.uci.edu/ml/datasets/dry+bean+dataset) which contains metadata on images of seven different types of dry beans taken with a high-resolution camera. The raw dataset can be found [here](https://github.com/GoogleCloudPlatform/automlops/blob/main/example/data/Dry_Beans_Dataset.csv).

# Setup Git
Set up your git configuration below.

In [None]:
!git config --global user.email 'you@example.com'
!git config --global user.name 'Your Name'

# Install AutoMLOps

Install AutoMLOps from [PyPI](https://pypi.org/project/google-cloud-automlops/), or locally by cloning the repo and running `pip install .`

In [None]:
!pip3 install google-cloud-automlops --user

# Restart the kernel
Once you've installed the AutoMLOps package, you need to restart the notebook kernel so it can find the package.

**Note: Once this cell has finished running, continue on. You do not need to re-run any of the cells above.**

In [1]:
import os

if not os.getenv('IS_TESTING'):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

# Set your project ID
Set your project ID below. If you don't know your project ID, leave the field blank and the following cells may be able to find it.

In [1]:
PROJECT_ID = '[your-project-id]'

In [2]:
if PROJECT_ID == '' or PROJECT_ID is None or PROJECT_ID == '[your-project-id]':
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print('Project ID:', PROJECT_ID)

Project ID: automlops-sandbox


In [3]:
! gcloud config set project $PROJECT_ID

Updated property [core/project].


Set your model ID below. It is also passed to AutoMLOps as the `naming_prefix` and used in the GCS bucket name:

In [4]:
MODEL_ID = 'dry-beans-dt'

Miscellaneous constants:

In [5]:
TRAINING_DATASET = f'{PROJECT_ID}.test_dataset.dry_beans'
TARGET_COLUMN = 'Class'

# Upload Data
The following cell creates a BigQuery dataset and table and loads the Dry Beans CSV into it.

In [6]:
!python3 -m data.load_data_to_bq --project $PROJECT_ID --file data/Dry_Beans_Dataset.csv

Dataset automlops-sandbox.test_dataset already exists
Table test_dataset.dry_beans already exists


# Example Workflow
This workflow will define and generate a pipeline using AutoMLOps. AutoMLOps provides 2 functions for defining MLOps pipelines:

- `AutoMLOps.component(...)`: Defines a component, which is a containerized Python function.
- `AutoMLOps.pipeline(...)`: Defines a pipeline, which is a series of components.

AutoMLOps provides 6 functions for building and maintaining MLOps pipelines:

- `AutoMLOps.generate(...)`: Generates the MLOps codebase. Users can specify the tooling and technologies they would like to use in their MLOps pipeline.
- `AutoMLOps.provision(...)`: Runs provisioning scripts to create and maintain necessary infra for MLOps.
- `AutoMLOps.deprovision(...)`: Runs deprovisioning scripts to tear down MLOps infra created using AutoMLOps.
- `AutoMLOps.deploy(...)`: Builds and pushes the component container, then triggers the pipeline job.
- `AutoMLOps.launchAll(...)`: Runs `generate()`, `provision()`, and `deploy()` all in succession.
- `AutoMLOps.monitor(...)`: Creates model monitoring jobs on deployed endpoints.

Please see the [readme](https://github.com/GoogleCloudPlatform/automlops/blob/main/README.md) for more information.

## Import AutoMLOps

In [7]:
from google_cloud_automlops import AutoMLOps

## Data Loading
Define a custom component for loading and creating a dataset using `@AutoMLOps.component`. Import statements and helper functions must be added inside the function. Provide parameter type hints.

In [8]:
@AutoMLOps.component(
    packages_to_install=[
        'google-cloud-bigquery',
        'pandas',
        'pyarrow',
        'db_dtypes',
        'fsspec',
        'gcsfs',
        'scikit-learn'  # required by the LabelEncoder imported below
    ]
)
def create_dataset(
    bq_table: str,
    data_path: str,
    project_id: str
):
    """Custom component that takes in a BQ table and writes it to GCS.

    Args:
        bq_table: The source BigQuery table.
        data_path: The GCS location to write the CSV to.
        project_id: The project ID.
    """
    from google.cloud import bigquery
    import pandas as pd
    from sklearn import preprocessing
    
    bq_client = bigquery.Client(project=project_id)

    def get_query(bq_input_table: str) -> str:
        """Generates BQ Query to read data.

        Args:
            bq_input_table: The full name of the bq input table to be read into
                the dataframe (e.g. <project>.<dataset>.<table>)
        Returns: A BQ query string.
        """
        return f'''
        SELECT *
        FROM `{bq_input_table}`
        '''

    def load_bq_data(query: str, client: bigquery.Client) -> pd.DataFrame:
        """Loads data from bq into a Pandas Dataframe for EDA.
        Args:
            query: BQ Query to generate data.
            client: BQ Client used to execute query.
        Returns:
            pd.DataFrame: A dataframe with the requested data.
        """
        df = client.query(query).to_dataframe()
        return df

    dataframe = load_bq_data(get_query(bq_table), bq_client)
    le = preprocessing.LabelEncoder()
    dataframe['Class'] = le.fit_transform(dataframe['Class'])
    dataframe.to_csv(data_path, index=False)
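Following ground rule 3, you can exercise the non-BigQuery part of `create_dataset` locally before building the pipeline. The sketch below assumes a small in-memory DataFrame with toy feature values stands in for the BigQuery result, and writes to a local temporary file instead of GCS; `encode_and_write` is a hypothetical helper that mirrors the last three lines of the component.

```python
import os
import tempfile

import pandas as pd
from sklearn import preprocessing


def encode_and_write(dataframe: pd.DataFrame, data_path: str) -> pd.DataFrame:
    """Hypothetical helper mirroring the end of create_dataset:
    integer-encode the 'Class' column, then write the frame to CSV."""
    le = preprocessing.LabelEncoder()
    dataframe['Class'] = le.fit_transform(dataframe['Class'])
    dataframe.to_csv(data_path, index=False)
    return dataframe


# Stand-in for the BigQuery result: two toy feature columns plus the label.
sample = pd.DataFrame({
    'Area': [1.0, 2.0, 3.0, 4.0],
    'Perimeter': [10.0, 20.0, 30.0, 40.0],
    'Class': ['SEKER', 'SEKER', 'BOMBAY', 'DERMASON'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.csv')
    encode_and_write(sample, path)
    round_trip = pd.read_csv(path)

# LabelEncoder assigns codes in sorted order: BOMBAY=0, DERMASON=1, SEKER=2.
print(round_trip['Class'].tolist())  # [2, 2, 0, 1]
```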

## Model Training
Define a custom component for training a model using `@AutoMLOps.component`. Import statements and helper functions must be added inside the function.

In [9]:
@AutoMLOps.component(
    packages_to_install=[
        'scikit-learn==1.2.2',
        'pandas',
        'joblib',
        'tensorflow'
    ]
)
def train_model(
    data_path: str,
    model_directory: str
):
    """Custom component that trains a decision tree on the training data.

    Args:
        data_path: GS location of the training data.
        model_directory: GS location of saved model.
    """
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    import pandas as pd
    import tensorflow as tf
    import pickle
    import os

    def save_model(model, uri):
        """Saves a pickled model to uri (a local path or gs:// URI)."""
        # pickle writes bytes, so open the file in binary mode.
        with tf.io.gfile.GFile(uri, 'wb') as f:
            pickle.dump(model, f)

    df = pd.read_csv(data_path)
    labels = df.pop('Class').tolist()
    data = df.values.tolist()
    x_train, x_test, y_train, y_test = train_test_split(data, labels)
    skmodel = DecisionTreeClassifier()
    skmodel.fit(x_train, y_train)
    score = skmodel.score(x_test, y_test)
    print('accuracy is:', score)

    output_uri = os.path.join(model_directory, 'model.pkl')
    save_model(skmodel, output_uri)
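The training logic can be smoke-tested locally in the same way. This sketch assumes synthetic, clearly separable data in place of the Dry Beans CSV, and uses the built-in `open` on a local temporary file in place of `tf.io.gfile.GFile`. It checks that a pickled `DecisionTreeClassifier` reloads and predicts the same labels; a fixed `random_state` keeps the split reproducible.

```python
import os
import pickle
import tempfile

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: class 0 clusters near 0, class 1 near 100 (two features each).
data = ([[float(i), float(i) + 0.5] for i in range(10)]
        + [[100.0 + i, 100.5 + i] for i in range(10)])
labels = [0] * 10 + [1] * 10

x_train, x_test, y_train, y_test = train_test_split(
    data, labels, random_state=0, stratify=labels)
skmodel = DecisionTreeClassifier(random_state=0)
skmodel.fit(x_train, y_train)
score = skmodel.score(x_test, y_test)
print('accuracy is:', score)

with tempfile.TemporaryDirectory() as tmp:
    uri = os.path.join(tmp, 'model.pkl')
    with open(uri, 'wb') as f:  # binary mode, as pickle requires
        pickle.dump(skmodel, f)
    with open(uri, 'rb') as f:
        reloaded = pickle.load(f)

# The reloaded model should make identical predictions.
print(reloaded.predict(x_test).tolist() == skmodel.predict(x_test).tolist())
```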

## Uploading & Deploying the Model
Define a custom component for uploading and deploying a model in Vertex AI, using `@AutoMLOps.component`. Import statements and helper functions must be added inside the function.

In [10]:
@AutoMLOps.component(
    packages_to_install=[
        'google-cloud-aiplatform'
    ]
)
def deploy_model(
    model_directory: str,
    project_id: str,
    region: str
):
    """Custom component that uploads a saved model from GCS to Vertex Model Registry
       and deploys the model to an endpoint for online prediction.

    Args:
        model_directory: GS location of saved model.
        project_id: Project_id.
        region: Region.
    """
    import pprint as pp
    import random

    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=region)
    # Check whether a model with this ID already exists in the Model Registry
    models = aiplatform.Model.list()
    model_name = 'beans-model'
    if model_name in (m.name for m in models):
        parent_model = model_name
        model_id = None
        is_default_version=False
        version_aliases=['experimental', 'challenger', 'custom-training', 'decision-tree']
        version_description='challenger version'
    else:
        parent_model = None
        model_id = model_name
        is_default_version=True
        version_aliases=['champion', 'custom-training', 'decision-tree']
        version_description='first version'

    serving_container = 'us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-2:latest'
    uploaded_model = aiplatform.Model.upload(
        artifact_uri=model_directory,
        model_id=model_id,
        display_name=model_name,
        parent_model=parent_model,
        is_default_version=is_default_version,
        version_aliases=version_aliases,
        version_description=version_description,
        serving_container_image_uri=serving_container,
        serving_container_ports=[8080],
        labels={'created_by': 'automlops-team'},
    )

    endpoint = uploaded_model.deploy(
        machine_type='n1-standard-4',
        deployed_model_display_name='deployed-beans-model')

    sample_input = [[random.uniform(0, 300) for x in range(16)]]

    # Test endpoint predictions
    print('running prediction test...')
    try:
        resp = endpoint.predict(instances=sample_input)
        pp.pprint(resp)
    except Exception as ex:
        print('prediction request failed', ex)

## Define the Pipeline
Define your pipeline using `@AutoMLOps.pipeline`. You can optionally give the pipeline a name and description. Define the structure by listing the components to be called in your pipeline; use `.after` to specify the order of execution.

In [11]:
@AutoMLOps.pipeline #(name='automlops-pipeline', description='This is an optional description')
def pipeline(
    bq_table: str,
    model_directory: str,
    data_path: str,
    project_id: str,
    region: str):

    create_dataset_task = create_dataset(
        bq_table=bq_table,
        data_path=data_path,
        project_id=project_id)

    train_model_task = train_model(
        model_directory=model_directory,
        data_path=data_path).after(create_dataset_task)

    deploy_model_task = deploy_model(
        model_directory=model_directory,
        project_id=project_id,
        region=region).after(train_model_task)
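These components exchange data through GCS paths rather than through task outputs, so the orchestrator has no data dependency to infer an order from; the `.after(...)` calls are what impose it. Conceptually, each `.after` call adds an edge to a directed acyclic graph that is executed in topological order. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9+); it is not how KFP compiles or schedules pipelines.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it must run after (the `.after` edges above).
dependencies = {
    'create_dataset': set(),
    'train_model': {'create_dataset'},
    'deploy_model': {'train_model'},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['create_dataset', 'train_model', 'deploy_model']
```

Without the `.after` edges, all three tasks would have empty dependency sets and could run in any order, which is why the training step could otherwise read `data_path` before it exists.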

## Define the Pipeline Arguments

In [12]:
import datetime
pipeline_params = {
    'bq_table': TRAINING_DATASET,
    'model_directory': f'gs://{PROJECT_ID}-{MODEL_ID}-bucket/trained_models/{datetime.datetime.now()}',
    'data_path': f'gs://{PROJECT_ID}-{MODEL_ID}-bucket/data.csv',
    'project_id': PROJECT_ID,
    'region': 'us-central1'
}
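Both GCS paths above embed a bucket named `{PROJECT_ID}-{MODEL_ID}-bucket`, the same pattern that appears in the monitoring output later in this notebook (`automlops-sandbox-dry-beans-dt-bucket`). Because the name is assembled from two user-chosen values, it is worth checking it against the Cloud Storage bucket naming rules. The sketch below encodes only a simplified subset of those rules (3 to 63 characters; lowercase letters, digits, hyphens, and underscores; starting and ending with a letter or digit) and ignores dotted, domain-style names; `bucket_from_uri` and `is_valid_bucket_name` are hypothetical helpers.

```python
import re

# Simplified Cloud Storage bucket-name rules for names without dots:
# 3-63 characters; lowercase letters, digits, hyphens, underscores;
# must start and end with a letter or digit.
_BUCKET_RE = re.compile(r'[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]')


def bucket_from_uri(uri: str) -> str:
    """Return the bucket part of a gs://bucket/path URI."""
    if not uri.startswith('gs://'):
        raise ValueError(f'not a gs:// URI: {uri!r}')
    return uri[len('gs://'):].split('/', 1)[0]


def is_valid_bucket_name(name: str) -> bool:
    """Hypothetical helper: True if `name` passes the simplified rules above."""
    return _BUCKET_RE.fullmatch(name) is not None


bucket = bucket_from_uri('gs://automlops-sandbox-dry-beans-dt-bucket/data.csv')
print(bucket, is_valid_bucket_name(bucket))  # automlops-sandbox-dry-beans-dt-bucket True
```

In the notebook, you could pass `pipeline_params['data_path']` to `bucket_from_uri` instead of the literal URI.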

## Generate and Run the pipeline
`AutoMLOps.generate(...)` generates the MLOps codebase. Users can specify the tooling and technologies they would like to use in their MLOps pipeline.

In [13]:
AutoMLOps.generate(project_id=PROJECT_ID,
                   pipeline_params=pipeline_params,
                   use_ci=True,
                   naming_prefix=MODEL_ID,
                   schedule_pattern='59 11 * * 0', # retrain every Sunday at 11:59
                   setup_model_monitoring=True     # use this if you would like to use Vertex Model Monitoring
)

Writing directories under AutoMLOps/
Writing configurations to AutoMLOps/configs/defaults.yaml
Writing kubeflow pipelines code to AutoMLOps/pipelines
Writing kubeflow components code to AutoMLOps/components
     -- Writing create_dataset
     -- Writing train_model
     -- Writing deploy_model
Writing submission service code to AutoMLOps/services
Writing gcloud provisioning code to AutoMLOps/provision
Writing cloud build config to AutoMLOps/cloudbuild.yaml
Code Generation Complete.
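`schedule_pattern` uses standard five-field cron syntax (minute, hour, day of month, month, day of week), so `'59 11 * * 0'` fires at 11:59 every Sunday in the scheduler's configured time zone. The sketch below computes the next firing time for patterns of this simple weekly shape only; `next_weekly_run` is a hypothetical helper, not part of AutoMLOps or Cloud Scheduler, and it ignores time zones.

```python
from datetime import datetime, timedelta


def next_weekly_run(pattern: str, now: datetime) -> datetime:
    """Next firing time for a cron pattern of the form 'M H * * D' (D: 0 = Sunday)."""
    minute, hour, dom, month, dow = pattern.split()
    if dom != '*' or month != '*':
        raise ValueError('only weekly patterns of the form "M H * * D" are supported')
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0 and Sunday as 6.
    target_weekday = (int(dow) - 1) % 7
    days_ahead = (target_weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= now:  # this week's slot has already passed
        candidate += timedelta(days=7)
    return candidate


print(next_weekly_run('59 11 * * 0', datetime(2024, 1, 1, 0, 0)))  # 2024-01-07 11:59:00
```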


`AutoMLOps.provision(...)` runs provisioning scripts to create and maintain necessary infra for MLOps.

In [15]:
AutoMLOps.provision(hide_warnings=False)           # hide_warnings is optional, defaults to True

-serviceusage.services.enable
-serviceusage.services.use
-resourcemanager.projects.setIamPolicy
-iam.serviceAccounts.list
-iam.serviceAccounts.create
-iam.serviceAccounts.actAs
-storage.buckets.get
-storage.buckets.create
-artifactregistry.repositories.list
-artifactregistry.repositories.create
-pubsub.topics.list
-pubsub.topics.create
-pubsub.subscriptions.list
-pubsub.subscriptions.create
-cloudbuild.builds.list
-cloudbuild.builds.create
-cloudscheduler.jobs.list
-cloudscheduler.jobs.create
-cloudfunctions.functions.get
-cloudfunctions.functions.create
-source.repos.list
-source.repos.create

You are currently using: srastatter@google.com. Please check your account permissions.
The following are the recommended roles for provisioning:
-roles/serviceusage.serviceUsageAdmin
-roles/resourcemanager.projectIamAdmin
-roles/iam.serviceAccountAdmin
-roles/iam.serviceAccountUser
-roles/storage.admin
-roles/artifactregistry.admin
-roles/pubsub.editor
-roles/cloudbuild.builds.editor
-roles/clou

`AutoMLOps.deploy(...)` builds and pushes the component container, then triggers the pipeline job.

In [16]:
AutoMLOps.deploy(precheck=True,                     # precheck is optional, defaults to True
                 hide_warnings=False)               # hide_warnings is optional, defaults to True

-serviceusage.services.get
-resourcemanager.projects.getIamPolicy
-storage.buckets.update
-iam.serviceAccounts.get
-artifactregistry.repositories.get
-pubsub.topics.get
-pubsub.subscriptions.get
-cloudbuild.builds.get
-cloudfunctions.functions.get
-source.repos.update

You are currently using: srastatter@google.com. Please check your account permissions.
The following are the recommended roles for deploying with precheck:
-roles/serviceusage.serviceUsageViewer
-roles/iam.roleViewer
-roles/storage.admin
-roles/iam.serviceAccountUser
-roles/artifactregistry.reader
-roles/pubsub.viewer
-roles/cloudbuild.builds.editor
-roles/cloudfunctions.viewer
-roles/source.writer

Checking for required API services in project automlops-sandbox...
Checking for Artifact Registry in project automlops-sandbox...
Checking for Storage Bucket in project automlops-sandbox...
Checking for Pipeline Runner Service Account in project automlops-sandbox...
Checking for IAM roles on Pipeline Runner Service Account in

## Create Monitoring Jobs (Optional)
Set up the monitoring job by first getting the most recently deployed beans-model endpoint.

**Note: Only run this step after the PipelineJob above has completed successfully**

In [14]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location='us-central1')  # same region as pipeline_params
beans_endpoints = aiplatform.Endpoint.list(
    filter='display_name="beans-model_endpoint"',
    order_by='create_time desc')  # newest first

# Grab the most recent beans-model deployment
endpoint_name = beans_endpoints[0].resource_name
endpoint_name

'projects/45373616427/locations/us-central1/endpoints/6902586664619606016'

Install the requirements for creating model monitoring jobs:

In [None]:
!pip3 install -r AutoMLOps/model_monitoring/requirements.txt --user

`AutoMLOps.monitor(...)` creates model monitoring jobs on deployed endpoints. Users can specify the drift and skew thresholds, as well as other parameters to configure the monitoring job. Specifying `alert_emails` will send anomaly alerts to the listed emails. Specifying `auto_retraining_params` will enable automatic re-running of the above pipeline if an anomaly is detected.
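The thresholds passed below are compared against a distance between a feature's training (or earlier serving) distribution and its recent serving distribution. According to the Vertex AI Model Monitoring documentation, numerical features use Jensen-Shannon divergence (categorical features use L-infinity distance). The exact binning Vertex applies is not reproduced here, so the sketch below is only an illustration with made-up bin edges and toy values; it shows why a threshold of `0.000001` amounts to alerting on almost any difference. `to_distribution` and `js_divergence` are hypothetical helpers.

```python
import bisect
import math


def to_distribution(values, edges):
    """Bin `values` using `edges` and normalise the counts to probabilities.

    Values outside [edges[0], edges[-1]] are clamped into the first or last bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[max(0, min(i, len(counts) - 1))] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Zero-probability terms contribute nothing; y (= m) is positive wherever x is.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


edges = [0, 100, 200, 300]                                           # made-up bin edges
train = to_distribution([50, 150, 250, 50, 150, 250], edges)          # [1/3, 1/3, 1/3]
serving = to_distribution([50, 150, 250, 50, 150, 250, 250], edges)   # one extra value

drift = js_divergence(train, serving)
print(drift > 0.000001)  # True: even this small shift exceeds the demo threshold
```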

In [15]:
AutoMLOps.monitor(
    alert_emails=[], # update if you would like to receive email alerts
    target_field=TARGET_COLUMN,
    model_endpoint=endpoint_name,
    monitoring_interval=1,  # run the monitoring job every hour
    auto_retraining_params=pipeline_params,
    drift_thresholds={'Area': 0.000001, 'Perimeter': 0.000001},  # deliberately tiny so alerts fire quickly
    skew_thresholds={'Area': 0.000001, 'Perimeter': 0.000001},
    training_dataset=f'bq://{TRAINING_DATASET}',
    hide_warnings=False
)

Creating ModelDeploymentMonitoringJob
ModelDeploymentMonitoringJob created. Resource name: projects/45373616427/locations/us-central1/modelDeploymentMonitoringJobs/5631206457295765504
To use this ModelDeploymentMonitoringJob in another session:
mdm_job = aiplatform.ModelDeploymentMonitoringJob('projects/45373616427/locations/us-central1/modelDeploymentMonitoringJobs/5631206457295765504')
View Model Deployment Monitoring Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/model-deployment-monitoring/5631206457295765504?project=45373616427
Updated Anomaly Log Sink dry-beans-dt-model-monitoring-log-sink.

All anomaly logs for this model monitoring job are being routed to pub/sub topic dry-beans-dt-queueing-svc for automatic retraining.
Retraining will use the following parameters located at gs://automlops-sandbox-dry-beans-dt-bucket/pipeline_root/dry-beans-dt/automatic_retraining_parameters.json: 

{'bq_table': 'automlops-sandbox.test_dataset.dry_beans',
 'data_path': 

### Test the monitoring job by sending some sample requests
The code below sends a single prediction request containing 5,000 instances. With the configuration above, Vertex AI Model Monitoring runs a monitoring job every hour at the top of the hour, compiles skew and drift statistics, and compares them to the specified thresholds. Because those thresholds are very low, this prediction request should produce a series of alerts within a few hours and trigger a retraining of the model.
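Online prediction requests are subject to a payload size limit. If a single 5,000-instance request is rejected as too large, one option is to split the instances into smaller batches and call `endpoint.predict` once per batch. The batching helper below is a hypothetical sketch, and the batch size of 1,000 is an arbitrary assumption rather than a documented limit.

```python
from typing import Iterator, List, Sequence


def iter_batches(instances: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of `instances` with at most `batch_size` items each."""
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    for start in range(0, len(instances), batch_size):
        yield list(instances[start:start + batch_size])


# Usage with the notebook's objects (needs a deployed endpoint, so shown as comments):
# predictions = []
# for batch in iter_batches(X_sample, 1000):
#     predictions.extend(endpoint.predict(instances=batch).predictions)

sizes = [len(b) for b in iter_batches(list(range(5000)), 1000)]
print(sizes)  # [1000, 1000, 1000, 1000, 1000]
```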

In [17]:
from google.cloud import bigquery
import pandas as pd

def get_query(bq_input_table: str) -> str:
    """Generates BQ Query to read data.

    Args:
        bq_input_table: The full name of the bq input table to be read into
        the dataframe (e.g. <project>.<dataset>.<table>)

    Returns: A BQ query string.
    """
    return f'''SELECT * FROM `{bq_input_table}`'''

def load_bq_data(query: str, client: bigquery.Client) -> pd.DataFrame:
    """Loads data from bq into a Pandas Dataframe for EDA.

    Args:
        query: BQ Query to generate data.
        client: BQ Client used to execute query.

    Returns:
        pd.DataFrame: A dataframe with the requested data.
    """
    df = client.query(query).to_dataframe()
    return df

bq_client = bigquery.Client(project=PROJECT_ID)

# Get samples: the first 5,000 rows, without the target column
df = load_bq_data(get_query(TRAINING_DATASET), bq_client)
X_sample = df.drop(columns=[TARGET_COLUMN]).head(5000).values.tolist()

endpoint = aiplatform.Endpoint(endpoint_name)
response = endpoint.predict(instances=X_sample)
predictions = response.predictions
# print the first prediction
print(predictions[0])

5.0
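The endpoint returns integer class codes (here `5.0`) because `create_dataset` label-encoded the `Class` column before training. `LabelEncoder` assigns codes in sorted order of the class names, so refitting an encoder on the same set of labels recovers the mapping. In the notebook you could fit it on `df[TARGET_COLUMN]` from the cell above, since the BigQuery table still holds string labels. The sketch below instead assumes the seven Dry Beans class names as they appear in the raw CSV; treat that list as an assumption.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed class names; in the notebook, fit on df[TARGET_COLUMN] instead.
class_names = ['SEKER', 'BARBUNYA', 'BOMBAY', 'CALI', 'HOROZ', 'SIRA', 'DERMASON']

le = LabelEncoder().fit(class_names)
print(le.classes_.tolist())
# ['BARBUNYA', 'BOMBAY', 'CALI', 'DERMASON', 'HOROZ', 'SEKER', 'SIRA']

prediction_code = 5.0  # value returned by the endpoint above
print(le.inverse_transform([int(prediction_code)])[0])  # SEKER
```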


# Train using a GPU
Use the `custom_training_job_specs` parameter to specify custom resources for any custom component in the pipeline. The example below uses a GPU for accelerated training.
See [Machine types](https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types) and [GPUs](https://cloud.google.com/vertex-ai/docs/training/configure-compute#specifying_gpus).

In [12]:
AutoMLOps.generate(project_id=PROJECT_ID, 
                   pipeline_params=pipeline_params, 
                   use_ci=True, 
                   schedule_pattern='59 11 * * 0',
                   naming_prefix=MODEL_ID,
                   base_image='us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-11.py310:latest', # includes required CUDA packages
                   custom_training_job_specs = [{
                       'component_spec': 'train_model',
                       'display_name': 'train-model-accelerated',
                       'machine_type': 'n1-standard-8',
                       'accelerator_type': 'NVIDIA_TESLA_V100',
                       'accelerator_count': 1
                   }]
)

Writing directories under AutoMLOps/
Writing configurations to AutoMLOps/configs/defaults.yaml
Writing README.md to AutoMLOps/README.md
Writing kubeflow pipelines code to AutoMLOps/pipelines, AutoMLOps/components
Writing scripts to AutoMLOps/scripts
Writing submission service code to AutoMLOps/services
Writing gcloud provisioning code to AutoMLOps/provision
Writing cloud build config to AutoMLOps/cloudbuild.yaml
Code Generation Complete.


## Default Run Settings
Below are the default parameters for running `AutoMLOps`. Note that there are only two required parameters:
1. project_id
2. pipeline_params

The other parameters are optional. You can customize the output of `AutoMLOps` by specifying the resources you'd like to use (or the names of resources you'd like `AutoMLOps` to create if they don't currently exist). A description of each parameter is below:
- `project_id`: The project ID.
- `pipeline_params`: Dictionary containing runtime pipeline parameters.
- `artifact_repo_location`: Region of the artifact repo (default use with Artifact Registry).
- `artifact_repo_name`: Artifact repo name where components are stored (default use with Artifact Registry).
- `artifact_repo_type`: The type of artifact repository to use (e.g. Artifact Registry, JFrog, etc.)        
- `base_image`: The image to use in the component base dockerfile.
- `build_trigger_location`: The location of the build trigger (for cloud build).
- `build_trigger_name`: The name of the build trigger (for cloud build).
- `custom_training_job_specs`: Specifies the specs to run the training job with.
- `deployment_framework`: The CI tool to use (e.g. cloud build, github actions, etc.)
- `naming_prefix`: Unique value used to differentiate pipelines and services across AutoMLOps runs.
- `orchestration_framework`: The orchestration framework to use (e.g. kfp, tfx, etc.)
- `pipeline_job_runner_service_account`: Service Account to run PipelineJobs (specify the full string).
- `pipeline_job_submission_service_location`: The location of the cloud submission service.
- `pipeline_job_submission_service_name`: The name of the cloud submission service.
- `pipeline_job_submission_service_type`: The tool to host for the cloud submission service (e.g. cloud run, cloud functions).
- `precheck`: Boolean used to specify whether to check for provisioned resources before deploying.
- `project_number`: The project number.
- `provision_credentials_key`: Either a path to or the contents of a service account key file in JSON format.
- `provisioning_framework`: The IaC tool to use (e.g. Terraform, Pulumi, etc.)
- `pubsub_topic_name`: The name of the pubsub topic to publish to.
- `schedule_location`: The location of the scheduler resource.
- `schedule_name`: The name of the scheduler resource.
- `schedule_pattern`: Cron formatted value used to create a Scheduled retrain job.
- `setup_model_monitoring`: Boolean parameter which specifies whether to set up a Vertex AI Model Monitoring Job.
- `source_repo_branch`: The branch to use in the source repository.
- `source_repo_name`: The name of the source repository to use.
- `source_repo_type`: The type of source repository to use (e.g. gitlab, github, etc.)
- `storage_bucket_location`: Region of the GS bucket.
- `storage_bucket_name`: GS bucket name where pipeline run metadata is stored.
- `hide_warnings`: Boolean used to specify whether to hide provision/deploy permission warnings.
- `use_ci`: Flag that determines whether to use Cloud CI/CD.
- `vpc_connector`: The name of the VPC connector to use.
- `workload_identity_pool`: Pool for workload identity federation. 
- `workload_identity_provider`: Provider for workload identity federation.
- `workload_identity_service_account`: Service account for workload identity federation (specify the full string).

The `use_ci` parameter determines how the build job and PipelineJob are submitted. If it is set to False, `AutoMLOps` uses the generated `scripts/run_all.sh` local script; if it is set to True, `AutoMLOps` uses the cloud [CI/CD workflow](https://github.com/GoogleCloudPlatform/automlops#deployment). The runs above use `use_ci=True` and the run below uses `use_ci=False`; notice the differences in output (`use_ci=False` means you will not use the Source Repository to trigger build jobs on push).

In [17]:
AutoMLOps.generate(project_id=PROJECT_ID, # required
                   pipeline_params=pipeline_params, # required
                   artifact_repo_location='us-central1', # default
                   artifact_repo_name=None, # default
                   artifact_repo_type='artifact-registry', # default
                   base_image='python:3.9-slim', # default
                   build_trigger_location='us-central1', # default
                   build_trigger_name=None, # default
                   custom_training_job_specs=None, # default
                   deployment_framework='cloud-build', # default
                   naming_prefix='automlops-default-prefix', # default
                   orchestration_framework='kfp', # default
                   pipeline_job_runner_service_account=None, # default
                   pipeline_job_submission_service_location='us-central1', # default
                   pipeline_job_submission_service_name=None, # default
                   pipeline_job_submission_service_type='cloud-functions', # default
                   project_number=None, # default
                   provision_credentials_key=None, # default
                   provisioning_framework='gcloud', # default
                   pubsub_topic_name=None, # default
                   schedule_location='us-central1', # default
                   schedule_name=None, # default
                   schedule_pattern='No Schedule Specified', # default
                   setup_model_monitoring=False, # default
                   source_repo_branch='automlops', # default
                   source_repo_name=None, # default
                   source_repo_type='cloud-source-repositories', # default
                   storage_bucket_location='us-central1', # default
                   storage_bucket_name=None, # default
                   use_ci=False, # default
                   vpc_connector='No VPC Specified', # default
                   workload_identity_pool=None, # default
                   workload_identity_provider=None, # default
                   workload_identity_service_account=None, # default
)

Writing directories under AutoMLOps/
Writing configurations to AutoMLOps/configs/defaults.yaml
Writing Kubeflow Pipelines code to AutoMLOps/pipelines, AutoMLOps/components, AutoMLOps/services
Writing README.md to AutoMLOps/README.md
Writing scripts to AutoMLOps/scripts
Writing CloudBuild config to AutoMLOps/cloudbuild.yaml
Code Generation Complete.
