In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# AutoMLOps - Tuning and deploying a foundation model 

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/automlops/blob/main/example/automlops_example_notebook.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/automlops/blob/main/example/automlops_example_notebook.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/automlops/main/example/automlops_example_notebook.ipynb">
        <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

# Overview

This tutorial explores using AutoMLOps for tuning a foundation LLM model. Creating an LLM requires massive amounts of data, significant computing resources, and specialized skills. On Vertex AI, tuning allows you to customize a foundation model for more specific tasks or knowledge domains.

While the prompt design is excellent for quick experimentation, if training data is available, you can achieve higher quality by tuning the model. Tuning a model enables you to customize the model response based on examples of the task you want the model to perform.

This tutorial is adapted from the [getting started tuning](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/examples/tuning/getting_started_tuning.ipynb) example for Vertex AI. For more details on tuning have a look at the [official documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models).

# Objective

In this tutorial, you learn how to fine tune an LLM in Vertex AI. You will then learn how to create and run MLOps pipelines integrated with CI/CD. The pipeline goes through the following steps:

1. create_datasets: Custom component that queries Bigquery and saves data as jsonl format to GCS.
2. tune_model: Custom component that prompt-tunes a foundation model.
4. evaluate_model: Evaluate the tuned model.
5. predict: Custom component that runs predictions with the tuned model.

# Prerequisites

In order to use AutoMLOps, the following are required:

- Python 3.0 - 3.10
- [Google Cloud SDK 407.0.0](https://cloud.google.com/sdk/gcloud/reference)
- [beta 2022.10.21](https://cloud.google.com/sdk/gcloud/reference/beta)
- `git` installed
- `git` logged-in:
```
  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"
```
- [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/provide-credentials-adc) are setup. This can be done through the following commands:
```
gcloud auth application-default login
gcloud config set account <account@example.com>
```

# Dependencies
- `docopt==0.6.2`,
- `docstring-parser==0.15`,
- `pipreqs==0.4.11`,
- `PyYAML==5.4.1`,
- `yarg==0.1.9`

# APIs & IAM
AutoMLOps will enable the following APIs:
- [cloudresourcemanager.googleapis.com](https://cloud.google.com/resource-manager/reference/rest)
- [aiplatform.googleapis.com](https://cloud.google.com/vertex-ai/docs/reference/rest)
- [artifactregistry.googleapis.com](https://cloud.google.com/artifact-registry/docs/reference/rest)
- [cloudbuild.googleapis.com](https://cloud.google.com/build/docs/api/reference/rest)
- [cloudscheduler.googleapis.com](https://cloud.google.com/scheduler/docs/reference/rest)
- [cloudtasks.googleapis.com](https://cloud.google.com/tasks/docs/reference/rest)
- [compute.googleapis.com](https://cloud.google.com/compute/docs/reference/rest/v1)
- [iam.googleapis.com](https://cloud.google.com/iam/docs/reference/rest)
- [iamcredentials.googleapis.com](https://cloud.google.com/iam/docs/reference/credentials/rest)
- [ml.googleapis.com](https://cloud.google.com/ai-platform/training/docs/reference/rest)
- [run.googleapis.com](https://cloud.google.com/run/docs/reference/rest)
- [storage.googleapis.com](https://cloud.google.com/storage/docs/apis)
- [sourcerepo.googleapis.com](https://cloud.google.com/source-repositories/docs/reference/rest)

AutoMLOps will update [IAM privileges](https://cloud.google.com/iam/docs/understanding-roles) for the following accounts:
1. Pipeline Runner Service Account (one is created if it does exist, defaults to: vertex-pipelines@PROJECT_ID.iam.gserviceaccount.com). Roles added:
- roles/aiplatform.user
- roles/artifactregistry.reader
- roles/bigquery.user
- roles/bigquery.dataEditor
- roles/iam.serviceAccountUser
- roles/storage.admin
- roles/run.admin
2. Cloudbuild Default Service Account (PROJECT_NUMBER@cloudbuild.gserviceaccount.com). Roles added:
- roles/run.admin
- roles/iam.serviceAccountUser
- roles/cloudtasks.enqueuer
- roles/cloudscheduler.admin

# User Guide

For a user-guide, please view these [slides](../AutoMLOps_Implementation_Guide_External.pdf).

# Costs

This tutorial uses billable components of Google Cloud:
- Vertex AI
- BigQuery
- Artifact Registry
- Cloud Storage
- Cloud Source Repository
- Cloud Build
- Cloud Run
- Cloud Scheduler

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

# Ground-rules for using AutoMLOps
1. Do not use variables, functions, code, etc. not defined within the scope of a custom component. These custom components will become containers and will have no reference to the out of scope code.
2. Import statements and helper functions must be added inside the function. Provide parameter type hints.
3. Test each of your components for accuracy and correctness before running them using AutoMLOps. We cannot fix bugs automatically; bugs are much more difficult to fix once they are made into pipelines.
4. If you are using Kubeflow, be sure to define all the requirements needed to run the custom component - it can be easy to leave out packages which will cause the container to fail when running within a pipeline. 


### Quota
**important**: Tuning the text-bison@001  model uses the tpu-v3-8 training resources and the accompanying quotas from your Google Cloud project. Each project has a default quota of eight v3-8 cores, which allows for one to two concurrent tuning jobs. If you want to run more concurrent jobs you need to request additional quota via the [Quotas page](https://console.cloud.google.com/iam-admin/quotas).

## Tuned Dataset

Your model tuning dataset must be in a JSONL format where each line contains a single training example. You must make sure that you include instructions.

You will use the StackOverflow data on [BigQuery public datasets](https://cloud.google.com/bigquery/public-data), limiting to questions with the `python` tag, and accepted answers for answers since 2020-01-01.

## Setup Git
Set up your git configuration below

In [None]:
!git config --global user.email 'you@example.com'
!git config --global user.name 'Your Name'

## Install AutoMLOps

Install AutoMLOps from [PyPI](https://pypi.org/project/google-cloud-automlops/), or locally by cloning the repo and running `pip install .`

In [None]:
!pip3 install google-cloud-automlops --user

## Restart the kernel
Once you've installed the AutoMLOps package, you need to restart the notebook kernel so it can find the package.

**Note: Once this cell has finished running, continue on. You do not need to re-run any of the cells above.**

In [1]:
import os

if not os.getenv('IS_TESTING'):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Set your project ID
Set your project ID below. If you don't know your project ID, leave the field blank and the following cells may be able to find it.

In [2]:
PROJECT_ID = '[your-project-id]'  # @param {type:"string"}

In [3]:
if PROJECT_ID == '' or PROJECT_ID is None or PROJECT_ID == '[your-project-id]':
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print('Project ID:', PROJECT_ID)

Project ID: automlops-sandbox


In [4]:
! gcloud config set project $PROJECT_ID

Updated property [core/project].


# Automating pipeline creation with AutoMLOps<a id='automlops'></a>

## Import AutoMLOps

In [4]:
from AutoMLOps import AutoMLOps

## Other Imports

In [None]:
!pip3 install kfp

In [5]:
from kfp.v2 import dsl
from kfp.v2.dsl import Artifact, Dataset, Metrics, Output

## Clear the cache
`AutoMLOps.clear_cache` will remove previous instantiations of AutoMLOps components and pipelines. Use this function if you have previously defined a component that you no longer need.

In [6]:
AutoMLOps.clear_cache()

Cache cleared.


## Define an AutoMLOps component for Creating Train & Test Datasets

In [7]:
@AutoMLOps.component
def create_datasets(
    lookback_date: str,
    project_id: str,
    test_data_path: str,
    train_data_path: str
):
    """Custom component that prepares the stackoverflow Questions and Answers.

    Args:
        lookback_date: The start date for posts.
        project_id: The project ID.
        test_data_path: The gcs location to write the jsonl for evaluation.
        train_data_path: The gcs location to write the jsonl for training.
    """    
    import pandas as pd
    from google.cloud import bigquery
    from sklearn.model_selection import train_test_split

    bq_client = bigquery.Client(project=project_id)

    def get_query() -> str:
        """Generates BQ Query to read data."""
        
        return f'''SELECT
        CONCAT(q.title, q.body) as input_text,
        a.body AS output_text
        FROM
            `bigquery-public-data.stackoverflow.posts_questions` q
        JOIN
            `bigquery-public-data.stackoverflow.posts_answers` a
        ON
            q.accepted_answer_id = a.id
        WHERE
            q.accepted_answer_id IS NOT NULL AND
            REGEXP_CONTAINS(q.tags, "python") AND
            a.creation_date >= "{lookback_date}"
        LIMIT
            10000
        '''

    def load_bq_data(query: str, client: bigquery.Client) -> pd.DataFrame:
        """Loads data from bq into a Pandas Dataframe for EDA.
        Args:
            query: BQ Query to generate data.
            client: BQ Client used to execute query.
        Returns:
            pd.DataFrame: A dataframe with the requested data.
        """
        df = client.query(query).to_dataframe()
        return df

    dataframe = load_bq_data(get_query(), bq_client)
    train, test = train_test_split(dataframe, test_size=0.2)
    train.to_json(train_data_path, orient='records', lines=True)
    test.to_json(test_data_path, orient='records', lines=True)

In [8]:
lookback_date = '2020-01-01'
project_id = PROJECT_ID
test_data_path = f'gs://{PROJECT_ID}-bucket/llmops/test_data.jsonl'
train_data_path = f'gs://{PROJECT_ID}-bucket/llmops/train_data.jsonl'


import pandas as pd
from google.cloud import bigquery
from sklearn.model_selection import train_test_split
bq_client = bigquery.Client(project=project_id)

def get_query() -> str:
    """Generates BQ Query to read data."""

    return f'''SELECT
    CONCAT(q.title, q.body) as input_text,
    a.body AS output_text
    FROM
        `bigquery-public-data.stackoverflow.posts_questions` q
    JOIN
        `bigquery-public-data.stackoverflow.posts_answers` a
    ON
        q.accepted_answer_id = a.id
    WHERE
        q.accepted_answer_id IS NOT NULL AND
        REGEXP_CONTAINS(q.tags, "python") AND
        a.creation_date >= "{lookback_date}"
    LIMIT
        10000
    '''

def load_bq_data(query: str, client: bigquery.Client) -> pd.DataFrame:
    """Loads data from bq into a Pandas Dataframe for EDA.
    Args:
        query: BQ Query to generate data.
        client: BQ Client used to execute query.
    Returns:
        pd.DataFrame: A dataframe with the requested data.
    """
    df = client.query(query).to_dataframe()
    return df

dataframe = load_bq_data(get_query(), bq_client)
train, test = train_test_split(dataframe, test_size=0.2)
train.to_json(train_data_path, orient='records', lines=True)
test.to_json(test_data_path, orient='records', lines=True)

## Define an AutoMLOps component for Tuning the Foundation Model

In [9]:
@AutoMLOps.component
def tune_model(
    project_id: str,
    model_display_name: str,
    region: str,
    train_data_path: str
):
    """Custom component that prompt-tunes a foundation model.

    Args:
        project_id: The project ID.
        model_display_name: Name of the model.
        region: Region.
        train_data_path: The gcs location to write the jsonl for training.
        
    """ 
    from google.cloud import aiplatform
    from vertexai.preview.language_models import TextGenerationModel

    aiplatform.init(project=project_id, location=region)
    model = TextGenerationModel.from_pretrained('text-bison@001')

    model.tune_model(
        training_data=train_data_path,
        model_display_name=model_display_name,
        train_steps=100,
        # Tuning can only happen in the "europe-west4" location
        tuning_job_location='europe-west4',
        # Model can only be deployed in the "us-central1" location
        tuned_model_location='us-central1')

In [10]:
project_id = PROJECT_ID
model_display_name = 'llmops-tuned-model'
region = 'us-central1'
train_data_path = f'gs://{PROJECT_ID}-bucket/llmops/train_data.jsonl'

from google.cloud import aiplatform
from vertexai.preview.language_models import TextGenerationModel

aiplatform.init(project=project_id, location=region)
model = TextGenerationModel.from_pretrained('text-bison@001')

model.tune_model(
    training_data=train_data_path,
    model_display_name=model_display_name,
    train_steps=100,
    # Tuning can only happen in the "europe-west4" location
    tuning_job_location='europe-west4',
    # Model can only be deployed in the "us-central1" location
    tuned_model_location='us-central1')

Creating PipelineJob
Creating PipelineJob
PipelineJob created. Resource name: projects/45373616427/locations/europe-west4/pipelineJobs/tune-large-model-20230615231642
PipelineJob created. Resource name: projects/45373616427/locations/europe-west4/pipelineJobs/tune-large-model-20230615231642
To use this PipelineJob in another session:
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/45373616427/locations/europe-west4/pipelineJobs/tune-large-model-20230615231642')
pipeline_job = aiplatform.PipelineJob.get('projects/45373616427/locations/europe-west4/pipelineJobs/tune-large-model-20230615231642')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/europe-west4/pipelines/runs/tune-large-model-20230615231642?project=45373616427
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/europe-west4/pipelines/runs/tune-large-model-20230615231642?project=45373616427
PipelineJob projects/45373616427/locations/europ

## Define a Component for Evaluating the Tuned Model

In [65]:
@dsl.component(
    packages_to_install=[
        'google-cloud-aiplatform', 
        'pandas',
        'rouge',
        'sequence-evaluate',
        'sentence-transformers'
    ],
    output_component_file=f'{AutoMLOps.OUTPUT_DIR}/evaluate_model.yaml',
)
def evaluate_model(
    metrics: Output[Metrics],
    model_display_name: str,
    test_data_path: str,
    test_dataset_size: int
):
    """Custom component that evaluates the tuned model 
       and compares its performance to the foundation model.

    Args:
        model_display_name: Name of the model.
        test_data_path: The gcs location to write the jsonl for evaluation.
        test_dataset_size: The size of the data slice from the test dataset.
        
    """
    import pandas as pd
    from seq_eval import SeqEval
    from vertexai.preview.language_models import TextGenerationModel

    foundation_model = TextGenerationModel.from_pretrained('text-bison@001')
    list_tuned_models = model.list_tuned_model_names()
    tuned_model = TextGenerationModel.get_tuned_model(list_tuned_models[-1])
    
    evaluator = SeqEval()
    
    test_data = pd.read_json(test_data_path, lines=True)
    
    test_data = test_data.head(test_dataset_size)
    test_questions = test_data['input_text']
    test_answers = test_data['output_text']

    foundation_candidates = []
    tuned_candidates = []
    for q in test_questions:
        response = foundation_model.predict(q)
        foundation_candidates.append(response.text)

        response = tuned_model.predict(q)
        tuned_candidates.append(response.text)
    
    references = test_answers.tolist()
    
    foundation_scores = evaluator.evaluate(foundation_candidates, references, verbose=False)
    tuned_scores = evaluator.evaluate(tuned_candidates, references, verbose=False)
    print(foundation_scores)
    print(tuned_scores)
    
    # ADD IN METRICS PART
    
    # ADD IN PREDICTION PART

In [8]:
model = TextGenerationModel.from_pretrained("text-bison@001")
list_tuned_models = model.list_tuned_model_names()
list_tuned_models[-1]

'projects/45373616427/locations/us-central1/models/8340835834680836096'

In [165]:
test_data_path = f'gs://{PROJECT_ID}-bucket/llmops/test_data.jsonl'
test_dataset_size = 200

import re

import pandas as pd
from seq_eval import SeqEval
from vertexai.preview.language_models import TextGenerationModel

foundation_model = TextGenerationModel.from_pretrained('text-bison@001')
list_tuned_models = model.list_tuned_model_names()
tuned_model = TextGenerationModel.get_tuned_model(list_tuned_models[-1])

evaluator = SeqEval()

test_data = pd.read_json(test_data_path, lines=True)

test_data = test_data.head(test_dataset_size)
test_questions = test_data['input_text']
test_answers = test_data['output_text']

foundation_candidates = []
tuned_candidates = []
references = []
for i in range(len(test_questions)):
    response_a = foundation_model.predict(re.sub(r'\<.*?\>', '', test_questions[i]))
    response_b = tuned_model.predict(test_questions[i])
    if response_a.text != '' and response_b.text != '':
        references.append(re.sub(r'\<.*?\>', '', test_answers[i]))
        foundation_candidates.append(response_a.text)
        tuned_candidates.append(response_b.text)

foundation_scores = evaluator.evaluate(foundation_candidates, references, verbose=False)
tuned_scores = evaluator.evaluate(tuned_candidates, references, verbose=False)

print(foundation_scores)
print(tuned_scores)

{'bleu_1': 0.10458544022959128, 'bleu_2': 0.04248843653127929, 'bleu_3': 0.02345984191215938, 'bleu_4': 0.014390707453807598, 'rouge_1_precision': 0.25326673070468825, 'rouge_1_recall': 0.17789095911876043, 'rouge_1_f1': 0.18223478361172607, 'rouge_2_precision': 0.055061102765300844, 'rouge_2_recall': 0.04190439405705797, 'rouge_2_f1': 0.03836401633965376, 'rouge_l_precision': 0.23505666853683077, 'rouge_l_recall': 0.16574717607271883, 'rouge_l_f1': 0.1690684979609612, 'inter_dist1': 0.0015170786273477207, 'inter_dist2': 0.03172384126838158, 'intra_dist1': 0.11913427505604801, 'intra_dist2': 0.42174181634607605, 'semantic_textual_similarity': 0.5751550720959175}
{'bleu_1': 0.10458544022959128, 'bleu_2': 0.04248843653127929, 'bleu_3': 0.02345984191215938, 'bleu_4': 0.014390707453807598, 'rouge_1_precision': 0.25326673070468825, 'rouge_1_recall': 0.17789095911876043, 'rouge_1_f1': 0.18223478361172607, 'rouge_2_precision': 0.055061102765300844, 'rouge_2_recall': 0.04190439405705797, 'roug

In [166]:
foundation_scores = evaluator.evaluate(foundation_candidates, references, verbose=True)
print(foundation_scores)

***************
* BLEU SCORES *
***************
BLEU-1:  0.1080129638876945
BLEU-2:  0.04915338038875601
BLEU-3:  0.028422402691031894
BLEU-4:  0.018621638960148627


****************
* ROUGE SCORES *
****************
ROUGE-1 PRECISION:  0.2773307474734131
ROUGE-1 RECALL:  0.184434217635193
ROUGE-1 F1 :  0.19296703212373456


ROUGE-2 PRECISION:  0.08890513795264643
ROUGE-2 RECALL:  0.0539914623781837
ROUGE-2 F1 :  0.05600351238983955


ROUGE-L PRECISION:  0.25896537445455853
ROUGE-L RECALL:  0.1709123557727651
ROUGE-L F1 :  0.17881360611799996


*********************
* DISTINCT-N SCORES *
*********************
INTER DIST-1:  0.0015272276043960086
INTER DIST-2:  0.03548927346112302
INTRA DIST-1:  0.11903927570898948
INTRA DIST-2:  0.43035543407568877


******************************************************
* SEMANTIC TEXTUAL SIMILARITY (Sentence Transformer) *
******************************************************
COSINE SIMILARITY:  0.6128882431402439
{'bleu_1': 0.1080129638876945, 'bl

In [167]:
tuned_scores = evaluator.evaluate(tuned_candidates, references, verbose=True)
print(tuned_scores)

***************
* BLEU SCORES *
***************
BLEU-1:  0.10458544022959128
BLEU-2:  0.04248843653127929
BLEU-3:  0.02345984191215938
BLEU-4:  0.014390707453807598


****************
* ROUGE SCORES *
****************
ROUGE-1 PRECISION:  0.25326673070468825
ROUGE-1 RECALL:  0.17789095911876043
ROUGE-1 F1 :  0.18223478361172607


ROUGE-2 PRECISION:  0.055061102765300844
ROUGE-2 RECALL:  0.04190439405705797
ROUGE-2 F1 :  0.03836401633965376


ROUGE-L PRECISION:  0.23505666853683077
ROUGE-L RECALL:  0.16574717607271883
ROUGE-L F1 :  0.1690684979609612


*********************
* DISTINCT-N SCORES *
*********************
INTER DIST-1:  0.0015170786273477207
INTER DIST-2:  0.03172384126838158
INTRA DIST-1:  0.11913427505604801
INTRA DIST-2:  0.42174181634607605


******************************************************
* SEMANTIC TEXTUAL SIMILARITY (Sentence Transformer) *
******************************************************
COSINE SIMILARITY:  0.5751550720959175
{'bleu_1': 0.1045854402295912

In [172]:
print(re.sub(r'\<.*?\>', '', test_questions[1]))
print('-')
print(foundation_candidates[1])
print('-')
print(tuned_candidates[1])
print('-')
print(references[1])

Modifying a string under a specific conditionIn a Python program, I am trying to modify a string under a specific condition:
X = ('4c0')
Sig = ['a', 'b', 'c', 'e']

Sig is a list. Additionally, I have a tuple:
T = (4,'d',5)

If the second element (T[1]) is not in Sig, I must create another string, starting from X:

as T[1] ('d') is not in Sig, T[2] must replace T[0] in X ('5' replacing '4');
the last element in X must be added by 1 ('1' replacing '0').

In this case, the desired result should be:
Y = ('5c1')

I made this code but it is not add any string to Y:
Y = []
for i in TT: # TT has the tuple T
    i = list(i)
    if i[1] not in Sig:
        for j in TT:
            if type(j[2]) == str:
                if i[1] == j[1]:
                    Y.append(j[2][0]+i[1]+str(int(j[2][2]+1)))

Any ideas how I could solve this problem?
-
I think the problem is that you are not iterating over the list of tuples correctly. You should iterate over the list of tuples, and for each tuple, you sho

In [103]:
a = nltk.word_tokenize(references[1])

In [104]:
 b = nltk.word_tokenize(tuned_candidates[1])

In [105]:
import nltk

hypothesis = ['It', 'is', 'a', 'cat', 'at', 'room']
reference = ['It', 'is', 'a', 'cat', 'inside', 'the', 'room']
#there may be several references
BLEUscore = nltk.translate.bleu_score.sentence_bleu(a, b)
print(BLEUscore)

1.0450128389486753e-78


The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


## Define an AutoMLOps pipeline

In [70]:
@AutoMLOps.pipeline(
    name='bqml-automlops-retail-forecasting',
    description='This is an example of retail demand forecasting using AutoMLOps and BQML.')
def pipeline(confidence_lvl: float,
             dataset_id: str,
             forecast_horizon: int,
             machine_type: str,
             model_name: str,
             model_type: str,
             project_id: str,
             sales_table: str,
             year_range: int):
    
    prepare_sales_table_task = prepare_sales_table(
        dataset_id=dataset_id,
        project_id=project_id,
        sales_table=sales_table)        

    create_training_dataset_task = create_training_dataset(
        dataset_id=dataset_id,
        project_id=project_id,
        sales_table=sales_table,
        year_range=year_range).after(prepare_sales_table_task)

    train_model_task = train_model(
        dataset_id=dataset_id,
        model_name=model_name,
        model_type=model_type,
        project_id=project_id).after(create_training_dataset_task)

    evaluate_model_task = evaluate_model(
        dataset_id=dataset_id,
        model_name=model_name,
        project_id=project_id).after(train_model_task)
    
    forecast_task = forecast(
        confidence_lvl=confidence_lvl,
        dataset_id=dataset_id,
        forecast_horizon=forecast_horizon,
        project_id=project_id).after(evaluate_model_task)

## Define the Pipeline Arguments

In [71]:
pipeline_params = {
    'confidence_lvl': 0.90,
    'dataset_id': dataset_id,
    'forecast_horizon': 90,
    'machine_type': 'n1-standard-4',
    'model_name': 'arima_model',
    'model_type': 'ARIMA_PLUS',
    'project_id': PROJECT_ID,
    'sales_table': SALES_TABLE,
    'year_range': 1
}

## Generate and Run the pipeline
`AutoMLOps.generate` generates the code for the MLOps pipeline. `AutoMLOps.go` generates the code and runs the pipeline.

In [72]:
AutoMLOps.generate(project_id=PROJECT_ID,
                   pipeline_params=pipeline_params,
                   run_local=False,
                   schedule_pattern='59 11 * * 0' # retrain every Sunday at Midnight
)

INFO: Successfully saved requirements file in AutoMLOps/components/component_base/requirements.txt


In [73]:
AutoMLOps.go(project_id=PROJECT_ID,
             pipeline_params=pipeline_params,
             run_local=False,
             schedule_pattern='59 11 * * 0'
)

INFO: Successfully saved requirements file in AutoMLOps/components/component_base/requirements.txt
[0;32m Updating required API services in project automlops-sandbox [0m
Operation "operations/acat.p2-45373616427-e2045dc0-8a44-42d2-90bb-e636c7d6b101" finished successfully.
[0;32m Checking for Artifact Registry: vertex-mlops-af in project automlops-sandbox [0m
Listing items under project automlops-sandbox, location us-central1.

vertex-mlops-af  DOCKER  STANDARD_REPOSITORY  Artifact Registry vertex-mlops-af in us-central1.  us-central1          Google-managed key  2023-01-11T22:12:26  2023-06-14T13:36:39  59728.422
Artifact Registry: vertex-mlops-af already exists in project automlops-sandbox
[0;32m Checking for GS Bucket: automlops-sandbox-bucket in project automlops-sandbox [0m
gs://automlops-sandbox-bucket/
GS Bucket: automlops-sandbox-bucket already exists in project automlops-sandbox
[0;32m Checking for Service Account: vertex-pipelines in project automlops-sandbox [0m
Pipel