In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# AutoMLOps - LLMOps Finetuning Flan T5 Example

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/automlops/blob/main/examples/training/05_llmops_finetuning_flan_t5_example.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/automlops/blob/main/examples/training/05_llmops_finetuning_flan_t5_example.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/automlops/main/examples/training/05_llmops_finetuning_flan_t5_example.ipynb">
        <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

# Overview

In this tutorial you'll learn how to finetune a PyTorch Flan-T5-Base model (hosted on the Hugging Face Hub) on the SAMSum dataset (English messenger-style conversations paired with summaries). You'll also learn how to deploy the finetuned model to a Vertex AI endpoint.

This tutorial walks you through using AutoMLOps to define, create, and run an MLOps pipeline around this finetuning and deployment. The Flan-T5-Base model is finetuned on NVIDIA V100 GPUs.

This tutorial is derived from this [open-source example](https://github.com/rafaelsf80/vertex-flant5base-summarization/tree/master).

# Objective
In this tutorial, you will learn how to create and run MLOps pipelines integrated with CI/CD. The tutorial finetunes a PyTorch Flan-T5-Base model using GPU accelerators; the pipeline runs the following workflow:
1. finetune_t5_model: A custom component that finetunes a Flan-T5-Base model.
2. model_deployment: A custom component that uploads the finetuned model to Vertex AI Model Registry and deploys it to an endpoint.
3. test_model_deployment: A custom component that sends a test prediction to the deployed endpoint.

# Prerequisites

In order to use AutoMLOps, the following are required:

- Python 3.7 - 3.10
- [Google Cloud SDK 407.0.0](https://cloud.google.com/sdk/gcloud/reference)
- [`gcloud beta` component 2022.10.21](https://cloud.google.com/sdk/gcloud/reference/beta)
- `git` installed
- `git` configured with your identity:
```
  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"
```
- [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/provide-credentials-adc) are set up. This can be done with the following commands:
```
gcloud auth application-default login
gcloud config set account <account@example.com>
```
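To confirm that the active interpreter falls within the supported range listed above, a minimal check like the following can be run in a notebook cell. The bounds are copied from the prerequisites list; `python_supported` is a hypothetical helper, not part of AutoMLOps.

```python
import sys

# Supported range copied from the prerequisites list: Python 3.7 through 3.10.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def python_supported(version_info=None) -> bool:
    """Return True if (major, minor) lies within the supported range."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


print(sys.version.split()[0], 'supported:', python_supported())
```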

# APIs & IAM
Depending on the options you select, AutoMLOps will enable some or all of the following APIs during the provision step:
- [aiplatform.googleapis.com](https://cloud.google.com/vertex-ai/docs/reference/rest)
- [artifactregistry.googleapis.com](https://cloud.google.com/artifact-registry/docs/reference/rest)
- [cloudbuild.googleapis.com](https://cloud.google.com/build/docs/api/reference/rest)
- [cloudfunctions.googleapis.com](https://cloud.google.com/functions/docs/reference/rest)
- [cloudresourcemanager.googleapis.com](https://cloud.google.com/resource-manager/reference/rest)
- [cloudscheduler.googleapis.com](https://cloud.google.com/scheduler/docs/reference/rest)
- [compute.googleapis.com](https://cloud.google.com/compute/docs/reference/rest/v1)
- [iam.googleapis.com](https://cloud.google.com/iam/docs/reference/rest)
- [iamcredentials.googleapis.com](https://cloud.google.com/iam/docs/reference/credentials/rest)
- [logging.googleapis.com](https://cloud.google.com/logging/docs/reference/v2/rest)
- [pubsub.googleapis.com](https://cloud.google.com/pubsub/docs/reference/rest)
- [run.googleapis.com](https://cloud.google.com/run/docs/reference/rest)
- [storage.googleapis.com](https://cloud.google.com/storage/docs/apis)


AutoMLOps will create the following service account and update [IAM permissions](https://cloud.google.com/iam/docs/understanding-roles) during the provision step:
1. Pipeline Runner Service Account (defaults to: vertex-pipelines@PROJECT_ID.iam.gserviceaccount.com). Roles added:
- roles/aiplatform.user
- roles/artifactregistry.reader
- roles/bigquery.user
- roles/bigquery.dataEditor
- roles/iam.serviceAccountUser
- roles/storage.admin
- roles/cloudfunctions.admin

# User Guide

For a user guide, please view these [slides](../../../AutoMLOps_User_Guide.pdf).

# Costs

This tutorial uses billable components of Google Cloud:
- Vertex AI
- Artifact Registry
- Cloud Storage
- Cloud Build
- Cloud Run
- Cloud Scheduler
- Cloud Pub/Sub

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

# Ground-rules for using AutoMLOps
1. Do not use variables, functions, or other code defined outside the scope of a custom component. Each custom component becomes its own container and has no access to code outside the function.
2. Add import statements and helper functions inside the function, and provide type hints for its parameters.
3. Test each component for accuracy and correctness before running it with AutoMLOps. Bugs are not fixed automatically, and they are much harder to debug once the components are part of a pipeline.
4. If you are using Kubeflow, be sure to define all the requirements needed to run the custom component - it can be easy to leave out packages which will cause the container to fail when running within a pipeline. 
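Rule 1 is easy to break by accident. As a rough, hypothetical check (not part of AutoMLOps), the standard library's `inspect.getclosurevars` reports which module-level names a function reads; a self-contained component should report none. The `@AutoMLOps.component(...)` decorator is omitted so the sketch runs without AutoMLOps installed. This only inspects the function's own body, not nested helper functions, and it cannot detect missing packages (rule 4).

```python
import inspect

PREFIX = 'summarize: '  # module-level constant: it would NOT exist inside a component container


def add_prefix_bad(text: str) -> str:
    # Anti-pattern: reads PREFIX, which is defined outside the function.
    return PREFIX + text


def add_prefix_good(text: str, prefix: str = 'summarize: ') -> str:
    # Self-contained: inputs arrive as typed parameters and imports live inside.
    import textwrap
    return prefix + textwrap.dedent(text)


def out_of_scope_globals(func) -> set:
    """Module-level (non-builtin) names that `func` reads directly."""
    return set(inspect.getclosurevars(func).globals)


print(out_of_scope_globals(add_prefix_bad))   # {'PREFIX'}
print(out_of_scope_globals(add_prefix_good))  # set()
```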


# Dataset
For training data, we use the [SAMSum dataset](https://huggingface.co/datasets/samsum), which contains about 16k messenger-like conversations with summaries. The conversations were created and written down by linguists fluent in English, who were asked to write conversations similar to those they have on a daily basis, reflecting the proportion of topics in their real-life messenger conversations. The style and register vary: conversations can be informal, semi-formal, or formal, and may contain slang, emoticons, and typos. Each conversation was then annotated with a summary, written as a concise third-person brief of what the participants talked about.

# Setup Git
Set up your git configuration below.

In [None]:
!git config --global user.email 'you@example.com'
!git config --global user.name 'Your Name'

# Install AutoMLOps

Install AutoMLOps from [PyPI](https://pypi.org/project/google-cloud-automlops/), or install it locally by cloning the repo and running `pip install .`.

In [None]:
!pip3 install google-cloud-automlops --user

# Restart the kernel
Once you've installed the AutoMLOps package, you need to restart the notebook kernel so it can find the package.

**Note: Once this cell has finished running, continue on. You do not need to re-run any of the cells above.**

In [None]:
import os

if not os.getenv('IS_TESTING'):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

# Set variables
Set the variables below. If you don't know your project ID, leave the field blank; the next cell will try to read it from your gcloud configuration.

In [1]:
PROJECT_ID = '[your-project-id]'  # @param {type:"string"}

In [2]:
if PROJECT_ID == '' or PROJECT_ID is None or PROJECT_ID == '[your-project-id]':
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print('Project ID:', PROJECT_ID)

Project ID: automlops-sandbox


In [3]:
! gcloud config set project $PROJECT_ID

Updated property [core/project].


Set your `MODEL_ID` below:

In [4]:
MODEL_ID = 'flan-t5-samsum'

In [None]:
BUCKET_NAME = f'{PROJECT_ID}-{MODEL_ID}-bucket'
BUCKET_URI = f'gs://{BUCKET_NAME}/'
AF_REGISTRY_NAME = 'vertex-mlops-af'   # Artifact Registry name
REGION = 'us-central1'

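`BUCKET_NAME` is built by concatenating the project ID and model ID, so it is worth confirming the result is a legal Cloud Storage bucket name. The sketch below encodes a simplified version of the naming rules for names without dots (3 to 63 characters; lowercase letters, digits, hyphens, and underscores; starting and ending with a letter or digit). It does not cover every rule, such as dotted names or reserved words, and `is_valid_bucket_name` is a hypothetical helper.

```python
import re

# Simplified rule for dotless names: 3-63 chars of lowercase letters, digits,
# hyphens and underscores, starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r'[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]')


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` satisfies the simplified rule above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


# Example value: what BUCKET_NAME becomes when PROJECT_ID is 'automlops-sandbox'.
example_name = 'automlops-sandbox-flan-t5-samsum-bucket'
print(example_name, is_valid_bucket_name(example_name))
```

In the notebook itself you would call `is_valid_bucket_name(BUCKET_NAME)`.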
Set the training and serving images and the service account:

In [5]:
TRAINING_IMAGE = 'us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-10:latest' # includes required cuda packages
SERVING_IMAGE = f'{REGION}-docker.pkg.dev/{PROJECT_ID}/{AF_REGISTRY_NAME}/finetuning_flan_t5_base:latest'

In [6]:
SERVICE_ACCOUNT = f'vertex-pipelines@{PROJECT_ID}.iam.gserviceaccount.com'

In [None]:
BUCKET_NAME

## Build the Custom Serving image - Update [serving/app/main.py](serving/app/main.py) first!

Create a custom serving image for running predictions using FastAPI. **Update [serving/app/main.py](serving/app/main.py) with your BUCKET_NAME from above.** Then build and push the custom serving image.

**NOTE: You must update serving/app/main.py and the Artifact Registry resource AF_REGISTRY_NAME must exist prior to submitting this build job.**

In [None]:
!gcloud builds submit serving --region $REGION --tag $SERVING_IMAGE

## Create a Tensorboard instance

In [7]:
from google.cloud import aiplatform

In [8]:
aiplatform.init(project=PROJECT_ID, location=REGION)

tensorboard = aiplatform.Tensorboard.create(
    display_name='flan-t5-tensorboard',
    project=PROJECT_ID,
    location=REGION,
)

print(tensorboard.display_name)
print(tensorboard.resource_name)

Creating Tensorboard
Create Tensorboard backing LRO: projects/45373616427/locations/us-central1/tensorboards/574622368861782016/operations/2712897263381774336
Tensorboard created. Resource name: projects/45373616427/locations/us-central1/tensorboards/574622368861782016
To use this Tensorboard in another session:
tb = aiplatform.Tensorboard('projects/45373616427/locations/us-central1/tensorboards/574622368861782016')
flan-t5-tensorboard
projects/45373616427/locations/us-central1/tensorboards/574622368861782016


# 1. AutoMLOps LLM Example
This workflow will define and generate a pipeline using AutoMLOps. AutoMLOps provides 2 functions for defining MLOps pipelines:

- `AutoMLOps.component(...)`: Defines a component, which is a containerized python function.
- `AutoMLOps.pipeline(...)`: Defines a pipeline, which is a series of components.

AutoMLOps provides 6 functions for building and maintaining MLOps pipelines:

- `AutoMLOps.generate(...)`: Generates the MLOps codebase. Users can specify the tooling and technologies they would like to use in their MLOps pipeline.
- `AutoMLOps.provision(...)`: Runs provisioning scripts to create and maintain necessary infra for MLOps.
- `AutoMLOps.deprovision(...)`: Runs deprovisioning scripts to tear down MLOps infra created using AutoMLOps.
- `AutoMLOps.deploy(...)`: Builds and pushes the component containers, then triggers the pipeline job.
- `AutoMLOps.launchAll(...)`: Runs `generate()`, `provision()`, and `deploy()` all in succession.
- `AutoMLOps.monitor(...)`: Creates model monitoring jobs on deployed endpoints.

Please see the [readme](https://github.com/GoogleCloudPlatform/automlops/blob/main/README.md) for more information.

## Imports

In [9]:
from google_cloud_automlops import AutoMLOps

## Other Imports

In [None]:
!pip3 install 'kfp>=2.0.0'

In [None]:
from kfp.dsl import Artifact, Input, Output

## Finetuning Flan-T5-Base Model
Define a Kubeflow custom component for finetuning the [Flan-T5-Base model](https://huggingface.co/google/flan-t5-base). Import statements and helper functions must be added inside the function.

In [11]:
@AutoMLOps.component(
    packages_to_install=[
        'accelerate==0.20.1',
        'py7zr==0.20.4',
        'nltk==3.7',
        'evaluate==0.4.0',
        'rouge_score==0.1.2', 
        'transformers==4.30.0',
        'tensorboard==2.11.2',
        'datasets==2.9.0',
        'google-cloud-storage==2.7.0'
    ]
)
def finetune_t5_model(
    dataset_id: str,
    hf_model_id: str,
    model_dir: str,
    epochs: int,
    eval_batch: int,
    logging_steps: int,
    lr: float,
    train_batch: int,
):
    """Custom component that finetunes a Flan T5 base model.

    Args:
        dataset_id: Full id of the training dataset.
        hf_model_id: Full id of the hugging face model.
        model_dir: GCS directory to save the model and training artifacts.
        epochs: Total number of training epochs to perform.
        eval_batch: The batch size per GPU/TPU core/CPU for evaluation.
        logging_steps: Number of update steps between two logs.
        lr: The initial learning rate for AdamW optimizer.
        train_batch: The batch size per GPU/TPU core/CPU for training.
    """
    import glob
    import logging
    import os

    from google.cloud import storage

    from datasets import concatenate_datasets, load_dataset
    from huggingface_hub import HfFolder
    from transformers import (
        AutoTokenizer,
        AutoModelForSeq2SeqLM,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments
    )
    from transformers.integrations import TensorBoardCallback
    import evaluate
    import nltk
    import numpy as np
    from nltk.tokenize import sent_tokenize

    def preprocess_function(sample, padding='max_length'):
        # add prefix to the input for t5
        inputs = ['summarize: ' + item for item in sample['dialogue']]

        # tokenize inputs
        model_inputs = tokenizer(inputs, max_length=max_source_length, padding=padding, truncation=True)

        # Tokenize targets with the `text_target` keyword argument
        labels = tokenizer(text_target=sample['summary'], max_length=max_target_length, padding=padding, truncation=True)

        # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
        # padding in the loss.
        if padding == 'max_length':
            labels['input_ids'] = [
                [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels['input_ids']
            ]

        model_inputs['labels'] = labels['input_ids']
        return model_inputs

    # helper function to postprocess text
    def postprocess_text(preds, labels):
        preds = [pred.strip() for pred in preds]
        labels = [label.strip() for label in labels]

        # rougeLSum expects newline after each sentence
        preds = ['\n'.join(sent_tokenize(pred)) for pred in preds]
        labels = ['\n'.join(sent_tokenize(label)) for label in labels]

        return preds, labels

    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        if isinstance(preds, tuple):
            preds = preds[0]
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        # Replace -100 in the labels as we can't decode them.
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

        # Some simple post-processing
        decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

        result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
        result = {k: round(v * 100, 4) for k, v in result.items()}
        prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
        result['gen_len'] = np.mean(prediction_lens)
        return result

    def upload_to_gcs(local_directory_path: str, gs_directory_path: str):
        client = storage.Client()

        # extract GCS bucket_name
        bucket_name = gs_directory_path.split('/')[2] # without gs://
        # extract GCS object_name
        object_name = '/'.join(gs_directory_path.split('/')[3:])

        rel_paths = glob.glob(local_directory_path + '/**', recursive=True)
        bucket = client.get_bucket(bucket_name)
        for local_file in rel_paths:
            remote_path = f'''{object_name}{'/'.join(local_file.split(os.sep)[1:])}'''
            logging.info(remote_path)
            if os.path.isfile(local_file):
                blob = bucket.blob(remote_path)
                blob.upload_from_filename(local_file)

    # Load dataset
    dataset = load_dataset(dataset_id)
    # Load tokenizer of FLAN-t5-base
    tokenizer = AutoTokenizer.from_pretrained(hf_model_id)
    # load model from the hub
    model = AutoModelForSeq2SeqLM.from_pretrained(hf_model_id)
    
    nltk.download('punkt')
    # Metric
    metric = evaluate.load('rouge')

    # Hugging Face repository id
    repository_id = f'''{hf_model_id.split('/')[1]}-{dataset_id}'''

    # The maximum total input sequence length after tokenization.
    # Sequences longer than this will be truncated, sequences shorter will be padded.
    tokenized_inputs = concatenate_datasets([dataset['train'],
                                             dataset['test']]).map(lambda x: tokenizer(x['dialogue'],truncation=True),
                                                                   batched=True, remove_columns=['dialogue', 'summary'])
    max_source_length = max([len(x) for x in tokenized_inputs['input_ids']])
    print(f'Max source length: {max_source_length}')

    # The maximum total sequence length for target text after tokenization.
    # Sequences longer than this will be truncated, sequences shorter will be padded.
    tokenized_targets = concatenate_datasets([dataset['train'],
                                              dataset['test']]).map(lambda x: tokenizer(x['summary'], truncation=True),
                                                                    batched=True, remove_columns=['dialogue', 'summary'])
    max_target_length = max([len(x) for x in tokenized_targets['input_ids']])
    print(f'Max target length: {max_target_length}')

    tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=['dialogue', 'summary', 'id'])
    print(f'''Keys of tokenized dataset: {list(tokenized_dataset['train'].features)}''')

    # we want to ignore tokenizer pad token in the loss
    label_pad_token_id = -100
    # Data collator
    data_collator = DataCollatorForSeq2Seq(
        tokenizer,
        model=model,
        label_pad_token_id=label_pad_token_id,
        pad_to_multiple_of=8
    )

    # Define training args
    training_args = Seq2SeqTrainingArguments(
        output_dir=repository_id,
        per_device_train_batch_size=train_batch,
        per_device_eval_batch_size=eval_batch,
        predict_with_generate=True,
        fp16=False, # Overflows with fp16
        learning_rate=lr,
        num_train_epochs=epochs,
        # logging & evaluation strategies
        logging_dir=os.environ['AIP_TENSORBOARD_LOG_DIR'],
        #logging_dir=f'{repository_id}/logs',
        logging_strategy='steps',
        logging_steps=logging_steps,
        evaluation_strategy='epoch',
        save_strategy='epoch',
        save_total_limit=2,
        load_best_model_at_end=True,
        # metric_for_best_model="overall_f1",
        # push to hub parameters
        report_to='tensorboard',
        push_to_hub=False,
        hub_strategy='every_save',
        hub_model_id=repository_id,
        hub_token=HfFolder.get_token(),
    )

    # Create Trainer instance
    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        data_collator=data_collator,  # use the Seq2Seq collator defined above
        train_dataset=tokenized_dataset['train'],
        eval_dataset=tokenized_dataset['test'],
        compute_metrics=compute_metrics,
        callbacks=[TensorBoardCallback()]
    )

    # Start training
    logging.info('Training ....')
    trainer.train()
    trainer.evaluate()

    # Save tokenizer and model locally
    tokenizer.save_pretrained('model_tokenizer')
    trainer.save_model('model_output')

    logging.info('Saving model and tokenizer to GCS ....')

    # Upload model to GCS
    upload_to_gcs('model_output', model_dir)
    # Upload tokenizer to GCS
    upload_to_gcs('model_tokenizer', model_dir)


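Inside `preprocess_function` and `compute_metrics`, padding in the labels is swapped for -100 before training and swapped back before decoding. -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so masked positions do not contribute to the loss. The pure-Python sketch below isolates that round trip; the token ids are made up, and 0 is assumed as the pad id, which is what the T5 tokenizers use.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
T5_PAD_ID = 0        # pad_token_id of the T5 tokenizers


def mask_padding(label_ids: List[List[int]], pad_id: int = T5_PAD_ID) -> List[List[int]]:
    """Replace pad tokens with IGNORE_INDEX so they do not contribute to the loss."""
    return [[tok if tok != pad_id else IGNORE_INDEX for tok in seq] for seq in label_ids]


def unmask_padding(label_ids: List[List[int]], pad_id: int = T5_PAD_ID) -> List[List[int]]:
    """Inverse step used before decoding: swap IGNORE_INDEX back to the pad id."""
    return [[tok if tok != IGNORE_INDEX else pad_id for tok in seq] for seq in label_ids]


labels = [[71, 19, 1, 0, 0]]  # made-up ids: three real tokens followed by two pads
masked = mask_padding(labels)
print(masked)                  # [[71, 19, 1, -100, -100]]
print(unmask_padding(masked))  # [[71, 19, 1, 0, 0]]
```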

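`upload_to_gcs` turns each local file path into an object name by dropping the local directory name and appending the remainder to the prefix taken from `model_dir`. The sketch below reproduces that string handling (using `/`, which is what `os.sep` is on Linux containers) so the mapping can be checked without touching Cloud Storage; the bucket name `my-bucket` and the helper names are hypothetical. It shows two consequences: the model and tokenizer files land in the same folder, and `model_dir` needs its trailing slash, which the pipeline arguments provide via `f'{BUCKET_URI}model/'`.

```python
from typing import List, Tuple


def split_gcs_uri(gs_directory_path: str) -> Tuple[str, str]:
    """Split 'gs://bucket/prefix/' into (bucket, prefix) the same way upload_to_gcs does."""
    parts = gs_directory_path.split('/')
    return parts[2], '/'.join(parts[3:])


def object_names(local_files: List[str], gs_directory_path: str) -> List[str]:
    """Drop each file's first path component and append the rest to the GCS prefix."""
    _, prefix = split_gcs_uri(gs_directory_path)
    return [prefix + '/'.join(path.split('/')[1:]) for path in local_files]


files = ['model_output/config.json', 'model_tokenizer/tokenizer_config.json']
print(split_gcs_uri('gs://my-bucket/model/'))           # ('my-bucket', 'model/')
print(object_names(files, 'gs://my-bucket/model/'))     # ['model/config.json', 'model/tokenizer_config.json']
print(object_names(files[:1], 'gs://my-bucket/model'))  # ['modelconfig.json'] (no trailing slash)
```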
## Uploading & Deploying the Model
Define a custom component for uploading and deploying a model in Vertex AI.
Import statements and helper functions must be added inside the function.

In [12]:
@AutoMLOps.component(
    packages_to_install=[
        'google-cloud-aiplatform==1.34.0'
    ]
)
def model_deployment(
    endpoint_sa: str,
    project_id: str,
    region: str,
    serving_image_tag: str,
    vertex_endpoint: Output[Artifact],
):
    """Custom component that uploads a finetuned Flan-T5 from GCS to Vertex Model Registry,
       and deploys the model to an endpoint for online prediction.

    Args:
        endpoint_sa: Service account to run the endpoint prediction service with.
        project_id: Project_id.
        region: Region.
        serving_image_tag: Custom serving image uri.
        vertex_endpoint: Full resource ID of the endpoint.
    """
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location=region)
    # Check if model exists
    models = aiplatform.Model.list()
    model_name = 'finetuned-flan-t5'
    if model_name in (m.name for m in models):
        parent_model = model_name
        model_id = None
        is_default_version = False
        version_aliases = ['experimental', 'finetuned', 'flan-t5']
        version_description = 'experimental version'
    else:
        parent_model = None
        model_id = model_name
        is_default_version = True
        version_aliases = ['live', 'finetuned', 'flan-t5']
        version_description = 'live version'

    uploaded_model = aiplatform.Model.upload(
        model_id=model_id,
        display_name=model_name,
        parent_model=parent_model,
        is_default_version=is_default_version,
        version_aliases=version_aliases,
        version_description=version_description,
        serving_container_image_uri=serving_image_tag,
        serving_container_predict_route='/predict',
        serving_container_health_route='/health',
        serving_container_ports=[8080],
        labels={'created_by': 'automlops-team'},
    )

    endpoint = uploaded_model.deploy(
        machine_type='n1-standard-8',
        min_replica_count=1,
        max_replica_count=1,
        accelerator_type='NVIDIA_TESLA_V100',    
        accelerator_count=1,
        service_account=endpoint_sa, # This SA needs gcs permissions
        sync=True
    )

    vertex_endpoint.uri = endpoint.resource_name

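The version-handling branch in `model_deployment` can be read as a pure function of the existing model IDs. The sketch below restates it without calling Vertex AI (`upload_version_args` is a hypothetical name): the first upload creates the model as the default version with the `live` alias, and later uploads become non-default `experimental` versions of the same model.

```python
from typing import Dict, Iterable


def upload_version_args(existing_model_ids: Iterable[str],
                        model_name: str = 'finetuned-flan-t5') -> Dict:
    """Mirror the if/else in model_deployment and return the Model.upload versioning kwargs."""
    if model_name in set(existing_model_ids):
        # Model already registered: add a new, non-default version.
        return {
            'model_id': None,
            'parent_model': model_name,
            'is_default_version': False,
            'version_aliases': ['experimental', 'finetuned', 'flan-t5'],
            'version_description': 'experimental version',
        }
    # First upload: create the model and make this the default version.
    return {
        'model_id': model_name,
        'parent_model': None,
        'is_default_version': True,
        'version_aliases': ['live', 'finetuned', 'flan-t5'],
        'version_description': 'live version',
    }


print(upload_version_args([])['version_aliases'])
print(upload_version_args(['finetuned-flan-t5'])['version_aliases'])
```

Because later versions are uploaded with `is_default_version=False`, the default version stays the one from the first run until it is changed in Model Registry. Note also that `deploy()` is called without an `endpoint` argument, so the Vertex AI SDK creates a new endpoint on every run; consider undeploying old endpoints to control cost.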
## Testing the Model
Define a custom component for testing the predictions of the model.
Import statements and helper functions must be added inside the function.

In [None]:
@AutoMLOps.component(
    packages_to_install=[
        'datasets==2.9.0',
        'google-cloud-aiplatform==1.34.0'
    ]
)
def test_model_deployment(
    dataset_id: str,
    vertex_endpoint: Input[Artifact],
):
    """Custom component that tests the live deployed endpoint.

    Args:
        dataset_id: Full id of the dataset.
        vertex_endpoint: Full resource ID of the endpoint.
    """

    import pprint as pp
    from random import randrange

    from google.cloud import aiplatform

    from datasets import load_dataset

    # Load dataset from the hub
    dataset = load_dataset(dataset_id)
    # select a random test sample
    sample = dataset['test'][randrange(len(dataset['test']))]

    # Get live endpoint
    live_endpoint = aiplatform.Endpoint(vertex_endpoint.uri)

    # Test predictions
    print('running prediction test...')
    try:
        resp = live_endpoint.predict([[sample['dialogue']]])
        print(sample['dialogue'])
        pp.pprint(resp)
    except Exception as ex:
        print('prediction request failed', ex)

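Vertex AI delivers prediction requests to a custom container's predict route (here `/predict`) as JSON whose `instances` field holds the list passed to `Endpoint.predict`. The sketch below builds that body for the single-instance call used in `test_model_deployment`. How each instance is parsed is defined in `serving/app/main.py`, which is not shown here, so the `[[dialogue]]` shape is taken from the call above rather than from the server code. The sample dialogue and `build_predict_body` are invented for illustration.

```python
import json


def build_predict_body(dialogue: str) -> str:
    """Serialize one instance shaped like the test call: a list holding the dialogue string."""
    return json.dumps({'instances': [[dialogue]]})


sample_dialogue = 'Anna: Lunch at 1?\nBen: Sure, see you there.'  # invented example
body = build_predict_body(sample_dialogue)
print(body)
```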
## Define the Pipeline
Define your pipeline. You can optionally give the pipeline a name and description. Define the structure by listing the components to be called in your pipeline; use `.after` to specify the order of execution.

In [13]:
@AutoMLOps.pipeline(name='finetune-flan-t5-pipeline')
def pipeline(
    dataset_id: str,
    hf_model_id: str,
    endpoint_sa: str,
    project_id: str,
    eval_batch: int,
    train_batch: int,
    model_dir: str,
    lr: float,
    epochs: int,
    logging_steps: int,
    serving_image_tag: str,
    region: str):

    finetune_t5_model_task = finetune_t5_model(
        dataset_id=dataset_id,
        hf_model_id=hf_model_id,
        model_dir=model_dir,
        epochs=epochs,
        eval_batch=eval_batch,
        lr=lr,
        logging_steps=logging_steps,
        train_batch=train_batch)

    model_deployment_task = model_deployment(
        endpoint_sa=endpoint_sa,
        project_id=project_id,
        region=region,
        serving_image_tag=serving_image_tag).after(finetune_t5_model_task)
    
    test_model_deployment_task = test_model_deployment(
        dataset_id=dataset_id,
        vertex_endpoint=model_deployment_task.outputs['vertex_endpoint']).after(model_deployment_task)

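Vertex AI Pipelines runs each task once its upstream tasks have finished, so the `.after(...)` calls above define a small dependency graph. The sketch below uses the standard library's `graphlib` (Python 3.9+, so newer than the lower end of the supported range) to compute a topological order for that graph; it only illustrates the ordering and is not how Kubeflow schedules tasks. Passing `model_deployment_task.outputs['vertex_endpoint']` already makes `test_model_deployment` depend on `model_deployment`, so the second `.after` is redundant but makes the order explicit.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each task maps to the set of tasks it must run after, mirroring the .after(...) calls.
dependencies = {
    'finetune_t5_model': set(),
    'model_deployment': {'finetune_t5_model'},
    'test_model_deployment': {'model_deployment'},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['finetune_t5_model', 'model_deployment', 'test_model_deployment']
```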
## Define the Pipeline Arguments

In [14]:
pipeline_params = {
    'dataset_id': 'samsum',
    'hf_model_id': 'google/flan-t5-base',
    'endpoint_sa': SERVICE_ACCOUNT,
    'project_id': PROJECT_ID,
    'model_dir': f'{BUCKET_URI}model/',
    'lr': 5e-5,
    'epochs': 5,
    'logging_steps': 500,
    'serving_image_tag': SERVING_IMAGE,
    'eval_batch': 4,
    'region': 'us-central1',
    'train_batch': 4,
    'vertex_experiment_tracking_name': 'finetune-flan-t5'
}

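Every pipeline argument needs a value in `pipeline_params`. A quick, hypothetical way to compare the two is `inspect.signature`; because the decorated `pipeline` object may not expose its original signature, the sketch uses a stand-in function with the same parameters and placeholder values. It reports `vertex_experiment_tracking_name` as extra: that key is not a pipeline argument, and we assume AutoMLOps reads it separately, so an extra key is not necessarily an error, whereas a missing key would leave a required pipeline argument without a value.

```python
import inspect
from typing import Callable, Dict, Set, Tuple


def pipeline_stub(dataset_id: str, hf_model_id: str, endpoint_sa: str, project_id: str,
                  eval_batch: int, train_batch: int, model_dir: str, lr: float,
                  epochs: int, logging_steps: int, serving_image_tag: str, region: str):
    """Hypothetical stand-in with the same parameters as the `pipeline` function above."""


def compare_params(func: Callable, params: Dict) -> Tuple[Set[str], Set[str]]:
    """Return (pipeline arguments with no value, keys that are not pipeline arguments)."""
    expected = set(inspect.signature(func).parameters)
    provided = set(params)
    return expected - provided, provided - expected


# Same keys as pipeline_params; the values are placeholders.
example_params = {
    'dataset_id': 'samsum',
    'hf_model_id': 'google/flan-t5-base',
    'endpoint_sa': '<service-account>',
    'project_id': '<project-id>',
    'model_dir': 'gs://<bucket>/model/',
    'lr': 5e-5,
    'epochs': 5,
    'logging_steps': 500,
    'serving_image_tag': '<serving-image>',
    'eval_batch': 4,
    'region': 'us-central1',
    'train_batch': 4,
    'vertex_experiment_tracking_name': 'finetune-flan-t5',
}

missing, extra = compare_params(pipeline_stub, example_params)
print('missing:', missing)  # missing: set()
print('extra:', extra)      # extra: {'vertex_experiment_tracking_name'}
```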
## Generate and Run the pipeline
`AutoMLOps.generate(...)` generates the MLOps codebase. Users can specify the tooling and technologies they would like to use in their MLOps pipeline. Here we pass a custom training job spec so that the `finetune_t5_model` component runs on an NVIDIA V100 GPU, which accelerates finetuning. If you want to integrate with GitHub and GitHub Actions, follow the setup steps in [this doc](../../docs/Using%20Github%20With%20AMO.md) and uncomment the relevant code block below.

In [19]:
# Setup using local scripts and cloudbuild:
AutoMLOps.generate(project_id=PROJECT_ID,
                   pipeline_params=pipeline_params,
                   use_ci=False,
                   naming_prefix=MODEL_ID,
                   base_image=TRAINING_IMAGE, 
                   provisioning_framework='terraform',                  
                   custom_training_job_specs = [{
                    'component_spec': 'finetune_t5_model',
                    'display_name': 'flan-t5-base-finetuning-gpu-tensorboard',
                    'machine_type': 'n1-standard-8',
                    'accelerator_type': 'NVIDIA_TESLA_V100',
                    'accelerator_count': 1,
                    'replica_count': 1,
                    'service_account': SERVICE_ACCOUNT,
                    'tensorboard': tensorboard.resource_name,
                    'base_output_directory': BUCKET_URI
                   }],
                   deployment_framework='cloud-build',
)

# # Setup using GitHub, GitHub Actions, and Terraform:
# AutoMLOps.generate(project_id=PROJECT_ID,
#                    pipeline_params=pipeline_params,
#                    schedule_pattern='59 11 * * 0', # retrain every Sunday at 11:59
#                    use_ci=True,
#                    naming_prefix=MODEL_ID,
#                    base_image=TRAINING_IMAGE, 
#                    provisioning_framework='terraform',                  
#                    custom_training_job_specs = [{
#                     'component_spec': 'finetune_t5_model',
#                     'display_name': 'flan-t5-base-finetuning-gpu-tensorboard',
#                     'machine_type': 'n1-standard-8',
#                     'accelerator_type': 'NVIDIA_TESLA_V100',
#                     'accelerator_count': 1,
#                     'replica_count': 1,
#                     'service_account': SERVICE_ACCOUNT,
#                     'tensorboard': tensorboard.resource_name,
#                     'base_output_directory': BUCKET_URI
#                    }],
#                    deployment_framework='github-actions',
#                    source_repo_type='github',
#                    project_number='<project_number>',
#                    source_repo_name='<source/repo/string>',
#                    workload_identity_pool='<identity_pool_string>',
#                    workload_identity_provider='<identity_provider_string>',
#                    workload_identity_service_account='<workload_identity_sa>'
# )

Writing directories under AutoMLOps/
Writing configurations to AutoMLOps/configs/defaults.yaml
Writing README.md to AutoMLOps/README.md
Writing kubeflow pipelines code to AutoMLOps/pipelines, AutoMLOps/components
Writing scripts to AutoMLOps/scripts
Writing submission service code to AutoMLOps/services
Writing gcloud provisioning code to AutoMLOps/provision
Writing cloud build config to AutoMLOps/cloudbuild.yaml
Code Generation Complete.

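The commented-out GitHub setup above passes `schedule_pattern='59 11 * * 0'`, a standard five-field cron expression. The sketch below (`describe_cron` is a hypothetical helper) labels each field. Read left to right, the pattern fires at minute 59 of hour 11, on any day of the month and any month, when the day of week is 0 (Sunday): that is, 11:59 every Sunday in the scheduler job's time zone. A run at the start of each Sunday would instead be `'0 0 * * 0'`.

```python
from typing import Dict

CRON_FIELDS = ('minute', 'hour', 'day_of_month', 'month', 'day_of_week')


def describe_cron(pattern: str) -> Dict[str, str]:
    """Map a five-field cron expression onto its field names."""
    values = pattern.split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError(f'expected 5 fields, got {len(values)}: {pattern!r}')
    return dict(zip(CRON_FIELDS, values))


print(describe_cron('59 11 * * 0'))
# {'minute': '59', 'hour': '11', 'day_of_month': '*', 'month': '*', 'day_of_week': '0'}
```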

In [20]:
AutoMLOps.provision(hide_warnings=False)            # hide_warnings is optional, defaults to True

-cloudbuild.builds.create
-artifactregistry.repositories.list
-iam.serviceAccounts.create
-storage.buckets.get
-cloudscheduler.jobs.list
-cloudfunctions.functions.get
-serviceusage.services.use
-pubsub.subscriptions.list
-source.repos.create
-source.repos.list
-serviceusage.services.enable
-cloudfunctions.functions.create
-storage.buckets.create
-cloudbuild.builds.list
-cloudscheduler.jobs.create
-iam.serviceAccounts.list
-pubsub.topics.create
-pubsub.subscriptions.create
-pubsub.topics.list
-resourcemanager.projects.setIamPolicy
-iam.serviceAccounts.actAs
-artifactregistry.repositories.create

You are currently using: srastatter@google.com. Please check your account permissions.
The following are the recommended roles for provisioning:
-roles/source.admin
-roles/cloudscheduler.admin
-roles/iam.serviceAccountUser
-roles/artifactregistry.admin
-roles/pubsub.editor
-roles/cloudbuild.builds.editor
-roles/storage.admin
-roles/cloudfunctions.admin
-roles/resourcemanager.projectIamAdmin
-rol

`AutoMLOps.deploy(...)` builds and pushes the component containers, then triggers the pipeline job.

In [21]:
AutoMLOps.deploy(precheck=True,                     # precheck is optional, defaults to True
                 hide_warnings=False)               # hide_warnings is optional, defaults to True

-serviceusage.services.get
-cloudbuild.builds.get
-source.repos.update
-iam.serviceAccounts.get
-artifactregistry.repositories.get
-storage.buckets.update
-pubsub.subscriptions.get
-pubsub.topics.get
-cloudfunctions.functions.get
-resourcemanager.projects.getIamPolicy

You are currently using: srastatter@google.com. Please check your account permissions.
The following are the recommended roles for deploying with precheck:
-roles/serviceusage.serviceUsageViewer
-roles/iam.serviceAccountUser
-roles/artifactregistry.reader
-roles/cloudbuild.builds.editor
-roles/storage.admin
-roles/cloudfunctions.viewer
-roles/source.writer
-roles/pubsub.viewer
-roles/iam.roleViewer

Checking for required API services in project automlops-sandbox...
Checking for Artifact Registry in project automlops-sandbox...
Checking for Storage Bucket in project automlops-sandbox...
Checking for Pipeline Runner Service Account in project automlops-sandbox...
Checking for IAM roles on Pipeline Runner Service Account in