---
# Vertex AI
---

## Setup
---
Define constants

In [43]:
# Add installed library dependencies to Python PATH variable.
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

# Retrieve and set PROJECT_ID and REGION environment variables.
PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID=PROJECT_ID[0]
REGION = "us-central1"

# TODO: Create a globally unique Google Cloud Storage bucket for artifact storage.
GCS_BUCKET = f"gs://{PROJECT_ID}-vertex-gcs"
!gsutil mb -l $REGION $GCS_BUCKET

env: PATH=/usr/local/cuda/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/jupyter/.local/bin:/home/jupyter/.local/bin:/home/jupyter/.local/bin
Creating gs://friendly-tower-338419-vertex-gcs/...
ServiceException: 409 A Cloud Storage bucket named 'friendly-tower-338419-vertex-gcs' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.


In [None]:
BUCKET_NAME=f"gs://{PROJECT_ID}-ml-vt-use-cases-topic-classification"
BUCKET_NAME

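Cloud Storage bucket names must also satisfy length and character rules, so a name derived from a long project ID can be rejected for reasons other than the uniqueness conflict shown above. The sketch below checks a candidate name locally against a simplified subset of those rules. It assumes the name contains no dots (so the 63-character limit applies); `is_valid_bucket_name` is a hypothetical helper, and `gsutil mb` remains the authoritative check.

```python
import re

# Simplified Cloud Storage bucket-name rules (assumption: the name contains no dots):
# 3-63 characters, only lowercase letters, digits, '-' and '_',
# starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(uri: str) -> bool:
    """Hypothetical helper: checks the bucket part of a gs:// URI against the rules above."""
    name = uri[len("gs://"):] if uri.startswith("gs://") else uri
    return _BUCKET_NAME_RE.fullmatch(name) is not None


# The bucket used in this notebook is 58 characters long and passes.
print(is_valid_bucket_name("gs://friendly-tower-338419-ml-vt-use-cases-topic-classification"))
# The same suffix on a 30-character project ID gives 67 characters and fails.
print(is_valid_bucket_name("gs://my-very-long-project-id-123456-ml-vt-use-cases-topic-classification"))
```

Project IDs can be up to 30 characters long, so a 37-character suffix such as `-ml-vt-use-cases-topic-classification` only fits for project IDs of 26 characters or fewer.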
---
## Import Libraries
---

In [44]:
import os
import shutil
import logging

# TensorFlow model building libraries.
import tensorflow as tf
import tensorflow_text as text
import tensorflow_hub as hub

# Re-create the AdamW optimizer used in the original BERT paper.
from official.nlp import optimization  

# Libraries for data handling and for plotting model training metrics.
import pandas as pd
import matplotlib.pyplot as plt

# Import the Vertex AI Python SDK.
from google.cloud import aiplatform as vertexai

---
## Initialize Vertex AI Python SDK
---
Initialize the Vertex AI Python SDK with your GCP Project, Region, and Google Cloud Storage Bucket.

In [45]:
vertexai.init(project=PROJECT_ID, location=REGION, staging_bucket=GCS_BUCKET)

Now that you have trained and evaluated your model locally in a Vertex Notebook as part of an experimentation workflow, your next step is to train and deploy your model on Google Cloud's Vertex AI platform.

To train your BERT classifier on Google Cloud, you will package your Python training scripts and write a Dockerfile that describes your ML model code, its dependencies, and how to run it. You will build your custom container with Cloud Build, whose instructions are specified in cloudbuild.yaml, and publish the container to Artifact Registry. This workflow lets you reuse the same container as part of a portable and scalable Vertex Pipelines workflow.

You will walk through creating the following project structure for your ML model code:

1. Write a model.py training script. First, you will refactor the local TensorFlow model training code from above into a training script.
2. Write a task.py file as the entrypoint for the custom model container.

In this notebook, the training code and the entrypoint are combined in a single trainer/task.py file.

In [46]:
MODEL_DIR = "bertclassifier"
# -p creates both directories and does not fail if they already exist.
!mkdir -p $MODEL_DIR/trainer


---
## 1. Write trainer/task.py
---

In [48]:
%%writefile {MODEL_DIR}/trainer/task.py
import os
import logging
import argparse
import random
import warnings
from io import BytesIO

import numpy as np
import pandas
import pyarrow as pa
import datasets
from google.cloud import storage
from transformers import BertTokenizer

# TensorFlow model building libraries.
import tensorflow as tf
import tensorflow_text as text  # Registers the TF ops used by the TF-Hub BERT preprocessor.
import tensorflow_hub as hub
# AdamW optimizer from the TensorFlow Model Garden (tf-models-official).
from official.nlp import optimization

warnings.filterwarnings("ignore")


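# IPython shell syntax (`!gcloud ...`) is not valid Python in a script run by the container,
# so take the project ID from the default credentials used by the Cloud Storage client.
PROJECT_ID = storage.Client().project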
REGION = "us-central1"
BUCKET_NAME=f"gs://{PROJECT_ID}-ml-vt-use-cases-topic-classification"

# DATA_URL = 'https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'
# LOCAL_DATA_DIR = './tmp/data'
AUTOTUNE = tf.data.AUTOTUNE

HPARAMS = {
    "seed": 42,
    "batch-size": 32,
    # TF Hub BERT modules.
    "tfhub-bert-preprocessor": "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3",
    "tfhub-bert-encoder": "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2",
    "epochs": 3,
    "initial-learning-rate": 3e-5,
    "dropout": 0.1 ,
    "model-dir":"./bert-sentiment-classifier-local"
}

def get_gs_content_file(f):
    """Downloads Dataset/<f> from the project's topic-classification bucket into memory."""
    client = storage.Client()
    bucket = client.bucket(PROJECT_ID + '-ml-vt-use-cases-topic-classification')
    blob = bucket.blob('Dataset/' + f)
    return BytesIO(blob.download_as_bytes())

def crea_label_vector(Data_Set):
    """Converts each 'target' string such as '0101' into a list of ints [0, 1, 0, 1]."""
    return [[int(ch) for ch in row] for row in Data_Set['target']]

def stat_ds(Data_Set, y_label_code_reduced):
    """Returns the number of examples tagged with each class (column sums of the label vectors)."""
    return np.sum(np.array(y_label_code_reduced, dtype=int), axis=0)

def indx_permutati(Data_Set):
    random.seed(42)
    ind_ds=random.sample(range(len(Data_Set)), len(Data_Set))
    return ind_ds

def _red_list_bal(Data_Set, y_label_code_reduced):
    """For each class, returns how many single-label rows to drop: the excess over the
    rarest class, or 1 for the rarest class itself."""
    c = stat_ds(Data_Set, y_label_code_reduced)
    min_l = min(c)
    red = [0] * len(c)
    for ind, count in enumerate(c):
        red[ind] = count - min_l if count != min_l else 1
    return red

def bilancia(Data_Set, y_label_code_reduced):
    """Balances classes in place by dropping single-label rows from over-represented classes."""
    red = _red_list_bal(Data_Set, y_label_code_reduced)
    conta = [0] * len(red)  # Rows dropped so far, one counter per class.
    index_list = []
    row_list = Data_Set['target'].tolist()
    # Visit rows in a seeded random order so that the dropped rows are a random subset.
    for row_ind in indx_permutati(Data_Set):
        row_t = [int(x) for x in row_list[row_ind]]
        if sum(row_t) == 1:  # Multi-label rows are never dropped.
            cls = row_t.index(1)
            if conta[cls] < red[cls]:
                index_list.append(row_ind)
                conta[cls] += 1
    # Positions equal index labels here because the caller passes a frame with a reset index.
    Data_Set.drop(index_list, inplace=True)
    Data_Set.reset_index(drop=True, inplace=True)


def create_datasets_split():
    '''Loads the processed corpus from GCS, filters and balances it, and returns a
    datasets.DatasetDict with 'train', 'validation' and 'test' splits.'''
    processed_text_final = pandas.read_csv(get_gs_content_file('outputfile_text_processed.csv'))
    y_label_code = np.loadtxt(get_gs_content_file('numeric_label_topic.txt'), dtype=int).tolist()
    new_dict = np.load(get_gs_content_file("myDictionary_labels.npy"), allow_pickle=True)

    # Encode each multi-hot label row as a string, e.g. [0, 1, 0, 1] -> '0101'.
    y_label_code_column = [''.join(map(str, row)) for row in y_label_code]
    Data_Set = pandas.DataFrame({'text': list(processed_text_final['text']), 'target': y_label_code_column})

    # Keep only documents with between start_row and end_row words (inclusive).
    start_row = 35
    end_row = 150
    n_words = Data_Set['text'].map(lambda x: len(x.split(' ')))
    Data_Set = Data_Set[(n_words >= start_row) & (n_words <= end_row)].reset_index(drop=True)

    # Topic names, in the key order of the saved label dictionary (not used further here).
    class_names_list = new_dict.item()
    target_names = list(class_names_list.keys())

    # Balance the classes, then rebuild the label vectors for the remaining rows.
    bilancia(Data_Set, crea_label_vector(Data_Set))
    y = crea_label_vector(Data_Set)
    new_df = pandas.DataFrame({'text': list(Data_Set['text']), 'topic': y})

    # 70% train; the remaining 30% is split 66/34 into validation and test.
    # Drop the sampled rows *before* resetting the index; otherwise drop() removes the
    # first N rows instead of the sampled ones and the splits overlap.
    train_df = new_df.sample(frac=0.7, random_state=42)
    rest_df = new_df.drop(train_df.index)
    val_df = rest_df.sample(frac=0.66, random_state=42)
    test_df = rest_df.drop(val_df.index)

    def to_hf(df):
        # Convert a pandas split into a Hugging Face Dataset.
        return datasets.Dataset(pa.Table.from_pandas(df.reset_index(drop=True)))

    return datasets.DatasetDict({'train': to_hf(train_df), 'validation': to_hf(val_df), 'test': to_hf(test_df)})




def build_text_classifier(hparams, optimizer):
    """Builds and compiles a TF-Hub BERT model with 4 independent (multi-label) outputs."""
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
    preprocessor = hub.KerasLayer(hparams['tfhub-bert-preprocessor'], name='preprocessing')
    encoder_inputs = preprocessor(text_input)
    encoder = hub.KerasLayer(hparams['tfhub-bert-encoder'], trainable=True, name='BERT_encoder')
    outputs = encoder(encoder_inputs)
    classifier = outputs['pooled_output']
    classifier = tf.keras.layers.Dropout(hparams['dropout'], name='dropout')(classifier)
    # One logit per topic; no activation because the loss below expects logits.
    classifier = tf.keras.layers.Dense(4, activation=None, name='classifier')(classifier)
    model = tf.keras.Model(text_input, classifier, name='bert-classifier')

    loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    # The outputs are logits, so threshold at 0.0 (equivalent to a probability of 0.5).
    metrics = [tf.keras.metrics.BinaryAccuracy(threshold=0.0)]

    model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
    return model


def train_evaluate(hparams):
    """Trains the classifier, logs test metrics and exports it as a SavedModel."""
    dataset_dir = create_datasets_split()

    def to_tensors(split):
        # Returns (texts, multi-hot labels) tensors for one split of the DatasetDict.
        texts = tf.convert_to_tensor(dataset_dir[split]['text'])
        labels = tf.convert_to_tensor([list(y) for y in dataset_dir[split]['topic']], dtype=tf.int32)
        return texts, labels

    train_ds = to_tensors('train')
    val_ds = to_tensors('validation')
    test_ds = to_tensors('test')

    epochs = hparams['epochs']
    batch_size = hparams['batch-size']
    # Count examples, not the (texts, labels) tuple: len(train_ds) would always be 2.
    steps_per_epoch = int(np.ceil(len(dataset_dir['train']) / batch_size))
    n_train_steps = steps_per_epoch * epochs
    n_warmup_steps = int(0.1 * n_train_steps)

    mirrored_strategy = tf.distribute.MirroredStrategy()
    with mirrored_strategy.scope():
        # Create the optimizer inside the strategy scope, together with the model it updates.
        optimizer = optimization.create_optimizer(init_lr=hparams['initial-learning-rate'],
                                                  num_train_steps=n_train_steps,
                                                  num_warmup_steps=n_warmup_steps,
                                                  optimizer_type='adamw')
        model = build_text_classifier(hparams=hparams, optimizer=optimizer)
        # summary() prints and returns None, so route its lines to the logger instead.
        model.summary(print_fn=logging.info)

    history = model.fit(x=train_ds[0], y=train_ds[1], validation_data=val_ds,
                        epochs=epochs, batch_size=batch_size)

    logging.info("Test loss and accuracy: %s",
                 model.evaluate(x=test_ds[0], y=test_ds[1], batch_size=batch_size))

    # Export Keras model in TensorFlow SavedModel format.
    model.save(hparams['model-dir'])

    return history

if __name__ == '__main__':
    # Make logging.info output (model summary, test metrics) visible in the job logs.
    logging.getLogger().setLevel(logging.INFO)

    parser = argparse.ArgumentParser()
    # Vertex custom container training args. AIP_MODEL_DIR is set by Vertex AI when the job has a
    # base output directory; it can be overridden with --model-dir (e.g. for local runs).
    parser.add_argument('--model-dir', dest='model-dir',
                        default=os.environ.get('AIP_MODEL_DIR'), type=str, help='GCS URI for saving model artifacts.')

    # Model training args.
    parser.add_argument('--tfhub-bert-preprocessor', dest='tfhub-bert-preprocessor',
                        default='https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3', type=str, help='TF-Hub URL.')
    parser.add_argument('--tfhub-bert-encoder', dest='tfhub-bert-encoder',
                        default='https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2', type=str, help='TF-Hub URL.')
    parser.add_argument('--initial-learning-rate', dest='initial-learning-rate', default=1e-5, type=float, help='Learning rate for optimizer.')
    parser.add_argument('--epochs', dest='epochs', default=1, type=int, help='Number of training epochs.')
    parser.add_argument('--batch-size', dest='batch-size', default=8, type=int, help='Number of examples per training batch.')
    parser.add_argument('--dropout', dest='dropout', default=0.1, type=float, help='Float percentage of DNN nodes [0,1] to drop for regularization.')
    parser.add_argument('--seed', dest='seed', default=42, type=int, help='Random number generator seed (currently unused: the data splits use a fixed random_state=42).')

    args = parser.parse_args()
    # The dest names contain hyphens, so read the values as a dict rather than as attributes.
    hparams = vars(args)
    train_evaluate(hparams)

Overwriting bertclassifier/trainer/task.py
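The class balancing in `bilancia` only ever removes single-label rows, and each class gets its own drop quota from `_red_list_bal`. The sketch below re-implements that scheme in plain Python on a toy label set, so it can be checked without GCS access. `parse_labels` and `rows_to_drop` are hypothetical names, and the seeded shuffle stands in for `indx_permutati`.

```python
import random
from typing import List


def parse_labels(targets: List[str]) -> List[List[int]]:
    """'0101' -> [0, 1, 0, 1], as crea_label_vector does for the 'target' column."""
    return [[int(ch) for ch in t] for t in targets]


def rows_to_drop(labels: List[List[int]], seed: int = 42) -> List[int]:
    """Positional indices of single-label rows to drop, following the bilancia scheme."""
    counts = [sum(col) for col in zip(*labels)]  # Examples per class, as in stat_ds.
    min_count = min(counts)
    # Quota per class, as in _red_list_bal: excess over the rarest class, or 1 for the rarest.
    quota = [c - min_count if c != min_count else 1 for c in counts]
    dropped = [0] * len(counts)  # One counter per class.
    order = random.Random(seed).sample(range(len(labels)), len(labels))
    to_drop = []
    for i in order:
        if sum(labels[i]) == 1:  # Multi-label rows are never dropped.
            cls = labels[i].index(1)
            if dropped[cls] < quota[cls]:
                to_drop.append(i)
                dropped[cls] += 1
    return to_drop


labels = parse_labels(["1000", "1000", "1000", "0100", "0010", "0001", "0011"])
drop = rows_to_drop(labels)
kept = [row for i, row in enumerate(labels) if i not in drop]
print(kept)  # [[1, 0, 0, 0], [0, 0, 1, 1]]
```

On this toy data the rarest class (second column) loses its only example, because `_red_list_bal` assigns the rarest class a quota of 1 rather than 0; this is worth keeping in mind on small datasets.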

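A subtle point in the 70 / ~20 / ~10 split used by `create_datasets_split`: rows must be dropped by their original index labels before `reset_index(drop=True)`. If the index is reset first, `drop` removes the first N rows of the frame rather than the sampled ones, so the held-out set is simply the tail of the frame and overlaps the training set. The toy example below contrasts the two orderings in plain pandas; the helper names are hypothetical.

```python
import pandas as pd


def split_reset_then_drop(df, frac=0.7, seed=42):
    """Problematic order: the reset index no longer identifies the sampled rows."""
    train = df.sample(frac=frac, random_state=seed).reset_index(drop=True)
    rest = df.drop(train.index)  # Drops labels 0..len(train)-1, i.e. the head of df.
    return train, rest


def split_drop_then_reset(df, frac=0.7, seed=42):
    """Intended order: drop the sampled index labels first, reset afterwards."""
    train = df.sample(frac=frac, random_state=seed)
    rest = df.drop(train.index)
    return train.reset_index(drop=True), rest.reset_index(drop=True)


df = pd.DataFrame({"text": [f"doc{i}" for i in range(10)]})

bad_train, bad_rest = split_reset_then_drop(df)
print(list(bad_rest["text"]))  # Always doc7..doc9, whatever was sampled.
print(len(set(bad_train["text"]) & set(bad_rest["text"])))  # Overlap size (usually non-zero).

good_train, good_rest = split_drop_then_reset(df)
print(len(set(good_train["text"]) & set(good_rest["text"])))  # 0: disjoint splits.
```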

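`create_optimizer` uses the total number of training steps to shape its warmup and learning-rate decay, and that number depends on the number of training examples and the batch size, not on the length of the `(texts, labels)` tuple passed to `fit` (which is always 2). A minimal sketch of the arithmetic, assuming `model.fit` is called with the same batch size; `schedule_steps` is a hypothetical helper:

```python
import math


def schedule_steps(n_train_examples: int, batch_size: int, epochs: int, warmup_frac: float = 0.1):
    """Hypothetical helper: (steps_per_epoch, total steps, warmup steps) for create_optimizer."""
    steps_per_epoch = math.ceil(n_train_examples / batch_size)  # A final partial batch is still a step.
    n_train_steps = steps_per_epoch * epochs
    n_warmup_steps = int(warmup_frac * n_train_steps)  # 10% warmup, as in train_evaluate.
    return steps_per_epoch, n_train_steps, n_warmup_steps


print(schedule_steps(1000, 8, 1))   # (125, 125, 12)
print(schedule_steps(1001, 32, 3))  # (32, 96, 9)
```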
---
## 2. Write a requirements.txt file to specify additional dependencies of the ML code
---
These are additional dependencies of the model code that are not included in the pre-built Vertex AI TensorFlow images, such as TensorFlow Text and the TensorFlow Model Garden package `tf-models-official` (which provides the AdamW optimizer), needed to import and fine-tune pre-trained TensorFlow BERT models from TF-Hub. `transformers` and `datasets` are used by the data-preparation code in `task.py`.

In [49]:
%%writefile {MODEL_DIR}/requirements.txt
tensorflow_text
tf-models-official
transformers
datasets==1.18.2
torch==1.10.1
torchvision==0.11.2
torchaudio==0.10.1

Overwriting bertclassifier/requirements.txt


---
## 3. Write a Dockerfile 
---

In [52]:
%%writefile {MODEL_DIR}/Dockerfile
# Specifies base image and tag.
# https://cloud.google.com/vertex-ai/docs/training/pre-built-containers
FROM us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6:latest

# Sets the container working directory.
WORKDIR /root

# Copies the requirements.txt into the container to reduce network calls.
COPY requirements.txt .

# Installs additional packages.
RUN pip3 install -U -r requirements.txt

# b/203105209 Removes unneeded file from TF2.5 CPU image for python_module CustomJob training. 
# Will be removed on subsequent public Vertex images.
RUN rm -rf /var/sitecustomize/sitecustomize.py

# Copies the trainer code to the docker image.
COPY . /trainer

# Sets the container working directory.
WORKDIR /trainer

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python", "-m", "trainer.task"]

Writing bertclassifier/Dockerfile

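The `ENTRYPOINT` runs `trainer.task` as a module, so any flags passed to the container after the entrypoint are handled by the `argparse` block in `task.py`. Because that block uses hyphenated `dest` names such as `'batch-size'`, the values are read from the namespace's dictionary (`args.__dict__`, equivalently `vars(args)`) rather than as attributes. A reduced sketch with three of the flags; `build_parser` is a hypothetical name:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # dest values contain hyphens, as in task.py, so they are not valid attribute names.
    parser.add_argument('--epochs', dest='epochs', default=1, type=int)
    parser.add_argument('--batch-size', dest='batch-size', default=8, type=int)
    parser.add_argument('--initial-learning-rate', dest='initial-learning-rate', default=1e-5, type=float)
    return parser


# Flags that would follow `python -m trainer.task` in the container.
args = build_parser().parse_args(['--epochs', '3', '--batch-size', '32'])
hparams = vars(args)
print(hparams['batch-size'], getattr(args, 'batch-size'))
```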

---
## RUN
---

In [51]:
%%bash
PROJECT_ID='friendly-tower-338419'
BUCKET_NAME="gs://${PROJECT_ID}-ml-vt-use-cases-topic-classification"
echo $BUCKET_NAME

gs://friendly-tower-338419-ml-vt-use-cases-topic-classification


### 1. Create an Artifact Registry repository for custom container images
---

In [54]:
%%bash
gcloud artifacts repositories create "bertclassifier" --repository-format="docker" --location=us-central1 --description="Artifact registry for ML custom training images for multilingual classification"

Create request issued for: [bertclassifier]
Waiting for operation [projects/friendly-tower-338419/locations/us-central1/operations/2381bc20-a9a0-4d81-a1a7-ff30083d55de] to complete...
.....done.
Created repository [bertclassifier].


### 2. Create cloudbuild.yaml instructions
---

In [55]:
ARTIFACT_REGISTRY=MODEL_DIR
IMAGE_NAME=MODEL_DIR
IMAGE_TAG="latest"
IMAGE_URI=f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{ARTIFACT_REGISTRY}/{IMAGE_NAME}:{IMAGE_TAG}"
cloudbuild_yaml = f"""steps:
- name: 'gcr.io/cloud-builders/docker'
  args: [ 'build', '-t', '{IMAGE_URI}', '.' ]
images: 
- '{IMAGE_URI}'"""

with open(f"{MODEL_DIR}/cloudbuild.yaml", "w") as fp:
    fp.write(cloudbuild_yaml)

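With the values used in this notebook (`MODEL_DIR = "bertclassifier"` as both the repository and the image name), the template expands to an Artifact Registry URI of the form `REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. The single step builds the image from the uploaded directory, and the `images` list tells Cloud Build to push that image once the build succeeds. A pure-Python sketch of the same rendering, useful for checking the URI before submitting; `render_cloudbuild` is a hypothetical helper:

```python
def render_cloudbuild(region: str, project_id: str, repository: str, image: str, tag: str = "latest"):
    """Returns (image_uri, yaml_text) following the notebook's f-string template."""
    image_uri = f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"
    yaml_text = (
        "steps:\n"
        "- name: 'gcr.io/cloud-builders/docker'\n"
        f"  args: [ 'build', '-t', '{image_uri}', '.' ]\n"
        "images:\n"
        f"- '{image_uri}'"
    )
    return image_uri, yaml_text


uri, yaml_text = render_cloudbuild("us-central1", "friendly-tower-338419", "bertclassifier", "bertclassifier")
print(uri)  # us-central1-docker.pkg.dev/friendly-tower-338419/bertclassifier/bertclassifier:latest
print(yaml_text)
```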
### 3. Build and submit your container image to Artifact Registry using Cloud Build
---

In [56]:
# Upload the bertclassifier directory, then build and push the image defined in cloudbuild.yaml.
!gcloud builds submit {MODEL_DIR} --timeout=150m --config {MODEL_DIR}/cloudbuild.yaml

Creating temporary tarball archive of 4 file(s) totalling 12.8 KiB before compression.
Uploading tarball of [bertclassifier] to [gs://friendly-tower-338419_cloudbuild/source/1645107995.222383-7ea5a3514cea4fa2ba02954fd0face40.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/friendly-tower-338419/locations/global/builds/497a7b03-64f2-47f3-b4d0-b7e4d8e41c2f].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/497a7b03-64f2-47f3-b4d0-b7e4d8e41c2f?project=480829964338].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "497a7b03-64f2-47f3-b4d0-b7e4d8e41c2f"

FETCHSOURCE
Fetching storage object: gs://friendly-tower-338419_cloudbuild/source/1645107995.222383-7ea5a3514cea4fa2ba02954fd0face40.tgz#1645107996087047
Copying gs://friendly-tower-338419_cloudbuild/source/1645107995.222383-7ea5a3514cea4fa2ba02954fd0face40.tgz#1645107996087047...
/ [1 files][  4.7 KiB/  4.7 KiB]                                             

---
# Define a pipeline using the KFP V2 SDK
---

In [57]:
import datetime
# google_cloud_pipeline_components includes pre-built KFP components for interfacing with Vertex AI services.
from google_cloud_pipeline_components import aiplatform as gcc_aip
from kfp.v2 import dsl
TIMESTAMP=datetime.datetime.now().strftime('%Y%m%d%H%M%S')
DISPLAY_NAME = MODEL_DIR+"-{}".format(TIMESTAMP)
GCS_BASE_OUTPUT_DIR= f"{GCS_BUCKET}/{MODEL_DIR}-{TIMESTAMP}"

USER = "mattia_gatto" 
PIPELINE_ROOT = "{}/pipeline_root/{}".format(GCS_BUCKET, USER)

print(f"Model display name: {DISPLAY_NAME}")
print(f"GCS dir for model training artifacts: {GCS_BASE_OUTPUT_DIR}")
print(f"GCS dir for pipeline artifacts: {PIPELINE_ROOT}")

Model display name: bertclassifier-20220217145448
GCS dir for model training artifacts: gs://friendly-tower-338419-vertex-gcs/bertclassifier-20220217145448
GCS dir for pipeline artifacts: gs://friendly-tower-338419-vertex-gcs/pipeline_root/mattia_gatto


In [58]:
# Pre-built Vertex model serving container for deployment.
# https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers
SERVING_IMAGE_URI = "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest"

In [59]:
@dsl.pipeline(name=MODEL_DIR, pipeline_root=PIPELINE_ROOT)
def pipeline(
    project: str = PROJECT_ID,
    location: str = REGION,
    staging_bucket: str = GCS_BUCKET,
    display_name: str = DISPLAY_NAME,    
    container_uri: str = IMAGE_URI,
    model_serving_container_image_uri: str = SERVING_IMAGE_URI,    
    base_output_dir: str = GCS_BASE_OUTPUT_DIR,
):
    
    # Run the custom container training job with the pre-built KFP
    # CustomContainerTrainingJobRunOp component, configured from the pipeline constructor arguments.
    model_train_evaluate_op = gcc_aip.CustomContainerTrainingJobRunOp(
        # Vertex AI Python SDK authentication parameters.        
        project=project,
        location=location,
        staging_bucket=staging_bucket,
        # WorkerPool arguments.
        replica_count=1,
        machine_type="c2-standard-4",
        # Remaining arguments from the pipeline constructor.
        display_name=display_name,
        container_uri=container_uri,
        model_serving_container_image_uri=model_serving_container_image_uri,
        base_output_dir=base_output_dir,
    )    
    
    # Create a Vertex Endpoint resource in parallel with model training.
    endpoint_create_op = gcc_aip.EndpointCreateOp(
        # Vertex AI Python SDK authentication parameters.
        project=project,
        location=location,
        display_name=display_name,
    )
    
    # Deploy your model to the created Endpoint resource for online predictions.
    model_deploy_op = gcc_aip.ModelDeployOp(
        # Link to model training component through output model artifact.
        model=model_train_evaluate_op.outputs["model"],
        # Link to the created Endpoint.
        endpoint=endpoint_create_op.outputs["endpoint"],
        # Define prediction request routing. {"0": 100} indicates 100% of traffic 
        # to the ID of the current model being deployed.
        traffic_split={"0": 100},
        # WorkerPool arguments.        
        dedicated_resources_machine_type="n1-standard-4",
        dedicated_resources_min_replica_count=1,
        dedicated_resources_max_replica_count=2
    )

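The `traffic_split` passed to `ModelDeployOp` maps deployed-model IDs to integer percentages, where the key `"0"` stands for the model deployed by this step; Vertex AI expects the percentages to add up to 100. A hypothetical local pre-check, also shown on a canary-style split that keeps 90% of traffic on an already deployed model (its ID below is made up):

```python
from typing import Dict


def check_traffic_split(split: Dict[str, int]) -> None:
    """Hypothetical pre-check: raises ValueError unless values are non-negative ints summing to 100."""
    for model_id, pct in split.items():
        if not isinstance(pct, int) or pct < 0:
            raise ValueError(f"invalid percentage {pct!r} for deployed model {model_id!r}")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages must sum to 100, got {total}")


check_traffic_split({"0": 100})  # The split used in this pipeline: all traffic to the new model.
# Canary-style split; "1234567890" stands in for an existing deployed model's ID (made up).
check_traffic_split({"0": 10, "1234567890": 90})
```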
## Compile the pipeline
---

In [60]:
from kfp.v2 import compiler
compiler.Compiler().compile(
    pipeline_func=pipeline, package_path=MODEL_DIR+".json"
)



---
# Run the pipeline on Vertex Pipelines
---

### Enable the required APIs

1. https://console.cloud.google.com/apis/enableflow?apiid=ml.googleapis.com,compute_component,containerregistry.googleapis.com&redirect=https:%2F%2Fconsole.cloud.google.com&authuser=1&project=friendly-tower-338419

In [65]:
# Before running: switch the GCS bucket to uniform access mode and enable acs as permissions.
vertex_pipelines_job = vertexai.pipeline_jobs.PipelineJob(
    display_name=MODEL_DIR,
    template_path=MODEL_DIR+".json",
    parameter_values={
        "project": PROJECT_ID,
        "location": REGION,
        "staging_bucket": GCS_BUCKET,
        "display_name": DISPLAY_NAME,        
        "container_uri": IMAGE_URI,
        "model_serving_container_image_uri": SERVING_IMAGE_URI,        
        "base_output_dir": GCS_BASE_OUTPUT_DIR},
    enable_caching=True,
)
vertex_pipelines_job.run()

INFO:google.cloud.aiplatform.pipeline_jobs:Creating PipelineJob
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob created. Resource name: projects/480829964338/locations/us-central1/pipelineJobs/bertclassifier-20220217165511
INFO:google.cloud.aiplatform.pipeline_jobs:To use this PipelineJob in another session:
INFO:google.cloud.aiplatform.pipeline_jobs:pipeline_job = aiplatform.PipelineJob.get('projects/480829964338/locations/us-central1/pipelineJobs/bertclassifier-20220217165511')
INFO:google.cloud.aiplatform.pipeline_jobs:View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/bertclassifier-20220217165511?project=480829964338
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/480829964338/locations/us-central1/pipelineJobs/bertclassifier-20220217165511 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/480829964338/locations/us-central1/pipelineJobs/bertclass

RuntimeError: Job failed with:
code: 9
message: "The DAG failed because some tasks failed. The failed tasks are: [customcontainertrainingjob-run].; Job (project_id = friendly-tower-338419, job_id = 8146787425541160960) is failed due to the above error.; Failed to handle the job: {project_number = 480829964338, job_id = 8146787425541160960}"


---
# Query deployed model on Vertex Endpoint for online predictions
---

In [None]:
# Retrieve the Endpoint created by your pipeline (it shares the pipeline's display name).
ENDPOINT_NAME = vertexai.Endpoint.list(filter=f'display_name="{DISPLAY_NAME}"')[0].name

endpoint = vertexai.Endpoint(
    endpoint_name=ENDPOINT_NAME,
    project=PROJECT_ID,
    location=REGION
)
# Sample input carried over from the movie-review sentiment example; replace it with a text from your topic-classification domain.
test_review = "The Dark Knight is the best Batman movie!"

prediction =endpoint.predict([test_review])
print(prediction)
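Because the classifier's last `Dense` layer has no activation and is trained with `BinaryCrossentropy(from_logits=True)`, each prediction returned by the endpoint should be a list of 4 raw logits, one per topic, rather than probabilities. A minimal post-processing sketch, assuming that output shape; the class names below are placeholders (in `task.py` the real ones come from the keys of `myDictionary_labels.npy`, assuming their order matches the label columns), and 0.5 is an assumed decision threshold:

```python
import math
from typing import Dict, Sequence


def logits_to_topics(logits: Sequence[float], class_names: Sequence[str],
                     threshold: float = 0.5) -> Dict[str, float]:
    """Applies a sigmoid per class and keeps the topics whose probability is >= threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return {name: round(p, 3) for name, p in zip(class_names, probs) if p >= threshold}


class_names = ["topic_a", "topic_b", "topic_c", "topic_d"]  # Placeholder names.
print(logits_to_topics([2.0, -1.0, 0.5, -3.0], class_names))  # {'topic_a': 0.881, 'topic_c': 0.622}
```

With a live endpoint, the same function could be applied to `prediction.predictions[0]`.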