# **Part 1: Data Exploration for Tuning a Foundation Model**

### **Project environment setup**

* Load credentials and relevant Python libraries.
* If you run this notebook locally, first install the Vertex AI SDK: `!pip install google-cloud-aiplatform`

In [1]:
try:
  from dotenv import load_dotenv
except ImportError:
  !pip install python-dotenv
  !pip install google-cloud-aiplatform
  from dotenv import load_dotenv

In [2]:
# !pip install python-dotenv
# !pip install google-cloud-aiplatform

In [3]:
import os
from dotenv import load_dotenv
import json
import base64
from google.auth.transport.requests import Request
from google.oauth2.service_account import Credentials
from google.cloud import bigquery

def authenticate():
    # Load .env
    load_dotenv()

    # Decode the base64-encoded key and parse it as JSON
    SERVICE_ACCOUNT_KEY_STRING_B64 = os.getenv('SERVICE_ACCOUNT_KEY')
    SERVICE_ACCOUNT_KEY_BYTES_B64 = SERVICE_ACCOUNT_KEY_STRING_B64.encode("ascii")
    SERVICE_ACCOUNT_KEY_STRING_BYTES = base64.b64decode(SERVICE_ACCOUNT_KEY_BYTES_B64)
    SERVICE_ACCOUNT_KEY_STRING = SERVICE_ACCOUNT_KEY_STRING_BYTES.decode("ascii")

    SERVICE_ACCOUNT_KEY = json.loads(SERVICE_ACCOUNT_KEY_STRING)

    # Create credentials based on key from service account
    credentials = Credentials.from_service_account_info(
        SERVICE_ACCOUNT_KEY,
        scopes=['https://www.googleapis.com/auth/cloud-platform'])

    if credentials.expired:
        credentials.refresh(Request())

    # Set project ID according to environment variable
    PROJECT_ID = os.getenv('PROJECT_ID')

    return credentials, PROJECT_ID

# Authenticate and initialize BigQuery client
credentials, PROJECT_ID = authenticate()
bq_client = bigquery.Client(project=PROJECT_ID, credentials=credentials)
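
The decoding steps inside authenticate() can be checked in isolation. The sketch below round-trips a made-up dictionary (not a real service-account key) through the same base64 and JSON calls; the dictionary contents and variable names are purely illustrative.

```python
import base64
import json

# Hypothetical stand-in for a service-account key; a real key has more fields.
fake_key = {"type": "service_account", "project_id": "my-demo-project"}

# What would be stored in the .env file: the JSON text, base64-encoded.
encoded = base64.b64encode(json.dumps(fake_key).encode("ascii")).decode("ascii")

# What authenticate() does with the SERVICE_ACCOUNT_KEY value: decode, then parse.
decoded_key = json.loads(base64.b64decode(encoded.encode("ascii")).decode("ascii"))

print(decoded_key == fake_key)
```

If the environment variable is missing, os.getenv returns None and the first .encode call fails, so it is worth confirming the .env file is loaded before debugging anything else.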

In [4]:
import sys
# sys.path entries must be directories, so add the folder that contains utils.py
sys.path.append('/content')

In [5]:
# import utils
# credentials, PROJECT_ID = authenticate()
REGION = "us-central1"

Import the Vertex AI SDK.
The library helps to interact with the Vertex AI services in the cloud.
Initialize it.

In [6]:
import vertexai

vertexai.init(project = PROJECT_ID,
              location = REGION,
              credentials = credentials)

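Import BigQuery to use as your data warehouse.
Initialize the client to start interacting with the data warehouse: send SQL and retrieve data into the notebook. The client was already created in the authentication cell above, so the lines below stay commented out.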

In [7]:
# from google.cloud import bigquery
# bq_client = bigquery.Client(project=PROJECT_ID, credentials = credentials)

### **Stack Overflow Public Dataset**

* You will use the Stack Overflow data from BigQuery Public Datasets.
The dataset includes questions, answers and related metadata, organized into several tables.

* Create a SQL query.

In [8]:
QUERY_TABLES = """
SELECT
  table_name
FROM
  `bigquery-public-data.stackoverflow.INFORMATION_SCHEMA.TABLES`
"""

The query retrieves the table_name of every table in the dataset.
Use the client to send your SQL and retrieve the data (table names).

In [9]:
query_job = bq_client.query(QUERY_TABLES)

for row in query_job:
    for value in row.values():
        print(value)

posts_answers
users
posts_orphaned_tag_wiki
posts_tag_wiki
stackoverflow_posts
posts_questions
comments
posts_tag_wiki_excerpt
posts_wiki_placeholder
posts_privilege_wiki
post_history
badges
post_links
tags
votes
posts_moderator_nomination


### **Data Retrieval**

* You'll fetch some data from the data warehouse and store it in a Pandas dataframe for visualization.

* Select all columns from posts_questions and set the LIMIT to 3.

In [10]:
INSPECT_QUERY = """
SELECT
    *
FROM
    `bigquery-public-data.stackoverflow.posts_questions`
LIMIT 3
"""

In [11]:
import pandas as pd

query_job = bq_client.query(INSPECT_QUERY)

Take the results of the query, convert them to an Arrow table (Apache Arrow is a columnar in-memory format), and convert that table into a Pandas dataframe.
This puts the data in a format that is easier to read and explore with Pandas.

In [12]:
stack_overflow_df = query_job\
    .result()\
    .to_arrow()\
    .to_pandas()
stack_overflow_df.head()

Unnamed: 0,id,title,body,accepted_answer_id,answer_count,comment_count,community_owned_date,creation_date,favorite_count,last_activity_date,last_edit_date,last_editor_display_name,last_editor_user_id,owner_display_name,owner_user_id,parent_id,post_type_id,score,tags,view_count
0,320268,Html.ActionLink doesn’t render # properly,<p>When using Html.ActionLink passing a string...,,0,0,NaT,2008-11-26 10:42:37.477000+00:00,0,2009-02-06 20:13:54.370000+00:00,NaT,,,Paulo,,,1,0,asp.net-mvc,390
1,324003,Primitive recursion,<p>how will i define the function 'simplify' ...,,0,0,NaT,2008-11-27 15:12:37.497000+00:00,0,2012-09-25 19:54:40.597000+00:00,2012-09-25 19:54:40.597000+00:00,Marcin,1288.0,,41000.0,,1,0,haskell|lambda|functional-programming|lambda-c...,497
2,390605,While vs. Do While,<p>I've seen both the blocks of code in use se...,390608.0,0,0,NaT,2008-12-24 01:49:54.230000+00:00,2,2008-12-24 03:08:55.897000+00:00,NaT,,,Unkwntech,115.0,,1,0,language-agnostic|loops,11262


### **Dealing with Large Datasets**
* Large datasets for LLMs often don't fit into memory.
* Select all of the columns and rows of the table posts_questions.

In [13]:
QUERY_ALL = """
SELECT
    *
FROM
    `bigquery-public-data.stackoverflow.posts_questions` q
"""

query_job = bq_client.query(QUERY_ALL)

In [14]:
try:
    stack_overflow_df = query_job\
    .result()\
    .to_arrow()\
    .to_pandas()
except Exception as e:
    print('The DataFrame is too large to load into memory.', e)

The DataFrame is too large to load into memory. 403 GET https://bigquery.googleapis.com/bigquery/v2/projects/llmops-project-438807/queries/e29e11ca-5fdc-4a65-9539-ae3218e25fa8?maxResults=0&location=US&prettyPrint=false: Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors

Location: US
Job ID: e29e11ca-5fdc-4a65-9539-ae3218e25fa8
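
One generic way to keep memory bounded when a result is large is to consume it in fixed-size batches instead of materializing everything at once. The sketch below is a plain-Python illustration, not a BigQuery API: `iter_batches` and `fake_rows` are hypothetical names, and a generator stands in for the query result. The notebook itself takes a different route and narrows the query instead.

```python
from itertools import islice

def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# A generator stands in for a large query result (hypothetical data).
fake_rows = ({"id": i} for i in range(10))

# Only one batch is held in memory at a time.
batch_sizes = [len(batch) for batch in iter_batches(fake_rows, batch_size=4)]
print(batch_sizes)
```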



**Note:**

* The result is too large to return: BigQuery rejects the request ("Response too large to return") before anything reaches the notebook. Even if it were returned, a table of this size would likely not fit into memory.

**Joining Tables and Query Optimization**

* When working with large data, optimizing queries saves time and resources.
* Select questions as input_text (column 1) and answers as output_text (column 2).
* Take the questions from posts_questions and the answers from posts_answers.
* Join each question to its accepted answer by matching the question's accepted_answer_id to the answer's id.
* Keep only questions that are about Python and have an accepted answer, and whose accepted answer was posted on or after 2020-01-01 (the query filters on a.creation_date).
* Limit the result to 10,000 rows.
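
The join and filters described above can be tried on a toy dataset first. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for BigQuery; the rows are made up, `||` replaces CONCAT, and `LIKE '%python%'` replaces REGEXP_CONTAINS, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts_questions (
    id INTEGER, title TEXT, body TEXT, tags TEXT, accepted_answer_id INTEGER);
CREATE TABLE posts_answers (id INTEGER, body TEXT, creation_date TEXT);
INSERT INTO posts_questions VALUES
    (1, 'Read a CSV', '<p>How?</p>', 'python|pandas', 10),
    (2, 'Sort a list', '<p>How?</p>', 'java', 11),
    (3, 'No answer yet', '<p>Help</p>', 'python', NULL),
    (4, 'Old question', '<p>Old</p>', 'python', 12);
INSERT INTO posts_answers VALUES
    (10, '<p>Use read_csv</p>', '2021-03-01'),
    (11, '<p>Use Collections.sort</p>', '2021-03-02'),
    (12, '<p>Old answer</p>', '2015-01-01');
""")

# Same shape as the notebook query: join on the accepted answer, then filter.
rows = conn.execute("""
SELECT q.title || q.body AS input_text, a.body AS output_text
FROM posts_questions q
JOIN posts_answers a ON q.accepted_answer_id = a.id
WHERE q.accepted_answer_id IS NOT NULL
  AND q.tags LIKE '%python%'
  AND a.creation_date >= '2020-01-01'
LIMIT 10000
""").fetchall()

print(rows)
```

Only the first question survives: the second is not tagged python, the third has no accepted answer, and the fourth has an accepted answer from before 2020.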

In [15]:
QUERY = """
SELECT
    CONCAT(q.title, q.body) as input_text,
    a.body AS output_text
FROM
    `bigquery-public-data.stackoverflow.posts_questions` q
JOIN
    `bigquery-public-data.stackoverflow.posts_answers` a
ON
    q.accepted_answer_id = a.id
WHERE
    q.accepted_answer_id IS NOT NULL AND
    REGEXP_CONTAINS(q.tags, "python") AND
    a.creation_date >= "2020-01-01"
LIMIT
    10000
"""

query_job = bq_client.query(QUERY)

In [16]:
stack_overflow_df = query_job.result()\
                        .to_arrow()\
                        .to_pandas()

stack_overflow_df.head(2)

Unnamed: 0,input_text,output_text
0,Turn PyCharm package back to a directory in Pr...,<p>Right click the folder -&gt; Mark directory...
1,Pandas Select Rows from a dataframe with highe...,<p>use groupby and take the max</p>\n<pre><cod...


### **Adding Instructions**

* Adding instructions to LLM inputs has been shown to improve model performance and generalization to unseen tasks (Google, 2022).
* Without the instruction, each example is only a question and an answer, and the model might not understand what to do.
* With the instruction, the model gets a guideline as to what task to perform.

In [17]:
INSTRUCTION_TEMPLATE = """\
Please answer the following Stackoverflow question on Python. \
Answer it like you are a developer answering Stackoverflow questions.

Stackoverflow question:
"""

A new column will combine INSTRUCTION_TEMPLATE and the question input_text.
This avoids overwriting an existing column that might be needed later.

In [18]:
stack_overflow_df['input_text_instruct'] = INSTRUCTION_TEMPLATE + ' '\
    + stack_overflow_df['input_text']

### **Dataset for Tuning**

* Divide the data into a training set and an evaluation set. An 80/20 split is used here.
* The 80/20 split leaves most of the data for tuning. The evaluation split is held out as unseen data to evaluate performance during tuning.
* The random_state parameter fixes the random seed so the same split is produced on every run, which keeps comparisons fair.

In [19]:
from sklearn.model_selection import train_test_split

train, evaluation = train_test_split(
    stack_overflow_df,
    test_size=0.2,
    random_state=42
)
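
To see what a seeded 80/20 split amounts to, here is a minimal sketch using only the standard library. It is not scikit-learn's implementation; `split_80_20` is a hypothetical helper, and the integers stand in for dataframe rows.

```python
import random

def split_80_20(records, seed):
    """Shuffle a copy of records with a fixed seed, then cut at 80%."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))
train_a, eval_a = split_80_20(records, seed=42)
train_b, eval_b = split_80_20(records, seed=42)

print(len(train_a), len(eval_a))
# The same seed gives the same split on every run.
print(train_a == train_b and eval_a == eval_b)
```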

### **Different Datasets and Flow**

* Versioning data is important.
* It allows for reproducibility, traceability, and maintainability of machine learning models.
* Get the timestamp.

In [20]:
import datetime
date = datetime.datetime.now().strftime("%H:%d:%m:%Y")

Generate a jsonl file.
Name it tune_data_stack_overflow_python_qa-{date}.jsonl

In [21]:
cols = ['input_text_instruct','output_text']
tune_jsonl = train[cols].to_json(orient="records", lines=True)

In [22]:
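# Keep the name on one line: a backslash continuation inside the string
# would embed the next line's leading spaces in the filename.
training_data_filename = f"tune_data_stack_overflow_python_qa-{date}.jsonl"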

In [23]:
with open(training_data_filename, "w") as f:
    f.write(tune_jsonl)
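
The JSON Lines layout written above is one JSON object per line. As a minimal sketch of that format, the example below builds the same kind of file with the standard library and reads it back; the two records and the temporary file path are made up for illustration.

```python
import json
import os
import tempfile

# Made-up records with the same two columns used for tuning.
records = [
    {"input_text_instruct": "Question 1", "output_text": "Answer 1"},
    {"input_text_instruct": "Question 2", "output_text": "Answer 2"},
]

# One JSON object per line, each line terminated by a newline.
jsonl_text = "\n".join(json.dumps(record) for record in records) + "\n"

path = os.path.join(tempfile.mkdtemp(), "example.jsonl")
with open(path, "w") as f:
    f.write(jsonl_text)

# Reading it back: parse each non-empty line independently.
with open(path) as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)
```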

### **For Evaluation Set**

The code above generated a jsonl file for the training set. Now create the evaluation set, which you can name tune_eval_data_stack_overflow_python_qa-{date}.jsonl.

In [24]:
cols = ['input_text_instruct','output_text']

### you need to use the "evaluation" set now
tune_jsonl = evaluation[cols].to_json(orient="records", lines=True)

### change the file name
evaluation_data_filename = f"tune_eval_data_stack_overflow_python_qa-{date}.jsonl"

### write the file
with open(evaluation_data_filename, "w") as f:
    f.write(tune_jsonl)

# **Part 2: Automation & Orchestration with Pipelines**

### **Setup Kubeflow**

We will use Kubeflow Pipelines, an open source framework, to orchestrate and automate a workflow. It works like a construction kit for building machine learning pipelines, making it easier to orchestrate and automate complex tasks.

In [35]:
try:
  import kfp
  from kfp import local, dsl, compiler
except ImportError:
  !pip install kfp
  import kfp
  from kfp import local, dsl, compiler

In [26]:
# Ignore FutureWarnings in kfp
import warnings
warnings.filterwarnings("ignore",
                        category=FutureWarning,
                        module='kfp.*')

**Kubeflow Pipelines**

* Kubeflow Pipelines is built on two key concepts: components and pipelines.
* Pipeline components are self-contained pieces of code that each perform one step in your ML workflow. For example, the first step could preprocess data and the second step could train a model.

### **Simple Pipeline Example**


In [27]:
### Component 1
@dsl.component
def say_hello(name: str) -> str:
    hello_text = f'Hello, {name}!'
    return hello_text

Since we "wrapped" this say_hello function in the decorator @dsl.component, the function will not actually return a string.
The function will return a PipelineTask object.

In [39]:
local.init(runner=local.SubprocessRunner(use_venv=False))

In [40]:
hello_task = say_hello(name="Erwin")
print(hello_task, hello_task.output)

10:47:03.872 - INFO - Executing task 'say-hello'
10:47:03.874 - INFO - Streamed logs:

    [KFP Executor 2024-10-16 10:47:04,631 INFO]: Looking for component `say_hello` in --component_module_path `/tmp/tmp.vSNvqhsMmT/ephemeral_component.py`
    [KFP Executor 2024-10-16 10:47:04,631 INFO]: Loading KFP component "say_hello" from /tmp/tmp.vSNvqhsMmT/ephemeral_component.py (directory "/tmp/tmp.vSNvqhsMmT" and module name "ephemeral_component")
    [KFP Executor 2024-10-16 10:47:04,632 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "name": "Erwin"
            }
        },
        "outputs": {
            "parameters": {
                "Output": {
                    "outputFile": "/content/local_outputs/say-hello-2024-10-16-10-47-03-870034/say-hello/Output"
                }
            },
            "outputFile": "/content/local_outputs/say-hello-2024-10-16-10-47-03-870034/say-hello/executor_output.json"
        }
    }
   

The object that we'll use to pass the information in hello_text to other components in the pipeline is PipelineTask.output.

Note that when passing values to a dsl.component function, you have to specify the argument names (keyword arguments); positional arguments are not allowed.
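
A toy model can make these two rules easier to remember. The sketch below is not how KFP is implemented; `ToyTask`, `toy_component` and `toy_say_hello` are hypothetical names that only mimic the behavior described here: the decorated function returns a task object, the value lives in its output attribute, and positional arguments are rejected.

```python
class ToyTask:
    """Toy stand-in for a pipeline task; it only stores an output value."""

    def __init__(self, output):
        self.output = output


def toy_component(func):
    """Wrap func so calls accept keyword arguments only and return a ToyTask."""

    def wrapper(*args, **kwargs):
        if args:
            raise TypeError("use keyword arguments, e.g. name='Erwin'")
        return ToyTask(func(**kwargs))

    return wrapper


@toy_component
def toy_say_hello(name: str) -> str:
    return f"Hello, {name}!"


task = toy_say_hello(name="Erwin")
print(type(task).__name__, "|", task.output)

try:
    toy_say_hello("Erwin")  # positional argument
except TypeError as err:
    print("Rejected:", err)
```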

In [41]:
# this will give an error and ask you to specify the parameter name
try:
  hello_task = say_hello("Erwin")
except Exception as e:
  print("Just kidding, sorry :P\nHere's the error: ", e)

Just kidding, sorry :P
Here's the error:  Components must be instantiated using keyword arguments. Positional parameters are not allowed (found 1 such parameters for component "say-hello").


The second component is dependent on the first component.
Take the output of the first component and pass it to the second component.

In [42]:
### Component 2
@dsl.component
def how_are_you(hello_text: str) -> str:
    how_are_you = f"{hello_text}. How are you?"
    return how_are_you

Notice that when we pass in the return value from the say_hello function, we want to pass in the PipelineTask.output object, and not the PipelineTask object itself.

In [43]:
how_task = how_are_you(hello_text=hello_task.output)
print(how_task, how_task.output)

10:47:32.108 - INFO - Executing task 'how-are-you'
10:47:32.111 - INFO - Streamed logs:

    [KFP Executor 2024-10-16 10:47:32,823 INFO]: Looking for component `how_are_you` in --component_module_path `/tmp/tmp.g2VxpQzvKg/ephemeral_component.py`
    [KFP Executor 2024-10-16 10:47:32,823 INFO]: Loading KFP component "how_are_you" from /tmp/tmp.g2VxpQzvKg/ephemeral_component.py (directory "/tmp/tmp.g2VxpQzvKg" and module name "ephemeral_component")
    [KFP Executor 2024-10-16 10:47:32,824 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "hello_text": "Hello, Erwin!"
            }
        },
        "outputs": {
            "parameters": {
                "Output": {
                    "outputFile": "/content/local_outputs/how-are-you-2024-10-16-10-47-32-106764/how-are-you/Output"
                }
            },
            "outputFile": "/content/local_outputs/how-are-you-2024-10-16-10-47-32-106764/how-are-you/executor_outp

In [44]:
# This will give an error and ask you to pass in a built-in data type
try:
  how_task = how_are_you(hello_text=hello_task)
  print(how_task, how_task.output)
except Exception as e:
  print("Just kidding, sorry :P\nHere's the error: ", e)

Just kidding, sorry :P
Here's the error:  Constant argument inputs must be one of type ['String', 'Integer', 'Float', 'Boolean', 'List', 'Dict'] Got: <kfp.dsl.pipeline_task.PipelineTask object at 0x7e2b9df32bf0> of type <class 'kfp.dsl.pipeline_task.PipelineTask'>.


**Define the pipeline**

Notice how the input to say_hello is just recipient, since that is already a built-in data type (a String).
Recall that to pass the value from a PipelineTask object to another pipeline component function, you use PipelineTask.output.
Notice that the pipeline function should return PipelineTask.output as well.

In [45]:
### Pipeline
@dsl.pipeline
def hello_pipeline(recipient: str) -> str:
    hello_task = say_hello(name=recipient)
    how_task = how_are_you(hello_text=hello_task.output)
    return how_task.output

In [46]:
pipeline_output = hello_pipeline(recipient="Erwin")
print(pipeline_output)

10:47:47.628 - INFO - Running pipeline: 'hello-pipeline'
----------------------------------------------------------------------------------------------------
10:47:47.631 - INFO - Executing task 'say-hello'
10:47:47.633 - INFO - Streamed logs:

    [KFP Executor 2024-10-16 10:47:48,332 INFO]: Looking for component `say_hello` in --component_module_path `/tmp/tmp.wBXnKSYlru/ephemeral_component.py`
    [KFP Executor 2024-10-16 10:47:48,332 INFO]: Loading KFP component "say_hello" from /tmp/tmp.wBXnKSYlru/ephemeral_component.py (directory "/tmp/tmp.wBXnKSYlru" and module name "ephemeral_component")
    [KFP Executor 2024-10-16 10:47:48,333 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "name": "Erwin"
            }
        },
        "outputs": {
            "parameters": {
                "Output": {
                    "outputFile": "/content/local_outputs/hello-pipeline-2024-10-16-10-47-47-627716/say-hello/Output"

Note that if you tried to return the PipelineTask object instead of PipelineTask.output, you'd get an error message.

In [51]:
### Pipeline with wrong return value type

# @dsl.pipeline
# def hello_pipeline_with_error(recipient: str) -> str:
#     hello_task = say_hello(name=recipient)
#     how_task = how_are_you(hello_text=hello_task.output)
#     return how_task

# Returning the PipelineTask object itself will give you an error.

**Implement the pipeline**

* A pipeline is a set of components that you orchestrate.
* It lets you define the order of execution and how data flows from one step to another.
* Compile the pipeline into a yaml file, pipeline.yaml.
* You can look at the pipeline.yaml file in your workspace by running the cat command or simply going and opening the file directly.
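
The idea that a pipeline fixes the order of execution from the data dependencies can be sketched with the standard library. This is only an illustration of dependency ordering, not what the KFP compiler does internally; it assumes Python 3.9+ for graphlib, and the task names mirror the two components above.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose output it consumes.
dependencies = {
    "say-hello": set(),
    "how-are-you": {"say-hello"},
}

# A topological order runs every task after the tasks it depends on.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```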

In [52]:
compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')

Define the arguments, the input that goes into the pipeline.

In [71]:
pipeline_arguments = {
    "recipient": "World!",
}

In [69]:
# View the pipeline.yaml
!cat pipeline.yaml

# PIPELINE DEFINITION
# Name: hello-pipeline
# Inputs:
#    recipient: str
# Outputs:
#    Output: str
components:
  comp-how-are-you:
    executorLabel: exec-how-are-you
    inputDefinitions:
      parameters:
        hello_text:
          parameterType: STRING
    outputDefinitions:
      parameters:
        Output:
          parameterType: STRING
  comp-say-hello:
    executorLabel: exec-say-hello
    inputDefinitions:
      parameters:
        name:
          parameterType: STRING
    outputDefinitions:
      parameters:
        Output:
          parameterType: STRING
deploymentSpec:
  executors:
    exec-how-are-you:
      container:
        args:
        - --executor_input
        - '{{$}}'
        - --function_to_execute
        - how_are_you
        command:
        - sh
        - -c
        - "\nif ! [ -x \"$(command -v pip)\" ]; then\n    python3 -m ensurepip ||\
          \ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
   

You can use Vertex AI pipelines, a managed, serverless environment, to execute the yaml files.

In [None]:
from google.cloud.aiplatform import PipelineJob

job = PipelineJob(
        template_path="pipeline.yaml",
        display_name="deep_learning_ai_pipeline",
        parameter_values=pipeline_arguments,
        location="us-central1",
        pipeline_root="./"
)

# submit for execution
job.submit()

# check to see the status of the job
job.state

### **Real-life Pipeline Example**

* Automation and Orchestration of a Supervised Tuning Pipeline.
* Reuse an existing Kubeflow Pipeline for Parameter-Efficient Fine-Tuning (PEFT) for a foundation model from Google, called PaLM 2.
* The advantage of reusing a pipeline is that you do not have to build it from scratch; you only need to specify some of the parameters.

In [57]:
TRAINING_DATA_URI = "./tune_data_stack_overflow_python_qa.jsonl"
EVAUATION_DATA_URI = "./tune_eval_data_stack_overflow_python_qa.jsonl"

* Give the model a version.
* Versioning the model allows for:
  * Reproducibility: Reproduce your results and ensure your models perform as expected.
  * Auditing: Track changes to your models.
  * Rollbacks: Roll back to a previous version of your model.

In [58]:
# path to the pipeline file to reuse
template_path = 'https://us-kfp.pkg.dev/ml-pipeline/large-language-model-pipelines/tune-large-model/v2.0.0'

In [59]:
import datetime
date = datetime.datetime.now().strftime("%H:%d:%m:%Y")

In [60]:
MODEL_NAME = f"deep-learning-ai-model-{date}"

This example uses two **PaLM** model parameters:
1. **TRAINING_STEPS:** Number of training steps to use when tuning the model. For extractive QA you can set it from 100-500.
2. **EVALUATION_INTERVAL:** How frequently the model being tuned is evaluated against the evaluation set to assess its performance and identify issues. The default is 20, which means the model is evaluated on the evaluation dataset after every 20 training steps.
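
As a quick piece of arithmetic, the values used in this notebook (200 training steps, evaluation every 20 steps) imply ten evaluation points. The sketch below assumes an evaluation happens at every multiple of the interval, including the final step; the lowercase names are illustrative and do not touch the notebook's variables.

```python
training_steps = 200
evaluation_interval = 20

# Steps at which an evaluation would run under the stated assumption.
evaluation_steps = list(
    range(evaluation_interval, training_steps + 1, evaluation_interval)
)

print(len(evaluation_steps), evaluation_steps[:3], evaluation_steps[-1])
```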


In [61]:
TRAINING_STEPS = 200
EVALUATION_INTERVAL = 20

In [62]:
# Load the Project ID and credentials
from utils import authenticate
credentials, PROJECT_ID = authenticate()
REGION = "us-central1"

In [63]:
# Define the arguments, the input that goes into the pipeline.
pipeline_arguments = {
    "model_display_name": MODEL_NAME,
    "location": REGION,
    "large_model_reference": "text-bison@001",
    "project": PROJECT_ID,
    "train_steps": TRAINING_STEPS,
    "dataset_uri": TRAINING_DATA_URI,
    "evaluation_interval": EVALUATION_INTERVAL,
    "evaluation_data_uri": EVAUATION_DATA_URI,
}

In [64]:
job = PipelineJob(
        template_path=template_path,
        display_name=f"deep_learning_ai_pipeline-{date}",
        parameter_values=pipeline_arguments,
        location=REGION,
        pipeline_root="./",
        enable_caching=True
)

# submit for execution
job.submit()

# check to see the status of the job
job.state

# **Part 3: Predictions, Prompts and Safety**

### **Setup and Initialization**

In [None]:
# Load the Project ID and credentials.

from utils import authenticate
credentials, PROJECT_ID = authenticate()
REGION = "us-central1"

* Import the Vertex AI SDK.
* Import and load the model.
* Initialize it.

In [None]:
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project = PROJECT_ID,
              location = REGION,
              credentials = credentials)

### **Deployment**
* Load Balancing
* Load from pre-trained text-bison@001
* Retrieve the endpoints (deployed as REST API)

In [None]:
model = TextGenerationModel.from_pretrained("text-bison@001")

* Get the list of tuned models that have been trained and deployed.
* This helps to route the traffic to different endpoints.


In [None]:
list_tuned_models = model.list_tuned_model_names()
for i in list_tuned_models:
    print (i)

Randomly select from one of the endpoints to divide the prediction load.

In [None]:
import random
tuned_model_select = random.choice(list_tuned_models)
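
To get a feel for how random selection spreads requests, the sketch below simulates 3,000 picks over three made-up endpoint names. The names and the fixed seed are assumptions for illustration only; with uniform random choice each endpoint should receive roughly a third of the traffic.

```python
import random
from collections import Counter

# Hypothetical endpoint names standing in for list_tuned_models.
endpoints = ["tuned-model-a", "tuned-model-b", "tuned-model-c"]

rng = random.Random(0)  # fixed seed so the simulation is repeatable
counts = Counter(rng.choice(endpoints) for _ in range(3000))

for name in endpoints:
    print(name, counts[name])
```

Random choice balances load only on average; it does not account for endpoint health or current queue length.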

### **Getting the Response**
* Load the endpoint of the randomly selected model to be called with a prompt.
* The prompt needs to be as similar as possible to the ones the model was tuned on (Python questions from the Stack Overflow dataset).


In [None]:
deployed_model = TextGenerationModel.get_tuned_model(tuned_model_select)

Use deployed_model.predict to call the API using the prompt.

In [None]:
PROMPT = "How can I load a csv file using Pandas?"
response = deployed_model.predict(PROMPT)
print(response)

pprint makes the response easily readable.

Sending multiple prompts can return multiple responses ([0], [1], [2], ...).
In this example, only one prompt is sent, so only one response ([0]) is returned.

In [None]:
from pprint import pprint

In [None]:
# load the first object of the response
output1 = response._prediction_response[0]

# load the second object of the response
output2 = response._prediction_response[0][0]

# retrieve the "content" key from the second object
final_output = response._prediction_response[0][0]["content"]

# printing "content" key from the second object
print(final_output)

### **Prompt Management and Templates**
* Remember that the model was trained on data that had an Instruction and a Question as a Prompt.
* In the example above, only a Question as a Prompt was used for a response.
* It is important for the production data to match the training data. Differences between them can affect model performance.
* Add the same Instruction that was used for the training data, and combine it with a Question to form the Prompt.
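
One way to keep training and production prompts aligned is to build both with the same helper. The sketch below is a suggestion rather than part of the original workflow; `build_prompt` and `INSTRUCTION_FOR_DEMO` are hypothetical names, and the instruction text is copied from the template used in Part 1.

```python
INSTRUCTION_FOR_DEMO = (
    "Please answer the following Stackoverflow question on Python. "
    "Answer it like you are a developer answering Stackoverflow questions.\n\n"
    "Stackoverflow question:\n"
)


def build_prompt(question: str, instruction: str = INSTRUCTION_FOR_DEMO) -> str:
    """Join instruction and question the same way the training column was built."""
    return instruction + " " + question


# The same call is used when preparing training rows and when serving.
training_input = build_prompt("How do I reverse a list?")
serving_prompt = build_prompt("How do I reverse a list?")

print(training_input == serving_prompt)
```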


In [None]:
# A space before each line-continuation backslash keeps the words from running together.
INSTRUCTION = """\
Please answer the following Stackoverflow question on Python. \
Answer it like \
you are a developer answering Stackoverflow questions. \
Question:
"""

QUESTION = "How can I store my TensorFlow checkpoint on \
Google Cloud Storage? Python example?"

In [None]:
# Combine the instruction and the question to create the prompt.

PROMPT = f"""
{INSTRUCTION} {QUESTION}
"""

print(PROMPT)

In [None]:
# Get the response using the new prompt, which is consistent with the prompt used for training.

final_response = deployed_model.predict(PROMPT)
output = final_response._prediction_response[0][0]["content"]
print(output)   # Note how the response changed from earlier.

### **Safety Attributes**
* The response also includes safety scores.
* These scores can be used to make sure that the LLM's response is within the boundaries of the expected behaviour.
* The first layer of this check, blocked, is applied by the model itself.

In [None]:
blocked = response._prediction_response[0][0]['safetyAttributes']['blocked']
print(blocked)

* The second layer of this check can be defined by you, as a practitioner, according to the thresholds you set.
* The response returns probabilities for each safety score category which can be used to design the thresholds.

In [None]:
# retrieve the "safetyAttributes" of the response
safety_attributes = response._prediction_response[0][0]['safetyAttributes']
pprint(safety_attributes)
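
A second-layer check could look something like the sketch below. It is only an illustration: the `blocked` key comes from the response used above, but the parallel `categories` and `scores` lists are an assumption about the layout, so compare them with the pprint output before relying on this. The helper name, the example values and the 0.5 threshold are all made up.

```python
def categories_over_threshold(safety_attributes, threshold):
    """Return the categories whose score is above threshold."""
    pairs = zip(
        safety_attributes.get("categories", []),
        safety_attributes.get("scores", []),
    )
    return [category for category, score in pairs if score > threshold]


# Made-up example; the "categories" and "scores" keys are assumed, not confirmed.
example_attributes = {
    "blocked": False,
    "categories": ["Health", "Violent"],
    "scores": [0.1, 0.7],
}

flagged = categories_over_threshold(example_attributes, threshold=0.5)
print(flagged)
```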

### **Citations**
* Ideally, an LLM should generate as much original content as possible.
* The citationMetadata can be used to check for, and reduce the chances of, an LLM reproducing a lot of existing content.

In [None]:
citation = response._prediction_response[0][0]['citationMetadata']['citations']
pprint(citation)

In [None]:
PROMPT = "Finish the sentence: To be, or not "
response = deployed_model.predict(PROMPT)

In [None]:
# output of the model
output = response._prediction_response[0][0]["content"]
print(output)

In [None]:
# check for citation
citation = response._prediction_response[0][0]['citationMetadata']['citations']
pprint(citation)