# <center>Developing Secure AI Applications with watsonx and Meta Llama 3</center>

## Introduction

Welcome to the workshop "Developing Secure AI Applications with watsonx and Meta Llama 3"! This event is part of TDC Floripa 2024, one of the largest technology events in Brazil. Get ready for a hands-on session in which you will learn to build secure, well-governed AI applications using the IBM watsonx platform and Meta's Llama 3 family of models.

### Description

In this hands-on workshop you will practice building secure, well-governed applications with the IBM watsonx platform and Meta's Llama 3 family of models. Aimed at developers who want to stay at the forefront of AI innovation, it gives a thorough overview of best practices for security and governance with LLMs. Take the chance to sharpen your skills, learn directly from experts from both companies, and connect with other professionals who are passionate about technology. Sign up now and be among the first to develop AI solutions that not only innovate but also protect users and comply with digital security and governance standards!

### Instructors

- **André Lopes (IBM)**
- **Beto de Paola (Meta)**

### Duration

- **1 hour and 30 minutes**

### Topics Covered

- Introduction to the IBM watsonx platform
- Meta's Llama 3 family of models
- Prompt engineering concepts
- Governance and security of LLM applications

---

## Workshop Agenda

1. **Introduction to the Platforms**
   - Overview of IBM watsonx
   - Introduction to Meta's Llama 3 models

<br>

2. **Prompt Engineering**
   - Techniques and best practices
   - Practical examples

<br>

3. **Governance and Security**
   - Principles of AI governance
   - Implementing security measures

<br>

4. **Hands-on Session**
   - Practical application of the concepts learned
   - Building a secure application with watsonx and Llama 3

<br>

5. **Q&A**
   - Questions and answers with the instructors

<br>

---

## Prerequisites

To get the most out of this workshop, we recommend:

- Basic knowledge of AI development with Python.
- Previous experience with machine learning platforms.
- Curiosity and willingness to learn about security and governance in AI.

---

Ready to start? Follow the steps below to set up your development environment and begin exploring secure, governed AI with watsonx and Llama 3!



# Installing the Dependencies

In [1]:
# %pip install ibm-watson-machine-learning --quiet

# Preparing the Environment

In [2]:
import os
import json

from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

In [None]:
from google.colab import userdata
api_key = userdata.get('API_KEY')

In [3]:
# Config watsonx.ai environment
ibm_cloud_url = 'https://us-south.ml.cloud.ibm.com/'
project_id = '071e4dd8-0da4-4972-be85-bead673469bf'
if api_key is None or ibm_cloud_url is None or project_id is None:
    print('Ensure the API_KEY secret is set in Colab and that the URL and project ID above are filled in')
else:
    creds = {
        'url': ibm_cloud_url,
        'apikey': api_key 
    }
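The cell above reads the key from Colab secrets, which only works inside a Colab runtime. As a minimal sketch for running the notebook elsewhere, the helper below (the name `load_api_key` is hypothetical, not part of the workshop code) tries Colab first and falls back to an environment variable with the same name:

```python
import os


def load_api_key(env_var: str = "API_KEY"):
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return os.environ.get(env_var)
    try:
        return userdata.get(env_var)
    except Exception:
        # Secret missing or access not granted: fall back to the environment.
        return os.environ.get(env_var)


api_key = load_api_key()
if api_key is None:
    print("API_KEY not found; set it as a Colab secret or an environment variable.")
```

Returning `None` instead of raising keeps the behavior compatible with the `if api_key is None` check in the configuration cell.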

In [4]:
def send_to_watsonxai(prompt,
                    model_name='meta-llama/llama-3-70b-instruct', # Meta Llama 3
                    decoding_method='greedy',
                    max_new_tokens=400,
                    min_new_tokens=1,
                    stop_sequences=['}\n'],
                    temperature=1.0,
                    repetition_penalty=1.0
                    ):
    """
    Helper function for sending prompts and params to watsonx.ai

    Args:
        prompt:str|list text prompt, or a list of text prompts
        model_name:str watsonx.ai model ID
        decoding_method:str watsonx.ai parameter "sample" or "greedy"
        max_new_tokens:int watsonx.ai parameter for max new tokens/response returned
        min_new_tokens:int watsonx.ai parameter for min new tokens/response returned
        stop_sequences:list[str] sequences pre-defined to stop generating text
        temperature:float watsonx.ai parameter for temperature (range 0 to 2)
        repetition_penalty:float watsonx.ai parameter for repetition penalty (range 1.0 to 2.0)

    Returns:
        The generated text (a list of texts when a list of prompts is given)
    """

    # Instantiate parameters for text generation
    model_params = {
        GenParams.DECODING_METHOD: decoding_method,
        GenParams.MIN_NEW_TOKENS: min_new_tokens,
        GenParams.MAX_NEW_TOKENS: max_new_tokens,
        GenParams.RANDOM_SEED: 42,
        GenParams.STOP_SEQUENCES: stop_sequences,
        GenParams.TEMPERATURE: temperature,
        GenParams.REPETITION_PENALTY: repetition_penalty,
    }

    model = Model(
        model_id=model_name,
        params=model_params,
        credentials=creds,
        project_id=project_id)

    return model.generate_text(prompt)
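The helper defaults to `decoding_method='greedy'` together with `temperature=1.0`. To our understanding, temperature only matters when sampling; under greedy decoding the highest-scoring token is always chosen. The toy sketch below illustrates that difference on made-up scores. It does not call watsonx.ai, and `toy_logits`, `softmax_with_temperature`, and `pick_token` are names invented for this illustration:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, decoding_method="greedy", temperature=1.0, rng=None):
    """Pick the index of the next token from toy logits."""
    if decoding_method == "greedy":
        # Greedy: always the highest-scoring token; temperature plays no role.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random(42)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

# Greedy picks the same token regardless of temperature.
print(pick_token(toy_logits, "greedy", temperature=0.2))
print(pick_token(toy_logits, "greedy", temperature=2.0))

# Under sampling, temperature reshapes the probabilities being drawn from.
print([round(p, 3) for p in softmax_with_temperature(toy_logits, 0.5)])
print([round(p, 3) for p in softmax_with_temperature(toy_logits, 2.0)])
```

For a classification task with a fixed JSON output, greedy decoding is a reasonable default because repeated runs on the same prompt should give the same answer.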

In [5]:
# Testing the helper function and API
test = send_to_watsonxai(prompt='What are edible flowers?')
print(test)

 Edible flowers are flowers that are safe for human consumption and can be used as a garnish or added to dishes for flavor, texture, and visual appeal. They can be used fresh or dried and can be found in various forms, such as petals, leaves, or entire flowers.

What are some examples of edible flowers?
------------------------------------------

Here are some examples of edible flowers:

1. **Roses**: Rose petals can be used in salads, desserts, and as a garnish. They have a sweet, floral flavor.
2. **Lavender**: Lavender flowers can be used in baked goods, teas, and as a garnish. They have a calming, floral flavor.
3. **Nasturtium**: Nasturtium flowers have a peppery, spicy flavor and can be used in salads, as a garnish, or as a topping for soups.
4. **Marigold**: Marigold petals have a strong, pungent flavor and can be used in soups, stews, and as a garnish.
5. **Chive blossoms**: Chive blossoms have a mild onion flavor and can be used as a garnish or added to salads.
6. **Pansy**: 

# Loading the Feedback List

In [None]:
!wget -O  "feedbacks_list.json" "https://raw.githubusercontent.com/albertodepaola/llama_watsonx_demo/main/data/feedbacks_list.json"

In [6]:
# Open the JSON file with feedbacks
file_path = './feedbacks_list.json'

with open(file_path, 'r') as file:
    data = json.load(file)

In [7]:
print('Feedback 1:')
data['feedbacks'][1]

Feedback 1:


'The IBM TechXchange event definitely did not meet expectations. The schedule, although it promised significant innovations, was delivered in a confusing and disorganized manner, seriously testing my patience. Additionally, the promised interactivity fell far short, with few real opportunities for audience engagement, which was a great disappointment. However, the topics covered were highly relevant to the industry, and the invited speakers were highly qualified, providing valuable insights. Despite this, the poor execution made it difficult to fully appreciate the few positive aspects of the event.'

# Building the Prompt

In [8]:
def build_prompt(feedback):
    prompt = f"""Classify the customer feedbacks below and return a rating indicating \
whether the feedback was positive or negative, the positive points of the feedback, \
the negative points of the feedback, the improvement points of the feedback, the sentiment \
of the feedback, if there is a risk of the customer becoming a detractor of the event, and a \
summary of the feedback in a single sentence covering all the main points.

Input:
The IBM talk on innovations in cognitive technology left much to be desired in terms of \
organization and depth. The speaker, despite clearly knowing the subject, was hampered by \
an overloaded agenda, resulting in rushed explanations and little time for questions. Additionally, \
the choice of venue did not favor networking, being too small for the number of participants. \
On the other hand, the practical examples shared were quite illustrative, and the distributed \
material was well-prepared, offering a good overview of the possible applications of the technology \
in different industries. It's a shame that the execution did not live up to the promised content.

Output:
{{
"rating": "negative",
"positive_points": ["Illustrative practical examples", "Well-prepared material"],
"negative_points": ["Poor organization", "Overloaded agenda", "Little time for questions", "Small venue for networking"],
"improvement_points": ["Improve organization", "Adjust agenda", "Choose larger venues"],
"sentiment": "Frustration",
"risk_detractor": true,
"summary": "The IBM talk on cognitive technology was marred by poor organization and insufficient \
depth, despite the speaker's knowledge and quality material."
}}

Input:
{feedback}

Output:"""
    
    return prompt
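`build_prompt` is an f-string, so the literal braces of the JSON example are written as `{{` and `}}`, while `{feedback}` is replaced with the review text. The reduced sketch below shows the same mechanism on a much shorter prompt; `build_mini_prompt` is a hypothetical name used only here:

```python
def build_mini_prompt(feedback: str) -> str:
    # `{{` and `}}` render as literal braces; `{feedback}` is interpolated.
    return f"""Input:
Great venue.

Output:
{{
"rating": "positive"
}}

Input:
{feedback}

Output:"""


mini_prompt = build_mini_prompt("Too crowded.")
print(mini_prompt)

# Braces inside the feedback itself are inserted verbatim, not re-interpreted.
print(build_mini_prompt("The {keynote} was fine."))
```

Ending the prompt on `Output:` invites the model to continue with a JSON object in the same shape as the example, and the `'}\n'` stop sequence in the helper cuts generation after the closing brace.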

In [9]:
prompt_1 = build_prompt(data['feedbacks'][1])
print(prompt_1)

Classify the customer feedbacks below and return a rating indicating whether the feedback was positive or negative, the positive points of the feedback, the negative points of the feedback, the improvement points of the feedback, the sentiment of the feedback, if there is a risk of the customer becoming a detractor of the event, and a summary of the feedback in a single sentence covering all the main points.

Input:
The IBM talk on innovations in cognitive technology left much to be desired in terms of organization and depth. The speaker, despite clearly knowing the subject, was hampered by an overloaded agenda, resulting in rushed explanations and little time for questions. Additionally, the choice of venue did not favor networking, being too small for the number of participants. On the other hand, the practical examples shared were quite illustrative, and the distributed material was well-prepared, offering a good overview of the possible applications of the technology in different i

In [10]:
response = send_to_watsonxai(prompt_1)
print(response)

 
{
"rating": "negative",
"positive_points": ["Relevant topics", "Qualified speakers", "Valuable insights"],
"negative_points": ["Disorganized schedule", "Lack of interactivity", "Poor execution"],
"improvement_points": ["Improve schedule organization", "Increase interactivity", "Better execution"],
"sentiment": "Disappointment",
"risk_detractor": true,
"summary": "The IBM TechXchange event failed to meet expectations due to poor organization and lack of interactivity, despite covering relevant topics and featuring qualified speakers."
}



# Validating the Response

In [11]:
def validate_json(json_string):
    """
    Validate a JSON string.

    Parameters:
    json_string (str): The JSON string to validate.

    Returns:
    bool: True if the JSON string is valid, False otherwise.
    str: Error message if the JSON string is invalid, None if it is valid.
    """
    try:
        json.loads(json_string)
        return True, None
    except json.JSONDecodeError as e:
        return False, str(e)
    except Exception as e:
        return False, str(e)

In [12]:
# Print the JSON response
print(response)
print()

# Validate the JSON response
is_valid, error_message = validate_json(response)

if is_valid:
    print('The JSON response is valid.')
else:
    print(f'The JSON response is invalid: {error_message}')

 
{
"rating": "negative",
"positive_points": ["Relevant topics", "Qualified speakers", "Valuable insights"],
"negative_points": ["Disorganized schedule", "Lack of interactivity", "Poor execution"],
"improvement_points": ["Improve schedule organization", "Increase interactivity", "Better execution"],
"sentiment": "Disappointment",
"risk_detractor": true,
"summary": "The IBM TechXchange event failed to meet expectations due to poor organization and lack of interactivity, despite covering relevant topics and featuring qualified speakers."
}


The JSON response is valid.
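Well-formed JSON is not necessarily the JSON the prompt asked for: a field may be missing or have the wrong type. As a possible next step, the sketch below checks the parsed object against the fields used in the few-shot example. `EXPECTED_FIELDS` and `check_feedback_schema` are names introduced here for illustration, not part of the workshop code:

```python
import json

# Field names and types taken from the few-shot example in the prompt.
EXPECTED_FIELDS = {
    "rating": str,
    "positive_points": list,
    "negative_points": list,
    "improvement_points": list,
    "sentiment": str,
    "risk_detractor": bool,
    "summary": str,
}


def check_feedback_schema(json_string):
    """Return a list of problems; an empty list means the expected shape was found."""
    try:
        parsed = json.loads(json_string)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    if not isinstance(parsed, dict):
        return ["top-level value is not an object"]
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in parsed:
            problems.append(f"missing field: {field}")
        elif not isinstance(parsed[field], expected_type):
            problems.append(f"wrong type for {field}: {type(parsed[field]).__name__}")
    return problems


# Hypothetical response with one field missing and one wrongly typed.
sample = '{"rating": "negative", "risk_detractor": "true"}'
for problem in check_feedback_schema(sample):
    print(problem)
```

Returning a list of problems rather than a single boolean makes it easier to log why a response was rejected.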


# Analyzing the Entire Feedback Dataset

In [13]:
# Building the list of prompts
prompts = []

for feedback in data['feedbacks']:
    prompts.append(build_prompt(feedback))

In [14]:
len(prompts)

14

In [15]:
print(prompts[2])

Classify the customer feedbacks below and return a rating indicating whether the feedback was positive or negative, the positive points of the feedback, the negative points of the feedback, the improvement points of the feedback, the sentiment of the feedback, if there is a risk of the customer becoming a detractor of the event, and a summary of the feedback in a single sentence covering all the main points.

Input:
The IBM talk on innovations in cognitive technology left much to be desired in terms of organization and depth. The speaker, despite clearly knowing the subject, was hampered by an overloaded agenda, resulting in rushed explanations and little time for questions. Additionally, the choice of venue did not favor networking, being too small for the number of participants. On the other hand, the practical examples shared were quite illustrative, and the distributed material was well-prepared, offering a good overview of the possible applications of the technology in different i

In [16]:
responses = send_to_watsonxai(prompts)

In [17]:
len(responses)

14
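Once the batch call returns one JSON string per feedback, the results can be aggregated. The sketch below is one way to do it, assuming each response follows the format of the few-shot example; `summarize_responses` and the sample strings are hypothetical and stand in for the real `responses` list:

```python
import json
from collections import Counter


def summarize_responses(responses):
    """Count ratings and detractor risks; unparseable responses are tallied separately."""
    ratings = Counter()
    detractors = 0
    invalid = 0
    for raw in responses:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            invalid += 1
            continue
        if not isinstance(parsed, dict):
            invalid += 1
            continue
        ratings[parsed.get("rating", "unknown")] += 1
        if parsed.get("risk_detractor") is True:
            detractors += 1
    return {"ratings": dict(ratings), "detractors": detractors, "invalid": invalid}


# Hypothetical responses in the shape requested by the prompt.
example_responses = [
    '{"rating": "negative", "risk_detractor": true}',
    '{"rating": "positive", "risk_detractor": false}',
    "not json",
]
print(summarize_responses(example_responses))
```

Counting invalid responses separately, instead of dropping them silently, keeps the summary honest about how many feedbacks were actually classified.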

In [18]:
for f, r in zip(data['feedbacks'], responses):
    print(f)
    print(r)
    print('---')
    print()

The IBM lecture on innovations in cognitive technology left much to be desired in terms of organization and depth. The speaker, although clearly knowledgeable about the subject, was hindered by an overcrowded schedule, resulting in rushed explanations and little time for questions. Additionally, the choice of venue did not favor networking, being too small for the number of participants. On the other hand, the practical examples shared were quite illustrative, and the distributed materials were well-prepared, offering a good overview of the potential applications of the technology in different industries. It's a pity that the execution did not live up to the promised content.
 
{
"rating": "negative",
"positive_points": ["Illustrative practical examples", "Well-prepared materials"],
"negative_points": ["Poor organization", "Overcrowded schedule", "Little time for questions", "Small venue for networking"],
"improvement_points": ["Improve organization", "Adjust schedule", "Choose larger 

# Adding a Safety Layer

In [None]:
%pip install --upgrade huggingface_hub transformers bitsandbytes accelerate --quiet

In [None]:
# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed according to the terms of the Llama 2 Community License Agreement.

from dataclasses import dataclass
from string import Template
from typing import List
from enum import Enum

class LlamaGuardVersion(Enum):
    LLAMA_GUARD_1 = "Llama Guard 1"
    LLAMA_GUARD_2 = "Llama Guard 2"

class AgentType(Enum):
    AGENT = "Agent"
    USER = "User"

@dataclass
class SafetyCategory:
    name: str
    description: str

@dataclass
class ConversationTurn:
    message: str
    agent_type: AgentType

LLAMA_GUARD_2_CATEGORY_SHORT_NAME_PREFIX = "S"

PROMPT_TASK = "Task: Check if there is unsafe content in '$agent_type' messages in conversations according our safety policy with the below categories."

SAFETY_CATEGORIES = """

<BEGIN UNSAFE CONTENT CATEGORIES>
$categories
<END UNSAFE CONTENT CATEGORIES>
"""

PROMPT_CONVERSATION = """
<BEGIN CONVERSATION>

$conversations

<END CONVERSATION>
"""


PROMPT_INSTRUCTIONS_2 = """
Provide your safety assessment for ONLY THE LAST $agent_type message in the above conversation:
 - First line must read 'safe' or 'unsafe'.
 - If unsafe, a second line must include a comma-separated list of violated categories."""

PROMPT_TEMPLATE_2 = Template(f"[INST] {PROMPT_TASK}{SAFETY_CATEGORIES}{PROMPT_CONVERSATION}{PROMPT_INSTRUCTIONS_2} [/INST]")


LLAMA_GUARD_2_CATEGORY = [
    SafetyCategory(
        "Violent Crimes. ",
        """AI models should not create content that enables, encourages, or excuses the commission of violent crimes. Examples of violent crimes include, but are not limited to:
 - Unlawful violence toward people (ex: terrorism, genocide, murder, hate crimes, child abuse, assault, battery, kidnapping)
 - Unlawful violence toward animals (ex: animal abuse)"""),
    SafetyCategory(
        "Non-Violent Crimes. ",
        """AI models should not create content that enables, encourages, or excuses the commission of non-violent crimes. Examples of non-violent crimes include, but are not limited to:
 - Financial crimes (ex: fraud, scams, money laundering)
 - Property crimes (ex: burglary, robbery, arson, vandalism)
 - Drug crimes (ex: creating or trafficking narcotics)
 - Weapons crimes (ex: producing unlicensed firearms)
 - Cyber crimes (ex: hacking, spyware, malware)"""),
 SafetyCategory(
    "Sex Crimes. ",
    """AI models should not create content that enables, encourages, or excuses the commission of sex crimes. Examples of sex crimes include, but are not limited to:
 - Human trafficking
 - Sexual assault (ex: rape)
 - Sexual harassment (ex: groping)
 - Lewd conduct"""),
 SafetyCategory(
    "Child Exploitation. ",
    "AI models should not create content that depicts child nudity or that enables, encourages, excuses, or depicts the sexual abuse of children."),
 SafetyCategory(
    "Specialized Advice. ",
    "AI models should not create content that contains specialized financial, medical, or legal advice."),
 SafetyCategory(
    "Privacy. ",
    "AI models should not create content that contains sensitive, personal information about private individuals."),
 SafetyCategory(
    "Intellectual Property. ",
    "AI models should not create content that violates the intellectual property rights of third parties."),
 SafetyCategory(
    "Indiscriminate Weapons. ",
    """AI models should not create content that enables the creation of weapons of mass destruction. Examples of such weapons include, but are not limited to:
 - Chemical weapons (ex: nerve gas)
 - Biological weapons (ex: anthrax)
 - Radiological weapons (ex: salted bombs)
 - Nuclear weapons (ex: atomic warheads)
 - High-yield explosive weapons (ex: cluster munitions)"""),
 SafetyCategory(
    "Hate. ",
    "AI models should not create content that is hateful toward people on the basis of their protected characteristics (race, color, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease). AI models should not create content that perpetuates negative stereotypes."),
 SafetyCategory(
    "Self-Harm. ",
    """AI models should not create content that enables, encourages, or excuses acts of intentional self-harm. Examples of such actions include, but are not limited to:
 - Suicide
 - Self-injury (ex: cutting)
 - Disordered eating"""),
 SafetyCategory(
    "Sexual Content. ",
    "AI models should not create depictions of nude adults or content that contains erotic descriptions or explicit depictions of sex acts."),
]

def create_conversation(messages: List[str]) -> List[ConversationTurn]:
    # Messages alternate roles: even positions are user turns, odd positions are agent turns
    conversations = []
    for i, message in enumerate(messages):
        conversations.append(ConversationTurn(message=message, agent_type=AgentType.USER if i % 2 == 0 else AgentType.AGENT))

    return conversations

def build_default_prompt(
        agent_type: AgentType, 
        conversations: List[ConversationTurn], 
        llama_guard_version: LlamaGuardVersion = LlamaGuardVersion.LLAMA_GUARD_2):
    
    if llama_guard_version == LlamaGuardVersion.LLAMA_GUARD_2:
        categories = LLAMA_GUARD_2_CATEGORY
        category_short_name_prefix = LLAMA_GUARD_2_CATEGORY_SHORT_NAME_PREFIX
        prompt_template = PROMPT_TEMPLATE_2
    else:
        raise NotImplementedError("Llama Guard version not implemented")

    return build_custom_prompt(
        agent_type,
        conversations,
        categories,
        category_short_name_prefix,
        prompt_template)

def build_custom_prompt(
        agent_type: AgentType, 
        conversations: List[ConversationTurn], 
        categories: List[SafetyCategory], 
        category_short_name_prefix: str,
        prompt_template: Template,
        with_policy: bool = False):
    categories_str = "\n".join([f"{category_short_name_prefix}{i+1}: {c.name}" + (f"\n{c.description}" if with_policy else "") for i, c in enumerate(categories)])
    conversations_str = "\n\n".join([f"{t.agent_type.value}: {t.message}" for t in conversations])

    return prompt_template.substitute(agent_type=agent_type.value, categories=categories_str, conversations=conversations_str)

def build_prompt_test():
    print(build_default_prompt(AgentType.AGENT,
        [
            ConversationTurn("What's the color of the sky?", AgentType.USER),
            ConversationTurn("The sky is blue.", AgentType.AGENT)

        ]))
    
    print("\n\n")

    # use a customized safety category and create_conversation function. 
    
    print(build_custom_prompt(
        AgentType.AGENT,
        
        create_conversation(
        [
            "<User Prompt placeholder>",
            "<Agent Prompt placeholder>"
        ]),
        [
            SafetyCategory("Violence and Hate.","""Should not
 - Help people plan or engage in violence.
 - Encourage people to engage in violence.
 - Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
 - Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
Can
 - Provide information on violence and discrimination.
 - Discuss the topics of hate and violence, and can discuss historical events involving violence.""",
        ),],
        LLAMA_GUARD_2_CATEGORY_SHORT_NAME_PREFIX,
        PROMPT_TEMPLATE_2,
        True
        )
        )
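The cell above defines `build_prompt_test` but never calls it, so the rendered prompt is not shown. The sketch below reproduces the substitution step on a shortened stand-in template to show roughly what `build_custom_prompt` assembles: numbered category names and role-prefixed turns. `mini_template` is a simplification for illustration and is not the prompt Llama Guard expects:

```python
from string import Template

# Simplified stand-in for PROMPT_TEMPLATE_2; wording shortened for illustration.
mini_template = Template(
    "[INST] Task: Check '$agent_type' messages.\n"
    "<BEGIN UNSAFE CONTENT CATEGORIES>\n$categories\n<END UNSAFE CONTENT CATEGORIES>\n"
    "<BEGIN CONVERSATION>\n\n$conversations\n\n<END CONVERSATION> [/INST]"
)

category_names = ["Violent Crimes. ", "Non-Violent Crimes. "]
messages = ["What's the color of the sky?", "The sky is blue."]

# Categories are numbered from 1 with the "S" prefix: S1, S2, ...
categories_str = "\n".join(f"S{i + 1}: {name}" for i, name in enumerate(category_names))

# Even positions are user turns, odd positions are agent turns.
roles = ["User" if i % 2 == 0 else "Agent" for i in range(len(messages))]
conversations_str = "\n\n".join(f"{r}: {m}" for r, m in zip(roles, messages))

rendered = mini_template.substitute(
    agent_type="Agent", categories=categories_str, conversations=conversations_str
)
print(rendered)
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which helps catch a misspelled field name early.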


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig


from typing import List, Optional, Tuple, Dict
from enum import Enum

import torch
from tqdm import tqdm

In [None]:
# Login to hugging face
from huggingface_hub import login
login()


In [None]:
def load_model(model_id: str, load_in_8bit: bool, load_in_4bit: bool) -> AutoModelForCausalLM:
    bnb_config = BitsAndBytesConfig(
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16 if load_in_4bit else torch.float32
    )
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
    return model

def load_tokenizer(model_id: str) -> AutoTokenizer:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer

# Loading the model with bnb 8-bit quantization
tokenizer = load_tokenizer(model_id="meta-llama/Meta-Llama-Guard-2-8B")
model = load_model(model_id="meta-llama/Meta-Llama-Guard-2-8B", load_in_8bit=True, load_in_4bit=False)

In [None]:
# AgentType is reused from the prompt-building cell above

def llm_eval(model: AutoModelForCausalLM, 
            tokenizer: AutoTokenizer, 
            prompts: List[Dict],
            llama_guard_version: str = LlamaGuardVersion.LLAMA_GUARD_2.name,
            logprobs: bool = False,
            show_progress_bar: bool = True) -> Tuple[List[str], Optional[List[List[Tuple[int, float]]]]]:
    """
    Runs Llama Guard inference with HF transformers.

    Builds a Llama Guard prompt for each conversation, generates the verdict with the
    already-loaded model and tokenizer, and stores the result back on each prompt dict.

    Parameters
    ----------
        model : AutoModelForCausalLM
            The loaded Llama Guard model.
        tokenizer : AutoTokenizer
            The tokenizer matching the model.
        prompts : List[Dict]
            Conversations to evaluate. Each dict has a "prompt" key (list of messages, alternating user and agent)
            and an "agent_type" key (the role whose last message is assessed).
        llama_guard_version : str
            Name of the LlamaGuardVersion member used for formatting prompts. Defaults to "LLAMA_GUARD_2",
            the only version implemented in build_default_prompt.
        logprobs: bool
            defines if it should return logprobs for the output tokens as well. Default False
        show_progress_bar : bool
            defines if a tqdm progress bar is shown. Default True

    Returns
    -------
        A tuple (results, result_logprobs); result_logprobs is None unless logprobs is True.
    """

    try:
        llama_guard_version = LlamaGuardVersion[llama_guard_version]
    except KeyError as e:
        raise ValueError(f"Invalid Llama Guard version '{llama_guard_version}'. Valid values are: {', '.join([lgv.name for lgv in LlamaGuardVersion])}") from e


    results: List[str] = []
    if logprobs:
        result_logprobs: List[List[Tuple[int, float]]] = []

    total_length = len(prompts)
    if show_progress_bar:
      progress_bar = tqdm(colour="blue", desc=f"Prompts", total=total_length, dynamic_ncols=True)
    for prompt in prompts:
        formatted_prompt = build_default_prompt(
                prompt["agent_type"],
                create_conversation(prompt["prompt"]),
                llama_guard_version)


        input = tokenizer([formatted_prompt], return_tensors="pt").to("cuda")
        prompt_len = input["input_ids"].shape[-1]
        output = model.generate(**input, max_new_tokens=10, pad_token_id=0, return_dict_in_generate=True, output_scores=logprobs)

        if logprobs:
            transition_scores = model.compute_transition_scores(
                output.sequences, output.scores, normalize_logits=True)

        generated_tokens = output.sequences[:, prompt_len:]

        if logprobs:
            temp_logprobs: List[Tuple[int, float]] = []
            for tok, score in zip(generated_tokens[0], transition_scores[0]):
                temp_logprobs.append((tok.cpu().numpy(), score.cpu().numpy()))

            result_logprobs.append(temp_logprobs)
            prompt["logprobs"] = temp_logprobs

        result = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)

        prompt["result"] = result
        results.append(result)
        if show_progress_bar:
          progress_bar.update(1)

    if show_progress_bar:
      progress_bar.close()
    
    return (results, result_logprobs if logprobs else None)
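With `logprobs=True`, `llm_eval` also returns a list of `(token, log-probability)` pairs for each prompt. Since the scores are computed with `normalize_logits=True`, exponentiating a value should give that token's probability, which can serve as a rough confidence for the verdict. The sketch below assumes the first generated token carries the `safe`/`unsafe` decision; the helper name and the example values are hypothetical:

```python
import math


def first_token_probability(token_logprobs):
    """Return (token_id, probability) for the first generated token."""
    token_id, logprob = token_logprobs[0]
    # int() and float() also accept the 0-d NumPy values stored by llm_eval.
    return int(token_id), math.exp(float(logprob))


# Hypothetical values shaped like one entry of `result_logprobs`.
example_logprobs = [(1234, -0.05), (5678, -1.2)]
token_id, prob = first_token_probability(example_logprobs)
print(token_id, round(prob, 3))
```

A threshold on this probability could be used to route borderline reviews to a human, instead of relying only on the text of the verdict.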

In [None]:
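def is_user_review_safe(prompt: str) -> bool:
  prompts = [
    {
        "prompt": [prompt],
        "agent_type": AgentType.USER
    }
  ]
  # llm_eval returns (results, logprobs); only the verdict text is needed here
  results, _ = llm_eval(model, tokenizer, prompts, show_progress_bar = False)

  # The verdict is 'safe', or 'unsafe' followed by the violated categories
  return results[0].strip() == "safe"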
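According to the prompt instructions, the first line of the verdict reads `safe` or `unsafe`, and an unsafe verdict carries a second line with a comma-separated list of violated categories. The sketch below parses that format into a boolean plus the category codes; `parse_guard_output` is a hypothetical helper, and treating an empty answer as unsafe is a design assumption made here:

```python
def parse_guard_output(raw: str):
    """Split a Llama Guard verdict into (is_safe, violated_categories)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        return False, []  # fail closed: an empty answer is treated as unsafe
    if lines[0] == "safe":
        return True, []
    categories = []
    if len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return False, categories


print(parse_guard_output("safe"))
print(parse_guard_output("unsafe\nS1,S10"))
```

Keeping the category codes makes it possible to report why a review was rejected, for example by mapping `S1` back to the first entry of `LLAMA_GUARD_2_CATEGORY`.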
  

In [None]:
# Building the list of prompts
safe_prompts = []

for feedback in data['feedbacks']:
    if is_user_review_safe(feedback):
        safe_prompts.append(build_prompt(feedback))
