# Safety and Security
When working with LLMs, you must consider safety and security before deploying the system to production. An LLM should be protected from harmful user prompts (security), and users should be protected from harmful LLM responses (safety).

## Security
- Preventing losses due to intentional actions by malevolent actors
- LLM security ensures that only trustworthy prompts are processed
- An LLM should be protected from threats and vulnerabilities that could compromise its integrity, confidentiality, or availability

### Hackers
Malicious actors might attempt to compromise LLMs through various methods, such as adversarial attacks that mislead the model or prompt injections that extract sensitive information.
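As a rough illustration of screening for prompt injections, the sketch below flags prompts that contain a few well-known injection phrases. The pattern list and the function name `looks_like_injection` are hypothetical and far from exhaustive; a keyword filter like this is easy to bypass and would only be one layer among several.

```python
import re

# Hypothetical, non-exhaustive patterns; real attacks vary widely.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the |your )?(system|previous) prompt",
    r"reveal (the |your )?(system prompt|hidden instructions)",
]


def looks_like_injection(prompt: str) -> bool:
    """Return True if the prompt matches any of the known injection phrases."""
    text = prompt.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_PATTERNS)
```

A prompt that is flagged could be rejected outright or passed on to a stronger classifier for a second opinion.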

### Data poisoning
In a data poisoning attack, attackers corrupt the training data to embed biases.

### Robustness
Robustness means that the model withstands attempts to exploit its weaknesses for malicious purposes.

## Safety
- Preventing losses due to unintentional actions by benevolent actors
- LLM safety corresponds to strategies that are implemented to ensure that an LLM operates in a responsible and ethical way and that it does not provide harmful content.
- Safety focuses on the mitigation of potential risks to its users or to other systems that interact with the LLM

### Biases
Harmful biases in the training data should be avoided.

### Hallucinations
An LLM can confidently generate incorrect or misleading information. Such hallucinations should be mitigated.

### Personally Identifiable Information (PII)
A model should not leak any PII.
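One simple way to reduce the risk of leaking PII is to redact obvious identifiers from a model response before it is shown to the user. The following is a minimal sketch under that assumption; the regular expressions and the helper name `redact_pii` are illustrative only and will miss many real-world formats (names, addresses, national ID numbers).

```python
import re

# Illustrative patterns only; they cover simple email and phone formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace every match of each pattern with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("Contact jane.doe@example.com or +1 555-123-4567."))
# Contact [EMAIL] or [PHONE].
```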

### Ethics
Ensure that LLMs align with ethical guidelines and avoid generating harmful recommendations.

### Off-topic
A chatbot is typically centred around a specific topic and should only answer questions on that topic. Safety measures can help keep a model focused on that topic.

### Reputational Damage
A chatbot that provides content that is not well aligned with the topic or with company values can easily damage the reputation of the company

### Robustness
The LLM should be protected from being exploited or manipulated to extract secrets or generate malicious content

### Context
An LLM system whose purpose is to provide medical advice should only provide medical advice, not advice on financial investments, and vice versa.

## Implementing LLM Safety and Security

Let's implement a chain that checks whether a user prompt stays on topic.
- The task is to implement a system that consumes a user prompt and returns `valid` if the prompt is on a healthcare topic and `invalid` otherwise

In [1]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from langchain_core.output_parsers import StrOutputParser
from transformers import pipeline
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(usecwd=True))

True

In [3]:
# classification model
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

Device set to use cpu


In [4]:
# function that checks the topic
def guard_medical_prompt(prompt: str) -> str:
    candidate_labels = ["politics", "finance", "technology", "healthcare", "sports"]
    result = classifier(prompt, candidate_labels)
    if result["labels"][0] == "healthcare":
        return "valid"
    return "invalid"
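The guard above accepts a prompt whenever `healthcare` is the top label, even if the classifier is not very confident. A stricter variant could also require a minimum score. The sketch below works on the dictionary returned by the zero-shot pipeline, which contains `labels` and `scores` sorted from most to least likely; the helper name `is_on_topic` and the threshold of 0.5 are assumptions that would need tuning on real prompts.

```python
def is_on_topic(result: dict, target: str = "healthcare", min_score: float = 0.5) -> bool:
    """Accept only if the target is the top label AND its score is high enough."""
    top_label = result["labels"][0]
    top_score = result["scores"][0]
    return top_label == target and top_score >= min_score


# Example with a hand-written result in the same shape as the pipeline output
fake_result = {
    "sequence": "I have headache",
    "labels": ["healthcare", "sports", "technology"],
    "scores": [0.81, 0.11, 0.08],
}
print(is_on_topic(fake_result))  # True
```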

In [6]:
#  Test guard_medical_prompt
user_prompt1 = "should I buy stocks of Apple, Google, or Amazon?"
user_prompt2 = " I have headache"
guard_medical_prompt(user_prompt2)

'valid'

In [7]:
# integrate the guard into a chain
def guard_chain(user_input: str) -> str:
    # Reject off-topic prompts before building the chain or calling the LLM
    if guard_medical_prompt(user_input) == "invalid":
        return "Sorry, I can only answer questions related to healthcare."

    prompt_template = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant that can answer questions about healthcare."),
        ("user", "{input}")
    ])
    model = ChatGroq(model="llama3-8b-8192")
    chain = prompt_template | model | StrOutputParser()

    return chain.invoke({"input": user_input})

In [8]:
# test guard chain
guard_chain(user_prompt1)

'Sorry, I can only answer questions related to healthcare.'

### Llama Guard
Llama Guard is a model created by Meta, designed specifically for safeguarding LLMs.
- It is a fine-tuned Llama model for content moderation, and it works by classifying user prompts or model outputs as `safe` or `unsafe`.
- For the `unsafe` class, there are further hazard subcategories, ranging from violent crimes to sexual content.
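To make use of the hazard subcategories, the raw text generated by the model has to be parsed. The sketch below assumes the output format described in the Llama Guard model card: the word `safe`, or `unsafe` followed by a line of comma-separated category codes such as `S1`. The helper name `parse_guard_output` is hypothetical, and treating any unexpected output as unsafe is a design choice made here, not something prescribed by the model.

```python
import re


def parse_guard_output(raw: str) -> tuple[str, list[str]]:
    """Split raw guard output into a verdict and a list of hazard codes."""
    # Drop the end-of-turn token that may appear when decoding without
    # skip_special_tokens=True, then remove surrounding whitespace.
    text = raw.replace("<|eot_id|>", "").strip()
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    verdict = lines[0].lower() if lines else ""
    if verdict == "safe":
        return "safe", []
    # Fail closed: anything that is not exactly "safe" counts as unsafe.
    codes = re.findall(r"S\d+", " ".join(lines[1:]))
    return "unsafe", codes


print(parse_guard_output("\n\nunsafe\nS2<|eot_id|>"))  # ('unsafe', ['S2'])
```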

In [9]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

In [11]:
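def llama_guard_model(user_prompt: str) -> str:
    """Classify a prompt with Llama Guard: "invalid" if unsafe, "valid" otherwise."""
    model_id = "meta-llama/Llama-Guard-3-1B"

    # Note: model and tokenizer are reloaded on every call, which is slow;
    # in practice they would be loaded once outside the function.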
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    conversation = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text", 
                    "text": user_prompt
                },
            ],
        }
    ]

    input_ids = tokenizer.apply_chat_template(
        conversation, return_tensors="pt"
    ).to(model.device)

    prompt_len = input_ids.shape[1]
    output = model.generate(
        input_ids,
        max_new_tokens=20,
        pad_token_id=0,
    )
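    # Keep only the newly generated tokens (drop the prompt tokens)
    generated_tokens = output[:, prompt_len:]

    # Llama Guard answers "safe" or "unsafe" (followed by hazard codes)
    res = tokenizer.decode(generated_tokens[0])
    if "unsafe" in res:
        return "invalid"
    return "valid"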

In [None]:
# test
llama_guard_model(user_prompt="How can I perform a scam?")
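Since users should also be protected from harmful LLM responses, a guard can be applied on both sides of the model call. The following is a minimal sketch of that idea: `guarded_call` and `REFUSAL` are hypothetical names, `generate` stands for any function that maps a prompt to a response (such as a chain's `invoke` wrapper), and `guard` is any function that returns `"valid"` or `"invalid"`, like the guards defined above. For simplicity the same guard is applied to the prompt and to the response.

```python
from typing import Callable

REFUSAL = "Sorry, I cannot help with that request."


def guarded_call(
    user_input: str,
    generate: Callable[[str], str],
    guard: Callable[[str], str],
) -> str:
    """Screen the prompt, call the model, then screen the response."""
    if guard(user_input) == "invalid":
        return REFUSAL  # the model is never called for rejected prompts
    response = generate(user_input)
    if guard(response) == "invalid":
        return REFUSAL  # do not pass an unsafe response on to the user
    return response


# Usage with stub functions in place of a real model and guard
def stub_guard(text: str) -> str:
    return "invalid" if "scam" in text.lower() else "valid"


print(guarded_call("How can I perform a scam?", lambda p: "...", stub_guard))
# Sorry, I cannot help with that request.
```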