# Week 2 Day 2: LLM Security Considerations, Accessing LLMs Programmatically

## Agenda
- OWASP Top 10 Security Vulnerabilities for LLMs
- Accessing HuggingFace model via Inference API
- A very, very simple RAG!

### [OWASP top 10 vulnerabilities for applications utilizing LLM](https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-v05.pdf)

1. Prompt Injections
2. Insecure Output Handling
3. Training Data Poisoning
4. Denial of Service
5. Supply Chain
6. Permission Issues
7. Data Leakage
8. Excessive Agency
9. Overreliance
10. Insecure Plugins

#### Simple steps to take to enhance your LLM applications' security (according to OWASP and others)
- apply the principle of least privileges to LLMs as well
- sanitize and scrub user inputs (of any malicious content, but also PII you don't want entering system)
- validate llm outputs (treat it as if it's any other user)
- cap resource use per request

## Programmatic Access to LLM
HuggingFace's Inference API offers a very simple and easy to use interface for interacting with models hosted in their infrastructure. All you need to do is make an HTTP request!

In [18]:
# Let's make an http request to an llm
API_URL = "api_url"
headers = {"Authorization": "Bearer api_key"}


In [29]:
import requests

simple_payload = {
    "inputs": "My name is Juniper. What's my name?"
}

def query_llm(payload):
    response = requests.post(API_URL, json=payload, headers=headers)
    print(response.json())

query_llm(simple_payload)

[{'generated_text': "\n\nI'm a girl. What's my name?\n\nI'm a"}]


### [Zephyr-7b-b](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
Is by default an LLM with text generation task. so when we provide a simple input as above, it will automatically continue generating after the input token.

But we can change that behavior!

#### Message Roles
Typically, there are 3 roles: system, user, assistant.
- System: Global LLM wide instruction applied to every prompt. This is often used to give an LLM a persona.
- User: user input
- Assistant: Assistant output

Zephyr prefers following format when defining roles with messages:
<|system|>
</s>
<|user|>
</s>
<|assistant|>

In [28]:

input_template = """
    <|system|>
    You're a chatbot. Have a conversation with a user
    </s>
    <|user|>
    My name is Juniper. What's my name?
    </s>
    <|assistant|>
"""
role_message = {
    "inputs": input_template
}

query_llm(role_message)

[{'generated_text': 'Bot: Hello, Juniper! Your name is Juniper. Is there anything else I can help'}]


### Adding Options to our request
Text Generation tasks accepts the optional parameters outlined in [this documentation](https://huggingface.co/docs/api-inference/detailed_parameters#text-generation-task)

In [40]:
configured_request = {
    "parameters": {
        "max_new_tokens": 50,
        "temperature": 1
    },
    "inputs": """
    <|system|>
    You are a cat chatbot that responds with the word Meow! 
    </s>
    <|user|>
    hello?
    </s>
    <|assistant|>
"""
}
query_llm(configured_request)

[{'generated_text': "Meow! \n\n(Note: As an AI language model, I do not have the ability to hear or respond to audio input. However, if you're talking to a cat chatbot that responds with the word Meow!"}]


In [43]:
import file
# a very, very, very simple RAG
rag = {
    "parameters": {
        "max_new_tokens": 50,
        "temperature": 1
    },
    "inputs": """
    <|system|>
    You are a helpful assistant. With the context provided, answer the user's question to the best of your ability. 
    </s>
    <|user|>
    Context: {context}
    Question: {user_question}
    </s>
    <|assistant|>
"""
}
with open('context.text') as f:
    context = f.read()


query_llm(rag)

[{'generated_text': 'Based on the provided context, it can be inferred that ABC enjoys playing the violin as they are currently a member of a local orchestra. However, without further information, it is unclear if they have other interests or hobbies'}]
