# Prompting

- Prompt Basics
- Prompt Patterns
- Prompt Caching

ref. https://console.groq.com/docs/prompting

Most LLM providers document how to use their models, and that documentation often includes prompt engineering guidance:

https://platform.openai.com/docs/guides/prompt-engineering \
https://www.llama.com/docs/how-to-guides/prompting/ \
https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview \
https://console.groq.com/docs/prompting

This notebook covers some basic prompt engineering techniques. \
Please also go through the les2_2 and les2_3 notebooks, which develop the theoretical concepts further and show examples.

## Prompt Basics
Prompting is the methodology through which we communicate instructions, parameters, and expectations to large language models. Think of a prompt as a detailed specification document provided to the model: the more precise and comprehensive the specifications, the higher the quality of the output. This guide establishes the fundamental principles for crafting effective prompts for open-source instruction-tuned models, including Llama, DeepSeek, and Gemma.

### Why Prompts Matter
Large language models require clear direction to produce optimal results. Without precise instructions, they may produce inconsistent outputs. Well-structured prompts provide several benefits:

- **Reduce development time** by minimizing iterations needed for acceptable results.
- **Enhance output consistency** to ensure responses meet validation requirements without modification.
- **Optimize resource usage** by maintaining efficient context window utilization.

### Prompt building blocks
Most high-quality prompts contain five elements: 
- role, 
- input, 
- instructions, 
- expected output and examples, 
- constraints.

| Element                      | What it does                                                |
|:-----------------------------|:------------------------------------------------------------|
| Role (+ context)             | Sets persona or expertise ("You are a data analyst...")     |
| Input data (+ context)       | The data or question to transform                           |
| Instructions                 | List of required actions                                    |
| Expected output/examples     | Schema or miniature example to lock formatting              |
| Constraints                  | Limits on format, length, or content ("Only output a table") |


#### 1. Role 
The role sets persona or expertise:
- "You are a data analyst…",
- "You are an expert JavaScript developer",

Some context can be provided with the role:
- "You are a helpful assistant that recommends books based on user preferences",
- "You are a professor who evaluates bachelor's thesis proposals",
- ...

#### 2. Input data

Input data is the data that describes what the prompt is about: 
- the text that needs to be summarized, 
- the bachelor's thesis that needs to be evaluated, 
- the piece of code that needs to be explained,
- ...

Some context, such as background knowledge or reference material, can be added.

#### 3. Instructions

The instructions describe what the LLM should do:
- "Give 5 recommendations for new books",
- "Describe all spelling and grammatical errors, provide two suggestions for improvement to the student",
- "Explain what this piece of code does",
- ...

#### 4. Expected output - examples
Sometimes it is useful to provide examples of input and output, especially when a certain structure is expected (for example, JSON data with a specific format).

In this context, we speak of 'zero-shot prompting' (when no example is provided) and 'few-shot prompting' (when examples are included in the prompt).
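The difference between zero-shot and few-shot prompting can be sketched in a few lines of Python. This is only an illustration: the helper `few_shot_prompt` and the `Input:` / `Output:` layout are assumptions made for this sketch, not a format required by any model or API.

```python
import json


def few_shot_prompt(instruction, examples, query):
    """Build one prompt string: instruction, worked examples, then the real input.

    With an empty `examples` list this is a zero-shot prompt.
    """
    lines = [instruction, ""]
    for text, expected in examples:
        lines.append(f"Input: {text}")
        # json.dumps shows the model the exact output format we expect back
        lines.append(f"Output: {json.dumps(expected)}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


instruction = (
    "Extract the book title and author from the text. "
    "Answer with JSON using the keys title and author."
)
examples = [
    ("I loved The Martian by Andy Weir.",
     {"title": "The Martian", "author": "Andy Weir"}),
]
query = "Predictably Irrational by Dan Ariely changed how I think."

zero_shot = few_shot_prompt(instruction, [], query)       # no examples
one_shot = few_shot_prompt(instruction, examples, query)  # one example

print(one_shot)
```

Ending the prompt with an open `Output:` line nudges the model to continue in the same format as the examples.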

#### 5. Constraints

Some constraints can be added:
- "Only output a table",
- "Generate max. 500 tokens",
- "If you don't know the answer, do not make anything up, just say you don't know."
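As a minimal sketch, the five building blocks above can be kept as separate fields and joined into one prompt. The `PromptParts` class, the `build_prompt` function, and the section labels are hypothetical names chosen for this illustration; any layout that keeps the blocks clearly separated would work.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container for the five prompt building blocks."""
    role: str
    input_data: str
    instructions: list[str]
    expected_output: str = ""
    constraints: list[str] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty building blocks into one prompt string."""
    sections = [parts.role, f"Input:\n{parts.input_data}"]
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(parts.instructions, start=1)
    )
    sections.append(f"Instructions:\n{numbered}")
    if parts.expected_output:
        sections.append(f"Example output:\n{parts.expected_output}")
    if parts.constraints:
        bullets = "\n".join(f"- {c}" for c in parts.constraints)
        sections.append(f"Constraints:\n{bullets}")
    # A blank line between sections keeps the blocks visually distinct
    return "\n\n".join(sections)


prompt = build_prompt(PromptParts(
    role="You are a helpful assistant that recommends books based on user preferences.",
    input_data="Favorite books: Do Androids Dream of Electric Sheep by Philip K. Dick.",
    instructions=["Give 5 recommendations for new books."],
    expected_output="The Martian by Andy Weir, because you enjoy science fiction.",
    constraints=["Do not recommend books from the example."],
))
print(prompt)
```

Keeping the blocks as separate fields makes it easy to change one of them (for example, the constraints) without touching the rest of the prompt.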

**Example 1**

![prompt_engineering_example_nl](/notebooks/img/prompt_engineering_example_nl.png)

**Example 2** 

You are an AI assistant that creates personalized book recommendations based on user preferences.

User: Man, 46 years old, enjoys science fiction and non-fiction, and reads in Dutch and English.
Favorite books: Predictably Irrational by Dan Ariely, Do Androids Dream of Electric Sheep by Philip K. Dick, and A Random Walk Down Wall Street by Burton Malkiel.

Can you come up with five recommendations for this user?

Example output: The Martian by Andy Weir, because you enjoy science fiction.

Make sure the recommendations align with the genres the user likes and are similar to his favorite books. Do not recommend books from the example. Be friendly and make the user eager to start reading the books you suggest.

**Exercise**
In the example above, identify the prompt building blocks / prompt techniques that have been used.

### Role channels

Most chat-style APIs expose three channels:

| Channel   | Typical Use                                                                           |
|:----------|:--------------------------------------------------------------------------------------|
| system    | High-level persona & non-negotiable rules ("You are a helpful financial assistant."). |
| user      | The actual request or data, such as a user's message in a chat.                       |
| assistant | The model's response. In multi-turn conversations, the assistant role can be used to track the conversation history. |


The following example demonstrates how to implement a customer service chatbot using role channels. Role channels provide a structured way for the model to maintain context and generate contextually appropriate responses throughout the conversation.

```python
from groq import Groq

# Groq() reads the API key from the GROQ_API_KEY environment variable.
client = Groq()

system_prompt = """
You are a helpful IT support chatbot for 'Tech Solutions'.
Your role is to assist employees with common IT issues, provide guidance on using company software, and help troubleshoot basic technical problems.
Respond clearly and patiently. If an issue is complex, explain that you will create a support ticket for a human technician.
Keep responses brief and ask a maximum of one question at a time.
"""

chat_completion = client.chat.completions.create(
    messages=[
        # system: persona and rules that hold for the whole conversation
        {
            "role": "system",
            "content": system_prompt,
        },
        # earlier turns, replayed so the model sees the conversation history
        {
            "role": "user",
            "content": "My monitor isn't turning on.",
        },
        {
            "role": "assistant",
            "content": "Let's try to troubleshoot. Is the monitor properly plugged into a power source?",
        },
        # the newest user message, which the model answers
        {
            "role": "user",
            "content": "Yes, it's plugged in."
        }
    ],
    model="llama-3.3-70b-versatile",
)

print(chat_completion.choices[0].message.content)
```

Output:

```text
Next, I'll check the connection to the computer. Is the monitor cable (VGA, HDMI, or DisplayPort) securely connected to both the monitor and the computer?
```
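In a real chatbot the history is not written out by hand but grows with every turn. The sketch below shows one possible way to do that. The helper names `ask`, `groq_complete`, and `canned_complete` are hypothetical; `canned_complete` is an offline stand-in so the example runs without an API key, and `groq_complete` assumes the `groq` package and a `GROQ_API_KEY` environment variable.

```python
def ask(history, user_message, complete):
    """Send one user turn and record both the request and the reply in history.

    `complete` is any callable that maps a message list to the reply text.
    """
    history.append({"role": "user", "content": user_message})
    reply = complete(history)
    # Storing the reply under the assistant role is what gives the model
    # access to the earlier conversation on the next turn.
    history.append({"role": "assistant", "content": reply})
    return reply


def groq_complete(messages, model="llama-3.3-70b-versatile"):
    """Real backend (assumes the groq package and GROQ_API_KEY are available)."""
    from groq import Groq  # imported lazily so the sketch runs without the package
    response = Groq().chat.completions.create(messages=messages, model=model)
    return response.choices[0].message.content


def canned_complete(messages):
    """Offline stand-in that reports how many messages the model would see."""
    return f"(reply after seeing {len(messages)} messages)"


history = [
    {"role": "system",
     "content": "You are a helpful IT support chatbot for 'Tech Solutions'."},
]
ask(history, "My monitor isn't turning on.", canned_complete)
ask(history, "Yes, it's plugged in.", canned_complete)

for message in history:
    print(f"{message['role']:>9}: {message['content']}")
```

Passing `groq_complete` instead of `canned_complete` sends the same growing history to the API. Note that every turn resends the full history, so long conversations consume more of the context window.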


## Prompt patterns

- In-context learning: zero-shot inference, ref. les2_2_prompt_engineering_openai.ipynb, "3. examples"
- In-context learning: one-shot or few-shot inference, ref. les2_2_prompt_engineering_openai.ipynb, "3. examples"
- Chain-of-thought reasoning, ref. les2_3_prompt_engineering_llama.ipynb, "Chain of thought prompting"

### Zero shot inference

Zero-shot prompting provides instructions without examples. The model leans on the general-purpose knowledge it absorbed during pre-training to infer the right output.
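A zero-shot request might look like the sketch below: the system message carries the instruction and the allowed answers, and no worked examples are included. The label set and the helpers `zero_shot_messages` and `parse_label` are assumptions made for this illustration. Because the model has no example to copy the format from, it is worth checking the reply before using it.

```python
LABELS = ("positive", "negative", "neutral")


def zero_shot_messages(text):
    """Instruction only, no examples: the model relies on what it already knows."""
    instruction = (
        "Classify the sentiment of the user's text. "
        f"Answer with exactly one word: {', '.join(LABELS)}."
    )
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": text},
    ]


def parse_label(reply):
    """Return the label if the reply is one of the allowed answers, else None."""
    label = reply.strip().strip(".").lower()
    return label if label in LABELS else None


messages = zero_shot_messages("The book arrived late and the cover was torn.")
print(messages[0]["content"])

# The message list can be passed to a chat completion call as shown earlier;
# here we only check how two possible replies would be handled.
print(parse_label(" Negative.\n"))
print(parse_label("I think the customer is unhappy."))
```

If `parse_label` returns `None` too often, adding one or two examples (few-shot prompting) is a common next step.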

## Prompt caching

(Sustainability related?)

Prompt caching is a technique used for optimizing both performance and cost.

The idea behind it: if you send a large prompt (for example, a long system message, context, or background document), you shouldn’t have to pay the full price for it on every API call.

Groq implements prompt caching as follows:

Model prompts often contain repetitive content, such as system prompts and tool definitions. Prompt caching automatically reuses computation from recent requests when they share a common prefix, delivering significant cost savings and improved response times while maintaining data privacy through volatile-only storage that expires automatically.

Prompt caching works automatically on all your API requests with no code changes required and no additional fees.

### How It Works

- **Prefix matching:** When you send a request, the system looks for matching prefixes among recently processed requests that are stored temporarily in volatile memory. Prefixes can include system prompts, tool definitions, few-shot examples, and more.
- **Cache hit:** If a matching prefix is found, the cached computation is reused, which reduces latency and cuts token costs by 50% for the cached portion.
- **Cache miss:** If no match exists, your prompt is processed normally, and its prefix is cached temporarily for potential future matches.
- **Automatic expiration:** All cached data expires automatically within a few hours, which helps ensure privacy while maintaining the benefits.
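Because matching works on prefixes, the order of the content in a prompt matters: static parts (system prompt, tool definitions, few-shot examples) should come first and the changing parts last. The sketch below illustrates this and estimates the effect of the 50% discount. It is a simplification under stated assumptions: whitespace-separated words stand in for real tokens, and the helpers `shared_prefix_length` and `billed_input_tokens` are hypothetical, not part of the Groq API.

```python
def shared_prefix_length(a, b):
    """Number of leading items two token sequences have in common."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def billed_input_tokens(prompt_tokens, cached_tokens, discount=0.5):
    """Effective input tokens when cached tokens are billed at a discount."""
    return (prompt_tokens - cached_tokens) + cached_tokens * (1 - discount)


system_prompt = "You are a helpful IT support chatbot for Tech Solutions."

# Static content first: the whole system prompt is a reusable prefix.
good = shared_prefix_length(
    (system_prompt + " Monitor stays dark.").split(),
    (system_prompt + " Keyboard is stuck.").split(),
)

# Dynamic content first: the requests differ from the first word on.
bad = shared_prefix_length(
    ("Monitor stays dark. " + system_prompt).split(),
    ("Keyboard is stuck. " + system_prompt).split(),
)

print(good, bad)  # 10 0

# A 1000-token prompt of which 800 tokens hit the cache:
# 200 tokens at full price + 800 tokens at half price.
print(billed_input_tokens(1000, 800))  # 600.0
```

The same reasoning applies to multi-turn conversations: appending new messages at the end keeps the earlier history intact as a prefix, while editing or reordering earlier messages breaks the match.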