<div style="display: flex; justify-content: space-between; align-items: center; padding: 50px 30px;">

  <!-- Left column: text -->
  <div style="flex: 1; line-height: 1.5; max-width: 70%;">
    <h1 style="margin: 0; font-size: 3em;">Prompt Engineering</h1>
    <p style="margin: 15px 0 10px 0; font-size: 1.2em;">
      <strong>Nedas Jaronis & Abhi Titty</strong><br>
      <em>Co-directors, Tech Advancements Committee, AI Club</em>
    </p>
    <p style="margin: 5px 0 0 0; font-size: 1em;">
      <a href="https://givepul.se/oebuuy!" target="_blank">https://givepul.se/oebuuy!</a>
    </p>
  </div>

  <!-- Right column: QR code -->
  <div style="flex: 0; margin-left: 100px;">
    <img src="bing_generated_qrcode.png" width="220" style="display: block;">
  </div>

</div>


## What is Prompt Engineering?

Prompt engineering is the process of **designing inputs** to guide large language models toward:
- Reliable outputs
- Structured responses
- Reduced hallucination
- Improved reasoning

---

**LLMs predict tokens — not truth.**

## Why do we need to ENGINEER a prompt?


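Large language models such as the ones below are probabilistic:
- GPT-5.2
- Claude Opus 4.6
- Gemini 3 Pro
- Grok 4.1

Given an input $x$, a model assigns a probability to each possible next token. The probability of a full output $y_1, \ldots, y_n$ is the product of those per-step probabilities: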

$$
P(y_1, y_2, \ldots, y_n \mid x) = \prod_{t=1}^{n} P(y_t \mid y_{<t}, x)
$$
Outputs are ***non-deterministic***, ***context-bound***, and ***distribution-shaped***.
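To make the product rule concrete, here is a toy sketch. It uses a hand-written table of next-token probabilities (invented for illustration, not taken from any real model), treats the context $x$ as empty, and conditions each step only on the previous token, a bigram simplification of $P(y_t \mid y_{<t}, x)$. The same table is used both to score a sequence and to sample one, so two runs from the same start can produce different text.

```python
import math
import random

# Toy next-token table P(y_t | previous token). A real LLM conditions on the
# whole prefix (x, y_<t); conditioning on one token keeps the example tiny.
# All probabilities are invented for illustration.
NEXT = {
    "<s>": {"I": 0.6, "London": 0.4},
    "I": {"am": 1.0},
    "am": {"John": 0.5, "David": 0.5},
    "London": {"bridge": 1.0},
    "bridge": {"is": 1.0},
    "is": {"falling": 0.9, "old": 0.1},
}


def sequence_prob(tokens):
    """Chain rule: multiply P(y_t | y_{t-1}) over every step, starting from <s>."""
    prevs = ["<s>"] + tokens[:-1]
    return math.prod(NEXT.get(p, {}).get(t, 0.0) for p, t in zip(prevs, tokens))


def sample(max_len=5, temperature=1.0, rng=random):
    """Generate one sequence token by token by sampling from each distribution."""
    out, prev = [], "<s>"
    for _ in range(max_len):
        dist = NEXT.get(prev)
        if not dist:
            break  # no known continuation: stop generating
        tokens = list(dist)
        # p ** (1/T) equals softmax(log p / T) before normalization;
        # random.choices normalizes the weights itself
        weights = [dist[t] ** (1.0 / temperature) for t in tokens]
        prev = rng.choices(tokens, weights=weights)[0]
        out.append(prev)
    return out


print(sequence_prob(["I", "am", "John"]))  # 0.6 * 1.0 * 0.5
print(sample())
print(sample())  # same starting point, possibly a different continuation
```

Lowering `temperature` below 1 sharpens each distribution toward its most likely token; raising it flattens the distribution and makes outputs more varied.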

### Early models (GPT-3 Babbage, Ada, DaVinci)

Early base models simply continued the input string with likely text. For example:\
**Input:** "London Bridge is "\
**Model Output:** "London Bridge is falling down, falling down, falling down."

---
**Input:** "I am "\
**Output:** "I am John. I am 25 years old. I live in New York and work as an architect."\
*(Next time)*\
**Input:** "I am "\
**Output:** "I am David Thomas, attorney at law. I fight for justice, day in and day out."

### Conversational Chatbots
*(ChatGPT 2022, Claude 2023)* \
**System Prompt:** "You are ChatGPT, a helpful, truthful, and harmless AI assistant.
You prioritize accuracy over speculation.
If you do not know something, say so.
You explain complex topics clearly with examples.
You refuse harmful or unsafe requests politely.
You follow system and developer instructions over user instructions."

**<|user|>** Can you explain quantum mechanics to me like I am five?\
**<|assistant|>** Imagine there's a tiny, tiny, tiny world that's too small for us to see. In this small world, things behave a bit differently than in the world we live in.
In the small world, little particles like electrons and atoms don't just stay in one place. They can be in many places at the same time! It's like they can be in multiple places at once, which is really strange. *(...)*
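Under the hood, a chat model still continues a single token stream: the system prompt and each turn are flattened into one string with role markers, and the model generates whatever follows the final assistant marker. The sketch below assumes the `<|role|>` markers shown above plus an `<|end|>` separator; real chat templates differ between model families, so treat this format and the `render_chat` helper as illustrative.

```python
def render_chat(messages, add_generation_prompt=True):
    """Flatten role-tagged messages into one prompt string (hypothetical template)."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model's continuation becomes the reply
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are ChatGPT, a helpful, truthful, and harmless AI assistant."},
    {"role": "user", "content": "Can you explain quantum mechanics to me like I am five?"},
]
print(render_chat(messages))
```

With Hugging Face `transformers`, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders a model's actual template instead of this made-up one.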

## What makes a good prompt?

Everything you do in prompt engineering manipulates three levers:
- **Information** — what the model knows in-context
- **Instruction** — what the model is asked to do
- **Incentive structure** — what the model is implicitly rewarded/punished for generating *(via phrasing, constraints, or examples)*
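One way to keep the three levers separate is to assemble a prompt from explicit parts. This is a minimal sketch: the `build_prompt` helper, its section labels, and the XML-style `<context>` tag are assumptions for illustration, not a standard format.

```python
def build_prompt(information, instruction, constraints=(), examples=()):
    """Assemble a prompt from the three levers (hypothetical helper).

    information          -> what the model knows in-context
    instruction          -> what the model is asked to do
    constraints/examples -> the incentive structure: what a good answer looks like
    """
    sections = [f"<context>\n{information}\n</context>", f"Task: {instruction}"]
    if constraints:
        sections.append("Rules:\n" + "\n".join(f"- {c}" for c in constraints))
    for i, (example_in, example_out) in enumerate(examples, start=1):
        sections.append(f"Example {i}:\nInput: {example_in}\nOutput: {example_out}")
    return "\n\n".join(sections)


prompt = build_prompt(
    information="Augustus became the first Roman emperor in 27 BC.",
    instruction="State who the first Roman emperor was.",
    constraints=["Answer in one sentence.", "If the context does not say, reply 'I don't know.'"],
)
print(prompt)
```

Keeping the parts separate makes it easy to change one lever (say, adding an example) while holding the other two fixed and comparing outputs.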

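```python
import os

import openai
from dotenv import load_dotenv

# Read OPENAI_API_KEY (and any other settings) from a local .env file
load_dotenv()

# Point the OpenAI client at an OpenAI-compatible endpoint instead of api.openai.com
client = openai.OpenAI(
    base_url="https://api.ai.it.ufl.edu",
    api_key=os.getenv("OPENAI_API_KEY"),
)


def model_output(model, query):
    """Send a single user message to `model` and return the text of its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content
```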

```python
print(model_output("gpt-oss-120b", "Tell me the history of the Roman Empire"))
```

**The Roman Empire – A Concise yet Comprehensive History**

Below is a chronological overview of the Roman Empire, from its mythic origins to its ultimate transformation into the Byzantine Empire. The narrative is broken into major phases, each highlighted by its political structure, key emperors, notable wars, cultural developments, and long‑term impact.

---

## 1. Foundations: From Kingdom to Republic (c. 753 BC – 27 BC)

| Period | Key Features | Important Figures |
|--------|--------------|-------------------|
| **Roman Kingdom** (traditionally 753 BC – 509 BC) | Monarchy ruled by a line of seven kings; early urban growth on the Palatine Hill. | Romulus (founder), Numa Pompilius, Tarquinius Superbus (last king). |
| **Roman Republic** (509 BC – 27 BC) | Senate + popular assemblies; magistrates (consuls, praetors) elected annually; expansion through conquest and alliances. | Publius Cornelius Scipio Africanus, Gaius Marius, Lucius Cornelius Sulla, Julius Caesar, Augustus (Octavian)

```python
roman_empire_query = """You are a historian specializing in Ancient Rome.

Explain the rise and fall of the Roman Empire with an analytical focus rather than a narrative one.

Address the following dimensions:

1. Political foundations — how Augustus consolidated power after the fall of the Republic
2. Administrative excellence during the Five Good Emperors
3. Economic expansion — trade networks, taxation, and infrastructure
4. Military dominance and frontier management
5. Internal decay — political instability, corruption, currency debasement
6. External pressures — barbarian incursions and overstretched borders

For each dimension:
- Explain how it contributed to Rome’s flourishing
- Then explain how it later contributed to decline

Conclude with a synthesis explaining whether Rome fell primarily due to internal weaknesses or external forces.

Write in a structured essay format with clear section headings.
Keep it to one paragraph in length."""
```

```python
print(model_output("gpt-oss-120b", roman_empire_query))
```

**1. Political Foundations —** Augustus transformed the fractured Republic into a principate by preserving republican forms while concentrating imperium in the Princeps, creating the stability that enabled expansive governance; the same concentration of personal power, however, established a precedent for succession crises and autocratic rule that later eroded institutional resilience. **2. Administrative Excellence —** The Five Good Emperors (Nerva‑Marcus Aurelius) professionalized the bureaucracy, codified law, and promoted merit, which maximized efficiency and cohesion across the empire; the loss of adoptive succession and the rise of nepotism thereafter crippled governance, leading to ineffective provincial oversight. **3. Economic Expansion —** Vast trade routes, a uniform currency, and monumental infrastructure (roads, aqueducts) generated wealth and integrated the empire, but overreliance on slave labor, heavy taxation, and fiscal strain from perpetual wars forced debasement, in

```python
print(model_output("llama-3.3-70b-instruct", roman_empire_query))
```

**The Rise and Fall of the Roman Empire: An Analytical Perspective**
The Roman Empire's trajectory can be understood through six key dimensions: political foundations, administrative excellence, economic expansion, military dominance, internal decay, and external pressures. Initially, Augustus' consolidation of power after the Republic's fall established a stable political foundation, contributing to Rome's flourishing by providing a framework for governance and legitimacy. The Five Good Emperors later built on this foundation, exemplifying administrative excellence that fostered economic expansion through efficient trade networks, taxation, and infrastructure development, which in turn fueled military dominance and effective frontier management. However, these same dimensions later contributed to decline: the political foundation became increasingly autocratic, administrative excellence gave way to corruption and mismanagement, economic expansion led to over-reliance on slave labor an

## Context

A context window is the model’s **working memory**.\
It is measured in **tokens**, not words.
### Context Window Lengths for some popular models
- GPT-5.2 - \~400k *(~300,000 words, ~600 pages of text)*
- Sonnet 4.5 - 200k
- Gemini 3 Pro - 1 million
- Grok 4.2 - 2 million
- Llama 4 Scout - 10 million
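The word and page figures in the GPT-5.2 line follow from two rough rules of thumb: about 0.75 English words per token and about 500 words per page. Both ratios are assumptions; the real ratio depends on the tokenizer, the language, and the kind of text.

```python
WORDS_PER_TOKEN = 0.75  # rough heuristic for English prose (assumption)
WORDS_PER_PAGE = 500    # assumed single-spaced page


def context_in_pages(context_tokens):
    """Convert a context-window size in tokens to approximate words and pages."""
    words = context_tokens * WORDS_PER_TOKEN
    return round(words), round(words / WORDS_PER_PAGE)


for name, tokens in [("GPT-5.2", 400_000), ("Claude Sonnet 4.5", 200_000), ("Gemini 3 Pro", 1_000_000)]:
    words, pages = context_in_pages(tokens)
    print(f"{name}: {tokens:,} tokens ≈ {words:,} words ≈ {pages:,} pages")
# First line: GPT-5.2: 400,000 tokens ≈ 300,000 words ≈ 600 pages
```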

## What are tokens, anyway?

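A token is a fragment of text: a whole word, part of a word, or a punctuation mark. The model reads its input and writes its output one token at a time, in sequence.
#### Tokenization Algorithms
- BPE (used by GPT, DeepSeek, Qwen, Grok)
- WordPiece (BERT)
- SentencePiece (Gemini, earlier Llama models)
- Anthropic's proprietary tokenizer (Claude)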

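```python
# BPE example: learn merges from a tiny corpus, then tokenize a word with them
import re
from collections import Counter


def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs


def merge_vocab(pair, vocab):
    """Merge every whole-symbol occurrence of `pair` into one symbol."""
    merged_vocab = Counter()
    # Match the pair only at symbol boundaries, so ("e", "s") never matches inside "lowe s"
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    replacement = "".join(pair)

    for word, freq in vocab.items():
        merged_vocab[pattern.sub(replacement, word)] += freq

    return merged_vocab


def train_bpe(corpus, num_merges=10):
    vocab = Counter()

    # Initialize a character-level vocab; "</w>" marks the end of a word
    for word in corpus:
        chars = " ".join(word) + " </w>"
        vocab[chars] += 1

    merges = []

    for _ in range(num_merges):
        pairs = get_stats(vocab)
        if not pairs:
            break

        # Merge the most frequent pair (ties go to the pair seen first)
        best = max(pairs, key=pairs.get)
        vocab = merge_vocab(best, vocab)
        merges.append(best)

    return merges, vocab


def tokenize(word, merges):
    """Apply learned merges in the order they were learned (earliest first)."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word) + ["</w>"]

    while len(tokens) > 1:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more

        # Merge every occurrence of the best pair, scanning left to right
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged

    return tokens
```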


```python
corpus = ["low", "lowest", "newer", "wider"]

merges, vocab = train_bpe(corpus, num_merges=10)

print("Learned merges:")
for m in merges:
    print(m)

print("\nTokenization example:")
print(tokenize("lowest", merges))
```


```text
Learned merges:
('l', 'o')
('lo', 'w')
('e', 'r')
('er', '</w>')
('low', '</w>')
('low', 'e')
('lowe', 's')
('lowes', 't')
('lowest', '</w>')
('n', 'e')

Tokenization example:
['lowest</w>']
```
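For contrast with BPE's learned merges, WordPiece tokenizes at inference time by greedy longest-match-first lookup: take the longest prefix of the word that is in the vocabulary, then repeat on the remainder, marking word-internal pieces with `##`. The tiny vocabulary below is invented for illustration; real WordPiece vocabularies are learned from a corpus and hold tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenization of a single word."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # some part matched nothing: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens


vocab = {"low", "new", "wid", "##est", "##er", "##e", "##s", "##t"}
print(wordpiece_tokenize("lowest", vocab))  # ['low', '##est']
print(wordpiece_tokenize("newer", vocab))   # ['new', '##er']
print(wordpiece_tokenize("xyz", vocab))     # ['[UNK]']
```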


## Retrieval Augmented Generation (RAG)

Imagine you have a large collection of documents (thousands of pages in total). The LLM was not trained on them, so it knows nothing about their contents, and they are too long to paste into a single prompt. How can you pull the relevant information out of them without overloading the model's context window?
### RAG has two parts
**Retriever:** finds and fetches the chunks of your documents that are most relevant to the question \
**Generator:** the LLM answers using only those retrieved chunks, so the rest of the collection never enters its context.
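The two parts can be sketched in a few lines. This minimal version uses bag-of-words cosine similarity as the retriever, a stand-in for the embedding models and vector indexes production systems usually use. The helper names and sample documents are hypothetical; the resulting prompt would then be sent to a model, for example with the `model_output` helper defined earlier.

```python
import math
import re
from collections import Counter


def chunk(text, size=40):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def bag_of_words(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Retriever: return the k chunks most similar to the query."""
    q = bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


def build_rag_prompt(query, retrieved):
    """Generator input: only the retrieved chunks go into the context window."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved, start=1))
    return (
        "Answer the question using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical sample documents
docs = [
    "The cafeteria opens at 7 am and closes at 8 pm on weekdays.",
    "Club meetings are held every Thursday in room 204 of the engineering building.",
    "Parking permits can be renewed online before the end of each semester.",
]
chunks = [c for d in docs for c in chunk(d)]
top = retrieve("When are club meetings held?", chunks, k=1)
print(build_rag_prompt("When are club meetings held?", top))
```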


## Zero-shot vs One-shot vs Few-shot

## Zero-Shot Prompting

**Definition:** Task is given with **no examples**.

**Pros:** Fast, no prep.  
**Cons:** Output may be inconsistent.

**Example Instruction:**

> "Summarize the following code file."

**Key idea:** Model relies purely on instructions + general knowledge.


## One-Shot Prompting

**Definition:** Task given with **one example output**.

**Pros:** Clarifies expectations.  
**Cons:** Needs a well-crafted example.

**Example Instruction:**

- Example: summarize a code file  
- Then ask model to summarize a new file


## Few-Shot Prompting

**Definition:** Task given with **multiple example outputs**.

**Pros:** Higher consistency and quality.  
**Cons:** Requires multiple examples, longer prompt.

**Example:** Summarize 2–3 code files → Model summarizes a new code file.
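All three styles can come from one helper that differs only in how many worked examples it includes. This sketch assumes the OpenAI-style `messages` list that `model_output` passes to `client.chat.completions.create`, and encodes each example as a user/assistant pair; the `few_shot_messages` helper and the sample code files are hypothetical.

```python
def few_shot_messages(instruction, examples, query):
    """Build a chat message list: 0 examples = zero-shot, 1 = one-shot, 2+ = few-shot."""
    messages = [{"role": "system", "content": instruction}]
    for example_input, example_output in examples:
        # Each example is shown as if the model had already answered it once
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages


instruction = "Summarize the following code file in one sentence."
examples = [
    ("def add(a, b):\n    return a + b", "Defines add(), which returns the sum of two numbers."),
    ("import json\nprint(json.dumps({'ok': True}))", "Prints a small dictionary serialized as JSON."),
]
new_file = "def is_even(n):\n    return n % 2 == 0"

zero_shot = few_shot_messages(instruction, [], new_file)
one_shot = few_shot_messages(instruction, examples[:1], new_file)
few_shot = few_shot_messages(instruction, examples, new_file)
print(len(zero_shot), len(one_shot), len(few_shot))  # 2 4 6
```

Any of the three lists can be passed as `messages=` to `client.chat.completions.create`; only the number of examples changes, which makes it easy to compare how consistency improves as examples are added.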


## Multi-Step Claude Workflow

This structured workflow chains three **engineered prompts**, each run as a slash command:

1. `/research_codebase` – Explore and document codebase
2. `/create_plan` – Build a detailed implementation plan
3. `/implement_plan` – Execute plan with automated & manual verification

**Observation:**  
- These prompts are highly structured, with explicit rules and steps.
- They are not purely zero-shot: each can run in a zero-, one-, or few-shot style depending on how many examples it includes.
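The pattern behind these commands is prompt chaining: each stage runs its own structured prompt, and its output becomes input to the next stage. The sketch below is hypothetical; it does not reproduce the actual command files, and `call_model` is a stand-in you would replace with a real model call such as `model_output`.

```python
from collections.abc import Callable

# Hypothetical stage templates; the real slash commands are much longer and stricter.
STAGES = [
    ("research", "Explore the codebase for this task and document the relevant files:\n{task}"),
    ("plan", "Using this research, write a step-by-step implementation plan:\n{previous}"),
    ("implement", "Implement this plan and list how each step will be verified:\n{previous}"),
]


def run_workflow(task: str, call_model: Callable[[str], str]) -> dict[str, str]:
    """Run each stage in order, feeding every stage's output into the next prompt."""
    outputs: dict[str, str] = {}
    previous = ""
    for name, template in STAGES:
        prompt = template.format(task=task, previous=previous)
        previous = call_model(prompt)
        outputs[name] = previous
    return outputs


# A fake model that echoes the first line of its prompt, to show the data flow offline
def echo_model(prompt: str) -> str:
    return f"[output for: {prompt.splitlines()[0]}]"


results = run_workflow("Add a dark-mode toggle", echo_model)
for stage, text in results.items():
    print(stage, "->", text)
```

Splitting the work this way keeps each prompt focused, and each intermediate artifact (research notes, plan) can be reviewed by a human before the next stage runs.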


## Key Takeaways

- **Zero-shot:** Task only, no examples. Relies on instructions.  
- **One-shot:** Task + one example. Improves clarity.  
- **Few-shot:** Task + multiple examples. Better consistency.  
- **Engineered prompts / workflow:** Multi-step, structured instructions.  
- Claude workflow commands are **structured engineered prompts** that can incorporate zero/one/few-shot techniques depending on how many prior examples you provide.


<div align="center">

## Tools

| Tool | Website |
|:---:|:---:|
| **BAML** | [boundaryml.com](https://boundaryml.com/) |
| **Weights & Biases** | [wandb.ai](https://wandb.ai/site/) |
| **PromptLayer** | [promptlayer.com](https://www.promptlayer.com/) |
| **Braintrust** | [braintrust.dev](https://www.braintrust.dev/) |
| **Helicone** | [helicone.ai](https://www.helicone.ai/) |
| **TruLens** | [trulens.org](https://www.trulens.org/) |
| **LangSmith** | [langchain.com/langsmith](https://www.langchain.com/langsmith/observability) |

</div>

# BAML Chatbot Example
<!-- .slide: data-state="intro" -->

**Goal:** Build a simple chatbot using BAML  

- BAML allows **declarative chat workflows**  
- Messages have **roles**: `user` | `assistant`  
- Supports **testing** and **multi-language execution**  
- Works with **Python, Go, TypeScript**


# BAML Chatbot Example (Side-by-Side)
<!-- .slide: data-state="baml-side" -->

<div style="display: flex; gap: 50px; align-items: flex-start; height: 75vh;">

  <!-- Left column: BAML code -->
  <div style="flex: 1 1 48%; max-height: 100%; overflow: auto; border-right: 1px solid #ccc; padding-right: 15px;">
    <h3>BAML Code</h3>
    <pre><code class="language-baml">
// Define a data structure for chat messages
class MyUserMessage {
  role "user" | "assistant"
  content string
}
// Core functionality: render the chat history into a prompt
function ChatWithLLM(messages: MyUserMessage[]) -> string {
  client "openai/gpt-5"
  prompt #"
    Answer the user's questions based on the chat history:
    {% for message in messages %}
      {{ _.role(message.role) }} 
      {{ message.content }}
    {% endfor %}
    Answer:
  "#
}

test TestName {
  functions [ChatWithLLM]
  args {
    messages [
      { role "user", content "Hello!" }
      { role "assistant", content "Hi!" }
    ]
  }
}
    </code></pre>
  </div>

  <!-- Right column: Python usage -->
  <div style="flex: 1 1 48%; max-height: 100%; overflow: auto; padding-left: 15px;">
    <h3>Python Integration</h3>
    <pre><code class="language-python">
from baml_client import b
from baml_client.types import MyUserMessage

messages: list[MyUserMessage] = []

while True:
    content = input("Enter your message (or 'quit' to exit): ")
    if content.lower() == "quit":
        break

    messages.append(MyUserMessage(role="user", content=content))
    agent_response = b.ChatWithLLM(messages=messages)
    print(f"AI: {agent_response}")
    messages.append(MyUserMessage(role="assistant", content=agent_response))
    </code></pre>
  </div>

</div>


# Key Points of BAML Chatbot

- **BAML**: declarative chat workflow  
  - Defines message roles and history structure  
  - Template-driven prompts for AI context  

- **Python Integration**: practical usage  
  - Maintains chat history  
  - Calls BAML functions for responses  
  - Adds AI responses back into history  

- **Testable**: BAML `test` blocks run a function on fixed inputs, and assertions can check the output  
- **Multi-language friendly**: Python, Go, TypeScript clients  
- **Reusable**: same workflow can be adapted for other agents or prompts
