<div style="display: flex; justify-content: space-between; align-items: center; padding: 50px 30px;">

  <!-- Left column: text -->
  <div style="flex: 1; line-height: 1.5; max-width: 70%;">
    <h1 style="margin: 0; font-size: 3em;">Prompt Engineering</h1>
    <p style="margin: 15px 0 10px 0; font-size: 1.2em;">
      <strong>Nedas Jaronis & Abhi Titty</strong><br>
      <em>Co-directors, Tech Advancements Committee, AI Club</em>
    </p>
    <p style="margin: 5px 0 0 0; font-size: 1em;">
      <a href="https://givepul.se/oebuuy!" target="_blank">https://givepul.se/oebuuy!</a>
    </p>
  </div>

  <!-- Right column: QR code -->
  <div style="flex: 0; margin-left: 100px;">
    <img src="bing_generated_qrcode.png" width="220" style="display: block;">
  </div>

</div>


## What is Prompt Engineering?

Prompt engineering is the process of **designing inputs** to guide large language models toward:
- Reliable outputs
- Structured responses
- Reduced hallucination
- Improved reasoning

---

**LLMs predict tokens, not truth.**

## Why do we need to ENGINEER a prompt?


Large language models such as
- GPT-5.2
- Claude Opus 4.6
- Gemini 3 Pro
- Grok 4.1

are probabilistic.

$$
P(y_1, y_2, \ldots, y_n \mid x) = \prod_{t=1}^{n} P(y_t \mid y_{<t}, x)
$$
Outputs are ***non-deterministic***, ***context-bound***, and ***distribution-shaped***.
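
To make the product in the formula concrete, here is a minimal sketch that multiplies per-step conditional probabilities for one sequence. The table `NEXT_TOKEN_PROBS` and its numbers are invented for illustration, and it conditions only on the previous token, whereas a real LLM conditions on the full prefix and the prompt.

```python
# Hypothetical table: P(next_token | previous_token). The numbers are made up.
# A real LLM conditions on the whole prefix y_<t and the prompt x.
NEXT_TOKEN_PROBS = {
    "<prompt>": {"London": 0.6, "I": 0.4},
    "London": {"bridge": 0.9, "is": 0.1},
    "bridge": {"is": 0.8, "falls": 0.2},
    "is": {"falling": 0.7, "old": 0.3},
}


def sequence_probability(tokens):
    """Multiply the per-step conditional probabilities (the product in the formula)."""
    prob = 1.0
    previous = "<prompt>"
    for token in tokens:
        prob *= NEXT_TOKEN_PROBS[previous][token]
        previous = token
    return prob


p = sequence_probability(["London", "bridge", "is", "falling"])
print(round(p, 4))  # 0.6 * 0.9 * 0.8 * 0.7
```

Every extra token multiplies in another factor below 1, which is why long exact sequences are individually unlikely even when each step looks confident.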

### Early models (GPT-3 Babbage, Ada, DaVinci)

These base models simply continued whatever text the input string started. For example:\
**Input:** "London bridge is "\
**Model Output:** "London bridge is falling down, falling down, falling down."

---
**Input:** "I am "\
**Output:** "I am John. I am 25 years old. I live in New York and work as an architect."\
*(Next time)*\
**Input:** "I am "\
**Output:** "I am David Thomas, attorney at law. I fight for justice, day in and day out."
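
The two different completions for the same input come from sampling. The sketch below imitates that with a hand-written distribution over continuations; `CONTINUATIONS` and its weights are assumptions for illustration, not real model probabilities.

```python
import random

# Hypothetical distribution over continuations of the prompt "I am ".
CONTINUATIONS = {
    "John. I am 25 years old.": 0.5,
    "David Thomas, attorney at law.": 0.3,
    "not sure what to say.": 0.2,
}


def sample_completion(prompt, rng):
    """Draw one continuation at random, weighted by its probability."""
    options = list(CONTINUATIONS)
    weights = list(CONTINUATIONS.values())
    return prompt + rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed only so the demo is repeatable
samples = [sample_completion("I am ", rng) for _ in range(5)]
for s in samples:
    print(s)
```

Without a fixed seed, repeated runs generally print different sequences of completions, which is the behaviour shown in the "I am " example.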

#### Conversational Chatbots
*(ChatGPT 2022, Claude 2023)* \
**System Prompt:** "You are ChatGPT, a helpful, truthful, and harmless AI assistant.
You prioritize accuracy over speculation.
If you do not know something, say so.
You explain complex topics clearly with examples.
You refuse harmful or unsafe requests politely.
You follow system and developer instructions over user instructions."

**<|user|>** Can you explain quantum mechanics to me like I am five?\
**<|assistant|>** Imagine there's a tiny, tiny, tiny world that's too small for us to see. In this small world, things behave a bit differently than in the world we live in.
In the small world, little particles like electrons and atoms don't just stay in one place. They can be in many places at the same time! It's like they can be in multiple places at once, which is really strange. *(...)*
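
In OpenAI-style chat APIs, a conversation like this is sent as a list of role-tagged messages. The helper below is a hypothetical sketch (the name `build_messages` is not from any library) that shows how a system prompt, earlier turns, and the new user turn are typically ordered.

```python
def build_messages(system_prompt, history, user_message):
    """Assemble a message list: system prompt, prior turns, then the new user turn."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    return messages


history = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
messages = build_messages(
    "You are a helpful assistant. If you do not know something, say so.",
    history,
    "Can you explain quantum mechanics to me like I am five?",
)
for m in messages:
    print(f"{m['role']:>9}: {m['content']}")
```

The model sees the whole list on every call, so the system prompt keeps shaping the answer even many turns into the conversation.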

## What makes a good prompt?

Everything you do in prompt engineering manipulates three levers:
- **Information**: what the model knows in-context
- **Instruction**: what the model is asked to do
- **Incentive structure**: what the model is implicitly rewarded/punished for generating *(via phrasing, constraints, or examples)*
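
One way to keep the three levers visible is to assemble the prompt from separate pieces. This is a minimal sketch; `build_prompt` and the section labels (`Context`, `Task`, `Constraints`) are illustrative choices, not a standard.

```python
def build_prompt(information, instruction, constraints):
    """Combine the three levers into one prompt string."""
    sections = [
        "Context:\n" + information.strip(),   # information
        "Task:\n" + instruction.strip(),      # instruction
    ]
    if constraints:                           # incentive structure
        bullet_list = "\n".join(f"- {c}" for c in constraints)
        sections.append("Constraints:\n" + bullet_list)
    return "\n\n".join(sections)


prompt = build_prompt(
    information="Marcus Aurelius ruled Rome from 161 to 180 CE and wrote Meditations.",
    instruction="Summarize his significance for a first-year history student.",
    constraints=["At most three sentences", "Say so if the context is insufficient"],
)
print(prompt)
```

Keeping the pieces separate makes it easier to change one lever at a time and see which change actually moved the output.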

In [1]:
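from dotenv import load_dotenv
import openai

# Load OPENAI_API_KEY (and any other settings) from a local .env file.
load_dotenv()

# OpenAI-compatible client pointed at the endpoint below; the API key is
# read from the OPENAI_API_KEY environment variable.
client = openai.OpenAI(
    base_url="https://api.ai.it.ufl.edu",
)

def model_output(model, query):
    """Send a single user message to `model` and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": query,
            }
        ],
    )
    return response.choices[0].message.content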

In [2]:
print(model_output("gpt-oss-120b","Tell me about Marcus Aurelius"))

**Marcus Aurelius (121 – 180 CE) – Emperor, Soldier, Stoic Philosopher**

---

## 1. Quick Facts
| Item | Detail |
|------|--------|
| **Full name** | Marcus Aurelius Antoninus Augustus (born **Marcus Annius Verus**). |
| **Born** | 26 April 121, Rome (or possibly a nearby town in the province of Hispania Tarraconensis). |
| **Died** | 17 March 180, Vindobona (modern Vienna) or Sirmium (modern Serbia) while on campaign. |
| **Reign** | 161 – 180 CE (co-emperor with Lucius Verus 161-169; sole emperor 169-180). |
| **Dynasty** | Adoptive (Nerva-Antonine) dynasty. |
| **Key works** | *Meditations* (Greek: **Τὰ εἰς ἑαυτόν**, “To Himself”). |
| **Main titles** | Roman Emperor, Pontifex Maximus, Consul (multiple times). |
| **Family** | Father: Marcus Annius Verus (high-ranking senator). Mother: Domitia Lucilla. Married **Faustina the Younger** (daughter of Antoninus Pius); children included **Commodus** (future emperor). |

---

## 2. Ea

In [3]:
roman_empire_query = """You are a historian specializing in Ancient Rome.

Explain the rise and fall of the Roman Empire with an analytical focus rather than a narrative one.

Address the following dimensions:

1. Political foundations — how Augustus consolidated power after the fall of the Republic
2. Administrative excellence during the Five Good Emperors
3. Economic expansion — trade networks, taxation, and infrastructure
4. Military dominance and frontier management
5. Internal decay — political instability, corruption, currency debasement
6. External pressures — barbarian incursions and overstretched borders

For each dimension:
- Explain how it contributed to Rome’s flourishing
- Then explain how it later contributed to decline

Conclude with a synthesis explaining whether Rome fell primarily due to internal weaknesses or external forces.

Write in a structured essay format with clear section headings.
Keep it to one paragraph in length."""

In [4]:
print(model_output("gpt-oss-120b",roman_empire_query))

**1. Political foundations** – Augustus turned civil war into stability by concentrating imperium in the princeps office, instituting a veneer of republican forms while wielding supreme control of the military and finances, which allowed Rome to project authority and enact coherent policy; however, the concentration of power created a fragile succession system dependent on personal favor, so the later lack of clear mechanisms led to coups, civil wars, and the erosion of imperial legitimacy. **2. Administrative excellence** – Under the Five Good Emperors (Nerva–Marcus Aurelius) a meritocratic bureaucracy, provincial councils, and codified legal reforms streamlined tax collection, justice, and public works, fostering civic order and economic integration; the later abandonment of these practices, coupled with nepotistic appointments and bureaucratic corruption, produced inefficiency, fiscal waste, and alienated provincial elites. **3. Economic expansion** – Vast trade networks a

In [5]:
print(model_output("llama-3.3-70b-instruct",roman_empire_query))

**The Rise and Fall of the Roman Empire: An Analytical Perspective**
The Roman Empire's trajectory can be understood through six key dimensions: political foundations, administrative excellence, economic expansion, military dominance, internal decay, and external pressures. Initially, Augustus' consolidation of power after the Republic's fall established a stable political foundation, contributing to Rome's flourishing by providing a framework for governance and legitimacy. The Five Good Emperors later exemplified administrative excellence, fostering prosperity through effective governance and wise decision-making. Economic expansion, driven by extensive trade networks, taxation, and infrastructure development, further fueled growth. Military dominance and effective frontier management secured Rome's borders, allowing for peaceful expansion. However, these same dimensions later contributed to decline: political instability and corruption eroded the foundation, administrative excellence

## Context

A context window is the model's **working memory**.\
It is measured in **tokens**, not words.
### Context Window Lengths for some popular models
- GPT-5.2 - \~400k on average *(~300,000 words, ~600 pages of text)*
- Sonnet 4.5 - 200k
- Gemini 3 Pro - 1 million
- Grok 4.2 - 2 million
- Llama 4 Scout - 10 million
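
Because limits are stated in tokens, it helps to estimate a document's token count before sending it. The sketch below uses the rough ratio implied above (400k tokens for about 300,000 words, so about 0.75 words per token); the ratio, the helper names, and the output reserve are assumptions, and a real tokenizer should be used when accuracy matters.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough ratio: ~300,000 words per 400k tokens


def estimate_tokens(text):
    """Approximate token count from a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_context(text, context_window, reserved_for_output=1000):
    """Check that the prompt plus room for the reply fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window


document = "word " * 3000  # a 3,000-word stand-in document
print(estimate_tokens(document))
print(fits_in_context(document, context_window=8000))
print(fits_in_context(document, context_window=4500))
```

Reserving part of the window for the reply matters because the prompt and the generated output share the same budget.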

## What are tokens anyway?

A token is a fragment of text (a word, part of a word, or punctuation). The model sees your input as a sequence of tokens.
#### Tokenization Algorithms
- BPE (used by GPT, DeepSeek, Qwen, Grok)
- WordPiece (BERT)
- SentencePiece, a library implementing BPE and unigram models (earlier Llama models, Gemini)
- A proprietary tokenizer (Claude)

In [7]:
# BPE example

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(roman_empire_query)

# print(tokens)
print([enc.decode([t]) for t in tokens])

['You', ' are', ' a', ' historian', ' specializing', ' in', ' Ancient', ' Rome', '.\n\n', 'Ex', 'plain', ' the', ' rise', ' and', ' fall', ' of', ' the', ' Roman', ' Empire', ' with', ' an', ' analytical', ' focus', ' rather', ' than', ' a', ' narrative', ' one', '.\n\n', 'Address', ' the', ' following', ' dimensions', ':\n\n', '1', '.', ' Political', ' foundations', ' —', ' how', ' August', 'us', ' consolidated', ' power', ' after', ' the', ' fall', ' of', ' the', ' Republic', '\n', '2', '.', ' Administrative', ' excellence', ' during', ' the', ' Five', ' Good', ' Em', 'per', 'ors', '\n', '3', '.', ' Economic', ' expansion', ' —', ' trade', ' networks', ',', ' taxation', ',', ' and', ' infrastructure', '\n', '4', '.', ' Military', ' dominance', ' and', ' frontier', ' management', '\n', '5', '.', ' Internal', ' decay', ' —', ' political', ' instability', ',', ' corruption', ',', ' currency', ' deb', 'as', 'ement', '\n', '6', '.', ' External', ' pressures', ' —', ' barbar', 'ian
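
Splits such as `'August', 'us'` and `'deb', 'as', 'ement'` come from learned merge rules. As a rough illustration of how BPE learns them, the sketch below repeatedly merges the most frequent adjacent pair in a tiny made-up corpus; it is a teaching toy under those assumptions, not how `tiktoken` is implemented.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


corpus = {"low": 5, "lower": 2, "lowest": 3}  # hypothetical word counts
words = {tuple(w): f for w, f in corpus.items()}  # start from characters
merges = []
for _ in range(2):
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)

print(merges)       # learned merge rules, in order
print(list(words))  # words re-expressed with merged symbols
```

Frequent strings end up as single tokens while rare ones stay split into pieces, which matches the pattern in the output above.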

## Retrieval Augmented Generation (RAG)

Imagine you have a large set of documents (thousands of pages in total). The LLM never saw these documents during training. How can you effectively retrieve information from them without overloading the LLM's context window?
### RAG has two parts
**Retriever:** fetches the chunks most relevant to the query \
**Generator:** the LLM, which answers from the retrieved chunks and ignores everything else.
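
The two parts can be sketched offline with a toy retriever. Here retrieval is a bag-of-words cosine similarity over three invented chunks, and the generator step is reduced to building the prompt that would be sent to the LLM; production systems typically use embedding models and a vector index instead.

```python
import math
import re
from collections import Counter

CHUNKS = [  # hypothetical document chunks
    "Refunds are processed within 5 business days of the request.",
    "The cafeteria opens at 8 am on weekdays.",
    "Password resets require a verified email address.",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Retriever: return the k chunks most similar to the query."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


def build_rag_prompt(query, chunks):
    """Generator input: only the retrieved chunks go into the prompt."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, 1))
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


query = "How long do refunds take?"
top = retrieve(query, CHUNKS, k=1)
rag_prompt = build_rag_prompt(query, top)
print(rag_prompt)
```

Only the retrieved chunk reaches the model, so the prompt stays small no matter how large the document collection grows.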


In [8]:
print(model_output("gpt-oss-120b","What is the latest news today?"))

I’m not able to pull in live, real-time feeds, so I can’t give you the exact headlines that are being published this very minute on February 15 2026. However, I can point you toward the best ways to get up-to-the-minute news and give a quick overview of the major story-lines that have been shaping the headlines in the weeks leading up to today, based on the information available up to my knowledge cutoff.

---

## How to Get the “Latest” News Right Now

| Platform | Why It’s Reliable | How to Use It |
|----------|-------------------|---------------|
| **Major International Wire Services** (Reuters, Associated Press, AFP) | Fast, fact-checked, globally-sourced | Visit their homepages or follow their Twitter/X handles: `@Reuters`, `@AP`, `@AFPnews`. |
| **National Newspapers with Strong Digital Ops** (The New York Times, The Washington Post, The Guardian, Le Monde, Süddeutsche Zeitung) | Deep reporting, editorial standards, newsletters | Subscribe to 

In [10]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

from duckduckgo_search import DDGS

def web_search(model, query):
    """
    Lightweight version: uses search snippets but processes them intelligently
    """
    
    # Search for articles
    print(f"🔍 Searching for: {query}")
    with DDGS() as ddgs:
        search_results = list(ddgs.text(query, max_results=8))
    
    # Format context from snippets
    context = "Here are the latest news items found:\n\n"
    for i, result in enumerate(search_results, 1):
        context += f"[{i}] {result['title']}\n"
        context += f"Source: {result['href']}\n"
        context += f"Summary: {result['body']}\n\n"
    
    # Create comprehensive prompt
    prompt = f"""You are a news assistant summarizing the latest information about: "{query}"

{context}

Write a detailed news summary with:
- **Bold headlines** for major developments
- One concise explanatory sentence under each headline
- Proper citations [1], [2], etc.
- Organized by topic/theme
- Professional, journalistic tone

Provide your news summary:"""
    
    print("🤖 Generating summary...\n")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7
    )
    
    summary = response.choices[0].message.content
    
    # Append sources
    sources = "\n\n" + "="*60 + "\n📚 Sources:\n" + "="*60 + "\n"
    for i, result in enumerate(search_results, 1):
        sources += f"[{i}] {result['title']}\n    {result['href']}\n\n"
    
    return summary + sources

## Zero-shot vs One-shot vs Few-shot

## Zero-Shot Prompting

**Definition:** Task is given with **no examples**.

**Pros:** Fast, no prep.  
**Cons:** Output may be inconsistent.

**Example Instruction:**

> "Summarize the following code file."

**Key idea:** Model relies purely on instructions + general knowledge.


## One-Shot Prompting

**Definition:** Task is given with **one worked example** (an input paired with its desired output).

**Pros:** Clarifies expectations.  
**Cons:** Needs a well-crafted example.

**Example Instruction:**

- Example: summarize a code file  
- Then ask model to summarize a new file


## Few-Shot Prompting

**Definition:** Task is given with **multiple worked examples** (inputs paired with their desired outputs).

**Pros:** Higher consistency and quality.  
**Cons:** Requires multiple examples, longer prompt.

**Example:** Provide 2–3 code files with their summaries → the model summarizes a new code file.
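
The three styles differ only in how many worked examples precede the new input, so one helper can cover all of them. This is a minimal sketch; `few_shot_prompt` and the `Input:`/`Output:` labels are illustrative, not a required format.

```python
def few_shot_prompt(instruction, examples, new_input):
    """Build a prompt with zero, one, or several (input, output) examples."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    lines += [f"Input: {new_input}", "Output:"]
    return "\n".join(lines)


instruction = "Summarize the following code file in one sentence."
examples = [
    ("def add(a, b): return a + b", "Defines a function that adds two numbers."),
    ("print('hello')", "Prints a greeting to standard output."),
]
new_file = "for i in range(3): print(i)"

zero_shot = few_shot_prompt(instruction, [], new_file)
one_shot = few_shot_prompt(instruction, examples[:1], new_file)
few_shot = few_shot_prompt(instruction, examples, new_file)
print(few_shot)
```

Ending the prompt with an open `Output:` line nudges the model to continue in the same format as the examples.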


## How is Prompt Engineering used in the industry?

Prompts are
- Versioned
- Tested
- Evaluated
- Monitored
- Iterated

### Versioning

A prompt that classifies support tickets, stored as a template:

> Classify tickets into categories.
>
> Example:\
> "I was charged twice" → Billing\
> "I can't log in" → Account Access\
> "The app crashes" → Technical
>
> Ticket: *{ticket}*\
> Category: *{prediction}*

#### Keep track
Store each prompt version in git, e.g. `/prompts/support_classifier_v3.txt`.

For every run, teams log:
- Inputs
- Outputs
- Latency
- Failure rates
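
A minimal sketch of what such logging might look like. The in-memory `PROMPTS` dictionary stands in for prompt files kept in git, and `stub_model` stands in for a real LLM call; both are assumptions so the example runs offline.

```python
import time

# Stand-in for versioned prompt files such as support_classifier_v3.txt.
PROMPTS = {
    "support_classifier_v3": (
        "Classify the ticket as Billing, Account Access, or Technical.\n\n"
        "Ticket: {ticket}\n"
        "Category:"
    ),
}


def stub_model(prompt):
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "Billing" if "charged" in prompt else "Technical"


def run_logged(prompt_version, ticket, model_fn, log):
    """Fill the versioned template, call the model, and record the run."""
    prompt = PROMPTS[prompt_version].format(ticket=ticket)
    start = time.perf_counter()
    try:
        output, failed = model_fn(prompt), False
    except Exception:
        output, failed = None, True
    log.append({
        "prompt_version": prompt_version,
        "input": ticket,
        "output": output,
        "latency_s": time.perf_counter() - start,
        "failed": failed,
    })
    return output


log = []
category = run_logged("support_classifier_v3", "I was charged twice", stub_model, log)
print(category, log[0]["prompt_version"])
```

Recording the prompt version next to each output is what lets a regression be traced back to a specific prompt change.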

### A/B Testing
**Example:** \
Prompt A: "Summarize this article." \
Prompt B: "Summarize this article in:

1. Key thesis
2. Supporting arguments
3. Implications
4. Limitations"


Run both prompts on the same 100 articles and measure:
- Coverage
- Factual accuracy
- Structure compliance
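
Of these measures, structure compliance is the easiest to automate. The sketch below scores both prompts on it over 100 placeholder articles; `stub_model` is a hypothetical stand-in for a real LLM call, and coverage and factual accuracy would need human or model-based grading that is not shown.

```python
REQUIRED_SECTIONS = ["Key thesis", "Supporting arguments", "Implications", "Limitations"]

PROMPT_A = "Summarize this article."
PROMPT_B = (
    "Summarize this article in:\n"
    "1. Key thesis\n2. Supporting arguments\n3. Implications\n4. Limitations"
)


def structure_compliance(output):
    """Fraction of required section names that appear in the output."""
    found = sum(1 for s in REQUIRED_SECTIONS if s.lower() in output.lower())
    return found / len(REQUIRED_SECTIONS)


def stub_model(prompt, article):
    """Stand-in for an LLM: follows the structure only when asked to."""
    if "Key thesis" in prompt:
        return "\n".join(f"{s}: ..." for s in REQUIRED_SECTIONS)
    return "This article is about " + article[:20]


def ab_test(prompt_a, prompt_b, articles, model_fn):
    """Run both prompts on the same articles and average the metric."""
    scores = {"A": [], "B": []}
    for article in articles:
        scores["A"].append(structure_compliance(model_fn(prompt_a, article)))
        scores["B"].append(structure_compliance(model_fn(prompt_b, article)))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


articles = [f"Article {i} text" for i in range(100)]
result = ab_test(PROMPT_A, PROMPT_B, articles, stub_model)
print(result)
```

Using the same articles for both prompts keeps the comparison fair, since any difference in score then comes from the prompt rather than the inputs.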

## Prompt Optimization
Initial: Write a product description.\
Next time: Write a product description for budget-conscious college students.\
After that: Write a product description including:

- Key features
- Price value
- Use cases

Finally: Tone: Persuasive but not exaggerated. Avoid marketing clichés.

### This is increasingly automated
- LLM critic loops
- Prompt Mutation (rewrite the prompt a little differently)
- Evolutionary search (systems generate dozens of prompt variants → keep top performers)

Prompt space becomes a search problem.
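
A toy version of that search loop, assuming a fixed pool of candidate clauses (`MUTATIONS`) and a stand-in `score` function. In practice the score would come from an evaluation set or an LLM critic, and mutations would be generated rather than drawn from a list.

```python
import random

MUTATIONS = [  # hypothetical clauses a mutation step may append
    " Audience: budget-conscious college students.",
    " Include key features, price value, and use cases.",
    " Tone: persuasive but not exaggerated.",
    " Avoid marketing clichés.",
]


def mutate(prompt, rng):
    """Append one random clause unless the prompt already contains it."""
    addition = rng.choice(MUTATIONS)
    return prompt if addition in prompt else prompt + addition


def score(prompt):
    """Stand-in evaluator: counts how many known-good clauses are present."""
    return sum(1 for m in MUTATIONS if m in prompt)


def evolve(seed_prompt, generations=5, population=6, keep=2, seed=0):
    """Generate variants each round and keep only the top performers."""
    rng = random.Random(seed)
    survivors = [seed_prompt]
    for _ in range(generations):
        candidates = survivors + [
            mutate(rng.choice(survivors), rng) for _ in range(population)
        ]
        unique = list(dict.fromkeys(candidates))  # dedupe, keep order
        survivors = sorted(unique, key=score, reverse=True)[:keep]
    return survivors[0]


seed_prompt = "Write a product description."
best = evolve(seed_prompt)
print(score(best), best)
```

Because survivors are carried into the next round, the best score never decreases from one generation to the next.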

### Metrics
- Accuracy
- Hallucination rate
- Format Compliance
- Latency
- Cost
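
Several of these metrics can be computed directly from logged runs. In the sketch below the run records, the allowed label set, and the price per 1,000 tokens are all made-up values; hallucination rate is left out because it needs human or model-based judgments.

```python
ALLOWED_LABELS = {"Billing", "Account Access", "Technical"}
PRICE_PER_1K_TOKENS = 0.002  # hypothetical price in USD

RUNS = [  # hypothetical logged runs of a ticket classifier
    {"expected": "Billing", "output": "Billing", "latency_s": 0.8, "tokens": 120},
    {"expected": "Technical", "output": "The app is broken", "latency_s": 1.1, "tokens": 150},
    {"expected": "Account Access", "output": "Billing", "latency_s": 0.9, "tokens": 130},
    {"expected": "Technical", "output": "Technical", "latency_s": 1.2, "tokens": 100},
]


def summarize(runs):
    """Aggregate accuracy, format compliance, latency, and cost."""
    n = len(runs)
    return {
        "accuracy": sum(r["output"] == r["expected"] for r in runs) / n,
        "format_compliance": sum(r["output"] in ALLOWED_LABELS for r in runs) / n,
        "mean_latency_s": sum(r["latency_s"] for r in runs) / n,
        "cost_usd": sum(r["tokens"] for r in runs) / 1000 * PRICE_PER_1K_TOKENS,
    }


metrics = summarize(RUNS)
print(metrics)
```

Note that format compliance and accuracy can diverge: the third run returns a valid label that is simply the wrong one.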

## Multi-Step Claude Workflow

Your structured workflow uses **engineered prompts**:

1. `/research_codebase` – Explore and document codebase
2. `/create_plan` – Build a detailed implementation plan
3. `/implement_plan` – Execute plan with automated & manual verification

**Observation:**  
- These prompts are highly structured with rules and steps.
- Not pure zero-shot, but can be used in zero/one/few-shot style depending on examples.
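
The general pattern behind such a workflow is prompt chaining: each step's output becomes the next step's input. The sketch below shows only that pattern; the step templates and `stub_model` are illustrative assumptions and do not reproduce the actual command definitions.

```python
# Hypothetical templates; the real commands are far more detailed.
STEPS = [
    ("research_codebase", "Explore and document the codebase.\n\nInput:\n{previous}"),
    ("create_plan", "Build a detailed implementation plan from this research.\n\nInput:\n{previous}"),
    ("implement_plan", "Execute this plan and list verification steps.\n\nInput:\n{previous}"),
]


def stub_model(prompt):
    """Stand-in for an LLM call: echoes the first line of the prompt."""
    return f"[output for: {prompt.splitlines()[0]}]"


def run_workflow(task, model_fn):
    """Run the steps in order, feeding each output into the next prompt."""
    previous = task
    transcript = []
    for name, template in STEPS:
        previous = model_fn(template.format(previous=previous))
        transcript.append((name, previous))
    return transcript


transcript = run_workflow("Add rate limiting to the API", stub_model)
for name, output in transcript:
    print(f"/{name}: {output}")
```

Splitting the work this way gives a natural checkpoint after each step where a person can review the output before the next prompt runs.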


## Key Takeaways

- **Zero-shot:** Task only, no examples. Relies on instructions.  
- **One-shot:** Task + one example. Improves clarity.  
- **Few-shot:** Task + multiple examples. Better consistency.  
- **Engineered prompts / workflow:** Multi-step, structured instructions.  
- Claude workflow commands are **structured engineered prompts** that can incorporate zero/one/few-shot techniques depending on how many prior examples you provide.


<div align="center">

## Tools

| |
|:---:|
| **BAML**: [boundaryml.com](https://boundaryml.com/) |
| **Weights & Biases**: [wandb.ai](https://wandb.ai/site/) |
| **PromptLayer**: [promptlayer.com](https://www.promptlayer.com/) |
| **Braintrust**: [braintrust.dev](https://www.braintrust.dev/) |
| **Helicone**: [helicone.ai](https://www.helicone.ai/) |
| **TruLens**: [trulens.org](https://www.trulens.org/) |
| **LangSmith**: [langchain.com/langsmith](https://www.langchain.com/langsmith/observability) |

</div>

# BAML Chatbot Example
<!-- .slide: data-state="intro" -->

**Goal:** Build a simple chatbot using BAML  

- BAML allows **declarative chat workflows**  
- Messages have **roles**: `user` | `assistant`  
- Supports **testing** and **multi-language execution**  
- Works with **Python, Go, TypeScript**


# BAML Chatbot Example (Side-by-Side)
<!-- .slide: data-state="baml-side" -->

<div style="display: flex; gap: 50px; align-items: flex-start; height: 75vh;">

  <!-- Left column: BAML code -->
  <div style="flex: 1 1 48%; max-height: 100%; overflow: auto; border-right: 1px solid #ccc; padding-right: 15px;">
    <h3>BAML Code</h3>
    <pre><code class="language-baml">
// Define a data structure for chat messages
class MyUserMessage {
  role "user" | "assistant"
  content string
}
// Core functionality
function ChatWithLLM(messages: MyUserMessage[]) -> string {
  client "openai/gpt-5"
  prompt #"
    Answer the user's questions based on the chat history:
    {% for message in messages %}
      {{ _.role(message.role) }} 
      {{ message.content }}
    {% endfor %}
    Answer:
  "#
}

test TestName {
  functions [ChatWithLLM]
  args {
    messages [
      { role "user", content "Hello!" }
      { role "assistant", content "Hi!" }
    ]
  }
}
    </code></pre>
  </div>

  <!-- Right column: Python usage -->
  <div style="flex: 1 1 48%; max-height: 100%; overflow: auto; padding-left: 15px;">
    <h3>Python Integration</h3>
    <pre><code class="language-python">
from baml_client import b
from baml_client.types import MyUserMessage

messages: list[MyUserMessage] = []

while True:
    content = input("Enter your message (or 'quit' to exit): ")
    if content.lower() == "quit":
        break

    messages.append(MyUserMessage(role="user", content=content))
    agent_response = b.ChatWithLLM(messages=messages)
    print(f"AI: {agent_response}")
    messages.append(MyUserMessage(role="assistant", content=agent_response))
    </code></pre>
  </div>

</div>


# Key Points of BAML Chatbot

- **BAML**: declarative chat workflow  
  - Defines message roles and history structure  
  - Template-driven prompts for AI context  

- **Python Integration**: practical usage  
  - Maintains chat history  
  - Calls BAML functions for responses  
  - Adds AI responses back into history  

- **Testable**: BAML `test` blocks validate expected responses  
- **Multi-language friendly**: Python, Go, TypeScript clients  
- **Reusable**: same workflow can be adapted for other agents or prompts
