# Part 3: Zero-Shot, Few-Shot, and Chain-of-Thought Prompting

Large language models (LLMs) can often perform tasks directly from instructions. Giving a model a new instruction without any examples is known as **zero-shot prompting**.  
However, as tasks become more nuanced or reasoning-intensive, models benefit from additional context or demonstrations provided directly in the prompt.  
This leads to more advanced techniques: **few-shot prompting** and **chain-of-thought prompting**.


## 1. Zero-Shot Prompting

Zero-shot prompting relies purely on the model’s pretraining knowledge and the clarity of your instruction.  
You describe what you want, and the model attempts to infer the task without examples.

### Example: Classifying a San Antonio Community Message

**Prompt:**
```

Classify the following message as one of: Public Works, Community Events, or Public Safety.

Message: There was flooding near the San Antonio River Walk last night, and debris is blocking part of the trail.

```

**Model Output:**
```

Public Works

```

The model correctly identifies the topic without needing prior examples.  
Zero-shot prompting is ideal for straightforward, well-defined tasks where the model can rely on general world knowledge.

### Limitations
Zero-shot prompting becomes less reliable when:
- The task has an unusual format or label space.
- The instructions are ambiguous.
- The problem requires reasoning across multiple facts or steps.

In [13]:
from openai import OpenAI
from utils import accuracy, ner_accuracy

# Point this at your vLLM (OpenAI-compatible) endpoint
client = OpenAI(
    base_url="http://10.246.100.142:8000/v1",
    api_key="token-abc123"
)

# Small helper for convenience
def call_model(messages, model="meta-llama/Llama-3.1-70B-Instruct", max_tokens=512, temperature=0.0):
    resp = client.chat.completions.create(
        model=model,
        max_tokens=max_tokens,
        temperature=temperature,
        messages=messages
    )
    return resp.choices[0].message.content

In [3]:
messages = [
    {"role": "system", "content": "You are a kind, encouraging assistant who keeps answers brief. Classify the following message as one of: Public Works, Community Events, or Public Safety."},
    {"role": "user", "content": "There was flooding near the San Antonio River Walk last night, and debris is blocking part of the trail."}

]

print(call_model(messages))


I'd classify that message as Public Works.


## 2. Few-Shot Prompting

Few-shot prompting teaches a model by including a few examples of the desired behavior directly in the prompt.  
Each example contains both an input and an output, allowing the model to learn the intended pattern through **in-context learning**.

### Example: Classifying Community Messages with Examples

Suppose the City of San Antonio wants to automatically route resident messages to the appropriate department.

**Prompt:**
```

Message: The trash pickup on my street was missed this week.
Category: Public Works

Message: Where can I find information about this year’s Fiesta events?
Category: Community Events

Message: The streetlight on my block has been flickering for several nights.
Category:

```

**Model Output:**
```

Public Works

```

With a few examples in the prompt, the model infers how to perform the classification task and outputs the correct category.

Research (Brown et al., 2020; Min et al., 2022) shows that:
- The format and consistency of the examples, along with the label space they show, matter more than the exact content of each example.
- Even a small number of demonstrations (e.g., three to five) can greatly improve results.
- Few-shot prompting works best for structured or repetitive tasks.
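
Because consistent formatting matters, it can help to build the demonstrations programmatically rather than by hand. The sketch below is one possible way to do that; the helper name `build_few_shot_messages` is an assumption introduced here for illustration, not part of the notebook's `utils` module. It turns each (input, label) pair into a user turn followed by an assistant turn, then appends the message to classify.

```python
from typing import Dict, List, Tuple


def build_few_shot_messages(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> List[Dict[str, str]]:
    """Turn (input, label) demonstrations into alternating user/assistant turns."""
    messages = [{"role": "system", "content": instruction}]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    # The final user turn is the message we actually want classified.
    messages.append({"role": "user", "content": query})
    return messages


few_shot_messages = build_few_shot_messages(
    instruction=(
        "Classify the following message as one of: "
        "Public Works, Community Events, or Public Safety."
    ),
    examples=[
        ("The trash pickup on my street was missed this week.", "Public Works"),
        ("Where can I find information about this year's Fiesta events?", "Community Events"),
    ],
    query="The streetlight on my block has been flickering for several nights.",
)
```

The resulting list has the same shape as the `messages` lists used elsewhere in this notebook, so it could be passed to a chat-completion helper unchanged.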

### Limitations
Few-shot prompting may still fail on tasks requiring deeper reasoning or multi-step analysis.  
In those cases, we can use **chain-of-thought prompting** to guide the model’s reasoning process.

In [3]:
messages = [
    {"role": "system", "content": "You are a kind, encouraging assistant who keeps answers brief. Classify the following message as one of: Public Works, Community Events, or Public Safety."},
    {"role": "user", "content": "The trash pickup on my street was missed this week."},
    {"role": "assistant", "content": "Public Works"},
    {"role": "user", "content": "Where can I find information about this year’s Fiesta events?"},
    {"role": "assistant", "content": "Community Events"},
    {"role": "user", "content": "There was flooding near the San Antonio River Walk last night, and debris is blocking part of the trail."}

]

print(call_model(messages))


Public Works


## 3. Chain-of-Thought Prompting

Chain-of-thought (CoT) prompting (Wei et al., 2022) asks the model to think through the problem step by step before answering.
It works well for planning, decision making, and contextual reasoning where multiple clues must be combined.

### Example: Classifying a Civic Report for Public Works

You are routing resident reports to the right city team. Ask the model to reason first, then give the label on a new line.

**Prompt:**

```
You are a kind, encouraging assistant who keeps answers brief.
Classify the following message as one of: Public Works, Community Events, or Public Safety.
Reason step-by-step before returning the answer on a new line.

Message: "There was flooding near the San Antonio River Walk last night, and debris is blocking part of the trail."
A:
```

**Model Output (with chain-of-thought reasoning):**

```
The report concerns flooding impacts on city infrastructure (trail) and debris removal.
This aligns with maintenance/repair and right-of-way cleanup handled by public works.
No immediate criminal activity or police/fire response indicated; not an event announcement.

Public Works
```

The model connects the clues (infrastructure, debris removal, non-emergency) and then gives the final label on a separate line. This mirrors how a dispatcher or 311 triage system reasons before routing.
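
When the label is on its own final line, downstream code can separate it from the reasoning. The following is a minimal sketch under that assumption; `split_reasoning_and_label` is a hypothetical helper name, and the sample response is abbreviated from the output above.

```python
from typing import Tuple


def split_reasoning_and_label(response: str) -> Tuple[str, str]:
    """Return (reasoning, label), where label is the last non-empty line."""
    non_empty = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if not non_empty:
        return "", ""
    label = non_empty[-1]
    reasoning = "\n".join(non_empty[:-1])
    return reasoning, label


cot_response = (
    "The report concerns flooding impacts on city infrastructure (trail) and debris removal.\n"
    "No immediate criminal activity or police/fire response indicated.\n"
    "\n"
    "Public Works\n"
)
reasoning, label = split_reasoning_and_label(cot_response)
print(label)  # prints: Public Works
```

If the model ignores the formatting instruction and ends with a sentence instead of a bare label, this helper returns that sentence as the label, so it is worth checking the result against the allowed categories.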


In [69]:
messages = [
    {"role": "system", "content": "You are a kind, encouraging assistant who keeps answers brief. Classify the following message as one of: Public Works, Community Events, or Public Safety. Reason step-by-step before returning the answer on a new line."},
    {"role": "user", "content": "There was a robbery near the San Antonio River Walk last night."}

]

print(call_model(messages))

To classify the message, let's break it down:

1. The message mentions a specific location, the San Antonio River Walk, but this doesn't necessarily point to a specific category.
2. The message reports an incident, which suggests it might be related to safety or a community issue.
3. The incident is a robbery, which implies a crime occurred.

Based on these steps, the message is most closely related to safety and crime.

 
Public Safety



## 4. Summary

| Technique | Description | Best For | Example Cue |
|------------|--------------|-----------|--------------|
| **Zero-Shot Prompting** | Gives only an instruction, with no examples | Straightforward, well-defined tasks that rely on general knowledge | “Classify the following message as one of: ...” |
| **Few-Shot Prompting** | Uses examples to teach the model a task or format | Classification, writing style, structured output | “Message: ... Category: ...” with several examples |
| **Chain-of-Thought Prompting** | Encourages explicit reasoning before answering | Planning, decision-making, multi-factor analysis | “Let’s think step by step.” |

# Exercise 1: Classifying Public Opinions on Project Marvel

## Background
**Project Marvel** is a major redevelopment proposal in San Antonio centered around a new Spurs arena and entertainment district downtown. The plan involves both public and private funding and has sparked intense community discussion. Supporters emphasize economic growth, tourism, and long-term city identity. Critics highlight cost, fairness, and the city’s past experiences with similar promises.

To study how large language models understand local political sentiment, we’ll use real comments collected from **Reddit (r/SanAntonio)** about Project Marvel. Each comment expresses a stance either **for (pro)** or **against** the project.


## Objective
Your task is to design a prompt that most accurately classifies each comment as **pro** or **against** Project Marvel.

You’ll start simple and progressively improve your results using:
1. **Prompt detail**: refine how the task is described.
2. **Few-shot prompting**: add a few examples to guide the model.
3. **Chain-of-thought reasoning**: let the model reason step by step before deciding.

**TASK**: Experiment to get the highest accuracy you can achieve!

**IMPORTANT**: You must ensure that the label (`pro` or `against`) either 1) appears alone, with nothing else in the response; or 2) appears alone on a new line at the very end of the response.
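
To check a response against this rule before scoring, something like the sketch below may help. It is an illustration only: `follows_output_rule` is a hypothetical name, and treating the label as case-insensitive is an assumption, since the exact cleaning done by the notebook's `accuracy` helper is not shown here.

```python
VALID_LABELS = {"pro", "against"}


def follows_output_rule(response: str) -> bool:
    """True if the last non-empty line is exactly one of the valid labels."""
    non_empty = [line.strip() for line in response.splitlines() if line.strip()]
    if not non_empty:
        return False
    # Assumption: case is ignored when comparing against the label set.
    return non_empty[-1].lower() in VALID_LABELS


checks = {
    "label alone": follows_output_rule("against"),
    "reasoning then label": follows_output_rule(
        "The commenter doubts the jobs claim.\nagainst"
    ),
    "label inside a sentence": follows_output_rule("I think this comment is against."),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The first two cases satisfy the rule; the third does not, because the label is embedded in a sentence rather than standing alone on the final line.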

In [37]:
import json
import pprint
#from sklearn.metrics import f1_score

with open('data/project_marvel.json') as iFile:
    data = json.load(iFile)
pprint.pprint(data[0:5])

[{'comment': 'My question is when have SSE done ANYTHING in the past to screw '
             'this city? Why all of this distrust for a team that has '
             'ingrained itself into the identity of this community?',
  'label': 'pro'},
 {'comment': 'The opposition to these props is not coming from blind distrust. '
             'It could be argued that the PR campaign paid for by SSE is '
             'dishonest for a few reasons.  Big arenas / stadiums like this '
             'never generate meaningful revitalization or net gain in economic '
             'activity, yet that’s one of the main selling points... ‘Good '
             'jobs’ is a dubious claim. What exactly are the net new jobs from '
             'this project? The Spurs and SSE are already here, no new jobs '
             'there. So, likely just the temporary construction jobs.',
  'label': 'against'},
 {'comment': 'The property tax revenue can’t go up - show me how that can '
             'happen, we are committi

In [55]:
# EDIT THIS CELL WITH NEW PROMPTS
messages_cla = [
    {"role": "system", "content": "Only return either pro or against."}
]

In [56]:
# NOTHING TO CHANGE HERE
preds_cla = []
true_cla = []
for example in data:
    true_cla.append(example['label'])
    response = call_model(messages_cla+[{"role": "user", "content": example['comment']}], temperature=0.0, max_tokens=1000)
    preds_cla.append(response)

acc = accuracy(true_cla, preds_cla) # This cleans the preds before calculating the accuracy
print(f"Accuracy: {acc}")

Accuracy: 0.625
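
A single accuracy number does not show which stance the model gets wrong. One way to look closer, sketched here under the assumption that each prediction is a bare label, is to count (true, predicted) pairs. The name `confusion_counts` and the toy lists are made up for illustration; in the notebook you would pass `true_cla` and `preds_cla` instead, taking the last line of each response first if your prompt asks for reasoning.

```python
from collections import Counter
from typing import List


def confusion_counts(true_labels: List[str], predictions: List[str]) -> Counter:
    """Count (true, predicted) pairs after light normalization."""
    pairs = zip(true_labels, predictions)
    return Counter((t.strip().lower(), p.strip().lower()) for t, p in pairs)


# Toy lists for illustration only, not results from the dataset.
toy_true = ["pro", "against", "against", "pro"]
toy_pred = ["Against", "Against", "against", "pro"]

counts = confusion_counts(toy_true, toy_pred)
for (true_label, pred_label), n in sorted(counts.items()):
    print(f"true={true_label:8s} pred={pred_label:8s} count={n}")
```

A large count for one off-diagonal pair, such as true `pro` predicted `against`, suggests the prompt needs clearer guidance or examples for that stance.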


In [57]:
print("Response 1:")
print(data[0])
print()
print(preds_cla[0])
print()
print("Response 2:")
print(data[1])
print()
print(preds_cla[1])
print()
print("Response 3:")
print(data[2])
print()
print(preds_cla[2])

Response 1:
{'label': 'pro', 'comment': 'My question is when have SSE done ANYTHING in the past to screw this city? Why all of this distrust for a team that has ingrained itself into the identity of this community?'}

Against

Response 2:
{'label': 'against', 'comment': 'The opposition to these props is not coming from blind distrust. It could be argued that the PR campaign paid for by SSE is dishonest for a few reasons.  Big arenas / stadiums like this never generate meaningful revitalization or net gain in economic activity, yet that’s one of the main selling points... ‘Good jobs’ is a dubious claim. What exactly are the net new jobs from this project? The Spurs and SSE are already here, no new jobs there. So, likely just the temporary construction jobs.'}

Against

Response 3:
{'label': 'against', 'comment': 'The property tax revenue can’t go up - show me how that can happen, we are committing the TIRZ to the arena indefinitely.  We funded that in 2017, we are giving the revenue gai

# Exercise 2: Named Entity Recognition (NER) for San Antonio Restaurant Names

## Overview

In this exercise, you’ll build prompts to extract **restaurant names** from short sentences gathered from **r/SanAntonio**. The goal is to identify restaurant entities accurately and consistently using prompt design.

## Task

Given a sentence, return a list of spans for **restaurant names** (proper names of eateries, food trucks, cafes, bakeries, bars that serve food). Ignore cuisine types, general food words, and locations unless they are part of the restaurant’s name.

## Your Goal

Maximize extraction quality (precision and recall) on held-out examples by improving your prompts through:

1. **Prompt details** (clear instructions, definitions, counter-examples)
2. **Few-shot prompting** (a handful of labeled examples)
3. **Chain-of-thought reasoning** (brief reasoning steps that justify which spans are names vs. not)
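
For reference, precision and recall over extracted names can be computed as in the sketch below. This uses exact matching on lowercased names and is an assumption for illustration; it is not necessarily how the notebook's `ner_accuracy` helper scores predictions, and `precision_recall` is a hypothetical name.

```python
from typing import Iterable, Tuple


def precision_recall(
    true_names: Iterable[str], predicted_names: Iterable[str]
) -> Tuple[float, float]:
    """Exact-match precision and recall over lowercased name sets."""
    true_set = {name.strip().lower() for name in true_names}
    pred_set = {name.strip().lower() for name in predicted_names}
    correct = len(true_set & pred_set)
    # Precision: share of predicted names that are real restaurant names.
    precision = correct / len(pred_set) if pred_set else 0.0
    # Recall: share of real restaurant names that were predicted.
    recall = correct / len(true_set) if true_set else 0.0
    return precision, recall


p, r = precision_recall(
    true_names=["Comfort Cafe"],
    predicted_names=["comfort cafe", "Starcrest"],
)
print(f"precision={p:.2f} recall={r:.2f}")  # prints: precision=0.50 recall=1.00
```

Here the extra prediction `Starcrest` (a street, not a restaurant) lowers precision while recall stays perfect, which is the kind of error that clearer instructions and counter-examples aim to reduce.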


In [6]:
import json
import pprint

with open('data/restaurants.json') as iFile:
    ner_data = json.load(iFile)
pprint.pprint(ner_data[0:5])

[{'entities': [{'label': 'RESTAURANT', 'name': 'comfort cafe'}],
  'text': 'I love comfort cafe off Starcrest in that little shopping center '
          'area I forgot the name of the center.'},
 {'entities': [{'label': 'RESTAURANT', 'name': 'Comfort Cafe'}],
  'text': 'Comfort Cafe was really good.'},
 {'entities': [{'label': 'RESTAURANT', 'name': 'Mi Tierra'}],
  'text': 'Mi Tierra for Breakfast? A lot of people say that place sucks '
          'altogether.'},
 {'entities': [{'label': 'RESTAURANT', 'name': 'Garcia’s'}],
  'text': 'ETA: Garcia’s for their brisket taco.'},
 {'entities': [{'label': 'RESTAURANT', 'name': 'Blanco Cafe'}],
  'text': 'Blanco Cafe (the one on Blanco) has great enchiladas too.'}]


**IMPORTANT**: Your prompt must make the model list all entities on the last line of the response, with **nothing** else on that line. Multiple entities must be separated by a comma.

Given the following input:

"I love the food at McDonalds and Comfort Cafe."

Here are valid and invalid outputs:

**VALID OUTPUTS**:

```
McDonalds,Comfort Cafe
```

```
The author is talking about food at two restaurants:
McDonalds,Comfort Cafe
```

**INVALID OUTPUTS**:

```
The answer is: McDonalds,Comfort Cafe
```

```
The author is talking about food at two restaurants: McDonalds,Comfort Cafe
```
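
To see why the invalid outputs fail, consider a parser that reads only the last line and splits it on commas. The sketch below is one plausible version, with `parse_entities` as a hypothetical name; the lowercasing mirrors how the true names are lowercased in the evaluation cell, but the actual cleaning inside `ner_accuracy` may differ.

```python
from typing import List


def parse_entities(response: str) -> List[str]:
    """Split the last non-empty line on commas and normalize each name."""
    non_empty = [line.strip() for line in response.splitlines() if line.strip()]
    if not non_empty:
        return []
    return [name.strip().lower() for name in non_empty[-1].split(",") if name.strip()]


valid = parse_entities(
    "The author is talking about food at two restaurants:\nMcDonalds,Comfort Cafe"
)
invalid = parse_entities("The answer is: McDonalds,Comfort Cafe")

print(valid)    # prints: ['mcdonalds', 'comfort cafe']
print(invalid)  # prints: ['the answer is: mcdonalds', 'comfort cafe']
```

In the invalid case, the extra text on the final line becomes part of the first extracted name, so it no longer matches the true entity.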


In [63]:
# EDIT THIS CELL WITH NEW PROMPTS
messages = [
    {"role": "system", "content": "Extract all names comma-separated list return nothing else."}
]

In [64]:
# NOTHING TO CHANGE HERE
preds = []
true = []
for example in ner_data:
    true.append([label['name'].lower() for label in example['entities']])
    response = call_model(messages+[{"role": "user", "content": example['text']}], temperature=0.0, max_tokens=1000)
    preds.append(response)
print(preds[-1])

acc = ner_accuracy(true, preds) # This cleans the preds before calculating the accuracy
print(f"Accuracy: {acc}")

Royal Inn
Accuracy: 0.3111111111111111


In [66]:
print("Response 1:")
print(ner_data[0])
print()
print(preds[0])
print()
print("Response 2:")
print(ner_data[1])
print()
print(preds[1])
print()
print("Response 3:")
print(ner_data[2])
print()
print(preds[2])

Response 1:
{'text': 'I love comfort cafe off Starcrest in that little shopping center area I forgot the name of the center.', 'entities': [{'name': 'comfort cafe', 'label': 'RESTAURANT'}]}

No names were mentioned.

Response 2:
{'text': 'Comfort Cafe was really good.', 'entities': [{'name': 'Comfort Cafe', 'label': 'RESTAURANT'}]}

Comfort Cafe

Response 3:
{'text': 'Mi Tierra for Breakfast? A lot of people say that place sucks altogether.', 'entities': [{'name': 'Mi Tierra', 'label': 'RESTAURANT'}]}

Mi Tierra
