## Homework 2: Prompting Techniques

You will practice prompting techniques by crafting prompts to complete specific tasks. Each task's instructions are at the top of its corresponding cell.

##### Submission Instructions
1. Complete all the functions marked with "Your code here".
2. Run all cells to make sure your outputs are correct.
3. Answer all output questions in markdown.
4. Due Friday, 27 Feb. 2026, at 23:59.
5. Submit your solution notebook to "Canvas -> HW2-Assignment"
6. **Prepend your NUS userID to the filename, e.g., "`a0123456b_HW.ipynb`"**. Only Jupyter notebooks (`.ipynb`) are accepted.

### Installation
Make sure you have the required packages installed:

```bash
pip install langchain langchain-ollama python-dotenv
```

### Ollama Setup
We will be using Ollama with LangChain to run the Qwen 2.5 3B model locally.

1. Install Ollama:
   - macOS (Homebrew): `brew install --cask ollama`
   - Linux: `curl -fsSL https://ollama.com/install.sh | sh`
   - Windows: Download from [ollama.com/download](https://ollama.com/download)

2. Start Ollama service:
   ```bash
   ollama serve
   ```

3. Pull the Qwen 2.5 3B model:
   ```bash
   ollama pull qwen2.5:3b
   ```
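Before running the notebook, it can help to confirm that the model was actually pulled. The sketch below assumes Ollama is listening on its default local address `http://localhost:11434` and uses its `/api/tags` endpoint, which lists locally available models. The helper names `parse_model_names` and `list_local_models` are hypothetical and not part of the assignment.

```python
import json
import urllib.error
import urllib.request


def parse_model_names(payload: dict) -> list[str]:
    """Pull model names out of a /api/tags-style response body."""
    return [m.get("name", "") for m in payload.get("models", [])]


def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return locally available model names, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, connection refused, or an unexpected response body.
        return []


if __name__ == "__main__":
    models = list_local_models()
    print("qwen2.5:3b available:", "qwen2.5:3b" in models)
```

If this prints `False`, check that `ollama serve` is running and that the pull in step 3 finished.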

### Deliverables
- Read the task description in each section.
- Design and run prompts in the designated cells.
- Iterate to improve results until the test script passes.
- Save your final prompt(s) and output for each technique.

### Evaluation rubric (10 pts total)
- 10 pts for completing the K-shot prompting tasks and answering the output questions


In [None]:
model_card = "qwen2.5:3b"

## Part 1: K-shot Prompting (10 pts)
In this section, you will craft prompts that force strict output formats. Include worked examples (the K "shots") in your prompt to guide the model's behavior.
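A K-shot prompt places K worked input/output pairs before the real query so the model can imitate their format. The sketch below shows one possible way to assemble such a system prompt. The helper `build_k_shot_prompt`, the `Input:`/`Output:` layout, and the date-formatting examples are illustrative assumptions, deliberately unrelated to the graded tasks.

```python
def build_k_shot_prompt(instruction: str, examples: list[tuple[str, str]]) -> str:
    """Join an instruction and K (input, output) pairs into one system prompt."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")  # blank line between examples
    return "\n".join(parts).strip()


# Hypothetical 2-shot prompt for a task that is not part of this homework.
prompt = build_k_shot_prompt(
    "Convert each date to ISO 8601 (YYYY-MM-DD). Output only the date.",
    [("March 5, 2024", "2024-03-05"), ("1 Jan 1999", "1999-01-01")],
)
print(prompt)
```

The resulting string could be assigned to `YOUR_SYSTEM_PROMPT`. Varying the number and choice of pairs is the main lever you have in the tasks below.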

### Task 1.1: Structured JSON Extraction
Extract and format product information from text into a structured JSON format.
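The test for this task compares parsed JSON objects rather than raw strings. The sketch below illustrates what that implies, using only the standard `json` module. The sample strings and the `matches` helper are made up for illustration.

```python
import json

expected = json.loads('{"name": "Widget", "price": 10}')


def matches(candidate: str) -> bool:
    """True if candidate parses as JSON and equals the expected object."""
    try:
        return json.loads(candidate) == expected
    except json.JSONDecodeError:
        return False


# Key order and whitespace are irrelevant once both sides are parsed.
print(matches('{ "price": 10,\n  "name": "Widget" }'))  # True
# A string where a number is expected does not match.
print(matches('{"name": "Widget", "price": "10"}'))  # False
# Any text around the object makes parsing fail.
print(matches('Here is the JSON: {"name": "Widget", "price": 10}'))  # False
```

In practice this means your prompt has to pin down the key names, the value types, and the absence of any surrounding text.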

In [None]:
import os
import json
from dotenv import load_dotenv
from langchain_ollama import ChatOllama

load_dotenv()

NUM_RUNS_TIMES = 5

YOUR_SYSTEM_PROMPT = """
"""
USER_PROMPT = """
Extract product information and format as JSON:

The iPhone 15 Pro is priced at $999 and comes in 3 colors: titanium, black, and white. It has a 6.1 inch display.
"""

EXPECTED_OUTPUT = '{"name": "iPhone 15 Pro", "price": 999, "colors": ["titanium", "black", "white"], "display": "6.1 inch"}'


def test_your_prompt(system_prompt: str) -> bool:
    """Run the prompt up to NUM_RUNS_TIMES times and return True if any output matches EXPECTED_OUTPUT.

    Prints "SUCCESS" when a match is found.
    """
    # Initialize ChatOllama
    llm = ChatOllama(
        model=model_card,
        temperature=0.2,
    )

    for idx in range(NUM_RUNS_TIMES):
        print(f"Running test {idx + 1} of {NUM_RUNS_TIMES}")

        # Create messages with system and user prompts
        messages = [
            ("system", system_prompt),
            ("human", USER_PROMPT),
        ]

        response = llm.invoke(messages)
        output_text = response.content.strip()

        # Try to parse both as JSON and compare
        try:
            output_json = json.loads(output_text)
            expected_json = json.loads(EXPECTED_OUTPUT)
            if output_json == expected_json:
                print("SUCCESS")
                print(f"Output: {output_text}")
                return True
        except json.JSONDecodeError:
            pass

        print(f"Expected output: {EXPECTED_OUTPUT}")
        print(f"Actual output: {output_text}")
    return False


In [None]:
test_your_prompt(YOUR_SYSTEM_PROMPT)


### Task 1.2: Reverse a Word
Reverse the order of letters in a word. Only output the reversed word, no other text.
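For reference, a reversed string can be computed deterministically in Python with a slice, which is handy for producing correct examples for your prompt. The test compares strings exactly, so casing has to match too. A small sketch, where the word `prompt` is an arbitrary example:

```python
def reverse_word(word: str) -> str:
    """Reverse the characters of a word using a slice with step -1."""
    return word[::-1]


print(reverse_word("prompt"))  # tpmorp
# Exact string equality is case-sensitive.
print("tpmorP" == reverse_word("prompt"))  # False
```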

In [None]:
import os
from dotenv import load_dotenv
from langchain_ollama import ChatOllama

load_dotenv()

NUM_RUNS_TIMES = 5

YOUR_SYSTEM_PROMPT = """
"""
USER_PROMPT = """
Reverse the order of letters in the following word. Only output the reversed word, no other text:

lexical
"""

EXPECTED_OUTPUT = "lacixel"


def test_your_prompt(system_prompt: str) -> bool:
    """Run the prompt up to NUM_RUNS_TIMES times and return True if any output matches EXPECTED_OUTPUT.

    Prints "SUCCESS" when a match is found.
    """
    # Initialize ChatOllama
    llm = ChatOllama(
        model=model_card,
        temperature=0.2,
    )

    for idx in range(NUM_RUNS_TIMES):
        print(f"Running test {idx + 1} of {NUM_RUNS_TIMES}")

        # Create messages with system and user prompts
        messages = [
            ("system", system_prompt),
            ("human", USER_PROMPT),
        ]

        response = llm.invoke(messages)
        output_text = response.content.strip()

        if output_text.strip() == EXPECTED_OUTPUT.strip():
            print("SUCCESS")
            print(f"Output: {output_text}")
            return True
        else:
            print(f"Expected output: {EXPECTED_OUTPUT}")
            print(f"Actual output: {output_text}")
    return False

In [None]:
test_your_prompt(YOUR_SYSTEM_PROMPT)

Running test 1 of 5
Expected output: lacixel
Actual output: lacixeL
Running test 2 of 5
Expected output: lacixel
Actual output: lacixeL
Running test 3 of 5
Expected output: lacixel
Actual output: ylacexl
Running test 4 of 5
Expected output: lacixel
Actual output: ylacxeL
Running test 5 of 5
Expected output: lacixel
Actual output: ylacexl


False

### Task 1.3: Email Subject Classification
Classify an email subject into one of: billing, support, sales. Output only the label.
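The check for this task trims whitespace and lowercases the model output before comparing it with the label, so casing is forgiven but extra words or punctuation are not. The sketch below mirrors that comparison. `is_exact_label` is a hypothetical helper name.

```python
def is_exact_label(output: str, expected: str = "billing") -> bool:
    """Mirror the test's check: trim, lowercase, then compare exactly."""
    return output.strip().lower() == expected


# The first two candidates pass; the last two fail on the extra characters.
for candidate in ["Billing", " billing\n", "billing.", "Label: billing"]:
    print(repr(candidate), is_exact_label(candidate))
```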

In [None]:
import os
from dotenv import load_dotenv
from langchain_ollama import ChatOllama

load_dotenv()

NUM_RUNS_TIMES = 5

YOUR_SYSTEM_PROMPT = """
"""

USER_PROMPT = """
Subject: Invoice overdue for account 4521
"""

EXPECTED_OUTPUT = "billing"


def test_your_prompt(system_prompt: str) -> bool:
    """Run the prompt up to NUM_RUNS_TIMES times and return True if any output matches EXPECTED_OUTPUT.

    Prints "SUCCESS" when a match is found.
    """
    # Initialize ChatOllama
    llm = ChatOllama(
        model=model_card,
        temperature=0.2,
    )

    for idx in range(NUM_RUNS_TIMES):
        print(f"Running test {idx + 1} of {NUM_RUNS_TIMES}")

        messages = [
            ("system", system_prompt),
            ("human", USER_PROMPT),
        ]

        response = llm.invoke(messages)
        output_text = response.content.strip().lower()

        if output_text == EXPECTED_OUTPUT:
            print("SUCCESS")
            print(f"Output: {output_text}")
            return True
        else:
            print(f"Expected output: {EXPECTED_OUTPUT}")
            print(f"Actual output: {output_text}")
    return False

In [None]:
test_your_prompt(YOUR_SYSTEM_PROMPT)

Running test 1 of 5
SUCCESS
Output: billing


True

### Task 1.4: Output Questions
Answer the following based on the outputs you observe:
1. Which K-shot task shows the biggest improvement from adding examples? Why?
2. For the JSON task, what kinds of errors do you see without strong prompts (extra keys, formatting, types)?
3. Compare a small model vs. a larger model (from OpenAI or Gemini). Does the larger model still benefit from K-shot examples?
4. Which examples seem to help the most, and how would you refine them?
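Because `test_your_prompt` stops at the first match, it does not tell you how often a prompt succeeds. One way to back up your answers with numbers is to collect the outputs from several runs and tally them. The sketch below assumes you have gathered the outputs into a list yourself; `match_rate` is a hypothetical helper.

```python
from collections import Counter


def match_rate(outputs: list[str], expected: str) -> float:
    """Fraction of outputs that equal the expected string after trimming."""
    if not outputs:
        return 0.0
    return sum(o.strip() == expected for o in outputs) / len(outputs)


# Outputs copied from the Task 1.2 run shown earlier in this notebook.
observed = ["lacixeL", "lacixeL", "ylacexl", "ylacxeL", "ylacexl"]
print(match_rate(observed, "lacixel"))
print(Counter(observed).most_common())
```

Comparing the rate with and without examples in the system prompt gives a rough measure of how much the shots help.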

##### your answer here

#### End of HW