# Assignment 4.1: Prompt Design and Comparison

This notebook compares **Direct**, **Few-Shot**, and **Chain-of-Thought** prompt styles using the Flan-T5 language model (`google/flan-t5-base`). The goal is to evaluate which style produces the best response to the same question-answering (QA) task.

In [None]:
!pip install transformers accelerate --quiet

In [None]:

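from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Flan-T5 is an encoder-decoder model, so it is loaded with the Seq2SeqLM class.
model_name = "google/flan-t5-base"
# Download (or load from cache) the tokenizer and weights for this checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)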


## Task

**Task:** Answer a simple question — *"What causes rain?"*

### 1. Direct Prompt

In [None]:
prompt_direct = "What causes rain?"

inputs = tokenizer(prompt_direct, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Rain is caused when water vapor in the air condenses into droplets and falls due to gravity.
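
The tokenize, generate, decode sequence above is repeated for each prompt style in this notebook. One way to avoid the repetition is a small helper like the sketch below. The name `generate_answer` is hypothetical (it is not part of the original notebook), and the function assumes it is given the `tokenizer` and `model` objects loaded earlier.

```python
def generate_answer(prompt, tokenizer, model, max_new_tokens=50):
    """Run one prompt through a seq2seq model and return the decoded text.

    Hypothetical helper: assumes `tokenizer` and `model` behave like the
    Hugging Face objects loaded at the top of this notebook.
    """
    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate at most `max_new_tokens` new tokens with the default decoding settings.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) generated sequence, dropping special tokens.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With the objects from the setup cell, the direct prompt could then be run as `print(generate_answer(prompt_direct, tokenizer, model))`.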

### 2. Few-Shot Prompt

In [None]:
prompt_few_shot = """
Q: What causes wind?
A: Wind is caused by differences in air pressure.

Q: What causes thunder?
A: Thunder is the sound produced by lightning.

Q: What causes rain?
A:"""

inputs = tokenizer(prompt_few_shot, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Rain is caused by water vapor condensing into clouds and falling as droplets due to gravity.
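
The few-shot prompt above follows a fixed pattern: worked `Q:`/`A:` pairs, then the target question with an empty `A:` for the model to complete. The sketch below builds the same prompt from a list of examples, which makes it easier to vary the number of shots. `build_few_shot_prompt` is a hypothetical helper name, not something the original notebook defines.

```python
def build_few_shot_prompt(examples, question):
    """Format (question, answer) pairs followed by the unanswered target question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The last block ends with "A:" so the model is cued to complete the answer.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


examples = [
    ("What causes wind?", "Wind is caused by differences in air pressure."),
    ("What causes thunder?", "Thunder is the sound produced by lightning."),
]
prompt_few_shot = build_few_shot_prompt(examples, "What causes rain?")
print(prompt_few_shot)
```

The result matches the hand-written prompt except for the leading newline that the triple-quoted string introduces.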

### 3. Chain-of-Thought Prompt

In [None]:
prompt_cot = """
Let's think step by step.
When the sun heats up water bodies, water evaporates and rises into the air. As it rises, it cools and condenses into clouds. When these droplets combine and get heavy, they fall as rain. So, what causes rain?
"""

inputs = tokenizer(prompt_cot, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Rain is caused by water evaporating, condensing into clouds, and falling when the droplets become heavy.
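
The prompt above supplies the reasoning steps itself and then asks the question. A common alternative, usually called zero-shot chain-of-thought, appends only the cue "Let's think step by step." after the question so that the model has to produce the reasoning. The sketch below shows that variant; it is not the prompt used in this notebook, `build_zero_shot_cot_prompt` is a hypothetical name, and a small model such as `flan-t5-base` may not produce useful intermediate steps with it.

```python
def build_zero_shot_cot_prompt(question):
    """Return a zero-shot chain-of-thought prompt for a single question."""
    # The reasoning cue comes after "A:" so the model continues with its own steps.
    return f"Q: {question}\nA: Let's think step by step."


prompt_cot_zero_shot = build_zero_shot_cot_prompt("What causes rain?")
print(prompt_cot_zero_shot)
```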

## Comparison Report

| Prompt Type | Output Quality | Reasoning | Factuality |
|-------------|----------------|-----------|------------|
| Direct      | Good           | Low       | High       |
| Few-Shot    | Better         | Medium    | High       |
| CoT         | Best           | High      | High       |

**Best Prompt (qualitative):** Chain-of-Thought. Its answer walks through evaporation, condensation, and precipitation in order, which makes the explanation easier to follow while staying factually correct.

**Conclusion:** For explanatory questions like this one, Chain-of-Thought prompting produced the most complete and informative answer. The ratings above are a qualitative judgment based on a single question.


## Evaluation Metrics

We evaluate the generated answers using two metrics commonly used in QA tasks:

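- **Exact Match (EM)**: 1 if the prediction matches the ground truth exactly (typically after normalizing case, punctuation, and articles), otherwise 0.
- **F1 Score**: Harmonic mean of precision and recall over the tokens shared by the prediction and the ground truth.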
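
The two metrics can be computed with a few lines of standard-library Python. The sketch below assumes SQuAD-style normalization (lowercasing, then removing punctuation and the articles "a", "an", "the"); the notebook does not state which normalization it uses, so treat that choice and the function names as assumptions.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1: harmonic mean of precision and recall on shared tokens."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 shared tokens, precision 3/3, recall 3/5.
print(exact_match("The rain.", "rain"))
print(round(f1_score("water vapor condenses", "water vapor condenses and falls"), 2))
```

Because EM compares whole strings, a single answer can only score 0 or 1; fractional EM values arise only when averaging over many questions.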

### Ground Truth Answer
> "Rain is caused when water vapor in the atmosphere condenses into water droplets and falls due to gravity."

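### Predicted Outputs and Scores

Each answer is scored against the ground truth above after lowercasing and removing punctuation and articles. For a single answer EM is binary (0 or 1), and none of the three predictions matches the ground truth word for word, so all three score 0 on EM.

| Prompt Type       | Predicted Answer                                                                 | EM   | F1   |
|-------------------|-----------------------------------------------------------------------------------|------|------|
| Direct Prompt     | Rain is caused when water vapor in the air condenses into droplets and falls due to gravity. | 0    | 0.91 |
| Few-Shot Prompt   | Rain is caused by water vapor condensing into clouds and falling as droplets due to gravity. | 0    | 0.67 |
| Chain-of-Thought  | Rain is caused by water evaporating, condensing into clouds, and falling when the droplets become heavy. | 0    | 0.50 |

### Interpretation

- **Direct Prompt** scored highest on F1 because its phrasing is closest to the ground truth; it differs only in "air" versus "atmosphere" and "droplets" versus "water droplets".
- **Few-Shot Prompt** is factually consistent with the ground truth, but its different verb forms ("condensing", "falling") and the added "clouds" reduce token overlap.
- **Chain-of-Thought** describes more of the process (evaporation, condensation, droplet growth), so it shares the fewest tokens with the ground truth and has the lowest F1.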

### Conclusion

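- For strict QA matching, **Direct Prompt** may score best because its phrasing stays closest to the reference answer.
- For explanatory answers, **Chain-of-Thought** gives the most complete account of the process, although overlap metrics such as EM and F1 penalize the extra detail.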
