# Generative AI Assignment – Using `google/flan-t5-small`

This notebook demonstrates how to use a pre-trained Large Language Model (LLM) from Hugging Face to perform text generation using prompt engineering. The model used is `google/flan-t5-small`, chosen for its lightweight architecture and versatility across tasks.

### Goals:
- Load and run a pre-trained LLM
- Apply prompt engineering to guide output
- Experiment with key generation parameters (temperature, max tokens, top-p)
- Analyze how parameter changes affect output quality and style

This notebook supports the Fall 2025 assignment on Generative AI and Pre-trained LLMs, and is designed for modular clarity and rubric alignment.

##### Package Installation
This step installs the required Python libraries for working with pre‑trained Large Language Models (LLMs):

- **transformers**: Hugging Face’s library that provides access to pre‑trained models, tokenizers, and pipelines for tasks like text generation, summarization, and translation.  
- **torch**: PyTorch, the deep learning framework used as the backend for running and training models.

In [1]:
pip install transformers torch

Note: you may need to restart the kernel to use updated packages.
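
After the install (and a kernel restart, if prompted), it can help to confirm that both packages are importable before loading any model. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical and not part of the original notebook.

```python
from importlib.util import find_spec


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return find_spec(package_name) is not None


# Report the status of the two packages this notebook depends on.
for name in ("transformers", "torch"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, rerunning the install cell in the same environment as the kernel is a reasonable first step.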


#### 1. Model Setup

This section loads the `google/flan-t5-small` model and tokenizer from Hugging Face. This model supports a variety of tasks including text generation, summarization, and question answering. It is lightweight and ideal for prompt engineering experiments.


##### Environment Configuration
This cell sets the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable to `"1"`, which suppresses Hugging Face Hub symlink warnings.

In [2]:
import os
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

##### Model Setup: Loading Flan‑T5
In this section we initialize the pre‑trained **Flan‑T5 Small** model from Hugging Face.  
- **AutoTokenizer**: Converts human text into token IDs the model can process, and back into text.  
- **AutoModelForSeq2SeqLM**: Loads the sequence‑to‑sequence model weights, enabling tasks like translation, summarization, and Q&A.  
- **Model name**: `"google/flan-t5-small"` is chosen for its lightweight architecture, making it ideal for experimentation with prompt engineering.  

This setup step ensures that later cells can run prompts through the model for text generation experiments.
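
The round trip that a tokenizer performs (text to token IDs and back) can be illustrated with a toy example. This is only a sketch: the vocabulary and the helper names `toy_encode` and `toy_decode` are invented for illustration, and the real Flan‑T5 tokenizer uses a SentencePiece subword vocabulary rather than whole words.

```python
# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of subword entries.
TOY_VOCAB = {"<unk>": 0, "the": 1, "weather": 2, "is": 3, "nice": 4, "today": 5}
ID_TO_TOKEN = {idx: tok for tok, idx in TOY_VOCAB.items()}


def toy_encode(text: str) -> list[int]:
    """Map each lowercased, whitespace-separated word to its ID (0 if unknown)."""
    return [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in text.lower().split()]


def toy_decode(ids: list[int]) -> str:
    """Map token IDs back to words and join them with spaces."""
    return " ".join(ID_TO_TOKEN[i] for i in ids)


ids = toy_encode("The weather is nice today")
print(ids)              # [1, 2, 3, 4, 5]
print(toy_decode(ids))  # the weather is nice today
```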

In [3]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Define model name
model_name = "google/flan-t5-small"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

##### Quick Demo: Hugging Face Pipeline
This cell uses the high-level `pipeline("text2text-generation")` API with Flan‑T5.  
It asks the question *"What is the capital of France?"* and prints the generated output.  

⚠️ Note: With this bare question, the pipeline can produce incorrect or echoed answers (e.g., "France" instead of "Paris"), most likely because the prompt gives this small model no explicit task framing.  
This demonstrates the importance of **prompt engineering**: adding explicit task framing like  
*"Question: What is the capital of France? Answer:"* can yield more reliable results.
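
The explicit framing mentioned above can be produced with a small helper. This is a minimal sketch: `build_qa_prompt` is a hypothetical name, and whether the framed prompt actually improves the answer for this model is not verified here.

```python
def build_qa_prompt(question: str) -> str:
    """Wrap a bare question in explicit 'Question: ... Answer:' framing."""
    return f"Question: {question.strip()} Answer:"


bare_prompt = "What is the capital of France?"
framed_prompt = build_qa_prompt(bare_prompt)
print(framed_prompt)  # Question: What is the capital of France? Answer:
```

The framed string could then be passed to the pipeline in place of the bare question to compare the two outputs.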

In [4]:
from transformers import pipeline

generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
output = generator("What is the capital of France?")
print(output)

Device set to use cpu


[{'generated_text': 'france'}]


##### Test Case 1: Baseline Prompt (No Parameters)
This cell runs the first prompt through the Flan‑T5 model using `generate()` with its default decoding settings.  
- **Prompt:** `"Translate English to French: The weather is nice today."`  
- **Parameters:** Only `max_new_tokens=20` is set. No sampling parameters are passed, so the model uses greedy decoding by default.  
- **Purpose:** Serves as the baseline output for comparison against later test cases where parameters like `temperature` and `top_p` are varied.  

This allows us to see how the model behaves deterministically before introducing randomness or sampling diversity.
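
Greedy decoding, as used in this baseline, picks the single most probable token at every step, which is why the output is deterministic. The toy sketch below illustrates that selection rule. The probability tables are invented, and a real model recomputes the distribution at each step based on the tokens generated so far.

```python
def greedy_pick(probs: dict[str, float]) -> str:
    """Return the token with the highest probability."""
    return max(probs, key=probs.get)


# Hypothetical next-token distributions for three decoding steps.
steps = [
    {"La": 0.6, "Le": 0.3, "The": 0.1},
    {"météo": 0.5, "temps": 0.4, "weather": 0.1},
    {"est": 0.9, "is": 0.1},
]

tokens = [greedy_pick(step) for step in steps]
print(" ".join(tokens))  # La météo est
```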

In [5]:
prompt = "Translate English to French: The weather is nice today."

# Tokenize input
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate with default (greedy) decoding, allowing up to 20 new tokens
outputs = model.generate(input_ids, max_new_tokens=20)

# Decode and print result
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

La météo est très agréable aujourd'hui.


##### Test Case 2: Varying Temperature
This run enables sampling and sets the `temperature` parameter to introduce controlled randomness.  
- **Prompt:** `"Translate English to French: The weather is nice today."`  
- **Parameter change:** `do_sample=True` with `temperature=0.7` (moderate randomness; values below 1.0 favor the more likely tokens)  
- **Purpose:** Compare against the baseline (Cell 5) to see how temperature affects phrasing and creativity.
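
The effect of temperature can be sketched in isolation: the model's raw scores (logits) are divided by the temperature before the softmax, so lower values concentrate probability on the top token and higher values flatten the distribution. The logits below are invented for illustration, and `softmax_with_temperature` is a hypothetical helper rather than a library function.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.7, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

In the printed rows, the probability of the first token shrinks as the temperature rises, which is the mechanism behind the more varied outputs at higher settings.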

In [6]:
prompt = "Translate English to French: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=50,       # cap on output length (the baseline used max_new_tokens=20)
    temperature=0.7,     # sampling temperature under test
    do_sample=True       # required for temperature to take effect
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

L'atmosphère est très nice today.


##### Test Case 3: High Temperature
This run raises only the `temperature` parameter relative to Test Case 2.  
- **Prompt:** `"Translate English to French: The weather is nice today."`  
- **Parameter change:** `temperature=1.0` (the library default; more random than 0.7 because the distribution is sampled without sharpening)  
- **Purpose:** Compare against the baseline (Cell 5) and moderate run (Cell 6) to see how increased randomness affects phrasing, creativity, and coherence.
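
Because the sampling runs draw tokens at random, rerunning a cell can produce a different output each time, which makes comparisons between test cases harder to reproduce. The toy sketch below shows how fixing a seed makes sampling repeatable, using only the standard library and an invented distribution. In the notebook itself, calling `set_seed` from `transformers` before `generate()` would play the same role.

```python
import random


def sample_tokens(probs: dict[str, float], n: int, seed: int) -> list[str]:
    """Draw n tokens from the distribution using a seeded, local generator."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=n)


probs = {"nice": 0.5, "agréable": 0.3, "bien": 0.2}  # hypothetical distribution
run_a = sample_tokens(probs, n=5, seed=42)
run_b = sample_tokens(probs, n=5, seed=42)
print(run_a == run_b)  # True: the same seed gives the same draws
```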

In [7]:
prompt = "Translate English to French: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=50,       # keep length consistent
    temperature=1.0,     # higher randomness
    do_sample=True       # required for temperature to take effect
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

The weather is nice today.


##### Test Case 4: Top‑p Sampling (Conservative)
This run enables sampling and sets the `top_p` parameter to restrict sampling diversity.  
- **Prompt:** `"Translate English to French: The weather is nice today."`  
- **Parameter change:** `top_p=0.7` (conservative sampling: only the most likely tokens whose cumulative probability reaches 0.7 remain candidates)  
- **Purpose:** Compare against baseline (Cell 5) to see how limiting sampling affects phrasing.
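
Top‑p (nucleus) filtering can be sketched on a toy distribution: tokens are ranked by probability, the smallest prefix whose cumulative probability reaches `top_p` is kept, and the kept probabilities are renormalized before sampling. This is a simplified illustration with invented probabilities. `top_p_filter` is a hypothetical helper, not the library's implementation.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution.
probs = {"agréable": 0.5, "beau": 0.25, "bien": 0.125, "nice": 0.0625, "proche": 0.0625}
for p in (0.7, 0.9, 1.0):
    print(f"top_p={p}: candidates = {list(top_p_filter(probs, p))}")
```

With these numbers, `top_p=0.7` keeps two candidates, `top_p=0.9` keeps four, and `top_p=1.0` keeps all five, which mirrors the conservative, balanced, and very open settings used in the top‑p test cases.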

In [8]:
prompt = "Translate English to French: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=50,
    top_p=0.7,        # only parameter changed
    do_sample=True    # required for top-p to take effect
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

La météo est agréable aujourd'hui.


##### Test Case 5: Top‑p Sampling (Balanced)
This run modifies only the `top_p` parameter to allow moderate diversity in token selection.  
- **Prompt:** `"Translate English to French: The weather is nice today."`  
- **Parameter change:** `top_p=0.9` (balanced sampling, broader candidate pool than conservative run)  
- **Purpose:** Compare against the conservative run (Cell 8) to see how increasing diversity changes phrasing while maintaining coherence.

In [9]:
prompt = "Translate English to French: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=50,
    top_p=0.9,        # only parameter changed
    do_sample=True    # required for top-p to take effect
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

La température serait légèrement proche.


##### Test Case 6: Top‑p Sampling (Very Open)
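This run modifies only the `top_p` parameter to allow the maximum diversity that `top_p` permits.  
- **Prompt:** `"Translate English to French: The weather is nice today."`  
- **Parameter change:** `top_p=1.0` (nucleus filtering is effectively disabled; other sampling defaults such as `top_k` may still limit the candidate pool). Since 1.0 is also the library default for `top_p`, this run should use effectively the same sampling settings as Test Case 3.  
- **Purpose:** Compare against Cells 8 (conservative) and 9 (balanced) to see how maximum diversity affects phrasing, creativity, and coherence.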
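
Since the test cases differ only in their generation parameters, the comparison can also be organized as a loop over named settings. The sketch below shows one possible structure. `run_experiments` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in that only echoes its settings rather than calling the model.

```python
from typing import Callable


def run_experiments(
    generate_fn: Callable[..., str],
    prompt: str,
    configs: dict[str, dict],
) -> dict[str, str]:
    """Call generate_fn once per named configuration and collect the outputs."""
    return {label: generate_fn(prompt, **params) for label, params in configs.items()}


def fake_generate(prompt: str, **params) -> str:
    """Stand-in for a real generation call; it only reports the settings it received."""
    settings = ", ".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"<output for [{settings}]>"


# Settings taken from the test cases in this notebook.
configs = {
    "baseline": {},
    "temperature=0.7": {"do_sample": True, "temperature": 0.7},
    "temperature=1.0": {"do_sample": True, "temperature": 1.0},
    "top_p=0.7": {"do_sample": True, "top_p": 0.7},
    "top_p=0.9": {"do_sample": True, "top_p": 0.9},
    "top_p=1.0": {"do_sample": True, "top_p": 1.0},
}

results = run_experiments(
    fake_generate, "Translate English to French: The weather is nice today.", configs
)
for label, text in results.items():
    print(f"{label:>16}: {text}")
```

In the notebook, `fake_generate` would be replaced by a function that tokenizes the prompt, calls `model.generate` with the given parameters, and decodes the result.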

In [11]:
prompt = "Translate English to French: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_length=50,
    top_p=1.0,        # no nucleus filtering (maximum diversity for top-p)
    do_sample=True    # required for top-p to take effect
)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Le temps est très bien.
