# 🧠 In-Class Exercise: Building Your First LLM Chatbot

Welcome!  
This notebook is your hands-on lab for **Session 3 – Building and Tuning LLMs**.  
You’ll go step by step through the concepts we discussed in class (pipelines, parameters, and model behavior) and try small experiments to understand how LLMs actually “think.”  

#### What you will practice
1) Choosing the right model for the task  
2) Controlling outputs using generation parameters  
3) Creating a simple Streamlit LLM app

Let’s get started 🚀  

In [1]:
from transformers import pipeline

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# Classroom-safe prompt wrapper (use for all tasks)
def safe_prompt(task: str) -> str:
    return (
        "You are a helpful assistant for students. "
        "Keep responses polite, non-explicit, and suitable for a classroom.\n"
        f"Task: {task}\n"
        "Answer:"
    )


## 🧩 Concept 1: The Hugging Face Pipeline

**Theory Recap:**  
A pipeline is like a “ready-made tool” that connects your text input to an AI model.  
Instead of manually loading weights and tokenizers, we use a *pipeline* for common tasks such as summarization, translation, and text generation.
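
To see what the pipeline saves you from, here is a minimal sketch of the manual route. It assumes the `google/flan-t5-small` checkpoint and a PyTorch backend; the helper name `manual_generate` is made up for illustration, while the `Auto*` classes and `generate` call are standard `transformers` APIs.

```python
def manual_generate(prompt: str, model_name: str = "google/flan-t5-small",
                    max_new_tokens: int = 60) -> str:
    """Roughly what a text2text pipeline does, step by step (illustrative helper)."""
    # Imported inside the function so that defining it stays cheap.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # 1) Load the tokenizer and the model weights (downloaded on first use).
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # 2) Text -> token IDs (as PyTorch tensors).
    inputs = tokenizer(prompt, return_tensors="pt")

    # 3) The model generates new token IDs.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # 4) Token IDs -> text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `manual_generate("Summarize: Artificial intelligence helps automate tasks.")` should behave much like a one-line pipeline call, just with more code to maintain.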

Different models are trained for different purposes:
- `google/flan-t5-small` → instruction-following / Q&A  
- `distilgpt2` → text continuation  
- `microsoft/DialoGPT-small` → dialogue/chat  

Each model has its own strengths.  

In [12]:
# ✅ Example: Create a simple pipeline and use it

# Step 1: Choose your task and model
task = "text2text-generation"
model_name = "google/flan-t5-small"

# Step 2: Create the pipeline
gen = pipeline(task, model=model_name)

# Step 3: Try it out
prompt = "Summarize: Artificial intelligence helps automate tasks."
response = gen(prompt, max_new_tokens=60)
print("Output:", response[0]['generated_text'])

Output: Use artificial intelligence to help automate tasks.


In [None]:
# 🧠 TASK 1 (Guided Practice)
# Use a different model - distilgpt2
# 1. Change the task to "text-generation"
# 2. Use model_name = "distilgpt2"
# 3. Create your own prompt like "Once upon a time..."

# Your code below 👇


In [11]:
# ✅ Example: Compare FLAN vs DistilGPT2 on the same input

# Model 1: Instruction-following model
flan = pipeline("text2text-generation", model="google/flan-t5-small")

# Model 2: Autocomplete-style text generation model
distilgpt = pipeline("text-generation", model="distilgpt2")

# Same intent, different prompt styles
prompt_flan = "Give 3 tips for making a good first impression:"
prompt_gpt  = "How do I make a good first impression?"

response_flan = flan(prompt_flan, max_new_tokens=60)
response_gpt  = distilgpt(prompt_gpt, max_new_tokens=60)

print("FLAN says:")
print(response_flan[0]["generated_text"])

print("\ndistilGPT2 says:")
print(response_gpt[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


FLAN says:
Make sure you have a good first impression.

distilGPT2 says:
How do I make a good first impression? Or perhaps someone else could provide some guidance as to the concept.


The first part of the script is that the script is going to look like this:
What the hell is going to happen to the man who wrote it?
It has been announced that if the men in the film


In [None]:
# 💡 Task 2 (Critical Thinking)
# What happens if you swap the prompts for the two models?
# If you want **bullet points**, which model + prompt style is better?
# Try using "text2text-generation" instead of "text-generation" for distilgpt2. What happens?
# Write your observation in a comment below & discuss with your group 👇


## 🧩 Concept 2: Controlling Model Creativity

**Theory Recap:**  
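Parameters like `temperature`, `top_p`, and `max_new_tokens` control how “creative” or “focused” the model’s output is.  
- **Temperature**: randomness when sampling (close to 0 = nearly deterministic, around 1 = more creative).  
- **Top-p**: diversity of words considered; the model samples only from the smallest set of most likely tokens whose probabilities add up to `top_p`.  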
- **Max new tokens**: how long the response can be.

#### A) `max_new_tokens` (controls response length)
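
`max_new_tokens` is an upper limit, not a target: generation also stops as soon as the model emits its end-of-sequence token. The toy loop below sketches that idea in plain Python. The `fake_next_token` function and its canned word list stand in for a real model and are purely hypothetical.

```python
# Toy stand-in for a model: one canned token per step, then "<eos>".
CANNED = ["Cloud", "computing", "stores", "data", "online", ".", "<eos>"]

def fake_next_token(step: int) -> str:
    """Hypothetical 'model': returns the step-th canned token, then <eos> forever."""
    return CANNED[min(step, len(CANNED) - 1)]

def toy_generate(max_new_tokens: int) -> list[str]:
    """Generate until the token budget runs out or <eos> appears."""
    tokens = []
    for step in range(max_new_tokens):
        token = fake_next_token(step)
        if token == "<eos>":
            break  # the "model" decided it was done
        tokens.append(token)
    return tokens

print(toy_generate(3))    # cut off by the budget: ['Cloud', 'computing', 'stores']
print(toy_generate(120))  # stops early at <eos>, after 6 tokens
```

A small budget can cut a sentence off mid-thought, while a large budget does not force the model to keep talking.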

Syntax:
```python
response = gen(prompt, max_new_tokens=60)
```


In [27]:
# Code example

flan = pipeline("text2text-generation", model="google/flan-t5-base")

prompt = "Explain cloud computing in simple terms (student-friendly):"

response_short = flan(prompt, max_new_tokens=20)
response_long = flan(prompt, max_new_tokens=120)
# prompt = safe_prompt(task)

print("max_new_tokens = 20")
print(response_short[0]["generated_text"])

print("\nmax_new_tokens = 120")
print(response_long[0]["generated_text"])


max_new_tokens = 20
Cloud computing is a type of computing that uses a lot of resources, including computer resources,

max_new_tokens = 120
Cloud computing is a type of computing that uses a lot of resources, including computer resources, to store and store data.


In [14]:
### Task 1 (max_new_tokens)
# Play with 'max_new_tokens'
# 1. Generate a short version (20 tokens)
# 2. Generate a longer version (80 tokens)
# Observe the difference in length and tone.

prompt = "Describe a sunset."
# Your code below 👇



In [13]:
### Task 2 (max_new_tokens)
# You want a **short title generator**.
# Update the prompt + `max_new_tokens` so the output is:
# - **one short title only**

# Hint: ask for “exactly 1 title” + keep tokens low.


----------------------

#### B) `do_sample` (controls variation)

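- `do_sample=False` → deterministic with default settings (the most likely token is picked at each step)
- `do_sample=True`  → more varied (the next token is drawn at random according to the model's probabilities)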
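
The difference between the two modes can be sketched with a toy next-token distribution. The probabilities and the helper names `pick_greedy` and `pick_sampled` are invented for illustration; a real model produces a distribution like this over its whole vocabulary at every step.

```python
import random

# Hypothetical next-token probabilities for the prompt "The sky is".
next_token_probs = {"blue": 0.6, "clear": 0.25, "falling": 0.1, "purple": 0.05}

def pick_greedy(probs: dict[str, float]) -> str:
    """do_sample=False analogue: always take the most likely token."""
    return max(probs, key=probs.get)

def pick_sampled(probs: dict[str, float], rng: random.Random) -> str:
    """do_sample=True analogue: draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so the demo is repeatable
print([pick_greedy(next_token_probs) for _ in range(5)])        # "blue" every time
print([pick_sampled(next_token_probs, rng) for _ in range(5)])  # can differ from draw to draw
```

Greedy picking gives the same answer on every run; sampling trades that consistency for variety.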

Syntax:
```python
flan(prompt, do_sample=True)
```


In [36]:
# Code Example

flan_base = pipeline("text2text-generation", model="google/flan-t5-base")

prompt = "Give 1 study tip for learning Python:"

response_false = flan_base(prompt, do_sample=False, max_new_tokens=60)
response_true = flan_base(prompt, do_sample=True, max_new_tokens=60)
# prompt = safe_prompt(task)

print("do_sample = False")
print(response_false[0]["generated_text"])

print("\ndo_sample = True")
print(response_true[0]["generated_text"])

do_sample = False
Using a calculator, you can find the number of lines in a line.

do_sample = True
The best way of practicing a concept is to use a mixture of several notes. Practice is key; and the best way to learn it is to study it often


##### Task 1 (do_sample)
Run `do_sample=True` **3 times**. Did you get different outputs?

Then run `do_sample=False` **3 times**. Did it stay similar?

##### 💡 Task 2 (Critical Thinking)
You are building a **FAQ bot** for students where answers must be consistent.
Which is better: `do_sample=True` or `do_sample=False`? 

Discuss with your group. In which other situations would you choose `do_sample=True`, and in which `do_sample=False`?  
Hint: consistency → deterministic.

-------------------------------------

#### C) `temperature` (randomness when sampling)

Only has an effect when `do_sample=True`.

- low (0.2) → safer / more predictable
- higher (1.2) → more variety / higher risk
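
Mechanically, temperature rescales the model's raw scores (logits) before they are turned into probabilities. The sketch below shows that step in plain Python; the three logits are made-up numbers, and `softmax_with_temperature` is an illustrative helper, not a library function.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores (logits) for three candidate tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.7, 1.2):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

At a low temperature the top token takes almost all of the probability, so sampling rarely strays from it. At a higher temperature the probabilities flatten out and less likely tokens get picked more often.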

Syntax:
```python
gen(prompt, do_sample=True, temperature=0.7)
```


In [38]:
# ✅ Example: Comparing low vs high temperature

generator = pipeline("text2text-generation", model="google/flan-t5-small")

prompt = "Write a one-line quote about teamwork."

response_low = generator(prompt, do_sample=True, temperature=0.2, max_new_tokens=30)
response_high = generator(prompt, do_sample=True, temperature=1.2, max_new_tokens=30)

print("Low temperature:", response_low[0]["generated_text"])
print("High temperature:", response_high[0]["generated_text"])



Low temperature: The teamwork is a key part of the team's success.
High temperature: The company was going to get their approval for the upcoming campaign in March 2017, this time with a larger group than expected, and we saw 


In [39]:
# 🧠 Task 1 (Guided Practice)
# Play with 'temperature'
# 1. Generate responses with temperature values equal to 0.2, 0.7 and 1.2
# Observe the difference, discuss in your group how it differs.

prompt = "Describe a sunset."
# Your code below 👇


In [40]:
# 💡 Task 2 (Critical Thinking)
# You want a **creative tagline generator** for a college event poster.
# Should temperature go up or down? Why?
# Code your creative tagline generator and discuss with the group.
# Hint: creativity usually increases when temperature increases.


-------------------------------

#### D) `top_p` (nucleus sampling: size of choice pool)

Only has an effect when `do_sample=True`.

- `top_p=0.1` → narrow choices (more focused)
- `top_p=0.9` → wider choices (more diverse)
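
The “choice pool” idea can be sketched as a filter over a toy distribution. This is a simplified illustration of nucleus sampling, not the library's exact implementation; the probabilities and the helper name `top_p_filter` are invented.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their total probability reaches top_p,
    then renormalize so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, running_total = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        running_total += p
        if running_total >= top_p:
            break  # the pool is now large enough
    return {token: p / running_total for token, p in kept.items()}

# Hypothetical next-token probabilities.
next_token_probs = {"success": 0.5, "growth": 0.3, "trust": 0.15, "pizza": 0.05}

print(top_p_filter(next_token_probs, 0.1))  # only the top token survives
print(top_p_filter(next_token_probs, 0.9))  # the three most likely tokens survive
```

With a small `top_p` the model effectively has one option left, which is why low values look almost deterministic. A larger `top_p` lets more candidates in while still cutting off the unlikely tail.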

Syntax:
```python
gen(prompt, do_sample=True, top_p=0.9)
```


In [44]:
# Code Example

generator = pipeline("text2text-generation", model="google/flan-t5-base")

prompt = "Give exactly 3 bullet points on why teamwork matters:"

response_low_p = generator(prompt, do_sample=True, temperature=0.3, top_p=0.1, max_new_tokens=80)
response_high_p = generator(prompt, do_sample=True, temperature=0.3, top_p=0.9, max_new_tokens=80)

print("top_p = 0.1")
print(response_low_p[0]["generated_text"])

print("\ntop_p = 0.9")
print(response_high_p[0]["generated_text"])


top_p = 0.1
Teamwork is the key to success.

top_p = 0.9
Teamwork is a key component of a successful business.


In [None]:
# 🧠 Task 1 (Guided Practice)
# Play with 'top_p'
# 1. Generate responses with top_p values equal to 0.2, 0.5 and 0.9
# Which gives the best balance of clarity + variety? Discuss in your group.

prompt = "Enter your prompt here"
# Your code below 👇


In [None]:
# 💡 Task 2 (Critical Thinking)
# You are generating **formal email replies** for students.
# Should `top_p` be lower or higher? Why?
# Hint: formal + consistent → narrower pool.


---------------------------

### Use all parameters together (applied)

Use case: “Study helper”  
Output should be:
- polite
- exactly 3 bullet points
- not too long
- not too random

Tune:
- `max_new_tokens`
- `do_sample`
- `temperature`
- `top_p`
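
One way to keep these four settings organized is to group them into named presets and pass them to the pipeline as keyword arguments. The sketch below is only a suggestion: the preset names and values are hypothetical starting points, and `generation_kwargs` is an illustrative helper.

```python
# Hypothetical presets: names and values are illustrative starting points,
# not recommended settings.
PRESETS = {
    "precise": {"do_sample": False, "max_new_tokens": 60},
    "balanced": {"do_sample": True, "temperature": 0.5, "top_p": 0.7, "max_new_tokens": 60},
    "creative": {"do_sample": True, "temperature": 1.0, "top_p": 0.95, "max_new_tokens": 80},
}

def generation_kwargs(mode: str, **overrides) -> dict:
    """Return a copy of the preset for `mode`, with optional overrides applied."""
    if mode not in PRESETS:
        raise ValueError(f"Unknown mode {mode!r}; choose from {sorted(PRESETS)}")
    return {**PRESETS[mode], **overrides}

print(generation_kwargs("balanced"))
print(generation_kwargs("precise", max_new_tokens=30))
```

With a pipeline such as `flan` already created, a call would look like `flan(prompt, **generation_kwargs("balanced"))`.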


In [54]:
prompt = (
    "Give exactly 3 bullet points on how to prepare for exams effectively:"
)
# prompt = safe_prompt(task)

response = flan(
    prompt,
    max_new_tokens=60,
    do_sample=True,
    temperature=0.5,
    top_p=0.7
)[0]["generated_text"]

print(response)


Exams are a huge part of our lives, so it is important to make sure you are prepared for them.


-------------------------------

## 🧩 Concept 3: Build a simple Streamlit App

**Theory Recap:**  
Streamlit helps us build simple web UIs for our chatbot:  
students can type questions and see AI responses in real time.  

We won’t build the full app here, but let’s preview how the logic works.


- **Step 1**: Create a new file: `student_app.py`

- **Step 2**: Copy the code from the next cell into that file.

- **Step 3 (run in terminal)**: 
```bash
streamlit run student_app.py
```


If your file is inside a folder (example: Demo/):
```bash
streamlit run Demo/student_app.py
```


#### ✅ Example: Basic Streamlit chatbot (run later as .py)

##### Save this as student_app.py and run: streamlit run student_app.py

##### Template to copy into the .py file
```python
# Copy this into: student_app.py

import streamlit as st
from transformers import pipeline

st.set_page_config(page_title="Student LLM App", page_icon="🧠")
st.title("🧠 Student LLM App")

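# Cache the pipeline so the model loads once, not on every Streamlit rerun
@st.cache_resource
def load_pipeline():
    return pipeline("text2text-generation", model="google/flan-t5-small")

gen = load_pipeline()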

def safe_prompt(task: str) -> str:
    return (
        "You are a helpful assistant for students. "
        "Keep responses polite, non-explicit, and suitable for a classroom.\n"
        f"Task: {task}\n"
        "Answer:"
    )

max_new_tokens = st.slider("Max new tokens", 16, 200, 80, 8)
user_text = st.text_area("Enter a classroom-safe question/task:")

if st.button("Generate"):
    if not user_text.strip():
        st.warning("Please type something first.")
    else:
        prompt = safe_prompt(user_text)
        output = gen(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
        st.subheader("LLM Output")
        st.write(output)

# Easy edits:
# 1) Switch model to google/flan-t5-base
# 2) Change the prompt to request bullets / steps / 1 sentence
# 3) Change max_new_tokens default
```


In [None]:
# 🧠 Task 1 (Homework Practice)
# 1. In your Streamlit app file, add a sidebar slider for 'max_new_tokens'.
# 2. Let the user control the answer length interactively.
# 3. Test how the response changes for small vs large values.
# (You don’t have to run Streamlit here, just plan the code.)


In [None]:
# 💡 Task 2 (Critical Thinking)
# Think about a new feature you’d add if you had more time.
# Example ideas:
# - A dropdown to choose between models
# - A toggle for “creative” vs “precise” mode
# - Saving previous chat responses
# Write your idea below 👇


## 🧭 Wrap-Up & Look-Ahead Reflection

### 🎓 What You Learned Today
- How to use the **Hugging Face pipeline** to connect prompts → models  
- How **parameters** like temperature, top-p, and max new tokens change model behavior  
- How to pick the **right model** for a given task (FLAN-T5 vs DistilGPT2 vs DialoGPT)  
- How a simple **Streamlit UI** turns code into an interactive chatbot  

---

🎯 **Challenge for the Curious:**  
Write down one “pain point” you noticed while testing your chatbot today.  
What felt limited or frustrating, and what would you love to improve if you could?


---

### 💬 Think About…
1. Our chatbot only knows what’s inside its model; it can’t answer questions about *your* documents or notes.  
   - How could we make it read PDFs or data files and respond using that knowledge?  

2. Today’s bot handles one message at a time.  
   - What if you wanted several “mini-bots” (one to search, one to plan, one to answer) all working together?  

3. Our model always starts fresh, so it forgets previous questions.  
   - How could a chatbot remember your last conversation or build on context?  

4. Curious minds only 🚀  
   - Ever wondered how these models can be **fine-tuned** on your own data, or how voice assistants use them in real time?  