# 🚀 BDAA Project Series: Intro to AI - LLMs
## Live Demo Notebook (matches slide presentation)

**This notebook is designed for live coding demos alongside the slide presentation.**  
🎯 **Goal:** Hands-on understanding through experimentation

---

## 📋 Agenda (follows slides exactly)
1. **Setup** → Get environment ready
2. **NLP & LLMs** → The big picture (concepts)
3. **🚀 Transformers Pipeline Tour** → Live coding session
4. **🎛️ Inference Basics** → Temperature, sampling, beams
5. **⚠️ Bias & Limitations** → Critical awareness demo
6. **🎯 Wrap-up** → Key takeaways

---

### 🎬 Demo Flow
- **Presenter:** Run cells during slides, invite audience input
- **Audience:** Suggest text examples, parameters to try
- **Format:** Interactive coding, not just showing results

---

### 🔗 Quick Navigation
- [🛠️ Setup](#setup-cell)
- [🚀 Pipeline Demos](#pipeline-demos)
- [🎛️ Inference Tuning](#inference-tuning)  
- [⚠️ Bias Examples](#bias-examples)

## 🛠️ Setup {#setup-cell}

**📢 LIVE DEMO START:** Run this cell first to verify environment

In [1]:
# 🛠️ Environment Setup
# Run this first! Installs core libraries if needed.

print("🔄 Setting up environment...")

try:
    import transformers, datasets, accelerate, sentencepiece  # noqa: F401
    print("✅ All packages already installed!")
except ImportError:
    print("📦 Installing required packages...")
    %pip -q install -U transformers datasets accelerate sentencepiece
    import transformers, datasets, accelerate, sentencepiece  # noqa: F401

from transformers import pipeline
import warnings
warnings.filterwarnings('ignore')  # Keep demo clean

print("✅ Environment ready!")
print(f"🤖 Transformers version: {transformers.__version__}")
print("🎯 Ready for live demos!")

🔄 Setting up environment...
✅ All packages already installed!
✅ Environment ready!
🤖 Transformers version: 4.56.2
🎯 Ready for live demos!


## 🚀 Pipeline Demos {#pipeline-demos}

**📢 LIVE CODING SECTION:** This matches the "Transformers in Practice" slides.

**🎬 Demo Format:**
- I'll run each cell live
- **Audience:** Suggest your own text examples!  
- **Goal:** See how easy transformers are to use

# Hugging Face Course — Chapters 1–2 (Combined)

**Source notebooks merged:**
- `Transformers,_what_can_they_do_.ipynb`
- `Bias_and_limitations.ipynb`

> Code cells come from the original notebooks, with some adapted for the live demo. Cells are grouped by source notebook.

# Part A — Transformers: what can they do?

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [2]:
# Quote the extras spec so zsh does not treat the square brackets as a glob pattern
!pip install datasets evaluate "transformers[sentencepiece]"


In [3]:
# 💻 LIVE DEMO: Sentiment Analysis (Quick Win!)
# 🎯 Slide match: "Sentiment Analysis (quick win)"

print("🚀 Demo 1: Sentiment Analysis Pipeline")
print("💡 This automatically downloads and uses a pre-trained model")
print()

from transformers import pipeline

# Create the pipeline (first run downloads model)
classifier = pipeline("sentiment-analysis")

# Try with a positive example
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print("📝 Input: 'I've been waiting for a HuggingFace course my whole life.'")
print(f"🎯 Result: {result}")
print()

# 🎬 AUDIENCE INTERACTION PROMPT
print("💻 YOUR TURN: What text should we analyze next?")
print("   (I'll take suggestions from the audience!)")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


🚀 Demo 1: Sentiment Analysis Pipeline
💡 This automatically downloads and uses a pre-trained model



Device set to use mps:0


📝 Input: 'I've been waiting for a HuggingFace course my whole life.'
🎯 Result: [{'label': 'POSITIVE', 'score': 0.9598049521446228}]

💻 YOUR TURN: What text should we analyze next?
   (I'll take suggestions from the audience!)


In [4]:
# 🎯 Multiple examples at once
# 🎬 LIVE: Let's try both positive and negative examples

# Batch processing - more efficient!
examples = [
    "I've been waiting for a HuggingFace course my whole life.",  # Positive
    "I hate this so much!"  # Negative
]

results = classifier(examples)

print("📊 Batch Results:")
for text, result in zip(examples, results):
    emoji = "😊" if result['label'] == 'POSITIVE' else "😞"
    # Add an ellipsis only when the text is actually truncated
    snippet = text if len(text) <= 50 else text[:50] + "..."
    print(f"{emoji} '{snippet}' → {result['label']} ({result['score']:.3f})")

print()
print("💡 Key insight: Works great for English text!")
print("🤔 Question for audience: What languages might this struggle with?")

📊 Batch Results:
😊 'I've been waiting for a HuggingFace course my whol...' → POSITIVE (0.960)
😞 'I hate this so much!' → NEGATIVE (0.997)

💡 Key insight: Works great for English text!
🤔 Question for audience: What languages might this struggle with?


In [5]:
# 🎯 LIVE DEMO: Zero-shot Classification
# 🎯 Slide match: "Zero-shot Classification"

print("🚀 Demo 2: Zero-shot Classification")
print("🎯 Use your own labels - no training needed!")
print()

from transformers import pipeline

classifier = pipeline("zero-shot-classification")

# Example from slides
result = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

print("📝 Text: 'This is a course about the Transformers library'")
print("🏷️ Candidate labels: education, politics, business")
print("📊 Results:")
for label, score in zip(result['labels'], result['scores']):
    bar = "█" * int(score * 20)  # Visual bar
    print(f"  {label:10} {score:.3f} {bar}")

print()
print("💻 AUDIENCE CHALLENGE:")
print("   1. Give me any text")  
print("   2. Give me 3-4 categories")
print("   3. Let's see what happens!")

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


🚀 Demo 2: Zero-shot Classification
🎯 Use your own labels - no training needed!



Device set to use mps:0


📝 Text: 'This is a course about the Transformers library'
🏷️ Candidate labels: education, politics, business
📊 Results:
  education  0.845 ████████████████
  business   0.112 ██
  politics   0.043 

💻 AUDIENCE CHALLENGE:
   1. Give me any text
   2. Give me 3-4 categories
   3. Let's see what happens!


In [6]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use a large, high-speed computer to create simple and easy to use user interfaces.\n\nIn this course, we will show you how to create basic user interfaces using PowerShell. We will show you how to create a user interface using a PowerShell script.\n\nIn this course, we will show you how to create an interface using PowerShell. We will show you how to create a user interface using a PowerShell script.\n\nIn this course, we will show you how to create an interface using PowerShell. We will show you how to create an interface using a PowerShell script.\n\nIn this course, we will show you how to create an interface using PowerShell. We will show you how to create an interface using a PowerShell script.\n\nIn this course, we will show you how to create an interface using PowerShell. We will show you how to create an interface using a PowerShell script.\n\nIn this course, we will show you how to create an interface using PowerShel

In [7]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
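    # NOTE: max_length is ignored in this run. The pipeline's default
    # max_new_tokens (256) takes precedence (see the warning in the output);
    # pass max_new_tokens=30 instead to actually cap the generated length.
    max_length=30,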
    num_return_sequences=2,
)

Device set to use mps:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'In this course, we will teach you how to set up the OpenOffice 365 project.\n\n\n\n\n\nIf you prefer to learn more about OpenOffice 365, you can follow the full course here.\n\nIf you are interested in learning more about OpenOffice 365, you can follow the full course here.\nWith the latest release, you can learn about the new features of OpenOffice 365, and how to implement them in your projects.\nThis course will help you learn about the new features of OpenOffice 365, and how to implement them in your projects.'},
 {'generated_text': 'In this course, we will teach you how to create a virtual reality headset in real-time.\n\n\n\nIf you like this course, please go to the courses listed below or subscribe to our newsletter for more courses.'}]

In [8]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


[{'score': 0.19620011746883392,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052743315696716,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

In [9]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


[{'entity_group': 'PER',
  'score': np.float32(0.9981694),
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': np.float32(0.9796019),
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': np.float32(0.9932106),
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]

In [10]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


{'score': 0.6949756145477295, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}

In [11]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'summary_text': ' The number of engineering graduates in the United States has declined in recent years . China and India graduate six and eight times as many traditional engineers as the U.S. does . Rapidly developing economies such as China continue to encourage and advance the teaching of engineering . There are declining offerings in engineering subjects dealing with infrastructure, infrastructure, the environment, and related issues .'}]

In [12]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

Device set to use mps:0


[{'translation_text': 'This course is produced by Hugging Face.'}]

---

## 🎛️ Inference Tuning {#inference-tuning}

**📢 LIVE DEMO:** This matches the "Inference Basics" slides

**🎯 Learning Goals:**
- See how temperature affects creativity
- Compare top-k vs top-p sampling  
- Understand beam search vs sampling
- Feel the difference in generation quality

**🎬 Demo Strategy:** Run same prompt with different settings, compare outputs live!

**Notes**
- *Prefill* = initial pass over the prompt (impacts **TTFT**: time-to-first-token).
- *Decode* = token-by-token generation (impacts **TPOT**: time-per-output-token).
- Sampling: `temperature`, `top_k`, `top_p` (nucleus), plus repetition penalties.
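- Beam search keeps the `num_beams` highest-scoring partial sequences at each step and returns the best one overall: deterministic and often more coherent, but more compute per token.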
- KV cache reuses attention keys/values to speed up decoding (model/pipeline may manage this internally).
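
To make the sampling knobs above concrete, here is a minimal pure-Python sketch of how `temperature`, `top_k`, and `top_p` reshape a next-token distribution. The toy logits and the helper names (`softmax`, `apply_top_k`, `apply_top_p`) are assumptions made up for illustration; this is not how `transformers` implements sampling internally.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_top_k(probs, k):
    """Keep the k most likely tokens, zero out the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def apply_top_p(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

# Hypothetical logits for a 5-token vocabulary
toy_logits = [2.0, 1.0, 0.5, 0.1, -1.0]
cold = softmax(toy_logits, temperature=0.2)
hot = softmax(toy_logits, temperature=1.1)

print("T=0.2    :", [round(p, 3) for p in cold])
print("T=1.1    :", [round(p, 3) for p in hot])
print("top_k=2  :", [round(p, 3) for p in apply_top_k(hot, 2)])
print("top_p=0.9:", [round(p, 3) for p in apply_top_p(hot, 0.9)])
```

At low temperature almost all probability mass lands on the top token, which is why low-temperature outputs look repetitive and predictable. Top-k and top-p then trim the unlikely tail before a token is drawn.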


In [13]:
# 🎛️ LIVE DEMO: Sampling Controls
# 🎯 Slide match: "Sampling Controls"

print("🚀 Demo: Temperature & Sampling Effects")
print("🎯 Same prompt, different parameters = different personalities!")
print()

from transformers import pipeline

# Use a small, fast model for demos
gen = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")

# Fixed prompt for comparison
prompt = "Write a single friendly sentence about learning Transformers:"

print(f"📝 Prompt: '{prompt}'")
print("=" * 60)

# 🥶 Conservative (low temperature; still sampled, so not fully deterministic)
print("🥶 LOW TEMPERATURE (0.2) - Conservative & Predictable")
out_calm = gen(prompt, max_new_tokens=30, temperature=0.2, top_p=0.95, do_sample=True)
print(f"   Result: {out_calm[0]['generated_text'].split(prompt)[1].strip()}")
print()

# 🔥 Creative (higher temperature) 
print("🔥 HIGH TEMPERATURE (1.1) - Creative & Wild")
out_creative = gen(prompt, max_new_tokens=30, temperature=1.1, top_p=0.9, do_sample=True)
print(f"   Result: {out_creative[0]['generated_text'].split(prompt)[1].strip()}")
print()

print("💡 Key insight: Temperature controls creativity vs consistency!")
print("🤔 Audience question: When would you want high vs low temperature?")

🚀 Demo: Temperature & Sampling Effects
🎯 Same prompt, different parameters = different personalities!



Device set to use mps:0
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


📝 Prompt: 'Write a single friendly sentence about learning Transformers:'
🥶 LOW TEMPERATURE (0.2) - Conservative & Predictable


Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


   Result: The Last Knight.

I’m not sure if I’ve ever written a sentence like that before. I’m not sure if I

🔥 HIGH TEMPERATURE (1.1) - Creative & Wild
   Result: Optimus Prime's "Eh, he's a robot, so that means it's probably going to be a shitty Transformers cartoon".

💡 Key insight: Temperature controls creativity vs consistency!
🤔 Audience question: When would you want high vs low temperature?


In [14]:
# 🎯 LIVE DEMO: Beam Search vs Sampling
# 🎯 Slide match: "Beam Search"

print("🚀 Demo: Beam Search - More Coherent, Higher Cost")
print("🎯 Explores multiple possibilities, picks the best overall")
print()

# Same model, beam search settings
gen_beam = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
prompt = "Summarize the benefits of transfer learning in one sentence:"

print(f"📝 Prompt: '{prompt}'")
print("=" * 60)

# 🎯 Beam search (deterministic, more coherent)
print("🎯 BEAM SEARCH (num_beams=4) - Coherent & Structured")
out_beam = gen_beam(prompt, max_new_tokens=35, num_beams=4, do_sample=False)
print(f"   Result: {out_beam[0]['generated_text'].split(prompt)[1].strip()}")
print()

# 🎲 Compare with sampling
print("🎲 SAMPLING (temperature=0.7) - More Varied")  
out_sample = gen_beam(prompt, max_new_tokens=35, temperature=0.7, do_sample=True)
print(f"   Result: {out_sample[0]['generated_text'].split(prompt)[1].strip()}")
print()

print("💡 Trade-off: Beam search = coherent but slower, Sampling = faster but variable")
print("🤔 Which would you use for a chatbot vs creative writing?")

🚀 Demo: Beam Search - More Coherent, Higher Cost
🎯 Explores multiple possibilities, picks the best overall



Device set to use mps:0
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


📝 Prompt: 'Summarize the benefits of transfer learning in one sentence:'
🎯 BEAM SEARCH (num_beams=4) - Coherent & Structured


Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


   Result: Transfer learning is a powerful technique that allows us to leverage the knowledge gained from one task to improve performance on another task. It involves using pre-trained models to

🎲 SAMPLING (temperature=0.7) - More Varied
   Result: Transfer learning is a powerful technique that allows us to leverage the knowledge gained from one task to solve a similar task. These benefits can be summarized as follows:

💡 Trade-off: Beam search = coherent but slower, Sampling = faster but variable
🤔 Which would you use for a chatbot vs creative writing?
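
The trade-off above can be sketched on a tiny example. The snippet below compares greedy decoding with beam search over a hypothetical next-token table (`TOY_LM`, invented for this illustration); it is a toy, not the `transformers` beam search implementation, and it ignores details such as length penalties and end-of-sequence handling.

```python
import math

# Hypothetical toy "language model": next-token probabilities given the previous token
TOY_LM = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 0.9, "w": 0.1},
}

def greedy(start, steps):
    """Always take the single most likely next token."""
    seq, logp = [start], 0.0
    for _ in range(steps):
        token, p = max(TOY_LM[seq[-1]].items(), key=lambda kv: kv[1])
        seq.append(token)
        logp += math.log(p)
    return seq, logp

def beam_search(start, steps, num_beams):
    """Keep the num_beams best partial sequences at every step."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            for token, p in TOY_LM[seq[-1]].items():
                candidates.append((seq + [token], logp + math.log(p)))
        # Rank all expansions by total log-probability and prune to num_beams
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0]

greedy_seq, greedy_logp = greedy("<s>", steps=2)
beam_seq, beam_logp = beam_search("<s>", steps=2, num_beams=2)

print("greedy:", greedy_seq, round(math.exp(greedy_logp), 2))
print("beam:  ", beam_seq, round(math.exp(beam_logp), 2))
```

In this table greedy decoding commits to `A` (0.6) and ends with a sequence probability of 0.30, while two beams keep `B` alive and recover `B z` at 0.36. The price is that every step expands and scores all surviving beams, which is where the extra cost comes from.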


**KV Cache note**  
Most modern generation backends (including `transformers` with compatible models) maintain a key‑value cache under the hood during decoding to avoid recomputing attention for previous tokens. You don't usually need to toggle anything explicitly in a basic `pipeline`, but when building custom generation loops, look for `use_cache=True` and check memory usage when pushing long contexts.
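
As a rough illustration of why the cache helps, the sketch below runs a toy single-head attention loop twice: once re-projecting every previous token at each step, and once appending only the new token's key and value to a cache. The `project` function, its fixed "weights", and the token vectors are hypothetical stand-ins, not part of any real model.

```python
import math

def project(token_vec):
    """Stand-in for a layer's key/value projections (hypothetical fixed weights)."""
    key = [2.0 * x for x in token_vec]
    value = [x + 1.0 for x in token_vec]
    return key, value

def attend(query, keys, values):
    """Scaled dot-product attention of one query over all available keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical token vectors; each one also serves as its own query
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# Without a cache: re-project every previous token at every decoding step
no_cache_out, no_cache_projections = [], 0
for t in range(len(tokens)):
    pairs = [project(tok) for tok in tokens[: t + 1]]
    no_cache_projections += len(pairs)
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    no_cache_out.append(attend(tokens[t], keys, values))

# With a cache: project only the new token and append it
k_cache, v_cache = [], []
cache_out, cache_projections = [], 0
for tok in tokens:
    key, value = project(tok)
    cache_projections += 1
    k_cache.append(key)
    v_cache.append(value)
    cache_out.append(attend(tok, k_cache, v_cache))

print("same outputs:", no_cache_out == cache_out)
print("projections :", no_cache_projections, "vs", cache_projections)
```

Both loops produce identical attention outputs, but the uncached loop performs 10 projections for 4 tokens while the cached loop performs 4. The cache lists grow by one entry per generated token, which is the memory cost to watch with long contexts.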


## ⚠️ Bias Examples {#bias-examples}

**📢 CRITICAL DEMO:** This matches the "Bias and Limitations" slides

**⚠️ Important Context:**
- We're going to see problematic bias in action
- This is **educational** - to raise awareness  
- **Goal:** Understand why evaluation and mitigation matter
- **Real-world impact:** These biases affect actual applications

**🎯 Learning Objective:** Recognize that even powerful models carry societal biases

# Bias and limitations

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [15]:
# Quote the extras spec so zsh does not treat the square brackets as a glob pattern
!pip install datasets evaluate "transformers[sentencepiece]"


In [16]:
# ⚠️ CRITICAL DEMO: Gender Bias in Fill-Mask
# 🎯 Slide match: "Bias Example (Fill-Mask)"

print("⚠️ BIAS AWARENESS DEMO")
print("🎯 We'll see problematic stereotypes in a pre-trained model")
print("📚 This is EDUCATIONAL - to understand why bias matters")
print()

from transformers import pipeline

# Use BERT for fill-mask
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Test with gendered sentences
print("🔍 Testing: Occupation predictions by gender")
print("=" * 50)

# Male version
print("👨 Input: 'This man works as a [MASK].'")
result_man = unmasker("This man works as a [MASK].")
man_jobs = [r["token_str"] for r in result_man]
print(f"   Predictions: {man_jobs}")
print()

# Female version  
print("👩 Input: 'This woman works as a [MASK].'")
result_woman = unmasker("This woman works as a [MASK].")
woman_jobs = [r["token_str"] for r in result_woman]
print(f"   Predictions: {woman_jobs}")
print()

print("⚠️ NOTICE THE BIAS:")
print(f"   Men → {', '.join(man_jobs[:3])}")
print(f"   Women → {', '.join(woman_jobs[:3])}")
print()
print("💡 This reflects biases in the training data (for BERT: English Wikipedia and BookCorpus)")
print("🎯 Why this matters: Real applications can perpetuate stereotypes")
print("✅ Solutions: Bias evaluation, filtering, diverse teams, human oversight")

⚠️ BIAS AWARENESS DEMO
🎯 We'll see problematic stereotypes in a pre-trained model
📚 This is EDUCATIONAL - to understand why bias matters



Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


🔍 Testing: Occupation predictions by gender
👨 Input: 'This man works as a [MASK].'
   Predictions: ['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']

👩 Input: 'This woman works as a [MASK].'
   Predictions: ['nurse', 'maid', 'teacher', 'waitress', 'prostitute']

⚠️ NOTICE THE BIAS:
   Men → carpenter, lawyer, farmer
   Women → nurse, maid, teacher

💡 This reflects biases in the training data (for BERT: English Wikipedia and BookCorpus)
🎯 Why this matters: Real applications can perpetuate stereotypes
✅ Solutions: Bias evaluation, filtering, diverse teams, human oversight
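
One crude way to put a number on the gap shown above is to measure how much the two prediction lists overlap. The sketch below hard-codes the predictions from the demo output and computes a Jaccard overlap; the `jaccard` helper and the choice of metric are assumptions for illustration, not an established bias benchmark.

```python
# Predictions copied from the demo output above
man_jobs = ["carpenter", "lawyer", "farmer", "businessman", "doctor"]
woman_jobs = ["nurse", "maid", "teacher", "waitress", "prostitute"]

def jaccard(a, b):
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

shared = sorted(set(man_jobs) & set(woman_jobs))
overlap = jaccard(man_jobs, woman_jobs)

print(f"Shared predictions: {shared}")
print(f"Jaccard overlap: {overlap:.2f}")
```

An overlap of zero means changing a single word ("man" to "woman") replaced every one of the top five predictions. A serious evaluation would use many templates and a dedicated bias benchmark rather than one sentence pair.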


## 🎯 Wrap-up & Next Steps

**🎉 What we accomplished in this live demo:**

✅ **Environment Setup** → Got transformers working  
✅ **Pipeline Tour** → Tried 8 different tasks with zero training  
✅ **Inference Tuning** → Saw how parameters change behavior  
✅ **Bias Awareness** → Recognized real-world challenges  

---

### 🚀 Key Takeaways
1. **`pipeline()` = your best friend** → Quick path to working demos
2. **Choose architecture by task** → Encoder (classification, NER), decoder (text generation), or seq2seq (translation, summarization)
3. **Tune inference parameters** → Temperature, sampling, beams
4. **Always evaluate for bias** → Don't deploy without checking

---

### 🎯 Interactive Q&A
**💻 Let's try YOUR examples!**
- Bring any text you want to classify, generate, or analyze
- Suggest parameter combinations to experiment with
- Ask about specific use cases for your projects

---

### 📚 Continue Learning
- **Hugging Face Course:** Full chapters with more advanced topics
- **Model Hub:** 100,000+ models to explore
- **Datasets:** Pre-built datasets for training/evaluation
- **Community:** Forums, Discord, tutorials

**🎯 Remember:** Start simple with `pipeline()`, then dive deeper!