### üßë‚Äçüè´ Practical Lesson: Python for LLMs (Step 1 after OOP)

---

#### 1. Strings & Tokenization

📌 LLMs work with tokens (sub-pieces of text). Before introducing Hugging Face `transformers`, show students basic text processing in plain Python.

In [None]:
text = "ChatGPT is amazing at Python!"

# Simple tokenization (split by spaces)
tokens = text.split()
print("Tokens:", tokens)

# Lowercasing
tokens = [t.lower() for t in tokens]
print("Lowercased:", tokens)

# Count tokens
print("Number of tokens:", len(tokens))


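👉 Teaching point: Real LLM tokenizers are more advanced (they split text into subword units drawn from a learned vocabulary), but the idea is the same: text goes in, a list of tokens comes out.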
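
One limitation worth pointing out in class is that `split()` leaves punctuation glued to words ("Python!" stays one token). The sketch below is one possible next step, using a regular expression to separate words from punctuation. The helper name `simple_tokenize` is made up for this lesson and is not part of any library.

```python
import re

def simple_tokenize(text):
    """Split text into lowercase word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text.lower())

text = "ChatGPT is amazing at Python!"
print("split():        ", text.split())           # "Python!" keeps its "!"
print("simple_tokenize:", simple_tokenize(text))  # "!" becomes its own token
```

Students can compare the two outputs and count how many tokens each approach produces for the same sentence.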

--- 

#### 2. Word Embeddings (Manual)

📌 Show how words can be represented as vectors.

In [None]:
import numpy as np

# Fake word embeddings (in real LLMs these are learned)
embeddings = {
    "cat": np.array([1, 0, 0]),
    "dog": np.array([0.9, 0.1, 0]),
    "apple": np.array([0, 1, 0])
}

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

print("cat vs dog:", cosine_similarity(embeddings["cat"], embeddings["dog"]))
print("cat vs apple:", cosine_similarity(embeddings["cat"], embeddings["apple"]))


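👉 Teaching point: With these hand-picked vectors, “cat” and “dog” point in nearly the same direction, so their cosine similarity is close to 1, while “cat” and “apple” are orthogonal and score 0. Learned embeddings encode closeness in meaning in the same way.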
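
To make the formula itself visible, the same calculation can be written without NumPy and extended into a small "nearest word" ranking. This is only a sketch: the vectors are the same toy values as above, and `most_similar` is a hypothetical helper name chosen for this lesson.

```python
import math

# Same toy vectors as above, as plain lists (hand-picked, not learned)
embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "apple": [0.0, 1.0, 0.0],
}

def cosine_similarity(a, b):
    # dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word):
    """Rank every other word by similarity to `word`, highest first."""
    scores = [
        (other, cosine_similarity(embeddings[word], vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

for other, score in most_similar("cat"):
    print(f"cat vs {other}: {score:.3f}")
```

"dog" should rank first (about 0.994) and "apple" last (0.000), matching the NumPy version.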

---

#### 3. Using a Real Tokenizer (Hugging Face)

📌 Transition to the actual tools used with LLMs.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Large Language Models are powerful."
tokens = tokenizer.tokenize(text)
ids = tokenizer.convert_tokens_to_ids(tokens)

print("Tokens:", tokens)
print("Token IDs:", ids)

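👉 Teaching point: Show them subword tokenization: a word that is missing from the vocabulary is split into pieces, and continuation pieces carry a "##" prefix (for example "power" + "##ful"). Common words may remain a single token, so have students try rarer words as well.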
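
For students who ask how the "##" pieces come about, the sketch below imitates the greedy longest-match-first idea behind WordPiece-style tokenizers. It is a simplified illustration, not the Hugging Face implementation: the vocabulary `VOCAB` is hypothetical and deliberately tiny so that the splits are visible, and `wordpiece` is a made-up helper name.

```python
# Hypothetical toy vocabulary; real vocabularies hold tens of thousands of entries
VOCAB = {"token", "##ization", "##izer", "power", "##ful", "play", "##ing", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Greedy longest-match-first split of a single lowercase word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece fits, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

for w in ["tokenization", "powerful", "playing", "xyz"]:
    print(w, "->", wordpiece(w))
```

Adding "powerful" to `VOCAB` and rerunning shows why a frequent word can stay whole while a rare one is split.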

---

#### 4. Using a Small LLM for Inference

📌 First taste of running a model.

In [None]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

# max_length counts the prompt tokens plus the newly generated ones
result = generator("Artificial intelligence is", max_length=20, num_return_sequences=1)
print(result[0]["generated_text"])


👉 Teaching point: Students see how an LLM actually produces text: it extends the prompt one token at a time.
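
The "one token at a time" loop can be demonstrated without any model download. The sketch below is a toy stand-in, not how distilgpt2 works internally: it counts which word follows which in a tiny made-up corpus and always picks the most frequent follower (greedy decoding). The corpus and the `generate` helper are assumptions made for this lesson.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model learns from vastly more text
corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug .".split()

# Count which token follows which (a bigram table)
next_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_counts[current][following] += 1

def generate(prompt, max_new_tokens=4):
    """Extend the prompt one token at a time using the bigram counts."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = next_counts.get(tokens[-1])
        if not candidates:
            break  # the last token was never followed by anything in the corpus
        # Greedy decoding: append the most frequent follower
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(generate("the dog"))
```

A neural LLM replaces the count table with a network that scores every token in its vocabulary, but the outer loop (predict, append, repeat) has the same shape.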

---

#### 5. Mini-Project: Build a Q&A Assistant

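📌 Show LLM question answering over a given context (very simplified). In a full retrieval setup the context would first be fetched from a knowledge base; here it is supplied by hand.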

In [None]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = """
Python is a programming language created by Guido van Rossum.
It is widely used in AI, machine learning, and data science.
"""

question = "Who created Python?"
answer = qa_pipeline(question=question, context=context)

print("Answer:", answer["answer"])


👉 Teaching point: This connects text processing → embeddings → LLMs → an actual AI use case.
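
The retrieval half can be sketched in plain Python as well. The example below is a minimal illustration under stated assumptions: the three-sentence `knowledge_base` is invented for the lesson, and `retrieve` is a hypothetical helper that scores sentences by simple word overlap, where real systems typically compare embeddings instead.

```python
import re

# Hypothetical mini knowledge base: one fact per entry
knowledge_base = [
    "Python is a programming language created by Guido van Rossum.",
    "Hugging Face maintains the transformers library.",
    "Cosine similarity measures the angle between two vectors.",
]

def words(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, documents):
    """Return the document that shares the most words with the question."""
    question_words = words(question)
    return max(documents, key=lambda doc: len(question_words & words(doc)))

question = "Who created Python?"
context = retrieve(question, knowledge_base)
print("Retrieved context:", context)
```

The retrieved `context` can then be passed to `qa_pipeline(question=question, context=context)` from the cell above, so that students see both steps working together.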

---

#### 6. Exercises for Students

- Tokenize their own text using a Hugging Face tokenizer.
- Compare the cosine similarity between words they choose.
- Try changing the prompt for distilgpt2 and see what it generates.
- Create their own mini knowledge base (a few sentences of context) and ask it questions.