<a href="https://colab.research.google.com/github/burakozturan/bliss/blob/main/BlISS_Lab08_LLM_Capabilities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 08: Discovering What LLMs Can Do - 10 Core Capabilities

**Duration**: 120 minutes | **Prerequisites**: Lab 07 (Transformers) | **Cost**: $0

## 🎯 Learning Objectives
By the end of this module, you'll be able to:
- Use 10 fundamental LLM capabilities through Hugging Face
- Select appropriate models for different tasks
- Combine capabilities for research applications

## 📋 Quick Reference Card
Keep this open while working!

| ID | Capability | Task | Recommended Model | Memory | Example Use |
|---|---|---|---|---|---|
| A1 | Text Classification | Categorize text | `distilbert-base-uncased-finetuned-sst-2-english` | ~250MB | Sort papers by topic |
| A2 | NER | Find entities | `dslim/bert-base-NER` | ~400MB | Extract philosopher names |
| A3 | Zero-Shot | Flexible categories | `facebook/bart-large-mnli` | ~1.6GB | Custom classifications |
| B1 | Question Answering | Extract info | `distilbert-base-cased-distilled-squad` | ~250MB | Find specific claims |
| B2 | Summarization | Condense text | `sshleifer/distilbart-cnn-12-6` | ~1.2GB | Abstract papers |
| B3 | Feature Extraction | Text similarity | `sentence-transformers/all-MiniLM-L6-v2` | ~90MB | Find related concepts |
| C1 | Text Generation | Create text | `gpt2` | ~500MB | Brainstorm ideas |
| C2 | Translation | Convert languages | `Helsinki-NLP/opus-mt-en-de` | ~300MB | Access global philosophy |
| C3 | Chat Models | Dialogue | `microsoft/DialoGPT-small` | ~350MB | Interactive exploration |
| C4 | Text-to-Image* | Visualize | `runwayml/stable-diffusion-v1-5` | ~4GB | Concept visualization |

*Text-to-Image requires significant resources - we'll demonstrate conceptually

## Part 1: Setup (5 minutes)

In [None]:
# Install required libraries
!pip install -q transformers torch sentencepiece sentence-transformers

from transformers import pipeline
import warnings
warnings.filterwarnings('ignore')  # Hide unnecessary warnings

# Check if we have a GPU (Graphics Processing Unit) for faster processing
import torch
device = 0 if torch.cuda.is_available() else -1

if device == 0:
    print("🚀 GPU detected! Models will run faster!")
else:
    print("💻 Using CPU (works fine, just slower)")
    # 💡 Tip: In Colab, go to Runtime → Change runtime type → GPU

### Your First AI Pipeline

🤔 **What's a Pipeline?**

A pipeline is a program that:
1. Takes your text
2. Processes it through an AI model
3. Gives you useful results

Let's create your first one!
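
Before loading a real model, the three steps can be made concrete with a toy stand-in in plain Python. This is not how Hugging Face implements `pipeline`; the word lists and the `toy_sentiment_pipeline` name are invented for illustration. It only mirrors the flow: preprocess the text, score it, and return a result in the same `[{'label': ..., 'score': ...}]` shape used later in this lab.

```python
# Toy illustration only: a real pipeline uses a tokenizer and a neural network.
POSITIVE_WORDS = {"good", "help", "love", "wisdom"}      # hypothetical word list
NEGATIVE_WORDS = {"bad", "harm", "hate", "ignorance"}    # hypothetical word list

def preprocess(text):
    """Step 1: turn raw text into tokens (here: lowercase words)."""
    return [word.strip(".,!?") for word in text.lower().split()]

def score(tokens):
    """Step 2: the 'model' step, here just counting positive and negative words."""
    pos = sum(token in POSITIVE_WORDS for token in tokens)
    neg = sum(token in NEGATIVE_WORDS for token in tokens)
    return pos, neg

def postprocess(pos, neg):
    """Step 3: convert raw counts into a label and a confidence-like number."""
    total = pos + neg
    if total == 0:
        return [{"label": "NEUTRAL", "score": 0.0}]
    label = "POSITIVE" if pos >= neg else "NEGATIVE"
    return [{"label": label, "score": max(pos, neg) / total}]

def toy_sentiment_pipeline(text):
    """Chain the three steps, the way a pipeline chains its stages."""
    return postprocess(*score(preprocess(text)))

print(toy_sentiment_pipeline("AI can help humanity solve complex problems"))
```

The real pipelines in this lab follow the same outline, but the middle step is a trained transformer model rather than a word count.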

## Part 2: Category A - Understanding Text (30 minutes)

### A1. Text Classification - Categorizing Ideas

📚 **What is Text Classification?**
- Classification puts text into categories
- Like sorting mail into different folders

In [None]:
# 🎯 Your First AI Model - Sentiment Analysis
# This model can detect if a text's sentiment is positive or negative

# Load classifier
classifier = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=device)

# Minimal example
result = classifier("AI can help humanity solve complex problems")
print(f"Text: 'AI can help humanity solve complex problems'")
print(f"Result: {result[0]['label']} ({result[0]['score']:.2%})")

#### 🎯 Exercise A1: Classify Philosophical Statements
**Objective**: Analyze how AI perceives different philosophical claims

In [None]:
statements = [
    "The unexamined life is not worth living",
    "Knowledge is power",  # Trailing comma so new items can be added safely
    # TODO: Add more philosophical statements
]

# TODO: Classify each statement and count positive vs negative
# Expected outcome: A summary showing the distribution of sentiments

# Your code here

<details>
<summary>💡 Hint 1</summary>
Think about using a loop to process each statement
</details>

<details>
<summary>💡 Hint 2</summary>
Keep counters for positive and negative classifications
</details>

<details>
<summary>💡 Hint 3</summary>
Access the label with result[0]['label']
</details>

### A2. Named Entity Recognition - Finding Important Things

🔍 **What is NER (Named Entity Recognition)?**
- Finds and labels important things in text
- People, places, organizations (the model used here labels them PER, LOC, and ORG, plus MISC for other names; it does not tag dates)
- Like highlighting all the important names in a book

In [None]:
# Load NER model
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple", device=device)

# Minimal example
text = "Plato founded the Academy in Athens"
entities = ner(text)
for entity in entities:
    print(f"{entity['word']}: {entity['entity_group']}")

#### 🎯 Exercise A2: Build a Philosopher Database
**Objective**: Extract all people and places from philosophical texts

In [None]:
texts = [
    "Immanuel Kant was born in Königsberg in 1724",
    "Simone de Beauvoir met Jean-Paul Sartre in Paris",
    # TODO: Add 3 more texts about philosophers
]

# TODO: Extract all person (PER) and location (LOC) entities
# Create a dictionary: {entity_type: [list of unique entities]}
# Expected outcome: {'PER': [...], 'LOC': [...]}

# Your code here

<details>
<summary>💡 Hint 1</summary>
Process each text with the ner pipeline
</details>

<details>
<summary>💡 Hint 2</summary>
Check entity['entity_group'] for PER and LOC
</details>

### A3. Zero-Shot Classification - Flexible Categories

🎯 **What is Zero-Shot Classification?**
- Classify text into ANY categories you define
- No training needed - just describe the categories!
- Like having a librarian who can create new sections on demand

In [None]:
# Load zero-shot classifier
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=device)

# Minimal example
text = "What is the nature of consciousness?"
categories = ["ethics", "metaphysics", "epistemology"]
result = zero_shot(text, candidate_labels=categories)
print(f"Top category: {result['labels'][0]} ({result['scores'][0]:.2%})")

In [None]:
result
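
The `result` displayed above is a dictionary whose `labels` and `scores` lists are sorted from most to least likely, which is why index 0 gives the top category. The sketch below works on a hand-written dictionary of the same shape (the scores are made up, not real model output) and shows one way to keep only the labels above a cutoff. The `labels_above` helper and the 0.30 cutoff are illustrative choices.

```python
# Hand-written stand-in with the same keys as a zero-shot result.
# The scores are invented for illustration, not produced by a model.
example_result = {
    "sequence": "What is the nature of consciousness?",
    "labels": ["metaphysics", "epistemology", "ethics"],
    "scores": [0.70, 0.20, 0.10],
}

def labels_above(result, cutoff=0.30):
    """Return (label, score) pairs whose score is at least `cutoff`."""
    return [
        (label, score)
        for label, score in zip(result["labels"], result["scores"])
        if score >= cutoff
    ]

print(labels_above(example_result))
```

A cutoff like this is handy when a text might belong to several categories, or to none of the ones you defined.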

#### 🎯 Exercise A3: Create Custom Categories
**Objective**: Design and test your own classification system

In [None]:
# TODO: Create a classification system for philosophical texts
# 1. Define 5-7 categories relevant to your interests
# 2. Classify at least 5 different texts
# 3. Find which category appears most frequently

philosophical_texts = [
    # TODO: Add your texts here
]

my_categories = [
    # TODO: Add your categories here
]

# Your code here

<details>
<summary>💡 Hint 1</summary>
Categories could be: time periods, schools of thought, or topics
</details>

<details>
<summary>💡 Hint 2</summary>
Track results in a dictionary to count frequencies
</details>

### 🔧 Category A Synthesis
**Objective**: Combine classification and NER for comprehensive text analysis

In [None]:
def analyze_philosophical_text(text):
    """Combine multiple understanding capabilities"""
    # TODO: Use all three A-category tools to analyze a text
    # Return a dictionary with:
    # - sentiment (from A1)
    # - entities (from A2)
    # - philosophical branch (from A3)
    pass

# Test your function
sample_text = "Socrates taught Plato in Athens that wisdom comes from knowing one's ignorance"
# Your code here

## Part 3: Category B - Working with Knowledge (30 minutes)

### B1. Question Answering - Extract Information

❓ **What is Question Answering?**
- AI finds answers in a given text
- Like having a research assistant who reads for you
- Useful for: Literature review, finding specific information

In [None]:
# Load QA model
qa_model = pipeline("question-answering", model="distilbert-base-cased-distilled-squad", device=device)

# Minimal example
context = "The Stoics believed that virtue is the only true good."
question = "What did the Stoics believe?"
answer = qa_model(question=question, context=context)
print(f"Answer: {answer['answer']} (confidence: {answer['score']:.2%})")

#### 🎯 Exercise B1: Philosophical Q&A System
**Objective**: Extract key information from a philosophical text

In [None]:
context = """
[TODO: Paste a paragraph from a philosophical text here -
could be from Stanford Encyclopedia, a paper abstract, etc.]
"""

questions = [
    # TODO: Create 5 questions about your text
]

# TODO: Get answers for all questions
# For each answer, also print the confidence score
# Expected outcome: Q&A pairs with confidence levels

# Your code here

<details>
<summary>💡 Hint 1</summary>
Loop through your questions list
</details>

<details>
<summary>💡 Hint 2</summary>
Check if confidence is above 50% for reliable answers
</details>
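
The confidence check from Hint 2 can be sketched as a small filter. The answers below are hand-written dictionaries with the same `answer` and `score` keys the QA pipeline returns; the values are invented, `split_by_confidence` is a hypothetical helper, and the 0.5 threshold is only the rule of thumb from the hint, not a guarantee of correctness.

```python
def split_by_confidence(qa_pairs, threshold=0.5):
    """Split (question, answer_dict) pairs into reliable and uncertain lists."""
    reliable, uncertain = [], []
    for question, answer in qa_pairs:
        target = reliable if answer["score"] >= threshold else uncertain
        target.append((question, answer["answer"], answer["score"]))
    return reliable, uncertain

# Invented example values, shaped like the output of qa_model(...).
qa_pairs = [
    ("What did the Stoics believe?",
     {"answer": "virtue is the only true good", "score": 0.82}),
    ("Who founded Stoicism?",
     {"answer": "The Stoics", "score": 0.12}),
]

reliable, uncertain = split_by_confidence(qa_pairs)
for question, answer, score in reliable:
    print(f"{question} -> {answer} ({score:.0%})")
print(f"{len(uncertain)} answer(s) below the threshold")
```

An extractive QA model of this kind always returns some span from the context, so a low score is often the only sign that the answer is not actually in the text.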

### B2. Summarization - Condensing Knowledge

📝 **What is Summarization?**
- Condense long texts while preserving key ideas.

In [None]:
# Load summarizer
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", device=device)

# Minimal example
long_text = """The trolley problem is a thought experiment in ethics about a fictional scenario
in which an onlooker has the choice to save 5 people in danger of being hit by a trolley,
by diverting the trolley to kill just 1 person. The problem highlights the difference between
deontological and consequentialist ethical frameworks."""
summary = summarizer(long_text, max_length=50, min_length=10)
print(summary[0]['summary_text'])

#### 🎯 Exercise B2: Research Paper Digest
**Objective**: Create concise summaries of philosophical arguments

In [None]:
# TODO: Find and summarize 3 philosophical arguments
# Each should be 150-300 words originally
# Summarize to 30-50 words each

arguments = [
    # TODO: Add your philosophical arguments here
]

# TODO: Create summaries and calculate compression ratios
# Expected outcome: Original length → Summary length for each

# Your code here

<details>
<summary>💡 Hint 1</summary>
Use len(text.split()) to count words
</details>

<details>
<summary>💡 Hint 2</summary>
Compression ratio = original_length / summary_length
</details>
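
The two hints combine into a few lines of arithmetic. The sketch below uses a hand-written "summary" rather than model output, and the helper names `word_count` and `compression_ratio` are illustrative.

```python
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())

def compression_ratio(original, summary):
    """How many times shorter the summary is than the original, in words."""
    summary_words = word_count(summary)
    if summary_words == 0:
        raise ValueError("summary is empty")
    return word_count(original) / summary_words

# Hand-written example texts (the summary is not model output).
original = "The trolley problem is a thought experiment in ethics about a fictional scenario"
summary = "A thought experiment in ethics"

print(f"{word_count(original)} words -> {word_count(summary)} words "
      f"(ratio {compression_ratio(original, summary):.1f}x)")
```

Note that the summarizer's `max_length` and `min_length` are measured in tokens, not words, so word counts of its output will only roughly track those settings.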

### B3. Feature Extraction - Understanding Meaning

🧠 What is Feature Extraction?

- Converts text into numbers that capture semantic similarities
- Allows us to find similar texts
- Like creating a "fingerprint" of ideas
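
To see what "similarity" means numerically, the sketch below computes cosine similarity by hand on tiny made-up vectors. Real sentence embeddings have hundreds of dimensions; the three-dimensional vectors here are invented purely for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" (not real model output).
knowledge = [0.9, 0.1, 0.3]
wisdom = [0.8, 0.2, 0.4]
trolley = [0.1, 0.9, 0.0]

print(f"knowledge vs wisdom:  {cosine_similarity(knowledge, wisdom):.2f}")
print(f"knowledge vs trolley: {cosine_similarity(knowledge, trolley):.2f}")
```

When vectors already have length 1, the two norms are both 1 and cosine similarity reduces to a plain dot product.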

In [None]:
# Load feature extractor
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Minimal example
texts = ["Knowledge is power", "Power comes from knowledge"]
embeddings = embedder.encode(texts, normalize_embeddings=True)  # Unit-length vectors
similarity = np.dot(embeddings[0], embeddings[1])  # Dot product of unit vectors = cosine similarity
print(f"Similarity score: {similarity:.2f}")

#### 🎯 Exercise B3: Concept Similarity Explorer
**Objective**: Find which philosophical concepts are most related

In [None]:
concepts = [
    "free will",
    "determinism",
    "consciousness",
    "moral responsibility",
    "causation",
    # TODO: Add 5 more philosophical concepts
]

# TODO: Find the most similar pair of concepts
# Expected outcome: "X and Y are most similar (score: 0.XX)"

# Your code here

<details>
<summary>💡 Hint 1</summary>
You'll need to compare every pair of concepts
</details>

<details>
<summary>💡 Hint 2</summary>
Use nested loops or itertools.combinations
</details>

### 🔧 Category B Synthesis
**Objective**: Build a knowledge extraction pipeline

In [None]:
def extract_knowledge(text):
    """Extract structured knowledge from philosophical text"""
    # TODO: Combine B-category tools to:
    # 1. Summarize the text (B2)
    # 2. Generate 3 key questions and answer them (B1)
    # 3. Extract key concepts and find their similarities (B3)
    pass

# Test with a philosophical text
# Your code here

## Part 4: Category C - Creating and Connecting (30 minutes)

### C1. Text Generation - AI as a Writing Partner

✍️ **What is Text Generation?**
- AI continues or creates text based on a prompt
- Like having a writing assistant that knows many styles
- Useful for: Brainstorming, exploring ideas, creating examples

In [None]:
# Load generator
generator = pipeline("text-generation", model="gpt2", device=device)

# Minimal example
prompt = "The meaning of life is"
result = generator(prompt,
                   max_length=30,
                   num_return_sequences=1,
                   do_sample=True,   # Sampling must be on for temperature to have an effect
                   temperature=0.8   # Controls randomness (lower = more predictable, higher = more varied)
)
print(result[0]['generated_text'])

#### 🎯 Exercise C1: Philosophical Prompt Explorer
**Objective**: Explore how AI continues philosophical thoughts

In [None]:
prompts = [
    "The nature of reality is",
    "Consciousness arises from",
    # TODO: Add 3 more philosophical prompts
]

# TODO: Generate 2 completions for each prompt
# Use different temperatures (0.7 and 1.2) to see the difference
# Expected outcome: Compare creative vs conservative completions

# Your code here

<details>
<summary>💡 Hint 1</summary>
Temperature controls randomness: lower = more focused
</details>

<details>
<summary>💡 Hint 2</summary>
Use temperature parameter in generator()
</details>
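
Why does a lower temperature give more focused text? During sampling, the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The sketch below applies that to three made-up logits. The numbers and candidate words are invented, and this illustrates the general mechanism rather than GPT-2's exact generation code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

# Invented logits for three candidate next words.
words = ["happiness", "suffering", "bananas"]
logits = [2.0, 1.0, 0.1]

for temperature in (0.7, 1.2):
    probs = softmax_with_temperature(logits, temperature)
    summary = ", ".join(f"{w}: {p:.2f}" for w, p in zip(words, probs))
    print(f"temperature={temperature}: {summary}")
```

At 0.7 the top word takes a larger share of the probability than at 1.2, so unlikely words are sampled less often and the output reads as more conservative.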

### C2. Translation
Access philosophy across languages.

In [None]:
# Load translator (English to German)
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de", device=device)

# Minimal example
text = "I think therefore I am"
translation = translator(text)
print(f"English: {text}")
print(f"German: {translation[0]['translation_text']}")

#### 🎯 Exercise C2: Multilingual Philosophy
**Objective**: Translate key philosophical terms and verify accuracy

In [None]:
philosophical_terms = [
    "being",
    "existence",
    "knowledge",
    # TODO: Add 5 more terms
]

# TODO: Create an English-German philosophical dictionary
# For bonus: translate a famous quote and its context
# Expected outcome: Dictionary of translations

# Your code here

### C3. Text Generation - Chat Models

✍️ What are Chat Models?

- Interactive philosophical dialogue
- Built on text generation: the model continues a prompt that reads like a conversation
- Useful for: Interactive exploration of ideas
- Note: the `gpt2` model used below is not tuned for dialogue, so replies can drift off topic

In [None]:
# For this demo, we reuse the text-generation pipeline and treat the model's continuation as the reply
chatbot = pipeline("text-generation", model="gpt2", device=device)

# Minimal example
user_input = "What is wisdom?"
response = chatbot(user_input, max_length=50, pad_token_id=50256)
print(f"Human: {user_input}")
print(f"AI: {response[0]['generated_text'][len(user_input):].strip()}")

In [None]:
user_input = "What is the meaning of life?"
response = chatbot(user_input, max_length=550, pad_token_id=50256)
print(f"Human: {user_input}")
print(f"AI: {response[0]['generated_text'][len(user_input):].strip()}")
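
The cells above pass the user's question to the model as-is. For a multi-turn exchange, one common approach is to fold the earlier turns into the prompt as a transcript. The sketch below shows one possible layout. The `Human:`/`AI:` prefixes and the `build_prompt` helper are assumptions for illustration, not a format GPT-2 was trained on, so results may vary.

```python
def build_prompt(history, user_input):
    """Join earlier (speaker, text) turns and the new input into one prompt string."""
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"Human: {user_input}")
    lines.append("AI:")  # leave the reply open for the model to continue
    return "\n".join(lines)

history = [
    ("Human", "What is wisdom?"),
    ("AI", "Wisdom is knowing the limits of what you know."),  # invented reply
]
prompt = build_prompt(history, "Can wisdom be taught?")
print(prompt)

# With the chatbot from the cells above, the prompt could be used like this:
# response = chatbot(prompt, max_new_tokens=40, pad_token_id=50256)
# reply = response[0]["generated_text"][len(prompt):].strip()
```

After each reply, appending both the user's input and the model's reply to `history` keeps the transcript growing from turn to turn.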

#### 🎯 Exercise C3: Philosophical Dialogue
**Objective**: Create a multi-turn philosophical conversation

In [None]:
# TODO: Have a 4-turn conversation about a philosophical topic
# Track the dialogue history
# Evaluate if the AI stays on topic

conversation_history = []
# Your code here

### C4. Text-to-Image (Conceptual)

🎨 What is Text-to-Image?

- Generate images from text descriptions
- Visualize abstract concepts


In [None]:
# Note: Full text-to-image models require ~4GB+ RAM
# Here's how you would use one:

print("🎨 Text-to-Image Concept Demo")
print("\nIf running a full model, you would (using the diffusers library):")
print("1. Load model: pipe = StableDiffusionPipeline.from_pretrained('runwayml/stable-diffusion-v1-5')")
print("2. Generate: image = pipe('philosophical concept visualization').images[0]")
print("\nExample prompts for philosophical visualization:")

prompts = [
    "The allegory of the cave by Plato, dramatic lighting",
    "The trolley problem ethical dilemma, minimalist illustration",
    "Zen Buddhism meditation, peaceful abstract art",
    # TODO: Add 2 more visualization ideas
]

for prompt in prompts:
    print(f"- {prompt}")

### 🔧 Category C Synthesis
**Objective**: Build a creative philosophy exploration tool

In [None]:
def philosophical_exploration(topic):
    """Use creative tools to explore a philosophical topic"""
    # TODO: Given a topic, use C-category tools to:
    # 1. Generate 3 different perspectives on it (C1)
    # 2. Translate the key insight (C2)
    # 3. Create a dialogue about it (C3)
    # 4. Suggest a visual representation (C4)
    pass

# Test with a topic
# Your code here

## Part 5: Integration & Showcase (15 minutes)

### 🎯 Final Project: Your Research Assistant
**Objective**: Combine capabilities from all categories

In [None]:
class PhilosophyResearchAssistant:
    def __init__(self):
        # TODO: Initialize 3-4 key pipelines you'll use
        pass

    def analyze_text(self, text):
        """Comprehensive text analysis using A-category tools"""
        # TODO: Implement using A1, A2, A3
        pass

    def extract_insights(self, text):
        """Extract knowledge using B-category tools"""
        # TODO: Implement using B1, B2, B3
        pass

    def explore_concept(self, concept):
        """Creative exploration using C-category tools"""
        # TODO: Implement using C1, C2, C3
        pass

# Create and test your assistant
assistant = PhilosophyResearchAssistant()
# Your code here

## 🎯 Capability Combination Matrix

Which capabilities work well together?

| Combination | Use Case |
|---|---|
| A1 + A2 | Sentiment analysis of texts about specific philosophers |
| A3 + B1 | Classify text then ask targeted questions |
| B2 + B3 | Summarize multiple texts and find similar themes |
| A2 + C2 | Extract names and translate for international research |
| B1 + C1 | Answer questions then generate follow-up ideas |

## 📊 Lab Summary

You've learned 10 fundamental LLM capabilities:
- **Understanding** (A): Classification, NER, Zero-shot
- **Knowledge** (B): Q&A, Summarization, Similarity
- **Creating** (C): Generation, Translation, Chat, Images

### 🚀 Next Lab Preview
In Lab 09, we'll use these same capabilities with more powerful API-based models, learning:
- How to access GPT-4, Claude, and other large models
- Prompt engineering for better results
- Cost management and when to use APIs vs local models

### 🏠 Take-Home Challenge
Pick one capability that interests you most and:
1. Find a more specialized model on Hugging Face
2. Apply it to your research area
3. Document what worked and what didn't

## 🆘 Troubleshooting

| Issue | Solution |
|---|---|
| Out of memory | Use smaller models or reduce batch size |
| Slow performance | Enable GPU in Colab: Runtime → Change runtime type |
| Model not loading | Check internet connection and model name |
| Poor results | Try different models from the same category |
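
For the out-of-memory row, another option in a long notebook session is to release pipelines you have finished with before loading the next one. The sketch below is a minimal version of that idea. `free_memory` is a hypothetical helper, the dictionary stands in for the notebook's variables, and the `torch` import is guarded so the code also runs where PyTorch is not installed.

```python
import gc

def free_memory(namespace, names):
    """Remove the named objects from `namespace`, then release cached memory."""
    for name in names:
        namespace.pop(name, None)  # ignore names that are already gone
    gc.collect()  # let Python reclaim the now-unreferenced objects
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # hand cached GPU memory back to the driver
    except ImportError:
        pass  # PyTorch not installed; nothing more to do

# Stand-in for notebook variables (in a notebook you would pass globals()).
variables = {"zero_shot": object(), "summarizer": object(), "text": "keep me"}
free_memory(variables, ["zero_shot", "summarizer"])
print(sorted(variables))
```

In the notebook itself, the call would look like `free_memory(globals(), ["zero_shot", "summarizer"])`, run before loading the next large model.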


## Author:

[Yunus Emre Tapan](https://www.linkedin.com/in/yemretapan/)