<a href="https://colab.research.google.com/github/dslmllab/dSL-Lab-Coding-Challenge/blob/main/12_llm_Rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Large Language Models (LLMs) Tutorial with Challenges

## Table of Contents
1. Introduction to LLMs
2. Key Concepts and Architecture
3. Working with Pre-trained Models
4. Fine-tuning LLMs
5. Prompt Engineering
6. LLM Applications
7. Challenges and Exercises

---

## 1. Introduction to LLMs

Large Language Models (LLMs) are neural networks trained on large text corpora to model and generate human-like text. They reach state-of-the-art results on NLP tasks such as translation, summarization, and question answering, often without task-specific training.

### Key Characteristics:
- **Scale**: Billions of parameters (GPT-3: 175B; Llama 2: 7B-70B)
- **Pre-training**: Trained on massive text corpora with a self-supervised objective such as next-token prediction
- **Transfer Learning**: Can be fine-tuned for specific tasks
- **Few-shot Learning**: Can adapt to new tasks from a few examples placed in the prompt, without updating weights
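
To make the scale figures concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone occupy. The helper name `weight_memory_gb` and the bytes-per-parameter table are illustrative assumptions; activations, optimizer state, and the KV cache are deliberately ignored.

```python
# Rough weight-memory estimate: number of parameters x bytes per parameter.
# Hypothetical helper for illustration; ignores activations and optimizer state.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params, dtype="fp16"):
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for name, params in [("GPT-2 small", 124_439_808), ("7B model", 7e9), ("GPT-3", 175e9)]:
    print(f"{name}: {weight_memory_gb(params):.1f} GB in fp16")
```

Under these assumptions a 175B-parameter model needs on the order of 350 GB just to hold fp16 weights, which is why the rest of this tutorial works with the much smaller GPT-2.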

In [19]:
!pip3 install torch transformers numpy matplotlib seaborn



In [20]:
# Import necessary libraries
import torch
import numpy as np
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    pipeline,
    GPT2LMHeadModel,
    GPT2Tokenizer
)
import warnings
warnings.filterwarnings('ignore')

# Check if GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cpu


## 2. Key Concepts and Architecture

### Transformer Architecture
LLMs are based on the Transformer architecture, which uses self-attention mechanisms to process sequential data.

### Key Components:
1. **Self-Attention**: Lets each token weigh every other token in the sequence when computing its representation
2. **Positional Encoding**: Injects token-order information, since attention by itself ignores order
3. **Feed-Forward Networks**: Apply a position-wise MLP to each token's attention output
4. **Layer Normalization**: Normalizes activations to stabilize training
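
The self-attention step can be sketched without any dependencies. This is a minimal illustration of scaled dot-product attention on plain Python lists; it assumes the queries, keys, and values are given directly and leaves out the learned projection matrices, multiple heads, and the causal mask that a real Transformer layer uses.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy input: three 2-dimensional token vectors used as Q, K, and V at once
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(x, x, x)
print(attended)
```

Each output row is a weighted average of the value vectors, so tokens that score highly against a query contribute more to that query's new representation.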



## 3. Working with Pre-trained Models

Let's load and use a pre-trained language model from Hugging Face.

In [22]:
# Load a small pre-trained model (GPT-2)
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
model = model.to(device)

# Set pad token
tokenizer.pad_token = tokenizer.eos_token

print(f"Model loaded: {model_name}")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):,}")

Model loaded: gpt2
Number of parameters: 124,439,808


In [23]:
# Text generation function
def generate_text(prompt, max_length=100, temperature=0.8, top_p=0.9):
    """
    Generate text using the loaded model

    Args:
        prompt: Input text prompt
        max_length: Accepted for compatibility with existing callers but not
            used; the output is capped at 50 new tokens (see max_new_tokens)
        temperature: Sampling temperature; must be > 0. Values near 0 make
            the output close to greedy, higher values make it more random
        top_p: Nucleus sampling threshold (cumulative probability mass kept)
    """
    # Tokenize input
    inputs = tokenizer(prompt, return_tensors="pt", padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=50,  # Cap on generated tokens, excluding the prompt
            temperature=temperature,
            top_p=top_p,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode and return
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Test generation
prompt = "The future of artificial intelligence is"
generated = generate_text(prompt, max_length=100)
print(f"Prompt: {prompt}")
print(f"Generated: {generated}")

Prompt: The future of artificial intelligence is
Generated: The future of artificial intelligence is still uncertain, but the ability to do anything we want to with it is certainly in the making. It's possible that this is the key to a truly advanced machine. We can't be sure, but it's always good to have something that can
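
The `temperature` and `top_p` arguments used above both reshape the next-token distribution before a token is sampled. The following dependency-free sketch shows the idea on made-up logits; the function names are hypothetical and this is not the library's exact implementation.

```python
import math

def apply_temperature(logits, temperature):
    """Softmax of logits / temperature; temperature must be > 0."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
for t in (0.5, 1.0, 1.5):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], [round(p, 3) for p in top_p_filter(probs, 0.9)])
```

Lower temperatures concentrate probability on the top token, while nucleus filtering removes the low-probability tail before sampling.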


## 4. Fine-tuning LLMs

Fine-tuning continues training a pre-trained model on task- or domain-specific data so that its weights adapt to that task or domain.

In [24]:
# Example: Preparing data for fine-tuning
from torch.utils.data import Dataset, DataLoader

class TextDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length=128):
        self.texts = texts
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = self.texts[idx]
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        )
        return {
            'input_ids': encoding['input_ids'].squeeze(),
            'attention_mask': encoding['attention_mask'].squeeze()
        }

# Sample training data
training_texts = [
    "Machine learning is transforming industries.",
    "Natural language processing enables computers to understand human language.",
    "Deep learning models can learn complex patterns from data.",
    "Transformers have revolutionized NLP tasks."
]

# Create dataset
dataset = TextDataset(training_texts, tokenizer)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

print(f"Dataset size: {len(dataset)}")
print(f"Batch example shape: {next(iter(dataloader))['input_ids'].shape}")

Dataset size: 4
Batch example shape: torch.Size([2, 128])


In [25]:
# Simple fine-tuning loop (demonstration)
from torch.optim import AdamW

def train_step(model, dataloader, optimizer, device):
    model.train()
    total_loss = 0

    for batch in dataloader:
        # Move to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)

        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=input_ids
        )

        loss = outputs.loss
        total_loss += loss.item()

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return total_loss / len(dataloader)

# Initialize optimizer
optimizer = AdamW(model.parameters(), lr=5e-5)

# Train for one epoch (demonstration)
print("Training for 1 epoch...")
avg_loss = train_step(model, dataloader, optimizer, device)
print(f"Average loss: {avg_loss:.4f}")

Training for 1 epoch...
Average loss: 7.6350
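
One detail worth checking in the loop above: `TextDataset` pads every text to 128 tokens with the EOS token, and `labels=input_ids` means those padded positions also count toward the loss. Hugging Face causal language models ignore label positions set to `-100`, so a common remedy is to mask the padding. The helper below is a minimal sketch with a hypothetical name, and whether it accounts for the high loss reported above is an assumption, not something measured here.

```python
import torch

def mask_padding_labels(input_ids, attention_mask, ignore_index=-100):
    """Copy input_ids and replace padded positions with ignore_index."""
    labels = input_ids.clone()
    labels[attention_mask == 0] = ignore_index
    return labels

# Toy batch: three real tokens followed by two pad tokens (50256 is GPT-2's EOS id)
input_ids = torch.tensor([[15, 27, 42, 50256, 50256]])
attention_mask = torch.tensor([[1, 1, 1, 0, 0]])
print(mask_padding_labels(input_ids, attention_mask))
```

Inside `train_step`, this would be used as `labels=mask_padding_labels(input_ids, attention_mask)` in place of `labels=input_ids`.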


## 5. Prompt Engineering

Prompt engineering is the practice of crafting inputs that steer an LLM toward the desired output without changing its weights.

In [26]:
# Prompt engineering examples
class PromptTemplates:
    @staticmethod
    def zero_shot(task, input_text):
        return f"{task}: {input_text}"

    @staticmethod
    def few_shot(task, examples, input_text):
        prompt = f"{task}\n\n"
        for ex in examples:
            prompt += f"Input: {ex['input']}\nOutput: {ex['output']}\n\n"
        prompt += f"Input: {input_text}\nOutput:"
        return prompt

    @staticmethod
    def chain_of_thought(question):
        return f"{question}\n\nLet's think step by step:"

# Example: Sentiment analysis with few-shot learning
sentiment_examples = [
    {"input": "This movie was fantastic!", "output": "Positive"},
    {"input": "I really didn't enjoy the food.", "output": "Negative"},
    {"input": "The weather is okay today.", "output": "Neutral"}
]

test_text = "The service was excellent and the staff were friendly."
prompt = PromptTemplates.few_shot(
    "Classify the sentiment of the following text",
    sentiment_examples,
    test_text
)

print("Few-shot prompt:")
print(prompt)
print("\nModel output:")
# A low temperature keeps sampling close to greedy, which suits classification
print(generate_text(prompt, temperature=0.1))

Few-shot prompt:
Classify the sentiment of the following text

Input: This movie was fantastic!
Output: Positive

Input: I really didn't enjoy the food.
Output: Negative

Input: The weather is okay today.
Output: Neutral

Input: The service was excellent and the staff were friendly.
Output:

Model output:
Classify the sentiment of the following text

Input: This movie was fantastic!
Output: Positive

Input: I really didn't enjoy the food.
Output: Negative

Input: The weather is okay today.
Output: Neutral

Input: The service was excellent and the staff were friendly.
Output: Negative

Input: The weather is okay today.
Output: Positive

Output: Negative


Input: The weather is okay today.

Output: Negative
Output: Negative

Output: Positive

Output: Negative
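
The raw output above repeats the prompt and then keeps producing extra `Input:`/`Output:` pairs. One possible way to recover only the predicted label is to strip the prompt and inspect the first non-empty line of the continuation. The helper `extract_label` is a hypothetical sketch, and it assumes the decoded text begins with the prompt verbatim.

```python
def extract_label(generated_text, prompt, allowed=("Positive", "Negative", "Neutral")):
    """Return the first allowed label found right after the prompt, else None."""
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text  # fallback when the prompt is not echoed exactly
    for line in continuation.splitlines():
        line = line.strip()
        if not line:
            continue
        for label in allowed:
            if line.lower().startswith(label.lower()):
                return label
        return None  # first non-empty line is not a known label
    return None

demo_prompt = "Input: The service was excellent.\nOutput:"
demo_output = demo_prompt + " Negative\n\nInput: The weather is okay today.\nOutput: Positive"
print(extract_label(demo_output, demo_prompt))
```

Post-processing like this only cleans up the format; it does not make the prediction itself correct, as the misclassified example above shows.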


## 6. LLM Applications

Let's explore some practical applications of LLMs.

In [27]:
# Application 1: Text Summarization
def summarize_text(text, model, tokenizer, max_summary_length=50):
    prompt = f"Summarize the following text in one sentence:\n\n{text}\n\nSummary:"
    return generate_text(prompt, max_length=len(prompt.split()) + max_summary_length)

# Example
long_text = """
Artificial intelligence has made significant strides in recent years,
particularly in the field of natural language processing. Large language
models like GPT, BERT, and T5 have demonstrated remarkable capabilities
in understanding and generating human-like text. These models are trained
on vast amounts of data and can perform various tasks such as translation,
summarization, and question answering without task-specific training.
"""

summary = summarize_text(long_text, model, tokenizer)
print("Original text:")
print(long_text)
print("\nSummary:")
print(summary)

Original text:

Artificial intelligence has made significant strides in recent years, 
particularly in the field of natural language processing. Large language 
models like GPT, BERT, and T5 have demonstrated remarkable capabilities 
in understanding and generating human-like text. These models are trained 
on vast amounts of data and can perform various tasks such as translation, 
summarization, and question answering without task-specific training.


Summary:
Summarize the following text in one sentence:


Artificial intelligence has made significant strides in recent years, 
particularly in the field of natural language processing. Large language 
models like GPT, BERT, and T5 have demonstrated remarkable capabilities 
in understanding and generating human-like text. These models are trained 
on vast amounts of data and can perform various tasks such as translation, 
summarization, and question answering without task-specific training.


Summary:


This is a unique method of trainin

In [28]:
# Application 2: Code Generation
def generate_code(description):
    prompt = f"# Python function that {description}\ndef"
    return generate_text(prompt, max_length=150, temperature=0.2)

# Example
code_description = "calculates the factorial of a number recursively"
generated_code = generate_code(code_description)
print("Generated code:")
print(generated_code)

Generated code:
# Python function that calculates the factorial of a number recursively
def sum(x,y) (x,y,y) = sum(x,y)
The function is called with the following arguments:
The first argument is the number of the number of arguments to the function. The second argument is


In [29]:
# Application 3: Question Answering
def answer_question(context, question):
    prompt = f"""Context: {context}

Question: {question}

Answer:"""
    return generate_text(prompt, max_length=len(prompt.split()) + 50, temperature=0.3)

# Example
context = """The Transformer architecture was introduced in the paper 'Attention is All You Need'
by Vaswani et al. in 2017. It revolutionized NLP by replacing recurrent layers with
self-attention mechanisms, allowing for better parallelization and capturing long-range dependencies."""

question = "When was the Transformer architecture introduced?"
answer = answer_question(context, question)
print(f"Question: {question}")
print(f"Answer: {answer}")

Question: When was the Transformer architecture introduced?
Answer: Context: The Transformer architecture was introduced in the paper 'Attention is All You Need' 
by Vaswani et al. in 2017. It revolutionized NLP by replacing recurrent layers with 
self-attention mechanisms, allowing for better parallelization and capturing long-range dependencies.

Question: When was the Transformer architecture introduced?

Answer: The Transformer architecture was introduced in the paper 'The Transformer Architecture'

by Vaswani et al. in 2017. It revolutionized NLP by replacing recurrent layers with 

self-attention mechanisms.

Question:


## 7. Challenges and Exercises

Now it's time to test your understanding with these challenges!

### Challenge 1: Implement Temperature Sampling

Implement a function that demonstrates how temperature affects text generation. Generate the same prompt with different temperature values and compare the outputs.

In [30]:
# Challenge 1: Your code here
def compare_temperatures(prompt, temperatures=(0.1, 0.5, 1.0, 1.5)):
    """
    Generate text with different temperature values and compare outputs

    TODO:
    1. For each temperature value, generate text
    2. Display the results side by side
    3. Analyze how temperature affects creativity/randomness
    """
    # Your implementation here
    pass

# Test your function
# compare_temperatures("The meaning of life is")

### Challenge 2: Build a Custom Few-Shot Classifier

Create a few-shot classifier for a custom task (e.g., classifying programming languages from code snippets).

In [31]:
# Challenge 2: Your code here
class FewShotClassifier:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.examples = []

    def add_example(self, input_text, label):
        """
        Add a training example for few-shot learning
        """
        # Your implementation here
        pass

    def classify(self, input_text):
        """
        Classify the input text using few-shot learning
        """
        # Your implementation here
        pass

# Test your classifier
# classifier = FewShotClassifier(model, tokenizer)
# Add examples and test classification

### Challenge 3: Implement Beam Search

Implement beam search decoding for text generation and compare it with greedy decoding.
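
Before wiring beam search to GPT-2, it may help to see the idea on a toy problem. The sketch below uses a hypothetical hand-written table of next-token probabilities instead of a real model, and it omits length normalization. It is meant as a warm-up, not as the challenge solution.

```python
import math

# Hypothetical next-token distributions, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"dog": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_width=2, max_steps=5):
    """Return (tokens, log_prob) of the best sequence found by beam search."""
    beams = [(["<s>"], 0.0)]
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == "</s>":
                candidates.append((tokens, score))  # finished beams carry over
                continue
            for tok, p in TOY_MODEL[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the beam_width highest-scoring sequences overall
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(t[-1] == "</s>" for t, _ in beams):
            break
    return beams[0]

print("greedy:", beam_search(beam_width=1))
print("beam  :", beam_search(beam_width=2))
```

With a beam width of 1 the search is greedy and commits to "the" first, while a width of 2 keeps "a" alive long enough to find a sequence with higher total probability.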

In [32]:
# Challenge 3: Your code here
def beam_search_generate(model, tokenizer, prompt, beam_width=3, max_length=50):
    """
    Implement beam search for text generation

    TODO:
    1. Tokenize the prompt
    2. Maintain top-k sequences at each step
    3. Expand each sequence and keep top-k overall
    4. Return the best sequence
    """
    # Your implementation here
    pass

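# Compare with greedy decoding
# prompt = "The future of technology"
# beam_output = beam_search_generate(model, tokenizer, prompt)
# generate_text samples with do_sample=True, so temperature=0.0 is not valid there.
# For greedy decoding, call model.generate directly with do_sample=False:
# inputs = tokenizer(prompt, return_tensors="pt").to(device)
# greedy_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False,
#                             pad_token_id=tokenizer.eos_token_id)
# greedy_output = tokenizer.decode(greedy_ids[0], skip_special_tokens=True)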

### Challenge 4: Prompt Optimization

Design and test different prompt templates for a specific task (e.g., translation, style transfer) and evaluate which works best.

In [33]:
# Challenge 4: Your code here
def evaluate_prompts(task_description, test_cases, prompt_templates):
    """
    Evaluate different prompt templates for a task

    TODO:
    1. Design at least 3 different prompt templates
    2. Test each template on the test cases
    3. Implement a scoring mechanism
    4. Return the best-performing template
    """
    # Your implementation here
    pass

# Example task: Style transfer (formal to casual)
# test_cases = [
#     "I would like to request your assistance with this matter.",
#     "Please find the attached document for your review."
# ]
# prompt_templates = [
#     # Template 1, Template 2, Template 3...
# ]

### Challenge 5: Build a Simple RAG System

Implement a basic Retrieval-Augmented Generation (RAG) system that retrieves relevant context before generating answers.
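
As a warm-up, the retrieval half can be sketched with nothing more than word overlap. The function names and the Jaccard scoring below are illustrative assumptions, and the prompt layout mirrors the `Context / Question / Answer` format used in the question-answering example earlier.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_by_overlap(query, documents, k=2):
    """Rank documents by Jaccard overlap with the query's word set."""
    q = tokenize(query)
    scored = []
    for doc in documents:
        d = tokenize(doc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query, context_docs):
    """Place the retrieved documents in front of the question."""
    context = "\n".join(context_docs)
    return f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"

docs = [
    "The Transformer architecture was introduced in 2017.",
    "GPT-2 has 124 million parameters.",
    "Beam search keeps several candidate sequences.",
]
question = "When was the Transformer architecture introduced?"
print(build_prompt(question, retrieve_by_overlap(question, docs)))
```

Word overlap is a weak relevance signal, so a fuller solution might replace it with TF-IDF or embedding similarity.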

In [34]:
# Challenge 5: Your code here
class SimpleRAG:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.knowledge_base = []

    def add_document(self, document):
        """
        Add a document to the knowledge base
        """
        # Your implementation here
        pass

    def retrieve(self, query, k=3):
        """
        Retrieve top-k relevant documents for the query

        TODO:
        1. Implement a simple similarity metric (e.g., word overlap)
        2. Return top-k most relevant documents
        """
        # Your implementation here
        pass

    def generate_answer(self, query):
        """
        Generate an answer using retrieved context
        """
        # Your implementation here
        pass

# Test your RAG system
# rag = SimpleRAG(model, tokenizer)
# Add some documents and test question answering

### Challenge 6: Analyze Model Biases

Design experiments to test for potential biases in the language model and propose mitigation strategies.

In [35]:
# Challenge 6: Your code here
def analyze_bias(model, tokenizer, bias_type='gender'):
    """
    Analyze potential biases in model outputs

    TODO:
    1. Design test prompts that might reveal biases
    2. Generate responses and analyze patterns
    3. Quantify bias if possible
    4. Suggest mitigation strategies
    """
    # Your implementation here
    pass

# Example test cases for gender bias
# test_prompts = [
#     "The nurse said",
#     "The engineer said",
#     "The CEO decided to",
#     "The secretary was"
# ]

### Challenge 7: Implement Perplexity Calculation

Calculate the perplexity of the model on a given text corpus to evaluate model performance.
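
To build intuition before involving the model, perplexity can be computed by hand from the probabilities a model assigned to each observed token. The probabilities below are made up for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-probability of the observed tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

confident = [0.9, 0.8, 0.95, 0.85]  # model assigns high probability to each token
uncertain = [0.1, 0.05, 0.2, 0.1]   # model is frequently surprised
uniform = [0.25] * 4                # uniform guess over a 4-token vocabulary

for name, probs in [("confident", confident), ("uncertain", uncertain), ("uniform", uniform)]:
    print(f"{name}: {perplexity(probs):.2f}")
```

A uniform guess over N tokens gives a perplexity of exactly N, which is a handy sanity check. With a Hugging Face causal language model, the `loss` returned when `labels` are passed is a mean cross-entropy per token, so `exp(loss)` is one way to obtain the same quantity.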

In [36]:
# Challenge 7: Your code here
def calculate_perplexity(model, tokenizer, text_corpus):
    """
    Calculate perplexity of the model on a text corpus

    Perplexity = exp(average negative log-likelihood)

    TODO:
    1. Tokenize the text corpus
    2. Calculate log probabilities for each token
    3. Compute average negative log-likelihood
    4. Return perplexity
    """
    # Your implementation here
    pass

# Test on sample texts
# sample_texts = [
#     "The quick brown fox jumps over the lazy dog.",
#     "Machine learning is a subset of artificial intelligence.",
#     "asdfjkl qwerty zxcvbn"  # Random text for comparison
# ]

### Challenge 8: Create a Dialogue System

Build a simple dialogue system that maintains context across multiple turns of conversation.

In [37]:
# Challenge 8: Your code here
class DialogueSystem:
    def __init__(self, model, tokenizer, max_history=5):
        self.model = model
        self.tokenizer = tokenizer
        self.max_history = max_history
        self.conversation_history = []

    def add_turn(self, speaker, text):
        """
        Add a conversation turn to history
        """
        # Your implementation here
        pass

    def generate_response(self, user_input):
        """
        Generate a response considering conversation history

        TODO:
        1. Format conversation history as context
        2. Create appropriate prompt
        3. Generate response
        4. Update conversation history
        """
        # Your implementation here
        pass

    def reset_conversation(self):
        """Reset conversation history"""
        self.conversation_history = []

# Test the dialogue system
# dialogue = DialogueSystem(model, tokenizer)
# Simulate a multi-turn conversation

## Bonus Challenges

### Advanced Challenge 1: Implement LoRA (Low-Rank Adaptation)
Research and implement a simple version of LoRA for efficient fine-tuning.
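
As a starting point, LoRA freezes a weight matrix W of shape (d_out, d_in) and learns a low-rank update B·A, where B has shape (d_out, r) and A has shape (r, d_in). The arithmetic below is a sketch of the resulting parameter savings; the 768 x 768 shape matches GPT-2 small's hidden size, and the rank of 8 is an arbitrary illustrative choice.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # fine-tuning W directly
    lora = rank * (d_out + d_in)   # training B (d_out x r) and A (r x d_in) only
    return full, lora

full, lora = lora_param_counts(768, 768, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```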

### Advanced Challenge 2: Build a Token Prediction Visualizer
Create a visualization tool that shows the top-k predicted tokens at each generation step.

### Advanced Challenge 3: Implement Constrained Generation
Build a system that generates text with constraints (e.g., must include certain words, follow a specific pattern).

## Resources for Further Learning

1. **Papers to Read:**
   - "Attention Is All You Need" (Vaswani et al., 2017)
   - "Language Models are Few-Shot Learners" (Brown et al., 2020; the GPT-3 paper)
   - "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (Devlin et al., 2019)

2. **Useful Libraries:**
   - Hugging Face Transformers
   - LangChain for LLM applications
   - OpenAI API for GPT models

3. **Online Resources:**
   - Hugging Face Course
   - Fast.ai Practical Deep Learning
   - The Illustrated Transformer

4. **Practice Platforms:**
   - Kaggle NLP competitions
   - Hugging Face Model Hub
   - Papers with Code