[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dz-web3/DS-Tech-2026spring/blob/main/Module8_LLM_Finetuning/LLM_Finetuning.ipynb)

**Click the badge above to open this notebook in Google Colab!**

# Module 8: Fine-Tuning Large Language Models (LLMs)

**Data Science for Business (Technical), Spring 2026**

---

## Learning Objectives

By the end of this module, you will be able to:

1. **Explain** the difference between pre-training, fine-tuning, and prompting
2. **Understand** when fine-tuning is the right choice vs. other approaches
3. **Perform** hands-on fine-tuning using Hugging Face and LoRA
4. **Evaluate** the business value and trade-offs of fine-tuning LLMs

---

## Why This Matters for Business

Large Language Models like GPT-4, Claude, and Llama are transforming how businesses operate. But **off-the-shelf models don't always fit your specific needs**. Fine-tuning allows you to:

- 🎯 **Customize** model behavior for your domain (legal, medical, customer service)
- 💰 **Reduce costs** by using smaller, specialized models instead of expensive large ones
- 🔐 **Maintain control** over your data and model behavior
- ⚡ **Improve performance** on specific tasks your business cares about

## 1. Setting Up Google Colab Pro (Free for NYU Students)

### 🎓 NYU Students: Get Free Colab Pro!

Google offers **free Colab Pro subscriptions** for students at U.S. higher education institutions.

**To claim your free subscription:**

1. Go to [colab.research.google.com](https://colab.research.google.com)
2. Click on the gear icon (⚙️) → "Colab Pro"
3. Select "Colab Pro for Education"
4. Verify your student status using your NYU email
5. You'll receive a **1-year free subscription** with more compute resources

### 🖥️ Enabling GPU for This Notebook

Fine-tuning requires a GPU. Here's how to enable it:

1. Go to **Runtime** → **Change runtime type**
2. Set **Hardware accelerator** to **T4 GPU** (or A100 if available with Colab Pro)
3. Click **Save**

Run the cell below to verify GPU is enabled:

In [None]:
# Check if GPU is available
import torch
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    print(f"✅ GPU is enabled: {gpu_name}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("❌ GPU is NOT enabled. Please go to Runtime → Change runtime type → Select GPU")

## 2. The LLM Customization Spectrum

Before diving into fine-tuning, let's understand where it fits in the broader landscape of LLM customization:

| Approach | Effort | Data Needed | Use Case |
|----------|--------|-------------|----------|
| **Prompting** | Low | None | Quick tasks, general use |
| **Few-shot Learning** | Low | 5-20 examples | Demonstrate desired format |
| **RAG** (Retrieval) | Medium | Documents | Add knowledge, keep model current |
| **Fine-tuning** | High | 100-10,000+ examples | Change model behavior/style |
| **Pre-training** | Very High | Billions of tokens | Build from scratch (rarely needed) |

### When Should You Fine-Tune?

✅ **Fine-tune when you want to:**
- Change the model's communication style consistently
- Make the model follow specific formats/templates
- Teach domain-specific terminology or behavior
- Improve reliability on repetitive tasks

❌ **Don't fine-tune when you can:**
- Solve the problem with better prompts
- Use RAG to add relevant knowledge
- Use few-shot examples in the prompt
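To make the few-shot alternative concrete, here is a minimal sketch that packs a handful of worked Q&A pairs into a single prompt, using the same `Customer:`/`Agent:` layout the hands-on section uses later. The helper name `build_few_shot_prompt` is hypothetical, and the two demonstrations are shortened from this notebook's training data; any completion-style model could receive the resulting string.

```python
def build_few_shot_prompt(examples, question, header="### Customer Service Bot"):
    """Build a few-shot prompt: a header, k worked examples, then the new question.

    `examples` is a list of dicts with "instruction" and "response" keys,
    the same shape as the training data used later in this notebook.
    """
    parts = [header, ""]
    for ex in examples:
        parts.append(f"Customer: {ex['instruction']}")
        parts.append(f"Agent: {ex['response']}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Customer: {question}")
    parts.append("Agent:")  # the model is expected to continue from here
    return "\n".join(parts)


# Two demonstrations, shortened from the notebook's training data
demos = [
    {"instruction": "What is your return policy?",
     "response": "Returns are accepted within 30 days of purchase with a valid receipt."},
    {"instruction": "How do I track my order?",
     "response": "Use the tracking number from your shipping confirmation email."},
]
prompt = build_few_shot_prompt(demos, "Do you offer international shipping?")
print(prompt)
```

The trade-off the table points at: every demonstration is re-sent with every request, so prompt length (and per-call cost) grows with the number of examples, whereas a fine-tuned model carries that behavior in its weights.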

## 3. Understanding Fine-Tuning: The Concept

### Pre-training vs Fine-tuning

**Pre-training** is like giving someone a general education:
- The model learns from massive amounts of text (books, websites, code)
- It learns language patterns, facts, and reasoning
- This is expensive: millions of dollars, weeks of compute time
- Done by companies like Meta (Llama), OpenAI (GPT), Google (Gemini)

**Fine-tuning** is like specialized job training:
- Start with a pre-trained model that already "knows" language
- Train it on your specific examples to learn your style/domain
- Much cheaper: can be done in minutes to hours on a single GPU
- This is what **you** can do!
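A back-of-the-envelope estimate shows why updating every parameter is memory-hungry. The sketch below relies on a common rule of thumb, which is an assumption and not something this notebook measures: training in float32 with the Adam optimizer costs about 16 bytes per *trainable* parameter (4 for the weight, 4 for its gradient, 8 for Adam's two moment estimates), while a frozen parameter costs only its 4-byte weight. Activations are ignored. GPT-2 small has 124,439,808 parameters; by our own count, the LoRA setup in Step 3 (`r=16` on `c_attn` and `c_proj`) adds about 1.6 million trainable ones.

```python
BYTES_FP32 = 4
ADAM_STATES = 2  # Adam keeps two moment estimates per trainable parameter


def training_memory_gb(total_params, trainable_params):
    """Rough memory (GB) for weights + gradients + Adam states in float32.

    Frozen parameters cost only their weights; trainable ones also need a
    gradient and two optimizer moments. Activation memory is ignored.
    """
    frozen = total_params - trainable_params
    per_trainable = BYTES_FP32 * (1 + 1 + ADAM_STATES)  # weight + grad + 2 moments = 16 bytes
    total_bytes = frozen * BYTES_FP32 + trainable_params * per_trainable
    return total_bytes / 1e9


GPT2_PARAMS = 124_439_808  # GPT-2 small
LORA_PARAMS = 1_622_016    # assumed: r=16 adapters on c_attn + c_proj in all 12 blocks

full = training_memory_gb(GPT2_PARAMS, GPT2_PARAMS)
lora = training_memory_gb(GPT2_PARAMS + LORA_PARAMS, LORA_PARAMS)
print(f"Full fine-tuning: ~{full:.2f} GB")
print(f"LoRA:             ~{lora:.2f} GB")
```

For this small model the saving is roughly 4x before activations are counted; the real ratio depends heavily on model size, numeric precision, and batch size, so treat it as an order-of-magnitude guide.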

### LoRA: Efficient Fine-Tuning

Traditional fine-tuning updates **all** model parameters, which is expensive and slow.

**LoRA (Low-Rank Adaptation)** is a clever technique that:
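- Freezes the original model weights
- Adds a small pair of low-rank matrices (the "adapters") alongside selected weight matrices, and only these learn your specific task
- Trains only a small fraction of the parameters (about 1% in this notebook's setup; the exact share depends on the rank `r` and which layers are adapted)
- Result: **often comparable quality to full fine-tuning, with much lower memory use and adapter files of only a few MB**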

Think of it like this: instead of rewriting an entire textbook, you're adding sticky notes with your customizations.
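The "sticky notes" are, concretely, a pair of small matrices. In the formulation from the original LoRA paper, a frozen weight matrix `W0` (shape `d_out x d_in`) is augmented by a low-rank update `(alpha / r) * B @ A`, where `A` is `r x d_in` and `B` is `d_out x r`; `B` starts at zero, so training begins from exactly the pre-trained behavior. The pure-Python sketch below uses toy sizes, plain lists instead of tensors, and function names of our own choosing. It shows the forward pass and that, after training, the update can be merged back into a single weight matrix.

```python
import random


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W0, A, B, x, alpha, r):
    """h = W0 x + (alpha / r) * B (A x). W0 stays frozen; only A and B are trained."""
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))  # project down to r dims, then back up to d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4
W0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # small random init
B = [[0.0] * r for _ in range(d_out)]                                 # zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B = 0 the adapter is a no-op: the output equals the frozen layer's output.
assert lora_forward(W0, A, B, x, alpha, r) == matvec(W0, x)

# Pretend training changed B; the update can then be folded into one matrix.
B = [[random.gauss(0, 0.1) for _ in range(r)] for _ in range(d_out)]
scale = alpha / r
BA = matmul(B, A)
W_merged = [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W0, BA)]

h_adapter = lora_forward(W0, A, B, x, alpha, r)
h_merged = matvec(W_merged, x)
print(max(abs(p - q) for p, q in zip(h_adapter, h_merged)))  # only floating-point noise
```

Merging is why a LoRA-tuned model need not be slower at inference time: once `B @ A` is added into `W0`, the layer is an ordinary matrix multiply again.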

## 4. Hands-On: Fine-Tuning with Hugging Face

Now let's actually fine-tune a model! We'll use:

- **Model**: GPT-2 Small (a classic, well-tested model perfect for learning)
- **Library**: Hugging Face Transformers + PEFT (Parameter-Efficient Fine-Tuning)
- **Technique**: LoRA adapters
- **Task**: Create a customer service chatbot

### Step 1: Install Dependencies

This will take about 1 minute. ☕

In [None]:
%%capture
# Install Hugging Face libraries
!pip install transformers datasets peft accelerate trl -q

In [None]:
# Verify installation
import transformers
import peft
print(f"✅ Transformers version: {transformers.__version__}")
print(f"✅ PEFT version: {peft.__version__}")

### Step 2: Load the Base Model

We'll load GPT-2, a smaller model that's great for learning fine-tuning concepts.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Model configuration
model_name = "gpt2"  # 124M parameters - fast and efficient

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 doesn't have a pad token

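# GPT-2's checkpoint is stored in float32; at 124M parameters it fits easily in full
# precision, which can avoid numerical issues from training on float16 weights
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # put the model on the GPU when one is available
)

print(f"✅ Model loaded: {model_name}")
print(f"   Parameters: {model.num_parameters():,}")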

### Step 3: Add LoRA Adapters

Now we add the LoRA adapters: the small trainable layers that will learn our task.

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

# LoRA configuration
lora_config = LoraConfig(
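    r=16,  # LoRA rank: higher = more capacity but more memory
    lora_alpha=32,  # scaling factor; adapter updates are scaled by lora_alpha / r
    # c_attn = fused Q/K/V projection; "c_proj" matches both the attention
    # output projection and the MLP output projection in each GPT-2 block
    target_modules=["c_attn", "c_proj"],
    fan_in_fan_out=True,  # GPT-2 uses Conv1D layers, which store weights transposed
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM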
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)

# Count trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())

print(f"✅ LoRA adapters added!")
print(f"   Trainable parameters: {trainable_params:,} ({100*trainable_params/total_params:.2f}%)")
print(f"   Total parameters: {total_params:,}")
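To see where that percentage comes from, here is a hand count. It is a sketch based on our reading of GPT-2 small's layout (12 blocks, hidden size 768, MLP size 3072), not something PEFT reports. Each adapted weight with input size `d_in` and output size `d_out` gains `r * (d_in + d_out)` LoRA parameters. Note that the name `c_proj` matches two modules per block in the Hugging Face GPT-2 implementation: the attention output projection and the MLP output projection. If this reading is right, the totals should agree with what the cell above prints.

```python
def lora_param_count(r, adapted_shapes, n_layers):
    """LoRA adds A (r x d_in) and B (d_out x r) for every adapted weight."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in adapted_shapes)
    return per_layer * n_layers


hidden, inner, n_layers = 768, 3072, 12
adapted = [
    (hidden, 3 * hidden),  # attn.c_attn: fused Q, K, V projection
    (hidden, hidden),      # attn.c_proj: attention output projection
    (inner, hidden),       # mlp.c_proj: MLP output projection
]
base_params = 124_439_808  # GPT-2 small (embedding and LM head weights are tied)

lora = lora_param_count(r=16, adapted_shapes=adapted, n_layers=n_layers)
total = base_params + lora
print(f"LoRA parameters: {lora:,}")
print(f"Share of total:  {100 * lora / total:.2f}%")
```

Doubling `r` doubles the adapter size, and also targeting `c_fc` (the MLP input projection, 768 to 3072) would add another `r * 3840` parameters per block.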

### Step 4: Prepare the Training Data

We'll use a simple customer service dataset. In real scenarios, you'd use your own business data.

In [None]:
# Sample customer service training data
training_data = [
    {"instruction": "What is your return policy?", "response": "Our return policy allows returns within 30 days of purchase with a valid receipt. Items must be in original condition with tags attached."},
    {"instruction": "How do I track my order?", "response": "You can track your order by logging into your account and clicking 'Order History', or use the tracking number from your shipping confirmation email."},
    {"instruction": "Do you offer international shipping?", "response": "Yes, we ship to over 50 countries worldwide. International shipping rates and delivery times vary by destination."},
    {"instruction": "How can I cancel my order?", "response": "To cancel an order, please contact us within 2 hours of placing it. Once an order has been processed for shipping, it cannot be cancelled."},
    {"instruction": "What payment methods do you accept?", "response": "We accept all major credit cards (Visa, Mastercard, Amex), PayPal, Apple Pay, and Google Pay."},
    {"instruction": "How do I reset my password?", "response": "Click 'Forgot Password' on the login page, enter your email, and we'll send you a reset link valid for 24 hours."},
    {"instruction": "Is my personal information secure?", "response": "Yes, we use industry-standard SSL encryption and never share your data with third parties. Your security is our priority."},
    {"instruction": "How do I apply a discount code?", "response": "Enter your discount code in the 'Promo Code' field at checkout and click 'Apply'. The discount will be reflected in your order total."},
    {"instruction": "What are your store hours?", "response": "Our online store is available 24/7. For physical locations, hours vary by store - please check our store locator for specific hours."},
    {"instruction": "How do I contact customer support?", "response": "You can reach us via live chat on our website, email at support@example.com, or call 1-800-EXAMPLE Monday-Friday 9am-6pm EST."},
    {"instruction": "Do you price match?", "response": "Yes, we offer price matching within 14 days of purchase if you find the same item at a lower price from an authorized retailer."},
    {"instruction": "How long does shipping take?", "response": "Standard shipping takes 5-7 business days. Express shipping (2-3 days) and overnight options are also available at checkout."},
    {"instruction": "Can I change my shipping address?", "response": "You can update your shipping address before the order ships by contacting customer support. Once shipped, address changes are not possible."},
    {"instruction": "Do you have a loyalty program?", "response": "Yes! Join our rewards program for free to earn points on every purchase, receive exclusive discounts, and get early access to sales."},
    {"instruction": "What if my item arrives damaged?", "response": "We're sorry to hear that! Please contact us within 48 hours with photos of the damage, and we'll send a replacement or issue a full refund."},
]

print(f"✅ Training data prepared: {len(training_data)} examples")
print(f"\nExample:")
print(f"  Q: {training_data[0]['instruction']}")
print(f"  A: {training_data[0]['response']}")

In [None]:
from datasets import Dataset

# Format data for training
def format_prompt(example):
    """Format the data into a simple chat template"""
    text = f"""### Customer Service Bot

Customer: {example['instruction']}
Agent: {example['response']}
"""
    return {"text": text}

# Create dataset
dataset = Dataset.from_list(training_data)
dataset = dataset.map(format_prompt)

print(f"✅ Dataset formatted!")
print(f"\nSample formatted prompt:")
print(dataset[0]['text'])
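With only 15 examples, every one is used for training, which is why the notebook later probes generalization with hand-written "new" questions. A common alternative, sketched here as an assumption rather than part of this notebook's pipeline, is to hold out a few examples before formatting so there is a fixed evaluation set. The helper `holdout_split` is hypothetical and uses only the standard library; Hugging Face `datasets.Dataset` also provides a `train_test_split` method for the same purpose.

```python
import random


def holdout_split(examples, n_eval, seed=42):
    """Shuffle a copy of `examples` and return (train, eval) lists."""
    shuffled = list(examples)               # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    return shuffled[n_eval:], shuffled[:n_eval]


# Toy stand-in for the notebook's `training_data` list
examples = [{"instruction": f"Question {i}", "response": f"Answer {i}"} for i in range(15)]
train, held_out = holdout_split(examples, n_eval=3)
print(len(train), len(held_out))
```

Holding out 3 of 15 examples shrinks an already tiny training set, so this becomes more worthwhile once you have collected more data.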

### Step 5: Train the Model! 🚀

This is where the magic happens. With only 15 examples, training should take no more than a couple of minutes on a T4 GPU.

Watch the training loss decrease: it means the model is fitting the examples (on a dataset this small, partly by memorizing them).

In [None]:
from trl import SFTTrainer, SFTConfig

# Training configuration
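training_args = SFTConfig(
    output_dir="./customer_service_model",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size = 2 x 4 = 8
    learning_rate=2e-4,
    logging_steps=1,  # this run is only a handful of optimizer steps, so log every one
    save_strategy="no",
    report_to="none",
    max_length=256,  # named max_seq_length in older TRL releases
)

# Create trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,  # reuse the tokenizer whose pad token we set above
)

print("🚀 Starting training...")
print("   With only 15 examples, this should take no more than a couple of minutes on a T4 GPU")
print("   Watch the 'loss' value decrease - that means the model is fitting the examples!\n")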
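Before running, it is worth checking how many optimizer updates this configuration actually performs. With `per_device_train_batch_size=2` on one GPU, 15 examples make 8 mini-batches per epoch, and `gradient_accumulation_steps=4` turns those into 2 optimizer steps per epoch. The sketch below reproduces that arithmetic under stated assumptions: a single GPU, and the final partial batch is kept (the Trainer's default). Here 8 divides evenly by 4, so rounding of a leftover accumulation group does not matter. Compare the result with `logging_steps` to see how many loss values will be printed.

```python
import math


def optimizer_steps(n_examples, batch_size, grad_accum, epochs, n_gpus=1):
    """Approximate number of optimizer updates for a Trainer-style training loop."""
    micro_batches = math.ceil(n_examples / (batch_size * n_gpus))  # last batch may be partial
    steps_per_epoch = max(micro_batches // grad_accum, 1)
    return steps_per_epoch * epochs


steps = optimizer_steps(n_examples=15, batch_size=2, grad_accum=4, epochs=3)
print(f"Optimizer steps: {steps}")       # 2 per epoch x 3 epochs
print(f"Effective batch size: {2 * 4}")
```

Six updates is very little training, so do not expect dramatic changes; if answers still look generic, raising `num_train_epochs` is the cheapest knob to try, at the risk of memorizing the 15 examples.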

In [None]:
# Run training
trainer.train()

print(f"\n✅ Training complete!")

### Step 6: Test the Fine-Tuned Model

Let's see if our model learned to be a good customer service assistant!

In [None]:
# Set model to evaluation mode
model.eval()

def ask_customer_service(question):
    """Ask our fine-tuned customer service model a question"""
    prompt = f"""### Customer Service Bot

Customer: {question}
Agent:"""
    
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    
    # Decode only the newly generated tokens (everything after the prompt)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    # Keep only the first line of the agent's reply
    response = response.split("\n")[0].strip()
    return response

print("ü§ñ Customer Service Bot Ready!")
print("="*50)
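`temperature=0.7` with `do_sample=True` means each next token is drawn at random from the model's probability distribution after the logits have been divided by the temperature. The standard-library sketch below illustrates that scaling on made-up logits (the numbers and function names are illustrative, not taken from GPT-2): temperatures below 1 sharpen the distribution toward the most likely token, and values above 1 flatten it. By default, Hugging Face's `generate` also applies other filters such as top-k, which are omitted here.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for four candidate tokens
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print([sample_token(logits, 0.7, rng) for _ in range(10)])
```

Because sampling is random, rerunning the test cells below can give different answers; setting `do_sample=False` switches to greedy decoding, which always picks the highest-scoring token.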

In [None]:
# Test with paraphrases of questions from the training data
test_questions = [
    "What's your return policy?",
    "How can I track my package?",
    "Do you ship internationally?",
]

print("üìù Testing with training-similar questions:\n")
for q in test_questions:
    print(f"Customer: {q}")
    print(f"Bot: {ask_customer_service(q)}")
    print("-" * 40)

In [None]:
# Test with NEW questions (not in training data)
new_questions = [
    "Can I get a refund if I don't like the product?",
    "What happens if my package is lost?",
    "Do you have a size guide?",
]

print("üÜï Testing with NEW questions (not in training data):\n")
for q in new_questions:
    print(f"Customer: {q}")
    print(f"Bot: {ask_customer_service(q)}")
    print("-" * 40)
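Eyeballing answers is a reasonable first check, but a lightweight automatic score helps compare training runs. One simple option, borrowed from extractive question-answering benchmarks such as SQuAD, is token-level F1 between a generated answer and a reference answer. The sketch below is an illustrative assumption, not part of this notebook: its normalization is simplified, it needs a reference answer you write yourself, and word overlap only loosely tracks whether an answer is correct or on-brand.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation, and split into words (simplified SQuAD-style)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def token_f1(prediction, reference):
    """Harmonic mean of word-level precision and recall against a reference answer."""
    pred, ref = normalize(prediction), normalize(reference)
    common = Counter(pred) & Counter(ref)  # multiset intersection of words
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "Returns are accepted within 30 days of purchase with a valid receipt."
print(round(token_f1("You can return items within 30 days with a receipt.", reference), 2))
```

Averaging this score over a small held-out set of question/reference pairs gives a single number to track as you iterate on data and hyperparameters.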

### Step 7: Save the Model (Optional)

If you want to use this model later, you can save it.

In [None]:
# Save the LoRA adapters (small, ~10MB)
model.save_pretrained("customer_service_lora")
tokenizer.save_pretrained("customer_service_lora")
print("✅ Model saved to 'customer_service_lora' folder")

# Check the size
!du -sh customer_service_lora/

## 5. Business Applications & Decision Framework

### Real-World Use Cases

| Industry | Application | Why Fine-Tuning? |
|---------|-------------|------------------|
| **Legal Tech** | Contract analysis | Domain-specific terminology |
| **Healthcare** | Patient communication | Regulatory compliance, tone |
| **E-commerce** | Customer service bots | Brand voice, product knowledge |
| **Finance** | Report generation | Consistent formatting, compliance |
| **Education** | Tutoring assistants | Teaching style, curriculum alignment |

### Cost-Benefit Analysis

**Fine-tuning costs:**
- Compute: ~$1-10 for small models, ~$100-1000 for large models
- Data preparation: Often the largest cost (human time to create/curate examples)
- Iteration: Usually need 2-5 rounds to get it right

**Fine-tuning benefits:**
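- Cheaper inference than few-shot prompting: shorter prompts mean fewer tokens per request, which adds up when many examples would otherwise be sent every time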
- More consistent behavior
- Faster response times (no need for long prompts)
- Can use smaller, cheaper models
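The inference-cost argument is mostly about prompt length. The sketch below compares a few-shot prompt that re-sends `k` examples on every call against a short prompt for a fine-tuned model. Every number in it (tokens per example, request volume, price per million tokens) is a hypothetical placeholder, not a quote from any provider; plug in your own measurements.

```python
def monthly_prompt_cost(requests, prompt_tokens, price_per_million):
    """Input-token cost per month, given the number of tokens sent per request."""
    return requests * prompt_tokens * price_per_million / 1_000_000


# Hypothetical assumptions; replace with your own measurements and pricing.
REQUESTS_PER_MONTH = 1_000_000
QUESTION_TOKENS = 30       # the customer's question plus a short header
TOKENS_PER_EXAMPLE = 60    # one Customer/Agent demonstration
K_EXAMPLES = 20            # few-shot demonstrations included in every prompt
PRICE = 1.00               # dollars per million input tokens (placeholder)

few_shot_tokens = QUESTION_TOKENS + K_EXAMPLES * TOKENS_PER_EXAMPLE
fine_tuned_tokens = QUESTION_TOKENS

few_shot = monthly_prompt_cost(REQUESTS_PER_MONTH, few_shot_tokens, PRICE)
fine_tuned = monthly_prompt_cost(REQUESTS_PER_MONTH, fine_tuned_tokens, PRICE)
print(f"Few-shot:   ${few_shot:,.0f}/month ({few_shot_tokens} tokens per prompt)")
print(f"Fine-tuned: ${fine_tuned:,.0f}/month ({fine_tuned_tokens} tokens per prompt)")
print(f"Ratio: {few_shot / fine_tuned:.0f}x")
```

This ignores output tokens (roughly the same in both cases), the one-off training cost, and the fact that some platforms bill fine-tuned models at a different per-token rate, so the real saving is usually smaller than this ratio.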

### Decision Framework

```
Start with Prompting
        ↓
Works well enough? → YES → Stop here! 🎉
        ↓ NO
Need external knowledge? → YES → Try RAG first
        ↓ NO
Have 100+ good examples? → NO → Collect more data
        ↓ YES
Fine-tune! → Evaluate → Iterate
```
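The flowchart can also be written as a small function, for example to apply it consistently across a list of candidate projects. This is a direct transcription of the diagram; the function name, argument names, scenario labels, and exposing the 100-example threshold as a parameter are our own framing.

```python
def recommend_approach(prompting_works, needs_external_knowledge, n_good_examples,
                       min_examples=100):
    """Walk the decision flowchart top to bottom and return the recommended next step."""
    if prompting_works:
        return "Stop: prompting is good enough"
    if needs_external_knowledge:
        return "Try RAG first"
    if n_good_examples < min_examples:
        return "Collect more data"
    return "Fine-tune, then evaluate and iterate"


# Hypothetical scenarios: (prompting_works, needs_external_knowledge, n_good_examples)
scenarios = {
    "FAQ bot, prompts already work": (True, False, 0),
    "Answers depend on internal policy docs": (False, True, 50),
    "Brand-voice replies, 40 examples": (False, False, 40),
    "Brand-voice replies, 500 examples": (False, False, 500),
}
for name, args in scenarios.items():
    print(f"{name}: {recommend_approach(*args)}")
```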

## 6. Summary & Key Takeaways

### What We Learned

1. **Fine-tuning** adapts a pre-trained model to your specific needs
2. **LoRA** makes fine-tuning efficient (train only ~1% of parameters)
3. **Hugging Face** provides easy-to-use tools for fine-tuning
4. **Business value** comes from consistency, cost reduction, and customization

### What We Did

- ✅ Loaded GPT-2 (a 124 million parameter model)
- ✅ Added LoRA adapters for efficient training
- ✅ Fine-tuned on customer service data
- ✅ Tested the model on new questions
- ✅ Saved the model for future use

### Next Steps for Your Career

1. **Experiment**: Try fine-tuning with your own data
2. **Explore**: Look into the Hugging Face Hub and the OpenAI fine-tuning API
3. **Stay current**: This field moves fast, so follow AI news!

---

*Questions? Reach out during office hours or on the course forum.*

## 📝 Required Tasks

Complete the following two task notebooks to practice what you've learned:

### Task 1: Sentiment Fine-Tuning
**Notebook**: `Task1_Sentiment_Finetuning.ipynb`

[![Open Task 1 in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dz-web3/DS-Tech-2026spring/blob/main/Module8_LLM_Finetuning/Task1_Sentiment_Finetuning.ipynb)

In this task, you will:
- Fine-tune a model on product reviews
- Add your own training examples
- Test the model on custom text

---

### Task 2: Prompting vs Fine-Tuning
**Notebook**: `Task2_Prompting_vs_Finetuning.ipynb`

[![Open Task 2 in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dz-web3/DS-Tech-2026spring/blob/main/Module8_LLM_Finetuning/Task2_Prompting_vs_Finetuning.ipynb)

In this task, you will:
- Compare zero-shot vs few-shot prompting
- Improve prompts for better responses
- Create a decision framework for business scenarios