# Live Demo: Working with Language Models in Google Colab

* * * 

<div class="alert alert-success">  
    
### Learning Objectives 
    
* Set up Google Colab for GPU-accelerated language model work
* Download and run a small language model (Gemma 2B) from Hugging Face
* Understand key concepts: tokenization, probability distributions, temperature effects
* Demonstrate controlled text generation for computational social science applications
* Learn practical considerations for model deployment and cost management

</div>

### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>

### Sections
1. [Colab Setup and GPU Configuration](#setup)
2. [Installing Requirements and Loading Models](#install)
3. [Basic Text Generation](#generation)
4. [Understanding Tokenization](#tokenization)
5. [Probability Distributions and Temperature](#probability)
6. [Controlled Generation for Social Science](#controlled)
7. [Practical Considerations](#practical)

<a id='setup'></a>

# Google Colab Setup and GPU Configuration

**Why Google Colab for Language Models?**

In computational social science, we often need to choose the right tool for the task. Today we're switching to Google Colab because:

- **GPU Access**: Language models require significant computational power
- **Memory**: Modern LLMs need substantial RAM (8GB+ for small models)
- **Ease of Setup**: No local installation headaches
- **Cost**: Free tier provides reasonable access for experimentation
- **Collaboration**: Easy sharing and reproducibility

This is a common pattern in computational social science: matching computational resources to research needs.

## 🎬 Demo: Colab Orientation (5 minutes)

**Step 1: Enable GPU Runtime**
1. Go to `Runtime` → `Change runtime type`
2. Set `Hardware accelerator` to `GPU`
3. Choose `T4 GPU` (free tier) or `V100` (if available)
4. Click `Save`

**Step 2: Verify GPU Access**

In [None]:
# Check if GPU is available
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ No GPU detected. Check your runtime settings!")

**Step 3: Check System Resources**

In [None]:
# Check system resources
import psutil

print(f"CPU cores: {psutil.cpu_count()}")
print(f"RAM: {psutil.virtual_memory().total / 1e9:.1f} GB")
print(f"Available RAM: {psutil.virtual_memory().available / 1e9:.1f} GB")
print(f"Disk space: {psutil.disk_usage('/').total / 1e9:.1f} GB")

<a id='install'></a>

# Installing Requirements and Loading Models

💡 **Tip**: In Colab, package installations persist only for the current session. Each time you restart, you'll need to reinstall.

## Install Required Packages

In [None]:
# Install required packages
!pip install transformers accelerate torch
!pip install matplotlib seaborn pandas numpy
!pip install gradio  # For interactive demos later

## Download Gemma 2B Model from Hugging Face

We'll use **Gemma 2B**, a small but capable language model from Google. It's perfect for educational purposes because:
- Small enough to run on free Colab GPUs
- Modern architecture (a decoder-only Transformer)
- Good performance for its size
- Well-documented and actively maintained
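
To make "small enough to run on free Colab GPUs" concrete, here is a rough back-of-the-envelope sketch. It assumes a nominal 2 billion parameters (the exact count for a given checkpoint may differ) and counts only the weights; activations and the generation cache add more on top. The helper name `estimate_weight_memory_gb` is made up for this illustration.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the weights, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Nominal 2-billion-parameter model at three common numeric precisions.
for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label}: ~{estimate_weight_memory_gb(2e9, nbytes):.1f} GB")
```

Compare these numbers with the GPU memory printed in Step 2 to see why half precision is a sensible default here.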

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Model selection - using Gemma 2B Instruct for better instruction following
model_name = "google/gemma-2b-it"  # "it" stands for "instruction tuned"

print(f"Loading model: {model_name}")
print("This may take a few minutes on first run...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model with appropriate settings for Colab
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # Use half precision to save memory
    device_map="auto",          # Automatically map to available GPU
)

print(f"✓ Model loaded successfully!")
print(f"Model device: {model.device}")
print(f"Model dtype: {model.dtype}")

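⚠️ **Warning**: If you get memory errors, try these solutions:
1. Restart the runtime and try again
2. Use a smaller model like `microsoft/DialoGPT-small`
3. Generate fewer or shorter outputs at once (lower `num_return_sequences` or `max_length`)

If loading fails with an authorization or "gated repo" error instead, accept the Gemma license on the model's Hugging Face page and authenticate in Colab (for example with `huggingface_hub.login()`) before rerunning the cell.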

<a id='generation'></a>

# Basic Text Generation

Now let's generate text about social issues to see how the model behaves.

## Simple Text Generation Function

In [None]:
def generate_text(prompt, max_length=100, temperature=0.7, num_return_sequences=1):
    """
    Generate text using the loaded model
    
    Args:
        prompt: Input text to continue
        max_length: Maximum total length in tokens (prompt + generated text)
        temperature: Controls randomness when sampling (must be > 0;
            lower = more predictable, higher = more varied)
        num_return_sequences: Number of different completions to generate
    
    Returns:
        List of generated strings, one per completion, without the prompt
    """
    # Tokenize input
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_length = inputs["input_ids"].shape[1]
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            temperature=temperature,
            num_return_sequences=num_return_sequences,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    # Decode only the newly generated tokens, so the prompt is dropped
    # even when decoding does not reproduce it character for character
    generated_texts = []
    for output in outputs:
        text = tokenizer.decode(output[prompt_length:], skip_special_tokens=True)
        generated_texts.append(text.strip())
    
    return generated_texts

## Generate Multiple Completions for Social Issues

In [None]:
# Social science prompt
prompt = "The impact of social media on political polarization"

print(f"Prompt: '{prompt}'\n")
print("Generating 3 different completions...\n")

# Generate multiple completions
completions = generate_text(
    prompt, 
    max_length=150, 
    temperature=0.8, 
    num_return_sequences=3
)

for i, completion in enumerate(completions, 1):
    print(f"Completion {i}:")
    print(f"'{completion}'\n")
    print("-" * 80)

🔔 **Question**: What do you notice about the different completions? How do they vary in content and style?

## 🥊 Challenge: Try Different Social Science Prompts

Experiment with different prompts related to your research interests:

In [None]:
# Try different social science prompts
prompts = [
    "Climate change attitudes vary across demographic groups because",
    "The relationship between education and voting behavior",
    "Economic inequality affects social cohesion by",
    "Gender representation in leadership positions"
]

# YOUR CODE HERE: Choose a prompt and generate text
chosen_prompt = prompts[0]  # Change this index

completions = generate_text(chosen_prompt, max_length=120, temperature=0.7)
print(f"Prompt: {chosen_prompt}")
print(f"Completion: {completions[0]}")

<a id='tokenization'></a>

# Understanding Tokenization

Tokenization is the process of breaking text into smaller units (tokens) that the model can process. Understanding tokenization is crucial for working with language models effectively.

## 🎬 Demo: How "Social Science" Becomes Tokens

In [None]:
# Let's see how different texts get tokenized
examples = [
    "social science",
    "computational social science",
    "artificial intelligence",
    "democratization",
    "anti-establishment"
]

print("Tokenization Examples:")
print("=" * 50)

for text in examples:
    # Tokenize the text (skip special tokens such as <bos> so both lists line up)
    tokens = tokenizer.tokenize(text)
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    
    print(f"Text: '{text}'")
    print(f"Tokens: {tokens}")
    print(f"Token IDs: {token_ids}")
    print(f"Number of tokens: {len(tokens)}")
    print("-" * 30)

## Subword Tokenization

Modern language models use **subword tokenization** (for example, Byte Pair Encoding, or BPE). This approach:
- Handles out-of-vocabulary words better
- Creates a manageable vocabulary size
- Balances between character-level and word-level processing
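
To show how a subword vocabulary can emerge, here is a minimal sketch of the core BPE training loop on a toy corpus: count adjacent symbol pairs, merge the most frequent pair into one symbol, and repeat. The corpus, counts, and helper names are invented for illustration; the tokenizer loaded above was trained on far more data and differs in its details.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words and return the most common one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy corpus: word -> count, each word starting as a tuple of characters.
corpus = {tuple("social"): 5, tuple("society"): 3, tuple("science"): 4}
for step in range(3):
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print(f"merge {step + 1}: {pair} -> {list(corpus)}")
```

Each merge adds one entry to the vocabulary, so frequent character sequences become single tokens while rare words stay split into smaller pieces.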

In [None]:
# Let's examine the vocabulary
vocab_size = tokenizer.vocab_size
print(f"Vocabulary size: {vocab_size:,} tokens")

# Special tokens
special_tokens = tokenizer.special_tokens_map
print(f"\nSpecial tokens: {special_tokens}")

# Let's see some example tokens from the vocabulary
print("\nSample vocabulary (first 20 tokens):")
for i in range(20):
    token = tokenizer.decode([i])
    print(f"ID {i}: '{token}'")

## Token Length and Text Length

In [None]:
# Relationship between text length and token count
texts = [
    "AI",
    "The quick brown fox",
    "Social media platforms influence political discourse",
    "Computational social science combines traditional social science methods with computational tools and big data analytics"
]

print("Text Length vs Token Count:")
print("=" * 60)

for text in texts:
    char_count = len(text)
    word_count = len(text.split())
    token_count = len(tokenizer.encode(text, add_special_tokens=False))  # exclude <bos>
    
    print(f"Text: '{text}'")
    print(f"Characters: {char_count}, Words: {word_count}, Tokens: {token_count}")
    print(f"Tokens per word: {token_count/word_count:.2f}")
    print("-" * 40)

<a id='probability'></a>

# Probability Distributions and Temperature

Language models work by predicting probability distributions over the vocabulary for the next token. Understanding these probabilities helps us control generation behavior.
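
As a small illustration of what "predicting a distribution" means for generation, the sketch below contrasts always taking the most likely token (greedy) with sampling in proportion to probability. The tokens and probabilities are made up for this example, not taken from the model.

```python
import random

# Hypothetical next-token distribution (made-up numbers, not model output).
next_token_probs = {" a": 0.40, " the": 0.30, " not": 0.20, " complex": 0.10}

def greedy_pick(probs):
    """Always return the single most probable token."""
    return max(probs, key=probs.get)

def sample_pick(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so the demo is repeatable
print("greedy :", [greedy_pick(next_token_probs) for _ in range(5)])
print("sampled:", [sample_pick(next_token_probs, rng) for _ in range(5)])
```

Greedy picking returns the same token every time, while sampling varies from draw to draw, which is why repeated calls with `do_sample=True` give different completions.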

## 🎬 Demo: Show Top-k Tokens and Their Probabilities

In [None]:
import torch.nn.functional as F
import pandas as pd

def get_next_token_probabilities(prompt, top_k=10):
    """
    Get the top-k most likely next tokens and their probabilities
    """
    # Tokenize input
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits[0, -1, :].float()  # Last-position logits, upcast from fp16 for a stable softmax
    
    # Convert to probabilities
    probs = F.softmax(logits, dim=-1)
    
    # Get top-k tokens
    top_probs, top_indices = torch.topk(probs, top_k)
    
    # Decode tokens
    results = []
    for prob, idx in zip(top_probs, top_indices):
        token = tokenizer.decode([idx.item()])
        results.append({
            'token': repr(token),  # repr to show whitespace/special chars
            'probability': prob.item(),
            'percentage': prob.item() * 100
        })
    
    return results

# Example: What comes after "Climate change is"
prompt = "Climate change is"
top_tokens = get_next_token_probabilities(prompt, top_k=15)

print(f"Top next tokens for: '{prompt}'")
print("=" * 50)

df = pd.DataFrame(top_tokens)
print(df.to_string(index=False))

## Temperature Effects on Generation

**Temperature** controls the randomness of generation:
- **Low temperature (0.1-0.3)**: More deterministic, conservative
- **Medium temperature (0.7-1.0)**: Balanced creativity and coherence  
- **High temperature (1.5+)**: More random, creative, potentially incoherent
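
Temperature works by dividing the logits before the softmax: $p_i = \exp(z_i / T) / \sum_j \exp(z_j / T)$. The sketch below applies that formula to four made-up logits so you can see the distribution sharpen at low temperature and flatten at high temperature. It is a standalone illustration in plain Python, not the code path `model.generate` uses internally.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5, 0.0]  # made-up scores for four candidate tokens
for t in (0.1, 0.7, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Because the formula divides by $T$, a temperature of exactly zero is undefined; libraries usually treat "fully deterministic" as greedy decoding instead.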

In [None]:
# Demonstrate temperature effects
prompt = "Social media algorithms"
temperatures = [0.1, 0.7, 1.2]

print(f"Temperature Effects on: '{prompt}'")
print("=" * 60)

for temp in temperatures:
    print(f"\nTemperature: {temp}")
    print("-" * 20)
    
    # Generate 3 samples at this temperature
    for i in range(3):
        completion = generate_text(prompt, max_length=80, temperature=temp, num_return_sequences=1)[0]
        print(f"Sample {i+1}: {completion[:100]}...")
    print()

## Visualizing Probability Distributions

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Compare probability distributions for different prompts
prompts = [
    "Democracy is",
    "Artificial intelligence will",
    "The research shows that"
]

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for i, prompt in enumerate(prompts):
    top_tokens = get_next_token_probabilities(prompt, top_k=8)
    
    tokens = [item['token'].replace("'", "").replace('"', '') for item in top_tokens]
    probs = [item['percentage'] for item in top_tokens]
    
    axes[i].bar(range(len(tokens)), probs)
    axes[i].set_title(f"'{prompt}'")
    axes[i].set_xlabel("Next Token")
    axes[i].set_ylabel("Probability (%)")
    axes[i].set_xticks(range(len(tokens)))
    axes[i].set_xticklabels(tokens, rotation=45, ha='right')

plt.tight_layout()
plt.show()

<a id='controlled'></a>

# Controlled Generation for Social Science

For computational social science applications, we often want to control generation to study specific phenomena or generate data with particular characteristics.

## 🎬 Demo: Sentiment-Controlled Generation

Let's demonstrate how to bias generation toward positive or negative sentiment by manipulating the probability distribution.
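
Before the full demo, here is a toy version of the idea with a four-word vocabulary and made-up logits. Adding a constant `strength` to a token's logit multiplies its odds relative to every other token by $e^{\text{strength}}$. The helper names are invented for this sketch.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def add_logit_bias(logits, target_ids, strength):
    """Return a copy of `logits` with `strength` added at each target index."""
    targets = set(target_ids)
    return [x + strength if i in targets else x for i, x in enumerate(logits)]

vocab = ["good", "bad", "new", "unclear"]  # toy vocabulary
logits = [1.0, 1.0, 2.0, 0.5]              # made-up scores
before = softmax(logits)
after = softmax(add_logit_bias(logits, target_ids=[0], strength=3.0))
for word, p0, p1 in zip(vocab, before, after):
    print(f"{word:>8}: {p0:.3f} -> {p1:.3f}")
```

The boosted word gains probability and every other word loses some, but nothing is forbidden outright, so the model can still ignore the nudge when the context strongly favors another token.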

In [None]:
# Define sentiment word lists (simplified for demo)
positive_words = [
    "great", "excellent", "wonderful", "amazing", "fantastic", 
    "good", "positive", "beneficial", "helpful", "successful"
]

negative_words = [
    "terrible", "awful", "horrible", "bad", "negative", 
    "harmful", "dangerous", "problematic", "concerning", "disappointing"
]

def controlled_generation(prompt, sentiment="neutral", strength=2.0, max_length=100):
    """
    Generate text with sentiment control by biasing token probabilities
    
    Args:
        prompt: Input text
        sentiment: 'positive', 'negative', or 'neutral'
        strength: How much to bias (higher = stronger bias)
        max_length: Maximum generation length
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Get token IDs for sentiment words
    if sentiment == "positive":
        target_words = positive_words
    elif sentiment == "negative":
        target_words = negative_words
    else:
        target_words = []
    
    target_token_ids = set()
    for word in target_words:
        # Words can split into several tokens, and many tokenizers use a
        # different token for a word that follows a space, so bias both forms
        for variant in (word, " " + word):
            target_token_ids.update(tokenizer.encode(variant, add_special_tokens=False))
    
    # Custom generation function with logit manipulation
    generated = inputs['input_ids'].clone()
    
    for _ in range(max_length - len(generated[0])):
        with torch.no_grad():
            outputs = model(generated)
            logits = outputs.logits[0, -1, :]
            
            # Bias logits toward sentiment words
            if target_token_ids:
                for token_id in target_token_ids:
                    if token_id < len(logits):
                        logits[token_id] += strength
            
            # Sample next token
            probs = F.softmax(logits / 0.8, dim=-1)  # Temperature = 0.8
            next_token = torch.multinomial(probs, 1)
            
            # Append to generated sequence
            generated = torch.cat([generated, next_token.unsqueeze(0)], dim=1)
            
            # Stop if we hit EOS token
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    # Decode only the tokens generated after the prompt
    prompt_length = inputs['input_ids'].shape[1]
    result = tokenizer.decode(generated[0][prompt_length:], skip_special_tokens=True)
    return result.strip()

# Test controlled generation
base_prompt = "The new government policy on education is"

print(f"Base prompt: '{base_prompt}'\n")

sentiments = ["neutral", "positive", "negative"]
for sentiment in sentiments:
    print(f"{sentiment.upper()} generation:")
    completion = controlled_generation(base_prompt, sentiment=sentiment, strength=3.0)
    print(f"{completion}\n")
    print("-" * 60)

## 🥊 Challenge: Create a Bias Detection Tool

Create a function to detect potential biases in model outputs by generating multiple completions and analyzing patterns.

In [None]:
def bias_detection(prompt_template, num_samples=10):
    """
    Generate multiple completions to detect systematic biases
    
    Args:
        prompt_template: Template with {} for substitution
        num_samples: Number of completions to generate
    """
    # Test different demographic groups
    groups = ["men", "women", "young people", "older adults", "students", "professionals"]
    
    results = {}
    
    for group in groups:
        prompt = prompt_template.format(group)
        completions = []
        
        for _ in range(num_samples):
            completion = generate_text(prompt, max_length=80, temperature=0.8)[0]
            completions.append(completion)
        
        results[group] = completions
    
    return results

# Example: Test for potential bias in job-related statements
prompt_template = "{} are typically good at"

print(f"Testing prompt template: '{prompt_template}'\n")
print("Generating 5 samples per group...\n")

bias_results = bias_detection(prompt_template, num_samples=5)

for group, completions in bias_results.items():
    print(f"Group: {group}")
    for i, completion in enumerate(completions, 1):
        print(f"  {i}. {completion[:60]}...")
    print()

🔔 **Question**: What patterns do you notice in the completions for different groups? Are there any concerning biases?
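
One simple way to start answering this is to count which words appear more often for one group than for all the others combined. The sketch below is a minimal, assumption-laden starting point: the helper names are invented, the input strings are hand-written placeholders (not model output), and raw word counts are a crude signal that says nothing about statistical significance.

```python
from collections import Counter
import re

def word_counts(texts):
    """Lowercase each text and count its words."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def distinctive_words(completions_by_group, top_n=3):
    """For each group, list words used more often there than in all other groups combined."""
    per_group = {g: word_counts(t) for g, t in completions_by_group.items()}
    result = {}
    for group, counts in per_group.items():
        others = Counter()
        for other_group, other_counts in per_group.items():
            if other_group != group:
                others.update(other_counts)
        # Counter subtraction keeps only words with a positive difference.
        result[group] = [w for w, _ in (counts - others).most_common(top_n)]
    return result

# Placeholder strings written by hand for illustration; NOT model output.
toy_results = {
    "group A": ["placeholder alpha text", "placeholder alpha again"],
    "group B": ["placeholder beta text", "placeholder beta beta"],
}
print(distinctive_words(toy_results))
```

In the notebook you could pass `bias_results` from the cell above in place of `toy_results`, ideally with more samples per group than the demo uses.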

<a id='practical'></a>

# Practical Considerations

When working with language models in computational social science research, several practical considerations are crucial.

## Common Issues and Solutions

### GPU Memory Limitations

In [None]:
# Check current GPU memory usage
if torch.cuda.is_available():
    allocated = torch.cuda.memory_allocated() / 1e9
    cached = torch.cuda.memory_reserved() / 1e9
    total = torch.cuda.get_device_properties(0).total_memory / 1e9
    
    print(f"GPU Memory Status:")
    print(f"  Allocated: {allocated:.2f} GB")
    print(f"  Cached: {cached:.2f} GB")
    print(f"  Total: {total:.2f} GB")
    print(f"  Available: {total - cached:.2f} GB")
    
    if cached > total * 0.8:
        print("⚠️ Warning: GPU memory usage is high!")
        print("Consider: reducing batch size, using smaller models, or clearing cache")

### Memory Management Tips

In [None]:
# Memory management functions
def clear_gpu_cache():
    """Clear GPU cache to free memory"""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        print("✓ GPU cache cleared")

def get_model_size(model):
    """Calculate model size in parameters"""
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    
    print(f"Model size:")
    print(f"  Total parameters: {total_params:,}")
    print(f"  Trainable parameters: {trainable_params:,}")
    print(f"  Size (approx): {total_params * 2 / 1e9:.2f} GB")  # fp16 = 2 bytes per param

get_model_size(model)
clear_gpu_cache()

## Model Loading Errors and Solutions

In [None]:
# Function to test different models with error handling
def test_model_loading(model_names):
    """Test loading different models and report what works"""
    results = []
    
    for model_name in model_names:
        try:
            print(f"Testing {model_name}...")
            test_tokenizer = AutoTokenizer.from_pretrained(model_name)
            test_model = AutoModelForCausalLM.from_pretrained(
                model_name, 
                torch_dtype=torch.float16,
                device_map="auto"
            )
            
            # Test generation
            inputs = test_tokenizer("Hello", return_tensors="pt").to(test_model.device)
            outputs = test_model.generate(**inputs, max_length=20)
            
            results.append((model_name, "✓ Success", ""))
            
            # Clean up
            del test_model, test_tokenizer
            clear_gpu_cache()
            
        except Exception as e:
            results.append((model_name, "✗ Failed", str(e)[:100]))
    
    return results

# Alternative models to try if Gemma doesn't work
alternative_models = [
    "microsoft/DialoGPT-small",    # Very small, should work on most systems
    "gpt2",                       # Classic GPT-2, reliable
    "distilgpt2"                  # Smaller version of GPT-2
]

print("If you're having issues with Gemma, try these alternatives:")
print("\nAlternative Models:")
for model_name in alternative_models:
    print(f"  - {model_name}")

## Rate Limiting and API Considerations

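Hosted models (such as OpenAI's API) limit how many requests you can send in a given period, so batch jobs need pauses between calls and handling for requests that fail. The function below shows the pattern, using our local model as a stand-in for an API call: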

In [None]:
import time
import random

def rate_limited_generation(prompts, delay_range=(1, 3)):
    """
    Generate text for multiple prompts with rate limiting
    
    Args:
        prompts: List of prompts to process
        delay_range: (min, max) seconds to wait between requests
    """
    results = []
    
    for i, prompt in enumerate(prompts):
        print(f"Processing prompt {i+1}/{len(prompts)}: {prompt[:50]}...")
        
        try:
            # Generate text
            completion = generate_text(prompt, max_length=100, temperature=0.7)[0]
            results.append({
                'prompt': prompt,
                'completion': completion,
                'status': 'success'
            })
            
        except Exception as e:
            results.append({
                'prompt': prompt,
                'completion': None,
                'status': f'error: {str(e)[:100]}'
            })
        
        # Add delay between requests (except for last one)
        if i < len(prompts) - 1:
            delay = random.uniform(*delay_range)
            print(f"Waiting {delay:.1f} seconds...")
            time.sleep(delay)
    
    return results

# Example usage
test_prompts = [
    "The impact of technology on democracy",
    "Social inequality in modern societies",
    "Climate change policy effectiveness"
]

print("Demo: Rate-limited batch processing")
batch_results = rate_limited_generation(test_prompts, delay_range=(0.5, 1.0))

print("\nResults:")
for result in batch_results:
    print(f"Status: {result['status']}")
    if result['completion']:
        print(f"Output: {result['completion'][:100]}...\n")
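
A fixed random delay is the simplest approach. A common alternative for hosted APIs is to retry failed requests with exponentially growing waits. The sketch below is one way to do that under stated assumptions: `call_with_backoff` and `flaky_request` are hypothetical names, and catching every `Exception` is a simplification (real code would catch the client library's specific rate-limit error).

```python
import random
import time

def call_with_backoff(request_fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call `request_fn()`; on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Hypothetical flaky request that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated rate-limit error")
    return "ok"

waits = []  # record the waits instead of actually sleeping in this demo
print(call_with_backoff(flaky_request, sleep=waits.append))
print("waits (s):", [round(w, 2) for w in waits])
```

Passing `sleep=waits.append` keeps the demo instant; leave the default `time.sleep` in place for real requests.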

## When to Use Local vs Cloud Computing

### Local Computing (Your Computer)
**Pros:**
- Full control over data and processing
- No usage limits or per-request fees (beyond hardware and electricity)
- Good for small models and development

**Cons:**
- Limited computational resources
- Requires good hardware for larger models
- Setup and maintenance overhead

### Cloud Computing (Colab, AWS, etc.)
**Pros:**
- Access to powerful GPUs
- Scalable resources
- No hardware investment

**Cons:**
- Usage limits and costs
- Data privacy considerations
- Internet dependency

## Cost Considerations

In [None]:
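def estimate_costs(num_prompts, avg_tokens_per_prompt=100, tokens_per_second=50,
                   colab_usd_per_hour=0.20, api_usd_per_1k_tokens=0.03):
    """
    Estimate computational costs for a research project
    
    The two price arguments are rough placeholders, not quoted prices.
    Colab Pro is sold as a subscription with compute units rather than
    by the hour, and API prices differ for input and output tokens and
    change over time. Check current pricing before budgeting.
    """
    total_tokens = num_prompts * avg_tokens_per_prompt
    processing_time = total_tokens / tokens_per_second  # seconds
    
    # Rough cost estimates (assumed rates, see docstring)
    colab_pro_cost = colab_usd_per_hour * (processing_time / 3600)
    openai_gpt4_cost = total_tokens / 1000 * api_usd_per_1k_tokens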
    
    print(f"Cost Estimation for Research Project:")
    print(f"  Number of prompts: {num_prompts:,}")
    print(f"  Total tokens: {total_tokens:,}")
    print(f"  Processing time: {processing_time/3600:.2f} hours")
    print(f"\nEstimated Costs:")
    print(f"  Local/Colab Free: $0 (if within limits)")
    print(f"  Colab Pro: ${colab_pro_cost:.2f}")
    print(f"  OpenAI GPT-4: ${openai_gpt4_cost:.2f}")
    print(f"\nRecommendation:")
    if num_prompts < 100:
        print("  Use free Colab for small experiments")
    elif num_prompts < 1000:
        print("  Consider Colab Pro or local setup")
    else:
        print("  Budget for cloud computing or invest in local hardware")

# Example cost estimation
estimate_costs(500, avg_tokens_per_prompt=150)

## Model Size Trade-offs

In [None]:
# Model comparison table
model_comparison = {
    "Model": ["DistilGPT-2", "GPT-2", "Gemma-2B", "Gemma-7B", "GPT-3.5", "GPT-4"],
    "Parameters": ["82M", "124M", "2B", "7B", "Undisclosed", "Undisclosed"],
    "Memory (GB)": ["0.3", "0.5", "4", "14", "N/A (API)", "N/A (API)"],
    "Quality": ["Basic", "Good", "Very Good", "Excellent", "Excellent", "State-of-art"],
    "Speed": ["Fast", "Fast", "Medium", "Slow", "API", "API"],
    "Use Case": ["Development", "Experiments", "Research", "Production", "Production", "Production"]
}

import pandas as pd
df_comparison = pd.DataFrame(model_comparison)
print("Model Size Trade-offs:")
print(df_comparison.to_string(index=False))

print("\nGuidelines:")
print("- Start small for proof-of-concept")
print("- Scale up based on research needs")
print("- Consider computational budget")
print("- Evaluate quality vs. cost trade-offs")

## 🥊 Final Challenge: Design Your Research Pipeline

Design a computational pipeline for a social science research question:

In [None]:
def design_research_pipeline(research_question, num_conditions, samples_per_condition):
    """
    Design a research pipeline for systematic text generation studies
    """
    total_samples = num_conditions * samples_per_condition
    
    print(f"Research Pipeline Design")
    print(f"Research Question: {research_question}")
    print(f"Number of conditions: {num_conditions}")
    print(f"Samples per condition: {samples_per_condition}")
    print(f"Total samples needed: {total_samples}")
    
    # Estimate resources
    estimate_costs(total_samples)
    
    print(f"\nPipeline Steps:")
    print(f"1. Design prompt templates")
    print(f"2. Generate systematic variations")
    print(f"3. Batch process with rate limiting")
    print(f"4. Quality control and filtering")
    print(f"5. Analysis and interpretation")
    
    return {
        'total_samples': total_samples,
        'estimated_time_hours': total_samples / 100,  # Rough estimate
        'recommended_model': 'Gemma-2B' if total_samples < 1000 else 'API-based'
    }

# Example research design
pipeline = design_research_pipeline(
    research_question="How do language models represent different social groups in workplace contexts?",
    num_conditions=10,  # Different demographic groups
    samples_per_condition=50  # Completions per group
)
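
For step 2 of the pipeline ("Generate systematic variations"), one possible approach is to cross every level of every factor so that each condition gets exactly one prompt. The sketch below does this with `itertools.product`; the helper name, template wording, and factor values are illustrative assumptions rather than a recommended design.

```python
from itertools import product

def build_prompt_grid(template, **factors):
    """Fill `template` with every combination of the given factor values."""
    names = list(factors)
    rows = []
    for values in product(*(factors[n] for n in names)):
        condition = dict(zip(names, values))
        rows.append({**condition, "prompt": template.format(**condition)})
    return rows

grid = build_prompt_grid(
    "In the {setting}, {group} are often described as",
    group=["students", "professionals"],
    setting=["workplace", "classroom"],
)
for row in grid:
    print(row["prompt"])
print(f"{len(grid)} conditions")
```

Keeping the factor values alongside each prompt makes it easy to load the results into a DataFrame later and group completions by condition.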

<div class="alert alert-success">

## ❗ Key Points

* **Tool Selection**: Choose computational resources (local vs. cloud) based on research needs and constraints
* **Tokenization**: Understanding how text becomes tokens is crucial for working with language models effectively
* **Probability Control**: Temperature and other parameters control the creativity/randomness of generation
* **Controlled Generation**: We can bias models toward specific outcomes for research purposes
* **Practical Limits**: GPU memory, rate limits, and costs are real constraints in research
* **Research Design**: Systematic approaches to text generation enable rigorous computational social science
* **Bias Detection**: Language models can exhibit biases that researchers need to identify and account for
* **Scalability**: Start small, validate approaches, then scale up based on research requirements
    
</div>

## Next Steps

**For Your Research:**
1. Identify specific research questions that could benefit from language model analysis
2. Design small pilot studies to test approaches
3. Consider ethical implications of generated content
4. Plan for computational resources and costs

**Technical Learning:**
1. Explore fine-tuning models on domain-specific data
2. Learn about prompt engineering techniques
3. Study bias detection and mitigation methods
4. Experiment with different model architectures

**Resources:**
- Hugging Face Transformers documentation
- Google Colab tutorials and best practices
- Academic papers on computational social science applications
- Online courses on natural language processing