# DiffusionLens: GPU-Accelerated Demo

This notebook demonstrates the DiffusionLens method for analyzing text encoders in diffusion models.

**Paper**: [Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines](https://arxiv.org/abs/2403.05846)

**What this does**: Extracts and visualizes intermediate layer representations from text encoders, showing how concepts progressively form across layers.

**Requirements**: 
- GPU runtime (Runtime → Change runtime type → GPU)
- ~15GB disk space for models
- 10-20 minutes for full demo

## 1. Setup: Clone Repository

In [None]:
# Clone the repository
!git clone https://github.com/AzulEye/Lens.git
%cd Lens

# Verify we're in the right directory
!ls -la

## 2. Install Dependencies

**Strategy**: Use pre-built wheels to avoid compilation issues, and install only necessary packages.

In [None]:
# Install dependencies that work with Colab
# Using versions that have pre-built wheels and don't require compilation
print("Installing compatible packages for Colab...\n")

# Install tokenizers first (pre-built version)
!pip install -q tokenizers==0.13.3

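# Install transformers without dependency resolution (keeps the pinned tokenizers),
# then install its runtime dependencies explicitly
!pip install -q transformers==4.30.2 --no-deps
!pip install -q filelock huggingface-hub numpy pyyaml regex requests packaging safetensors tqdm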

# Install other core dependencies
!pip install -q matplotlib python-box

print("\n✅ Core dependencies installed")
print("\nInstalled versions:")
!pip show transformers tokenizers | grep -E '^Name:|^Version:'
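
As an alternative to reading the `pip show` output by eye, the pins can be checked from Python. This is a minimal sketch using only the standard library; `check_pins` is a hypothetical helper name, not part of the repository, and the pinned versions are the ones installed above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Compare installed package versions against the expected pins.

    `pins` maps a distribution name to the exact version string we expect.
    Returns a dict mapping each name to "ok", "missing", or a mismatch note.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"mismatch (found {found})"
    return report


# Pins taken from the install cell above
for name, status in check_pins({"transformers": "4.30.2", "tokenizers": "0.13.3"}).items():
    print(f"{name}: {status}")
```

If either package reports a mismatch, restarting the runtime and re-running the install cell is usually the simplest fix.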

## 3. Setup Modified Diffusers (diffusers_local)

The DiffusionLens method uses modified diffusion pipelines that support layer-wise text encoder extraction.

In [None]:
import os

# Download diffusers v0.20.2 (the upstream release that our modified pipeline files patch)
!wget -q https://github.com/huggingface/diffusers/archive/refs/tags/v0.20.2.tar.gz
!tar -xzf v0.20.2.tar.gz
!mv diffusers-0.20.2 diffusers_local

# Copy our modified pipeline files
!cp pipeline_stable_diffusion.py diffusers_local/src/diffusers/pipelines/stable_diffusion/
!cp pipeline_if.py diffusers_local/src/diffusers/pipelines/deepfloyd_if/

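# Verify the setup. The upstream tarball already ships files with these names,
# so compare contents to confirm the modified copies actually replaced them.
import filecmp

copies = {
    'pipeline_stable_diffusion.py': 'diffusers_local/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py',
    'pipeline_if.py': 'diffusers_local/src/diffusers/pipelines/deepfloyd_if/pipeline_if.py',
}
if all(os.path.exists(src) and os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False)
       for src, dst in copies.items()):
    print("✅ diffusers_local setup complete")
else:
    print("❌ Error: modified pipeline files were not copied into diffusers_local")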

## 4. Verify GPU Availability

In [None]:
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
    print("\n✅ GPU ready for fast inference!")
else:
    print("\n⚠️  Warning: No GPU detected. Please enable GPU in Runtime → Change runtime type")
    print("The demo will run very slowly on CPU.")

## 5. Run DiffusionLens Demo

This will generate images showing how the text encoder progressively builds understanding across layers.

**Parameters:**
- Model: Stable Diffusion 1.4 (4GB, fast)
- Prompts: Authentic examples from the paper
- Layers: 0, 4, 8, 12 (early → mid → late → final)
- Images per layer: 1 (faster demo)
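
The layer list above follows from the `--start_layer`, `--end_layer`, and `--step_layer` flags used in the next cell. A minimal sketch of that expansion, assuming `--end_layer` is inclusive (which is what the 0, 4, 8, 12 list implies); `layers_to_render` is a hypothetical helper, not a function from `run_experiment.py`:

```python
def layers_to_render(start_layer, end_layer, step_layer):
    """Expand start/end/step into the list of layer indices to visualize.

    Assumption: end_layer is inclusive, matching the 0, 4, 8, 12 list above.
    """
    if step_layer <= 0:
        raise ValueError("step_layer must be a positive integer")
    return list(range(start_layer, end_layer + 1, step_layer))


print(layers_to_render(0, 12, 4))  # the demo settings
```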

In [None]:
# Run the demo with paper prompts
!python run_experiment.py \
    --model_key sd1.4 \
    --img_num 1 \
    --generate \
    --start_layer 0 \
    --end_layer 12 \
    --step_layer 4 \
    --folder_name colab_demo \
    --input_filename paper_prompts.txt \
    --number_of_inputs 2

print("\n✅ Demo complete! Check the output below.")

## 6. Visualize Results

Display the generated images showing layer-by-layer concept formation.

In [None]:
import matplotlib.pyplot as plt
from PIL import Image
import os
from pathlib import Path

# Read the first 2 non-empty prompts (matches --number_of_inputs 2 above)
with open('inputs/paper_prompts.txt', 'r') as f:
    prompts = [line.strip() for line in f if line.strip()][:2]

# Display results for each prompt
for prompt in prompts:
    print(f"\n{'='*80}")
    print(f"Prompt: {prompt}")
    print(f"{'='*80}\n")
    
    base_path = Path(f'colab_demo/{prompt}/sd1.4/encoder_full_direct')
    
    if not base_path.exists():
        print(f"⚠️  Output not found for: {prompt}")
        continue
    
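    # Find all layer images, sorted numerically so layer_12 comes after layer_8
    # (a plain string sort would place layer_12 before layer_4)
    layer_images = sorted(base_path.glob('layer_*.png'),
                          key=lambda p: int(p.stem.split('_')[1]))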
    
    if not layer_images:
        print(f"⚠️  No layer images found for: {prompt}")
        continue
    
    # Display in a grid
    num_layers = len(layer_images)
    fig, axes = plt.subplots(1, num_layers, figsize=(5*num_layers, 5))
    
    if num_layers == 1:
        axes = [axes]
    
    for idx, img_path in enumerate(layer_images):
        img = Image.open(img_path)
        layer_num = img_path.stem.split('_')[1]  # Extract layer number from filename
        
        axes[idx].imshow(img)
        axes[idx].set_title(f'Layer {layer_num}', fontsize=14, fontweight='bold')
        axes[idx].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("\nLayer progression:")
    print("  Layer 0:  Early - abstract concepts, minimal semantic understanding")
    print("  Layer 4:  Mid-early - concepts starting to form")
    print("  Layer 8:  Mid-late - concepts interacting and composing")
    print("  Layer 12: Final - complete understanding and composition")

## 7. Explore Individual Layers (Optional)

You can also view individual images from the `all_images/` directory for more detailed analysis.

In [None]:
# Example: Show all individual layer images for the first prompt
from IPython.display import display

prompt = prompts[0]
print(f"Detailed view for: {prompt}\n")

all_images_path = Path(f'colab_demo/{prompt}/sd1.4/encoder_full_direct/all_images')

if all_images_path.exists():
    images = sorted(all_images_path.glob('*.png'))
    
    for img_path in images:
        print(f"\n{img_path.name}:")
        img = Image.open(img_path)
        display(img)
else:
    print("No detailed images found")

## 8. Download Results (Optional)

Download all generated images as a zip file.

In [None]:
# Create a zip file of all results
!zip -r colab_demo_results.zip colab_demo/

print("\n✅ Results zipped!")
print("To download: Click the folder icon on the left, find 'colab_demo_results.zip', right-click → Download")

# Alternative: trigger a browser download directly (Colab only; some browsers may block it)
from google.colab import files
files.download('colab_demo_results.zip')
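
If the `zip` command is not available, the same archive can be built with Python's standard library. A minimal sketch; `zip_results` is a hypothetical helper name, and the folder and archive names mirror the ones used above:

```python
import shutil
from pathlib import Path


def zip_results(folder, archive_stem):
    """Zip `folder` (keeping its name as the top-level entry) into `<archive_stem>.zip`.

    Returns the path of the archive that was written.
    """
    folder = Path(folder).resolve()
    if not folder.is_dir():
        raise FileNotFoundError(f"Results folder not found: {folder}")
    return shutil.make_archive(
        str(archive_stem), "zip",
        root_dir=str(folder.parent),  # paths inside the zip are relative to this
        base_dir=folder.name,         # only this folder is included
    )


if Path("colab_demo").is_dir():
    print("Wrote", zip_results("colab_demo", "colab_demo_results"))
```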

## 9. Run with Custom Prompts (Optional)

Try your own prompts to see how the text encoder builds understanding!

In [None]:
# Define your custom prompts
custom_prompts = [
    "A robot painting a self-portrait",
    "An elephant balancing on a ball",
]

# Write to file
with open('inputs/custom_prompts.txt', 'w') as f:
    for prompt in custom_prompts:
        f.write(prompt + '\n')

# Run the experiment
!python run_experiment.py \
    --model_key sd1.4 \
    --img_num 1 \
    --generate \
    --start_layer 0 \
    --end_layer 12 \
    --step_layer 4 \
    --folder_name custom_demo \
    --input_filename custom_prompts.txt \
    --number_of_inputs 2

print("\n✅ Custom prompts complete!")
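
Because the visualization cell builds output paths from the prompt text itself (`custom_demo/{prompt}/...`), it can help to tidy prompts before writing them to the file. The sketch below is one possible approach, not part of the repository: `clean_prompts` is a hypothetical helper, and rejecting `/` is an assumption about what would break the folder layout.

```python
def clean_prompts(prompts):
    """Strip whitespace, drop empty lines and duplicates, and keep the original order."""
    seen = set()
    cleaned = []
    for raw in prompts:
        prompt = raw.strip()
        if not prompt or prompt in seen:
            continue
        if "/" in prompt:
            # Assumption: a slash would be read as a nested directory in the output path
            raise ValueError(f"Prompt contains a path separator: {prompt!r}")
        seen.add(prompt)
        cleaned.append(prompt)
    return cleaned


print(clean_prompts(["  A robot painting a self-portrait ", "", "An elephant balancing on a ball\n"]))
```

When using it, set `--number_of_inputs` to the length of the cleaned list so the script and the prompt file stay in sync.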

## Understanding the Results

**What you're seeing:**

Each image shows what the diffusion model generates when it only has access to information up to that layer of the text encoder.
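
The idea can be sketched with a toy encoder. This is an illustration only, not the repository's implementation: the "layers" are plain Python functions, `lens_representation` is a hypothetical name, and it assumes (following the paper's description) that the intermediate output is passed through the encoder's final layer norm before it conditions the diffusion model. Layer 0 is taken to mean the embeddings before any encoder block.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def lens_representation(embedding, layers, k, final_norm=layer_norm):
    """Run only the first k encoder layers, then apply the final norm.

    k = 0 returns the normalized embedding; k = len(layers) is the full encoder.
    """
    if not 0 <= k <= len(layers):
        raise ValueError(f"k must be between 0 and {len(layers)}")
    hidden = list(embedding)
    for layer in layers[:k]:
        hidden = layer(hidden)
    return final_norm(hidden)


# Toy stand-ins for encoder blocks; a real text encoder uses attention and MLP blocks
toy_layers = [
    lambda h: [2.0 * v for v in h],
    lambda h: [v + i for i, v in enumerate(h)],
]
for k in range(len(toy_layers) + 1):
    print(k, lens_representation([1.0, 3.0, 2.0], toy_layers, k))
```

In the real pipeline, the vector returned for layer `k` would replace the text encoder's final output as the conditioning input for image generation.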

**Layer progression patterns:**

1. **Layer 0 (Early)**: Very abstract, minimal semantic understanding
   - Generic shapes and colors
   - No clear object identity

2. **Layer 4 (Mid-early)**: Concepts starting to emerge
   - Individual objects becoming recognizable
   - Limited interaction between concepts

3. **Layer 8 (Mid-late)**: Composition forming
   - Multiple concepts interacting
   - Spatial relationships developing

4. **Layer 12 (Final)**: Complete understanding
   - Full scene composition
   - All concepts properly integrated
   - Highest quality and coherence

**Paper findings:**
- **Compositional understanding** builds progressively (e.g., "A cake on a butterfly")
- **Rare concepts** (e.g., "babirusa") require more layers to retrieve correctly
- **Complex scenes** show clearer layer-by-layer progression than simple objects

## References

- **Paper**: [Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines](https://arxiv.org/abs/2403.05846)
- **Original Repository**: [github.com/tokeron/DiffusionLens](https://github.com/tokeron/DiffusionLens)
- **This Fork**: [github.com/AzulEye/Lens](https://github.com/AzulEye/Lens)

## Tips

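- **Faster runtime**: Keep `--img_num` at 1 and increase `--step_layer` to 6 (renders layers 0, 6, and 12 only)
- **More samples per layer**: Increase `--img_num` to 4 (generates 4 images per layer, which makes per-layer trends easier to judge but takes about 4× as long)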
- **Different model**: Try `--model_key sd2.1` (larger model, better quality, slower)
- **More prompts**: Increase `--number_of_inputs` to process more from the file
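
To compare these settings before launching a run, the number of generated images can be estimated up front. A rough sketch, assuming one image per prompt, per layer, per `--img_num` and an inclusive `--end_layer`; `images_to_generate` is a hypothetical helper, not part of the repository:

```python
def images_to_generate(num_prompts, start_layer, end_layer, step_layer, img_num):
    """Estimate the image count as prompts x layers x images per layer.

    Assumption: end_layer is inclusive, as in the demo's 0, 4, 8, 12 layer list.
    """
    num_layers = len(range(start_layer, end_layer + 1, step_layer))
    return num_prompts * num_layers * img_num


settings = {
    "demo defaults": (2, 0, 12, 4, 1),
    "faster (step 6)": (2, 0, 12, 6, 1),
    "more samples (img_num 4)": (2, 0, 12, 4, 4),
}
for label, args in settings.items():
    print(f"{label}: {images_to_generate(*args)} images")
```

Generation time should scale roughly with this count, so it gives a quick sense of how much longer a configuration will take.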

## Troubleshooting

**Out of memory?**
- Restart runtime and try again
- Keep `--img_num` at 1 (the demo default)
- Use `sd1.4` instead of larger models

**Slow generation?**
- Verify GPU is enabled (see Section 4)
- Expected: ~2-3 minutes per prompt with GPU, ~30-60 minutes on CPU

**Tokenizers build error?**
- Section 2 installs a pre-built tokenizers wheel (v0.13.3) so nothing has to be compiled from source
- If issues persist, restart runtime and re-run from Section 1