In [1]:
import os
os.chdir('/home/smallyan/eval_agent')
print(f"Current working directory: {os.getcwd()}")

Current working directory: /home/smallyan/eval_agent


In [2]:
# Now navigate to the target repository
repo_path = '/net/scratch2/smallyan/universal-neurons_eval'
os.chdir(repo_path)
print(f"Changed to repository: {os.getcwd()}")

# List the directory contents
import subprocess
result = subprocess.run(['ls', '-la'], capture_output=True, text=True)
print("\nDirectory contents:")
print(result.stdout)

Changed to repository: /net/scratch2/smallyan/universal-neurons_eval

Directory contents:
total 3985
drwxrwx---  8 smallyan smallyan      28 Jan  9 16:56 .
drwxrwx--- 40 smallyan smallyan      42 Jan 11 07:09 ..
-rw-rw----  1 smallyan smallyan   12692 Dec 22 21:49 activations.py
drwxrwx---  3 smallyan smallyan      14 Dec 23 23:57 analysis
-rw-rw----  1 smallyan smallyan    8160 Dec 22 21:49 attention_deactivation.py
-rw-rw----  1 smallyan smallyan    7845 Dec 22 21:49 attention_deactivation_qpos.py
-rw-rw----  1 smallyan smallyan    4519 Dec 22 21:49 CodeWalkthrough.md
-rw-rw----  1 smallyan smallyan   10191 Dec 22 21:49 correlations_fast.py
-rw-rw----  1 smallyan smallyan   14345 Dec 22 21:49 correlations_parallel.py
-rw-rw----  1 smallyan smallyan   13230 Dec 22 21:49 correlations.py
drwxrwx---  5 smallyan smallyan       5 Dec 22 21:49 dataframes
-rw-r-----  1 smallyan smallyan 3861499 Dec 23 01:40 documentation.pdf
-rw-rw----  1 smallyan smallyan    6106 Dec 22 21:49 entropy_interv

# Universal Neurons in GPT2 Language Models - Replication

This notebook replicates the experiments from "Universal Neurons in GPT2 Language Models" by Gurnee et al. (2024).

## Experiment Overview

The original study investigates the universality of individual neurons across GPT2 language models trained from different random seeds. The key hypothesis is that universal neurons (those that consistently activate on the same inputs across different models) are more likely to be monosemantic and interpretable.

### Key Experiments:
1. **Neuron correlation analysis**: Computing pairwise Pearson correlations of neuron activations across models
2. **Statistical properties analysis**: Analyzing activation and weight statistics of universal vs non-universal neurons
3. **Neuron family taxonomization**: Classifying neurons into families based on their behavior
4. **Functional analysis**: Studying prediction, suppression, and partition neurons through weight analysis
5. **Causal interventions**: Testing effects of fixing neuron activations
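
The first experiment above can be illustrated with a minimal sketch. It assumes the activations of two models on the same tokens are already available as arrays of shape `(n_tokens, n_neurons)`. The function name `pairwise_pearson` and the synthetic data are hypothetical and are not taken from the original repository.

```python
import numpy as np


def pairwise_pearson(acts_a, acts_b):
    """Pearson correlation between every neuron in model A and every neuron in model B.

    acts_a: (n_tokens, n_neurons_a), acts_b: (n_tokens, n_neurons_b),
    both computed on the same tokens. Returns (n_neurons_a, n_neurons_b).
    """
    # Center each neuron, then scale it to unit norm over the token axis
    a = acts_a - acts_a.mean(axis=0, keepdims=True)
    b = acts_b - acts_b.mean(axis=0, keepdims=True)
    a /= np.linalg.norm(a, axis=0, keepdims=True) + 1e-12
    b /= np.linalg.norm(b, axis=0, keepdims=True) + 1e-12
    # Dot products of unit-norm, centered columns are Pearson correlations
    return a.T @ b


# Synthetic stand-in for real activations (assumption: illustration only)
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(1000, 4))
acts_b = rng.normal(size=(1000, 3))
# Make neuron 0 of model B a noisy copy of neuron 2 of model A
acts_b[:, 0] = acts_a[:, 2] + 0.1 * rng.normal(size=1000)

corr = pairwise_pearson(acts_a, acts_b)
best_match = corr.argmax(axis=1)  # most similar neuron in B for each neuron in A
max_corr = corr.max(axis=1)       # a simple per-neuron similarity score
print(best_match, np.round(max_corr, 3))
```

In this toy setup, a neuron in model A with a high maximum correlation has a close counterpart in model B, which is the intuition behind calling a neuron universal. Any threshold on that score would be a separate choice and is not specified here.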

### Replication Scope

Due to computational constraints, this replication will focus on:
1. Demonstrating the neuron correlation computation methodology on a small subset
2. Analyzing statistical properties of neurons
3. Reproducing the key analytical framework even if not at full scale

In [3]:
# Check GPU availability
import torch
import sys
import os

print(f"Python version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA device count: {torch.cuda.device_count()}")
    
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"\nUsing device: {device}")

Python version: 3.11.0 (main, Mar  1 2023, 18:26:19) [GCC 11.2.0]
PyTorch version: 2.7.1+cu118
CUDA available: True
CUDA device: NVIDIA A100 80GB PCIe
CUDA device count: 1

Using device: cuda


In [4]:
# Install required packages if needed
import subprocess
import sys

packages_to_check = [
    'transformer-lens',
    'einops',
    'datasets',
    'circuitsvis'
]

for package in packages_to_check:
    try:
        __import__(package.replace('-', '_'))
        print(f"✓ {package} is installed")
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])



✓ transformer-lens is installed
✓ einops is installed
✓ datasets is installed
✓ circuitsvis is installed


## Part 1: Load Models and Setup

We'll load a single pretrained GPT-2 small model to demonstrate the correlation analysis. The original study compared multiple models trained from different random seeds; here we work with the publicly available pretrained checkpoint.

In [5]:
from transformer_lens import HookedTransformer
import torch
import numpy as np
from tqdm.auto import tqdm
import einops

# Load a smaller GPT-2 model for demonstration
print("Loading GPT-2 small model...")
model = HookedTransformer.from_pretrained(
    "gpt2-small",
    device=device,
    dtype=torch.float32
)

print(f"\nModel configuration:")
print(f"  Layers: {model.cfg.n_layers}")
print(f"  MLP dimension: {model.cfg.d_mlp}")
print(f"  Attention heads: {model.cfg.n_heads}")
print(f"  Model dimension: {model.cfg.d_model}")
print(f"  Vocabulary size: {model.cfg.d_vocab}")

Loading GPT-2 small model...


`torch_dtype` is deprecated! Use `dtype` instead!


Loaded pretrained model gpt2-small into HookedTransformer

Model configuration:
  Layers: 12
  MLP dimension: 3072
  Attention heads: 12
  Model dimension: 768
  Vocabulary size: 50257


## Part 2: Activation Collection

The original study computed neuron activations over 100 million tokens from the Pile test set. For this replication, we'll use a smaller sample to demonstrate the methodology.
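
At that scale the full activation matrices cannot be held in memory, so correlations are typically accumulated batch by batch. The following is a minimal sketch of one way to do this with running sums. The class name `StreamingPearson` and the synthetic batches are hypothetical, and this is not claimed to be the implementation in the repository's `correlations.py`.

```python
import numpy as np


class StreamingPearson:
    """Accumulate sufficient statistics for Pearson correlation over batches."""

    def __init__(self, n_a, n_b):
        self.n = 0
        self.sum_a = np.zeros(n_a)
        self.sum_b = np.zeros(n_b)
        self.sum_aa = np.zeros(n_a)
        self.sum_bb = np.zeros(n_b)
        self.sum_ab = np.zeros((n_a, n_b))

    def update(self, acts_a, acts_b):
        # acts_a: (batch_tokens, n_a), acts_b: (batch_tokens, n_b), same tokens
        self.n += acts_a.shape[0]
        self.sum_a += acts_a.sum(axis=0)
        self.sum_b += acts_b.sum(axis=0)
        self.sum_aa += (acts_a ** 2).sum(axis=0)
        self.sum_bb += (acts_b ** 2).sum(axis=0)
        self.sum_ab += acts_a.T @ acts_b

    def correlation(self):
        mean_a = self.sum_a / self.n
        mean_b = self.sum_b / self.n
        cov = self.sum_ab / self.n - np.outer(mean_a, mean_b)
        var_a = self.sum_aa / self.n - mean_a ** 2
        var_b = self.sum_bb / self.n - mean_b ** 2
        return cov / np.sqrt(np.outer(var_a, var_b))


# Synthetic activations split into batches (assumption: illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=(600, 3))
y = rng.normal(size=(600, 2))
y[:, 0] = 0.8 * x[:, 1] + 0.2 * rng.normal(size=600)

acc = StreamingPearson(n_a=3, n_b=2)
for start in range(0, 600, 200):
    acc.update(x[start:start + 200], y[start:start + 200])

streamed = acc.correlation()
print(np.round(streamed, 3))
```

The raw-sum formulation keeps memory use independent of the number of tokens. Accumulating in float64 is advisable, because subtracting large sums can lose precision in lower-precision arithmetic.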

In [6]:
from itertools import islice
from datasets import load_dataset

num_samples = 100  # Much smaller than the original study; for demonstration only


def load_texts(dataset_name, split, n):
    """Stream the first n documents from a Hugging Face dataset and return their text."""
    dataset = load_dataset(dataset_name, split=split, streaming=True)
    return [sample['text'] for sample in islice(dataset, n)]


# The original study uses 100M tokens from the Pile test set.
# For this replication, we take a small sample from a text dataset.
print("Loading dataset...")
try:
    # Try loading the Pile if available
    texts = load_texts("EleutherAI/pile", "test", num_samples)
    print(f"Loaded {len(texts)} samples from Pile test set")
except Exception as e:
    # Fall back to OpenWebText if the Pile cannot be loaded
    print(f"Could not load Pile dataset ({e}), using alternative...")
    texts = load_texts("openwebtext", "train", num_samples)
    print(f"Loaded {len(texts)} samples from OpenWebText")

Loading dataset...



In [8]:
# Use a simpler approach with sample texts for demonstration
# In the original study, they use 100M tokens from Pile test set
# For this replication, we'll create a smaller representative sample

sample_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Machine learning is a subset of artificial intelligence.",
    "Python is a popular programming language for data science.",
    "The theory of relativity was developed by Albert Einstein.",
    "Natural language processing enables computers to understand human language.",
    "Deep learning models use neural networks with multiple layers.",
    "The capital of France is Paris, a beautiful city.",
    "Mathematics is the foundation of computer science and engineering.",
    "Climate change is one of the most pressing challenges of our time.",
    "The internet has revolutionized how we communicate and share information."
] * 20  # Repeat to get more samples

print(f"Created {len(sample_texts)} sample texts for demonstration")
print("\nNote: Original study uses 100M tokens from Pile test set.")
# Whitespace splitting counts words, which only approximates the tokenizer's token count
print(f"This is a simplified demonstration with ~{sum(len(t.split()) for t in sample_texts)} words.")

In [9]:
# Actually create the texts
sample_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Machine learning is a subset of artificial intelligence.",
    "Python is a popular programming language for data science.",
    "The theory of relativity was developed by Albert Einstein.",
    "Natural language processing enables computers to understand human language.",
    "Deep learning models use neural networks with multiple layers.",
    "The capital of France is Paris, a beautiful city.",
    "Mathematics is the foundation of computer science and engineering.",
    "Climate change is one of the most pressing challenges of our time.",
    "The internet has revolutionized how we communicate and share information."
] * 20

print(f"Created {len(sample_texts)} sample texts")
# Whitespace-split word count; a rough proxy for the number of tokens
print(f"Total words (approximate): {sum(len(t.split()) for t in sample_texts)}")

In [10]:
print("Test print")
x = 5
print(f"x = {x}")

In [11]:
# Restart with clean implementation
import sys
sys.stdout.flush()

# Create sample texts for demonstration
sample_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Machine learning is a subset of artificial intelligence.",
    "Python is a popular programming language for data science.",
    "The theory of relativity was developed by Albert Einstein.",
    "Natural language processing enables computers to understand human language.",
] * 10

total_words = sum(len(t.split()) for t in sample_texts)
print(f"Created {len(sample_texts)} sample texts with ~{total_words} words", flush=True)