# 01.1c: Save Token Norms

**Goal:** Compute the norm of every token in centered gamma (γ') space and save the results to CSV for analysis.

This is a simple **generator** notebook: it computes the radial distance (L2 norm) of each of the 151,936 tokens in gamma-prime (centered) space and saves it alongside the token index.

Output: CSV with columns `[token_id, norm_gamma]`

This data will be used in 01.2d to investigate the spike in the radial distribution.

## Parameters

In [1]:
TENSOR_DIR = "../data/tensors"
OUTPUT_DIR = "../data/results"
OUTPUT_FILE = "token_norms_gamma.csv"

## Imports

In [2]:
import torch
import pandas as pd
from safetensors.torch import load_file
from pathlib import Path
import os

print("Imports loaded successfully.")

Imports loaded successfully.


## Step 1: Load Centered Gamma

In [3]:
gamma_centered_path = Path(TENSOR_DIR) / "gamma_centered_qwen3_4b_instruct_2507.safetensors"
gamma_centered = load_file(gamma_centered_path)['gamma_centered']

N, d = gamma_centered.shape

print("Loaded γ' (gamma_centered):")
print(f"  Tokens: {N:,}")
print(f"  Dimensions: {d:,}")

Loaded γ' (gamma_centered):
  Tokens: 151,936
  Dimensions: 2,560


## Step 2: Compute Norms

In [4]:
print("Computing norms for all tokens...")
# L2 norm of each row: one radial distance per token
norms = gamma_centered.norm(dim=1)

print(f"\nComputed {len(norms):,} norms")
print(f"  Mean: {norms.mean().item():.6f} gamma units")
print(f"  Std: {norms.std().item():.6f} gamma units")
print(f"  Min: {norms.min().item():.6f} gamma units")
print(f"  Max: {norms.max().item():.6f} gamma units")

Computing norms for all tokens...

Computed 151,936 norms
  Mean: 1.040134 gamma units
  Std: 0.188920 gamma units
  Min: 0.153098 gamma units
  Max: 1.568522 gamma units
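As a sanity check on what `norm(dim=1)` computes, the sketch below compares it with the explicit square root of the per-row sum of squares and with `torch.linalg.vector_norm`. The tensor `toy` is a small synthetic stand-in, not the real `gamma_centered`, and the variable names are illustrative only.

```python
import torch

# Synthetic stand-in for gamma_centered (the real tensor is 151,936 x 2,560).
torch.manual_seed(0)
toy = torch.randn(5, 8)

# Row-wise L2 norm, written the same way as in the cell above.
norms_builtin = toy.norm(dim=1)

# The same quantity spelled out: sqrt of the sum of squares along each row.
norms_manual = toy.pow(2).sum(dim=1).sqrt()

# torch.linalg.vector_norm is an equivalent, more explicit spelling.
norms_linalg = torch.linalg.vector_norm(toy, ord=2, dim=1)

assert torch.allclose(norms_builtin, norms_manual)
assert torch.allclose(norms_builtin, norms_linalg)
print(norms_builtin.shape)
```

All three forms should agree to floating-point tolerance, so the choice between them is mostly a matter of readability.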


## Step 3: Create DataFrame

In [5]:
# Create DataFrame with token IDs and norms
df = pd.DataFrame({
    'token_id': range(N),
    'norm_gamma': norms.cpu().numpy()
})

print("Created DataFrame:")
print(f"  Shape: {df.shape}")
print("\nFirst few rows:")
print(df.head(10))

Created DataFrame:
  Shape: (151936, 2)

First few rows:
   token_id  norm_gamma
0         0    1.172701
1         1    1.221996
2         2    1.097519
3         3    1.075008
4         4    1.109532
5         5    1.164412
6         6    1.208837
7         7    1.218959
8         8    1.005412
9         9    1.208332
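The conversion above relies on `norms.cpu().numpy()`, which works for float32 tensors. If the stored tensor were instead in bfloat16 (an assumption, not something this notebook shows), `.numpy()` would raise an error, and upcasting first is one way around it. A minimal sketch on synthetic data:

```python
import pandas as pd
import torch

# Synthetic bfloat16 tensor, standing in for a hypothetical low-precision gamma_centered.
torch.manual_seed(0)
toy = torch.randn(4, 8).to(torch.bfloat16)

# Upcast to float32 before reducing, so the norm is accumulated in float32
# and the result can be handed to NumPy.
toy_norms = toy.float().norm(dim=1)

toy_df = pd.DataFrame({
    "token_id": range(toy.shape[0]),
    "norm_gamma": toy_norms.cpu().numpy(),
})
print(toy_df.dtypes)
```

The row order of `toy_df` follows the row order of the tensor, which is what lets `token_id` be a plain `range`.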


## Step 4: Save to CSV

In [6]:
# Create output directory if needed
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Save to CSV
output_path = Path(OUTPUT_DIR) / OUTPUT_FILE
df.to_csv(output_path, index=False)

print(f"Saved {len(df):,} token norms to: {output_path}")
print(f"File size: {output_path.stat().st_size / 1024:.1f} KB")

Saved 151,936 token norms to: ../data/results/token_norms_gamma.csv
File size: 2412.4 KB
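A downstream notebook may want to confirm that the CSV round-trips cleanly before using it. The sketch below writes a small synthetic table with the same two columns to a temporary directory, reads it back, and checks the schema, the contiguity of `token_id`, and the values. The file name `toy_norms.csv` and the explicit `dtype` mapping are illustrative choices, not part of this notebook.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic float32 norms standing in for the real 151,936 values.
rng = np.random.default_rng(0)
toy_norms = rng.random(100, dtype=np.float32)

df_out = pd.DataFrame({
    "token_id": range(len(toy_norms)),
    "norm_gamma": toy_norms,
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy_norms.csv"
    df_out.to_csv(csv_path, index=False)
    # Pin the dtypes on read so the columns come back as written.
    df_in = pd.read_csv(
        csv_path,
        dtype={"token_id": "int64", "norm_gamma": "float32"},
    )

# Schema check: same columns, in the same order.
assert list(df_in.columns) == ["token_id", "norm_gamma"]
# Token ids should run 0..N-1 with no gaps or reordering.
assert (df_in["token_id"].to_numpy() == np.arange(len(df_in))).all()
# Values should survive the text round trip to within float tolerance.
assert np.allclose(df_in["norm_gamma"].to_numpy(), toy_norms)
```

The same three checks could be pointed at the real output file by replacing the temporary path with `Path(OUTPUT_DIR) / OUTPUT_FILE`.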


## Summary

Computed and saved the norms of all 151,936 tokens in centered gamma (γ') space.

Output file: `data/results/token_norms_gamma.csv`

Columns:
- `token_id`: Integer index (0 to 151,935)
- `norm_gamma`: Radial distance from centroid in gamma units

This data is ready for analysis in 01.2d to investigate the central spike.