Reference implementation. This repository is not maintained and not accepting contributions. Report security issues to security@anthropic.com; we do not commit to fixes or timelines.
A visualization tool for attention heads in transformer language models. Pick a head, see its top-activating sequences across your dataset, its attention patterns, per-head metrics (induction score, previous-token score, entropy, …), and a PCA/UMAP projection of its Q/K/O/V space. Optionally decompose individual attention edges into contributing sparse-dictionary feature pairs.
The frontend is static-first — once data/ is populated it serves from disk with no backend. A backend unlocks live operations (custom prompts, live attribution compute) but is optional.
```
npm install
npm run build
```

Serve `dist/` alongside a populated `data/` directory (see below).
`data_pipeline.py` and `server.py` are skeletons with the data-format contracts fully specified in docstrings. The intended workflow is to hand them to Claude Code with your model and dataset:

> Here's `data_pipeline.py` from the headvis repo. I want to run it against `gpt2` from HuggingFace using the `openwebtext` dataset, studying layers 5 and 8 across all heads. Implement the `NotImplementedError` functions and run the pipeline.
Claude reads the docstrings (which are the spec), implements the model-specific forward-pass and tokenization bits using transformers + datasets, and runs it. The output-format contracts are exact — as long as the JSON shapes match, the frontend works.
The same applies to `server.py` for the backend:

> Implement `add_custom_sequence` and `project_to_umap` in `server.py` for the same gpt2 setup. Leave the attribution endpoints stubbed (I don't have an SAE).
┌─────────────────────────────────┐
│ data_pipeline.py (run once) │──→ data/config.json
│ │ data/scatter_data.json
│ dataset scan → attention → │ data/heads/L{l}H{h}.json
│ metrics, decile sampling, │ data/umap/L{l}H{h}.json
│ PCA/UMAP fit │ data/umap/sequences.json
└─────────────────────────────────┘
│
▼
┌─────────────────────────────────┐ ┌──────────────────────────────┐
│ Svelte frontend (src/) │ ◀──────▶ │ server.py (optional) │
│ │ POST │ │
│ static read from data/ │ │ live attribution, │
│ optional runtime POST │ │ custom-prompt ingestion, │
│ │ │ UMAP point projection │
└─────────────────────────────────┘ └──────────────────────────────┘
Walks your dataset, computes attention per head, and produces the static `data/` tree. Every metric is a closed-form reduction over the attention matrix — the `logit_*` trio needs pre-softmax QK scores (hook the attention module or recompute `Q @ K.T`); everything else runs on post-softmax weights.
| function | produces | model required |
|---|---|---|
| `compute_head_metrics` | scatter-plot coordinates | yes (attention) |
| `sample_by_decile` | `heads/L{l}H{h}.json` sequences array | yes (attention) |
| `compute_projection_cloud` | `umap/L{l}H{h}.json` | yes (Q/K/O/V vectors) |
| `compute_qk_distance_histogram`, `compute_top_tokens` | derived stats | no (pure transforms) |
| `write_*` | JSON serialization | no |
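To make the closed-form-reduction claim above concrete, here is a hedged sketch of two such metrics over a post-softmax attention matrix. The exact formulas the repo expects live in the `data_pipeline.py` docstrings; these definitions are the standard ones and may differ in detail:

```python
import numpy as np

def previous_token_score(attn: np.ndarray) -> float:
    """Mean attention mass on the immediately preceding token.

    attn: post-softmax weights, shape (seq_len, seq_len), rows = query
    positions. Position 0 is skipped since it has no predecessor.
    """
    seq_len = attn.shape[0]
    # Gather attn[q, q-1] for q = 1..seq_len-1 and average.
    return float(np.mean(attn[np.arange(1, seq_len), np.arange(seq_len - 1)]))

def attention_entropy(attn: np.ndarray) -> float:
    """Mean per-row entropy of the attention distribution, in nats."""
    eps = 1e-12  # guard against log(0) on sparse rows
    row_entropy = -(attn * np.log(attn + eps)).sum(axis=-1)
    return float(row_entropy.mean())
```

A head that attends perfectly to the previous token scores 1.0 on the first metric and near 0 on the second (one-hot rows carry no entropy).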
Framework-agnostic parts (decile bucketing, COO sparsification, histogram binning, JSON writing) are pure data transforms — implementable once and reusable. Only the attention-extraction and vector-extraction bits depend on your model framework.
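The COO sparsification, for instance, can be written once against the flat-index convention the schemas use (`idx = q * seq_len + k`). A minimal sketch — the function name and top-K default are my own choices, not the repo's:

```python
import numpy as np

def sparsify_topk(attn: np.ndarray, top_k: int = 8):
    """Top-K-per-row COO sparsification of a (seq_len, seq_len)
    attention matrix, using flat indices idx = q * seq_len + k_pos.
    Returns parallel (indices, values) lists, ready for JSON."""
    seq_len = attn.shape[0]
    indices, values = [], []
    for q in range(seq_len):
        row = attn[q]
        keep = np.argsort(row)[::-1][:top_k]  # largest top_k key positions
        keep = keep[row[keep] > 0.0]          # drop exact zeros
        for k_pos in np.sort(keep):           # ascending flat order per row
            indices.append(int(q * seq_len + k_pos))
            values.append(float(row[k_pos]))
    return indices, values
```

Decoding is the inverse: `q, k_pos = divmod(idx, seq_len)`.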
Six endpoints. The frontend runs fully static without any of them.
| endpoint | unlocks | needs |
|---|---|---|
| `add_custom_sequence` | type any prompt, see its attention | model + tokenizer |
| `delete_custom_sequence` | delete button on custom prompts | file I/O only |
| `project_to_umap` | "where does this prompt sit in the PCA cloud?" | model + saved PCA rotation |
| `save_umap_clusters` | persist user-lassoed cluster labels | file I/O only |
| `qk_attributions`, `ov_attributions` | decompose an attention edge into feature pairs | sparse feature dictionary (SAE/transcoder) |
The two file-I/O-only endpoints are implementable with no model. The attribution endpoints need a trained sparse dictionary — see SAELens if you don't have one. Everything else is a single forward pass.
Side-effect contract: endpoints that produce durable results write them into `data/` so the next load hits the cache. The frontend tries `data/attributions/{qk,ov}/L{l}H{h}S{seq}Q{q}K{k}.json` before POSTing.
The context blob: `data/server_config.json` has a `context` field that the frontend forwards verbatim on every POST. The frontend never interprets the contents — put whatever your implementation needs in there (model path, cache dir, etc.). The frontend stays in static-only mode when `server_config.json` is `{}` or missing.
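For concreteness, a hypothetical `data/server_config.json` — only the `context` field itself is part of the contract; the keys inside it are placeholders for whatever your `server.py` implementation wants:

```json
{
  "context": {
    "model_name": "gpt2",
    "device": "cuda"
  }
}
```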
Every JSON schema is in the docstrings of `data_pipeline.py`. A few that trip people up:
- **Sparse attention** — `attention_indices` are flat COO indices: `idx = q * seq_len + k`. Values are parallel. Store top-K per row or above a threshold; dense matrices blow up file sizes.
- **Intervals** — `interval=10` is the highest-activation decile, `interval=1` the lowest; 1-indexed. `interval=-1` (URL param) means the custom-sequences view.
- **Position 0** — the frontend's max-reductions skip row 0 and column 0 (see `src/lib/sparse.js`). Your `max_activation` should do the same.
- **`views` key** — if you only fit PCA, omit the `views` key from the UMAP JSON entirely; the frontend auto-selects PCA mode when `views` is absent.
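A sketch combining the sparse-attention and position-0 gotchas — decoding the flat indices and skipping row 0 / column 0 in a max-reduction. This mirrors, rather than copies, what `src/lib/sparse.js` does on the frontend:

```python
def max_activation(indices, values, seq_len):
    """Max sparse attention value, skipping query row 0 and key column 0,
    matching the frontend's reduction convention."""
    best = 0.0
    for idx, v in zip(indices, values):
        q, k = divmod(idx, seq_len)  # invert idx = q * seq_len + k
        if q == 0 or k == 0:
            continue  # position 0 is excluded on both axes
        best = max(best, v)
    return best
```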
The frontend supports URL state: `?layer=L&head=H&tab=T&interval=N`. For a specific attribution: `?layer=L&head=H&attr=qk&seq=ID&qpos=P&kpos=P`. See `src/Index.svelte` for the full param list.