
headvis

Reference implementation. This repository is not maintained and not accepting contributions. Report security issues to security@anthropic.com; we do not commit to fixes or timelines.

A visualization tool for attention heads in transformer language models. Pick a head, see its top-activating sequences across your dataset, its attention patterns, per-head metrics (induction score, previous-token score, entropy, …), and a PCA/UMAP projection of its Q/K/O/V space. Optionally decompose individual attention edges into contributing sparse-dictionary feature pairs.

The frontend is static-first — once data/ is populated it serves from disk with no backend. A backend unlocks live operations (custom prompts, live attribution compute) but is optional.

Quick start

npm install
npm run build

Serve dist/ alongside a populated data/ directory (see below).
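Any static file server works. As a minimal sketch using only the Python standard library (the copy step and port are assumptions; it presumes the frontend fetches data/ relative to the site root):

import functools
import http.server
import shutil

# Expose data/ under the built site so static fetches of data/... resolve.
shutil.copytree("data", "dist/data", dirs_exist_ok=True)
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory="dist")
http.server.ThreadingHTTPServer(("", 8000), handler).serve_forever()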

Populating data/ — using Claude

data_pipeline.py and server.py are skeletons with the data-format contracts fully specified in docstrings. The intended workflow is to hand them to Claude Code with your model and dataset:

Here's data_pipeline.py from the headvis repo. I want to run it against gpt2 from HuggingFace using the openwebtext dataset, studying layers 5 and 8 across all heads. Implement the NotImplementedError functions and run the pipeline.

Claude reads the docstrings (which are the spec), implements the model-specific forward-pass and tokenization bits using transformers + datasets, and runs it. The output-format contracts are exact — as long as the JSON shapes match, the frontend works.

The same applies to server.py for the backend:

Implement add_custom_sequence and project_to_umap in server.py for the same gpt2 setup. Leave the attribution endpoints stubbed (I don't have an SAE).

Architecture

┌─────────────────────────────────┐
│  data_pipeline.py  (run once)   │──→  data/config.json
│                                 │     data/scatter_data.json
│  dataset scan → attention →     │     data/heads/L{l}H{h}.json
│  metrics, decile sampling,      │     data/umap/L{l}H{h}.json
│  PCA/UMAP fit                   │     data/umap/sequences.json
└─────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐          ┌──────────────────────────────┐
│  Svelte frontend (src/)         │ ◀──────▶ │  server.py  (optional)       │
│                                 │   POST   │                              │
│  static read from data/         │          │  live attribution,           │
│  optional runtime POST          │          │  custom-prompt ingestion,    │
│                                 │          │  UMAP point projection       │
└─────────────────────────────────┘          └──────────────────────────────┘

Implementation

data_pipeline.py — offline

Walks your dataset, computes attention per head, and produces the static data/ tree. Every metric is a closed-form reduction over the attention matrix — the logit_* trio needs pre-softmax QK scores (hook the attention module or recompute Q @ K.T); everything else runs on post-softmax weights.
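For concreteness, here is an illustrative sketch of two such reductions in NumPy. The metric names and exact definitions here are assumptions; the docstrings in data_pipeline.py are the spec. attn is a [heads, seq, seq] tensor of post-softmax weights:

import numpy as np

def previous_token_score(attn: np.ndarray) -> np.ndarray:
    # Mean attention mass on the immediately preceding token, per head.
    seq = attn.shape[-1]
    sub_diag = attn[:, np.arange(1, seq), np.arange(seq - 1)]  # (q=i+1, k=i)
    return sub_diag.mean(axis=-1)

def attention_entropy(attn: np.ndarray) -> np.ndarray:
    # Mean per-query-row entropy of the attention distribution, per head.
    row_entropy = -(attn * np.log(attn + 1e-9)).sum(axis=-1)
    return row_entropy.mean(axis=-1)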

function                                             produces                               model required
compute_head_metrics                                 scatter-plot coordinates               yes (attention)
sample_by_decile                                     heads/L{l}H{h}.json sequences array    yes (attention)
compute_projection_cloud                             umap/L{l}H{h}.json                     yes (Q/K/O/V vectors)
compute_qk_distance_histogram, compute_top_tokens    derived stats                          no (pure transforms)
write_*                                              JSON serialization                     no

Framework-agnostic parts (decile bucketing, COO sparsification, histogram binning, JSON writing) are pure data transforms — implementable once and reusable. Only the attention-extraction and vector-extraction bits depend on your model framework.
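For example, top-K COO sparsification is a few lines of NumPy (a sketch; the flat-index convention matches the format described under Data format details below):

import numpy as np

def sparsify_top_k(attn: np.ndarray, top_k: int = 8):
    # attn: [seq, seq] post-softmax weights for one head.
    # Keep the top-K weights per query row, flattened as idx = q * seq_len + k.
    seq_len = attn.shape[-1]
    keys = np.argsort(attn, axis=-1)[:, -top_k:]        # [seq, top_k] key positions
    queries = np.repeat(np.arange(seq_len), top_k)
    indices = queries * seq_len + keys.ravel()
    values = attn[queries, keys.ravel()]
    return indices.tolist(), values.tolist()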

server.py — runtime

Six endpoints. The frontend runs fully static without any of them.

endpoint                            unlocks                                           needs
add_custom_sequence                 type any prompt, see its attention                model + tokenizer
delete_custom_sequence              delete button on custom prompts                   file I/O only
project_to_umap                     "where does this prompt sit in the PCA cloud?"    model + saved PCA rotation
save_umap_clusters                  persist user-lassoed cluster labels               file I/O only
qk_attributions, ov_attributions    decompose an attention edge into feature pairs    sparse feature dictionary (SAE/transcoder)

The two file-I/O-only endpoints are implementable with no model. The attribution endpoints need a trained sparse dictionary — see SAELens if you don't have one. Everything else is a single forward pass.
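As an illustration, a file-I/O-only endpoint can be a few lines. This sketch assumes a FastAPI server and a hypothetical payload shape and output filename; server.py's docstrings define the real contracts:

import json
from pathlib import Path

from fastapi import FastAPI

app = FastAPI()

@app.post("/save_umap_clusters")
def save_umap_clusters(payload: dict):
    # Persist user-lassoed cluster labels into data/ so the next static load
    # picks them up (the side-effect contract below). Payload keys and the
    # filename here are hypothetical.
    out = Path("data") / "umap" / f"L{payload['layer']}H{payload['head']}_clusters.json"
    out.write_text(json.dumps(payload["clusters"]))
    return {"ok": True}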

Side-effect contract: endpoints that produce durable results write them into data/ so the next load hits the cache. The frontend tries data/attributions/{qk,ov}/L{l}H{h}S{seq}Q{q}K{k}.json before POSTing.
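A server that writes attribution results should therefore target the same path the frontend probes, e.g. (sketch):

from pathlib import Path

def attribution_cache_path(kind: str, layer: int, head: int,
                           seq: int, q: int, k: int) -> Path:
    # kind is "qk" or "ov"; mirrors the path the frontend checks before POSTing.
    return Path("data") / "attributions" / kind / f"L{layer}H{head}S{seq}Q{q}K{k}.json"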

The context blob: data/server_config.json has a context field that the frontend forwards verbatim on every POST. It never interprets the contents — put whatever your implementation needs in there (model path, cache dir, etc.). The frontend falls back to static-only mode when server_config.json is {} or missing.
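For example, a hypothetical server_config.json (the keys inside context are illustrative, not part of any spec):

{"context": {"model_path": "gpt2", "cache_dir": "/tmp/headvis"}}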

Data format details

Every JSON schema is in the docstrings of data_pipeline.py. A few that trip people up:

  • Sparse attention — attention_indices are flat COO indices: idx = q * seq_len + k. Values are parallel. Store top-K per row or above a threshold; dense matrices blow up file sizes.
  • Intervals — interval=10 is the highest-activation decile, interval=1 the lowest. 1-indexed. interval=-1 (URL param) means the custom-sequences view. (Bucketing is sketched after this list.)
  • Position 0 — the frontend's max-reductions skip row 0 and column 0 (see src/lib/sparse.js). Your max_activation should do the same (also sketched below).
  • views key — if you only fit PCA, omit the views key from the UMAP JSON entirely; the frontend auto-selects PCA mode when views is absent.
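Sketches of the interval bucketing and the position-0 convention (illustrative; the docstrings are authoritative):

import numpy as np

def max_activation(attn: np.ndarray) -> float:
    # attn: [seq, seq] weights for one head. Skip row 0 and column 0,
    # matching the frontend's max-reductions.
    return float(attn[1:, 1:].max())

def decile_intervals(max_acts: np.ndarray) -> np.ndarray:
    # 1-indexed decile per sequence by max activation: 10 = top, 1 = bottom.
    ranks = max_acts.argsort().argsort()   # ascending rank, 0..n-1
    return ranks * 10 // len(max_acts) + 1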

Deep linking

The frontend supports URL state: ?layer=L&head=H&tab=T&interval=N. For a specific attribution: ?layer=L&head=H&attr=qk&seq=ID&qpos=P&kpos=P. See src/Index.svelte for the full param list.
