danielle34/mixviz

MixViz

Visualizing Cross-Head Mixing in Non-Standard Attention Architectures

Query Construction: MHA vs Talking Heads vs IHA

Standard attention visualization tools like BertViz assume each head operates independently, displaying per-head attention heatmaps as if they tell the whole story. But architectures that mix information across heads before or during attention — such as Talking Heads and Interleaved Head Attention (IHA) — break this assumption. Their real computational structure is hidden inside cross-head mixing weights that no existing tool surfaces.
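To make the independence assumption concrete, here is a minimal NumPy sketch of Talking Heads-style mixing, where a learned linear map across the head dimension is applied to the attention logits before the softmax. The shapes, the random inputs, and the names `logits`, `P_logits`, and `attn_mixed` are illustrative assumptions rather than MixViz code, and the post-softmax mixing step of the full Talking Heads formulation is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the key axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
H, T = 4, 5                           # hypothetical: 4 heads, 5 tokens
logits = rng.normal(size=(H, T, T))   # per-head query-key scores

# Standard MHA: each head's map depends only on its own logits.
attn_mha = softmax(logits)

# Talking Heads-style: every output head is a weighted sum of all heads' logits.
# P_logits[h, g] is the weight of source head g in output head h (random stand-in).
P_logits = rng.normal(size=(H, H))
attn_mixed = softmax(np.einsum("hg,gqk->hqk", P_logits, logits))

# Sanity check: an identity mixing matrix recovers plain MHA.
attn_identity = softmax(np.einsum("hg,gqk->hqk", np.eye(H), logits))
```

With a non-identity `P_logits`, the map displayed for head `h` no longer reflects head `h`'s own query-key scores alone, which is why a per-head heatmap on its own can be an incomplete picture.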

MixViz is an interactive visualization tool built to reveal what standard tools miss: how cross-head mixing reshapes attention patterns, and why per-head interpretability breaks down in architectures like IHA.

Live Demo

https://mixviz.up.railway.app

The System

MixViz — Sentence View and Attention Field

The main interface has two linked views. On the left, the Sentence View shows a BertViz-style bipartite display of query-to-key attention for a single head, with a timeline scrubber that lets you step through each query position. On the right, the Attention Field plots every attention weight across all 8 heads as a circular dot cloud — each head gets its own color, and gold rings highlight weights that align with the ground-truth reasoning path.

Mixing Matrix and Attention Heatmap

Switch to the Mixing Matrix tab to see the learned alpha weights that IHA uses to combine heads before attention. This is the structure that standard tools completely hide — it shows which source heads contribute to each output head's query construction.
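As a rough illustration of what the Mixing Matrix tab displays, the sketch below assumes that each output head's queries are a weighted sum of the source heads' queries, with weights `alpha[h, g]`. This exact form, the tensor shapes, and the helper names `mix_queries` and `top_sources` are assumptions made for illustration; the actual IHA implementation lives in `models/` and may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
H, T, D = 8, 6, 4                  # 8 heads as in the demo; T and D are hypothetical
Q = rng.normal(size=(H, T, D))     # per-head queries before mixing
alpha = rng.normal(size=(H, H))    # random stand-in for the learned alpha weights

def mix_queries(alpha, Q):
    # Assumed form: Q_mixed[h] = sum over g of alpha[h, g] * Q[g]
    return np.einsum("hg,gtd->htd", alpha, Q)

def top_sources(alpha, k=2):
    # For each output head (row), indices of the k source heads
    # with the largest absolute mixing weight.
    order = np.argsort(-np.abs(alpha), axis=1)
    return order[:, :k]

Q_mixed = mix_queries(alpha, Q)
print(top_sources(alpha))          # one row of source-head indices per output head
```

Reading `alpha` row by row in this way is the same question the Mixing Matrix view answers visually: which source heads dominate each output head's query.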

Timeline Playback

The timeline playback lets you scrub or auto-play through query positions. As the scrubber moves, the sentence view updates attention lines in real time and the attention field highlights the current query row — making it easy to see how attention shifts along the reasoning chain.

Why This Exists

We trained three architectures (MHA, Talking Heads, and IHA) on a synthetic relation-composition task where the ground-truth reasoning path is fully known. All three solve the task almost perfectly (F1 > 0.999). But when we inspected their attention patterns, the per-head metrics diverged sharply:

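| Architecture  | Alignment | Redundancy | Entropy | Best Epoch |
|---------------|-----------|------------|---------|------------|
| MHA           | 0.507     | 0.190      | 1.604   | 46         |
| Talking Heads | 0.560     | 0.289      | 1.423   | 27         |
| IHA           | 0.055     | 0.139      | 0.503   | 44         |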

IHA's per-head alignment to the reasoning path is roughly 9x lower than MHA's (0.055 vs. 0.507). The model still solves the task, but the computation is distributed across the mixing structure in a way that individual attention maps can't reveal. Standard visualization tools are actively misleading for this architecture.
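This README does not define the Alignment and Entropy columns, so the sketch below shows one plausible way such per-head metrics could be computed from an attention tensor of shape `(heads, queries, keys)`. The definitions used here (mean Shannon entropy of attention rows, and mean attention mass placed on a single ground-truth key per query) and the function names are assumptions for illustration; the definitions actually used for the table are in the scripts under `src/`.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    # Assumed definition: mean Shannon entropy (in nats) over all heads and query rows.
    return float(-(attn * np.log(attn + eps)).sum(axis=-1).mean())

def path_alignment(attn, target_keys):
    # Assumed definition: for each head, the mean attention mass that each
    # query places on its ground-truth key. target_keys has shape (T,).
    T = attn.shape[1]
    mass = attn[:, np.arange(T), target_keys]   # shape (H, T)
    return mass.mean(axis=1)                    # shape (H,)

# Toy inputs: 2 heads, 4 tokens.
uniform = np.full((2, 4, 4), 0.25)              # every query attends evenly
one_hot = np.tile(np.eye(4), (2, 1, 1))         # every query attends only to itself
targets = np.arange(4)                          # hypothetical path: query i -> key i

print(attention_entropy(uniform), path_alignment(uniform, targets))
print(attention_entropy(one_hot), path_alignment(one_hot, targets))
```

Under these assumed definitions, uniform attention gives the maximum entropy `log(4)` and an alignment of 0.25 per head, while one-hot attention on the target gives near-zero entropy and an alignment of 1. For the table's reported alignment values, the MHA-to-IHA ratio is 0.507 / 0.055, or about 9.2.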

MixViz was built to make this visible — to show both the attention patterns and the mixing structure that shapes them.

Running Locally

Quick start (demo mode, no GPU needed)

Demo mode uses precomputed attention data — no model checkpoints or training required.

# Clone and install frontend
git clone https://github.com/danielle34/mixviz.git
cd mixviz/mixviz_web/frontend
npm install

# Start the backend (serves precomputed data)
cd ..
pip install flask numpy scipy
python server.py &

# Start the frontend
cd frontend
npm run dev

Open http://localhost:5173 in your browser.

Full setup (live mode with models)

Requires trained checkpoints in outputs/checkpoints/.

# Python environment
conda create -n attn_interp python=3.11
conda activate attn_interp
pip install torch flask scipy numpy

# Install frontend
cd mixviz_web/frontend
npm install

# Precompute attention data (optional, speeds up demo mode)
cd ../..
python mixviz_web/backend/precompute.py

# Run backend + frontend
python mixviz_web/server.py &
cd mixviz_web/frontend && npm run dev

Training models from scratch

python src/train.py --model_type mha --task binary --lr 1e-4 \
    --data_dir data --output_dir outputs/checkpoints/mha_binary_1e4

python src/train.py --model_type iha --task binary --lr 1e-4 \
    --data_dir data --output_dir outputs/checkpoints/iha_binary_1e4

python src/train.py --model_type talking_heads --task binary --lr 1e-4 \
    --data_dir data --output_dir outputs/checkpoints/talking_heads_binary_1e4

What's in This Repo

| Folder           | Description                                                       |
|------------------|-------------------------------------------------------------------|
| mixviz_web/      | Interactive web visualization (Flask backend + React frontend)    |
| models/          | Attention architecture implementations (MHA, Talking Heads, IHA)  |
| src/             | Training, metric extraction, analysis scripts                     |
| paper/           | Paper source (LaTeX)                                              |
| outputs/metrics/ | Experiment results (summary JSONs and per-model arrays)           |
| scripts/         | SLURM job launchers for training sweeps                           |

Citation

If you use this work, please cite:

@article{adeyemi2026mixviz,
  title   = {MixViz: Visualizing Cross-Head Mixing in Non-Standard Attention Architectures},
  author  = {Adeyemi, Morayo Danielle},
  year    = {2026},
}

Acknowledgments

This work builds on and is inspired by:
