This repo contains utilities and scripts for analyzing attention and hidden-state behavior in large language models and for generating paper figures and tables.
- Python 3.10+
- torch, transformers, numpy, pandas, pyarrow, matplotlib, tqdm, scikit-learn
- Install dependencies:

  ```bash
  python -m pip install -r requirements.txt
  ```
- Store dataset paths in `privacy.json` (key: `default_data`); see the example below.
- Load it into the environment with `python -c "import load_privacy"`, or rely on `run.py`, which imports `load_privacy`. `load_privacy.py` also switches the working directory to `output_dir` from `privacy.json`.
- The `data.py` demo expects `DEFAULT_DATA` (uppercase) in the environment.
- Edit `run.py` to enable the figure or table block you want, then run `python run.py`.
- Each figure or table function can also be called directly; see the `fig*.py` and `tab*.py` signatures.
Example:
```python
import os

import torch

from fig1 import fig1_1
from model import LlamaWrapper

model_info = {
    "model_name": "Llama-3.2-1B-Instruct",
    "model_path": "/path/to/model",  # local checkpoint directory
    "device": "auto",
    "dtype": torch.float16,
    "wrapper": "llama",              # selects LlamaWrapper from model.py
    "exp2_layer_idx": 1.5,           # half-integer indices refer to pre-MLP states
}

fig1_1(
    model_info,
    data_path=os.environ["default_data"],
    batch_size=64,
    batch_cnt=16,
    max_seq_len=64,
)
```

- `fig1_1`: Per-layer attention heatmaps with head averages.
- `fig1_2`: Layer-wise L2 norms for hidden and pre-MLP states (half-integer layers).
- `fig3_1`: Per-layer attention heatmaps using head-wise max aggregation.
- `fig4_1`: MLP intermediate visualization with PCA and t-SNE (first token vs. others).
- `fig5_1`: Hidden and pre-MLP state visualization with PCA (first token vs. others).
- `fig6_1`: Per-layer, per-head attention heatmaps.
- `fig6_2`: First-layer MLP intermediate head-ablation visualization.
- `fig7_1`: Cosine similarity to the position-0 mean across layers and positions.
- `tab1`: Loss vs. repeating the first token (with and without BOS).
- `tab2`: Repeated-token n-gram ratios (n = 2, 3, 4).
- `tab3`: L2 norm of first-layer attention outputs by position.
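Table functions follow the same calling pattern. A sketch, assuming `tab1` lives in `tab1.py` and accepts the same `model_info`/`data_path`/batch arguments as the figure functions (check the actual signature in `tab1.py`):

```python
import os

import torch

from tab1 import tab1  # assumed module/function layout

model_info = {
    "model_name": "Llama-3.2-1B-Instruct",
    "model_path": "/path/to/model",
    "device": "auto",
    "dtype": torch.float16,
    "wrapper": "llama",
}

# Hypothetical keyword names, mirroring the fig1_1 example above.
tab1(model_info, data_path=os.environ["default_data"], batch_size=64, batch_cnt=16)
```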
`model.py` provides `LMWrapper` utilities and model-specific wrappers: `LlamaWrapper`, `NeoxWrapper`, `QwenWrapper`, `OptWrapper`, `InternLM3Wrapper`, `Olmo3Wrapper`. Common methods include:

- `get_layer_activations`, `get_layer_qkv`, `get_layer_attn_scores`
- `get_attn_output`, `get_pre_mlp_hidden_states`
- `get_mlp_intermediate_states`, `get_first_mlp_intermediate_states`
- `*_apply_pos_bias` variants where applicable
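A hypothetical sketch of direct wrapper use. Only the method names come from the list above; the constructor arguments, tokenizer access, and method signatures below are assumptions for illustration (see `model.py` for the real interface):

```python
import torch

from model import LlamaWrapper

# Hypothetical constructor arguments; model.py defines the real interface.
wrapper = LlamaWrapper(model_path="/path/to/model", device="auto", dtype=torch.float16)

# Hypothetical tokenizer attribute and method signatures.
input_ids = wrapper.tokenizer("The first token attracts attention.", return_tensors="pt").input_ids

attn = wrapper.get_layer_attn_scores(input_ids, layer_idx=0)         # per-head attention scores
pre_mlp = wrapper.get_pre_mlp_hidden_states(input_ids, layer_idx=1)  # states before the MLP block
```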
- Qwen models do not use BOS tokens; `rm_bos` is ignored there.
- Outputs are written under per-figure directories relative to the working directory.