
SinkCircuit

Overview

This repo contains utilities and scripts for analyzing attention and hidden-state behavior in large language models and for generating paper figures and tables.

Requirements

  • Python 3.10+
  • torch, transformers, numpy, pandas, pyarrow, matplotlib, tqdm, scikit-learn
  • Install dependencies: python -m pip install -r requirements.txt

Data configuration

  • Store dataset paths in privacy.json (key: default_data).
  • Load it into the environment with python -c "import load_privacy" or rely on run.py, which imports load_privacy.
  • load_privacy.py also switches the working directory to output_dir from privacy.json.
  • data.py demo expects DEFAULT_DATA (uppercase) in the environment.
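A minimal sketch of what this loading step is described to do (the exact contents of privacy.json and the key handling in load_privacy.py are assumptions here; only the default_data, DEFAULT_DATA, and output_dir names come from the repo):

```python
import json
import os
import tempfile

# Hypothetical privacy.json, written to a temp directory for illustration.
workdir = tempfile.mkdtemp()
config = {"default_data": "/path/to/datasets", "output_dir": workdir}
with open(os.path.join(workdir, "privacy.json"), "w") as f:
    json.dump(config, f)

# Sketch of load_privacy's described behavior: export entries as
# environment variables and switch into output_dir.
with open(os.path.join(workdir, "privacy.json")) as f:
    cfg = json.load(f)
for key, value in cfg.items():
    os.environ[key] = str(value)                  # e.g. default_data
os.environ["DEFAULT_DATA"] = cfg["default_data"]  # data.py demo wants uppercase
os.chdir(cfg["output_dir"])                       # outputs land under output_dir
```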

Usage

  • Edit run.py to enable the figure or table block you want, then run python run.py.
  • Each figure or table function can also be called directly; see fig*.py and tab*.py signatures.

Example:

import os
import torch

from fig1 import fig1_1
from model import LlamaWrapper

# "device": "auto" defers device placement to the wrapper; the
# half-integer layer index selects a pre-MLP state (cf. fig1_2).
model_info = {
    "model_name": "Llama-3.2-1B-Instruct",
    "model_path": "/path/to/model",
    "device": "auto",
    "dtype": torch.float16,
    "wrapper": "llama",
    "exp2_layer_idx": 1.5,
}
fig1_1(model_info, data_path=os.environ["default_data"], batch_size=64, batch_cnt=16, max_seq_len=64)

Figures

  • fig1_1: Per-layer attention heatmaps with head averages.
  • fig1_2: Layer-wise L2 norms for hidden and pre-MLP states (half-integer layers).
  • fig3_1: Per-layer attention heatmaps using head-wise max aggregation.
  • fig4_1: MLP intermediate visualization with PCA and t-SNE (first token vs others).
  • fig5_1: Hidden and pre-MLP state visualization with PCA (first token vs others).
  • fig6_1: Per-layer, per-head attention heatmaps.
  • fig6_2: First-layer MLP intermediate head-ablation visualization.
  • fig7_1: Cosine similarity to position 0 mean across layers and positions.
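The statistic behind fig7_1 can be illustrated with a small numpy sketch, assuming hidden states of shape (batch, seq_len, hidden_dim) and cosine similarity against the batch-mean position-0 state (in the repo these states would come from a wrapper method; here random data stands in):

```python
import numpy as np

# Toy hidden states: (batch, seq_len, hidden_dim).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8, 16))

# Mean hidden state at position 0 across the batch.
pos0_mean = hidden[:, 0, :].mean(axis=0)

# Cosine similarity of every (batch, position) state to that mean.
norms = np.linalg.norm(hidden, axis=-1) * np.linalg.norm(pos0_mean)
cos_sim = hidden @ pos0_mean / norms      # shape (batch, seq_len)
```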

Tables

  • tab1: Loss vs repeating the first token (with and without BOS).
  • tab2: Repeated-token n-gram ratios (n=2,3,4).
  • tab3: L2 norm of first-layer attention outputs by position.
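The kind of ratio tab2 reports can be sketched in a few lines; the function name and exact definition below (fraction of n-gram occurrences that are repeats) are assumptions, and the repo's implementation may count differently:

```python
from collections import Counter

def repeated_ngram_ratio(tokens, n):
    """Fraction of n-gram occurrences belonging to an n-gram that
    appears more than once in the sequence (hypothetical definition)."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

toks = ["a", "b", "a", "b", "a", "b", "c"]
ratios = {n: repeated_ngram_ratio(toks, n) for n in (2, 3, 4)}
```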

Model wrappers

model.py provides LMWrapper utilities and the model-specific wrappers LlamaWrapper, NeoxWrapper, QwenWrapper, OptWrapper, InternLM3Wrapper, and Olmo3Wrapper. Common methods include:

  • get_layer_activations, get_layer_qkv, get_layer_attn_scores
  • get_attn_output, get_pre_mlp_hidden_states
  • get_mlp_intermediate_states, get_first_mlp_intermediate_states
  • *_apply_pos_bias variants where applicable
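The general pattern behind these methods, running a forward pass while caching each layer's output, can be shown with a torch-free toy (ToyLayer and ToyWrapper are hypothetical names for illustration; the real classes wrap transformers models):

```python
class ToyLayer:
    """Stand-in for a transformer layer: multiplies inputs by a scale."""
    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return [v * self.scale for v in x]

class ToyWrapper:
    """Runs a layer stack and caches every layer's output, the same
    shape of API as get_layer_activations (sketch only)."""
    def __init__(self, layers):
        self.layers = layers
        self.activations = {}          # layer index -> cached output

    def get_layer_activations(self, x):
        for i, layer in enumerate(self.layers):
            x = layer.forward(x)
            self.activations[i] = x
        return self.activations

wrapper = ToyWrapper([ToyLayer(2), ToyLayer(3)])
acts = wrapper.get_layer_activations([1.0, 2.0])
```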

Notes

  • Qwen models do not use BOS tokens; rm_bos is ignored there.
  • Outputs are written under per-figure directories relative to the working directory.
