A Scalable Framework for Cross-Layer Transcoder Training and Attribution-Graph Visualization

circuits-research/CircuitLab


License: MIT

CircuitLab is a Python library for training Cross-Layer Transcoders (CLTs) at scale. It will soon include an automatic interpretability pipeline and a visual interface.

We believe that a major limitation in the development of CLTs, and of attribution-graph methods more broadly, is the significant engineering effort required to train, analyze, and iterate on them. This library aims to reduce that overhead by providing a clean, scalable, and extensible framework.

Features

This library currently implements L1-regularized JumpReLU CLTs with the following design principles:

  • Follows Anthropic's training guidelines
  • Supports feature sharding across GPUs (as well as DDP and FSDP)
  • Includes activation caching with optional compression/quantization of the cached activations
  • Adopts a structure similar to SAE Lens (code design, activation-store, etc.) and uses Transformer Lens
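The repo's exact modules aren't shown here, so the following is a minimal, illustrative sketch (not CircuitLab's actual API) of the JumpReLU nonlinearity named above, with a straight-through estimator so the learned thresholds receive gradients despite the hard gate:

```python
import torch
import torch.nn as nn


class RectangleSTE(torch.autograd.Function):
    """Straight-through estimator for the Heaviside gate in JumpReLU.

    The step function has zero gradient almost everywhere, so the backward
    pass replaces it with a narrow rectangle window around the threshold."""

    @staticmethod
    def forward(ctx, x, threshold, bandwidth):
        ctx.save_for_backward(x, threshold)
        ctx.bandwidth = bandwidth
        return (x > threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        x, threshold = ctx.saved_tensors
        eps = ctx.bandwidth
        # Gradient reaches the threshold only where x is near the boundary.
        in_window = ((x - threshold).abs() < eps / 2).float() / eps
        grad_threshold = -(grad_out * in_window)
        # Sum out broadcasted batch dims so the grad matches threshold's shape.
        while grad_threshold.dim() > threshold.dim():
            grad_threshold = grad_threshold.sum(0)
        # No gradient to x through the gate (it flows through the product below).
        return torch.zeros_like(x), grad_threshold, None


class JumpReLU(nn.Module):
    """f(x) = x * H(x - theta), with one learned threshold per feature."""

    def __init__(self, n_features, bandwidth=1e-3):
        super().__init__()
        # Parameterize the threshold in log space so it stays positive.
        self.log_threshold = nn.Parameter(torch.full((n_features,), -4.0))
        self.bandwidth = bandwidth

    def forward(self, pre_acts):
        threshold = self.log_threshold.exp()
        gate = RectangleSTE.apply(pre_acts, threshold, self.bandwidth)
        return pre_acts * gate
```

In an actual L1-regularized CLT loss, the L1 penalty would be applied to these gated feature activations alongside the reconstruction error.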

Stay Tuned

We also plan to release the following within the same package (targeted for end of February 2026):

  • An automatic interpretability pipeline
  • A visual interface for exploring features and attribution graphs

We welcome contributions to the library. Please refer to CONTRIBUTING.md for guidelines and templates. If you are interested in collaborating, you can also request access to the following document with CLT improvement ideas. Finally, if you have any questions or want to discuss potential improvements or collaborations, write to us on the library Discord!

Quick Start

Training happens in two steps:

  1. Precompute activations (should be parallelized across independent jobs)
  2. Train the CLT model on the cached activations (should run on a single multi-GPU node)

1. Generate and cache activations

from circuitlab import ActivationsStore, clt_training_runner_config, load_model

# Load model
model = load_model("meta-llama/Llama-3.2-1B", device="cuda")

# Create config
cfg = clt_training_runner_config()

# Create activation store
store = ActivationsStore(model, cfg)

# Generate and cache activations
store.generate_and_save_activations(
    path=cfg.cached_activations_path,
    use_compression=True,  # optional
)
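The compression scheme used by `use_compression=True` isn't detailed here. As an illustration of the general idea (these helpers are hypothetical, not CircuitLab's API), a simple per-tensor int8 affine quantization already shrinks fp32 activations by 4x:

```python
import numpy as np


def quantize_int8(acts: np.ndarray):
    """Affine-quantize a float activation tensor to uint8 (4x smaller than fp32).

    Returns the codes plus the (scale, zero_point) needed to dequantize.
    Illustrative only; a real pipeline might quantize per channel or per token."""
    lo, hi = acts.min(), acts.max()
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = lo
    codes = np.round((acts - zero_point) / scale).astype(np.uint8)
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Reconstruct approximate float activations from the stored codes."""
    return codes.astype(np.float32) * scale + zero_point
```

The worst-case reconstruction error of this scheme is half a quantization step (scale / 2), which is why coarser-than-8-bit codes typically need a more careful per-channel treatment.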

2. Train the CLT

from circuitlab import CLTTrainingRunner

# Train
trainer = CLTTrainingRunner(cfg)
trainer.run()

⚙️ Notes

  • We provide screenshot examples of training metrics in the output folder and sample training scripts in runners
  • Compression is optional but recommended for large-scale runs (e.g. models with 1B+ parameters), giving a 4-8x memory reduction
  • Training with bf16 works well (autocast activations and weights to bf16 while keeping gradient/optimizer states in fp32) but requires a higher learning rate (around 1.5-2x larger)
  • For Llama 1B, on a full 8-GPU H100 node, we reach an expansion factor of 42 with a micro-batch size of 512
  • We provide a sample script that maps model weights to the circuit-tracer format
  • There has been recent criticism of the faithfulness of CLTs (see the linked post). We are also currently studying this phenomenon.
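The bf16 note above can be sketched in plain PyTorch (a minimal illustration, not CircuitLab code; the model, base learning rate of 3e-4, and 1.75x multiplier are all placeholder values):

```python
import torch

# Mixed-precision recipe: parameters and optimizer states stay in fp32,
# while the forward/backward compute runs under bf16 autocast.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(512, 512).to(device)            # weights remain fp32
opt = torch.optim.Adam(model.parameters(), lr=3e-4 * 1.75)  # ~1.5-2x the fp32 lr

x = torch.randn(8, 512, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()  # matmul executes in bf16
loss.backward()                    # grads accumulate into fp32 parameters
opt.step()
opt.zero_grad()
```

Because the master weights never leave fp32, no gradient scaler is needed (unlike fp16 training); the larger learning rate compensates for bf16's reduced mantissa precision in the forward/backward pass.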

Citation
