Aaquib111/edge-attribution-patching

Edge Attribution Patching

Use the minimal-implementation branch for an easy-to-use version of edge attribution patching! All code in that branch was written by Oscar Balcells.

This repository is currently under development. It is built on top of TransformerLens (https://github.com/neelnanda-io/TransformerLens), into which we may eventually merge it.

Please cite this work as:

@inproceedings{
  syed2023attribution,
  title={Attribution Patching Outperforms Automated Circuit Discovery},
  author={Aaquib Syed and Can Rager and Arthur Conmy},
  booktitle={NeurIPS Workshop on Attributing Model Behavior at Scale},
  year={2023},
  url={https://openreview.net/forum?id=tiLbFR4bJW}
}

About

Code for my NeurIPS 2023 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"
