callummcdougall/SERI-MATS-2023-Streamlit-pages

To view our Streamlit app, visit https://copy-suppression.streamlit.app/

(If you're interested in our research, please reach out! Our emails are {cal.s.mcdougall, arthurconmy, thisiscodyr}@gmail.com)

This repo serves two purposes:

  1. An edited version of TransformerLens with a couple of extra features (see below).
  2. Hosting the Streamlit pages served from https://github.com/callummcdougall/SERI-MATS-2023-Streamlit-pages/blob/main/transformer_lens/rs/callum2/st_page/Home.py

See transformer_lens/rs/arthurs_notebooks/example_notebook.py for example usage.

Development Setup:

This setup assumes you're using an SSH key to access GitHub. See GitHub's documentation on connecting with SSH (and the associated links there) if you don't have an SSH key to begin with.

$ git clone git@github.com:callummcdougall/SERI-MATS-2023-Streamlit-pages.git
$ cd SERI-MATS-2023-Streamlit-pages
$ poetry install

pip install -e . works too, though since it doesn't pin packages to the exact versions in poetry.lock, version mismatches could plausibly cause problems.

You need to have poetry installed; to do this run

curl -sSL https://install.python-poetry.org | python3 -

and then either edit your PATH manually, or (on a Linux machine) append Poetry's install location to your PATH and reload your shell config:

echo -e "$(cat ~/.bashrc)\nexport PATH=\"$HOME/.local/bin:\$PATH\"\n" > ~/.bashrc
source ~/.bashrc

You should add new requirements, e.g. einops, by running poetry add einops.

We stored some large files in the git history and need to clean them up; if git clone is slow, try a shallow clone:

git clone --depth 1 git@github.com:callummcdougall/SERI-MATS-2023-Streamlit-pages.git

If you want to launch the Streamlit pages locally, run

pip install streamlit
cd transformer_lens/rs/callum2/st_page
streamlit run Home.py

The extra TransformerLens features mentioned above are:

  1. We set the ACCELERATE_DISABLE_RICH environment variable to "1" in transformer_lens/__init__.py to stop an annoying reformatting of notebook error messages.
  2. We add qkv_normalized_input hooks that can optionally be added to models (see the sketch below).
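
As a hedged illustration of feature 2, here's a minimal sketch of registering a hook. The exact hook-point name used below is an assumption following TransformerLens naming conventions, not confirmed against this fork's source:

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 small, the model studied in this project

def inspect_q_input(tensor, hook):
    # Print the shape of the (normalized) input to the query projection.
    print(hook.name, tuple(tensor.shape))
    return tensor

# NOTE: hypothetical hook-point name; check this fork's source for the real one.
model.run_with_hooks(
    "All's fair in love and war",
    fwd_hooks=[("blocks.10.attn.hook_q_normalized_input", inspect_q_input)],
)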

Guide to Experiments

  • Surveying the direct effects of individual attention heads: transformer_lens/rs/arthurs_notebooks/direct_effect_survey.py
  • (TODO: scan through the paper, ideally clean up the repo too)
  • (TODO: write a better implementation of the learnable scale and bias vectors)

Description of directories in transformer_lens/rs/callum2

(Written by Callum) These are the directories which I use to structure my own work.

ioi_and_bos

This directory is for two small investigations:

  1. How does the head manage to attend to BOS by default?

Conclusions - if you look at the cosine similarity between the residual stream vector just before attention layer 10 and the query bias for head 10.7, it's strongly positive and in a very tight range (between 0.45 and 0.47) for all tokens whenever the position is zero, and strongly negative for all tokens whenever the position isn't zero. So this isn't a function of BOS; it's a function of position. This has implications for how CSPA works: the query-side prediction has to overcome some threshold to actually activate the copy suppression mechanism. (A rough reproduction sketch follows at the end of this subsection.)

  2. What's the perpendicular component of the query, in IOI?

Conclusions -

  • Adding semantically similar tokens ("Mary", "mary") rather than just " Mary" doesn't seem to help.
  • We found weak evidence of some kind of "indirect prediction": when you take the perpendicular component and put it through the MLPs, it does favour IO over S1 (though the MLPs don't have much impact in IOI, so this effect isn't large anyway).
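
For investigation 1 above, here's a rough, hedged sketch of the cosine-similarity check. Exactly how the original comparison was done is an assumption: this version compares each token's bias-free query (resid @ W_Q) with b_Q in head space, and omits the LayerNorm for simplicity:

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
_, cache = model.run_with_cache("When Mary and John went to the store")

resid = cache["blocks.10.hook_resid_pre"][0]   # [seq, d_model], residual stream before layer 10
W_Q = model.W_Q[10, 7]                         # [d_model, d_head], query weights for head 10.7
b_Q = model.b_Q[10, 7]                         # [d_head], query bias for head 10.7

queries = resid @ W_Q                          # bias-free query per token, [seq, d_head]
cos = (queries @ b_Q) / (queries.norm(dim=-1) * b_Q.norm())
print(cos)  # per the claim above: tightly positive at position 0, negative elsewhere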

st_page

Hosting all of the Streamlit pages. This isn't for generating any plots (at least I don't use it for that); it's exclusively for hosting pages & storing media files.

The pages are:

  1. OV and QK circuits - you get to see what tokens are most attended to (QK circuit, prediction-attention) and what tokens are most suppressed (OV circuit). It's a nice way to highlight semantic similarity, and build intuition for how it works.
  2. Browse Examples - the most important page. You get to investigate OWT examples, and see how all parts of the copy suppression mechanism work. You can:
    • See the loss change per token when you ablate the head, i.e. find the MIDS (tokens for which the head is most helpful); a rough sketch of this measurement follows the list.
    • See the logits pre and post-ablation, as well as the direct logit attribution for this head. You can confirm that the head is pushing down tokens which appear in context, for most of the MIDS examples.
    • Look at the attention patterns. You can confirm that the head is attending to the tokens which it pushes down.
    • Look at the logit lens before head 10.7. You can confirm that the head is predicting precisely the words which it is attending to.
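
Here's a hedged sketch of the per-token ablation measurement from the first bullet above. It zero-ablates head 10.7's output for simplicity (whether the page uses mean ablation or another variant isn't specified here):

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("All's fair in love and war")

clean_loss = model(tokens, return_type="loss", loss_per_token=True)

def ablate_head_10_7(z, hook):
    z[:, :, 7] = 0.0   # z has shape [batch, seq, head_index, d_head]
    return z

ablated_loss = model.run_with_hooks(
    tokens,
    return_type="loss",
    loss_per_token=True,
    fwd_hooks=[("blocks.10.attn.hook_z", ablate_head_10_7)],
)
print(ablated_loss - clean_loss)  # positive entries: tokens where the head was helpful (candidate MIDS)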

cspa

This is where I get the copy suppression-preserving ablation results. In other words, the stuff that's going to be in section 3.3 of the paper (and that makes up one of the Streamlit pages).

It also adds to the HTML plots dictionary, for the "Browse Examples" Streamlit page.

ov_qk_circuits

This contains the code for section 3.1, and generates the data for the following Streamlit page (a rough sketch of the underlying computation follows):

  1. OV and QK circuits
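
As a hedged sketch of the kind of OV-circuit computation behind this page (LayerNorms and any "effective embedding" refinements are omitted, so treat it as illustrative only):

import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
layer, head = 10, 7

tok = model.to_single_token(" Mary")
emb = model.W_E[tok]                                            # [d_model], embedding of the source token
ov_out = emb @ model.W_V[layer, head] @ model.W_O[layer, head]  # [d_model], head 10.7's OV output
logit_effect = ov_out @ model.W_U                               # [d_vocab], direct effect on the logits

print("most boosted:   ", model.to_str_tokens(torch.topk(logit_effect, 5).indices))
print("most suppressed:", model.to_str_tokens(torch.topk(-logit_effect, 5).indices))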

generate_st_html

This is exclusively for generating the HTML figures that will be on the following Streamlit pages:

  1. Browse Examples
  2. Test Your Own Examples

Optionally, if you want Jupyter Lab you can run poetry run pip install jupyterlab (to install it in the same virtual environment), and then run it with poetry run jupyter lab.

Then the library can be imported as import transformer_lens.

Testing

If adding a feature, please add unit tests for it to the tests folder, and check that it hasn't broken anything major using the existing tests (install pytest and run it in the root TransformerLens/ directory).

Running the tests

  • All tests via make test
  • Unit tests only via make unit-test
  • Acceptance tests only via make acceptance-test

Formatting

This project uses pycln, isort and black for formatting; pull requests are checked in GitHub Actions.

  • Format all files via make format
  • Only check the formatting via make check-format

Demos

If adding a feature, please add it to the demo notebook in the demos folder, and check that it works in the demo format. This can be tested by replacing pip install git+https://github.com/neelnanda-io/TransformerLens.git with pip install git+https://github.com/<YOUR_USERNAME_HERE>/TransformerLens.git in the demo notebook, and running it in a fresh environment.

Citation

Please cite us with:

@article{copy_suppression,
  title={Copy Suppression: Comprehensively Understanding an Attention Head},
  author={McDougall, Callum and Conmy, Arthur and Rushing, Cody and McGrath, Thomas and Nanda, Neel},
  journal={arXiv preprint},
  year={2023},
}

(arXiv should be out soon!)

About

Repo for hosting Streamlit pages for my 2023 SERI MATS project with Arthur Conmy (mentored by Neel Nanda).
