callummcdougall/SERI-MATS-2023-Streamlit-pages

To view our Streamlit app, visit https://copy-suppression.streamlit.app/

(If you're interested in our research, please reach out! Our emails are {cal.s.mcdougall, arthurconmy, thisiscodyr}@gmail.com)

This repo serves two purposes:

  1. An edited version of TransformerLens with a couple of extra features (see below).
  2. Hosting the Streamlit pages served from https://github.com/callummcdougall/SERI-MATS-2023-Streamlit-pages/blob/main/transformer_lens/rs/callum2/st_page/Home.py

See transformer_lens/rs/arthurs_notebooks/example_notebook.py for example usage.

Development Setup:

This setup assumes you're using an SSH key to access GitHub. See GitHub's documentation on connecting with SSH (and the associated links there) if you don't have an SSH key to begin with.

$ git clone git@github.com:callummcdougall/SERI-MATS-2023-Streamlit-pages.git
$ cd SERI-MATS-2023-Streamlit-pages
$ poetry install

pip install -e . works too, though since it doesn't pin packages to the exact versions in poetry.lock, version mismatches could plausibly cause problems.

You need to have poetry installed; to do this run

curl -sSL https://install.python-poetry.org | python3 -

and then either edit your PATH manually, or (on a Linux machine) append Poetry's install location to your PATH and reload your shell config:

echo -e "$(cat ~/.bashrc)\nexport PATH=\"$HOME/.local/bin:\$PATH\"\n" > ~/.bashrc
source ~/.bashrc

You should add new requirements, e.g. einops, by running poetry add einops.

We stored some large files in the git history and need to clean them up; if git clone is slow, try a shallow clone:

git clone --depth 1 git@github.com:callummcdougall/SERI-MATS-2023-Streamlit-pages.git

If you want to launch the Streamlit pages locally, run

pip install streamlit
cd transformer_lens/rs/callum2/st_page
streamlit run Home.py

The extra TransformerLens features mentioned above are:

  1. We set the ACCELERATE_DISABLE_RICH environment variable to "1" in transformer_lens/__init__.py to stop an annoying reformatting of notebook error messages.
  2. We add qkv_normalized_input hooks that can optionally be added to models (see the sketch below).
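
As a hedged illustration of feature 2, here's a minimal sketch of registering a hook. The exact hook-point name used below is an assumption following TransformerLens naming conventions, not confirmed against this fork's source:

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 small, the model studied in this project

def inspect_q_input(tensor, hook):
    # Print the shape of the (normalized) input to the query projection.
    print(hook.name, tuple(tensor.shape))
    return tensor

# NOTE: hypothetical hook-point name; check this fork's source for the real one.
model.run_with_hooks(
    "All's fair in love and war",
    fwd_hooks=[("blocks.10.attn.hook_q_normalized_input", inspect_q_input)],
)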

Guide to Experiments

  • Surveying the direct effects of individual attention heads: transformer_lens/rs/arthurs_notebooks/direct_effect_survey.py
  • (TODO: scan through the paper, ideally clean up the repo too)
  • (TODO: write a better implementation of the learnable scale and bias vectors)

Description of directories in transformer_lens/rs/callum2

(Written by Callum) These are the directories which I use to structure my own work.

ioi_and_bos

This directory is for two small investigations:

  1. How does the head manage to attend to BOS by default?

Conclusions - if you look at the cosine similarity between the residual stream vector just before attention layer 10 and the query bias for head 10.7, it's strongly positive and in a very tight range (between 0.45 and 0.47) for all tokens whenever the position is zero, and strongly negative for all tokens whenever the position isn't zero. So this isn't a function of BOS; it's a function of position. This has implications for how CSPA works: the query-side prediction has to overcome some threshold to actually activate the copy suppression mechanism. (A rough reproduction sketch follows at the end of this subsection.)

  2. What's the perpendicular component of the query, in IOI?

Conclusions -

  • Adding semantically similar tokens ("Mary", "mary") rather than just " Mary" doesn't seem to help.
  • We found weak evidence of some kind of "indirect prediction": when you take the perpendicular component and put it through the MLPs, it does favour IO over S1 (though the MLPs don't have much impact in IOI, so this effect isn't large anyway).
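
For investigation 1 above, here's a rough, hedged sketch of the cosine-similarity check. Exactly how the original comparison was done is an assumption: this version compares each token's bias-free query (resid @ W_Q) with b_Q in head space, and omits the LayerNorm for simplicity:

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
_, cache = model.run_with_cache("When Mary and John went to the store")

resid = cache["blocks.10.hook_resid_pre"][0]   # [seq, d_model], residual stream before layer 10
W_Q = model.W_Q[10, 7]                         # [d_model, d_head], query weights for head 10.7
b_Q = model.b_Q[10, 7]                         # [d_head], query bias for head 10.7

queries = resid @ W_Q                          # bias-free query per token, [seq, d_head]
cos = (queries @ b_Q) / (queries.norm(dim=-1) * b_Q.norm())
print(cos)  # per the claim above: tightly positive at position 0, negative elsewhere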

st_page

Hosting all of the Streamlit pages. This isn't for generating any plots (at least I don't use it for that); it's exclusively for hosting pages & storing media files.

The pages are:

  1. OV and QK circuits - you get to see what tokens are most attended to (QK circuit, prediction-attention) and what tokens are most suppressed (OV circuit). It's a nice way to highlight semantic similarity, and build intuition for how it works.
  2. Browse Examples - the most important page. You get to investigate OWT examples, and see how all parts of the copy suppression mechanism work. You can:
    • See the loss change per token when you ablate the head, i.e. find the MIDS (tokens for which the head is most helpful); a rough sketch of this measurement follows the list.
    • See the logits pre and post-ablation, as well as the direct logit attribution for this head. You can confirm that the head is pushing down tokens which appear in context, for most of the MIDS examples.
    • Look at the attention patterns. You can confirm that the head is attending to the tokens which it pushes down.
    • Look at the logit lens before head 10.7. You can confirm that the head is predicting precisely the words which it is attending to.
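
Here's a hedged sketch of the per-token ablation measurement from the first bullet above. It zero-ablates head 10.7's output for simplicity (whether the page uses mean ablation or another variant isn't specified here):

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("All's fair in love and war")

clean_loss = model(tokens, return_type="loss", loss_per_token=True)

def ablate_head_10_7(z, hook):
    z[:, :, 7] = 0.0   # z has shape [batch, seq, head_index, d_head]
    return z

ablated_loss = model.run_with_hooks(
    tokens,
    return_type="loss",
    loss_per_token=True,
    fwd_hooks=[("blocks.10.attn.hook_z", ablate_head_10_7)],
)
print(ablated_loss - clean_loss)  # positive entries: tokens where the head was helpful (candidate MIDS)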

cspa

This is where I get the copy suppression-preserving ablation results. In other words, the stuff that's going to be in section 3.3 of the paper (and that makes up one of the Streamlit pages).

It also adds to the HTML plots dictionary, for the "Browse Examples" Streamlit page.

ov_qk_circuits

This contains the code for section 3.1, and generates the data for the following Streamlit page (a rough sketch of the underlying computation follows):

  1. OV and QK circuits
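
As a hedged sketch of the kind of OV-circuit computation behind this page (LayerNorms and any "effective embedding" refinements are omitted, so treat it as illustrative only):

import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
layer, head = 10, 7

tok = model.to_single_token(" Mary")
emb = model.W_E[tok]                                            # [d_model], embedding of the source token
ov_out = emb @ model.W_V[layer, head] @ model.W_O[layer, head]  # [d_model], head 10.7's OV output
logit_effect = ov_out @ model.W_U                               # [d_vocab], direct effect on the logits

print("most boosted:   ", model.to_str_tokens(torch.topk(logit_effect, 5).indices))
print("most suppressed:", model.to_str_tokens(torch.topk(-logit_effect, 5).indices))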

generate_st_html

This is exclusively for generating the HTML figures that will be on the following Streamlit pages:

  1. Browse Examples
  2. Test Your Own Examples

Optionally, if you want Jupyter Lab you can run poetry run pip install jupyterlab (to install it in the same virtual environment), and then run it with poetry run jupyter lab.

Then the library can be imported as import transformer_lens.

Testing

If adding a feature, please add unit tests for it to the tests folder, and check that it hasn't broken anything major using the existing tests (install pytest and run it in the root TransformerLens/ directory).

Running the tests

  • All tests via make test
  • Unit tests only via make unit-test
  • Acceptance tests only via make acceptance-test

Formatting

This project uses pycln, isort and black for formatting; pull requests are checked in GitHub Actions.

  • Format all files via make format
  • Only check the formatting via make check-format

Demos

If adding a feature, please add it to the demo notebook in the demos folder, and check that it works in the demo format. This can be tested by replacing pip install git+https://github.com/neelnanda-io/TransformerLens.git with pip install git+https://github.com/<YOUR_USERNAME_HERE>/TransformerLens.git in the demo notebook, and running it in a fresh environment.

Citation

Please cite us with:

@article{copy_suppression,
  title={Copy Suppression: Comprehensively Understanding an Attention Head},
  author={McDougall, Callum and Conmy, Arthur and Rushing, Cody and McGrath, Thomas and Nanda, Neel},
  journal={arXiv preprint},
  year={2023},
}

(arXiv should be out soon!)

About

Repo for hosting Streamlit pages for my 2023 SERI MATS project with Arthur Conmy (mentored by Neel Nanda).
