Update tlens and misc config settings #222
Conversation
nix-apollo left a comment:
Looks good to me.
  tokenizer_name: EleutherAI/pythia-14m
  return_set: train  # pile-10k only has train, so we take the first 90% for building and last 10% for ablations
- return_set_frac: 0.01
+ return_set_frac: 0.9
Could set to 0.07. This is ~1M tokens, and is what I used for the last set of edge calcs.
Done. Ideally we'd have some sweeps for this, like you mentioned in 3a here https://apolloresearchhq.slack.com/archives/C06484S5UF9/p1701264768373199.
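For reference, a rough sketch of how the "0.07 is ~1M tokens" figure could be sanity-checked for other candidate fractions. The `NeelNanda/pile-10k` dataset id is an assumption, and the repo's actual slicing/tokenization of the dataset may differ, so treat this only as an estimate:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Rough token-count estimate per candidate return_set_frac. The dataset id
# "NeelNanda/pile-10k" is an assumption; the repo's own loading/slicing may differ.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")
dataset = load_dataset("NeelNanda/pile-10k", split="train")

for frac in (0.01, 0.07, 0.9):
    n_docs = int(len(dataset) * frac)
    n_tokens = sum(len(tokenizer(t)["input_ids"]) for t in dataset["text"][:n_docs])
    print(f"return_set_frac={frac}: ~{n_tokens:,} tokens over {n_docs} documents")
```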
| self.register_buffer("mask", causal_mask) | ||
|
|
||
| self.register_buffer("IGNORE", torch.tensor(-1e5)) | ||
| self.register_buffer("IGNORE", torch.tensor(-torch.inf)) |
Note that this may break if all possible attention positions are masked, as the softmax of a vector of all -torch.inf is nan.
TransformerLens avoided this by checking for nans and replacing them with 0s here. I'm not sure we want that, though -- it seems fine/good to me that things break if an attention head isn't allowed to pay attention to any positions.
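For concreteness, a quick demo of this failure mode plus a sketch of the kind of nan-guard described above (not TransformerLens's actual code):

```python
import torch

# A row where every position is masked: softmax of all -inf is nan everywhere.
all_masked = torch.full((4,), -torch.inf)
print(all_masked.softmax(dim=-1))  # tensor([nan, nan, nan, nan])

# Sketch of the nan-guard described above: replace nan attention rows with 0s.
pattern = all_masked.softmax(dim=-1)
pattern = torch.where(torch.isnan(pattern), torch.zeros_like(pattern), pattern)
print(pattern)  # tensor([0., 0., 0., 0.])
```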
We don't support passing an attention_mask to the forward method of Attention, so we shouldn't be able to get a token attending to no other positions, unless I'm missing something. This makes me especially keen for things to break if something happens causing nans.
Imported updated requirements and ran tests successfully.

I confirm this should close #205.

I assume you are planning to put the change to the default value of eps in a different PR?

Yep, will do. Just made an issue for it: #224.
Update tlens and misc config settings
Description
Related Issue
Closes #217; Closes #216; Closes #205 (@nix-apollo please confirm: I think this issue may have been caused by the old IGNORE value being too large?).
Motivation and Context
Having the old IGNORE=-1e5 caused issues both in transformerlens and in the RIB tests (requiring much larger atols). The other changes are also likely to improve any precision issues we have.
How Has This Been Tested?
Updated tests to point to the new modular arithmetic graphs. Same tests pass.
Does this PR introduce a breaking change?
Technically no, but everybody should reinstall the package since requirements.txt has been updated; otherwise things will break.
float32 vs float64 modular arithmetic graph build