@SrGonao (Collaborator) commented Apr 1, 2025

This PR grew larger than I intended because I mistakenly included code from two different branches.

It adds:

  • Two new scoring methods that don't require any model explanations.
  • Centering of activating examples. For now, 3/4 of each window lies to the left of the maximally activating token found in the previous windows, and 1/4 lies to the right. To make this possible, I'm discarding all activations from the first and last window of each batch (1/4 of the total activations if we collect with ctx_len 256).
  • Some fixes to the online client.
  • A new example sampling option.
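
The centering rule above can be illustrated with a minimal sketch. This is not the code from this PR: the function names `centered_window`, `is_kept`, and `kept_fraction` are hypothetical, and the example window length of 32 tokens is an assumption chosen because it makes the first and last windows of a 256-token context account for 1/4 of all positions.

```python
def centered_window(max_pos: int, example_len: int) -> tuple[int, int]:
    """Return (start, end) token indices of a window of length `example_len`
    with 3/4 of its tokens strictly to the left of `max_pos`.
    The remaining 1/4 starts at the maximally activating token itself."""
    left = (3 * example_len) // 4
    start = max_pos - left
    return start, start + example_len


def is_kept(max_pos: int, ctx_len: int, example_len: int) -> bool:
    """Assumed discard rule: drop activations that fall in the first or
    last window of the context, so the shifted window never leaves it."""
    n_windows = ctx_len // example_len
    window_idx = max_pos // example_len
    return 0 < window_idx < n_windows - 1


def kept_fraction(ctx_len: int, example_len: int) -> float:
    """Fraction of token positions that survive the discard rule."""
    kept = sum(is_kept(p, ctx_len, example_len) for p in range(ctx_len))
    return kept / ctx_len


if __name__ == "__main__":
    ctx_len, example_len = 256, 32  # example_len = 32 is an assumption
    print(centered_window(100, example_len))
    print(kept_fraction(ctx_len, example_len))
```

Under these assumptions, any position that survives the discard rule yields a shifted window that stays inside the original context, which is one plausible reason for dropping the edge windows.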

SrGonao and others added 30 commits March 7, 2025 15:08
@SrGonao merged commit 6b9af04 into main Apr 21, 2025
2 of 3 checks passed
@SrGonao deleted the intruder branch June 3, 2025 09:44

2 participants