Closed
Description
I think it would be useful to log the number of single-token features when `verbose` is set to `True`. By "single-token" I mean a feature whose top activations all land on the same token, e.g. the top 50% of quantiles, or 100% of the max-activating examples. This would help when tracking auto-interp scores over training, or in other scenarios where it's not clear that the features are the true and final ones.
See https://transformer-circuits.pub/2023/monosemantic-features/index.html#appendix-automated-randomized
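
A rough sketch of the check I have in mind, in case it helps. Everything here is hypothetical: the function names, the `threshold` parameter, and the input shape (a mapping from feature index to the tokens its top-activating examples peak on) are assumptions for illustration, not this library's API.

```python
from collections import Counter
from typing import Dict, List


def is_single_token(top_tokens: List[str], threshold: float = 1.0) -> bool:
    """Return True if one token accounts for at least `threshold` of the examples.

    `top_tokens` holds, for one feature, the token on which each
    top-activating example peaks (hypothetical input format).
    """
    if not top_tokens:
        # A feature with no recorded activations is not counted.
        return False
    most_common_count = Counter(top_tokens).most_common(1)[0][1]
    return most_common_count / len(top_tokens) >= threshold


def count_single_token_features(
    feature_top_tokens: Dict[int, List[str]], threshold: float = 1.0
) -> int:
    """Count the features that pass the single-token check."""
    return sum(
        is_single_token(tokens, threshold)
        for tokens in feature_top_tokens.values()
    )


# Toy data: feature 0 always fires on " the", feature 1 mostly on " cat",
# feature 2 has no recorded examples.
features = {
    0: [" the"] * 5,
    1: [" cat", " dog", " cat", " cat"],
    2: [],
}

verbose = True
if verbose:
    n_single = count_single_token_features(features, threshold=1.0)
    print(f"single-token features: {n_single}/{len(features)}")
```

With `threshold=1.0` this matches the "100% of max-activating examples" variant; a lower threshold gives a looser version of the same check.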