Log number of single token features

It would be great to log the number of single-token features when `verbose` is set to True. A feature would count as single-token if, for example, the examples in the top 50% of activation quantiles, or 100% of its max-activating examples, all fall on the same token. This would be useful when tracking auto-interp scores over training, or in other scenarios where it's not clear that the features are the final ones.

See https://transformer-circuits.pub/2023/monosemantic-features/index.html#appendix-automated-randomized
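
A rough sketch of what the count could look like. This is not tied to this repo's API: the function name `count_single_token_features`, the `min_fraction` parameter, and the input layout (one list of token ids per feature, giving the token each top-activating example fired on) are all assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def count_single_token_features(
    top_tokens_per_feature: Sequence[Sequence[int]],
    min_fraction: float = 1.0,
) -> int:
    """Count features whose top-activating examples mostly fire on one token.

    top_tokens_per_feature[i] holds the token id at the activating position
    of each top example for feature i (hypothetical input layout).
    min_fraction=1.0 requires every example to share the same token.
    """
    count = 0
    for tokens in top_tokens_per_feature:
        if not tokens:
            continue  # feature with no recorded examples, skip it
        # Size of the largest group of examples sharing a single token.
        _, most_common_count = Counter(tokens).most_common(1)[0]
        if most_common_count / len(tokens) >= min_fraction:
            count += 1
    return count


# Toy data: four features with made-up token ids.
top_tokens = [
    [17, 17, 17, 17],  # always the same token
    [17, 42, 42, 42],  # 75% on one token
    [1, 2, 3, 4],      # no dominant token
    [],                # no examples
]

n_strict = count_single_token_features(top_tokens)
n_loose = count_single_token_features(top_tokens, min_fraction=0.5)

verbose = True
if verbose:
    print(f"single-token features (100%): {n_strict}")
    print(f"single-token features (>=50%): {n_loose}")
```

The threshold is left as a parameter because the description above mentions both a strict (100%) and a looser (50%) criterion; which one to log by default is an open question.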

Log number of single token features #87