Log number of single token features #87

@luciaquirke

I think it would be super cool if we logged the number of single-token features (e.g. features where the examples in the top 50% of quantiles, or 100% of the max-activating examples, are all on the same token) when verbose is set to True. This would be useful when tracking auto-interp scores over training, or in other scenarios where it's not clear that the features are the true-and-final ones.

See https://transformer-circuits.pub/2023/monosemantic-features/index.html#appendix-automated-randomized
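
A rough sketch of what the check could look like, under one reading of the criterion above: a feature counts as single-token if all of its examples in the top `top_fraction` by activation peak on the same token. All names here (`is_single_token`, `count_single_token_features`, `top_fraction`, and the `(token, activation)` input layout) are hypothetical and not taken from the existing codebase.

```python
import logging
import math
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)


def is_single_token(examples: List[Tuple[str, float]], top_fraction: float = 1.0) -> bool:
    """Check whether the strongest examples of one feature share a single token.

    `examples` holds (peak_token, activation) pairs for one feature.
    Only the top `top_fraction` of examples, ranked by activation, are checked.
    """
    if not examples:
        return False
    ranked = sorted(examples, key=lambda ex: ex[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return len({token for token, _ in ranked[:keep]}) == 1


def count_single_token_features(
    features: Dict[int, List[Tuple[str, float]]],
    top_fraction: float = 1.0,
    verbose: bool = False,
) -> int:
    """Count features that pass `is_single_token`, logging the total if verbose."""
    count = sum(is_single_token(ex, top_fraction) for ex in features.values())
    if verbose:
        logger.info("%d / %d features are single-token", count, len(features))
    return count


# Toy data: feature id -> (peak token, activation) pairs.
features = {
    0: [(" the", 5.0), (" the", 4.0), (" the", 3.5), (" the", 1.0)],
    1: [(" cat", 9.0), (" cat", 8.0), (" dog", 3.0), (" bird", 1.0)],
    2: [],
}
strict = count_single_token_features(features, top_fraction=1.0)
relaxed = count_single_token_features(features, top_fraction=0.5)
```

With `top_fraction=1.0` this corresponds to the "100% of max activating examples" variant; `top_fraction=0.5` only looks at the stronger half of the examples, which is one possible reading of the "top 50% of quantiles" variant.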
