
Conversation

@meghana-0211 commented Feb 22, 2025

Single Token Feature Detection

Adds functionality to detect and log features that primarily activate on single tokens, as requested in #87.

Changes

  • Added is_single_token field to LatentRecord
  • Implemented detection logic in constructors.py
  • Updated scorer and analysis code to track single token metrics
  • Added single token ratio to verbose logging output

Implementation Details

  • Features are considered "single token" if ≥80% of the top 50% of activations are concentrated on a single token (see the sketch after this list)
  • Thresholds are configurable via parameters
  • Information is propagated through the entire pipeline
  • Results are included in logging when verbose mode is enabled
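
As a rough illustration of this rule (not the exact code in the PR), a minimal sketch assuming activations is a (num_examples, seq_len) tensor of one latent's activations; the function name, parameter names, and the 0.5 mass-share cutoff are illustrative. Note that the review below questions whether this per-sequence interpretation matches the intent of #87.

import torch

def is_single_token_feature(
    activations: torch.Tensor,                 # (num_examples, seq_len) activations of one latent
    quantile_threshold: float = 0.5,           # keep the top 50% of examples by peak activation
    activation_ratio_threshold: float = 0.8,   # require >=80% of them to be "single token"
) -> bool:
    # Rank examples by their strongest activation and keep the top fraction.
    peak_acts = activations.max(dim=1).values
    top_k = max(1, int(len(peak_acts) * quantile_threshold))
    top_examples = activations[peak_acts.topk(top_k).indices]

    # Treat an example as "single token" if one position carries more than
    # half of its total activation mass (the 0.5 cutoff is an assumption).
    mass_share = top_examples / (top_examples.sum(dim=1, keepdim=True) + 1e-8)
    significant_positions = (mass_share > 0.5).sum(dim=1)

    single_token_ratio = (significant_positions == 1).float().mean().item()
    return single_token_ratio >= activation_ratio_threshold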

Closes #87

@CLAassistant commented Feb 22, 2025

CLA assistant check
All committers have signed the CLA.

@SrGonao (Collaborator) commented Feb 22, 2025

If you are willing to give it another shot, I had already started in #89. I think it should be done after caching and not when you are building the feature examples (although I think having an option to skip single token latents is nice). Maybe we can have both of them - my method as a diagnostics tool that is used when verbose is true, and your suggestion as maybe an argument (skip_single_token?) that can be used to not explain single token latents? @luciaquirke, @norabelrose maybe you have ideas?
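
A minimal sketch of how the suggested flag could be exposed, assuming a dataclass-style constructor config; the class name is illustrative, max_examples / min_examples mirror the constructor_cfg fields quoted below, and skip_single_token is the hypothetical argument proposed above.

from dataclasses import dataclass

@dataclass
class ConstructorConfig:
    max_examples: int
    min_examples: int
    # Hypothetical flag from the discussion above: if True, latents detected
    # as single-token would be skipped (not explained) during construction.
    skip_single_token: bool = False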

@SrGonao mentioned this pull request Feb 22, 2025
max_examples = constructor_cfg.max_examples
min_examples = constructor_cfg.min_examples

token_windows, act_windows = pool_max_activation_windows(

@luciaquirke (Contributor) Feb 25, 2025

It looks like this call has been duplicated by mistake. I don't believe this code will run. I have added documentation on how to run the tests to the README.md.

@luciaquirke (Contributor) Feb 25, 2025

@meghana-0211 try pytest . and python -m delphi.tests.e2e.

# For each example, check if activation is concentrated in a single position
max_activations = activations.max(dim=1).values
top_k = int(len(max_activations) * quantile_threshold)
top_indices = max_activations.topk(top_k).indices

@luciaquirke (Contributor) Feb 25, 2025

I believe that the activations are already in top k order (tbh they should probably be renamed to ordered_example_acts or similar to reflect this, starting in _top_k_pools). So we should be able to directly slice the first len(activations) * quantile_threshold activations.
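
A short sketch of the simplification being suggested, assuming the example activations are already sorted from strongest to weakest:

# If the examples are already ordered by activation strength, the explicit
# top-k over per-example maxima is redundant and a slice is enough:
top_k = int(len(activations) * quantile_threshold)
top_activations = activations[:top_k]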

# Calculate ratio of single token activations
single_token_ratio = (significant_activations == 1).float().mean().item()

return single_token_ratio >= activation_ratio_threshold

@luciaquirke (Contributor) Feb 25, 2025

it doesn't look like this method actually checks whether the activations are single token 🤔 maybe I'm just confused but I can't see it

@luciaquirke (Contributor) commented Feb 25, 2025

Looks like there may have been a miscommunication - a single token feature is supposed to mean that the feature fires on the same token every time, not that it fires on one token within each sequence. This code also won't run as is. That said, the code in result_analysis looks great and could be pulled verbatim into a future PR for this issue.

@SrGonao I'm curious what the issue is with doing this type of non-LLM calling data processing in the constructor, could you say a bit more about your reasoning?
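
For contrast with the per-sequence check quoted above, a minimal sketch of the definition described here (the latent fires on the same token every time); tokens and activations are assumed to be aligned (num_examples, seq_len) tensors, and the function name and 0.8 threshold are illustrative.

import torch

def fires_on_same_token(
    tokens: torch.Tensor,               # (num_examples, seq_len) token ids
    activations: torch.Tensor,          # (num_examples, seq_len) activations of one latent
    same_token_threshold: float = 0.8,  # illustrative cutoff
) -> bool:
    # Token id at the max-activating position of each example.
    peak_positions = activations.argmax(dim=1)
    peak_tokens = tokens.gather(1, peak_positions.unsqueeze(1)).squeeze(1)

    # Fraction of examples whose peak token is the single most common peak token.
    modal_count = torch.bincount(peak_tokens).max()
    same_token_fraction = modal_count.float() / len(peak_tokens)
    return same_token_fraction.item() >= same_token_threshold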

@SrGonao (Collaborator) commented Feb 25, 2025

I think if we want to get statistics as a diagnostics tool it should be done after caching.
If we want to skip single token latents it should be done in the constructor.

@luciaquirke (Contributor) commented

Yeah I think we probably just want statistics and not skipping single token explanation, to keep things simple for now. I trust that we shouldn't collect statistics before caching is complete, but am curious as to the reasoning.

@SrGonao (Collaborator) commented Feb 26, 2025

Sorry, by after caching I meant right after. I think slowing down the constructor leads to underutilization of the GPUs, so in my mind I would like to keep as much "work" that can be done beforehand out of the constructor. Maybe it doesn't make a difference though, I just thought it would be more natural.

@luciaquirke (Contributor) commented

Closing due to inactivity
