annahdo

Pinned

  1. counterfactuals Public

    Python · 12 stars · 3 forks

  2. implementing_activation_steering Public

    A collection of different ways to access and modify internal model activations in LLMs

    Jupyter Notebook · 11 stars

  3. exploring_directions Public

    We find concept directions in hidden layers of an LLM and use them for classification, activation steering, and knowledge removal

    Jupyter Notebook · 2 stars · 1 fork

  4. pankessel/adv_explanation_ref Public

    Reference implementation for "Explanations can be manipulated and geometry is to blame"

    Python · 35 stars · 12 forks