Pinned
implementing_activation_steering
A collection of different ways to access and modify internal model activations in LLMs.
Jupyter Notebook 11
exploring_directions
We find concept directions in the hidden layers of an LLM and use them for classification, activation steering, and knowledge removal.
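The description does not say how the directions are found. As a hypothetical illustration only, the sketch below assumes a direction is the unit-normalized difference between the mean hidden activations of examples with and without a concept (a common difference-of-means choice, not necessarily the method used in exploring_directions). It shows the three uses named: classification by projecting onto the direction, steering by adding a scaled copy of it, and knowledge removal by projecting it out. All function names are hypothetical, and activations are plain Python lists standing in for real hidden states.

```python
import math


def mean(vectors):
    # Element-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fit_concept_direction(pos_acts, neg_acts):
    # Hypothetical: direction = normalized difference of class means.
    mu_pos, mu_neg = mean(pos_acts), mean(neg_acts)
    direction = normalize([p - q for p, q in zip(mu_pos, mu_neg)])
    # Decision threshold: midpoint of the two class means projected onto the direction.
    threshold = (dot(mu_pos, direction) + dot(mu_neg, direction)) / 2
    return direction, threshold


def classify(h, direction, threshold):
    # True if the activation lies on the "concept present" side.
    return dot(h, direction) > threshold


def steer(h, direction, alpha):
    # Activation steering: shift the activation along the concept direction.
    return [x + alpha * d for x, d in zip(h, direction)]


def remove_concept(h, direction):
    # Knowledge removal: subtract the component along the unit direction.
    c = dot(h, direction)
    return [x - c * d for x, d in zip(h, direction)]


# Toy 3-dimensional "activations" (made-up numbers, for illustration only).
pos = [[2.0, 1.0, 0.0], [3.0, 1.0, 0.0]]
neg = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
direction, threshold = fit_concept_direction(pos, neg)
print(direction, threshold)  # [1.0, 0.0, 0.0] 1.5
```

With a unit direction, projecting it out leaves an activation with zero component along the concept, so the same classifier can no longer separate it by that direction; real models would apply these operations to hidden states captured at a chosen layer.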
pankessel/adv_explanation_ref
Reference implementation for "Explanations can be manipulated and geometry is to blame".