
where I learn and explore mechanistic interpretability of transformers


SasankYadati/mech-interp


mech-interp

1. Write your own transformer

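Run pip install -e . from the repository root to install the package in editable mode. I wrote this transformer by working through the ARENA exercises. To test each layer, I compared its output against the corresponding layer of the GPT model loaded through HookedTransformer (from TransformerLens).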
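The layer-testing idea above can be sketched as follows: run a hand-written layer and a reference layer on the same input and check that the outputs agree within a tolerance. This is a minimal, dependency-free illustration, not the repo's actual test code. The names my_layer_norm, reference_layer_norm, and all_close are hypothetical, and the reference here is a standard-library stand-in for the HookedTransformer layer.

```python
import math
import statistics


def my_layer_norm(x, eps=1e-5):
    """Hand-written LayerNorm over one vector (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def reference_layer_norm(x, eps=1e-5):
    """Stand-in reference (assumption): in the repo this role is played
    by the matching layer of the HookedTransformer model."""
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)  # population variance, as in LayerNorm
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def all_close(a, b, atol=1e-6):
    """True if both vectors have the same length and agree elementwise."""
    return len(a) == len(b) and all(abs(p - q) <= atol for p, q in zip(a, b))


x = [0.5, -1.0, 2.0, 3.5]
matches = all_close(my_layer_norm(x), reference_layer_norm(x))
print("layers match:", matches)
```

With real tensors the same pattern would typically use a library comparison such as torch.testing.assert_close instead of the hand-rolled all_close.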

Train the sample transformer using 1-train_sample_transformer.ipynb.

2. Introduction to Mechanistic Interpretability

  • explore TransformerLens
  • visualize attention patterns
  • write basic detectors
  • understand induction heads
  • direct logit attribution
  • induction head ablation
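As an illustration of the "basic detectors" and "induction heads" items above, here is a minimal sketch of an induction-head score computed from a single attention pattern. It assumes the input is a sequence of rep_len random tokens repeated twice, with no leading BOS token, and that pattern[q][k] is the attention from query position q to key position k. The function names and the 0.4 threshold are hypothetical, and the patterns are synthetic rather than taken from a real model.

```python
def induction_score(pattern, rep_len):
    """Mean attention from each second-half query to the token that
    followed the earlier copy of the current token."""
    seq_len = len(pattern)
    queries = range(rep_len, seq_len)
    return sum(pattern[q][q - rep_len + 1] for q in queries) / len(queries)


def uniform_causal_pattern(seq_len):
    """Each query attends equally to itself and all earlier positions."""
    return [
        [1.0 / (q + 1) if k <= q else 0.0 for k in range(seq_len)]
        for q in range(seq_len)
    ]


def ideal_induction_pattern(rep_len):
    """Synthetic head: second-half queries put all attention on the
    induction target; first-half queries attend uniformly."""
    seq_len = 2 * rep_len
    pattern = uniform_causal_pattern(seq_len)
    for q in range(rep_len, seq_len):
        pattern[q] = [0.0] * seq_len
        pattern[q][q - rep_len + 1] = 1.0
    return pattern


REP_LEN = 4
THRESHOLD = 0.4  # hypothetical cutoff, chosen only for this example

ideal_score = induction_score(ideal_induction_pattern(REP_LEN), REP_LEN)
uniform_score = induction_score(uniform_causal_pattern(2 * REP_LEN), REP_LEN)

print("ideal head is induction-like:", ideal_score > THRESHOLD)
print("uniform head is induction-like:", uniform_score > THRESHOLD)
```

With a real model, the pattern would come from a cached forward pass over repeated tokens, and the key offset would need adjusting if a BOS token is prepended.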
