Guarantees-Based Mechanistic Interpretability

This is the codebase for the Guarantees-Based Mechanistic Interpretability MARS stream. It is the successor to https://github.com/JasonGross/neural-net-coq-interp.

Setup

The code can be run in any environment with Python 3.9 or later.
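A minimal sketch for checking that an interpreter meets this requirement (the helper name is hypothetical, not part of the repository):

```python
import sys
from typing import Optional, Sequence, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 9)  # minimum version stated in this README


def python_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if `version` (default: the running interpreter) is >= 3.9."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= MIN_PYTHON
```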

We use poetry for dependency management; it can be installed by following the instructions in the Poetry documentation.

To build a virtual environment with the required packages, simply run

poetry config virtualenvs.in-project true
poetry install
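If you prefer to drive setup from a Python script (for example in CI), the two commands above can be wrapped as follows. This is a minimal sketch, not part of the repository: the helper names are hypothetical, and the optional null keyring backend corresponds to the PYTHON_KEYRING_BACKEND workaround described in the notes.

```python
import os
import subprocess
from typing import Dict, List, Mapping

# The two setup commands from this README, in order.
SETUP_COMMANDS: List[List[str]] = [
    ["poetry", "config", "virtualenvs.in-project", "true"],
    ["poetry", "install"],
]

NULL_KEYRING = "keyring.backends.null.Keyring"


def setup_env(base: Mapping[str, str], null_keyring: bool = False) -> Dict[str, str]:
    """Copy `base` and optionally tell poetry to skip the system keyring."""
    env = dict(base)
    if null_keyring:
        env["PYTHON_KEYRING_BACKEND"] = NULL_KEYRING
    return env


def run_setup(null_keyring: bool = False) -> None:
    """Run each setup command, stopping at the first failure (check=True)."""
    env = setup_env(os.environ, null_keyring)
    for cmd in SETUP_COMMANDS:
        subprocess.run(cmd, env=env, check=True)
```

Calling run_setup(null_keyring=True) is only needed on systems that hit keyring errors.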

Notes

- On some systems you may need to set the environment variable PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring to avoid keyring-related errors.
- The first command above (poetry config virtualenvs.in-project true) tells poetry to create the virtual environment in the project directory, which allows VS Code to find it.
- If you are using caches from other machines and see errors like "dbm.error: db type is dbm.gnu, but the module is not available", you can probably solve the issue by following the instructions from StackOverflow:
  - sudo apt-get install libgdbm-dev python3-gdbm
  - If you are using conda or another Python version manager, you can inspect the output of dpkg -L python3-gdbm and copy the lib-dynload/_gdbm.cpython-*-x86_64-linux-gnu.so file into the corresponding lib/ directory of the Python installation you are using.
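Before applying the dbm fixes, you can check whether a cache file is a GNU dbm database and whether your interpreter can open it. A minimal stdlib-only sketch using dbm.whichdb and dbm.gnu; the function names and the cache path you pass in are hypothetical.

```python
import dbm


def gdbm_available() -> bool:
    """Return True if the dbm.gnu backend (the _gdbm C extension) imports."""
    try:
        import dbm.gnu  # noqa: F401  raises ImportError when _gdbm is missing
    except ImportError:
        return False
    return True


def needs_gdbm(cache_path: str) -> bool:
    """True if `cache_path` is a GNU dbm file that this Python cannot open.

    dbm.whichdb returns None for unreadable paths, "" for unknown formats,
    and a module name such as "dbm.gnu" otherwise.
    """
    return dbm.whichdb(cache_path) == "dbm.gnu" and not gdbm_available()
```

If needs_gdbm(...) returns True for one of your cache files, the fixes above apply.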

Running notebooks

To open a Jupyter notebook, run

poetry run jupyter lab

If this doesn't work (e.g. because you already have multiple Jupyter kernels installed on your system), you may need to create a new kernel for this project:

poetry run python -m ipykernel install --user --name=gbmi
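To check whether the gbmi kernel is already registered, you can parse the output of jupyter kernelspec list --json. A minimal sketch; it assumes the JSON has a top-level "kernelspecs" object keyed by kernel name (the format printed by jupyter_client), and the helper names are hypothetical.

```python
import json
import subprocess


def has_kernel(kernelspec_json: str, name: str = "gbmi") -> bool:
    """Check JSON from `jupyter kernelspec list --json` for kernel `name`."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return name in specs


def installed_kernel_json() -> str:
    """Ask Jupyter (inside the poetry environment) for its kernel specs."""
    result = subprocess.run(
        ["poetry", "run", "jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

If has_kernel(installed_kernel_json()) is False, run the ipykernel install command above.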

Training models

Models for existing experiments can be trained by running e.g.

poetry run python -m gbmi.exp_max_of_n.train

or by running e.g.

from gbmi.exp_max_of_n.train import MAX_OF_10_CONFIG
from gbmi.model import train_or_load_model

rundata, model = train_or_load_model(MAX_OF_10_CONFIG)

from a Jupyter notebook.

train_or_load_model first tries to download a trained model with the specified config from Weights and Biases; if no such model exists, it trains the model and uploads the weights to Weights and Biases.
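The load-or-train behaviour can be illustrated with a generic, stdlib-only sketch. This is not the repository's implementation: a local directory of pickles keyed by a hash of the config stands in for Weights and Biases, configs are assumed to be JSON-serializable dicts, and train_fn is a hypothetical stand-in for the training loop.

```python
import hashlib
import json
import pickle
from pathlib import Path
from typing import Any, Callable, Dict

Config = Dict[str, Any]


def config_key(config: Config) -> str:
    """Stable key: hash of canonical JSON, so key order does not matter."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def train_or_load(config: Config, train_fn: Callable[[Config], Any], cache_dir: Path) -> Any:
    """Return a cached model for `config`, training and caching it on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{config_key(config)}.pkl"
    if path.exists():  # a model for this exact config was saved before
        with path.open("rb") as f:
            return pickle.load(f)
    model = train_fn(config)  # cache miss: train from scratch
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model
```

Hashing a sorted-key serialization means configs that differ only in key order share one cache entry, while any changed value triggers a fresh training run.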

Adding new experiments

The convention for this codebase is to store experiment-specific code in an exp_[NAME]/ folder, with

- exp_[NAME]/analysis.py: functions for visualisation / interpretability
- exp_[NAME]/verification.py: functions for verification
- exp_[NAME]/train.py: training / dataset code

See the exp_template directory for more details.
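A minimal sketch of creating the skeleton of a new experiment under this convention. The helper is hypothetical and only creates the three files with placeholder docstrings; refer to exp_template for the actual layout and contents. Since existing experiments are importable as gbmi.exp_max_of_n, package_dir would be the gbmi/ directory.

```python
from pathlib import Path

# File name -> role, following the exp_[NAME]/ convention in this README.
EXPERIMENT_FILES = {
    "analysis.py": "Functions for visualisation / interpretability.",
    "verification.py": "Functions for verification.",
    "train.py": "Training / dataset code.",
}


def scaffold_experiment(package_dir: Path, name: str) -> Path:
    """Create package_dir/exp_<name>/ with one stub file per role."""
    exp_dir = package_dir / f"exp_{name}"
    exp_dir.mkdir(parents=True, exist_ok=False)  # refuse to overwrite
    for filename, role in EXPERIMENT_FILES.items():
        (exp_dir / filename).write_text(f'"""{role}"""\n')
    return exp_dir
```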

Adding dependencies

To add new dependencies, run poetry add my-package.

Code Style

We use black to format our code. To set up the pre-commit hooks that enforce code formatting, run

make pre-commit-install

Tests

This codebase advocates for expect tests in machine learning, and as such uses @ezyang's expecttest library for unit and regression tests.

[TODO: add tests?]

JasonGross/guarantees-based-mechanistic-interpretability
