
Note - this repo is deprecated, see here for the more up to date version.

This repository allows you to create visualisations of the features found by a sparse autoencoder, like the one below (link here).

This particular feature seems to be a fuzzy skip trigram, with the pattern being `(django syntax), ..., (' -> django`. You can confirm this by taking some of the text in the top activations that comes immediately before the bracket (e.g. `created_on` or `first_name`), copying it into GPT-4 and asking it to identify which library is being used: it correctly identifies these as instances of Django syntax. Furthermore, we can see that this feature boosts the `django` token a lot more than any other token.
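As a rough illustration of what "boosts `django` more than any other token" means, the sketch below ranks tokens by the logit change a feature direction would produce. It is a toy in pure Python, not the code from this repo: the function name `top_boosted_tokens`, the four-token vocabulary and all numbers are made up, and the feature's path to the logits is folded into a single direction and a single unembedding matrix.

```python
def top_boosted_tokens(feature_dir, unembed, vocab, k=3):
    """Rank tokens by how much a feature direction raises their logits.

    feature_dir: list of d floats (hypothetical feature direction).
    unembed: d rows, each with len(vocab) floats (toy unembedding matrix).
    """
    # Logit boost for token t is the dot product of the feature direction
    # with column t of the unembedding matrix.
    boosts = [
        sum(feature_dir[i] * unembed[i][t] for i in range(len(feature_dir)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, boosts), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy values, chosen only so that " django" comes out on top.
vocab = [" django", " flask", " numpy", " the"]
feature_dir = [1.0, 0.5]
unembed = [
    [2.0, 0.5, 0.1, 0.0],
    [1.0, 0.2, -0.3, 0.1],
]

top = top_boosted_tokens(feature_dir, unembed, vocab, k=2)
print(top)
```

With a real model you would swap the toy lists for the model's own weights; the ranking step stays the same.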

These visualisations were created using the GELU-1l model from Neel Nanda's HuggingFace library, as well as an autoencoder which he trained on its single layer of neuron activations (see this Colab from Neel).
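For readers new to sparse autoencoders, here is a minimal sketch of how feature activations could be computed from a vector of neuron activations. It assumes the common form `relu((x - b_dec) @ W_enc + b_enc)`; the autoencoder used here may differ in its details, and the function name, shapes and numbers below are purely illustrative.

```python
def sae_feature_activations(neuron_acts, w_enc, b_enc, b_dec):
    """Encode neuron activations into sparse feature activations.

    Assumed form: relu((x - b_dec) @ W_enc + b_enc).
    neuron_acts, b_dec: lists of n floats.
    w_enc: n rows, each with one float per feature.
    b_enc: one float per feature.
    """
    # Subtract the decoder bias before encoding (an assumption of this sketch).
    centered = [a - b for a, b in zip(neuron_acts, b_dec)]
    pre_acts = [
        sum(centered[i] * w_enc[i][f] for i in range(len(centered))) + b_enc[f]
        for f in range(len(b_enc))
    ]
    # ReLU keeps only positive pre-activations, so most features are zero.
    return [max(0.0, p) for p in pre_acts]


# Toy example: 2 neurons, 3 features.
neuron_acts = [1.0, 2.0]
b_dec = [0.0, 1.0]
w_enc = [
    [1.0, -1.0, 0.5],
    [0.5, -1.0, 0.0],
]
b_enc = [0.0, 0.0, -1.0]

acts = sae_feature_activations(neuron_acts, w_enc, b_enc, b_dec)
print(acts)
```

The visualisations are built from activations like these: for each feature, the text sequences where it activates most strongly are collected and displayed.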

You can use my Colab to generate more of these visualisations, or use this SAE visualiser to navigate through the first thousand features of the aforementioned autoencoder.



callummcdougall/sae_visualizer
