<a target="_blank" href="https://colab.research.google.com/github/ai-safety-foundation/sparse_autoencoder/blob/main/docs/content/demo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Quick Start Training Demo

This quick-start demo gets you training a sparse autoencoder (SAE) right away. Choose a few
hyperparameters (such as the model to train on), then start the sweep.
By default, it replicates Neel Nanda's
[comment on the Anthropic dictionary learning
paper](https://transformer-circuits.pub/2023/monosemantic-features/index.html#comment-nanda).

## Setup

### Imports

In [4]:
# Check if we're in Colab
try:
    import google.colab  # noqa: F401 # type: ignore

    in_colab = True
except ImportError:
    in_colab = False

# Install dependencies if in Colab
if in_colab:
    %pip install sparse_autoencoder transformer_lens transformers wandb

# Otherwise enable hot reloading in dev mode
if not in_colab:
    from IPython import get_ipython  # type: ignore

    ip = get_ipython()
    # Load autoreload only if it is not already among the loaded extensions
    if (
        ip is not None
        and ip.extension_manager is not None
        and "autoreload" not in ip.extension_manager.loaded
    ):
        ip.extension_manager.load_extension("autoreload")
        %autoreload 2

In [5]:
import os

from sparse_autoencoder import (
    sweep,
    SweepConfig,
    Hyperparameters,
    SourceModelHyperparameters,
    Parameter,
    SourceDataHyperparameters,
    Method,
    LossHyperparameters,
    OptimizerHyperparameters,
)
import wandb


os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["WANDB_NOTEBOOK_NAME"] = "demo.ipynb"

### Hyperparameters

Customize any of the hyperparameters below. By default, the sweep covers the L1 coefficient and
the learning rate:

In [6]:
sweep_config = SweepConfig(
    parameters=Hyperparameters(
        loss=LossHyperparameters(
            l1_coefficient=Parameter(values=[1e-3, 1e-4, 1e-5]),
        ),
        optimizer=OptimizerHyperparameters(
            lr=Parameter(values=[1e-3, 1e-4, 1e-5]),
        ),
        source_model=SourceModelHyperparameters(
            name=Parameter("gelu-2l"),
            hook_site=Parameter("mlp_out"),
            hook_layer=Parameter(0),
            hook_dimension=Parameter(512),
        ),
        source_data=SourceDataHyperparameters(
            dataset_path=Parameter("NeelNanda/c4-code-tokenized-2b"),
        ),
    ),
    method=Method.RANDOM,
)
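
To get a feel for the size of this search space, the sketch below enumerates the combinations that the two swept parameters allow and draws a few at random. It is an illustration only and does not call the `sparse_autoencoder` or `wandb` APIs. The helper `sample_run_config` is hypothetical, and the sketch assumes that a random sweep picks each listed value uniformly and independently.

```python
import itertools
import random

# Values copied from the sweep configuration above.
l1_coefficients = [1e-3, 1e-4, 1e-5]
learning_rates = [1e-3, 1e-4, 1e-5]

# Every (l1_coefficient, lr) pair that a run could be assigned.
search_space = list(itertools.product(l1_coefficients, learning_rates))
print(f"{len(search_space)} possible combinations")


def sample_run_config(rng: random.Random) -> dict[str, float]:
    """Hypothetical helper: draw one combination uniformly at random."""
    l1_coefficient, lr = rng.choice(search_space)
    return {"l1_coefficient": l1_coefficient, "lr": lr}


# Assumption: each run's configuration is drawn independently of the others,
# so the same combination can appear more than once.
rng = random.Random(0)
for _ in range(3):
    print(sample_run_config(rng))
```

With three values for each of the two parameters there are nine combinations. Because the draws are independent, a random sweep can revisit a combination before it has tried all nine.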

### Run the sweep

In [7]:
sweep(sweep_config=sweep_config)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


Create sweep with ID: csjkat4q
Sweep URL: https://wandb.ai/alan-cooney/sparse-autoencoder/sweeps/csjkat4q


[34m[1mwandb[0m: Agent Starting Run: a6f5i3f8 with config:
[34m[1mwandb[0m: 	activation_resampler: {'dead_neuron_threshold': 0, 'max_resamples': 4, 'n_steps_collate': 100000000, 'resample_dataset_size': 819200, 'resample_interval': 200000000}
[34m[1mwandb[0m: 	autoencoder: {'expansion_factor': 4}
[34m[1mwandb[0m: 	loss: {'l1_coefficient': 0.0001}
[34m[1mwandb[0m: 	optimizer: {'adam_beta_1': 0.9, 'adam_beta_2': 0.99, 'adam_weight_decay': 0, 'amsgrad': False, 'fused': False, 'lr': 0.0001}
[34m[1mwandb[0m: 	pipeline: {'checkpoint_frequency': 100000000, 'log_frequency': 100, 'max_activations': 2000000000, 'max_store_size': 3145728, 'source_data_batch_size': 12, 'train_batch_size': 4096, 'validation_frequency': 314572800, 'validation_number_activations': 1024}
[34m[1mwandb[0m: 	random_seed: 49
[34m[1mwandb[0m: 	source_data: {'context_size': 128, 'dataset_path': 'NeelNanda/c4-code-tokenized-2b'}
[34m[1mwandb[0m: 	source_model: {'dtype': 'float32', 'hook_dimension':

Loaded pretrained model gelu-2l into HookedTransformer


Resolving data files:   0%|          | 0/28 [00:00<?, ?it/s]

Activations trained on:   0%|          | 0/2000000000 [00:00<?, ?it/s]

  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
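
The pipeline values in the logged run config are easier to read once converted into steps and memory. The sketch below does that arithmetic with the numbers printed above. It rests on assumptions that the log itself does not confirm: one optimizer step per training batch, one activation vector per token, and an activation store that holds `float32` vectors of size `hook_dimension`.

```python
# Values copied from the logged run config and the sweep config above.
max_activations = 2_000_000_000
train_batch_size = 4096
max_store_size = 3_145_728
source_data_batch_size = 12
context_size = 128
validation_frequency = 314_572_800
hook_dimension = 512
bytes_per_float32 = 4

# Assumption: one optimizer step per training batch.
train_steps = max_activations // train_batch_size

# Assumption: each token in a source batch yields one activation vector.
activations_per_source_batch = source_data_batch_size * context_size
source_batches_per_store = max_store_size // activations_per_source_batch
train_batches_per_store = max_store_size // train_batch_size

# Assumption: the store keeps float32 vectors of length hook_dimension.
store_gib = max_store_size * hook_dimension * bytes_per_float32 / 2**30

# How many full stores are trained on between validation runs.
store_fills_per_validation = validation_frequency // max_store_size

print(f"optimizer steps:            {train_steps:,}")
print(f"activations / source batch: {activations_per_source_batch:,}")
print(f"source batches / store:     {source_batches_per_store:,}")
print(f"train batches / store:      {train_batches_per_store:,}")
print(f"store size (GiB):           {store_gib:.1f}")
print(f"store fills / validation:   {store_fills_per_validation:,}")
```

Under these assumptions a full store takes about 6 GiB, so lowering `max_store_size` is the obvious setting to try on a machine with less memory.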


In [None]:
wandb.finish()