## Setup

Step 1: Download the two models from this link and store them in `./checkpoints`.

Step 2: Use this notebook to load the models.

Step 3: If needed, convert the models to HuggingFace format. These checkpoints are TransformerLens (`HookedTransformer`) state dicts, so their parameter names differ from the HuggingFace ones, but the architecture is functionally identical to gpt2-small.

Step 4: Run interpretability (interp) experiments on the loaded models.

In [1]:
from transformer_lens import HookedTransformer
import torch
from plotly import graph_objects as go
import plotly.express as px
import numpy as np
device = "cuda" if torch.cuda.is_available() else "cpu"

In [2]:
path = './checkpoints/'
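The loading cells below expect both checkpoint files from Step 1, so a quick existence check can catch a missing download before `torch.load` fails. A minimal sketch, assuming the file names used in the loading cells (`wiki_modular_mlp_in_model_epoch_6.pt`, `wiki_non_modular_mlp_in_model_epoch_2.pt`); `missing_checkpoints` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path

# File names taken from this notebook's loading cells (assumed unchanged)
CHECKPOINT_FILES = [
    "wiki_modular_mlp_in_model_epoch_6.pt",
    "wiki_non_modular_mlp_in_model_epoch_2.pt",
]


def missing_checkpoints(checkpoint_dir, names=CHECKPOINT_FILES):
    """Hypothetical helper: return the expected checkpoint files absent from checkpoint_dir."""
    directory = Path(checkpoint_dir)
    return [name for name in names if not (directory / name).is_file()]


missing = missing_checkpoints("./checkpoints/")
if missing:
    print("Missing checkpoints:", missing)
```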

In [21]:
# clustered model

clustered_model = HookedTransformer.from_pretrained("gpt2-small")
clustered_model.to(device)
# checkpoint downloaded in Step 1 and stored under ./checkpoints/
path_clustered = path + 'wiki_modular_mlp_in_model_epoch_6.pt'
clustered_model.load_state_dict(torch.load(path_clustered, map_location=device, weights_only=True))

Loaded pretrained model gpt2-small into HookedTransformer
Moving model to device:  cuda

<All keys matched successfully>
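`load_state_dict` is strict by default, so `<All keys matched successfully>` means the checkpoint contains exactly the parameter names of the freshly built gpt2-small `HookedTransformer`, with nothing missing and nothing extra. The name check can be sketched with plain dictionaries; `compare_state_dict_keys` is a hypothetical helper, and real state dicts map names to tensors rather than the placeholder values used here.

```python
def compare_state_dict_keys(model_state, checkpoint_state):
    """Hypothetical helper mirroring the name check of a strict load_state_dict.

    Returns (missing, unexpected): keys the model expects but the checkpoint lacks,
    and keys the checkpoint has but the model does not.
    """
    model_keys = set(model_state)
    checkpoint_keys = set(checkpoint_state)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy example with placeholder values; names follow TransformerLens conventions
model_state = {"embed.W_E": 0, "blocks.0.mlp.W_in": 0, "blocks.0.mlp.W_out": 0}
checkpoint_state = {"embed.W_E": 0, "blocks.0.mlp.W_in": 0, "blocks.0.mlp.b_in": 0}
missing, unexpected = compare_state_dict_keys(model_state, checkpoint_state)
print("missing:", missing)
print("unexpected:", unexpected)
```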

In [23]:
# unclustered model

unclustered_model = HookedTransformer.from_pretrained("gpt2-small")
unclustered_model.to(device)
# checkpoint downloaded in Step 1 and stored under ./checkpoints/
path_unclustered = path + 'wiki_non_modular_mlp_in_model_epoch_2.pt'
unclustered_model.load_state_dict(torch.load(path_unclustered, map_location=device, weights_only=True))

Loaded pretrained model gpt2-small into HookedTransformer
Moving model to device:  cuda

<All keys matched successfully>

In [24]:
# sanity check to see if the models are loaded correctly

prompt = 'The color of the darkness is'  # avoid shadowing the built-in input()

clustered_output = clustered_model.generate(prompt, max_new_tokens=20, verbose=False)
unclustered_output = unclustered_model.generate(prompt, max_new_tokens=20, verbose=False)

print('Clustered model output: ', clustered_output)

print('Unclustered model output: ', unclustered_output)

Clustered model output:  The color of the darkness is unknown . It corresponds to the lunar eclipse and appears to be of the moon 's highest magnitude .
Unclustered model output:  The color of the darkness is slowly fading following the sun set , and launching of the Dawn has reduced the outer atmosphere 's surface
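`HookedTransformer.generate` samples by default (`do_sample=True`), so the completions above change from run to run and are only a rough sanity check. Passing `do_sample=False` makes generation greedy and reproducible. The greedy loop amounts to the sketch below, where `toy_next_token_logits` is a hypothetical stand-in for a model forward pass that returns next-token logits for a list of token ids.

```python
import numpy as np


def greedy_decode(next_token_logits, prompt_ids, max_new_tokens):
    """Append the highest-scoring next token max_new_tokens times (no sampling)."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)     # shape: [vocab_size]
        ids.append(int(np.argmax(logits)))  # deterministic choice
    return ids


def toy_next_token_logits(ids, vocab_size=5):
    """Hypothetical model: always favours (last token + 1) mod vocab_size."""
    logits = np.zeros(vocab_size)
    logits[(ids[-1] + 1) % vocab_size] = 1.0
    return logits


print(greedy_decode(toy_next_token_logits, [3], max_new_tokens=4))
```

In the notebook, the corresponding call would be `clustered_model.generate('The color of the darkness is', max_new_tokens=20, do_sample=False)`.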


In [71]:
# load the WikiText evaluation data (used for the accuracy check in the next cell)

from transformer_lens.evals import make_wiki_data_loader

wiki = make_wiki_data_loader(unclustered_model.tokenizer, batch_size=8)

36718


In [72]:
# next-token accuracy on the first WikiText sequence only (the loop stops at `break`)
with torch.no_grad():
    for idx, tokens in enumerate(wiki.dataset['tokens']):

        tokens = tokens.to(device)  # 1-D tensor of token ids for one sequence

        # one forward pass per model; HookedTransformer adds the batch dimension
        unclustered_logits = unclustered_model(tokens)
        clustered_logits = clustered_model(tokens)
        unclustered_predictions = torch.argmax(unclustered_logits, dim=-1)
        clustered_predictions = torch.argmax(clustered_logits, dim=-1)

        # the prediction at position i is compared with the true token at position i + 1
        unclustered_accuracy = (unclustered_predictions[0, :-1] == tokens[1:]).float().mean()
        clustered_accuracy = (clustered_predictions[0, :-1] == tokens[1:]).float().mean()

        print(f'Unclustered accuracy: {round(float(100 * unclustered_accuracy), 3)}%')
        print(f'Clustered accuracy: {round(float(100 * clustered_accuracy), 3)}%')

        break

Unclustered accuracy: 52.884%
Clustered accuracy: 61.779%
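The accuracy cell above scores a single sequence (the loop stops at `break`), so the two percentages are noisy, and the loss mentioned in the data-loading comment is never computed. Both metrics can be computed from logits and accumulated over many sequences. A minimal NumPy sketch, assuming per-sequence logits of shape `[seq_len, vocab_size]` and tokens of shape `[seq_len]` (in the notebook these would come from `model(tokens)[0]` moved to the CPU); `next_token_metrics` is a hypothetical helper.

```python
import numpy as np


def next_token_metrics(sequences):
    """Token-weighted mean cross-entropy loss and accuracy over (logits, tokens) pairs.

    The logits at position i are scored against the true token at position i + 1.
    """
    total_loss, total_correct, total_count = 0.0, 0, 0
    for logits, tokens in sequences:
        pred_logits = logits[:-1]  # [seq_len - 1, vocab_size]
        targets = tokens[1:]       # [seq_len - 1]

        # numerically stable log-softmax over the vocabulary
        shifted = pred_logits - pred_logits.max(axis=-1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

        total_loss += -log_probs[np.arange(len(targets)), targets].sum()
        total_correct += int((pred_logits.argmax(axis=-1) == targets).sum())
        total_count += len(targets)
    return total_loss / total_count, total_correct / total_count


# Toy check: uniform logits over 4 tokens give loss ln(4); argmax always picks token 0
uniform = np.zeros((3, 4))
loss, acc = next_token_metrics([(uniform, np.array([2, 0, 1]))])
print(round(loss, 4), acc)
```

Feeding many `(logits, tokens)` pairs from `wiki.dataset['tokens']` through a function like this (under `torch.no_grad()`) gives corpus-level numbers that are more reliable than a one-sequence comparison. For the loss alone, TransformerLens can also return the mean next-token loss directly via `model(tokens, return_type="loss")`.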


In [None]:
# enjoy interp!