# Experiment: Working with Dolly
## Last Updated $DATE -  $AUTHOR.

```
Summary of High Level Research Question
```

Try to scope your experiments such that you can answer your research question in 1-3 hours.
That block is long enough to enter flow / deep work, but short enough that you will still feel
motivated by a relatively tight feedback loop.

If a problem seems like it needs more time than that, break it into smaller experiments.

### High Level Experiment Design

## Goals:
```
List of specific goals that this experiment seeks to achieve.

This should fall under a few categories:
- Development of Intuition about a _specific_ topic
- Novel Research or Insight that could lead to a publishable result
- Meaningfully explore a topic which could lead to an improvement in product

Guiding principles should be understanding, insight, and value creation.
```

## Tasks & Experiment Design

```
A list of specific tasks that are going to be tested 

```


## Outcomes

```
Document high level research findings and how
```


In [3]:
# Install things into ENV
# TODO: Set up a container image with all of these installed and push it to Docker
%pip install git+https://github.com/neelnanda-io/TransformerLens.git
%pip install circuitsvis
%pip install plotly


Collecting git+https://github.com/neelnanda-io/TransformerLens.git
  Cloning https://github.com/neelnanda-io/TransformerLens.git to /tmp/pip-req-build-6s167ycj
  Running command git clone --filter=blob:none --quiet https://github.com/neelnanda-io/TransformerLens.git /tmp/pip-req-build-6s167ycj
  Resolved https://github.com/neelnanda-io/TransformerLens.git to commit 0ffcc8ad647d9e991f4c2596557a9d7475617773
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting datasets>=2.7.1
  Downloading datasets-2.12.0-py3-none-any.whl (474 kB)
     474.6/474.6 kB 34.4 MB/s eta 0:00:00
Collecting wandb>=0.13.5
  Downloading wandb-0.15.1-py3-none-any.whl (2.0 MB)
     2.0/2.0 MB 98.1 MB/s eta 0:00:00
Collecting fa

In [None]:
# Generic Set of Imports for MI Research
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import einops
from fancy_einsum import einsum
import tqdm.auto as tqdm
import random
from pathlib import Path
from pprint import pprint
import plotly.express as px
from torch.utils.data import DataLoader

from jaxtyping import Float, Int
from typing import List, Union, Optional
from functools import partial
import copy

import itertools
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import dataclasses
import datasets
from IPython.display import HTML

In [5]:
import transformer_lens
import transformer_lens.utils as utils
from transformer_lens.hook_points import (
    HookedRootModule,
    HookPoint,
)  # Hooking utilities
from transformer_lens import HookedTransformer, HookedTransformerConfig, FactoredMatrix, ActivationCache

In [6]:
# Set up PyTorch configuration for inference-based experiments
# NOTE: Set to False if you want to do any kind of training
#       as part of your experimentation

INFERENCE_ONLY_EXPERIMENT = True
if INFERENCE_ONLY_EXPERIMENT:
    torch.set_grad_enabled(False)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

import plotly.io as pio
pio.renderers.default = "notebook_connected"

cuda


In [7]:
def imshow(tensor, renderer=None, **kwargs):
    px.imshow(utils.to_numpy(tensor), color_continuous_midpoint=0.0, color_continuous_scale="RdBu", **kwargs).show(renderer)

def line(tensor, renderer=None, **kwargs):
    px.line(y=utils.to_numpy(tensor), **kwargs).show(renderer)

def scatter(x, y, xaxis="", yaxis="", caxis="", renderer=None, **kwargs):
    x = utils.to_numpy(x)
    y = utils.to_numpy(y)
    px.scatter(y=y, x=x, labels={"x":xaxis, "y":yaxis, "color":caxis}, **kwargs).show(renderer)

In [8]:
# Load Circuit Visualizations
# TODO: Explore building out our own packages / tooling
import circuitsvis as cv
# Testing that the library works
cv.examples.hello("Vivek")


In [9]:
# Load & Run a Model
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-7b")
hf_model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-7b")

print("Loaded hf_model, hooking transformer into TransformerLens!")
# model = HookedTransformer.from_pretrained(
#     "EleutherAI/pythia-6.9b-deduped",
#     center_unembed=False,
#     center_writing_weights=False,
#     fold_ln=False,
#     refactor_factored_attn_matrices=True,
#     hf_model=hf_model
# )

### Janky Shit
### TODO: Figure out how this library actually works and make this a cleaner integration.
import transformer_lens.loading_from_pretrained as loading
# Get the model name used in HuggingFace, rather than the alias.
official_model_name = loading.get_official_model_name("EleutherAI/pythia-6.9b-deduped")


# Load the config into a HookedTransformerConfig object. If loading from a
# checkpoint, the config object will contain the information about the
# checkpoint
cfg = loading.get_pretrained_model_config(
    official_model_name,
    checkpoint_index=None,
    checkpoint_value=None,
    fold_ln=False,
    device=device,
    n_devices=1,
)
print(cfg)
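# Override the vocabulary size inherited from the Pythia config so that it
# matches Dolly's weights (embed.W_E has 50280 rows).
cfg.d_vocab = 50280
cfg.d_vocab_out = 50280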
print(cfg)


# Get the state dict of the model (ie a mapping of parameter names to tensors), processed to match the HookedTransformer parameter names.
state_dict = loading.get_pretrained_state_dict(
    official_model_name, cfg, hf_model
)

# Create the HookedTransformer object
model = HookedTransformer(cfg, tokenizer=tokenizer)

model.load_and_process_state_dict(
    state_dict,
    fold_ln=False,
    center_writing_weights=False,
    center_unembed=False,
    refactor_factored_attn_matrices=False,
    move_state_dict_to_device=True,
)

print("Loaded pretrained model into HookedTransformer!")

model_description_text = """For this demo notebook we'll look at Dolly v2. It is based on pythia 6.9b, but we use the weights for dolly v2. To try the model out, let's find the loss on this paragraph!"""
# return_type of model can be loss, logits, both, or none!
loss = model(model_description_text, return_type="loss")
print("Model loss:", loss)


Downloading (…)okenizer_config.json:   0%|          | 0.00/450 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/228 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/819 [00:00<?, ?B/s]

Downloading (…)"pytorch_model.bin";:   0%|          | 0.00/13.8G [00:00<?, ?B/s]
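
With `return_type="loss"`, the model call above returns a single scalar. My understanding is that this is the mean next-token cross-entropy over the prompt. The sketch below spells that calculation out in plain Python on made-up numbers; `mean_next_token_loss`, the toy logits, and the toy tokens are hypothetical and are not TransformerLens code.

```python
import math

def mean_next_token_loss(logits, tokens):
    """Average cross-entropy of predicting tokens[t + 1] from logits[t].

    logits: one list of vocabulary scores per position.
    tokens: the token ids of the sequence, same length as logits.
    """
    losses = []
    for t in range(len(tokens) - 1):
        row = logits[t]
        m = max(row)  # subtract the max to stabilise the log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_z - row[tokens[t + 1]])
    return sum(losses) / len(losses)

# Toy 3-token vocabulary and made-up logits (not real model output).
toy_tokens = [0, 2, 1]
toy_logits = [
    [0.0, 0.0, 0.0],   # uniform guess for the token after position 0
    [0.0, 10.0, 0.0],  # confident, correct guess for the token after position 1
    [1.0, 2.0, 3.0],   # last position has no next token, so it is ignored
]
toy_loss = mean_next_token_loss(toy_logits, toy_tokens)
print(toy_loss)
```

A uniform guess over three tokens costs `ln(3)` nats, while a confident correct guess costs almost nothing, so the toy loss lands roughly halfway between the two.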

In [None]:
# DOLLY V2 - 7B Config
pprint(model.cfg)

# Transformer Lens Note:
# get_token_position, to_tokens, to_string, to_str_tokens, prepend_bos, to_single_token
# are all methods that are added to the model object by TransformerLens

In [31]:
sample_string = "On halloween, all the children go Trick or"

print(model.to_str_tokens(sample_string))  # Shows how the string is split into tokens
print(model.to_tokens(sample_string))  # Converts the string to integer token IDs; returns a tensor on the model's device with shape (batch, position)
# NOTE: in GPT-2, 50256 is the token for EOS, BOS, and padding. With this tokenizer,
# '<|endoftext|>' is token 0 and is prepended as BOS, as the output below shows.
# to_single_token converts a string to a single integer token ID, useful for looking up logits
# to_string converts a tensor of tokens to a string


model.generate(sample_string, 
               temperature=0,
               max_new_tokens=1)

['<|endoftext|>', 'On', ' hall', 'ow', 'een', ',', ' all', ' the', ' children', ' go', ' T', 'rick', ' or']
tensor([[   0, 2374, 7423,  319, 9673,   13,  512,  253, 2151,  564,  308, 4662,
          390]], device='cuda:0')


  0%|          | 0/1 [00:00<?, ?it/s]

'On halloween, all the children go Trick or Treat'
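
The call above passes `temperature=0`. A common convention, and the one assumed in this sketch, is that a temperature of zero means greedy decoding (take the highest logit) instead of sampling. The function `pick_next_token` and the toy logits are hypothetical and only illustrate the idea; they are not copied from TransformerLens.

```python
import math
import random

def pick_next_token(logits, temperature, rng=random):
    """Return the index of the next token given raw logits.

    temperature == 0 is treated as greedy decoding (arg-max);
    otherwise sample from softmax(logits / temperature).
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Toy vocabulary and made-up logits (not real model output).
toy_vocab = [" Treat", " treat", " Trick"]
toy_logits = [5.0, 2.0, 1.0]
greedy_choice = toy_vocab[pick_next_token(toy_logits, temperature=0)]
print(greedy_choice)
```

With a non-zero temperature the same function samples, so repeated calls can return different tokens; at zero it is deterministic.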

In [12]:
# Test Prompt Util -- Check the logit score of the expected output vs. the actual
#                     output
example_prompt = "the founder of Facebook is Mark"
example_answer = "Zuckerberg"

utils.test_prompt(example_prompt, example_answer, model, prepend_bos=True)

Tokenized prompt: ['<|endoftext|>', 'the', ' founder', ' of', ' Facebook', ' is', ' Mark']
Tokenized answer: [' Z', 'ucker', 'berg']


Top 0th token. Logit: 22.91 Prob: 99.94% Token: | Z|
Top 1th token. Logit: 15.02 Prob:  0.04% Token: | z|
Top 2th token. Logit: 12.46 Prob:  0.00% Token: | E|
Top 3th token. Logit: 12.16 Prob:  0.00% Token: | Elliot|
Top 4th token. Logit: 11.97 Prob:  0.00% Token: |
|
Top 5th token. Logit: 11.81 Prob:  0.00% Token: | Cuban|
Top 6th token. Logit: 11.52 Prob:  0.00% Token: |Z|
Top 7th token. Logit: 11.40 Prob:  0.00% Token: |  |
Top 8th token. Logit: 10.68 Prob:  0.00% Token: |us|
Top 9th token. Logit: 10.56 Prob:  0.00% Token: |.|


Top 0th token. Logit: 23.77 Prob: 99.60% Token: |ucker|
Top 1th token. Logit: 17.93 Prob:  0.29% Token: |uk|
Top 2th token. Logit: 16.55 Prob:  0.07% Token: |uck|
Top 3th token. Logit: 15.53 Prob:  0.03% Token: |uc|
Top 4th token. Logit: 13.66 Prob:  0.00% Token: |UCK|
Top 5th token. Logit: 12.91 Prob:  0.00% Token: |.|
Top 6th token. Logit: 11.87 Prob:  0.00% Token: |ub|
Top 7th token. Logit: 11.50 Prob:  0.00% Token: |ucks|
Top 8th token. Logit: 11.00 Prob:  0.00% Token: |ander|
Top 9th token. Logit: 10.94 Prob:  0.00% Token: |im|


Top 0th token. Logit: 25.24 Prob: 98.27% Token: |berg|
Top 1th token. Logit: 21.15 Prob:  1.65% Token: |burg|
Top 2th token. Logit: 17.57 Prob:  0.05% Token: |ber|
Top 3th token. Logit: 16.38 Prob:  0.01% Token: |borg|
Top 4th token. Logit: 15.81 Prob:  0.01% Token: |beg|
Top 5th token. Logit: 13.86 Prob:  0.00% Token: |bert|
Top 6th token. Logit: 13.72 Prob:  0.00% Token: |bur|
Top 7th token. Logit: 13.15 Prob:  0.00% Token: |­|
Top 8th token. Logit: 12.94 Prob:  0.00% Token: |b|
Top 9th token. Logit: 12.89 Prob:  0.00% Token: |berger|
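
The logit and probability columns above are linked by the softmax: the ratio of two tokens' probabilities equals the exponential of their logit difference. Here is a minimal sketch using the first two rows printed for the first answer token (logits 22.91 and 15.02). The helper name `prob_ratio` is hypothetical, and the agreement is only approximate because the printed values are rounded.

```python
import math

def prob_ratio(logit_a, logit_b):
    """Ratio p(a) / p(b) implied by two logits under the same softmax."""
    return math.exp(logit_a - logit_b)

# Logits for ' Z' and ' z', copied from the printed output above.
ratio = prob_ratio(22.91, 15.02)
print(f"' Z' is about {ratio:.0f}x more likely than ' z'")

# Cross-check against the printed probabilities (99.94% and 0.04%).
printed_ratio = 99.94 / 0.04
print(f"ratio from printed probabilities: about {printed_ratio:.0f}x")
```

This is why a gap of roughly 8 logits already leaves the runner-up with a negligible share of the probability mass.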


In [27]:
pprint([(name, param.shape) for name, param in model.named_parameters()])

[('embed.W_E', torch.Size([50280, 4096])),
 ('blocks.0.ln1.w', torch.Size([4096])),
 ('blocks.0.ln1.b', torch.Size([4096])),
 ('blocks.0.ln2.w', torch.Size([4096])),
 ('blocks.0.ln2.b', torch.Size([4096])),
 ('blocks.0.attn.W_Q', torch.Size([32, 4096, 128])),
 ('blocks.0.attn.W_K', torch.Size([32, 4096, 128])),
 ('blocks.0.attn.W_V', torch.Size([32, 4096, 128])),
 ('blocks.0.attn.W_O', torch.Size([32, 128, 4096])),
 ('blocks.0.attn.b_Q', torch.Size([32, 128])),
 ('blocks.0.attn.b_K', torch.Size([32, 128])),
 ('blocks.0.attn.b_V', torch.Size([32, 128])),
 ('blocks.0.attn.b_O', torch.Size([4096])),
 ('blocks.0.mlp.W_in', torch.Size([4096, 16384])),
 ('blocks.0.mlp.b_in', torch.Size([16384])),
 ('blocks.0.mlp.W_out', torch.Size([16384, 4096])),
 ('blocks.0.mlp.b_out', torch.Size([4096])),
 ('blocks.1.ln1.w', torch.Size([4096])),
 ('blocks.1.ln1.b', torch.Size([4096])),
 ('blocks.1.ln2.w', torch.Size([4096])),
 ('blocks.1.ln2.b', torch.Size([4096])),
 ('blocks.1.attn.W_Q', torch.Size([32, 
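
The shapes listed above are enough to count the parameters in one transformer block. The sketch below copies the `blocks.0` shapes from that output and multiplies them out. The names `block_shapes` and `count_params` are illustrative and not part of TransformerLens, and the layer count is left out because it does not appear in the truncated output.

```python
from math import prod

# Shapes for one block, copied from the named_parameters() output above.
block_shapes = {
    "ln1.w": (4096,), "ln1.b": (4096,),
    "ln2.w": (4096,), "ln2.b": (4096,),
    "attn.W_Q": (32, 4096, 128), "attn.W_K": (32, 4096, 128),
    "attn.W_V": (32, 4096, 128), "attn.W_O": (32, 128, 4096),
    "attn.b_Q": (32, 128), "attn.b_K": (32, 128),
    "attn.b_V": (32, 128), "attn.b_O": (4096,),
    "mlp.W_in": (4096, 16384), "mlp.b_in": (16384,),
    "mlp.W_out": (16384, 4096), "mlp.b_out": (4096,),
}

def count_params(shapes):
    """Sum the number of elements across all listed tensors."""
    return sum(prod(shape) for shape in shapes.values())

params_per_block = count_params(block_shapes)
embed_params = prod((50280, 4096))  # embed.W_E
print(f"per block: {params_per_block:,}")
print(f"embedding: {embed_params:,}")
```

The two MLP matrices account for about two thirds of each block, with the four attention matrices making up nearly all of the rest.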

In [35]:
# Testing out Dolly's Q/A ability.

model.generate(
    "Can you write a short story about a unicorn?",
    max_new_tokens=100,
)

# This Dolly is kind of dumb.
# We should try to get A100 x 8 Cluster ASAP and get a full sized model.

  0%|          | 0/100 [00:00<?, ?it/s]

'Can you write a short story about a unicorn? (covers description, esc in the story)\n\nFifteen-year-old Lauren wanted a unicorn pronto. She loved the beautiful creatures and keep seeing them everywhere. She saw one on a dinner plate at her friend Claire’s house and on a mobile phone case at her friend Claire’s house. She was disappointed when Claire told her that they weren’t real. Claire told her that they were a marketing gimmick from a toy commercial. Lauren believed Claire. She'

In [None]:
# Direct Logit Attribution
