# Llama debugging

I'll use this notebook to debug the numerical errors described in [issue #385](https://github.com/neelnanda-io/TransformerLens/issues/385).

In [7]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [8]:
from collections import defaultdict

import einops
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

from transformer_lens import HookedTransformer, utils

In [9]:
MODEL_NAME = "meta-llama/Llama-2-7b-hf"

In [10]:
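# Reference Hugging Face model, loaded in full precision.
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float32,
)

# TransformerLens model built from a second, separately loaded copy of the
# HF weights. All weight processing is disabled so the weights stay as close
# to the HF originals as possible.
tl_model = HookedTransformer.from_pretrained(
    MODEL_NAME,
    hf_model=AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float32,
    ),
    tokenizer=AutoTokenizer.from_pretrained(MODEL_NAME),
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
    torch_dtype=torch.float32,
)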

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 44.32 GiB of which 14.88 MiB is free. Process 2654179 has 44.29 GiB memory in use. Of the allocated memory 43.80 GiB is allocated by PyTorch, and 3.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

In [None]:
torch.set_grad_enabled(False)

<torch.autograd.grad_mode.set_grad_enabled at 0x7ff364312d10>

In [None]:
print(hf_model.model.layers[0].self_attn.max_position_embeddings)
print(tl_model.cfg.n_ctx)

4096
4096


In [None]:
print(hf_model.model.layers[0].self_attn.rotary_emb.dim)
print(tl_model.cfg.rotary_dim)
print(tl_model.cfg.d_head)

128
128
128


In [None]:
print(hf_model.model.layers[0].self_attn.rope_theta)


10000.0


In [None]:
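def check_similarity_with_hf_model(
    tl_model: HookedTransformer,
    hf_model: AutoModelForCausalLM,
    atol: float,
    prompt: str = "Hello world!",
) -> None:
    """Assert that the TL and HF logits for `prompt` agree within `atol`."""
    # Tokenize once so both models see identical input IDs.
    tokens = tl_model.tokenizer.encode(prompt, return_tensors="pt")
    tl_logits = tl_model(tokens, prepend_bos=False)
    hf_logits = hf_model(tokens).logits
    # Compare on CPU. Note that allclose also applies its default rtol (1e-5).
    assert torch.allclose(tl_logits.cpu(), hf_logits.cpu(), atol=atol)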
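
A note on what the `atol` argument actually controls: `torch.allclose` passes when `|input - other| <= atol + rtol * |other|` elementwise, and `rtol` defaults to `1e-5` unless overridden. The sketch below illustrates this on small synthetic tensors (the values are made up for illustration and are not real logits), showing that the same difference can pass or fail depending on `rtol`.

```python
import torch

# Synthetic stand-ins for logits; these values are illustrative only.
reference = torch.tensor([10.0, -3.0, 0.5])
candidate = reference + torch.tensor([5e-5, 0.0, 0.0])

# With the default rtol=1e-5, the allowed error on the first element is
# atol + rtol * |reference| = 1e-5 + 1e-5 * 10 = 1.1e-4, so 5e-5 passes.
passes_default_rtol = torch.allclose(candidate, reference, atol=1e-5)

# With rtol=0 only the absolute tolerance applies, and 5e-5 > 1e-5 fails.
passes_pure_atol = torch.allclose(candidate, reference, atol=1e-5, rtol=0.0)

print(passes_default_rtol, passes_pure_atol)
```

If a strictly absolute comparison is wanted, passing `rtol=0.0` explicitly is one option; whether that is the right criterion for this issue is a judgment call.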

In [None]:
check_similarity_with_hf_model(
    tl_model,
    hf_model,
    atol=1e-4,
    prompt="Hello world!",
)

In [None]:
check_similarity_with_hf_model(
    tl_model,
    hf_model,
    atol=1e-5,
    prompt="Hello world!",
)

AssertionError: 
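
The bare `AssertionError` only says that the logits differ by more than the tolerance somewhere, not by how much or where. A minimal sketch of a helper that quantifies the mismatch is given below. The name `summarize_logit_diff` and the returned keys are hypothetical (not part of TransformerLens or transformers), and the demo uses synthetic tensors rather than real model outputs.

```python
import torch


def summarize_logit_diff(a: torch.Tensor, b: torch.Tensor) -> dict:
    """Summarize elementwise differences between two same-shaped logit tensors.

    Hypothetical helper for illustration; not a library function.
    """
    a, b = a.float().cpu(), b.float().cpu()
    diff = (a - b).abs()

    # Convert the flat index of the largest difference back to a full index.
    flat_idx = int(diff.argmax())
    index = []
    for size in reversed(diff.shape):
        flat_idx, remainder = divmod(flat_idx, size)
        index.append(remainder)

    return {
        "max_abs_diff": diff.max().item(),
        "mean_abs_diff": diff.mean().item(),
        "worst_index": tuple(reversed(index)),
        # Fraction of positions where both tensors predict the same top token.
        "top1_agreement": (a.argmax(-1) == b.argmax(-1)).float().mean().item(),
    }


# Synthetic demo: shape (batch=1, positions=4, vocab=10), one perturbed entry.
ref = torch.arange(40, dtype=torch.float32).reshape(1, 4, 10)
noisy = ref.clone()
noisy[0, 2, 7] += 1e-3
stats = summarize_logit_diff(noisy, ref)
print(stats)
```

In this notebook the helper would be called with `tl_logits` and `hf_logits` computed as in `check_similarity_with_hf_model`. A small `max_abs_diff` together with a `top1_agreement` of 1.0 would suggest floating-point noise rather than a conversion bug, though that interpretation is an assumption and not something the outputs above establish.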