# Single-fact editing

We inject a counterfactual knowledge edit into our Larimar model by writing it to the model's memory, and then check through a subsequent generation whether the edit has taken effect.

We compare to "no edit" (skipping memory) and "ICL" (prepending the edit directly to the prompt) setups.
    

In [1]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import random
random.seed(42)

import torch
from torch.nn.utils.rnn import pad_sequence

In [4]:
from counterfact_eval_rephrase import load_pyrite
from counterfact_eval_rephrase import latent_code_from_text

In [5]:
checkpoint_path = '../models/larimar-1.3b-c3.ckpt'

In [6]:
%%time
model, tokenizer = load_pyrite(checkpoint_path)

MemNetLight init()
encoder_model_type bert
encoder_model_name_or_path bert-large-cased
cache_dir ../cache
load_pretrained False
ParseResult(scheme='https', netloc='s3.amazonaws.com', path='/models.huggingface.co/bert/bert-large-cased-vocab.txt', params='', query='', fragment='')
get_from_cache https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt ../cache
ParseResult(scheme='https', netloc='s3.amazonaws.com', path='/models.huggingface.co/bert/gpt2-large-vocab.json', params='', query='', fragment='')
get_from_cache https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-vocab.json ../cache
ParseResult(scheme='https', netloc='s3.amazonaws.com', path='/models.huggingface.co/bert/gpt2-large-merges.txt', params='', query='', fragment='')
get_from_cache https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-large-merges.txt ../cache


01/16/2025 15:38:44 - INFO - lightning_model -   Added 3 tokens to GPT2
You are using a model of type bert to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Some weights of BertForLatentConnector were not initialized from the model checkpoint at bert-large-cased and are newly initialized: ['bert.linear.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of GPT2ForLatentConnector were not initialized from the model checkpoint at gpt2-large and are newly initialized: ['linear_emb.weight', 'linear.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


CPU times: user 11.3 s, sys: 14.4 s, total: 25.7 s
Wall time: 48.6 s


In [7]:
enc_tok, dec_tok = tokenizer

This example is adapted from `CounterFactDataset` (see `eval_rephrase.sh` for evaluating the full dataset).

It concerns Danielle Darrieux (https://en.wikipedia.org/wiki/Danielle_Darrieux), who was a French actor.

Our (counterfactual) edit is that Danielle Darrieux was English-speaking.




In [11]:
subject     = 'Danielle Darrieux'
target_true = 'French'
target_new  = 'English'


prompts = [
    'The mother tongue of Danielle Darrieux is',
    'Danielle Darrieux was a native',
    'Danielle Darrieux spoke the language'
]

text_write_to_mem = ['The mother tongue of Danielle Darrieux is English']
text_to_query     = ['The mother tongue of Danielle Darrieux is']



new_fact    = 'The mother tongue of Danielle Darrieux is English.'
icl_prompts = [f'{new_fact} {prompt}'
               for prompt in prompts
              ]


In [12]:
prompts

['The mother tongue of Danielle Darrieux is',
 'Danielle Darrieux was a native',
 'Danielle Darrieux spoke the language']

In [13]:
icl_prompts

['The mother tongue of Danielle Darrieux is English. The mother tongue of Danielle Darrieux is',
 'The mother tongue of Danielle Darrieux is English. Danielle Darrieux was a native',
 'The mother tongue of Danielle Darrieux is English. Danielle Darrieux spoke the language']

So let's teach our model that Danielle Darrieux's mother tongue was English! 

In Larimar we can do this without finetuning our model to cater for the new, updated fact we want to introduce to it. All it takes is writing the new fact to Larimar's memory.

In [14]:
text_write_to_mem

['The mother tongue of Danielle Darrieux is English']

As a first step let's encode the new fact:

In [15]:
memory_write_text_encoded, cls = latent_code_from_text(text_write_to_mem, 
                                                       enc_tok, 
                                                       model, 
                                                       device='cuda')

memory_write_text_encoded      = memory_write_text_encoded.reshape(len(text_write_to_mem), 
                                                                   1, 
                                                                   model._code_size)

Then we can write the new fact encoding to Larimar's memory:

In [16]:
posterior_memory, dkl_M = model.write(input_encoded=memory_write_text_encoded)

Let's now see how our model responds when we ask it about Danielle Darrieux. Was she French-speaking (i.e. the original fact) or English-speaking (i.e. the new fact we just introduced)?

Let's encode our 3 prompts (prefixes) for the purpose of reading from Larimar memory.

In [17]:
prefixes         = prompts
enc_tok, dec_tok = tokenizer

prefix_lens = []
for p in prefixes:
    prefix_lens.append(len(dec_tok.encode(p)) + 1)  # +1 for <BOS>

encoded_read, cls = latent_code_from_text(prefixes, 
                                          enc_tok, 
                                          model, 
                                          device='cuda')

In [18]:
encoded_read.shape

torch.Size([3, 768])

We use the encoded prompts to get our memory read encodings:

In [30]:
zs, _ = model.read(encoded_read.unsqueeze(1), 
                   posterior_memory, 
                   deterministic=True)
zs = zs.squeeze()
zs.shape

torch.Size([3, 768])

With the read encodings at hand, we are now ready to investigate whether Larimar thinks that Danielle Darrieux was
French-speaking or English-speaking. 

We have 3 prompts (prefixes), and each of them can be followed by either "English" or "French" as a continuation (suffix).
We tokenize the resulting 6 statements.


In [22]:
prefixes

['The mother tongue of Danielle Darrieux is',
 'Danielle Darrieux was a native',
 'Danielle Darrieux spoke the language']

In [24]:
data = [f'<BOS>{prefix} {suffix}'
            for prefix in prefixes
            for suffix in [target_new, target_true]]
data

['<BOS>The mother tongue of Danielle Darrieux is English',
 '<BOS>The mother tongue of Danielle Darrieux is French',
 '<BOS>Danielle Darrieux was a native English',
 '<BOS>Danielle Darrieux was a native French',
 '<BOS>Danielle Darrieux spoke the language English',
 '<BOS>Danielle Darrieux spoke the language French']

In [26]:
prompt_tok = pad_sequence([torch.tensor(dec_tok.encode(d), 
                                        dtype=torch.long) for d in data], 
                          batch_first=True, 
                          padding_value=0).to('cuda')
len(prompt_tok)

6

Since we are going to use each of the 3 read encodings twice (once for the "English" and once for the "French" statement), we repeat them so that they align with the 6 statements:

In [31]:
zs = zs.repeat_interleave(2, dim=0)
zs.shape

torch.Size([6, 768])
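
The alignment produced by `repeat_interleave(2, dim=0)` can be checked on a toy example. The sketch below uses plain Python lists and made-up two-dimensional vectors (`zs_toy` is a hypothetical stand-in for `zs`, not the real encodings). It only illustrates that statement `i` ends up paired with the encoding of prompt `i // 2`.

```python
# Toy stand-ins for the three read encodings (one short vector per prompt).
zs_toy = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
suffixes = ["English", "French"]

# Same ordering as repeat_interleave(2, dim=0): each row is repeated twice in place,
# giving [z0, z0, z1, z1, z2, z2] rather than [z0, z1, z2, z0, z1, z2].
zs_repeated = [row for row in zs_toy for _ in suffixes]

# Statement i uses the encoding of prompt i // 2 and the suffix i % 2.
for i, row in enumerate(zs_repeated):
    assert row == zs_toy[i // 2]
    print(i, "prompt", i // 2, suffixes[i % 2], row)
```

This matches the order of `data`, where the "English" and "French" variants of the same prefix are adjacent.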

We are now ready to calculate the mean of `-log(softmax(logits)[token])` over the tokens of "English" (new) and of "French" (true), as computed by the model when it is given the prompt and asked to predict the next tokens.

Tokens with higher likelihood have larger logits and thus larger probabilities (after softmax). A larger probability has a negative logarithm of smaller magnitude.

These negative log-probabilities are the metric we report: smaller values correspond to the tokens the model prefers to generate.

So if we subtract the metric for "English" (new) minus the metric for "French" (true), we expect the *difference*  to be negative if the model is indeed updated to the "new" knowledge that Danielle Darrieux was "English": $\Delta_{new-true} < 0$
(and it will be positive if the model has not incorporated the "new" knowledge).
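
As a small worked illustration of this metric, the sketch below computes the negative log-probability of two candidate tokens from a single logit vector and takes their difference. The 4-token vocabulary, the token ids and the logit values are all made up for the example, and `neg_log_prob` is a hypothetical helper, not part of the Larimar code.

```python
import math

def neg_log_prob(logits, token_id):
    """Return -log(softmax(logits)[token_id]) for one position."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[token_id]

# Hypothetical 4-token vocabulary; ids and logit values are made up.
ENGLISH, FRENCH = 0, 1
logits = [4.0, 1.0, 0.5, 0.0]

nll_new = neg_log_prob(logits, ENGLISH)   # metric for the "new" target
nll_true = neg_log_prob(logits, FRENCH)   # metric for the "true" target
delta = nll_new - nll_true                # negative when "English" is preferred
print(nll_new, nll_true, delta)
```

When both targets are single tokens scored from the same logit vector, as in this toy case, the softmax normalizer cancels and the difference reduces to `logits[FRENCH] - logits[ENGLISH]`.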


Let's also calculate some auxiliary quantities:

In [32]:
a_tok                      = dec_tok.encode(target_new)
b_tok                      = dec_tok.encode(target_true)

choice_a_len, choice_b_len = (len(n) for n in [a_tok, b_tok])

(target_new, a_tok, choice_a_len), (target_true, b_tok, choice_b_len)


(('English', [3594], 1), ('French', [4141], 1))

## Unconditional generation in Larimar: Skipping memory

First let's see how unconditional generation works. Although we have the read encodings at hand, we do not pass them to the Larimar decoder. 
Our `past` encodings are zero vectors. 

In [51]:
past = torch.zeros(zs.shape).to('cuda')

with torch.no_grad():
    logits = model.decoder(prompt_tok, past)[0]

results = np.zeros((logits.size(0),), dtype=np.float32)

for i in range(logits.size(0)):
    cur_len = choice_a_len if i % 2 == 0 else choice_b_len
    for j in range(cur_len):
        cur_tok = (a_tok if i % 2 == 0 else b_tok)[j]
        try:
            results[i] += -torch.nn.functional.log_softmax(logits[i, prefix_lens[i // 2] + j - 1, :], dim=0)[cur_tok].item()
        except:
            continue
    results[i] /= cur_len

start = len('<BOS>')
result_list = [{"prompt_new":  data[i][start:],   "target_new":  results[i].item(),
                "prompt_true": data[i+1][start:], "target_true": results[i + 1].item()}
       for i in range(0, len(results), 2)]


unconditional_result_list = result_list
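
The index `prefix_lens[i // 2] + j - 1` used in the loop can be traced on a toy sequence. The sketch below uses made-up token ids (`0` for `<BOS>`, `99` for a single-token target) and assumes, as is usual for a causal decoder, that the logits at position `t` predict the token at position `t + 1`.

```python
# Hypothetical token ids; 0 stands for <BOS>, 99 for a single-token target.
BOS, TARGET = 0, 99
prefix_ids = [11, 12, 13]             # tokens of the prefix, without <BOS>
sequence = [BOS] + prefix_ids + [TARGET]

prefix_len = len(prefix_ids) + 1      # +1 for <BOS>, as in prefix_lens
j = 0                                 # index within the target's tokens
score_pos = prefix_len + j - 1        # row of the logits used to score target token j

# The scored row sits on the last prefix token, whose logits predict the target.
assert sequence[score_pos] == prefix_ids[-1]
assert sequence[score_pos + 1] == TARGET
print(score_pos)
```

The `+ 1` is only correct when the string passed to `dec_tok.encode` does not itself contain `<BOS>`; otherwise the scored row moves one token to the right.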

In [50]:
unconditional_result_list

[{'prompt_new': 'The mother tongue of Danielle Darrieux is English',
  'target_new': 2.936532735824585,
  'prompt_true': 'The mother tongue of Danielle Darrieux is French',
  'target_true': 0.4290691912174225},
 {'prompt_new': 'Danielle Darrieux was a native English',
  'target_new': 6.317409038543701,
  'prompt_true': 'Danielle Darrieux was a native French',
  'target_true': 3.8996145725250244},
 {'prompt_new': 'Danielle Darrieux spoke the language English',
  'target_new': 9.241735458374023,
  'prompt_true': 'Danielle Darrieux spoke the language French',
  'target_true': 8.668529510498047}]

We report the list of $\Delta_{new-true}$'s for the three prompts: 

In [52]:
difference_list = [item['target_new'] - item['target_true'] for item in unconditional_result_list]
difference_list

[2.5074635446071625, 2.4177944660186768, 0.5732059478759766]

The Larimar model reports smaller values for the "French" (true) variant than for the "English" (new) variant for all 3 prompts.

So unconditional generation in Larimar works as expected: Danielle Darrieux is still French-speaking, and the model has not incorporated the "new" knowledge.

## Memory-conditioned generation in Larimar

Let's see how the memory read encodings actually work in Larimar and whether they steer our model towards incorporating the "new" knowledge that Danielle Darrieux was English speaking. We just supply memory read encodings as `past` vectors to the decoder.


In [53]:
past = zs.detach().clone()

with torch.no_grad():
    logits = model.decoder(prompt_tok, past)[0]

results = np.zeros((logits.size(0),), dtype=np.float32)

for i in range(logits.size(0)):
    cur_len = choice_a_len if i % 2 == 0 else choice_b_len
    for j in range(cur_len):
        cur_tok = (a_tok if i % 2 == 0 else b_tok)[j]
        try:
            results[i] += -torch.nn.functional.log_softmax(logits[i, prefix_lens[i // 2] + j - 1, :], dim=0)[cur_tok].item()
        except:
            continue
    results[i] /= cur_len


start = len('<BOS>')
result_list = [{"prompt_new":  data[i][start:],   "target_new":  results[i].item(),
                "prompt_true": data[i+1][start:], "target_true": results[i + 1].item()}
       for i in range(0, len(results), 2)]


conditional_result_list = result_list

In [54]:
conditional_result_list

[{'prompt_new': 'The mother tongue of Danielle Darrieux is English',
  'target_new': 3.433168603805825e-05,
  'prompt_true': 'The mother tongue of Danielle Darrieux is French',
  'target_true': 17.063688278198242},
 {'prompt_new': 'Danielle Darrieux was a native English',
  'target_new': 10.938028335571289,
  'prompt_true': 'Danielle Darrieux was a native French',
  'target_true': 11.506501197814941},
 {'prompt_new': 'Danielle Darrieux spoke the language English',
  'target_new': 11.552000045776367,
  'prompt_true': 'Danielle Darrieux spoke the language French',
  'target_true': 15.412997245788574}]

We report the list of $\Delta_{new-true}$'s for the three prompts: 

In [55]:
difference_list = [item['target_new'] - item['target_true'] for item in conditional_result_list]
difference_list

[-17.063653946512204, -0.5684728622436523, -3.860997200012207]

The Larimar model now reports smaller values for the "English" (new) variant than for the "French" (true) variant for all 3 prompts.

So memory-conditioned generation in Larimar works as expected: Danielle Darrieux has been updated to being "English"-speaking, 
and our model has incorporated the "new" knowledge (although it is a counterfactual one).

In particular, for the first prefix (the `rewrite prompt`), which is the one used in the new knowledge written to Larimar memory, $\Delta_{new-true}$ is the most negative. For the other two prefixes (the `paraphrase prompts`) the values are smaller in magnitude but still negative.


## ICL setup

We now assume that we do not have a memory module but still want to supply the "new" knowledge to the model "directly", without incurring any additional finetuning costs for the edit (similarly to Larimar's lightweight model editing via memory). 

We can do this via ICL (In-Context Learning): the "new" knowledge ('The mother tongue of Danielle Darrieux is English.') is prepended to each prefix and then the Larimar decoder is used (unconditional generation, but with a longer prefix informed by the "new" knowledge).

In [59]:
new_fact = 'The mother tongue of Danielle Darrieux is English.'
icl_data = [f"<BOS>{new_fact} {prefix} {suffix}"
            for prefix in prefixes
            for suffix in [target_new, target_true]]
icl_data

['<BOS>The mother tongue of Danielle Darrieux is English. The mother tongue of Danielle Darrieux is English',
 '<BOS>The mother tongue of Danielle Darrieux is English. The mother tongue of Danielle Darrieux is French',
 '<BOS>The mother tongue of Danielle Darrieux is English. Danielle Darrieux was a native English',
 '<BOS>The mother tongue of Danielle Darrieux is English. Danielle Darrieux was a native French',
 '<BOS>The mother tongue of Danielle Darrieux is English. Danielle Darrieux spoke the language English',
 '<BOS>The mother tongue of Danielle Darrieux is English. Danielle Darrieux spoke the language French']

Let's tokenize:

In [60]:
icl_prompt_tok = pad_sequence([torch.tensor(dec_tok.encode(d), 
                                        dtype=torch.long) for d in icl_data], 
                          batch_first=True, 
                          padding_value=0).to('cuda')
len(icl_prompt_tok)

6

And then compute the ICL prefixes and token lengths needed next:

In [66]:
new_fact = 'The mother tongue of Danielle Darrieux is English.'
icl_prefixes = [f"<BOS>{new_fact} {prefix}"
                for prefix in prefixes]

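icl_prefix_lens = []
for p in icl_prefixes:
    # NOTE: unlike `prefixes`, each string in icl_prefixes already starts with <BOS>,
    # so this +1 counts <BOS> a second time if the tokenizer encodes it as one token.
    icl_prefix_lens.append(len(dec_tok.encode(p)) + 1)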

In [67]:
past = torch.zeros(zs.shape).to('cuda')

with torch.no_grad():
    logits = model.decoder(icl_prompt_tok, past)[0]

results = np.zeros((logits.size(0),), dtype=np.float32)

for i in range(logits.size(0)):
    cur_len = choice_a_len if i % 2 == 0 else choice_b_len
    for j in range(cur_len):
        cur_tok = (a_tok if i % 2 == 0 else b_tok)[j]
        try:
            results[i] += -torch.nn.functional.log_softmax(logits[i, icl_prefix_lens[i // 2] + j - 1, :], dim=0)[cur_tok].item()
        except:
            continue
    results[i] /= cur_len

start = len('<BOS>')
result_list = [{"prompt_new":  data[i][start:],   "target_new":  results[i].item(),
                "prompt_true": data[i+1][start:], "target_true": results[i + 1].item()}
       for i in range(0, len(results), 2)]


icl_result_list = result_list

In [68]:
icl_result_list

[{'prompt_new': 'The mother tongue of Danielle Darrieux is English',
  'target_new': 7.133586406707764,
  'prompt_true': 'The mother tongue of Danielle Darrieux is French',
  'target_true': 5.582711219787598},
 {'prompt_new': 'Danielle Darrieux was a native English',
  'target_new': 9.786528587341309,
  'prompt_true': 'Danielle Darrieux was a native French',
  'target_true': 7.93894100189209},
 {'prompt_new': 'Danielle Darrieux spoke the language English',
  'target_new': 8.760000228881836,
  'prompt_true': 'Danielle Darrieux spoke the language French',
  'target_true': 6.0284271240234375}]

We report the list of $\Delta_{new-true}$'s for the three extended prompts: 

In [70]:
difference_list = [item['target_new'] - item['target_true'] for item in icl_result_list]
difference_list

[1.550875186920166, 1.8475875854492188, 2.7315731048583984]

Prepending the "new" knowledge to the prefixes in this setup does not seem to change the signs of the $\Delta_{new-true}$ values compared to unconditional generation, which skips memory (i.e. the case that essentially makes no edit): all three remain positive.
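
The three setups can be compared side by side by counting, for each one, how many prompts have $\Delta_{new-true} < 0$. The sketch below simply copies the values from the three `difference_list` outputs above; the dictionary keys are labels chosen here for readability.

```python
# Delta values copied from the three difference_list outputs above.
deltas = {
    "no edit": [2.5074635446071625, 2.4177944660186768, 0.5732059478759766],
    "memory":  [-17.063653946512204, -0.5684728622436523, -3.860997200012207],
    "ICL":     [1.550875186920166, 1.8475875854492188, 2.7315731048583984],
}

# Number of prompts per setup for which the model prefers the new target (delta < 0).
summary = {name: sum(d < 0 for d in values) for name, values in deltas.items()}
for name, n_edited in summary.items():
    print(f"{name}: {n_edited}/3 prompts prefer the new target")
```

For this single example, only the memory-conditioned setup prefers the "new" target, and it does so on all three prompts.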