# Contrastive Knowledge Assessment (CKA) Notebook for Running Experiments
This notebook enables interactive experimentation with CKA on `Flan-ul2`, `Flan-t5`, `OPT`, `GPT-Neo`, `Roberta`, `Bert`, and `GPT2` models.
The goal is to probe whether a model assigns a higher probability to a factual statement than to a given counterfactual.

<a target="_blank" href="https://colab.research.google.com/github/daniel-furman/Capstone/blob/main/notebooks/cka_run_main.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
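
As a rough illustration of the comparison described above, the sketch below scores a single fact against one counterfactual, given next-token probabilities that a model has already produced. The function name `contrastive_score` and the probability values are hypothetical and are not taken from the Capstone code; the ratio `p_true / (p_true + p_false)` is the quantity reported in the score summaries later in this notebook.

```python
def contrastive_score(p_true: float, p_false: float) -> float:
    """Return p_true / (p_true + p_false): the share of probability on the fact."""
    total = p_true + p_false
    if total == 0.0:
        raise ValueError("p_true and p_false cannot both be zero")
    return p_true / total


# Hypothetical probabilities for a fact ("Tokyo") and a counterfactual ("London").
score = contrastive_score(p_true=0.012, p_false=0.004)

# A score above 0.5 means the fact is the more probable completion.
fact_preferred = score > 0.5
print(f"score={score:.2f}, fact_preferred={fact_preferred}")
```

A ratio is convenient here because the absolute probabilities are usually tiny (most of the mass goes to function words), while the ratio stays on a 0 to 1 scale.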


## Dependencies

In [None]:
!git clone https://github.com/daniel-furman/Capstone.git

Cloning into 'Capstone'...
remote: Enumerating objects: 502, done.
remote: Counting objects: 100% (159/159), done.
remote: Compressing objects: 100% (90/90), done.
remote: Total 502 (delta 74), reused 127 (delta 46), pack-reused 343
Receiving objects: 100% (502/502), 24.32 MiB | 16.01 MiB/s, done.
Resolving deltas: 100% (236/236), done.


In [None]:
!pip install -r /content/Capstone/requirements.txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece==0.1.97
  Downloading sentencepiece-0.1.97-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting transformers==4.26.1
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting accelerate==0.16.0
  Downloading accelerate-0.16.0-py3-none-any.whl (199 kB)
Collecting bitsandbytes==0.37.0
  Downloading bitsandbytes-0.37.0-py3-none-any.whl (76.3 MB)
Co

## Imports

In [None]:
import os

In [None]:
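# Work from the CKA scripts directory so that run_cka.py and its configs resolve.
os.chdir('/content/Capstone/src/cka_scripts')
# Hide TensorFlow INFO and WARNING log messages.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'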

## CLI usage

In [None]:
!python run_cka.py configs.rome_full.gpt2_rome_full

CKA for distilgpt2
Loading  model...
Downloading (…)lve/main/config.json: 100% 762/762 [00:00<00:00, 121kB/s]
Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:01<00:00, 772kB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:01<00:00, 410kB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:01<00:00, 1.23MB/s]
Downloading pytorch_model.bin: 100% 353M/353M [00:00<00:00, 392MB/s]

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 46.9kB/s]
Running comparisons...
100% 21919/21919 [09:56<00:00, 36.75it/s]
Done


Score dict summary:
{'distilgpt2': 'This model predicted 15099/21919 facts at a higher prob than the given counterfactual. The mean p_true / (p_true + p_false) was 0.6568 while the mean p_true was 0.0095'}
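
The summary above reports three statistics: how many facts received a higher probability than their counterfactual, the mean of `p_true / (p_true + p_false)`, and the mean of `p_true`. How `run_cka.py` aggregates these internally is not shown in this notebook, so the sketch below is only one plausible reading, assuming each comparison yields a single `(p_true, p_false)` pair. The `summarize` helper and the sample probabilities are hypothetical.

```python
from statistics import mean


def summarize(pairs):
    """Aggregate (p_true, p_false) pairs, one pair per fact/counterfactual comparison."""
    wins = sum(1 for p_true, p_false in pairs if p_true > p_false)
    mean_ratio = mean(p_true / (p_true + p_false) for p_true, p_false in pairs)
    mean_p_true = mean(p_true for p_true, _ in pairs)
    return {
        "wins": wins,
        "total": len(pairs),
        "mean_ratio": mean_ratio,
        "mean_p_true": mean_p_true,
    }


# Made-up probabilities for three comparisons (not real model output).
pairs = [(0.02, 0.01), (0.001, 0.003), (0.03, 0.01)]
summary = summarize(pairs)
print(summary)

# Share of facts preferred in the distilgpt2 run reported above.
accuracy = 15099 / 21919
print(f"{accuracy:.1%}")
```

For the distilgpt2 run, 15099 of 21919 works out to roughly 68.9% of facts ranked above their counterfactual, even though the mean `p_true` is below 1%.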


In [None]:
!python run_cka.py configs.tests.opt_v0

CKA for facebook/opt-125m
Loading  model...
Downloading (…)okenizer_config.json: 100% 685/685 [00:00<00:00, 107kB/s]
Downloading (…)lve/main/config.json: 100% 651/651 [00:00<00:00, 104kB/s]
Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:01<00:00, 810kB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 509kB/s]
Downloading (…)cial_tokens_map.json: 100% 441/441 [00:00<00:00, 165kB/s]
Downloading pytorch_model.bin: 100% 251M/251M [00:02<00:00, 87.2MB/s]

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Downloading (…)neration_config.json: 100% 137/137 [00:00<00:00, 24.0kB/s]
Running comparisons...
100% 3/3 [00:01<00:00,  1.54it/s]
Done

CKA for facebook/opt-350m
Loading  model...
Downloading (…)okenizer_config.json: 100% 685/685 [00:00<00:00, 119kB/s]
Downloading (…)lve/main/config.json: 100% 644/644 [00:00<00:00, 237kB/s]
Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:01<00

In [None]:
!python run_cka.py configs.tests.gpt2_v0

CKA for distilgpt2
Loading  model...

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Running comparisons...
100% 3/3 [00:00<00:00,  3.83it/s]
Done

CKA for gpt2
Loading  model...
Downloading (…)lve/main/config.json: 100% 665/665 [00:00<00:00, 122kB/s]
Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:01<00:00, 939kB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 512kB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:01<00:00, 1.22MB/s]
Downloading pytorch_model.bin: 100% 548M/548M [00:01<00:00, 447MB/s]
Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 37.5kB/s]
Running comparisons...
100% 3/3 [00:00<00:00,  7.63it/s]
Done

CKA for gpt2-medium
Loading  model...
Downloading (…)lve/main/config.json: 100% 718/718 [00:00<00:00, 264kB/s]
Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:01<00:00, 786kB/s]
Downloading (…)olve/main/merges.txt: 100% 4

In [None]:
!python run_cka.py configs.tests.bert_v0

CKA for google/bert_uncased_L-2_H-128_A-2
Loading  model...
Downloading (…)lve/main/config.json: 100% 382/382 [00:00<00:00, 59.1kB/s]
Downloading (…)solve/main/vocab.txt: 100% 232k/232k [00:00<00:00, 349kB/s]
Downloading pytorch_model.bin: 100% 17.7M/17.7M [00:00<00:00, 101MB/s]
Some weights of the model checkpoint at google/bert_uncased_L-2_H-128_A-2 were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Running comparisons...
100% 3/3 [00:00<00:00,  4.51it/s]
Done

CKA for 

In [None]:
!python run_cka.py configs.tests.roberta_v0

CKA for distilroberta-base
Loading  model...
Downloading (…)lve/main/config.json: 100% 480/480 [00:00<00:00, 74.7kB/s]
Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:01<00:00, 811kB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 508kB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:01<00:00, 1.21MB/s]
Downloading pytorch_model.bin: 100% 331M/331M [00:03<00:00, 93.6MB/s]
Running comparisons...
100% 3/3 [00:00<00:00,  4.14it/s]
Done

CKA for roberta-base
Loading  model...
Downloading (…)lve/main/config.json: 100% 481/481 [00:00<00:00, 90.6kB/s]
Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:01<00:00, 804kB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 514kB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:01<00:00, 1.02MB/s]
Downloading pytorch_model.bin: 100% 501M/501M [00:01<00:00, 430MB/s]
Running comparisons...
100% 3/3 [00:00<00:00, 12.78it/s]
Done

CKA for xlm-roberta-base
Loading  model...
Down

In [None]:
!python run_cka.py configs.tests.flan_t5_v0

CKA for google/flan-t5-small
Loading  model...
Downloading spiece.model: 100% 792k/792k [00:00<00:00, 35.4MB/s]
Downloading (…)cial_tokens_map.json: 100% 2.20k/2.20k [00:00<00:00, 368kB/s]
Downloading (…)okenizer_config.json: 100% 2.54k/2.54k [00:00<00:00, 1.03MB/s]
Downloading (…)lve/main/config.json: 100% 1.40k/1.40k [00:00<00:00, 262kB/s]
Downloading pytorch_model.bin: 100% 308M/308M [00:00<00:00, 462MB/s]

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Downloading (…)neration_config.json: 100% 147/147 [00:00<00:00, 57.8kB/s]
Running comparisons...
100% 3/3 [00:02<00:00,  1.31it/s]
Done

CKA for google/flan-t5-base
Loading  model...
Downloading spiece.model: 100% 792k/792k [00:00<00:00, 48.1MB/s]
Downloading (…)cial_tokens_map.json: 100% 2.20k/2.20k [00:00<00:00, 785kB/s]
Downloading (…)okenizer_config.json: 100% 2.54k/2.54k [00:00<00:00, 916kB/s]
Downloading (…)lve/main/config.json: 100% 1.40k/1.40k [0

In [None]:
!python run_cka.py configs.tests.eleutherai_v0

CKA for EleutherAI/gpt-neo-125M
Loading  model...
Downloading (…)okenizer_config.json: 100% 560/560 [00:00<00:00, 86.6kB/s]
Downloading (…)lve/main/config.json: 100% 1.01k/1.01k [00:00<00:00, 164kB/s]
Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:01<00:00, 810kB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 514kB/s]
Downloading (…)cial_tokens_map.json: 100% 357/357 [00:00<00:00, 141kB/s]
Downloading pytorch_model.bin: 100% 526M/526M [00:05<00:00, 100MB/s] 

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Running comparisons...
100% 3/3 [00:01<00:00,  1.98it/s]
Done

CKA for EleutherAI/gpt-neo-1.3b
Loading  model...
Downloading (…)okenizer_config.json: 100% 200/200 [00:00<00:00, 73.7kB/s]
Downloading (…)lve/main/config.json: 100% 1.35k/1.35k [00:00<00:00, 496kB/s]
Downloading (…)olve/main/vocab.json: 100% 798k/798k [00:01<00:00, 721kB/s]
Downloading (…)olve/main/merges.txt: 1

## Notebook usage

In [None]:
from run_cka import main

### GPT-2 example with `verbosity` turned on

In [None]:
config = {
    "models": [
        "distilgpt2",
    ],
    "input_information": {
        "0": {
            "stem": "The 2020 Olympics were held in",
            "true": "Tokyo",
            "false": ["London", "Berlin", "Chicago"],
        },
        "1": {
            "stem": "Operation Overlord took place in",
            "true": "Normandy",
            "false": ["Manila", "Santiago", "Baghdad"],
        },
        "2": {
            "stem": "Steve Jobs is the founder of",
            "true": "Apple",
            "false": ["Microsoft", "Oracle", "Intel"],
        },
    },
    "verbosity": True,
}

score_dicts = main(config)

print(score_dicts[0])
print(score_dicts[1])
print(score_dicts[2])

CKA for distilgpt2
Loading  model...

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Running comparisons...


 33%|███▎      | 1/3 [00:00<00:01,  1.49it/s]


	context... The 2020 Olympics were held in
	tokenized_context ids... tensor([[  464, 12131, 14935,   547,  2714,   287]], device='cuda:0')
	decoded tokenized_context... The 2020 Olympics were held in
	decoded target id...  Tokyo
	most probable prediction id decoded...  the


	context... The 2020 Olympics were held in
	tokenized_context ids... tensor([[  464, 12131, 14935,   547,  2714,   287]], device='cuda:0')
	decoded tokenized_context... The 2020 Olympics were held in
	decoded target id...  London
	most probable prediction id decoded...  the


	context... The 2020 Olympics were held in
	tokenized_context ids... tensor([[  464, 12131, 14935,   547,  2714,   287]], device='cuda:0')
	decoded tokenized_context... The 2020 Olympics were held in
	decoded target id...  Tokyo
	most probable prediction id decoded...  the


	context... The 2020 Olympics were held in
	tokenized_context ids... tensor([[  464, 12131, 14935,   547,  2714,   287]], device='cuda:0')
	decoded tokenized_context... T

100%|██████████| 3/3 [00:00<00:00,  3.37it/s]


	context... Operation Overlord took place in
	tokenized_context ids... tensor([[32180,  3827, 10572,  1718,  1295,   287]], device='cuda:0')
	decoded tokenized_context... Operation Overlord took place in
	decoded target id...  Baghdad
	most probable prediction id decoded...  the


	context... Steve Jobs is the founder of
	tokenized_context ids... tensor([[19206, 19161,   318,   262,  9119,   286]], device='cuda:0')
	decoded tokenized_context... Steve Jobs is the founder of
	decoded target id...  Apple
	most probable prediction id decoded...  the


	context... Steve Jobs is the founder of
	tokenized_context ids... tensor([[19206, 19161,   318,   262,  9119,   286]], device='cuda:0')
	decoded tokenized_context... Steve Jobs is the founder of
	decoded target id...  Microsoft
	most probable prediction id decoded...  the


	context... Steve Jobs is the founder of
	tokenized_context ids... tensor([[19206, 19161,   318,   262,  9119,   286]], device='cuda:0')
	decoded tokenized_context... St
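
In the verbose output above, the most probable next token is always ` the`, yet the comparison remains meaningful because it only looks at the probabilities assigned to the target tokens. The sketch below illustrates this with a toy four-token vocabulary and made-up logits; it is not the code in `run_cka.py`, and a real GPT-2 vocabulary has about 50,000 tokens.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary and hypothetical next-token logits for the stem
# "The 2020 Olympics were held in".
vocab = [" the", " Tokyo", " London", " Berlin"]
logits = [5.0, 2.0, 1.0, 0.5]

probs = dict(zip(vocab, softmax(logits)))

# The argmax is a function word, as in the verbose output.
most_probable = max(probs, key=probs.get)

# The comparison only needs the probabilities of the two target tokens.
p_true, p_false = probs[" Tokyo"], probs[" London"]

print("most probable:", repr(most_probable))
print("fact preferred:", p_true > p_false)
```

The leading space in each token mirrors the decoded target ids in the output above (for example ` Tokyo`), since GPT-2 style tokenizers attach the preceding space to the word.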


