# Contrastive Knowledge Assessment (CKA) Notebook Demo
This notebook enables interactive experimentation with CKA for several model families, including `Flan-T5`, `OPT`, `RoBERTa`, and `GPT-2`.
The goal is to probe whether a model assigns a higher probability to a factual statement than to a given counterfactual.

<a target="_blank" href="https://colab.research.google.com/github/daniel-furman/Capstone/blob/main/notebooks/cka_run_main_demo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
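
The per-comparison quantities that appear in the cell outputs later in this notebook (`p_true - p_false`, `p_true > p_false`, and `p_true / (p_true + p_false)`) can be sketched in a few lines. This is a minimal illustration, not the repo's implementation: the function name `compare_probs` is hypothetical, and the two probabilities are assumed to have already been read off a model.

```python
def compare_probs(p_true: float, p_false: float) -> dict:
    """Hypothetical helper: contrast a fact's probability with a counterfactual's."""
    total = p_true + p_false
    return {
        "p_true": p_true,
        "p_false": p_false,
        "p_true - p_false": p_true - p_false,
        "p_true > p_false": p_true > p_false,
        # Share of the combined probability mass that lands on the fact;
        # reported as NaN when both probabilities are zero.
        "p_true / (p_true + p_false)": p_true / total if total > 0 else float("nan"),
    }


# Toy probabilities chosen for illustration only.
result = compare_probs(0.75, 0.25)
print(result)
```

The ratio is bounded between 0 and 1, which makes it easier to compare across models than the raw difference when both probabilities are very small.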


## Dependencies

In [None]:
!git clone https://github.com/daniel-furman/Capstone.git
!pip install -r /content/Capstone/requirements.txt

Cloning into 'Capstone'...
remote: Enumerating objects: 556, done.
remote: Counting objects: 100% (213/213), done.
remote: Compressing objects: 100% (127/127), done.
remote: Total 556 (delta 114), reused 155 (delta 60), pack-reused 343
Receiving objects: 100% (556/556), 24.37 MiB | 17.05 MiB/s, done.
Resolving deltas: 100% (276/276), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece==0.1.97
  Downloading sentencepiece-0.1.97-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting transformers==4.26.1
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting accelerate==0.16.0
  Downloading accelerate-0.16.0-py3-none-any.whl

## Imports

In [None]:
import os

In [None]:
os.chdir('/content/Capstone/src/cka_scripts')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 

## Notebook usage

In [None]:
# import the main wrapper function for running cka

from run_cka import main
config = {}

Here, you can specify a causal, masked, or text2text large language model in ```config["models"]```. The `transformers` package loads the model into the session at 8-bit precision (`load_in_8bit=True`). See the [README](https://github.com/daniel-furman/Capstone#models-tested) for the full list of supported model families, and the HuggingFace Hub for specific model tags.

Approximate VRAM requirements for some example models:
* `google/flan-ul2` requires slightly more than 32GB VRAM
* `EleutherAI/gpt-neox-20b` requires slightly more than 22GB VRAM
* `EleutherAI/gpt-j-6B` requires slightly more than 7GB VRAM
* `google/flan-t5-xl` requires slightly more than 5GB VRAM
* `roberta-large` requires slightly less than 1GB VRAM
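
As a rough aid for picking a model, the figures in the list above can be turned into a small lookup. This is only a sketch: the helper `models_that_fit` and the `headroom_gb` margin are hypothetical, and the numbers are the approximate values quoted above, not measurements.

```python
# Approximate footprints in GB, copied from the list above
# ("slightly more than" / "slightly less than" rounded to the quoted figure).
APPROX_VRAM_GB = {
    "google/flan-ul2": 32,
    "EleutherAI/gpt-neox-20b": 22,
    "EleutherAI/gpt-j-6B": 7,
    "google/flan-t5-xl": 5,
    "roberta-large": 1,
}


def models_that_fit(available_gb: float, headroom_gb: float = 1.0) -> list:
    """Hypothetical helper: models whose rough footprint plus a margin fits in VRAM."""
    return [
        name
        for name, needed_gb in APPROX_VRAM_GB.items()
        if needed_gb + headroom_gb <= available_gb
    ]


# Example: a GPU with 16GB of VRAM.
print(models_that_fit(16))
```

The margin exists because most of the quoted figures are lower bounds; adjust it to taste.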



In [None]:
config["models"] = [
    "EleutherAI/gpt-j-6B",
    #"gpt2-xl",
]

Next, add new facts to ```config["input_information"]```. Each entry has a `stem`, a `true` completion, and a list of `false` counterfactuals.
  * For instance, the last example below encodes "Lebron James is famous for playing the sport of {true: basketball; false: football}".

In [None]:
config["input_information"] = {
    
        "0": {
            "stem": "The 2020 Olympics were held in",
            "true": "Tokyo",
            "false": ["London", "Berlin", "Chicago"],
        },
        "1": {
            "stem": "Operation Overlord took place in",
            "true": "Normandy",
            "false": ["Manila", "Santiago", "Baghdad"],
        },
        "2": {
            "stem": "Steve Jobs is the founder of",
            "true": "Apple",
            "false": ["Microsoft", "Google", "Facebook"],
        },

        # Example addition(s)
        "3": {
            "stem": "Lebron James is famous for playing the sport of",
            "true": "basketball",
            "false": ["football"],
        },   
    }
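
The output further down reports one result per (true, false) pair, so an entry with three counterfactuals yields three comparisons. A minimal sketch of that expansion follows; `expand_comparisons` is a hypothetical name used for illustration, not a function from the repo.

```python
def expand_comparisons(input_information: dict) -> list:
    """Hypothetical helper: one (stem, true, false) triple per counterfactual."""
    triples = []
    # Keys are numeric strings ("0", "1", ...), so sort them as integers.
    for key in sorted(input_information, key=int):
        entry = input_information[key]
        for counterfactual in entry["false"]:
            triples.append((entry["stem"], entry["true"], counterfactual))
    return triples


# Two of the entries from the config, repeated here so the block runs on its own.
example = {
    "0": {
        "stem": "The 2020 Olympics were held in",
        "true": "Tokyo",
        "false": ["London", "Berlin", "Chicago"],
    },
    "3": {
        "stem": "Lebron James is famous for playing the sport of",
        "true": "basketball",
        "false": ["football"],
    },
}

triples = expand_comparisons(example)
for triple in triples:
    print(triple)
```
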

Lastly, ```config["verbosity"]``` controls how much information is printed during the run.

In [None]:
config["verbosity"] = False

In [None]:
config

{'models': ['google/flan-ul2'],
 'input_information': {'0': {'stem': 'The 2020 Olympics were held in',
   'true': 'Tokyo',
   'false': ['London', 'Berlin', 'Chicago']},
  '1': {'stem': 'Operation Overlord took place in',
   'true': 'Normandy',
   'false': ['Manila', 'Santiago', 'Baghdad']},
  '2': {'stem': 'Steve Jobs is the founder of',
   'true': 'Apple',
   'false': ['Microsoft', 'Google', 'Facebook']},
  '3': {'stem': 'Lebron James is famous for playing the sport of',
   'true': 'basketball',
   'false': ['football']}},
 'verbosity': False}

In [None]:
score_dicts = main(config)

print(score_dicts[0])
print(score_dicts[1])
print(score_dicts[2])

CKA for google/flan-ul2
Loading  model...


Downloading spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/2.35k [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/784 [00:00<?, ?B/s]

Downloading (…)model.bin.index.json:   0%|          | 0.00/67.5k [00:00<?, ?B/s]

Downloading (…)l-00001-of-00008.bin:   0%|          | 0.00/4.69G [00:00<?, ?B/s]

Downloading (…)l-00002-of-00008.bin:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Downloading (…)l-00003-of-00008.bin:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Downloading (…)l-00004-of-00008.bin:   0%|          | 0.00/4.96G [00:00<?, ?B/s]

Downloading (…)l-00005-of-00008.bin:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Downloading (…)l-00006-of-00008.bin:   0%|          | 0.00/4.93G [00:00<?, ?B/s]

Downloading (…)l-00007-of-00008.bin:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Downloading (…)l-00008-of-00008.bin:   0%|          | 0.00/4.93G [00:00<?, ?B/s]


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Running comparisons...


100%|██████████| 4/4 [00:11<00:00,  2.92s/it]

Done

{'google/flan-ul2': [{"The 2020 Olympics were held in ['Tokyo', 'London']": {'p_true': 0.75732421875, 'p_false': 0.00010901689529418945, 'p_true - p_false': 0.7572152018547058, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.9998560574203456}}, {"The 2020 Olympics were held in ['Tokyo', 'Berlin']": {'p_true': 0.75732421875, 'p_false': 2.8014183044433594e-06, 'p_true - p_false': 0.7573214173316956, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.999996287709018}}, {"The 2020 Olympics were held in ['Tokyo', 'Chicago']": {'p_true': 0.75732421875, 'p_false': 1.3232231140136719e-05, 'p_true - p_false': 0.7573109865188599, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.9999825147548848}}, {"Operation Overlord took place in ['Normandy', 'Manila']": {'p_true': 0.54541015625, 'p_false': 5.960464477539063e-08, 'p_true - p_false': 0.5454100966453552, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.9999998723811151}}, {"Operation Overlord 




## CLI usage
* The CLI is compatible with the cached configs included in the repo
* You can also create a custom config for an experiment
    * See: ```/content/Capstone/src/cka_scripts/configs```

In [None]:
# full benchmark dataset

!python run_cka.py configs.rome_full.distilgpt2_rome_full

CKA for distilgpt2
Loading  model...
Downloading pytorch_model.bin: 100% 353M/353M [00:03<00:00, 102MB/s] 

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 45.0kB/s]
Running comparisons...
100% 21919/21919 [10:21<00:00, 35.26it/s]
Done


Score dict summary:
{'distilgpt2': 'This model predicted 15099/21919 facts at a higher prob than the given counterfactual. The mean p_true / (p_true + p_false) was 0.6568 while the mean p_true was 0.0095'}
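
A summary like the one above can be reproduced from the per-comparison results. The sketch below assumes the structure seen in the printed notebook output (a list of `{label: metrics}` dicts, with the `p_true > p_false` flag stored as a string); the helper name `summarize` and the toy values are hypothetical.

```python
from statistics import mean

RATIO_KEY = "p_true / (p_true + p_false)"


def summarize(comparisons: list) -> dict:
    """Hypothetical helper: aggregate a list of {label: metrics} dicts."""
    metrics = [m for comparison in comparisons for m in comparison.values()]
    # The flag may be the string 'True'/'False' or a bool; str() handles both.
    n_correct = sum(str(m["p_true > p_false"]) == "True" for m in metrics)
    return {
        "n_correct": n_correct,
        "n_total": len(metrics),
        "mean_ratio": mean(m[RATIO_KEY] for m in metrics),
        "mean_p_true": mean(m["p_true"] for m in metrics),
    }


# Toy values for illustration only, not real model output.
toy = [
    {"fact A ['x', 'y']": {"p_true": 0.75, "p_false": 0.25,
                           "p_true > p_false": "True", RATIO_KEY: 0.75}},
    {"fact B ['u', 'v']": {"p_true": 0.25, "p_false": 0.75,
                           "p_true > p_false": "False", RATIO_KEY: 0.25}},
]

summary = summarize(toy)
print(summary)
```

Under the same assumption about structure, the list passed in would be the value stored under the model name in one of the returned score dicts.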
