# Contrastive Knowledge Assessment (CKA) Notebook Demo
This notebook enables interactive experimentation with CKA for models including `Flan-ul2`, `Flan-t5s`, `OPTs`, `GPT-Neos`, `Robertas`, `Berts`, and `GPT2s`.
The goal is to probe whether a model assigns a higher probability to a factual statement than to a given counterfactual.

<a target="_blank" href="https://colab.research.google.com/github/daniel-furman/Capstone/blob/main/notebooks/cka_run_main_demo.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
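
As a rough illustration of the comparison described above, the sketch below turns next-token logits into probabilities with a softmax and checks whether the true completion outranks a counterfactual. The vocabulary and logits are toy values made up for illustration, and the single-token setup is an assumption; this is not the implementation in `run_cka`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary and next-token logits (hypothetical, not model output)
# for the stem "The 2020 Olympics were held in".
vocab = ["Tokyo", "London", "Berlin", "Chicago"]
toy_logits = [5.0, 1.0, -1.0, -2.0]

probs = dict(zip(vocab, softmax(toy_logits)))
p_true = probs["Tokyo"]    # probability of the factual completion
p_false = probs["London"]  # probability of one counterfactual

print(p_true > p_false)
```

With a real model, the logits would come from a forward pass over the stem rather than from a hand-written list.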


## Dependencies

In [1]:
!git clone https://github.com/daniel-furman/Capstone.git
!pip install -r /content/Capstone/requirements.txt

Cloning into 'Capstone'...
remote: Enumerating objects: 520, done.
remote: Counting objects: 100% (177/177), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 520 (delta 87), reused 136 (delta 50), pack-reused 343
Receiving objects: 100% (520/520), 24.36 MiB | 15.53 MiB/s, done.
Resolving deltas: 100% (249/249), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentencepiece==0.1.97
  Downloading sentencepiece-0.1.97-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting transformers==4.26.1
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting accelerate==0.16.0
  Downloading accelerate-0.16.0-py3-none-any.whl 

## Imports

In [2]:
import os

In [3]:
os.chdir('/content/Capstone/src/cka_scripts')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 

## Notebook usage

In [6]:
# import the main wrapper function for running cka

from run_cka import main
config = {}

Here, you can specify a large language model in ```config["models"]```. See the 

Some example VRAM usages:
* `google/flan-ul2` requires X
* `EleutherAI/gpt-j-6B` requires slightly more than 7GB VRAM
* `gpt2-xl` runs comfortably on Y

For example, to run a couple of smaller models instead, add `roberta-base` and `google/flan-t5-base` to the list.




In [19]:
config["models"] = ["EleutherAI/gpt-j-6B",
                    "gpt2-xl",
                   ]

Next, new facts can be added in ```config["input_information"]```. Each entry must follow the same format as the given examples: a `stem`, a `true` completion, and a list of `false` completions.
  * For instance, to add "Lebron James is famous for playing the sport of {true: basketball; false: football}", see the last example.

In [20]:
config["input_information"] = {
    
        "0": {
            "stem": "The 2020 Olympics were held in",
            "true": "Tokyo",
            "false": ["London", "Berlin", "Chicago"],
        },
        "1": {
            "stem": "Operation Overlord took place in",
            "true": "Normandy",
            "false": ["Manila", "Santiago", "Baghdad"],
        },
        "2": {
            "stem": "Steve Jobs is the founder of",
            "true": "Apple",
            "false": ["Microsoft", "Google", "Facebook"],
        },

        # Example addition(s)
        "3": {
            "stem": "Lebron James is famous for playing the sport of",
            "true": "basketball",
            "false": ["football"],
        },   
    }
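
Because entries must match the example format, it can help to check them before loading a large model. The sketch below is one possible way to do that. `validate_input_information` and `expand_comparisons` are hypothetical helpers written for this notebook, not functions from the repo. The label format mirrors the keys seen in the printed scores (stem followed by the `[true, false]` pair).

```python
def validate_input_information(input_information):
    """Raise if any fact lacks a stem, a true answer, or a list of false answers."""
    for key, fact in input_information.items():
        missing = {"stem", "true", "false"} - set(fact)
        if missing:
            raise ValueError(f"fact {key!r} is missing fields: {sorted(missing)}")
        if not isinstance(fact["stem"], str) or not isinstance(fact["true"], str):
            raise TypeError(f"fact {key!r}: 'stem' and 'true' must be strings")
        if not isinstance(fact["false"], list) or not fact["false"]:
            raise TypeError(f"fact {key!r}: 'false' must be a non-empty list")


def expand_comparisons(input_information):
    """Return one (label, stem, true, false) tuple per true/false pair."""
    pairs = []
    for fact in input_information.values():
        for false_answer in fact["false"]:
            label = f"{fact['stem']} {[fact['true'], false_answer]}"
            pairs.append((label, fact["stem"], fact["true"], false_answer))
    return pairs


example = {
    "3": {
        "stem": "Lebron James is famous for playing the sport of",
        "true": "basketball",
        "false": ["football"],
    },
}

validate_input_information(example)
pairs = expand_comparisons(example)
print(pairs[0][0])
```

Running the check on `config["input_information"]` before calling `main(config)` would surface a malformed entry early.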

Lastly, ```config["verbosity"]``` controls how much information is printed while the comparisons run.

In [21]:
config["verbosity"] = False

In [22]:
config

{'models': ['EleutherAI/gpt-j-6B', 'gpt2-xl'],
 'input_information': {'0': {'stem': 'The 2020 Olympics were held in',
   'true': 'Tokyo',
   'false': ['London', 'Berlin', 'Chicago']},
  '1': {'stem': 'Operation Overlord took place in',
   'true': 'Normandy',
   'false': ['Manila', 'Santiago', 'Baghdad']},
  '2': {'stem': 'Steve Jobs is the founder of',
   'true': 'Apple',
   'false': ['Microsoft', 'Google', 'Facebook']},
  '3': {'stem': 'Lebron James is famous for playing the sport of',
   'true': 'basketball',
   'false': ['football']}},
 'verbosity': False}

In [23]:
score_dicts = main(config)

print(score_dicts[0])
print(score_dicts[1])
print(score_dicts[2])

CKA for EleutherAI/gpt-j-6B
Loading  model...


Downloading (…)okenizer_config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.37M [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/4.04k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/357 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/930 [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/24.2G [00:00<?, ?B/s]


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Running comparisons...


100%|██████████| 4/4 [00:07<00:00,  1.86s/it]


Done

CKA for gpt2-xl
Loading  model...


Downloading (…)lve/main/config.json:   0%|          | 0.00/689 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Downloading pytorch_model.bin:   0%|          | 0.00/6.43G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Running comparisons...


100%|██████████| 4/4 [00:01<00:00,  2.63it/s]


Done

{'eleutherai/gpt-j-6b': [{"The 2020 Olympics were held in ['Tokyo', 'London']": {'p_true': 0.820419192314148, 'p_false': 0.004076048266142607, 'p_true - p_false': 0.8163431440480053, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.9950562986708852}}, {"The 2020 Olympics were held in ['Tokyo', 'Berlin']": {'p_true': 0.820419192314148, 'p_false': 0.00018768371955957264, 'p_true - p_false': 0.8202315085945884, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.9997712745009163}}, {"The 2020 Olympics were held in ['Tokyo', 'Chicago']": {'p_true': 0.820419192314148, 'p_false': 6.958650192245841e-05, 'p_true - p_false': 0.8203496058122255, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.9999151767789259}}, {"Operation Overlord took place in ['Normandy', 'Manila']": {'p_true': 0.15685658156871796, 'p_false': 8.113250515862092e-09, 'p_true - p_false': 0.15685657345546744, 'p_true > p_false': 'True', 'p_true / (p_true + p_false)': 0.999999884523504}}, {
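
Each per-comparison entry in the output above can be derived from `p_true` and `p_false` alone. The sketch below recomputes the derived fields for the first Tokyo/London comparison, using the two probabilities printed above. `compare` is a hypothetical helper, not the repo's function, and the results should agree with the printed values only up to floating-point rounding.

```python
def compare(p_true, p_false):
    """Build the derived metrics for one true/false comparison."""
    return {
        "p_true": p_true,
        "p_false": p_false,
        # Absolute margin between the fact and the counterfactual.
        "p_true - p_false": p_true - p_false,
        # The notebook output stores this flag as a string, so mirror that.
        "p_true > p_false": str(p_true > p_false),
        # Share of the combined probability mass that goes to the fact.
        "p_true / (p_true + p_false)": p_true / (p_true + p_false),
    }


# Probabilities copied from the gpt-j-6b output for ['Tokyo', 'London'].
scores = compare(0.820419192314148, 0.004076048266142607)
print(scores)
```

The normalized ratio is convenient for comparing models because it stays in [0, 1] even when both raw probabilities are small, as in the Normandy/Manila case.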

## CLI usage 
* The CLI is compatible with the cached configs included in the repo
* You can also create a custom config for an experiment
    * See: ```/content/Capstone/src/cka_scripts/configs```

In [None]:
# full benchmark dataset

!python run_cka.py configs.rome_full.distilgpt2_rome_full

CKA for distilgpt2
Loading  model...
Downloading pytorch_model.bin: 100% 353M/353M [00:03<00:00, 102MB/s] 

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 45.0kB/s]
Running comparisons...
100% 21919/21919 [10:21<00:00, 35.26it/s]
Done


Score dict summary:
{'distilgpt2': 'This model predicted 15099/21919 facts at a higher prob than the given counterfactual. The mean p_true / (p_true + p_false) was 0.6568 while the mean p_true was 0.0095'}
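
The summary line reports three aggregates: how many facts beat their counterfactual, the mean `p_true / (p_true + p_false)`, and the mean `p_true`. The sketch below shows one way such aggregates could be computed from a list of `(p_true, p_false)` pairs. `summarize` is a hypothetical helper, and averaging per comparison is an assumption about how the repo aggregates. The input pairs are the four gpt-j-6b comparisons printed earlier in this notebook.

```python
def summarize(comparisons):
    """Aggregate (p_true, p_false) pairs into summary statistics."""
    n = len(comparisons)
    n_correct = sum(1 for p_true, p_false in comparisons if p_true > p_false)
    mean_ratio = sum(p_true / (p_true + p_false) for p_true, p_false in comparisons) / n
    mean_p_true = sum(p_true for p_true, _ in comparisons) / n
    return n_correct, n, mean_ratio, mean_p_true


# (p_true, p_false) pairs copied from the gpt-j-6b output above.
comparisons = [
    (0.820419192314148, 0.004076048266142607),    # Tokyo vs London
    (0.820419192314148, 0.00018768371955957264),  # Tokyo vs Berlin
    (0.820419192314148, 6.958650192245841e-05),   # Tokyo vs Chicago
    (0.15685658156871796, 8.113250515862092e-09), # Normandy vs Manila
]

n_correct, n, mean_ratio, mean_p_true = summarize(comparisons)
print(f"{n_correct}/{n} facts ranked above the counterfactual")
print(f"mean ratio: {mean_ratio:.4f}, mean p_true: {mean_p_true:.4f}")
```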
