# Evaluation of Post Hoc Interpretability Implementations
This notebook runs and compares different post hoc interpretability methods against previously defined criteria.

### Tested Interpretability Approaches

- LIME (Captum)
- Shapley Value Sampling (Captum)
- KernelSHAP (Captum)
- PartitionSHAP (shap)
- PermutationSHAP (shap)

### Comparison Criteria
- Runtime
- Explanation Quality (Visually)

### Test Model & Packages
Each test was run on open-ended text generation with GPT-2 from Hugging Face. The shap and Captum libraries were used for masking, for collecting attribution/SHAP values, and for plotting.

## Installation, Imports & Setup



### Tokens for Downloads

Without a GitHub token, the modified variants of shap and Captum cannot be installed. Without a Hugging Face token (`HGF_TOKEN`), Llama cannot be loaded from the Hugging Face Hub.

This is set up for Colab. Alternatively, use the commented-out string variant below and replace the placeholder string with an actual token.

*   GitHub [Token Info](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens)
*   Hugging Face [Token Info](https://huggingface.co/docs/hub/security-tokens)


In [None]:
# grabbing tokens for repository and model access
from google.colab import userdata

gh_token=userdata.get('GITHUB_TOKEN')
hgf_token=userdata.get('HGF_TOKEN')

#gh_token="TOKEN"
#hgf_token="TOKEN"
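
Outside Colab, one alternative to hard-coding the strings is to read the tokens from environment variables. The sketch below is only an illustration: `read_token` is a hypothetical helper that is not part of this notebook, and it assumes the variables use the same names as the Colab secrets above (`GITHUB_TOKEN`, `HGF_TOKEN`).

```python
import os

def read_token(name: str, default: str = "") -> str:
    """Hypothetical helper: try Colab secrets first, then environment variables."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # not running in Colab, or the secret is missing / not accessible
        pass
    # fall back to the process environment
    return os.environ.get(name, default)

gh_token = read_token("GITHUB_TOKEN")
hgf_token = read_token("HGF_TOKEN")
```

With this variant, the tokens never appear in the notebook source, which is helpful when the notebook is shared.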

### Creating Folder

Creating folders to save the plots in

In [None]:
import os

# create one output folder per model used below (no error if it already exists)
for folder in ('gpt-2', 'godel', 'mistral', 'llama2'):
  os.makedirs(f'/content/{folder}', exist_ok=True)

!rm /content/gpt-2/*
!rm /content/mistral/*
!rm /content/llama2/*

rm: cannot remove '/content/gpt-2/*': No such file or directory
rm: cannot remove '/content/mistral/*': No such file or directory
rm: cannot remove '/content/llama2/*': No such file or directory


### Installs and Imports

In [None]:
# basic installs and additional utilities (usually not needed in colab)
!pip install matplotlib
!pip install numpy
!pip install pandas
!pip install ipywidgets
!pip install ipython

# model package installs
!pip install torch

!pip install transformers
!pip install huggingface_hub
!pip install accelerate

Collecting jedi>=0.16 (from ipython>=4.0.0->ipywidgets)
  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi
Successfully installed jedi-0.19.1
Collecting accelerate
  Downloading accelerate-0.26.1-py3-none-any.whl (270 kB)
Installing collected packages: accelerate
Successfully installed accelerate-0.26.1


In [None]:
# installing captum package from GitHub repository
!pip install git+https://${gh_token}@github.com/LennardZuendorf/thesis-captum.git

# installing shap package from GitHub repository
!pip install git+https://${gh_token}@github.com/LennardZuendorf/thesis-shap.git

Collecting git+https://****@github.com/LennardZuendorf/thesis-captum.git
  Cloning https://****@github.com/LennardZuendorf/thesis-captum.git to /tmp/pip-req-build-i02bhn6s
  Running command git clone --filter=blob:none --quiet 'https://****@github.com/LennardZuendorf/thesis-captum.git' /tmp/pip-req-build-i02bhn6s
  Resolved https://****@github.com/LennardZuendorf/thesis-captum.git to commit 7dd85e4a2762b0d2c9850c33c966fb9d049dd909
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: captum
  Building wheel for captum (pyproject.toml) ... done
  Created wheel for captum: filename=captum-0.7.0-py3-none-any.whl size=637397 sha256=23b1cad405fc94c99a388e81496b5d5429567c5a37c5df47cbaada085f30b29a
  Stored in directory: /tmp/pip-ephem-wheel-cache-t0psbv1o/wheels/93/81/03/eec82bfc1737f4d759ec68f9b394ffbee89259c4bef6ca4019
Succ

In [None]:
# basic imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import time

# model imports
import torch
import transformers

# interpretability import
import shap
import captum

### Model Setup

In [None]:
# setting device based on available hardware
if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")

print(f"Device set to {device}.")

Device set to cuda.


In [None]:
# setup gpt2 and godel model and tokenizer from huggingface
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# gpt and godel loading function so this can be run individually
def load_gd_gpt():

  # load tokenizer and model from huggingface
  gpt_tokenizer = AutoTokenizer.from_pretrained("gpt2",  use_fast=True)
  gpt_model = AutoModelForCausalLM.from_pretrained("gpt2")

  # manage setup based on available device
  gpt_model.to(device)

  # update model config
  gpt_model.config.is_decoder = True
  gpt_model.config.max_new_tokens=50
  gpt_model.config.do_sample=True


  # load tokenizer and model from huggingface
  gd_tokenizer = AutoTokenizer.from_pretrained("microsoft/GODEL-v1_1-large-seq2seq")
  gd_model = AutoModelForSeq2SeqLM.from_pretrained("microsoft/GODEL-v1_1-large-seq2seq")

  # manage setup based on available device
  gd_model.to(device)

  # update GODEL model config
  gd_model.config.max_new_tokens=50
  gd_model.config.do_sample=True

  return gpt_model, gpt_tokenizer, gd_model, gd_tokenizer

In [None]:
# setup mistral model and tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM

# mistral loading function so this can be run individually
def load_mistral():

  # load tokenizer and model from huggingface
  mistral_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
  mistral_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")


  # manage setup based on available device
  mistral_model.to(device)

  # update model config
  mistral_model.config.is_decoder=True
  mistral_model.config.max_length=50
  mistral_model.config.no_repeat_ngram_size=2
  mistral_model.config.do_sample=True

  return mistral_model, mistral_tokenizer

In [None]:
# setup llama model and tokenizer
from transformers import AutoTokenizer, AutoModelForCausalLM

# llama loading function so this can be run individually
def load_llama():

  # load tokenizer and model from huggingface
  llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", token=hgf_token)
  llama_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", token=hgf_token)

  # manage setup based on available device
  llama_model.to(device)

  # update model config
  llama_model.config.is_decoder=True
  llama_model.config.max_length=50
  llama_model.config.no_repeat_ngram_size=2
  llama_model.config.do_sample=True

  # update tokenizer config
  llama_tokenizer.pad_token = llama_tokenizer.eos_token

  return llama_model, llama_tokenizer

**(Loading all models at the same time will exceed the 50 GB of RAM.)**

-> Load either GPT-2 + GODEL, or Mistral, or Llama 2.

In [None]:
# loading gpt and godel model and tokenizer
gpt_model, gpt_tokenizer, gd_model, gd_tokenizer = load_gd_gpt()

In [None]:
# loading mistral model and tokenizer
mistral_model, mistral_tokenizer = load_mistral()


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/596 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 33.06 MiB is free. Process 4898 has 14.71 GiB memory in use. Of the allocated memory 14.61 GiB is allocated by PyTorch, and 1.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
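
The out-of-memory error above can be made plausible with a rough back-of-the-envelope estimate. The sketch below assumes that the three shard sizes in the download log are decimal gigabytes, that the checkpoint stores 16-bit weights, and that the model is loaded in 32-bit precision because no `torch_dtype` is passed. None of these assumptions is verified here, and the helper names are made up for illustration.

```python
def params_from_checkpoint(shard_sizes_gb, bytes_per_param_on_disk=2):
    """Estimate the parameter count from checkpoint size (assumes 16-bit weights on disk)."""
    total_bytes = sum(shard_sizes_gb) * 1e9
    return total_bytes / bytes_per_param_on_disk

def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed for the weights alone, in GiB (activations and caches come on top)."""
    return n_params * bytes_per_param / 2**30

# shard sizes as shown in the download log above
shards_gb = [4.94, 5.00, 4.54]
n_params = params_from_checkpoint(shards_gb)

gpu_total_gib = 14.75  # total capacity reported in the error message
for label, width in [("32-bit", 4), ("16-bit", 2)]:
    need = weight_memory_gib(n_params, width)
    print(f"{label} weights: ~{need:.1f} GiB of {gpu_total_gib} GiB available")
```

Under these assumptions the 32-bit weights alone are well above the reported GPU capacity, while 16-bit weights would fit only narrowly, leaving little room for generation and attribution.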

In [None]:
# loading llama model and tokenizer
llama_model, llama_tokenizer = load_llama()

## Evaluation Code

### Utils

In [None]:
# helper functions to prettify plots

# imports
from numpy import ndarray

# format the tokens by removing special tokens and special characters
def format_tokens(tokens: list):
    # define special tokens to remove
    special_tokens = ["[CLS]", "[SEP]", "[PAD]", "[UNK]", "[MASK]", "▁", "Ġ", "</w>"]

    # initialize empty list
    updated_tokens = []

    # loop through tokens
    for t in tokens:
        # remove special token from start of token if found
        if t.startswith("▁"):
            t = t.lstrip("▁")

        # loop through special tokens list and remove from current token if matched
        for s in special_tokens:
            t = t.replace(s, " ")

        # add token to list
        updated_tokens.append(t)

    # return the list of tokens
    return updated_tokens

# function to prettify the formatting of runtime seconds
def prettyfy_seconds(runtime_float):
  # formats seconds runtime in nice format
  runtime_str = time.strftime("%H:%M:%S", time.gmtime(runtime_float))
  return runtime_str

# function to flatten shap values into a 2d list by summing them up
def flatten_attribution(values: ndarray, axis: int = 0):
    return np.sum(values, axis=axis)
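
One caveat about `prettyfy_seconds`: because it goes through `time.gmtime`, runtimes of 24 hours or more wrap around to the next day. If very long runs are expected, a divmod-based formatter avoids this. The sketch below is an optional alternative and not used elsewhere in this notebook; `format_runtime` is a hypothetical name, and negative inputs are not handled.

```python
import time

def format_runtime(seconds: float) -> str:
    """Format a non-negative duration as HH:MM:SS without wrapping at 24 hours."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# 25 hours expressed in seconds
long_run = 25 * 3600
wrapped = time.strftime("%H:%M:%S", time.gmtime(long_run))  # gmtime-based variant
unwrapped = format_runtime(long_run)                         # divmod-based variant
print(wrapped, unwrapped)
```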

### Captum

In [None]:
# function to extract the sequence attribution,
# i.e. one attribution value per input token for the whole generated sequence
def cpt_extract_seq_att(attr, method:str):

  # extract values based on the given method
  if method.lower() in ('kernelshap', 'lime'):
    values=attr.seq_attr.to(torch.device("cpu")).numpy()
  else:
    values=flatten_attribution(attr.token_attr_np.transpose(), axis=1)

  # format the input tokens nicely and check for mismatch
  input_tokens = format_tokens(attr.input_tokens)
  if len(attr.input_tokens) != len(values): raise RuntimeError("values and input len mismatch")

  # return a list of tuples with token and value
  return list(zip(input_tokens,values))

def cpt_extract_tok_att(attr, method:str):
  if method.lower() in ('kernelshap', 'lime'):
    raise RuntimeError(f"{method} doesn't support token attribution.")

  return {
    'input_tokens': format_tokens(attr.input_tokens),
    'values': attr.token_attr_np.transpose(),
    'output_tokens': format_tokens(attr.output_tokens)
  }

In [None]:
# interpretability creation function for captum powered interpretability methods
from captum.attr import ShapleyValueSampling, KernelShap, Lime, TextTokenInput, LLMAttribution, LLMGradientAttribution

# captum llm attribution runner function
# CREDIT: Adapted from Miglani, V., Yang, A., Markosyan, A.H., Garcia-Olano, D. and Kokhlikyan, N., 2023.
## Using Captum to Explain Generative Language Models. arXiv preprint arXiv:2312.05491.
def captum_llm_attribution(method, test_input, model, tokenizer):
  # deciding which method to use
  match method.lower():
    case "shapley value sampling":
      # creating llm attribution class with a Shapley Value Sampling attribution instance
      llm_attribution = LLMAttribution(ShapleyValueSampling(model), tokenizer)
      # updating method to be tuple of package and method
      method = ("captum", "Shapley Value Sampling")
    case "kernelshap":
      # creating llm attribution class with a KernelSHAP attribution instance
      llm_attribution = LLMAttribution(KernelShap(model), tokenizer)
      # updating method to be tuple of package and method
      method = ("captum", "KernelSHAP")
    case "lime":
      # creating llm attribution class with a LIME attribution instance
      llm_attribution = LLMAttribution(Lime(model), tokenizer)
      # updating method to be tuple of package and method
      method = ("captum", "LIME")
    case _:
      # raising an exception for unknown methods
      raise ValueError(f"Interpretability method '{method}' could not be detected.")

  # setting input with the tokenizer that was handed over, running attribution with a timer
  attribution_input = TextTokenInput(test_input, tokenizer)
  runtime = time.time()
  attribution_result = llm_attribution.attribute(attribution_input)
  runtime = time.time()-runtime

  # returning result, lib/method tuple and runtime
  return attribution_result, method, runtime
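
To illustrate what Shapley Value Sampling estimates, here is a minimal pure-Python sketch of the permutation-sampling idea on a toy problem. It is not Captum's implementation: `toy_value` is a made-up stand-in for a model score, and the feature indices stand in for input tokens.

```python
import random

def shapley_value_sampling(value_fn, n_features, n_samples=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n_features
    for _ in range(n_samples):
        order = list(range(n_features))
        rng.shuffle(order)
        present = set()
        previous = value_fn(present)
        # add features one by one and credit each with the change in value
        for feature in order:
            present.add(feature)
            current = value_fn(present)
            totals[feature] += current - previous
            previous = current
    return [total / n_samples for total in totals]

def toy_value(present):
    """Hypothetical score: tokens 0 and 2 count alone, tokens 1 and 3 only together."""
    score = 0.0
    if 0 in present:
        score += 1.0
    if 2 in present:
        score += 0.5
    if 1 in present and 3 in present:
        score += 2.0
    return score

attributions = shapley_value_sampling(toy_value, n_features=4)
print(attributions)
```

In every ordering the marginal contributions add up to the difference between the full and the empty input, so the estimates always sum to that difference. Only the split between the interacting tokens 1 and 3 depends on the sampled orderings.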

#### Trial Running Code

In [None]:
# rough testing code for debugging

# getting attribution values, method and runtime
test_cpt_attr, test_cpt_method, test_cpt_runtime = captum_llm_attribution("Shapley Value Sampling", "Does Money buy happiness?", gpt_model, gpt_tokenizer)

# printing results using utilities
print(f"Ran {test_cpt_method[1]} with {test_cpt_method[0]} in {prettyfy_seconds(test_cpt_runtime)} (hh:mm:ss). See values below:")
print(f"Sequence attribution: {cpt_extract_seq_att(test_cpt_attr, test_cpt_method[1])}")
print(f"Token attribution: {cpt_extract_tok_att(test_cpt_attr, test_cpt_method[1])}")

### shap



In [None]:
# helper function for shap package powered interpretability methods

# imports
from shap import models, maskers

# wrapper around generative text models to make them usable by shap
def shap_llm_wrapper(model, tokenizer):
  # creates a new teacher forcing model based on the handed over model and tokenizer
  teacher_forcing_model = models.TeacherForcing(model, tokenizer, device="cpu")
  text_masker = maskers.Text(tokenizer, mask_token="...", collapse_mask_token=True)
  # returns new instances
  return teacher_forcing_model, text_masker

# function to extract summarized sequence wise attribution
def shap_extract_seq_att(shap_values):

  # extracting summed up shap values
  values = flatten_attribution(shap_values.values[0], 1)

  # returning list of tuples of token and value
  return list(zip(shap_values.data[0],values))

# function to extract token wise attribution
def shap_extract_tok_att(shap_values):

  # returning a combined standardized dict
  return {
      'input_tokens': shap_values.data[0].tolist(),
      'values': shap_values.values[0],
      'output_tokens': shap_values.output_names
  }
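
The text masker above is configured with `mask_token="..."` and `collapse_mask_token=True`. As a rough intuition, hidden tokens are replaced by the mask token, and runs of consecutive hidden tokens are merged into a single mask. The sketch below is a simplified illustration of that idea on plain word lists. It is not shap's implementation, and `mask_tokens` is a hypothetical helper.

```python
def mask_tokens(tokens, keep, mask_token="...", collapse=True):
    """Replace tokens where keep is False by mask_token, optionally merging adjacent masks."""
    masked = []
    last_was_mask = False
    for token, kept in zip(tokens, keep):
        if kept:
            masked.append(token)
            last_was_mask = False
        elif collapse and last_was_mask:
            # skip: the previous position already emitted a mask token
            continue
        else:
            masked.append(mask_token)
            last_was_mask = True
    return masked

tokens = ["Does", "money", "buy", "happiness", "?"]
keep = [True, False, False, True, True]
collapsed = mask_tokens(tokens, keep)
uncollapsed = mask_tokens(tokens, keep, collapse=False)
print(" ".join(collapsed))
print(" ".join(uncollapsed))
```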

In [None]:
# interpretability creation function for shap powered interpretability methods

# imports
from shap import PartitionExplainer, PermutationExplainer, KernelExplainer, DeepExplainer

# shap llm explanation runner function
def shap_llm_explanation(method, test_input, model, tokenizer):

  # calling wrapper for new wrapped model and a masker
  shap_model, shap_masker = shap_llm_wrapper(model, tokenizer)

  # deciding which method to use
  match method.lower():
    case "permutationshap":
      # creating Permutation Explainer class with wrapped model and new masker
      explainer = PermutationExplainer(shap_model, shap_masker)
      # updating method to be tuple of package and method
      method = ("shap", "PermutationSHAP")
    case "partitionshap":
      # creating Partition Explainer class from the raw model and tokenizer
      # (unlike above, the wrapped model and masker are not used here)
      explainer = PartitionExplainer(model, tokenizer)
      # updating method to be tuple of package and method
      method = ("shap", "PartitionSHAP")
    case _:
      # raising an exception for unknown methods
      raise ValueError(f"Interpretability method '{method}' could not be detected.")

  # calling explanations and calculating runtime
  runtime = time.time()
  shap_values = explainer([test_input])
  runtime = time.time()-runtime

  # returning result, new method tuple and runtime
  return shap_values, method, runtime
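
Both runner functions measure runtime by taking the difference of two `time.time()` calls. Since runtime is one of the comparison criteria, a monotonic clock such as `time.perf_counter` may be a slightly more robust choice, because it is not affected by system clock adjustments. A minimal sketch, where `timed_call` is a hypothetical helper not used elsewhere in this notebook:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# usage with a cheap stand-in for an explainer call
result, elapsed = timed_call(sum, [1, 2, 3])
print(f"result={result}, elapsed={elapsed:.6f}s")
```

In the runners above, the explainer call would take the place of `sum`, for example `timed_call(explainer, [test_input])`.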

#### Trial Running Code

In [None]:
# rough testing code for debugging

# getting shap values, method and runtime
test_shap_values, test_shap_method, test_shap_runtime = shap_llm_explanation(method="PartitionSHAP", test_input="Does money buy happiness?", model=gpt_model, tokenizer=gpt_tokenizer)

# printing results using utilities
print(f"Ran {test_shap_method[1]} with {test_shap_method[0]} in {prettyfy_seconds(test_shap_runtime)} (hh:mm:ss). See values below:")
print(f"Sequence attribution: {shap_extract_seq_att(test_shap_values)}")
print(f"Token attribution: {shap_extract_tok_att(test_shap_values)}")

## Code for Plotting, Result Collection and Running

### Helper and Plotting Functions

In [None]:
# plotting functions
from matplotlib.colors import LinearSegmentedColormap

def plot_seq(seq_values:list, method:tuple=("",""), file_path="", model_name:str="gpt-2"):

    # Separate the tokens and their corresponding importance values
    tokens, importance = zip(*seq_values)

    # Reverse the order of tokens and importance for correct plotting
    tokens = tokens[::-1]
    importance = importance[::-1]

    # Convert importance values to numpy array for conditional coloring
    importance = np.array(importance)

    # Determine the colors based on the sign of the importance values
    colors = ['#ff0051' if val > 0 else '#008bfb' for val in importance]

    # Create a bar plot
    height = len(tokens) * 0.5
    plt.figure(figsize=(10, height))
    y_positions = range(len(tokens))  # Positions for the bars

    # Creating horizontal bar plot
    plt.barh(y_positions, importance, color=colors, align='center')

    # Annotating each bar with its value
    padding = 0.1  # Padding for text annotation
    for y, (x, color) in enumerate(zip(importance, colors)):
        sign = '+' if x > 0 else ''
        plt.annotate(
            f'{sign}{x:.2f}',  # Format the value with sign
            xy=(x + padding if x > 0 else x - padding, y),
            va='center',
            color=color,
            ha='right' if x < 0 else 'left',  # Horizontal alignment
            fontweight='bold',  # Bold text
            bbox=dict(facecolor='white', edgecolor='none', boxstyle='round,pad=0.1')  # White background
        )

    plt.axvline(0, color='black', linewidth=1)
    plt.title(f'Input Token Attribution with {method[1]} on {model_name}')
    plt.xlabel('Attribution')
    plt.ylabel('Input Tokens', labelpad=0.5)
    plt.yticks(y_positions, tokens)  # Set the labels for the y-axis

    # Adjust x-axis limits to ensure there's enough space for labels
    x_min, x_max = plt.xlim()
    x_range = x_max - x_min
    plt.xlim(x_min - 0.1 * x_range, x_max + 0.1 * x_range)

    # Save the plot to a file if file_path is provided
    if file_path != "":
        file_path = f"{file_path}/{method[1]}_sequenceplot.png"
        plt.savefig(file_path)

    return plt, file_path

In [None]:
def plot_tok(token_attribution_dict:dict, method:tuple=("", ""), file_path="", model_name:str="gpt-2"):
    token_values = token_attribution_dict['values']
    input_tokens = token_attribution_dict['input_tokens']
    output_tokens = token_attribution_dict['output_tokens']

    # Ensure the dimensions of the attribution matrix match the token lists
    assert token_values.shape == (len(input_tokens), len(output_tokens)), \
        "Attribution matrix dimensions must match the lengths of token lists"

    # Define the custom colormap
    cmap = LinearSegmentedColormap.from_list(
        "custom_colormap",
        [(0, '#008bfb'), (0.5, "white"), (1, '#ff0051')]
    )

    # Set the width of the plot dynamically based on the number of output_tokens
    width = max(len(output_tokens) * 0.8, 10)  # Set a minimum width
    height = len(input_tokens) * 0.8

    # Create the heatmap with square boxes
    plt.figure(figsize=(width, height))
    im = plt.imshow(token_values, aspect='auto', cmap=cmap, norm=plt.Normalize(vmin=-1, vmax=1))

    # Annotate the cells with the numerical data
    for i in range(len(input_tokens)):
        for j in range(len(output_tokens)):
            plt.text(j, i, f"{token_values[i, j]:.2f}",
                     ha="center", va="center", color="black")

    # Setting the labels for axes
    plt.xlabel('Output Tokens')
    plt.ylabel('Input Tokens')
    plt.title(f'Input x Output Token Attribution Heatmap with {method[1]} on {model_name}')

    # Setting the tick labels for both axes
    plt.xticks(np.arange(len(output_tokens)), output_tokens, rotation=45, ha='right')
    plt.yticks(np.arange(len(input_tokens)), input_tokens)

    plt.colorbar(im)

    # Save the plot to a file if file_path is provided
    if file_path !="":
      file_path=f"{file_path}/{method[1]}_tokenplot.png"
      plt.savefig(file_path)

    return plt, file_path
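
The heatmap fixes the color scale to the range -1 to 1, so attribution values outside that range all receive the same saturated color. One option is to derive symmetric limits from the data, which keeps zero at the white midpoint of the colormap. The sketch below shows the idea on nested lists; `symmetric_limits` is a hypothetical helper, and whether a data-dependent scale is desirable depends on whether plots should be comparable across methods.

```python
def symmetric_limits(matrix, floor=1e-8):
    """Return (-limit, limit) where limit is the largest absolute value in the matrix."""
    largest = max((abs(value) for row in matrix for value in row), default=0.0)
    # the floor avoids a zero-width color range for all-zero input
    limit = max(largest, floor)
    return -limit, limit

example_values = [[0.2, -1.7], [0.5, 0.1]]
vmin, vmax = symmetric_limits(example_values)
print(vmin, vmax)
```

The two values could then be passed as `vmin` and `vmax` to `plt.Normalize` in place of the fixed -1 and 1.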

### Runner and Overall Controller Functions


In [None]:
# test running functions

# function that runs interpretability methods for each selected method
# collects the results
# and calls plotting runner if selected
def test_runner(methods:list, test_text:str, model, tokenizer,plots:bool, plot_mode:str="show", model_name:str="gpt-2"):
  # assert correct setting of plot mode
  assert plot_mode in ('save','show'), \
  "Mode not supported. Choose 'save' or 'show'."

  # creating a results dataframe
  result_data_raw = pd.DataFrame(columns=["framework","method tuple","runtime raw","runtime","seq attribution", "token attribution"], index=methods)

  # looping over all methods and creating explanations
  for method in methods:
    print(f"Running method {method}")
    match method.lower():
      case "partitionshap":
        # calling explanation function for results
        interpretation_result, method_tuple, runtime = shap_llm_explanation("PartitionSHAP", test_text, model, tokenizer)
        # adding various results together
        run_results = [method_tuple[0], method_tuple, runtime, prettyfy_seconds(runtime), shap_extract_seq_att(interpretation_result), shap_extract_tok_att(interpretation_result)]
      case "permutationshap":
        # calling explanation function for results
        interpretation_result, method_tuple, runtime = shap_llm_explanation("PermutationSHAP", test_text, model, tokenizer)
        # adding various results together
        run_results = [method_tuple[0],method_tuple, runtime,prettyfy_seconds(runtime), shap_extract_seq_att(interpretation_result), shap_extract_tok_att(interpretation_result)]
      case "kernelshap":
        # calling explanation function for results
        interpretation_result, method_tuple, runtime = captum_llm_attribution("KernelSHAP", test_text, model, tokenizer)
        # adding various results together
        run_results = [method_tuple[0],method_tuple, runtime,prettyfy_seconds(runtime), cpt_extract_seq_att(interpretation_result, method=method), ""]
      case "shapley value sampling":
        # calling explanation function for results
        interpretation_result, method_tuple, runtime = captum_llm_attribution("Shapley Value Sampling", test_text, model, tokenizer)
        # adding various results together
        run_results = [method_tuple[0],method_tuple, runtime,prettyfy_seconds(runtime), cpt_extract_seq_att(interpretation_result, method=method), cpt_extract_tok_att(interpretation_result, method=method)]
      case "lime":
        # calling explanation function for results
        interpretation_result, method_tuple, runtime = captum_llm_attribution("LIME", test_text, model, tokenizer)
        # adding various results together
        run_results = [method_tuple[0],method_tuple, runtime,prettyfy_seconds(runtime), cpt_extract_seq_att(interpretation_result, method=method), ""]
      case _:
        # raising exception of method not supported or set wrong
        raise Exception("Wrong method given!")
    # adding results of current run to dataframe
    result_data_raw.loc[method] = run_results

  # plotting code only runs when plots is set to True
  if plots:
    # calling plot runner to get plots and file paths
    result_data_raw, plots = plot_runner(result_data_raw, plot_mode, model_name)

    if plot_mode == "show":
      for i, (seq_plot, tok_plot) in enumerate(plots):
        print(f"Plotting Plots for {methods[i]} with {result_data_raw.iloc[i]['framework']}\n")
        seq_plot.show()
        print("\n")
        if tok_plot != "":
          tok_plot.show()

    if plot_mode == "save":
      print("Saved plots to the given file names! \n")

  # skip plotting if plots are turned off
  else:
    print("No plotting because plotting is turned off! \n")

  # final return of cleaned up dataframe
  return result_data_raw.drop(columns=['runtime raw','method tuple','seq attribution','token attribution'])

def plot_runner(result_data, plot_mode:str, model_name:str):
  assert plot_mode in ('save','show'), \
    "Mode not supported. Choose 'save' or 'show'."

  if plot_mode == "save":
    base_path = f"{model_name.lower()}"
  else: base_path = ""

  seq_file_paths=[]
  token_file_paths=[]
  plots = []

  # looping over results dataframe
  for label, row_data in result_data.iterrows():

    # creating the plots; the file path is set programmatically for saving
    # (base_path is empty in 'show' mode, so nothing is written to disk)
    seq_plot, seq_file_path = plot_seq(result_data.at[label, "seq attribution"], result_data.at[label, "method tuple"], file_path=base_path, model_name=model_name)
    if result_data.at[label, "token attribution"] != "":
      tok_plot, token_file_path = plot_tok(result_data.at[label, "token attribution"], result_data.at[label, "method tuple"], file_path=base_path, model_name=model_name)
    else: token_file_path, tok_plot = "", ""

    seq_file_paths.append(seq_file_path)
    token_file_paths.append(token_file_path)
    plots.append((seq_plot,tok_plot))

  if plot_mode == "save":
    result_data['seq plot path'] = seq_file_paths
    result_data['token plot path'] = token_file_paths

  return result_data, plots

## Run Command & Result Display


#### running with gpt-2

In [None]:
# using runner to get results, plots, etc. with gpt-2
gpt_test_results=test_runner(methods=["KernelSHAP", "LIME", "Shapley Value Sampling", "PartitionSHAP",], test_text="Does money buy happiness?", model=gpt_model, tokenizer=gpt_tokenizer, plots=True, plot_mode="save", model_name="gpt-2")
# save the test results as csv
gpt_test_results.to_csv("gpt-2/results.csv")

#### running with godel

In [None]:
# formatting function to format input for the model
# CREDIT: Adapted from official inference example on Huggingface
## see https://huggingface.co/microsoft/GODEL-v1_1-large-seq2seq
def gd_format_prompt(message: str="Does money buy happiness?", system_prompt: str="Given a dialog context, you need to respond empathically.", knowledge: str = ""):

    # adds knowledge text if not empty
    if knowledge != "":
        knowledge = "[KNOWLEDGE] " + knowledge

    # adds the message to the prompt
    prompt = f" {message}"
    # combines the entire prompt
    full_prompt = f"{system_prompt} [CONTEXT] {prompt} {knowledge}"

    # returns the formatted prompt
    return full_prompt

In [None]:
# using runner to get results, plots, etc. with godel
gd_test_results=test_runner(methods=["PartitionSHAP"], test_text=gd_format_prompt(), model=gd_model, tokenizer=gd_tokenizer, plots=True, plot_mode="save", model_name="godel")
# save the test results as csv
gd_test_results.to_csv("godel/results.csv")

#### running with mistral

In [None]:
# formatting function to format input for the model
# CREDIT: Inspired by official documentation and example on Huggingface
## see https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
def mistral_format_prompt(message: str="Does money buy happiness?", system_prompt: str="Given a dialog context, you need to respond empathically."):
    prompt = f"<s>[INST] {system_prompt} [/INST] Hello, how can I assist you today?</s>[INST] {message} [/INST]"
    return prompt

In [None]:
# using runner to get results, plots, etc. with mistral
mistral_test_results=test_runner(methods=["KernelSHAP", "LIME", "Shapley Value Sampling"], test_text=mistral_format_prompt(), model=mistral_model, tokenizer=mistral_tokenizer, plots=True, plot_mode="save", model_name="mistral")
# save the test results as csv
mistral_test_results.to_csv("mistral/results.csv")

NameError: name 'mistral_format_prompt' is not defined

#### running with llama 2

In [None]:
# formatting function to format input for the model
# CREDIT: Adapted from Philipp Schmid
## see https://www.philschmid.de/llama-2#how-to-prompt-llama-2-chat
def llama_format_prompt(message:str="Does money buy happiness?", system_prompt:str="Given a dialog context, you need to respond empathically."):
  prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{message} [/INST]"
  return prompt

In [None]:
# using runner to get results, plots, etc. with llama 2
llama_test_results=test_runner(methods=[ "KernelSHAP", "LIME", "Shapley Value Sampling"], test_text=llama_format_prompt(), model=llama_model, tokenizer=llama_tokenizer, plots=True, plot_mode="save", model_name="llama2")
# save the test results as csv
llama_test_results.to_csv("llama2/results.csv")