<a href="https://colab.research.google.com/github/DivyaSharma0795/Explainable_AI_Techniques_01/blob/main/Explainable_Techniques_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AIPI 590 - XAI | Assignment #02
### Explainable Techniques 01
### Divya Sharma

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/156md6_ROBkfRjyINhjMWQM1hVKhQu22E?usp=sharing)

# Introduction

This notebook aims to generate **local explanations** for predictions made by the **GPT-2** language model using **SHAP (SHapley Additive ExPlanations)**. SHAP is a powerful, model-agnostic explanation technique based on game theory that helps interpret how individual input features contribute to a model's output.

### Purpose
The goal is to:
- Understand how specific tokens in the input text influence GPT-2's predictions.
- Visualize and interpret the contributions of each token to the model's output.

### Overview
#### **Input**
- The input text used for this analysis is a manually selected sentence:  
  *"In a surprising discovery, scientists found evidence of life on Mars."*
- This input will be processed by GPT-2 to generate predictions for the next tokens.

#### **Model**
- **GPT-2**: A pre-trained transformer-based language model developed by OpenAI. It generates text by predicting the next token in a sequence based on the given input.
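
To make "predicting the next token" concrete, here is a minimal sketch of greedy generation. The `TOY_LOGITS` table and the helpers `softmax` and `generate_greedy` are hypothetical stand-ins invented for illustration. GPT-2 itself conditions on the whole sequence and scores a vocabulary of roughly 50k tokens, so this only mirrors the overall loop, not the model.

```python
import math

# Hypothetical toy "model": maps the last token to raw scores (logits)
# over a tiny vocabulary. These numbers are invented, not GPT-2 outputs.
TOY_LOGITS = {
    "life": {"on": 2.0, "in": 1.0, ".": 0.1},
    "on": {"Mars": 3.0, "Earth": 1.5, ".": 0.2},
    "Mars": {".": 2.5, "on": 0.3, "in": 0.2},
}


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def generate_greedy(prompt, max_new_tokens):
    """Repeatedly append the most probable next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = TOY_LOGITS.get(tokens[-1])
        if logits is None:  # the toy table has no entry for this token
            break
        probs = softmax(logits)
        tokens.append(max(probs, key=probs.get))
    return tokens


generated = generate_greedy(["evidence", "of", "life"], max_new_tokens=3)
print(generated)
```

SHAP later asks how much each prompt token moved the score of each token appended by a loop like this one.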

#### **Explanation Technique**
- **SHAP (SHapley Additive ExPlanations)**:
  - SHAP values explain the contribution of each input token to GPT-2's predicted output tokens.
  - The framework provides local explanations, enabling us to understand individual predictions in detail.
  - SHAP uses a game-theoretic approach to fairly allocate contributions among features (tokens).

By combining GPT-2 and SHAP, this notebook provides insights into how GPT-2 processes input text and generates output predictions.
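
The game-theoretic allocation can be illustrated with an exact Shapley computation on a tiny example. This is a sketch under stated assumptions: `toy_score` is a hypothetical value function with invented numbers (it is not a GPT-2 output), and `exact_shapley` enumerates every coalition, which is only feasible for a handful of tokens.

```python
from itertools import combinations
from math import factorial


def toy_score(visible):
    """Hypothetical score for a set of visible tokens (invented numbers)."""
    score = 0.0
    if "life" in visible:
        score += 1.0
    if "Mars" in visible:
        score += 2.0
    if "life" in visible and "Mars" in visible:
        score += 1.5  # interaction: the two tokens together add extra
    return score


def exact_shapley(players, value_fn):
    """Average each player's marginal contribution over all coalitions."""
    n = len(players)
    phi = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_player = value_fn(set(coalition) | {player})
                without_player = value_fn(set(coalition))
                total += weight * (with_player - without_player)
        phi[player] = total
    return phi


tokens = ["life", "on", "Mars"]
phi = exact_shapley(tokens, toy_score)
print(phi)
```

In this toy game the interaction bonus is split evenly between "life" and "Mars", while "on" receives zero because it never changes the score. That is the sense in which the allocation is "fair".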


# Step 0 - Importing necessary libraries

In [None]:
pip install transformers shap



In [None]:
# Import libraries for data handling, plotting, the GPT-2 model, and SHAP
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
import shap


# Step 1 - Load pre-trained GPT-2 model

In [None]:
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Step 2 - Data Preparation

In [None]:
input_text = ["In a surprising discovery, scientists found evidence of life on Mars."]

In [None]:
tokenized_input = tokenizer(input_text, return_tensors="pt")

# Step 3 - Explanation with SHAP

In [None]:
# a. Create SHAP Explainer
# Wrap the model and tokenizer for SHAP:
masker = shap.maskers.Text(tokenizer)
explainer = shap.Explainer(model, masker)
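
The masker is what lets SHAP hide parts of the input and observe how the model's output changes. Below is a minimal sketch of that idea, assuming a hypothetical `[MASK]` placeholder and a hypothetical helper `apply_mask`. It is not the implementation of `shap.maskers.Text`, whose handling of hidden tokens differs in its details.

```python
MASK = "[MASK]"  # hypothetical placeholder; the real masker's choice may differ


def apply_mask(tokens, keep):
    """Return the tokens with every position where keep is False hidden."""
    return [tok if flag else MASK for tok, flag in zip(tokens, keep)]


tokens = ["scientists", "found", "evidence", "of", "life"]
keep = [True, False, True, True, False]
masked = apply_mask(tokens, keep)
print(" ".join(masked))

# Each distinct keep-pattern is one "coalition" of visible tokens.
num_patterns = 2 ** len(tokens)
print(num_patterns)
```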

In [None]:
# b. Compute SHAP Values
# Generate SHAP values for the input text:
shap_values = explainer(input_text)




In [None]:
# c. Visualization
# Visualize the token contributions using SHAP's text plot:
shap.plots.text(shap_values)

### Explanation

Color Coding:

-   Red Tokens: These input tokens contribute positively, pushing the probability of the selected output token higher.
-   Blue Tokens: These input tokens contribute negatively, pushing the probability of the selected output token lower.
-   The intensity of the color (darker or lighter shades) represents the magnitude of the contribution. Darker colors indicate stronger influence.
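
The color rule above can be sketched as a small function. This is a hedged illustration only: `describe_contribution` is a hypothetical helper, the example values are invented, and the actual color map used by `shap.plots.text` is not reproduced here.

```python
def describe_contribution(value, max_abs):
    """Map a SHAP value to a (color, intensity) pair, intensity in [0, 1]."""
    if max_abs <= 0:
        return ("neutral", 0.0)
    intensity = min(abs(value) / max_abs, 1.0)
    if value > 0:
        return ("red", intensity)
    if value < 0:
        return ("blue", intensity)
    return ("neutral", 0.0)


# Invented example values: one strong positive, one weak negative, one zero.
values = {"surprising": 3.89, "life": -0.42, "a": 0.0}
scale = max(abs(v) for v in values.values())
styles = {tok: describe_contribution(v, scale) for tok, v in values.items()}
print(styles)
```

Scaling by the largest absolute value means the strongest contributor always gets full intensity, so colors are comparable within one plot but not across plots.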

Token Contributions:

-    Each token in the input text is highlighted with a background color based on its SHAP value.
-    Hovering over a token (in interactive environments) may display its exact SHAP value, which quantifies its contribution to the prediction.

Insights from SHAP Values

The visualization helps identify which parts of the input text drive specific predictions, offering transparency into black-box models like GPT-2.

For example, for the input "In a surprising discovery, scientists found evidence of life on Mars.", GPT-2 generates the following output:
"In a surprising discovery , scientists found evidence of life on Mars . The discovery , reported in"

Highlighting the generated token "discovery" shows that the input tokens "discovery", "found", and "of" contribute the most to it.
Highlighting the generated token "reported" shows that the input tokens "surprising" and "scientists" contribute the most, while "evidence" and "Mars" contribute less.


In [None]:
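# Pull the pieces of the explanation for the first (and only) input sentence.
# Assumes `shap_values` is the Explanation object computed in Step 3.
explanation = shap_values[0]
words = [str(token).strip() for token in explanation.data]  # input tokens
shap_values_2d = np.asarray(explanation.values)  # rows: input tokens, columns: generated tokens
base_values = np.ravel(explanation.base_values)  # one base value per generated token

# Repeat base values for all tokens
base_values_expanded = pd.DataFrame(
    [base_values] * len(words),
    columns=[f"Base Value {i+1}" for i in range(len(base_values))]
)

# Create DataFrame with one row per input token and one column per generated token
shap_df = pd.DataFrame(
    shap_values_2d,
    columns=[f"SHAP Value {i+1}" for i in range(shap_values_2d.shape[1])]
)
shap_df.insert(0, "Word", words)

# Display DataFrame
shap_df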


| | Word | SHAP Value 1 | SHAP Value 2 | SHAP Value 3 | SHAP Value 4 | SHAP Value 5 | SHAP Value 6 | SHAP Value 7 | SHAP Value 8 | SHAP Value 9 | ... | SHAP Value 11 | SHAP Value 12 | SHAP Value 13 | SHAP Value 14 | SHAP Value 15 | SHAP Value 16 | SHAP Value 17 | SHAP Value 18 | SHAP Value 19 | SHAP Value 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | In | -0.318086 | 0.53018 | 0.868297 | 0.82143 | 0.22569 | 0.186127 | 0.70444 | 0.363037 | 0.503315 | ... | 0.392923 | 0.562693 | 0.435686 | 0.489674 | -0.303653 | -0.212633 | -0.265021 | -0.129038 | 0.031456 | -0.008164 |
| 1 | a | 0.130955 | 1.236023 | 0.926706 | 0.215483 | 0.450126 | 0.352851 | 0.101417 | -0.015085 | -0.120312 | ... | 0.155189 | 0.191576 | 0.09272 | 0.011012 | 0.168196 | 0.164353 | 0.007899 | 0.144802 | 0.041589 | -0.000967 |
| 2 | surprising | -0.250735 | -0.181801 | 3.891355 | 1.183476 | 0.172291 | 0.591544 | 0.352257 | 0.068123 | 0.110026 | ... | 0.264895 | 0.26284 | 0.136181 | 0.230506 | 0.153968 | -0.071492 | -0.074915 | -0.083399 | 0.077541 | -0.026029 |
| 3 | discovery | -0.192919 | -0.314048 | 0.441944 | 4.144678 | 0.573475 | -0.297037 | 0.794054 | 0.518481 | 0.345888 | ... | 0.422566 | 0.369743 | 0.383088 | -0.07355 | 0.112697 | -0.034269 | 0.311221 | -0.063567 | -0.222745 | -0.008568 |
| 4 | `,` | 0.180429 | -0.176389 | -0.109411 | 0.272238 | 3.024683 | 0.28861 | 0.468709 | 0.037113 | 0.084993 | ... | 0.28985 | 0.223646 | 0.112873 | -0.114515 | -0.179929 | 0.039989 | 0.071334 | 0.13742 | 0.07454 | -0.000349 |
| 5 | scientists | -0.264523 | 0.586155 | 1.318712 | 1.276397 | 0.866859 | 4.314291 | 1.230593 | 0.353215 | 0.425911 | ... | 0.333236 | 0.47087 | 0.360451 | 0.384069 | 0.203486 | -0.082521 | -0.067161 | 0.015683 | 0.079045 | -0.036839 |
| 6 | found | 0.457571 | 0.173217 | -0.076996 | 0.900034 | 0.257231 | 0.208558 | 2.745776 | 1.252165 | 0.891528 | ... | 0.383844 | 0.434152 | 0.566634 | 0.274377 | 0.141853 | -0.028281 | 0.045828 | 0.013112 | 0.015287 | -0.038609 |
| 7 | evidence | -0.605855 | -0.07393 | 0.143234 | 0.194339 | 0.033863 | 0.006027 | -0.108696 | 3.302539 | 0.629119 | ... | 0.45829 | 0.857074 | 0.530316 | 0.121127 | 0.041091 | -0.054269 | -0.117171 | -0.25692 | -0.083507 | 0.07112 |
| 8 | of | -0.516366 | 0.082168 | 0.174816 | 0.348019 | 0.42003 | -0.053622 | 0.192361 | 0.701705 | 1.925421 | ... | 0.491613 | 0.516104 | 0.133427 | -0.020945 | -0.149202 | 0.050422 | 0.108885 | 0.031613 | 0.025701 | -0.015665 |
| 9 | life | -0.394077 | -0.190935 | -0.421398 | -0.117454 | -0.335222 | 0.120696 | -0.265968 | -0.199871 | -0.240616 | ... | 0.58147 | 0.086104 | -0.013064 | 0.034307 | -0.025558 | 0.008628 | -0.115043 | -0.054612 | -0.004312 | -0.013173 |
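
One way to read the table is to rank the input tokens by the size of their contribution to a single generated token. The sketch below does this for the "SHAP Value 4" column, using the ten rows shown above copied by hand. `top_contributors` is a hypothetical helper name, and ranking by absolute value is an assumption about what "most influential" should mean.

```python
# Values copied from the "SHAP Value 4" column of the table above
# (only the ten rows that the table displays).
shap_value_4 = {
    "In": 0.82143,
    "a": 0.215483,
    "surprising": 1.183476,
    "discovery": 4.144678,
    ",": 0.272238,
    "scientists": 1.276397,
    "found": 0.900034,
    "evidence": 0.194339,
    "of": 0.348019,
    "life": -0.117454,
}


def top_contributors(column, k=3):
    """Return the k (token, value) pairs with the largest absolute SHAP value."""
    ranked = sorted(column.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


top3 = top_contributors(shap_value_4)
print(top3)
```

With the full `shap_df` in memory, the same ranking could be obtained by sorting the relevant column instead of copying values by hand.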


### Additional Examples

### Example 2: "I am a data science student at Duke"

In [None]:
input_text = ["I am a data science student at Duke."]
tokenized_input = tokenizer(input_text, return_tensors="pt")
masker = shap.maskers.Text(tokenizer)
explainer = shap.Explainer(model, masker)
shap_values = explainer(input_text)
shap.plots.text(shap_values)



**Generated Text** - "I am a data science student at Duke . I am interested in the "

Here, the generated token "role" is driven by the input tokens "science", "student", and "at". The generated token "interested" is driven by the input tokens "data" and "science".

### Example 3: "It snowed in Durham today."

In [None]:
input_text = ["It snowed in Durham today."]
tokenized_input = tokenizer(input_text, return_tensors="pt")
masker = shap.maskers.Text(tokenizer)
explainer = shap.Explainer(model, masker)
shap_values = explainer(input_text)
shap.plots.text(shap_values)



**Generated Text** - "It snow ed in Durham today . " I 'm not sure if it 's snow ing in"

Here, the generated "snowing" (tokenized as "snow" + "ing") is driven by the input's "snowed" (tokenized as "snow" + "ed") and "today".
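
The split of "snowed" into "snow" and "ed" comes from subword tokenization. The sketch below shows the general idea with a greedy longest-match splitter over a hypothetical mini-vocabulary. This is an illustrative assumption, not GPT-2's tokenizer: GPT-2 uses byte-pair encoding with learned merge rules, which happens to produce a similar split for this word.

```python
# Hypothetical mini-vocabulary, invented for this example.
VOCAB = {"snow", "ed", "ing", "rain", "s", "n", "o", "w", "e", "d", "i", "g", "r", "a"}


def split_into_subwords(word, vocab):
    """Greedy longest-match split of a word into known pieces."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"cannot tokenize {word[start:]!r}")
    return pieces


snowed_pieces = split_into_subwords("snowed", VOCAB)
snowing_pieces = split_into_subwords("snowing", VOCAB)
print(snowed_pieces, snowing_pieces)
```

Because SHAP attributes at the token level, a word split into several pieces receives one SHAP value per piece rather than one per word.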

# Discussion

### Why SHAP Was Chosen
SHAP was selected for this analysis due to its unique strengths in providing interpretable explanations for complex machine learning models:
- **Local and Global Explanations:** SHAP offers both instance-specific (local) explanations and overall (global) insights into model behavior.
- **Model-Agnostic Approach:** SHAP can be applied to any machine learning model, including GPT-2, without requiring modifications to the model architecture.
- **Game-Theoretic Foundation:** SHAP is grounded in Shapley values from cooperative game theory, ensuring stability, fairness, and consistency in feature attributions.
- **Visualization Capabilities:** SHAP provides intuitive visualizations of token contributions, making it particularly suitable for NLP tasks like text generation.

### Strengths
- **Clear Insights:** SHAP allows us to pinpoint which input tokens contribute the most to specific predictions.
- **Additive Feature Attributions:** For each prediction, the base value plus the sum of the SHAP values equals the model's output, so every attribution can be read as a share of that output.
- **Token-Level Explanations:** For GPT-2, SHAP highlights how each input token influences the generation of specific output tokens.
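
The additive property can be checked by hand on a model simple enough to have closed-form SHAP values. The sketch below assumes a hypothetical linear model with invented weights and background data. For a linear model with features treated as independent, the SHAP value of feature i is `w_i * (x_i - mean_i)`. GPT-2 is not linear, so this only illustrates the property, not how the notebook's values were computed.

```python
# Hypothetical linear model and background data, invented for illustration.
weights = [0.5, -1.2, 2.0]
bias = 0.3
background = [
    [1.0, 0.0, 2.0],
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 1.0],
]


def predict(x):
    """Linear model: bias plus weighted sum of the features."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Base value: the average prediction over the background data.
base_value = sum(predict(row) for row in background) / len(background)

# Closed-form SHAP values for a linear model with independent features.
means = [sum(col) / len(col) for col in zip(*background)]
x = [4.0, 1.0, 3.0]
shap_values = [w * (xi - m) for w, xi, m in zip(weights, x, means)]

# Additivity: base value + sum of SHAP values reproduces the prediction.
reconstructed = base_value + sum(shap_values)
print(reconstructed, predict(x))
```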

### Limitations
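- **Computational Complexity:** SHAP is computationally expensive, especially for large models like GPT-2. Exact Shapley values require evaluating the model on every subset of input features (2^n subsets for n tokens), so in practice SHAP approximates them, and even the approximation needs many model evaluations.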
- **Memory Intensive:** Explaining predictions for long sequences or large datasets can demand significant memory and processing power.
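
To give a feel for the cost, the sketch below estimates Shapley values by sampling random token orderings instead of enumerating every subset. It is one generic approximation shown under stated assumptions: `toy_score` is a hypothetical value function with invented numbers, and this is not necessarily the strategy the `shap` library uses internally for text.

```python
import random


def toy_score(visible):
    """Hypothetical value function; the numbers are invented."""
    score = 0.0
    if "life" in visible:
        score += 1.0
    if "Mars" in visible:
        score += 2.0
    if "life" in visible and "Mars" in visible:
        score += 1.5
    return score


def sampled_shapley(players, value_fn, num_permutations, seed=0):
    """Estimate Shapley values from random orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        visible = set()
        previous = value_fn(visible)
        for player in order:
            # Marginal contribution of adding this player to those before it
            visible.add(player)
            current = value_fn(visible)
            totals[player] += current - previous
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


tokens = ["evidence", "of", "life", "on", "Mars"]
estimates = sampled_shapley(tokens, toy_score, num_permutations=200)
exact_coalitions = 2 ** len(tokens)  # subsets an exact computation must consider
print(estimates, exact_coalitions)
```

Each sampled ordering costs one model evaluation per token, so the budget is set by the number of orderings rather than by 2^n. With a real language model, every one of those evaluations is a forward pass.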

### Potential Improvements
- **Optimized SHAP Implementations:** Explore faster implementations of SHAP or limit the number of output tokens analyzed to reduce computational overhead.
- **Hybrid Techniques:** Combine SHAP with other explanation methods like Anchors to gain complementary insights (e.g., rule-based explanations).
- **Input Size Reduction:** Preprocess input text by truncating or summarizing it to focus on the most relevant parts for explanation.



# Conclusion

This notebook successfully demonstrated how SHAP can be used to explain predictions made by GPT-2. By analyzing the contributions of individual input tokens to specific output tokens, we gained valuable insights into how GPT-2 generates text.

### Findings
#### Example 1: Input - *"In a surprising discovery, scientists found evidence of life on Mars."*
GPT-2 generated the following output:  
*"In a surprising discovery , scientists found evidence of life on Mars . The discovery , reported in"*

1. For the generated token `"discovery"`:
   - The input tokens `"discovery"`, `"found"`, and `"of"` contributed the most to generating this word.
2. For the generated token `"reported"`:
   - The input tokens `"surprising"` and `"scientists"` had the highest contributions, while `"evidence"` and `"Mars"` contributed less.

#### Example 2: Input - *"I am a data science student at Duke."*
GPT-2 generated the following output:  
*"I am a data science student at Duke . I am interested in"*
1. For the generated token `"role"`:
   - The input tokens `"science"`, `"student"`, and `"at"` were key contributors.
2. For the generated token `"interested"`:
   - The input tokens `"data"` and `"science"` were the primary drivers.

#### Example 3: Input - *"It snowed in Durham today."*
GPT-2 generated the following output:
*"It snow ed in Durham today . " I 'm not sure if it 's snowing in"*
1. For the generated token `"snowing"`:
   - The input tokens `"snowed"` and `"today"` contributed significantly.

### Summary
SHAP provided meaningful explanations for GPT-2's predictions by highlighting token-level contributions. These insights can help debug models, improve interpretability, and build trust in AI systems. Future work could involve optimizing SHAP for large-scale NLP tasks or exploring additional explanation techniques for deeper insights.
