# Explainable AI with SHAP
## Explainable Fake News Detection using Transformer-based Language Models

**Student Name:** Abdullah  
**Reg Number:** M24F0044DS009  
**Course:** Natural Language Processing  

### Objective
The objective of this notebook is to apply SHAP (SHapley Additive exPlanations) to interpret the predictions of the fine-tuned BERT model and identify important words contributing to fake news detection.
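For reference, the Shapley value of feature $i$ under a value function $v$ over the feature set $N$ is $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,[v(S \cup \{i\}) - v(S)]$. SHAP estimates these contributions for a model's output. The sketch below computes them exactly for a hypothetical three-word scoring function (`toy_score` and its weights are invented for illustration; this is not the BERT model). It also checks the additivity property: the base value plus the sum of the contributions equals the output on the full input.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy "model": scores the set of words present in a text.
# Weights and the interaction term are invented for illustration only.
WEIGHTS = {"shocking": 0.4, "breaking": 0.2, "report": -0.1}

def toy_score(present):
    """Return a made-up fake-news score for a set of present words."""
    score = 0.1 + sum(WEIGHTS[w] for w in present)
    # Interaction: 'shocking' and 'breaking' together add extra weight.
    if "shocking" in present and "breaking" in present:
        score += 0.1
    return score

def exact_shapley(features, value_fn):
    """Compute exact Shapley values by enumerating every coalition."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

words = ["shocking", "breaking", "report"]
phi = exact_shapley(words, toy_score)
base = toy_score(set())
full = toy_score(set(words))

for w in words:
    print(f"{w:>9}: {phi[w]:+.3f}")
# Additivity: base value + sum of contributions equals the full-input output.
print(f"base {base:.3f} + sum {sum(phi.values()):.3f} = full {full:.3f}")
```

The interaction bonus is split evenly between "shocking" and "breaking", which is the fairness property that makes Shapley values attractive for attributing a prediction to individual words.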


In [2]:
# Import Libraries

import shap
import torch
import numpy as np
import pandas as pd

from transformers import BertTokenizer, BertForSequenceClassification




In [3]:
# Check device GPU/CPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


Using device: cpu


In [4]:
# Load Test Dataset

test_df = pd.read_csv("../data/shorttextpreprocessedtest.csv")
test_df.head()


| Unnamed: 0 | text | label |
|---|---|---|
| 0 | torrance named europe s fifth ryder cup vice c... | 0 |
| 1 | i have never asked for a single earmark pork b... | 0 |
| 2 | hitting the media center to recap strong debat... | 0 |
| 3 | creflo dollar needed a million gulfstream g to... | 0 |
| 4 | | 0 |
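Row 4 above has an empty `text` field. As a minimal, self-contained illustration (a small in-memory CSV stands in for the project file), pandas reads an empty field as `NaN`, and `astype(str)` would turn it into the literal string `"nan"`. That is why missing texts should be dropped before casting to strings.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for the real test file (illustrative data only).
csv_text = "text,label\nsome headline,0\n,0\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df["text"].isna().sum())          # number of missing texts
print(df["text"].astype(str).tolist())  # NaN becomes the string 'nan'

# Drop missing texts first, then cast; .copy() avoids SettingWithCopyWarning.
clean = df[df["text"].notna()].copy()
clean["text"] = clean["text"].astype(str)
print(clean["text"].tolist())
```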


In [5]:
# Load Trained Model & Tokenizer

model_path = "../models/bert_fake_news_model"

tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)
model.to(device)
model.eval()


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

In [6]:
# Define Prediction Function for SHAP

def predict_proba(texts):
    # SHAP sometimes passes a numpy array of strings; convert it to a list
    if isinstance(texts, np.ndarray):
        texts = texts.tolist()

    # Ensure every element is a string
    texts = [str(t) for t in texts]

    encodings = tokenizer(
        texts,
        truncation=True,
        padding=True,
        max_length=128,
        return_tensors="pt"
    )

    input_ids = encodings['input_ids'].to(device)
    attention_mask = encodings['attention_mask'].to(device)

    with torch.no_grad():
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask
        )

    probs = torch.nn.functional.softmax(outputs.logits, dim=1)
    return probs.cpu().numpy()
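SHAP's text explainer calls the prediction function with batches of (masked) strings and expects back a 2-D array of shape `(batch_size, n_classes)`. The sketch below checks that contract using a hypothetical keyword-based stand-in (`stub_logits` and `predict_proba_stub` are invented for illustration, not the fine-tuned BERT), so it runs without loading the model. The softmax step plays the same role as `torch.nn.functional.softmax(..., dim=1)` in `predict_proba`.

```python
import numpy as np

def stub_logits(texts):
    """Hypothetical stand-in for BERT logits: two classes, keyword-driven."""
    rows = []
    for t in texts:
        hits = sum(w in t.lower() for w in ("shocking", "breaking"))
        rows.append([1.0, float(hits)])  # [class 0 logit, class 1 logit]
    return np.array(rows)

def predict_proba_stub(texts):
    # SHAP may pass a numpy array of strings; normalise to a list of str.
    if isinstance(texts, np.ndarray):
        texts = texts.tolist()
    texts = [str(t) for t in texts]
    logits = stub_logits(texts)
    # Numerically stable softmax over the class axis.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

probs = predict_proba_stub(np.array(["Shocking breaking news", "Quarterly report"]))
print(probs.shape)
print(probs.sum(axis=1))
```

Running the real `predict_proba` on a couple of strings and checking the same shape and row sums is a cheap sanity test before launching the slower SHAP run.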



In [7]:
# Initialize SHAP Text Explainer

explainer = shap.Explainer(
    predict_proba,
    masker=shap.maskers.Text(tokenizer)
)
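Roughly, `shap.maskers.Text` hides subsets of tokens, and the explainer measures how the predicted probabilities change when they are hidden. The pure-Python sketch below only illustrates what such masked inputs look like. It is a simplification under stated assumptions: it splits on whitespace and always uses `[MASK]` as the replacement, whereas the real masker works on the tokenizer's tokens and, as we understand it, groups them hierarchically for `PartitionExplainer`. The helper `mask_tokens` is hypothetical.

```python
def mask_tokens(tokens, keep, mask_token="[MASK]"):
    """Replace every token whose keep-flag is False with mask_token."""
    return " ".join(t if k else mask_token for t, k in zip(tokens, keep))

tokens = "officials confirm the new policy".split()
# Two example coalitions: hide the third token, then hide all but the first.
masked_a = mask_tokens(tokens, [True, True, False, True, True])
masked_b = mask_tokens(tokens, [True, False, False, False, False])
print(masked_a)
print(masked_b)
```

Each masked string would then be passed to the prediction function, and the resulting change in class probabilities is what SHAP attributes to the hidden tokens.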


In [8]:
# Select Sample Texts for Explanation

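# Drop rows with missing text before casting, so NaN does not become the string "nan";
# .copy() avoids pandas' SettingWithCopyWarning on the next assignment.
test_df = test_df[test_df['text'].notna()].copy()
test_df['text'] = test_df['text'].astype(str)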

sample_texts = test_df['text'].sample(3, random_state=42).tolist()
sample_texts


['rt for michelle obama like for melania trump trumppresident electionnight 🇺🇸',
 'sourceshaveconfirmed that trump will lock her ass up pic twitter com deovuedzv mags for maga october',
 'can t wait to see american racism laid out in a map shape electionnight']

In [9]:
# Generate SHAP Values

shap_values = explainer(sample_texts)


PartitionExplainer explainer: 4it [00:35, 17.62s/it]               
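To read a single explanation numerically rather than only visually, one option is to sort its tokens by SHAP value for one class. The helper below is a hypothetical sketch (`top_tokens` is not a SHAP function). It assumes, as with SHAP text explanations, that `shap_values[i].data` holds the tokens and `shap_values[i].values` has shape `(n_tokens, n_classes)`; the demo uses made-up arrays so it runs standalone.

```python
import numpy as np

def top_tokens(tokens, values, class_idx=1, k=3):
    """Return the k (token, value) pairs that push hardest toward class_idx."""
    contrib = np.asarray(values)[:, class_idx]
    order = np.argsort(contrib)[::-1][:k]  # largest positive contributions first
    return [(str(tokens[i]), float(contrib[i])) for i in order]

# Made-up tokens and SHAP values (n_tokens x 2 classes), for illustration only.
tokens = np.array(["breaking", " news", " shocking", " claim"])
values = np.array([[-0.10, 0.10], [0.02, -0.02], [-0.30, 0.30], [-0.05, 0.05]])

print(top_tokens(tokens, values, class_idx=1, k=2))
```

With the notebook's variables this would be called as `top_tokens(shap_values[0].data, shap_values[0].values)`. Which class index corresponds to fake news depends on the label mapping used during fine-tuning, so check it before reading the ranking.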


In [10]:
# Visualize SHAP Explanations

shap.plots.text(shap_values)


In [11]:
# Explain prediction for a single instance
shap.plots.text(shap_values[0])


In [15]:
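import matplotlib.pyplot as plt

# shap.plots.text renders HTML, not a matplotlib figure, so plt.savefig would write a blank image.
# With display=False it returns the HTML as a string, which can be saved directly.
html = shap.plots.text(shap_values, display=False)
with open("../results/shap_explanation.html", "w", encoding="utf-8") as f:
    f.write(html)

# For a static PNG, plot per-token contributions for one sample and one class (index 1 here).
shap.plots.bar(shap_values[0, :, 1], show=False)
plt.savefig("../results/shap_explanation.png", dpi=300, bbox_inches="tight")
plt.close()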


## Interpretation of SHAP Results

- SHAP assigns each token a signed contribution to the predicted class probability; in the text plots, red tokens push the output up and blue tokens push it down.
- Sensational, emotionally charged, or misleading words often push predictions toward the fake news class.
- Neutral, factual terms tend to push predictions toward the real news class.
- These token-level explanations make the detector's decisions easier to inspect, which supports trust and transparency in automated fake news detection.
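The bullets above describe per-example (local) patterns. A common way to move toward a global view is to average the absolute SHAP value of each token across many explained samples. A minimal sketch, assuming each explanation is given as a `(tokens, values)` pair with `values` of shape `(n_tokens, n_classes)`; the helper `mean_abs_by_token` is hypothetical, and the token lists and values are made up, not results from this notebook.

```python
from collections import defaultdict

import numpy as np

def mean_abs_by_token(explanations, class_idx=1):
    """Average |SHAP value| per (lower-cased, stripped) token across samples."""
    totals, counts = defaultdict(float), defaultdict(int)
    for tokens, values in explanations:
        for tok, val in zip(tokens, np.asarray(values)[:, class_idx]):
            key = str(tok).strip().lower()
            if not key:
                continue  # skip empty or whitespace-only tokens
            totals[key] += abs(float(val))
            counts[key] += 1
    means = {t: totals[t] / counts[t] for t in totals}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Two made-up explanations: (tokens, values with shape n_tokens x 2 classes).
explanations = [
    (["Shocking", " news", ""], [[-0.3, 0.3], [0.0, 0.0], [0.01, -0.01]]),
    (["shocking", " report"], [[-0.1, 0.1], [0.5, -0.5]]),
]
print(mean_abs_by_token(explanations))
```

With real data, the input could be built as `[(shap_values[i].data, shap_values[i].values) for i in range(len(sample_texts))]`, ideally over far more than three samples for the averages to be meaningful.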


## Conclusion

SHAP provides local, token-level explanations for the BERT-based fake news detector; aggregating SHAP values over many test samples would extend this to a global view.
The explanations examined here suggest that the model's predictions are interpretable and aligned with linguistic patterns commonly observed in fake and real news.
This fulfills the explainability objective of the project.
