<a href="https://colab.research.google.com/github/Nouran-Khallaf/why-tough/blob/main/Classification_interpertability.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Captum --- integreted graidiants
To understand how each word contributes to the classification, we can use an interpretability technique like Integrated Gradients. The Captum library provides tools to calculate such attributions.

In [1]:
# Install Captum
!pip install captum




This imports the necessary libraries and modules required for the notebook.

- **`pandas`**: For data manipulation and analysis.  
- **`torch`**: For PyTorch functionality, including tensors and GPU support.  
- **`transformers`**: Provides pre-trained transformer models and tokenizers for NLP tasks.  
  - `AutoModelForSequenceClassification`, `AutoTokenizer`: Automatically load pre-trained models and tokenizers.
  - `BertForSequenceClassification`: Specifically for using BERT for classification tasks.  
- **`captum.attr`**: For explainable AI, including `LayerIntegratedGradients` and `LayerConductance` to compute attributions, and `visualization` to visualize results.  
- **`IPython.display`**: For rendering rich outputs like HTML or interactive visualizations in Jupyter notebooks.

---

In [2]:
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BertForSequenceClassification,BertTokenizer
from captum.attr import  LayerIntegratedGradients, visualization as viz, LayerConductance
from IPython.display import display, HTML


### Description:  
This setup prepares the tokenizer and model for the task, Here we load our Pre-trained Classifier model
but you can use anyother model. uncomment if you wish to use your own model


1. **Tokenizer Initialization**:  
   - Loads a pre-trained tokenizer (`BertTokenizer`) from the `bert-base-multilingual-cased` model.

2. **Model Initialization**:  
   - Loads the pre-trained `BertForSequenceClassification` model with `bert-base-multilingual-cased` configuration.
   - Configures the model for binary classification (`num_labels=2`).

3. **Model Weights Loading**:  
   - Loads the pre-trained weights for the model from a file named `'Basic_original_SpaCy_model_bert.pth'`.  



In [31]:
# Initialize the tokenizer and model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = BertForSequenceClassification.from_pretrained('bert-base-multilingual-cased', num_labels=2)
#model.load_state_dict(torch.load('Basic_original_SpaCy_model_bert.pth', map_location=device))
model.to(device)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-multilingual-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(119547, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1

The `XAI` (Explainable AI) class is designed to provide explanations for predictions made by a transformer-based classification model. It combines model interpretation techniques with visualization tools to analyse and understand the contribution of individual tokens to the model's predictions.
#### Key Components:

1. **Initialization (`__init__`)**:
   - Stores input text, label, tokenizer, model, and computation device.
   - Initializes placeholders for input IDs and reference input IDs (used for attributions).

2. **Input Construction (`construct_input_ref`)**:
   - Creates tokenized inputs and reference (baseline) inputs for the model.
   - Reference inputs are padded versions of the input text, used as a baseline in attribution methods.

3. **Custom Forward Pass (`custom_forward`)**:
   - Defines a forward function for the model that outputs softmax probabilities for classification.

4. **Attribution Computation (`compute_attributions`)**:
   - Computes token-level attributions using Layer Integrated Gradients (LIG) from Captum.
   - Normalizes attributions for consistency and returns them alongside the input tokens.

5. **Prediction Probabilities (`predict_probabilities`)**:
   - Computes the model's classification probabilities for the input text.

6. **Top-K Attributed Tokens (`get_topk_attributed_tokens`)**:
   - Identifies the top \(k\) tokens with the highest attribution scores.
   - Returns a DataFrame containing the tokens, their indices, and attribution values.

7. **HTML Generation (`generate_html`)**:
   - Creates an interactive HTML visualization to display:
     - The model's prediction probabilities.
     - Input text with highlighted attributions (color-coded for positive/negative contributions).

---

#### Purpose:
This class enables detailed analysis of how each word in the input text influences the model's prediction. It provides both numerical insights (via `get_topk_attributed_tokens`) and visual representations (via `generate_html`). This makes it a powerful tool for debugging, understanding model behavior, and improving model transparency in NLP tasks.

In [13]:
class XAI:
    def __init__(self, text_, label_, tokenizer_, model_, device_):
        """
        Initialize the XAI class with text, label, tokenizer, model, and computation device.
        """
        self.text = text_  # Input text to analyze
        self.label = label_  # True label or target label
        self.tokenizer = tokenizer_  # Tokenizer for text preprocessing
        self.model = model_  # Model to explain
        self.device = device_  # Computation device (CPU or GPU)
        self.input_ids = None  # Tokenized input IDs
        self.ref_input_ids = None  # Reference (baseline) input IDs

    def construct_input_ref(self):
        """
        Create tokenized input and reference (baseline) input for the model.
        The reference input is a padded version of the text, used for attributions.
        """
        # Tokenize the input text (excluding special tokens like [CLS] and [SEP])
        text_ids = self.tokenizer.encode(self.text, add_special_tokens=False)

        # Create input with special tokens
        input_ids = [self.tokenizer.cls_token_id] + text_ids + [self.tokenizer.sep_token_id]

        # Create reference (baseline) input with padding tokens
        ref_input_ids = [self.tokenizer.cls_token_id] + [self.tokenizer.pad_token_id] * len(text_ids) + [self.tokenizer.sep_token_id]

        # Convert to tensors and move to the specified device
        self.input_ids = torch.tensor([input_ids], device=self.device)
        self.ref_input_ids = torch.tensor([ref_input_ids], device=self.device)

        return self.input_ids, self.ref_input_ids

    def custom_forward(self, inputs):
        """
        Custom forward function to compute softmax probabilities for classification.
        """
        # Pass the input through the model and apply softmax to get probabilities
        return torch.softmax(self.model(inputs)[0], dim=1)[0]

    def compute_attributions(self):
        """
        Compute token-level attributions using Layer Integrated Gradients (LIG).
        Returns normalized attributions and corresponding tokens.
        """
        # Generate tokenized input and reference input
        self.input_ids, self.ref_input_ids = self.construct_input_ref()
        # Convert input IDs back to tokens
        self.tokens = self.tokenizer.convert_ids_to_tokens(self.input_ids[0])

        # Initialize Layer Integrated Gradients (LIG) with the model's embeddings
        lig = LayerIntegratedGradients(self.custom_forward, self.model.bert.embeddings)

        # Compute attributions using LIG
        attributions, delta = lig.attribute(
            inputs=self.input_ids,  # Tokenized input
            baselines=self.ref_input_ids,  # Baseline input
            n_steps=500,  # Number of steps along the path
            internal_batch_size=3,  # Batch size for internal computation
            return_convergence_delta=True  # Return convergence delta
        )

        # Sum attributions across the embedding dimensions
        attributions = attributions.sum(dim=-1).squeeze()
        # Normalize attributions for consistency
        normalized_attributions = attributions / torch.norm(attributions)

        return normalized_attributions, self.tokens

    def predict_probabilities(self):
        """
        Predict the probabilities for the input text using the model.
        """
        # Compute probabilities using the custom forward function
        outputs = self.custom_forward(self.input_ids)
        return outputs.tolist()

    def get_topk_attributed_tokens(self, attrs, k=5):
        """
        Identify the top-k tokens with the highest attribution scores.
        Returns a DataFrame with tokens, their indices, and attribution values.
        """
        # Ensure attributions are on the CPU
        attrs = attrs.cpu()
        # Get the top-k attribution values and their indices
        values, indices = torch.topk(attrs, k)
        # Map indices to tokens
        top_tokens = [self.tokens[idx] for idx in indices]
        # Create and return a DataFrame with results
        return pd.DataFrame({
            'Word': top_tokens,
            'Index': indices.cpu().numpy(),
            'Attribution': values.cpu().numpy()
        })

    def generate_html(self, attributions, tokens, probabilities):
        """
        Generate an interactive HTML visualization for:
        - Prediction probabilities
        - Input text with token-level attributions (color-coded)
        """
        # Create a color-coded visualization of tokens
        token_html = ""
        for token, score in zip(tokens, attributions):
            # Determine color intensity based on the attribution score
            color = f"rgba(255, 0, 0, {abs(score)})" if score < 0 else f"rgba(0, 0, 255, {abs(score)})"
            token_html += f"<span style='background-color: {color}; padding: 2px;'>{token} </span>"

        # Generate HTML with prediction probabilities and highlighted tokens
        html_content = f"""
        <div style="margin-bottom: 20px;">
            <h4>Prediction Probabilities</h4>
            <div>
                <div>Simple</div>
                <div style="width: 100%; height: 20px; background-color: #ddd; border-radius: 5px; margin: 5px 0;">
                    <div style="width: {probabilities[0] * 100}%; height: 100%; background-color: blue; border-radius: 5px;"></div>
                </div>
                <p>Probability: {probabilities[0]:.2f}</p>

                <div>Complex</div>
                <div style="width: 100%; height: 20px; background-color: #ddd; border-radius: 5px; margin: 5px 0;">
                    <div style="width: {probabilities[1] * 100}%; height: 100%; background-color: orange; border-radius: 5px;"></div>
                </div>
                <p>Probability: {probabilities[1]:.2f}</p>
            </div>

            <h4>Text with Highlighted Words</h4>
            <p>{token_html}</p>
        </div>
        """
        return html_content


For each sentence:
1. **HTML Visualization**: Displays probabilities and highlighted attributions.
   - Encapsulates the analysis process into the `analyze_and_display` function. This improves reusability and clarity.
   - Accepts a sentence, label, tokenizer, model, device, and an optional `top_k` parameter for flexibility.
2. **Top-K Tokens**: A DataFrame is displayed, showing the tokens with the highest attributions.
    - Defaults to displaying as many top tokens as there are words in the sentence. This makes the code adaptable for varying sentence lengths.
---


In [21]:
def analyze_and_display(sentence, label, tokenizer, model, device, top_k=None):
    """
    Analyze and display attribution and predictions for a given sentence.

    Args:
        sentence (str): The input sentence to analyze.
        label (int): The label associated with the sentence.
        tokenizer: The tokenizer for text processing.
        model: The classification model.
        device: The computation device (CPU or GPU).
        top_k (int): The number of top tokens to display. Defaults to the number of words in the sentence.
    """
    # Ensure the model is on the correct device
    model.to(device)

    # Initialize the XAI instance
    xai_instance = XAI(
        text_=sentence,
        label_=label,
        tokenizer_=tokenizer,
        model_=model,
        device_=device
    )

    # Compute attributions and tokens
    attributions, tokens = xai_instance.compute_attributions()

    # Predict probabilities
    probabilities = xai_instance.predict_probabilities()

    # Generate HTML visualization
    html_content = xai_instance.generate_html(attributions, tokens, probabilities)
    display(HTML(html_content))

    # Get top-k attributed tokens
    if top_k is None:
        top_k = len(sentence.split())
    top_tokens_df = xai_instance.get_topk_attributed_tokens(attributions, k=top_k)

    # Display the top tokens as a DataFrame
    display(top_tokens_df)



In [29]:
# List of sentences to analyze
sentence = "Provide financially sustainable care, giving security and stability to people and their carers."
# Perform analysis for  sentence
analyze_and_display(sentence, label=1, tokenizer=tokenizer, model=model, device=device)

Unnamed: 0,Word,Index,Attribution
0,sustainable,5,0.437581
1,and,10,0.179457
2,and,14,0.150541
3,##ly,4,0.129979
4,",",7,0.095494
5,to,12,0.019841
6,[SEP],19,0.0
7,[CLS],0,0.0
8,.,18,-0.016302
9,##rs,17,-0.05846


In [32]:
sentence = "Provide cheap care that helps people and their carers feel safe."
# Perform analysis for  sentence
analyze_and_display(sentence, label=1, tokenizer=tokenizer, model=model, device=device)

Unnamed: 0,Word,Index,Attribution
0,that,6,0.335154
1,their,10,0.327757
2,feel,13,0.30364
3,people,8,0.185139
4,helps,7,0.174657
5,.,15,0.01132
6,[SEP],16,0.0
7,[CLS],0,0.0
8,##ap,4,-0.0103
9,care,11,-0.098247


You can process multiple sentences at once using the following approach:

```python
# List of sentences to analyze
sentences = [
    "Provide financially sustainable care, giving security and stability to people and their carers.",
    "Provide cheap care that helps people and their carers feel safe."
]

# Perform analysis for each sentence
for sentence in sentences:
    analyze_and_display(sentence, label=1, tokenizer=tokenizer, model=model, device=device)
```

