# Proof-of-Concept: LIME Explanations for NLI Model

## 1. Load Dependencies

In [None]:
import sys
sys.path.append('../src')  # Make src/ importable so xai_utils can be found

import numpy as np
import torch
import lime.lime_text
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

from xai_utils import get_lime_predictor  # Project helper that wraps the model for LIME

%matplotlib inline

## 2. Load Fine-tuned Model and Tokenizer

In [None]:
model_dir = '../src/nli_model/'  # Path relative to the notebook location
tokenizer = DistilBertTokenizerFast.from_pretrained(model_dir)
model = DistilBertForSequenceClassification.from_pretrained(model_dir)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

print(f"Model loaded on: {device}")
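
LIME's `class_names` must be listed in the same order as the model's class indices, so it can help to inspect the label mapping stored with the checkpoint before writing the names by hand. The helper below is a hypothetical convenience function, not part of `transformers`. It assumes the mapping is a dict keyed by integer class index, as `model.config.id2label` is after loading. If the checkpoint was fine-tuned without explicit label names, the mapping will only contain placeholders such as `LABEL_0`, and the order has to come from the training code instead.

```python
def ordered_class_names(id2label):
    """Return label names sorted by class index, the order LIME's class_names expects."""
    return [id2label[i] for i in sorted(id2label)]


# Usage in this notebook (requires the loaded model):
# print(ordered_class_names(model.config.id2label))
```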

## 3. Prepare Sample Input

In [None]:
premise = "A man is playing a guitar."
hypothesis = "A person is making music."
text_instance = premise + " " + tokenizer.sep_token + " " + hypothesis

print(f"Input text instance for LIME: '{text_instance}'")

## 4. Initialize LIME TextExplainer and Predictor

In [None]:
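# The order must match the model's class indices (0, 1, 2); verify it against the fine-tuned checkpoint.
class_names = ['entailment', 'contradiction', 'neutral']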
explainer = lime.lime_text.LimeTextExplainer(class_names=class_names, bow=False, random_state=42)

# Get the predictor function from our utility module
predictor_fn = get_lime_predictor(model, tokenizer, device)
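
`get_lime_predictor` lives in `xai_utils`, which is not shown in this notebook. LIME only requires that it return a function mapping a list of strings to an array of class probabilities with shape `(n_samples, n_classes)`. The sketch below is one way such a helper might look. `split_pair` and `make_predictor` are hypothetical names, and the fallback for a missing separator is an assumption, not the project's actual behavior.

```python
def split_pair(text, sep_token="[SEP]"):
    """Split a LIME text instance back into (premise, hypothesis)."""
    premise, found, hypothesis = text.partition(sep_token)
    if not found:
        # LIME may have masked the separator; treat the whole text as the premise.
        return text.strip(), ""
    return premise.strip(), hypothesis.strip()


def make_predictor(model, tokenizer, device, sep_token="[SEP]"):
    """Hypothetical stand-in for get_lime_predictor; the real helper may differ."""
    import torch  # imported lazily so split_pair can be used without torch

    def predict(texts):
        pairs = [split_pair(t, sep_token) for t in texts]
        enc = tokenizer(
            [p for p, _ in pairs],
            [h for _, h in pairs],
            padding=True,
            truncation=True,
            return_tensors="pt",
        ).to(device)
        with torch.no_grad():
            logits = model(**enc).logits
        # LIME expects probabilities with shape (n_samples, n_classes).
        return torch.softmax(logits, dim=-1).cpu().numpy()

    return predict
```

Encoding premise and hypothesis as a sentence pair lets the tokenizer insert its own special tokens, rather than relying on the literal `[SEP]` string surviving LIME's perturbations.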

## 5. Generate LIME Explanation

In [None]:
# First, check the model's own prediction for this instance via predictor_fn
pred_probs = predictor_fn([text_instance])
predicted_label_idx = int(np.argmax(pred_probs[0]))
predicted_label_str = class_names[predicted_label_idx]
print("Model's direct prediction for LIME input:")
print(f"  Text: '{text_instance}'")
print(f"  Predicted Label: {predicted_label_str} (Index: {predicted_label_idx})")
print(f"  Probabilities: {pred_probs[0]}")

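# Generate the explanation for the predicted class.
# Without `labels`, explain_instance defaults to labels=(1,), i.e. class index 1 only.
explanation = explainer.explain_instance(
    text_instance,
    predictor_fn,
    labels=(int(predicted_label_idx),),
    num_features=10,
    num_samples=500  # LIME's default is 5000; 500 keeps this PoC fast but gives noisier weights
)
print("\nLIME explanation generated.")

## 6. Present Explanation

In [None]:
print("Displaying LIME explanation in notebook:")
explanation.show_in_notebook(text=True)

print("\nLIME explanation as a list of (word, weight) tuples:")
# as_list() also defaults to label=1, so pass the explained label explicitly
print(explanation.as_list(label=int(predicted_label_idx)))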

## 7. Interpretation Note for LIME Output

LIME (Local Interpretable Model-agnostic Explanations) explains the prediction of a specific instance by approximating the complex model locally with a simpler, interpretable model (e.g., a weighted linear model).
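
In the notation of the original LIME paper, this local approximation can be written as below. Here $f$ is the NLI model, $g$ is the interpretable surrogate chosen from a family $G$, $\pi_x$ is a proximity kernel around the instance $x$, and $\Omega$ penalizes complexity. The linear form of $g$ over binary word-presence indicators $z'$ is the usual choice for text and is stated here as an assumption about the default setup.

$$
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),
\qquad
g(z') = w_0 + \sum_{i=1}^{d} w_i \, z'_i
$$

The coefficients $w_i$ of the fitted surrogate are the per-word weights that the explanation reports.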

**How to Interpret the Output Above (`show_in_notebook`):**

- **Predicted Label and Probabilities:** The output shows the probability for each class on this instance, together with the word weights for each explained label. LIME explains only the labels requested from `explain_instance` (via `labels` or `top_labels`); if neither is set, it explains class index 1, which is not necessarily the model's predicted label.
- **Highlighted Text:** The input text is displayed with words highlighted.
  - **Color Coding:** Words are colored by the sign of their weight for the explained label.
    - One color marks words that contribute *towards* the explained label (positive weight).
    - A contrasting color marks words that contribute *against* it (i.e., they support other labels).
    - The exact palette depends on the LIME version and the class being explained, so read the colors against the accompanying bar chart. The intensity of the color often indicates the strength of the contribution.
  - **Word Importance:** The highlighted words are those that LIME's local linear model found to be most influential for the prediction of the instance. The `num_features` parameter in `explain_instance` controls how many such words are shown.

**How to Interpret the Output (`as_list()`):**

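- This provides a list of tuples, where each tuple is `(word, weight)`. The weights refer to the label passed to `as_list` (`label=1` by default), so that label must be one that `explain_instance` actually explained.
- `word`: The feature (word) identified by LIME. With `bow=False`, each occurrence of a word is a separate feature, so a repeated word such as `is` can appear more than once with different weights.
- `weight`: The coefficient of that word in LIME's local linear model, representing the importance and direction of its contribution to the explained label.
  - **Positive weights** indicate that the presence of the word increases the probability of the explained label.
  - **Negative weights** indicate that the presence of the word decreases the probability of the explained label (or increases the probability of other labels).
  - The magnitude of the weight indicates the strength of the contribution.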
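
One way to work with the `as_list()` output is to rank the pairs by magnitude and separate them by sign. The helper below is a hypothetical convenience function, not part of LIME; it assumes only that its input is a list of `(word, weight)` tuples.

```python
def split_by_sign(pairs):
    """Split (word, weight) pairs into supporting and opposing lists, strongest first."""
    ranked = sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)
    supporting = [(word, weight) for word, weight in ranked if weight > 0]
    opposing = [(word, weight) for word, weight in ranked if weight < 0]
    return supporting, opposing


# Usage in this notebook (requires the explanation object):
# label = explanation.available_labels()[0]
# supporting, opposing = split_by_sign(explanation.as_list(label=label))
```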

**Key Considerations:**
- **Locality:** LIME explanations are local. They explain why the model made a specific prediction for a *particular instance*, not how the model behaves globally.
- **Faithfulness vs. Interpretability:** LIME makes a trade-off. The local model is simpler and interpretable but is only an approximation of the complex model's behavior in that local region.
- **Perturbation Strategy:** LIME creates perturbed copies of the input in which a random subset of words is switched off, then observes how the model's predictions change. With the default `bow=True`, all occurrences of a word form one feature and switched-off words are deleted. With `bow=False`, each word position is its own feature and switched-off words are replaced by a mask string (`UNKWORDZ` unless `mask_string` is set), so word order is preserved. This suits transformer models better, but the masked inputs are still unnatural text, so the explanation remains a simplification of how transformers process text structure.
- **`[SEP]` Token:** LIME's default splitter (`\W+`) treats the square brackets as separators, so it sees `SEP` as an ordinary word that can be masked like any other. Perturbed samples may therefore lack an intact `[SEP]`, and the weight assigned to `SEP` partly reflects what happens when the boundary between premise and hypothesis is destroyed.
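
The masking behavior described above can be imitated in a few lines. This is a simplified sketch, not LIME's own implementation: it assumes the default `\W+` splitter and the default mask string, and it draws the masked positions with Python's `random` module rather than LIME's sampler. `split_pieces` and `perturb` are hypothetical names.

```python
import random
import re

MASK = "UNKWORDZ"  # LIME's default mask string when bow=False


def split_pieces(text):
    """Split text into alternating word and separator pieces, keeping both."""
    return [piece for piece in re.split(r"(\W+)", text) if piece]


def perturb(text, n_samples=3, seed=0):
    """Return masked copies of `text`; a simplified imitation, not LIME's own code."""
    rng = random.Random(seed)
    pieces = split_pieces(text)
    word_positions = [i for i, p in enumerate(pieces) if re.fullmatch(r"\w+", p)]
    samples = []
    for _ in range(n_samples):
        # Mask at least one word but never all of them.
        n_masked = rng.randint(1, len(word_positions) - 1)
        masked = set(rng.sample(word_positions, n_masked))
        samples.append("".join(MASK if i in masked else p for i, p in enumerate(pieces)))
    return samples


text = "A man is playing a guitar. [SEP] A person is making music."
words = [p for p in split_pieces(text) if re.fullmatch(r"\w+", p)]
samples = perturb(text)
for sample in samples:
    print(sample)
```

Because the brackets belong to the separator pieces, they survive every perturbation, while `SEP` itself is one of the maskable words.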