# LIME Analysis for News Bias Classification

This notebook explores using LIME (Local Interpretable Model-agnostic Explanations) to understand why our bias classification model makes certain predictions.

We'll focus on creating **user-friendly explanations** suitable for web applications rather than technical research graphs.

In [None]:
# Install required packages
!pip install lime
!pip install transformers tensorflow

In [None]:
import os
import sys
import numpy as np
import tensorflow as tf
from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
from lime.lime_text import LimeTextExplainer
import re
from typing import List, Dict, Tuple
import json

In [None]:
# Add the backend directory to path to import functions
sys.path.append('../backend')

# Load the bias classification model
BIAS_MODEL_DIR = "../backend/bias_classification_model"

try:
    bias_model = TFDistilBertForSequenceClassification.from_pretrained(BIAS_MODEL_DIR)
    bias_tokenizer = DistilBertTokenizer.from_pretrained(BIAS_MODEL_DIR)
    print("✅ News Bias model and tokenizer loaded successfully!")
except Exception as e:
    print(f"❌ Failed to load model: {e}")
    raise  # Stop here: later cells need bias_model and bias_tokenizer

In [None]:
# Define label meanings (update based on your actual model)
BIAS_LABEL_MEANINGS = {
    0: "Republican",
    1: "Liberal", 
    2: "Neutral",
    3: "Other"
}

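def clean_text(text: str) -> str:
    """Clean text for model input"""
    # Collapse runs of quote characters into a single double quote
    text = re.sub(r'[\'""]+', '"', text)
    # Assumption: the rest of this cell is cut off in the source, so this
    # version only collapses whitespace and returns the result
    text = re.sub(r'\s+', ' ', text)
    return text.strip()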

In [None]:
def predict_bias_probabilities(texts: List[str]) -> np.ndarray:
    """
    Predict bias probabilities for a list of texts.
    This function is required by LIME.
    
    Args:
        texts: List of text strings to classify
        
    Returns:
        numpy array of shape (n_samples, n_classes) with probabilities
    """
    predictions = []
    
    for text in texts:
        cleaned_text = clean_text(text)
        inputs = bias_tokenizer(
            cleaned_text, 
            return_tensors="tf", 
            padding="max_length", 
            truncation=True, 
            max_length=512
        )
        outputs = bias_model(inputs)
        probabilities = tf.nn.softmax(outputs.logits).numpy()[0]
        predictions.append(probabilities)
    
    return np.array(predictions)
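
LIME hands every perturbed sample (500 per explanation in this notebook) to the classifier function in a single call, so running them one at a time, each padded to 512 tokens, is likely to dominate the runtime. The sketch below shows one way to process the inputs in chunks. `predict_in_batches` and `predict_batch` are hypothetical names, and the stub classifier only stands in for the real model.

```python
from typing import Callable, List, Sequence


def predict_in_batches(
    texts: Sequence[str],
    predict_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Run `predict_batch` over `texts` in chunks and concatenate the rows."""
    rows: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        rows.extend(predict_batch(chunk))
    return rows


def stub_predict_batch(chunk: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for the model: two classes, keyed on one word."""
    return [[0.2, 0.8] if "radical" in t.lower() else [0.7, 0.3] for t in chunk]


texts = ["A radical plan", "A budget report", "Weather today"] * 5  # 15 texts
probs = predict_in_batches(texts, stub_predict_batch, batch_size=4)
print(len(probs), probs[0], probs[1])
```

In the notebook, `predict_batch` would presumably tokenize a whole list of texts with `padding=True` (pad to the longest text in the chunk) and apply the softmax over the last axis; that wiring is an assumption and is not shown here.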

In [None]:
# Test the prediction function
test_texts = [
    "The healthcare policy will provide universal coverage.",
    "The radical left-wing agenda is destroying our nation!"
]

predictions = predict_bias_probabilities(test_texts)
print("Prediction shape:", predictions.shape)
print("\nPredictions:")
for i, (text, pred) in enumerate(zip(test_texts, predictions)):
    predicted_label = np.argmax(pred)
    confidence = pred[predicted_label] * 100
    print(f"Text {i+1}: {BIAS_LABEL_MEANINGS[predicted_label]} ({confidence:.1f}%)")
    print(f"  Text: {text[:50]}...")
    print(f"  All probabilities: {pred}")
    print()
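
LIME relies on the classifier function returning one row per input text, with one probability per class. A small sanity check along the following lines can catch shape or normalization mistakes before running an explanation. `check_probability_rows` is a hypothetical helper, not part of LIME.

```python
import math
from typing import Sequence


def check_probability_rows(rows: Sequence[Sequence[float]], n_texts: int, n_classes: int) -> None:
    """Raise ValueError if `rows` is not an (n_texts, n_classes) table of probabilities."""
    if len(rows) != n_texts:
        raise ValueError(f"expected {n_texts} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if len(row) != n_classes:
            raise ValueError(f"row {i}: expected {n_classes} classes, got {len(row)}")
        if any(p < 0 or p > 1 for p in row):
            raise ValueError(f"row {i}: values outside [0, 1]")
        if not math.isclose(sum(row), 1.0, abs_tol=1e-4):
            raise ValueError(f"row {i}: probabilities sum to {sum(row):.4f}, not 1")


good = [[0.1, 0.6, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25]]
check_probability_rows(good, n_texts=2, n_classes=4)  # Passes silently

try:
    check_probability_rows([[0.5, 0.6, 0.2, 0.1]], n_texts=1, n_classes=4)
    bad_detected = False
except ValueError:
    bad_detected = True
print("bad rows detected:", bad_detected)
```

Because the helper only iterates and indexes, it should also accept the NumPy array returned by `predict_bias_probabilities`.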

In [None]:
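# Initialize LIME explainer
# LimeTextExplainer is classification-only and does not accept a `mode` argument
explainer = LimeTextExplainer(
    class_names=list(BIAS_LABEL_MEANINGS.values()),
    random_state=42  # Make the sampled perturbations reproducible
)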

print("✅ LIME explainer initialized!")
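
To make the idea behind the explainer concrete: LIME perturbs the input text by removing words, asks the classifier to score each perturbed copy, and fits a simple weighted linear model to those scores. The sketch below is a deliberately simplified stand-in, not LIME's algorithm: it replaces the weighted linear fit with a difference of mean predictions with and without each word. `word_attributions` and `toy_predict_prob` are hypothetical names used only for illustration.

```python
import random
import re
from typing import Callable, Dict, List


def word_attributions(
    text: str,
    predict_prob: Callable[[str], float],
    num_samples: int = 500,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate each word's effect as mean(prob | word kept) - mean(prob | word removed)."""
    rng = random.Random(seed)
    words = [w for w in re.split(r"\W+", text) if w]
    vocab = sorted(set(words))
    kept_scores: Dict[str, List[float]] = {w: [] for w in vocab}
    removed_scores: Dict[str, List[float]] = {w: [] for w in vocab}

    for _ in range(num_samples):
        # Randomly keep or drop each distinct word (bag-of-words style perturbation)
        kept = {w for w in vocab if rng.random() < 0.5}
        perturbed = " ".join(w for w in words if w in kept)
        prob = predict_prob(perturbed)
        for w in vocab:
            (kept_scores if w in kept else removed_scores)[w].append(prob)

    def mean(values: List[float]) -> float:
        return sum(values) / len(values) if values else 0.0

    return {w: mean(kept_scores[w]) - mean(removed_scores[w]) for w in vocab}


def toy_predict_prob(text: str) -> float:
    """Hypothetical classifier: probability of one class, driven by a single word."""
    return 0.9 if "radical" in text.lower().split() else 0.1


scores = word_attributions("The radical agenda is destroying our nation", toy_predict_prob)
top_word = max(scores, key=scores.get)
print(top_word, round(scores[top_word], 2))
```

The real explainer additionally weights each perturbed sample by its similarity to the original text, which this sketch omits.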

In [None]:
def analyze_text_with_lime(text: str, num_features: int = 10) -> Dict:
    """
    Analyze a text using LIME and return user-friendly explanations.
    
    Args:
        text: Text to analyze
        num_features: Number of most important features to show
        
    Returns:
        Dictionary with explanation data suitable for web display
    """
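    # Get model prediction first so LIME can explain the predicted class
    prediction = predict_bias_probabilities([text])[0]
    predicted_class = int(np.argmax(prediction))  # Plain int keeps the result JSON-serializable
    confidence = float(prediction[predicted_class] * 100)
    
    # Get LIME explanation for the predicted class
    # (without `labels`, LIME only explains label 1)
    explanation = explainer.explain_instance(
        text, 
        predict_bias_probabilities, 
        labels=(predicted_class,),
        num_features=num_features,
        num_samples=500  # Lower than LIME's default of 5000 for faster processing
    )
    
    # Extract feature importance for the predicted class
    feature_importance = explanation.as_list(label=predicted_class)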
    
    # Separate positive and negative influences
    positive_words = []
    negative_words = []
    
    for word, importance in feature_importance:
        if importance > 0:
            positive_words.append((word, importance))
        else:
            negative_words.append((word, abs(importance)))
    
    # Sort by importance
    positive_words.sort(key=lambda x: x[1], reverse=True)
    negative_words.sort(key=lambda x: x[1], reverse=True)
    
    # Create highlighted text
    highlighted_text = create_highlighted_text(text, feature_importance)
    
    # Generate simple explanation
    simple_explanation = generate_simple_explanation(
        predicted_class, positive_words, negative_words
    )
    
    return {
        'original_text': text,
        'predicted_class': predicted_class,
        'predicted_label': BIAS_LABEL_MEANINGS[predicted_class],
        'confidence': confidence,
        'highlighted_text': highlighted_text,
        'simple_explanation': simple_explanation,
        'positive_influences': positive_words[:5],  # Top 5
        'negative_influences': negative_words[:5],  # Top 5
        'all_probabilities': {BIAS_LABEL_MEANINGS[i]: float(prob * 100) for i, prob in enumerate(prediction)}
    }

In [None]:
def create_highlighted_text(text: str, feature_importance: List[Tuple[str, float]]) -> List[Dict]:
    """
    Create highlighted text data for frontend display.
    
    Returns list of dictionaries with word and importance level.
    """
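    # Create importance lookup keyed by lowercase token (LIME keeps the original casing)
    importance_dict = {word.lower(): importance for word, importance in feature_importance}
    
    # Split text into words (simple approach)
    words = text.split()
    highlighted = []
    
    for word in words:
        # LIME tokenizes on non-word characters, so "left-wing" becomes "left" and "wing";
        # split the same way and sum the importance of the pieces
        pieces = [p for p in re.split(r'\W+', word.lower()) if p]
        importance = sum(importance_dict.get(p, 0) for p in pieces)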
        
        # Determine highlight level
        if importance > 0.1:
            highlight_level = 'high_positive'
        elif importance > 0.05:
            highlight_level = 'medium_positive'
        elif importance < -0.1:
            highlight_level = 'high_negative'
        elif importance < -0.05:
            highlight_level = 'medium_negative'
        else:
            highlight_level = 'neutral'
        
        highlighted.append({
            'word': word,
            'importance': importance,
            'highlight_level': highlight_level
        })
    
    return highlighted
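
The fixed cut-offs of 0.05 and 0.1 assume that LIME weights land in a particular range, but their magnitude varies with the text and the model's confidence. One alternative, offered here as an assumption rather than the notebook's method, is to bucket each weight relative to the largest absolute weight in the explanation. The function name and the `high` and `medium` fractions are hypothetical.

```python
from typing import Dict


def relative_highlight_level(importance: float, max_abs: float,
                             high: float = 0.5, medium: float = 0.25) -> str:
    """Bucket a weight by its size relative to the largest absolute weight."""
    if max_abs <= 0:
        return "neutral"
    ratio = importance / max_abs
    sign = "positive" if ratio > 0 else "negative"
    if abs(ratio) >= high:
        return f"high_{sign}"
    if abs(ratio) >= medium:
        return f"medium_{sign}"
    return "neutral"


# Made-up weights that would all be 'neutral' under the fixed 0.05 / 0.1 cut-offs
weights: Dict[str, float] = {"radical": 0.04, "socialist": 0.015, "nation": -0.03, "the": 0.001}
max_abs = max(abs(v) for v in weights.values())
levels = {w: relative_highlight_level(v, max_abs) for w, v in weights.items()}
print(levels)
```

The trade-off is that relative buckets always highlight something, even when every weight is tiny, so a minimum absolute threshold may still be worth keeping alongside them.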

In [None]:
def generate_simple_explanation(predicted_class: int, positive_words: List, negative_words: List) -> str:
    """
    Generate a simple, user-friendly explanation of the model's decision.
    """
    label = BIAS_LABEL_MEANINGS[predicted_class]
    
    explanation = f"The model classified this text as '{label}'. "
    
    # Join the words outside the f-string: reusing the same quote type inside
    # an f-string expression is a syntax error before Python 3.12
    if positive_words:
        top_positive = "', '".join(word for word, _ in positive_words[:3])
        explanation += f"Words like '{top_positive}' pushed the prediction toward this label. "
    
    if negative_words:
        top_negative = "', '".join(word for word, _ in negative_words[:3])
        explanation += f"However, words like '{top_negative}' work against this classification. "
    
    return explanation.strip()

In [None]:
# Test with the problematic text
problematic_text = "The radical left-wing agenda is destroying our great nation! These socialist policies will bankrupt America and take away our freedoms. We must stop this madness before it's too late!"

print("Analyzing problematic text with LIME...")
print("This may take a minute...\n")

lime_result = analyze_text_with_lime(problematic_text)

print("=== LIME Analysis Results ===")
print(f"Predicted Label: {lime_result['predicted_label']}")
print(f"Confidence: {lime_result['confidence']:.1f}%")
print(f"\nSimple Explanation: {lime_result['simple_explanation']}")

print("\n=== Words Supporting This Classification ===")
for word, importance in lime_result['positive_influences']:
    print(f"  '{word}': {importance:.3f}")

print("\n=== Words Working Against This Classification ===")
for word, importance in lime_result['negative_influences']:
    print(f"  '{word}': {importance:.3f}")

print("\n=== All Class Probabilities ===")
for label, prob in lime_result['all_probabilities'].items():
    print(f"  {label}: {prob:.2f}%")
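
With only 500 perturbed samples, the weights LIME reports can shift from one run to the next. A rough stability check is to explain the same text twice and compare the sets of top-ranked words. The sketch below uses Jaccard overlap; the helper names are hypothetical and the two weight lists are made-up values, not model output.

```python
from typing import List, Set, Tuple

Weights = List[Tuple[str, float]]


def top_k_words(weights: Weights, k: int) -> Set[str]:
    """Return the k words with the largest absolute weight."""
    ranked = sorted(weights, key=lambda pair: abs(pair[1]), reverse=True)
    return {word for word, _ in ranked[:k]}


def top_k_overlap(a: Weights, b: Weights, k: int = 5) -> float:
    """Jaccard overlap of the top-k word sets of two explanations (1.0 = identical)."""
    set_a, set_b = top_k_words(a, k), top_k_words(b, k)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Made-up weights standing in for two `explanation.as_list()` results
run_a = [("radical", 0.15), ("socialist", 0.12), ("great", -0.08), ("nation", -0.06)]
run_b = [("radical", 0.14), ("socialist", 0.10), ("freedoms", -0.07), ("great", -0.05)]
overlap = top_k_overlap(run_a, run_b, k=3)
print(f"top-3 overlap: {overlap:.2f}")
```

A consistently low overlap would suggest raising `num_samples` before showing the explanation to users.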

In [None]:
# Display highlighted text (simulated for web)
print("\n=== Text Highlighting (for Web Display) ===")
print("Green = Supports classification, Red = Works against classification\n")

for word_data in lime_result['highlighted_text']:
    word = word_data['word']
    level = word_data['highlight_level']
    importance = word_data['importance']
    
    if level.endswith('positive'):
        color = '🟢' if 'high' in level else '🟡'
    elif level.endswith('negative'):
        color = '🔴' if 'high' in level else '🟠'
    else:
        color = '⚪'
    
    print(f"{color} {word}", end=" ")

print("\n\nLegend:")
print("🟢 Strongly supports classification")
print("🟡 Moderately supports classification")
print("🔴 Strongly works against classification")
print("🟠 Moderately works against classification")
print("⚪ Neutral/minimal impact")
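
The emoji output above only simulates the web view. On the server side, the same `highlighted_text` list could be turned into HTML spans, one per word, with the highlight level as a CSS class. This is a minimal sketch; the function name and the `lime-` class prefix are hypothetical, and the sample data is hand-written.

```python
from html import escape
from typing import Dict, List


def render_highlighted_html(highlighted: List[Dict]) -> str:
    """Wrap each word in a <span> whose CSS class is its highlight level."""
    spans = []
    for item in highlighted:
        word = escape(str(item["word"]))  # Escape user-supplied text
        level = escape(str(item["highlight_level"]), quote=True)
        spans.append(f'<span class="lime-{level}">{word}</span>')
    return " ".join(spans)


sample = [
    {"word": "The", "importance": 0.0, "highlight_level": "neutral"},
    {"word": "radical", "importance": 0.15, "highlight_level": "high_positive"},
    {"word": "<b>agenda</b>", "importance": -0.06, "highlight_level": "medium_negative"},
]
html_out = render_highlighted_html(sample)
print(html_out)
```

Escaping matters here because the analyzed text comes from users and would otherwise be injected into the page verbatim.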

In [None]:
# Test with neutral text
neutral_text = "The new healthcare policy introduced by the government aims to provide universal coverage for all citizens. This comprehensive reform will ensure that no one is left behind and healthcare becomes a fundamental right for everyone."

print("\n" + "="*50)
print("ANALYZING NEUTRAL TEXT")
print("="*50)

neutral_result = analyze_text_with_lime(neutral_text, num_features=8)

print(f"Predicted Label: {neutral_result['predicted_label']}")
print(f"Confidence: {neutral_result['confidence']:.1f}%")
print(f"\nSimple Explanation: {neutral_result['simple_explanation']}")

print("\n=== Most Influential Words ===")
for word, importance in neutral_result['positive_influences'][:3]:
    print(f"  '{word}': {importance:.3f} (supports classification)")
for word, importance in neutral_result['negative_influences'][:3]:
    print(f"  '{word}': {importance:.3f} (works against)")

## Web Implementation Strategy

Based on this analysis, here's how we should implement LIME in the web application:

### 1. **User-Friendly Approach**
- **Text Highlighting**: Color-code words based on their influence
- **Simple Explanations**: Plain English explanations of the model's decision
- **Top Influential Words**: Show 3-5 most important words with explanations

### 2. **API Response Structure**
```json
{
  "classification": {
    "label": "Liberal",
    "confidence": 65.2
  },
  "explanation": {
    "simple_text": "The model focused on words like 'radical', 'socialist'...",
    "highlighted_words": [
      {"word": "radical", "importance": 0.15, "highlight_level": "high_positive"},
      {"word": "left-wing", "importance": 0.12, "highlight_level": "high_positive"}
    ],
    "key_influences": {
      "supporting": [["radical", 0.15], ["socialist", 0.12]],
      "opposing": [["great", 0.08], ["nation", 0.06]]
    }
  }
}
```
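
The dictionary returned by `analyze_text_with_lime` uses different key names from this response structure, so some reshaping is needed before it is sent to the frontend. The following is one possible mapping, assuming the dictionary keys defined earlier in the notebook. `to_api_response` is a hypothetical helper and the sample input is hand-written.

```python
import json
from typing import Any, Dict


def to_api_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape an analysis result into the response structure shown above."""
    return {
        "classification": {
            "label": result["predicted_label"],
            "confidence": round(float(result["confidence"]), 1),
        },
        "explanation": {
            "simple_text": result["simple_explanation"],
            "highlighted_words": [
                {
                    "word": item["word"],
                    "importance": float(item["importance"]),
                    "highlight_level": item["highlight_level"],
                }
                for item in result["highlighted_text"]
            ],
            "key_influences": {
                # JSON has no tuple type, so (word, weight) pairs become two-element lists
                "supporting": [[w, float(v)] for w, v in result["positive_influences"]],
                "opposing": [[w, float(v)] for w, v in result["negative_influences"]],
            },
        },
    }


# Hand-written sample with the same keys as the analysis dictionary
sample_result = {
    "predicted_label": "Liberal",
    "confidence": 65.23,
    "simple_explanation": "The model focused on words like 'radical', 'socialist'...",
    "highlighted_text": [
        {"word": "radical", "importance": 0.15, "highlight_level": "high_positive"},
    ],
    "positive_influences": [("radical", 0.15), ("socialist", 0.12)],
    "negative_influences": [("great", 0.08), ("nation", 0.06)],
}
payload = json.dumps(to_api_response(sample_result), indent=2)
print(payload)
```

The explicit `float(...)` conversions are there because NumPy scalar types are not accepted by `json.dumps`.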

### 3. **Frontend Display**
- **Highlighted Text**: Show original text with color-coded words
- **Explanation Box**: Simple explanation in conversational language
- **Advanced Toggle**: Optional detailed view for technical users

This approach makes AI explanations accessible to general users while still providing the insights needed to debug model behavior.