# Inseq Feature Attribution API Visualization

This notebook demonstrates how to interact with the Inseq Feature Attribution API and visualize the results. The helper functions for interacting with the API can be found in `api_client.py`.

In [19]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import json
from IPython.display import display, HTML

# Import our helper functions
from api_client import (
    # API interaction functions
    submit_attribution_job,
    check_job_status,
    get_job_results,
    wait_for_job_completion,
    
    # Data processing functions
    create_attribution_dataframe,
    find_important_tokens,
    
    # Visualization functions
    visualize_attributions,
    plot_step_scores,
    plot_multiple_step_scores,
    visualize_attribution_flow,
    
    # Complete workflows
    run_attribution_workflow,
    compare_attribution_methods,
    
    # Reference data
    ATTRIBUTION_METHODS,
    STEP_SCORE_FUNCTIONS,
    RECOMMENDED_MODELS
)

## API Configuration

Set the base URL for the API. Update this if your API is running on a different host or port.

In [20]:
# Configure the API base URL
API_BASE_URL = "http://localhost:8000"
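
If the notebook is shared between machines, the base URL can be taken from an environment variable instead of being edited by hand. The sketch below is one possible approach; the variable name `INSEQ_API_BASE_URL` and the helper `resolve_api_base_url` are hypothetical and are not defined by the API or by `api_client.py`.

```python
import os

DEFAULT_API_BASE_URL = "http://localhost:8000"

def resolve_api_base_url(env=None, default=DEFAULT_API_BASE_URL):
    """Return the API base URL, preferring an environment override.

    INSEQ_API_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    url = env.get("INSEQ_API_BASE_URL", "").strip() or default
    # Drop trailing slashes so that f"{url}/path" never yields a double slash
    return url.rstrip("/")

API_BASE_URL = resolve_api_base_url()
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.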

## Available Attribution Methods and Step Scores

The API supports various attribution methods and step score functions. Let's explore the available options.

In [21]:
# Display available attribution methods
methods_df = pd.DataFrame.from_dict(ATTRIBUTION_METHODS, orient='index')
print("Available Attribution Methods:")
display(methods_df)

Available Attribution Methods:


| method | description | complexity | speed | special_params |
| --- | --- | --- | --- | --- |
| input_x_gradient | Input multiplied by gradient | Low | Fast | |
| saliency | Simple gradient attribution | Low | Fast | |
| integrated_gradients | Integrated Gradients method | Medium | Medium | n_steps |
| deep_lift | DeepLIFT method | Medium | Medium | n_steps |
| gradient_shap | GradientSHAP method | Medium | Medium | n_steps |
| layer_integrated_gradients | Layer Integrated Gradients | Medium | Medium | n_steps |
| layer_gradient_x_activation | Layer Gradient × Activation | Medium | Medium | |
| layer_deep_lift | Layer DeepLIFT | Medium | Medium | n_steps |
| attention | Attention weights | Low | Fast | |
| occlusion | Occlusion-based attribution | Medium | Slow | |


In [22]:
# Display available step score functions
scores_df = pd.DataFrame.from_dict(STEP_SCORE_FUNCTIONS, orient='index', columns=['Description'])
print("Available Step Score Functions:")
display(scores_df)

Available Step Score Functions:


| step score | Description |
| --- | --- |
| logit | Logit of the target token |
| probability | Probability of the target token |
| entropy | Entropy of output distribution |
| crossentropy | Cross entropy between target and logits |
| perplexity | Perplexity of target from logits |
| contrast_logits | Logit of a generation given contrastive context |
| contrast_prob | Probability given contrastive context |
| contrast_logits_diff | Difference between logits with contrastive inputs |
| contrast_prob_diff | Difference between probabilities with contrast... |
| pcxmi | Pointwise conditional cross-mutual information |


In [23]:
# Display recommended models
models_df = pd.DataFrame.from_dict(RECOMMENDED_MODELS, orient='index', columns=['Description'])
print("Recommended Models:")
display(models_df)

Recommended Models:


| model | Description |
| --- | --- |
| distilgpt2 | Lightweight GPT-2 model, works well on CPU |
| gpt2 | Standard GPT-2 model |
| facebook/opt-125m | Small OPT model |
| google/flan-t5-small | Small T5 model (encoder-decoder) |


## Example 1: Simple Attribution Workflow

This example demonstrates a complete attribution workflow, from submitting a job to visualizing the results.

In [None]:
# Run a simple attribution workflow
simple_results = run_attribution_workflow(
    input_text="Hello Ladies and ",
    model="distilgpt2",  # Using a smaller model for faster processing
    method="input_x_gradient",
    step_scores=["probability", "logit"],
    api_base_url=API_BASE_URL
)

## Example 2: Comparing Different Attribution Methods

This example shows how to compare different attribution methods on the same input text.

In [None]:
# Compare different attribution methods
comparison_results = compare_attribution_methods(
    input_text="Feature attribution helps explain model predictions.",
    methods=["input_x_gradient", "saliency", "attention"],  # Choose methods to compare
    model="distilgpt2",
    api_base_url=API_BASE_URL
)
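
One simple way to quantify how far two methods agree is to compare which source tokens each ranks highest for a given target token. The return format of `compare_attribution_methods` is not shown in this notebook, so the sketch below works on plain dictionaries with made-up scores; `top_k_tokens` and `top_k_overlap` are hypothetical helpers, not part of `api_client.py`.

```python
def top_k_tokens(scores, k=2):
    """Return the k source tokens with the largest absolute attribution."""
    ranked = sorted(scores, key=lambda tok: abs(scores[tok]), reverse=True)
    return set(ranked[:k])

def top_k_overlap(scores_a, scores_b, k=2):
    """Jaccard overlap between the top-k token sets of two methods (0 to 1)."""
    top_a, top_b = top_k_tokens(scores_a, k), top_k_tokens(scores_b, k)
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0

# Made-up attribution scores of one target token over four source tokens
saliency_scores = {"Feature": 0.40, "attribution": 0.35, "helps": 0.05, "explain": 0.20}
attention_scores = {"Feature": 0.10, "attribution": 0.50, "helps": 0.10, "explain": 0.30}

print(top_k_overlap(saliency_scores, attention_scores, k=2))
```

An overlap of 1 means both methods single out the same tokens; values near 0 suggest the methods disagree about what drives the prediction.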

## Example 3: Advanced Visualization - Attribution Flow

This visualization shows how attribution flows from source to target tokens using a network graph.

In [None]:
# Requires `simple_results` from Example 1; run that example first
if 'simple_results' in locals():
    print("Creating attribution flow visualization...")
    G = visualize_attribution_flow(simple_results["df_attribution"], threshold=0.1)
else:
    print("Please run Example 1 first to generate attribution results.")
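
The `threshold` argument suggests that only sufficiently strong source-to-target links end up in the graph. The internals of `visualize_attribution_flow` are not shown here, so the following is only a sketch of that filtering idea, assuming rows are source tokens and columns are target tokens. It uses a nested dictionary with made-up scores, and `attribution_edges` is a hypothetical helper.

```python
def attribution_edges(matrix, threshold=0.1):
    """List (source, target, score) edges whose |score| reaches the threshold.

    `matrix` maps each source token to a {target token: score} dict.
    """
    edges = []
    for source, row in matrix.items():
        for target, score in row.items():
            if abs(score) >= threshold:
                edges.append((source, target, score))
    # Strongest links first, so the most influential edges are easy to inspect
    return sorted(edges, key=lambda edge: abs(edge[2]), reverse=True)

# Made-up scores for illustration only
toy_matrix = {
    "Hello": {"gentlemen": 0.05, "!": 0.02},
    "Ladies": {"gentlemen": 0.62, "!": 0.08},
    "and": {"gentlemen": 0.31, "!": -0.15},
}
for source, target, score in attribution_edges(toy_matrix, threshold=0.1):
    print(f"{source} -> {target}: {score:+.2f}")
```

Raising the threshold thins the graph, which tends to help readability on longer inputs.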

## Example 4: Custom Attribution Job

This example shows how to execute each step of the attribution process individually, which gives you more control over the workflow.

In [None]:
# Define the input text and parameters
input_text = "Natural language processing has advanced significantly with transformers."
model = "distilgpt2"
method = "integrated_gradients"

# Step 1: Submit the attribution job
print(f"Submitting attribution job with method: {method}")
job_id = submit_attribution_job(
    input_text=input_text,
    model=model,
    method=method,
    step_scores=["probability", "entropy", "perplexity"],
    n_steps=20,  # Important for integrated_gradients
    force_cpu=True,
    api_base_url=API_BASE_URL
)
print(f"Job submitted with ID: {job_id}")

# Step 2: Wait for the job to complete
print("Waiting for job to complete...")
job_status = wait_for_job_completion(job_id, api_base_url=API_BASE_URL)
print(f"Job completed with status: {job_status['status']}")

# Step 3: Get the results
results = get_job_results(job_id, api_base_url=API_BASE_URL)

# Step 4: Create a DataFrame of attribution scores
df_attribution = create_attribution_dataframe(results)
print("\nAttribution Scores DataFrame:")
display(df_attribution)

# Step 5: Visualize the attribution scores
print("\nVisualizing attribution scores...")
visualize_attributions(df_attribution)

# Step 6: Plot the step scores
print("\nPlotting step scores...")
df_step_scores = plot_step_scores(results)

# Step 7: Plot each step score separately for clarity
if df_step_scores is not None and len(df_step_scores.columns) > 1:
    print("\nPlotting individual step scores...")
    plot_multiple_step_scores(df_step_scores)

# Step 8: Find the most important source token for each target token
df_important = find_important_tokens(df_attribution)
print("\nMost important source token for each target token:")
display(df_important)
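
Example 4 passes `n_steps` because the methods table lists it as a special parameter for `integrated_gradients`. The sketch below shows one way to add `n_steps` only for methods that list it. The set is copied from that table, `build_job_kwargs` is a hypothetical helper, and whether the API ignores or rejects `n_steps` for other methods is not stated in this notebook.

```python
# Methods whose `special_params` column lists n_steps in the table above
METHODS_WITH_N_STEPS = {
    "integrated_gradients",
    "deep_lift",
    "gradient_shap",
    "layer_integrated_gradients",
    "layer_deep_lift",
}

def build_job_kwargs(method, n_steps=20, **common):
    """Assemble keyword arguments for a job, adding n_steps only when relevant."""
    kwargs = dict(common, method=method)
    if method in METHODS_WITH_N_STEPS:
        kwargs["n_steps"] = n_steps
    return kwargs

print(build_job_kwargs("integrated_gradients", model="distilgpt2"))
print(build_job_kwargs("saliency", model="distilgpt2"))
```

The resulting dictionary could then be unpacked into a call such as `submit_attribution_job(input_text=input_text, **kwargs)`.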

## Example 5: Working with Step Scores

This example focuses specifically on using and analyzing different step scores.

In [None]:
def analyze_step_scores(input_text="Explain how feature attribution works."):
    """Run attribution with multiple step scores and analyze them."""
    # Submit a job with multiple step scores
    job_id = submit_attribution_job(
        input_text=input_text,
        model="distilgpt2",
        method="input_x_gradient",
        step_scores=["probability", "logit", "entropy", "perplexity"],
        force_cpu=True,
        api_base_url=API_BASE_URL
    )
    
    print(f"Job submitted with ID: {job_id}")
    job_status = wait_for_job_completion(job_id, api_base_url=API_BASE_URL)
    print(f"Job completed with status: {job_status['status']}")
    
    results = get_job_results(job_id, api_base_url=API_BASE_URL)
    
    # Extract step scores data
    sequence_attr = results["sequence_attributions"][0]
    target_tokens = [token["token"] for token in sequence_attr["target"]]
    
    # Create a dataframe for step scores
    step_scores_data = {}
    for score_name, scores in sequence_attr["step_scores"].items():
        step_scores_data[score_name] = scores
    
    df_scores = pd.DataFrame(step_scores_data, index=target_tokens)
    
    # Basic analysis of step scores
    print("\nBasic statistics for each step score:")
    display(df_scores.describe())
    
    # Find tokens with highest/lowest scores
    print("\nTokens with extreme values:")
    for column in df_scores.columns:
        max_token = df_scores[column].idxmax()
        min_token = df_scores[column].idxmin()
        print(f"{column}: Highest = '{max_token}' ({df_scores[column].max():.4f}), Lowest = '{min_token}' ({df_scores[column].min():.4f})")
    
    # Plot correlation matrix of step scores
    plt.figure(figsize=(8, 6))
    sns.heatmap(df_scores.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation Between Step Scores")
    plt.tight_layout()
    plt.show()
    
    # Plot step scores
    plot_step_scores(results)
    plot_multiple_step_scores(df_scores)
    
    return {
        "results": results,
        "df_scores": df_scores,
        "df_attribution": create_attribution_dataframe(results)
    }

# Run the step scores analysis
step_scores_analysis = analyze_step_scores()
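
To help read the correlation matrix, it can be useful to see how these step scores relate to each other for a single generation step. The sketch below uses textbook definitions with natural logarithms; the API's exact formulas (for example the logarithm base used for perplexity) may differ, and `step_scores_from_logits` is a hypothetical helper.

```python
import math

def step_scores_from_logits(logits, target_index):
    """Compute textbook step scores for one generation step."""
    # Softmax with max-subtraction for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    probability = probs[target_index]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    crossentropy = -math.log(probability)
    return {
        "logit": logits[target_index],
        "probability": probability,
        "entropy": entropy,
        "crossentropy": crossentropy,
        "perplexity": math.exp(crossentropy),
    }

scores = step_scores_from_logits([2.0, 1.0, 0.5, 0.1], target_index=0)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions perplexity equals `1 / probability`, so a strong negative relationship between the two columns is expected rather than informative. Entropy describes the whole output distribution and can move independently of the target token's probability.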

## Troubleshooting Model Compatibility

Some models may require special handling. Here are some common issues and solutions:

### Model Loading Issues

The API has been enhanced to handle various model loading scenarios, including:

1. **Flax Models**: Some models have Flax weights instead of PyTorch weights. The API will automatically try loading with `from_flax=True` if it detects this case.

2. **GPT-2 Variants**: Some GPT-2 models might require special handling. The API includes an alternative loading path for GPT-2 models.

3. **Resource Constraints**: If you're getting memory errors, try these options:
   - Use smaller models like 'distilgpt2' instead of larger ones
   - Set `force_cpu=True` in your request
   - Use simpler attribution methods like 'input_x_gradient' or 'saliency'

In [None]:
def test_model_compatibility(model="distilgpt2"):
    """Test if a model works with the API."""
    print(f"Testing compatibility with model: {model}")
    
    try:
        job_id = submit_attribution_job(
            input_text="This is a test of model compatibility.",
            model=model,
            method="input_x_gradient",  # Use a simple method for testing
            step_scores=["probability"],
            force_cpu=True,  # More reliable for testing
            api_base_url=API_BASE_URL
        )
        
        print(f"Job submitted with ID: {job_id}")
        job_status = wait_for_job_completion(job_id, api_base_url=API_BASE_URL)
        print(f"Job completed with status: {job_status['status']}")
        
        if job_status['status'] == 'completed':
            print(f"✅ Model '{model}' is compatible with the API.")
            return True
        else:
            print(f"❌ Model '{model}' is not compatible with the API.")
            print(f"Error: {job_status.get('error', 'Unknown error')}")
            return False
            
    except Exception as e:
        print(f"❌ Error testing model '{model}': {str(e)}")
        return False

# Uncomment to test a specific model
# is_compatible = test_model_compatibility("distilgpt2")
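
A compatibility check can stall if a model never finishes loading, so it may help to bound the wait. How `wait_for_job_completion` handles timeouts is not shown in this notebook; the sketch below is a generic polling loop with a deadline. `poll_until_done` is a hypothetical helper, `"completed"` is the status used above, and `"failed"` is an assumed name for a terminal error status.

```python
import time

def poll_until_done(get_status, interval=1.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports a terminal status or the timeout expires."""
    # "completed" appears in this notebook; "failed" is an assumed terminal status
    terminal = {"completed", "failed"}
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status.get("status") in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status.get('status')}' after {timeout} s")
        sleep(interval)
```

Assuming `check_job_status` accepts a job ID and returns a dictionary with a `status` key, it could be wrapped as `poll_until_done(lambda: check_job_status(job_id, api_base_url=API_BASE_URL))`.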

## Creating Visualizations for Reports or Presentations

This section shows how to create high-resolution (300 dpi) figures suitable for reports or presentations.

In [None]:
def create_report_visualizations(results_dict, output_prefix="report_viz"):
    """Create high-quality visualizations for reports from attribution results."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib.colors import TwoSlopeNorm
    
    if not results_dict or 'df_attribution' not in results_dict:
        print("No valid results provided. Please run an attribution example first.")
        return
    
    df_attribution = results_dict['df_attribution']
    
    # 1. Create a high-quality attribution heatmap
    plt.figure(figsize=(12, 10))
    
    # Use limits symmetric around zero so that a zero attribution maps to the colormap midpoint
    vmax = max(abs(df_attribution.values.min()), abs(df_attribution.values.max()))
    norm = TwoSlopeNorm(vmin=-vmax, vcenter=0, vmax=vmax)
    
    # Pass the norm alone: adding `center=0` also rescales the colormap and can shift the midpoint
    ax = sns.heatmap(df_attribution, cmap="coolwarm", annot=True, fmt=".2f",
                     linewidths=.5, xticklabels=True, yticklabels=True, norm=norm)
    
    plt.title("Feature Attribution: Source Tokens (Y) → Target Tokens (X)", fontsize=16)
    plt.xlabel("Generated Tokens", fontsize=14)
    plt.ylabel("Input Tokens", fontsize=14)
    
    # Rotate x labels for better readability
    plt.xticks(rotation=45, ha="right", fontsize=12)
    plt.yticks(fontsize=12)
    
    plt.tight_layout()
    plt.savefig(f"{output_prefix}_heatmap.png", dpi=300, bbox_inches="tight")
    plt.show()
    
    # 2. Create a step scores visualization if available
    if 'df_step_scores' in results_dict and results_dict['df_step_scores'] is not None:
        df_scores = results_dict['df_step_scores']
        
        plt.figure(figsize=(12, 8))
        
        for column in df_scores.columns:
            plt.plot(df_scores.index, df_scores[column], marker='o', linewidth=2, markersize=8, label=column)
        
        plt.xlabel("Generated Tokens", fontsize=14)
        plt.ylabel("Score", fontsize=14)
        plt.title("Step Scores by Token", fontsize=16)
        plt.xticks(rotation=45, ha="right", fontsize=12)
        plt.yticks(fontsize=12)
        plt.grid(True, linestyle='--', alpha=0.7)
        plt.legend(fontsize=12)
        
        plt.tight_layout()
        plt.savefig(f"{output_prefix}_step_scores.png", dpi=300, bbox_inches="tight")
        plt.show()
    
    # 3. Create a bar chart of important tokens
    df_important = find_important_tokens(df_attribution)
    
    plt.figure(figsize=(12, 8))
    bars = plt.bar(df_important['target_token'], df_important['score'], color=[(0, 0.5, 0.8) if x > 0 else (0.8, 0.2, 0.2) for x in df_important['score']])
    
    plt.xlabel("Generated Token", fontsize=14)
    plt.ylabel("Attribution Score of Most Important Input Token", fontsize=14)
    plt.title("Most Important Input Token for Each Generated Token", fontsize=16)
    plt.xticks(rotation=45, ha="right", fontsize=12)
    plt.yticks(fontsize=12)
    
    # Add the source token labels above the bars
    for i, bar in enumerate(bars):
        source_token = df_important.iloc[i]['source_token']
        score = df_important.iloc[i]['score']
        y_pos = score + 0.02*max(abs(df_important['score'])) if score > 0 else score - 0.08*max(abs(df_important['score']))
        plt.text(bar.get_x() + bar.get_width()/2, y_pos, 
                 f"'{source_token}'", ha='center', va='bottom', fontsize=10, rotation=45)
    
    plt.axhline(y=0, color='gray', linestyle='-', alpha=0.3)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    
    plt.tight_layout()
    plt.savefig(f"{output_prefix}_important_tokens.png", dpi=300, bbox_inches="tight")
    plt.show()
    
    print(f"Visualization files saved with prefix: {output_prefix}")

# Uncomment to create report visualizations (requires `simple_results` from Example 1)
# if 'simple_results' in locals():
#     create_report_visualizations(simple_results)
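
The heatmap code derives symmetric color limits from the minimum and maximum of the attribution matrix. That computation breaks down if the matrix contains NaN entries or consists only of zeros, because `TwoSlopeNorm` requires `vmin < vcenter < vmax`. Whether the API ever returns such matrices is not established here; the sketch below is a defensive variant, and `symmetric_limit` is a hypothetical helper.

```python
import math

def symmetric_limit(values, fallback=1.0):
    """Return v such that [-v, v] covers all finite values, ignoring NaN and inf."""
    finite = [abs(v) for v in values if math.isfinite(v)]
    limit = max(finite, default=0.0)
    # TwoSlopeNorm needs vmin < vcenter < vmax, so avoid a zero-width range
    return limit if limit > 0 else fallback

print(symmetric_limit([-0.2, 0.5, float("nan")]))
```

Inside `create_report_visualizations`, this could replace the `vmax` line with `vmax = symmetric_limit(df_attribution.values.ravel())`.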