# AgentSHAP: Explainability for AI Agents

This notebook demonstrates how to use **AgentSHAP** to explain AI agent behavior by analyzing which tools the agent relied on to produce its response.

## Table of Contents

1. [Setup](#setup)
2. [Agent Explainability Analysis](#analysis) - Run analysis and visualize tool attribution
3. [Deep Dive: Understanding Agent Behavior](#deep-dive) - Baseline, ablations, and SHAP calculation
4. [Comparing Tool Importance Across Prompts](#compare) - Same agent, different prompts

---

## 1. Setup <a name="setup"></a>

In [None]:
import sys
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Add parent directory to path
parent_dir = Path().resolve().parent
if str(parent_dir) not in sys.path:
    sys.path.insert(0, str(parent_dir))

from token_shap.base import OpenAIModel, OpenAIEmbeddings
from token_shap.tools import Tool, create_function_tool
from token_shap.agent_shap import AgentSHAP

In [None]:
# Read the OpenAI API key from the environment; the fallback below is only a placeholder to replace
import os
api_key = os.environ.get("OPENAI_API_KEY", "your-api-key-here")

# Initialize model and vectorizer (using OpenAI embeddings for semantic similarity)
model = OpenAIModel(model_name="gpt-4o-mini", api_key=api_key)
vectorizer = OpenAIEmbeddings(api_key=api_key, model="text-embedding-3-large")

print("Model and vectorizer initialized!")

## 2. Agent Explainability Analysis <a name="analysis"></a>

Let's create an agent with tools and explain which tools it relies on for a specific query.

In [None]:
# Define custom tools with their executors

def get_weather(args):
    """Simulate weather API"""
    city = args.get("city", "Unknown")
    # Simulated weather data
    weather_data = {
        "new york": "72°F, sunny with light clouds",
        "london": "58°F, overcast with light rain",
        "tokyo": "68°F, clear skies",
        "paris": "65°F, partly cloudy",
    }
    city_lower = city.lower()
    weather = weather_data.get(city_lower, "75°F, pleasant weather")
    return f"Weather in {city}: {weather}"

def get_stock_price(args):
    """Simulate stock API"""
    symbol = args.get("symbol", "UNKNOWN").upper()
    # Simulated stock data
    stock_data = {
        "AAPL": ("$178.52", "+1.2%"),
        "GOOGL": ("$141.80", "-0.5%"),
        "MSFT": ("$378.91", "+0.8%"),
        "TSLA": ("$248.50", "+2.1%"),
    }
    price, change = stock_data.get(symbol, ("$100.00", "+0.0%"))
    return f"{symbol}: {price} ({change})"

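def calculate(args):
    """Evaluate a basic arithmetic expression (demo only, not hardened)."""
    expression = args.get("expression", "0")
    try:
        # Whitelist digits, arithmetic operators, parentheses, and spaces
        allowed = set("0123456789+-*/.() ")
        if all(c in allowed for c in expression):
            # Evaluate with no builtins available; note that "**" still passes
            # the whitelist, so very large powers can be slow to compute
            result = eval(expression, {"__builtins__": {}}, {})
            return f"Result: {result}"
        return "Error: Invalid expression"
    except Exception as e:
        return f"Error: {e}"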

def search_news(args):
    """Simulate news search"""
    topic = args.get("topic", "general")
    return f"Latest news on {topic}: Market shows positive trends. Experts predict continued growth."

# Create Tool objects
weather_tool = create_function_tool(
    name="get_weather",
    description="Get current weather for a city",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    },
    executor=get_weather
)

stock_tool = create_function_tool(
    name="get_stock_price",
    description="Get current stock price for a ticker symbol",
    parameters={
        "type": "object",
        "properties": {
            "symbol": {"type": "string", "description": "Stock ticker symbol (e.g., AAPL, GOOGL)"}
        },
        "required": ["symbol"]
    },
    executor=get_stock_price
)

calculator_tool = create_function_tool(
    name="calculate",
    description="Perform mathematical calculations",
    parameters={
        "type": "object",
        "properties": {
            "expression": {"type": "string", "description": "Mathematical expression to evaluate"}
        },
        "required": ["expression"]
    },
    executor=calculate
)

news_tool = create_function_tool(
    name="search_news",
    description="Search for latest news on a topic",
    parameters={
        "type": "object",
        "properties": {
            "topic": {"type": "string", "description": "News topic to search"}
        },
        "required": ["topic"]
    },
    executor=search_news
)

tools = [weather_tool, stock_tool, calculator_tool, news_tool]
print(f"Created {len(tools)} tools: {[t.name for t in tools]}")

In [None]:
# Create AgentSHAP instance
agent_shap = AgentSHAP(
    model=model,
    tools=tools,
    vectorizer=vectorizer,
    max_iterations=5,
    debug=False
)

# Run analysis
prompt = "What's the weather in New York and how is AAPL stock doing today?"
print(f"Analyzing prompt: {prompt}\n")

results_df, shapley_values = agent_shap.analyze(
    prompt=prompt,
    sampling_ratio=0.5
)

### Tool Attribution Visualization

Just as TokenSHAP shows token importance, AgentSHAP shows tool importance (red = high, blue = low):

In [None]:
# Colored text visualization (like TokenSHAP's print_colored_text)
print("Tool importance (text color):")
agent_shap.print_colored_tools()

print("\nTool importance (background highlight):")
agent_shap.highlight_tools_background()

In [None]:
# Plot colored tools (like TokenSHAP's plot_colored_text)
agent_shap.plot_colored_tools()
plt.show()

In [None]:
# Bar chart with importance ranking
agent_shap.plot_tool_importance()
plt.show()

In [None]:
# Summary: Input, Output, Tools Used, SHAP Values
details = agent_shap.get_detailed_results()

print("=" * 70)
print("INPUT PROMPT:")
print("=" * 70)
print(details['prompt'])

print("\n" + "=" * 70)
print("AGENT RESPONSE:")
print("=" * 70)
print(details['baseline_response'])

print("\n" + "=" * 70)
print("TOOLS CALLED BY AGENT:")
print("=" * 70)
for tool, count in details['baseline_tool_usage'].items():
    print(f"  {tool}: {count} call(s)")

print("\n" + "=" * 70)
print("TOOL ATTRIBUTION (SHAP Values):")
print("=" * 70)
agent_shap.print_colored_tools()

## 3. Deep Dive: Understanding Agent Behavior <a name="deep-dive"></a>

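To explain the agent, we compare its response with all tools available (the baseline) against its responses when some tools are removed (ablations). The similarity of each ablated response to the baseline is the score from which the SHAP values are computed.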
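For a handful of tools, Shapley values can be computed exactly by enumerating every subset of tools. The sketch below is illustrative only: `exact_shapley` and `toy_similarity` are hypothetical names, the similarity scores are invented, and AgentSHAP samples combinations (see `sampling_ratio`) rather than necessarily enumerating them all.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values for a small list of players.

    `value` maps a frozenset of players to a numeric score.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_similarity(available):
    """Invented score: similarity to the baseline given the available tools."""
    score = 0.2  # assumed similarity when no tool is available
    if "get_weather" in available:
        score += 0.4
    if "get_stock_price" in available:
        score += 0.4
    return score


tool_names = ["get_weather", "get_stock_price", "calculate", "search_news"]
phi = exact_shapley(tool_names, toy_similarity)
for name, v in phi.items():
    print(f"{name}: {v:.3f}")
```

In this additive toy game the two useful tools each receive about 0.4, the unused tools receive 0, and the values sum to the gap between the all-tools score (1.0) and the no-tools score (0.2).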

In [None]:
# Baseline: Agent response with ALL tools available
details = agent_shap.get_detailed_results()

print("=" * 70)
print("BASELINE (All Tools Available)")
print("=" * 70)
print(f"\nPrompt: {details['prompt']}\n")
print(f"Response:\n{details['baseline_response']}\n")
print(f"Tools called: {details['baseline_tool_usage']}")

In [None]:
# Ablations: Agent responses with tools REMOVED
print("=" * 70)
print("ABLATIONS (Tools Removed)")
print("=" * 70)
print("Each combination shows agent behavior when certain tools are unavailable.\n")

for i, (_, row) in enumerate(results_df.iterrows(), start=1):
    print(f"--- Combination {i} ---")
    print(f"Tools available: {row['tools_available']}")
    print(f"Similarity to baseline: {row['similarity']:.4f}")

    # Columns named "used_<tool>" hold the number of calls made to each tool
    used_cols = [c for c in row.index if c.startswith('used_') and pd.notna(row[c])]
    if used_cols:
        used_tools = {c[len('used_'):]: int(row[c]) for c in used_cols}
        print(f"Tools called: {used_tools}")

    # Truncate long responses to keep the printout readable
    response = row['response']
    if len(response) > 150:
        response = response[:150] + "..."
    print(f"Response: {response}\n")

## 4. Comparing Tool Importance Across Prompts <a name="compare"></a>

Just as PixelSHAP shows how different questions highlight different image regions, AgentSHAP shows how different prompts shift importance between tools.

**Same agent, same tools, different prompts → different tool attribution.**

In [None]:
# Compare tool importance across different prompts
prompts = [
    "What's the weather in Tokyo?",
    "How is TSLA stock performing today?",
    "What is 15% of 250?",
    "What's happening in the tech news?"
]

fig, all_shap_values = agent_shap.compare_prompts(prompts, sampling_ratio=0.5)
plt.show()

### Key Insight

Each prompt should shift attribution toward the tool that can answer it:
- **Weather query** → `get_weather` dominates
- **Stock query** → `get_stock_price` dominates
- **Math query** → `calculate` dominates
- **News query** → `search_news` dominates

When the chart matches this pattern, the attributions are tracking the prompt rather than a fixed ranking of tools.
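The dominance pattern listed above can also be checked programmatically instead of read off the chart. The sketch below assumes the per-prompt values are available as a mapping from prompt to `{tool_name: shap_value}`; the actual structure of `all_shap_values` is not shown in this notebook, so treat that shape, the helper name `dominant_tool`, and the numbers as hypothetical.

```python
def dominant_tool(shap_by_tool):
    """Return (tool_name, share) for the tool with the largest attribution.

    `share` is that tool's value divided by the sum of absolute values,
    which makes it comparable across prompts.
    """
    total = sum(abs(v) for v in shap_by_tool.values())
    if total == 0:
        return None, 0.0
    name = max(shap_by_tool, key=shap_by_tool.get)
    return name, shap_by_tool[name] / total


# Made-up values for illustration only; they are not results from a real run.
example_values = {
    "What's the weather in Tokyo?": {
        "get_weather": 0.70, "get_stock_price": 0.10,
        "calculate": 0.10, "search_news": 0.10,
    },
    "What is 15% of 250?": {
        "get_weather": 0.05, "get_stock_price": 0.05,
        "calculate": 0.80, "search_news": 0.10,
    },
}

for prompt_text, values in example_values.items():
    tool, share = dominant_tool(values)
    print(f"{prompt_text!r}: {tool} ({share:.0%} of total attribution)")
```

Reporting a share rather than the raw value keeps prompts comparable even when their overall attribution magnitudes differ.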

---

## Summary

**AgentSHAP** explains agent behavior by:
1. Running the agent with all tools (baseline)
2. Running ablations with tools removed
3. Computing Shapley values showing each tool's contribution
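The three steps above can be sketched end to end with a stand-in for the agent. This is a rough illustration under stated assumptions, not AgentSHAP's implementation: the sampling scheme (every leave-one-out subset plus a random fraction of the rest), the with-versus-without estimator, and the names `sample_tool_subsets`, `estimate_attributions`, and `toy_similarity` are all hypothetical, and the similarity scores are invented.

```python
import random
from itertools import combinations


def sample_tool_subsets(tools, sampling_ratio, seed=0):
    """Leave-one-out subsets plus a random fraction of the other proper subsets."""
    n = len(tools)
    proper = [frozenset(c) for k in range(1, n) for c in combinations(tools, k)]
    leave_one_out = [s for s in proper if len(s) == n - 1]
    rest = [s for s in proper if len(s) != n - 1]
    rng = random.Random(seed)
    extra = rng.sample(rest, int(len(rest) * sampling_ratio))
    return leave_one_out + extra


def estimate_attributions(tools, subsets, similarity):
    """Mean similarity with a tool minus mean similarity without it, normalised."""
    raw = {}
    for t in tools:
        with_t = [similarity(s) for s in subsets if t in s]
        without_t = [similarity(s) for s in subsets if t not in s]
        mean_with = sum(with_t) / len(with_t) if with_t else 0.0
        mean_without = sum(without_t) / len(without_t) if without_t else 0.0
        raw[t] = mean_with - mean_without
    # Scale so that the absolute values sum to 1
    total = sum(abs(v) for v in raw.values())
    return {t: (v / total if total else 0.0) for t, v in raw.items()}


def toy_similarity(available):
    """Invented stand-in for 'similarity of the ablated response to the baseline'."""
    score = 0.2
    if "get_weather" in available:
        score += 0.4
    if "get_stock_price" in available:
        score += 0.4
    return score


tool_names = ["get_weather", "get_stock_price", "calculate", "search_news"]
subsets = sample_tool_subsets(tool_names, sampling_ratio=0.5)
attributions = estimate_attributions(tool_names, subsets, toy_similarity)
for name, v in attributions.items():
    print(f"{name}: {v:+.3f}")
```

Because only some subsets are evaluated, the result is an approximation: tools that never affect the score can still receive small non-zero (even negative) values, and a higher `sampling_ratio` trades more agent runs for a steadier estimate.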

**Key insights**:
- Tools with high SHAP values were critical for the response
- Removing important tools significantly changes the output
- Different prompts activate different tools, so tool attribution is prompt-dependent