# Full Agent System Demo

This notebook tests the complete end-to-end flow of the Explainable AI Agent:
1.  **Data Understanding**: Analyzes the dataset schema and statistics.
2.  **Router**: Decides which specialist agent to call.
3.  **Global Explainer**: Explains overall model behavior (SHAP).
4.  **Local Explainer**: Explains specific predictions (SHAP/LIME).

In [1]:
import os
import sys
import pandas as pd
import matplotlib.pyplot as plt
import importlib
from IPython.display import Image, display
from dotenv import load_dotenv
from catboost import CatBoostClassifier
from langchain_core.messages import HumanMessage, AIMessage

# Add the parent directory to sys.path so the `agent` package can be imported
sys.path.append('..')
load_dotenv('../.env')

# Reload to pick up changes
import agent.graph
import agent.nodes.data_understanding
import agent.nodes.global_explainer
import agent.nodes.local_explainer
importlib.reload(agent.graph)
importlib.reload(agent.nodes.data_understanding)
importlib.reload(agent.nodes.global_explainer)
importlib.reload(agent.nodes.local_explainer)

from agent.graph import app



In [2]:
# 1. Load Data
def load_arff_data(file_path):
    data = []
    columns = []
    with open(file_path, 'r') as f:
        data_started = False
        for line in f:
            line = line.strip()
            if not line: continue
            if line.lower().startswith("@attribute"):
                parts = line.split()
                columns.append(parts[1])
            elif line.lower().startswith("@data"):
                data_started = True
                continue
            elif data_started:
                row = [x.strip().strip("'").strip('"') for x in line.split(',')]
                data.append(row)
    return pd.DataFrame(data, columns=columns)

dataset_path = "../datasets/banking_deposit_subscription/dataset"
df = load_arff_data(dataset_path)

for col in df.columns:
    try:
        df[col] = pd.to_numeric(df[col])
    except ValueError:
        pass  # leave non-numeric (categorical) columns as strings

# 2. Load Model
model = CatBoostClassifier()
model.load_model("../notebooks/models/catboost_model.cbm")

print("Loaded Data and Model.")

Loaded Data and Model.


In [3]:
# Helper to print output
def print_agent_response(result):
    print("\n--- Agent Response ---")
    messages = result['messages']
    last_msg = messages[-1]
    print(f"{last_msg.type.upper()}: {last_msg.content}")

## Turn 1: Data Understanding
The first message hands the dataset to the agent and asks for an analysis, which triggers the Data Understanding step.
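
As a rough picture of what "schema and statistics" can mean for this step, the sketch below builds a per-column summary with pandas. `summarize_schema` is a hypothetical helper written for illustration; the agent's actual tools are not shown in this notebook, so this is not a description of how they work.

```python
import pandas as pd

def summarize_schema(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, distinct-value count, and missing-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(),
        "n_missing": frame.isna().sum(),
    })

# Tiny stand-in for the banking dataset (illustrative values only).
sample = pd.DataFrame({
    "age": [58, 44, 33],
    "job": ["management", "technician", "entrepreneur"],
    "y": ["no", "no", "no"],
})
print(summarize_schema(sample))
```

A column with very few distinct values, such as `y` here, is a natural candidate for the target variable, which is one way an agent could find it without being told.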

In [4]:
state_t1 = {
    "messages": [HumanMessage(content="Here is the dataset. Please analyze it.")],
    "df": df,
    "model": model,
    # We do NOT set target_variable yet; let the agent find it.
}

print("Invoking Data Understanding...")
result_t1 = app.invoke(state_t1)
print_agent_response(result_t1)

Invoking Data Understanding...

--- Agent Response ---
AI: The metadata has been successfully updated with the analysis of the dataset. If you need further assistance or have more questions, feel free to ask!


In [5]:
for m in result_t1['messages']:
    m.pretty_print()


Here is the dataset. Please analyze it.
Tool Calls:
  get_dataset_samples (call_q7xbxshxDiC9BJttIKykzdU8)
 Call ID: call_q7xbxshxDiC9BJttIKykzdU8
  Args:
Name: get_dataset_samples

--- Samples ---
   age           job   marital  education default  balance housing loan  contact  day month  duration  campaign  pdays  previous poutcome   y
0   58    management   married   tertiary      no     2143     yes   no  unknown    5   may       261         1     -1         0  unknown  no
1   44    technician    single  secondary      no       29     yes   no  unknown    5   may       151         1     -1         0  unknown  no
2   33  entrepreneur   married  secondary      no        2     yes  yes  unknown    5   may        76         1     -1         0  unknown  no
3   47   blue-collar   married    unknown      no     1506     yes   no  unknown    5   may        92         1     -1         0  unknown  no
4   33       unknown    single    unknown      no        1      no   no  unknown    5   may 

## Turn 2: Global Explanation
The user asks for global feature importance. The Router should dispatch the request to the Global Explainer.
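
The routing decision can be pictured with a small keyword-based sketch. This is not the notebook's Router, whose implementation in `agent.graph` is not shown here. `route_query` and the route labels are hypothetical names, and a real router would more likely rely on an LLM call than on keyword matching.

```python
import re

# Hypothetical route labels; the real graph's node names are not shown in this notebook.
GLOBAL, LOCAL, DATA = "global_explainer", "local_explainer", "data_understanding"

def route_query(text: str) -> str:
    """Pick a specialist for a user message using simple keyword rules (illustrative only)."""
    lowered = text.lower()
    # A reference to one specific instance ("User 5", "row 12") suggests a local explanation.
    if re.search(r"\b(user|row|instance|customer)\s+\d+\b", lowered):
        return LOCAL
    # Questions about overall feature importance suggest a global explanation.
    if any(k in lowered for k in ("important features", "feature importance", "overall", "global")):
        return GLOBAL
    # Anything else falls back to dataset analysis.
    return DATA

print(route_query("What are the most important features driving the model?"))
print(route_query("Why was User 5 classified this way? Use SHAP."))
```

Checking for an instance reference first means a message that mentions both a specific user and feature importance is treated as a local question.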

In [6]:
state_t2 = result_t1 # Continue conversation
state_t2['messages'].append(HumanMessage(content="What are the most important features driving the model?"))

print("Invoking Global Explainer...")
result_t2 = app.invoke(state_t2)
print_agent_response(result_t2)

Invoking Global Explainer...

--- Agent Response ---
AI: I have generated the Global SHAP Summary Plot to analyze the feature importance. You can view the plot [here](sandbox:/artifacts/shap_summary_0df22b36171f4408b0df0a5d0b71295f.png).

### Key Insights from the SHAP Analysis:
- **Most Important Features**: The features with the highest impact on the model's predictions are typically at the top of the plot. These are the features that the model relies on most heavily to make its predictions.
- **Driving Features**: Features such as `duration`, `campaign`, and `balance` are likely to be among the most influential, as they often play a significant role in determining the outcome of marketing campaigns.

The SHAP plot provides a visual representation of how each feature contributes to the model's predictions, with the color indicating the feature value (e.g., red for high values and blue for low values).

If you have any more questions or need further analysis, feel free to ask!


## Turn 3: Local Explanation
The user asks about a specific instance. The Router should dispatch the request to the Local Explainer.

In [7]:
state_t3 = result_t2
state_t3['messages'].append(HumanMessage(content="Why was User 5 classified this way? Use SHAP."))
state_t3['user_id'] = 5 # In a real app, this would be extracted or set by UI context

print("Invoking Local Explainer...")
result_t3 = app.invoke(state_t3)
print_agent_response(result_t3)

Invoking Local Explainer...

--- Agent Response ---
AI: I have generated the SHAP Waterfall plot to explain the prediction for User 5. You can view the plot [here](sandbox:/artifacts/shap_waterfall_0_411b09666de34986b05c373f02bf309f.png).

### Explanation of the SHAP Waterfall Plot:
- **Base Value**: This is the average model output over the training dataset. It represents the starting point for the prediction.
- **Feature Contributions**: Each feature contributes to pushing the prediction from the base value towards the final output. Features that increase the prediction are shown in red, while those that decrease it are in blue.
- **Final Prediction**: The sum of the base value and all feature contributions gives the final prediction for User 5.

The plot provides a detailed breakdown of how each feature influenced the model's decision for this specific user. If you have any further questions or need additional insights, feel free to ask!
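
The additivity property described in the response (base value plus feature contributions equals the final prediction) can be checked by hand. The numbers below are made up for illustration and are not taken from the model or from User 5. The sketch also assumes the SHAP values are expressed in log-odds, which is common for tree-based classifiers but is not confirmed by this notebook.

```python
import math

# Illustrative values only (not from the notebook's model).
base_value = -2.0  # average raw model output, assumed to be in log-odds
contributions = {"duration": 1.5, "balance": 0.25, "campaign": -0.5}

# Additivity: final raw output = base value + sum of per-feature contributions.
raw_output = base_value + sum(contributions.values())

# Convert log-odds to a probability with the logistic function.
probability = 1.0 / (1.0 + math.exp(-raw_output))

print(f"raw output: {raw_output:.2f}")
print(f"probability: {probability:.3f}")
```

In a waterfall plot, each entry of `contributions` corresponds to one bar, and the bars stack from `base_value` to `raw_output`.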


In [8]:
state_t4 = result_t3
state_t4['messages'].append(HumanMessage(content="Why was User 5 classified this way? Use LIME."))
state_t4['user_id'] = 5 # In a real app, this would be extracted or set by UI context

print("Invoking Local Explainer...")
result_t4 = app.invoke(state_t4)
print_agent_response(result_t4)

Invoking Local Explainer...


TypeError: unsupported operand type(s) for -: 'str' and 'str'
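
The traceback is truncated, so the failing line is not visible. The message itself is what Python raises when one string is subtracted from another, which would be consistent with string-valued categorical columns (such as `job` or `marital`) reaching numeric code somewhere in the LIME path. That diagnosis is an assumption. Under it, one possible workaround is to integer-encode the categorical columns before building the explainer and to keep the mapping for decoding. `encode_categoricals` below is a hypothetical helper, not part of the agent.

```python
import pandas as pd

def encode_categoricals(frame: pd.DataFrame):
    """Return a copy with non-numeric columns replaced by integer codes, plus the code-to-label maps."""
    encoded = frame.copy()
    category_names = {}
    for col in encoded.columns:
        if not pd.api.types.is_numeric_dtype(encoded[col]):
            cat = encoded[col].astype("category")
            category_names[col] = list(cat.cat.categories)  # code i maps to label i
            encoded[col] = cat.cat.codes
    return encoded, category_names

# Reproduce the error message with plain strings.
left, right = "yes", "no"
try:
    left - right
except TypeError as exc:
    print(exc)

sample = pd.DataFrame({"age": [58, 44, 33], "marital": ["married", "single", "married"]})
encoded, names = encode_categoricals(sample)
print(encoded.dtypes)
print(names)
```

If the model was trained on the original string values, the prediction function handed to the explainer would need to map the codes back to labels with `category_names` before calling the model.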