# Interactive Heatmap Widget Example

This notebook demonstrates how to use the `HeatmapWidget` - an anywidget-based implementation of the observatory heatmap component for Jupyter notebooks.

The widget provides interactive policy evaluation heatmaps that let you:
- Control the number of policies displayed
- Select which metrics to render
- Choose which runs to get policies from


## Installation

First, make sure you have the required dependencies:

`pip install anywidget traitlets httpx jupyter`
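
If you are unsure whether everything is installed, a quick check like the sketch below can help. It only looks up import names with the standard library. The helper name `missing_packages` is made up for this example, and it assumes each package's import name matches its pip name.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed to match the pip package names listed above.
missing = missing_packages(["anywidget", "traitlets", "httpx"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```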


## Import and Basic Setup


In [None]:
%load_ext autoreload
%autoreload 2
import os

import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use("default")

%load_ext anywidget

print("Setup complete! Auto-reload enabled.")


## Example: Demo Heatmap with Sample Data

Let's start with a simple demo that includes sample data:


In [None]:

from experiments.notebooks.utils.heatmap_widget import create_demo_heatmap, create_heatmap_widget

# Create a demo heatmap with sample data
demo_widget = create_demo_heatmap()

# Display the widget
demo_widget

## Real Data from Metta Database

Now let's fetch real evaluation data from metta's databases. First we set the API variables and create a client for the metta API. Then we use the client to retrieve policy data and render it with the `HeatmapWidget`.


In [None]:
# Check out this module, it will help you use this widget.
from experiments.notebooks.utils.heatmap_widget import MettaAPIClient, fetch_real_heatmap_data

api_base_url = "https://api.observatory.softmax-research.net"
auth_token = ""  # from ~/.metta/observatory_tokens.yaml

if not auth_token:
    raise ValueError("No auth token found. Please set auth_token in the notebook. Check ~/.metta/observatory_tokens.yaml")

client = MettaAPIClient(api_base_url, auth_token)
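
Pasting the token into the notebook makes it easy to commit by accident. A minimal sketch of one alternative, assuming you export the token in an environment variable before starting Jupyter. The variable name `METTA_OBSERVATORY_TOKEN` and the helper `load_auth_token` are hypothetical; they are not defined by the metta tooling.

```python
import os


def load_auth_token(env_var: str = "METTA_OBSERVATORY_TOKEN") -> str:
    """Read the token from an environment variable (hypothetical name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise ValueError(
            f"No auth token found in ${env_var}. "
            "Check ~/.metta/observatory_tokens.yaml for your token."
        )
    return token


# Usage in the notebook (left commented so the sketch has no side effects):
# auth_token = load_auth_token()
```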


## Example: Using Real Data

Now let's explore what's available in the database and create a heatmap with real data:


In [None]:
# For now, let's try with some common metrics and see what we find:
real_heatmap = None
try:
    # specific_runs = [
    #     "daveey.arena.rnd.16x4.2",
    #     "relh.skypilot.fff.j20.666",
    #     "bullm.navigation.low_reward.baseline",
    #     "bullm.navigation.low_reward.baseline.07-17", 
    #     "bullm.navigation.low_reward.baseline.07-23",
    #     "relh.multigpu.fff.1",
    #     "relh.skypilot.fff.j21.2",
    # ]
    
    # Common metrics that are likely to exist:
    metrics_to_fetch = ["reward", "heart.get", "ore_red.get", "action.move.success"]

    client = MettaAPIClient(api_base_url, auth_token)
    runs = await client.get_all_training_runs()
    run_names = [policy["name"] for policy in runs["policies"]]
    
    real_heatmap = await fetch_real_heatmap_data(
        api_base_url=api_base_url,
        auth_token=auth_token,
        training_run_names=run_names,
        metrics=metrics_to_fetch,
        max_policies=50  # Limit display to keep it manageable
    )
    
except Exception as e:
    print(f"❌ Error fetching real data: {e}")
    print("💡 This might happen if:")
    print("   - The API base URL is incorrect")
    print("   - Your auth token is missing or invalid")
    print("   - The specified metrics don't exist in the database")
    print("   - You don't have access to the database")
    print("\n🔄 Falling back to demo data...")
    
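    # Fall back to demo data if real data fails. Assign the fallback to
    # real_heatmap so the last expression in the cell displays it; a bare
    # expression inside the except block would not be rendered.
    from experiments.notebooks.utils.heatmap_widget import create_demo_heatmap
    real_heatmap = create_demo_heatmap()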

real_heatmap

## Example: Simply get metrics for policies from the API

Here's how to create a heatmap with your own training runs and metrics using the smart policy selection:


In [None]:
# Example: Create a custom heatmap with specific training runs and metrics

# Step 1: Choose the policies to compare
# Option 1: Use training run names (recommended - uses smart selection)
my_training_runs = [
    # Add your training run names here, for example:
    "relh.multigpu.fff.1",
    "relh.skypilot.fff.j21.2",
    "relh.skypilot.fff.j20.666",
]

# Option 2: Use exact policy URIs (if you know exactly which ones you want)
my_specific_policies = [
    # Add exact policy URIs here, for example:
    # "my_policy_name:v123",
    # "baseline_experiment:v456",
    # "new_approach:v789"
]

# Step 2: Define metrics you want to compare
my_metrics = [
    "reward",
    "heart.get",           # Example game-specific metric
    "action.move.success", # Example action success rate
    # Add more metrics as needed
]

# Step 3: Optional - filter to specific evaluations
# eval_filter = "sim_env LIKE '%maze%'"  # Only maze environments
# eval_filter = "sim_env LIKE '%combat%'"  # Only combat environments  
eval_filter = None  # No filter - include all evaluations

# Step 4: Create the heatmap
custom_heatmap = None
if my_training_runs:  # Use smart policy selection from training runs
    print("🎯 Creating custom heatmap with best policies from training runs...")
    
    # Select best policies from training runs
    custom_heatmap = await fetch_real_heatmap_data(
        api_base_url=api_base_url,
        auth_token=auth_token,
        training_run_names=my_training_runs,
        metrics=my_metrics,
        policy_selector="best",
        max_policies=20
    )
    
    print("📊 Custom heatmap created! Try:")
    print("   - Hovering over cells to see detailed values")
    print("   - Changing metrics with: custom_heatmap.update_metric('heart.get')")
    print("   - Adjusting policies shown: custom_heatmap.set_num_policies(15)")
    
    custom_heatmap
    
else:
    print("📝 To use this example:")
    print("   - Add your training run names to 'my_training_runs' list above")

custom_heatmap

## Example: Multiple Metrics with Working selectedMetric

Now let's see the `selectedMetric` functionality working properly! This example shows a heatmap where changing the metric actually changes the displayed values:


In [None]:
# Create a multi-metric heatmap widget
from experiments.notebooks.utils.heatmap_widget import create_multi_metric_demo

multi_metric_widget = create_multi_metric_demo()

# Display the widget
multi_metric_widget


In [None]:
# Now try changing the metric to see the values actually change!
print("🔄 Changing metric to 'episode_length'...")
multi_metric_widget.update_metric('episode_length')

# NOTE: Notice how the values in the heatmap widget change as you switch
# metrics. Do not display the widget again before changing the metric: that
# creates a separate copy of the widget in a new output cell. Instead,
# reference the one you originally rendered, call its methods, and watch it
# change in its Jupyter notebook cell, like we just did. Let's do it again in
# the next cell too.


In [None]:
# One more time. Run this cell then scroll back up again to see the change.
print("\n🔄 Changing metric to 'success_rate'...")
multi_metric_widget.update_metric('success_rate')


In [None]:
# Last one: switch back to 'episode_length'. Scroll up again to see the change.
print("\n🔄 Changing metric back to 'episode_length'...")
multi_metric_widget.update_metric('episode_length')


## Example: Custom Metrics

Cells can carry any metric data you like. This matters because we expect to track many kinds of metrics. The example below uses an arbitrary metric named `custom_score`.

Here's how to create a heatmap with your own data:


In [None]:
# Create a new heatmap widget
custom_widget = create_heatmap_widget()

# Define your data structure
# This should match the format expected by the observatory dashboard
cells_data = {
    'my_policy_v1': {
        'task_a/level1': {
            'metrics': {
                'custom_score': 85.2,
            },
            'replayUrl': 'https://example.com/replay1.json', 
            'evalName': 'task_a/level1'
        },
        'task_a/level2': {
            'metrics': {
                'custom_score': 87.5,
            },
            'replayUrl': 'https://example.com/replay2.json', 
            'evalName': 'task_a/level2'
        },
        'task_b/challenge1': {
            'metrics': {
                'custom_score': 92.5,
            },
            'replayUrl': 'https://example.com/replay3.json', 
            'evalName': 'task_b/challenge1'
        },
    },
    'my_policy_v2': {
        'task_a/level1': {
            'metrics': {
                'custom_score': 22.5,
            },
            'replayUrl': 'https://example.com/replay4.json', 
            'evalName': 'task_a/level1'
        },
        'task_a/level2': {
            'metrics': {
                'custom_score': 42.5,
            },
            'replayUrl': 'https://example.com/replay5.json', 
            'evalName': 'task_a/level2'
        },
        'task_b/challenge1': {
            'metrics': {
                'custom_score': 62.5,
            },
            'replayUrl': 'https://example.com/replay6.json', 
            'evalName': 'task_b/challenge1'
        },
    },
}

eval_names = ['task_a/level1', 'task_a/level2', 'task_b/challenge1']
policy_names = ['my_policy_v1', 'my_policy_v2']
policy_averages = {
    'my_policy_v1': 88.4,  # mean of 85.2, 87.5, 92.5
    'my_policy_v2': 42.5,  # mean of 22.5, 42.5, 62.5
}

# Set the data
custom_widget.set_data(
    cells=cells_data,
    eval_names=eval_names,
    policy_names=policy_names,
    policy_average_scores=policy_averages,
    selected_metric="custom_score"
)

# Display the widget
custom_widget
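
Hand-typed averages can drift from the cell values. Assuming `policy_average_scores` is meant to be the plain mean of the selected metric over each policy's evaluations (the widget may define it differently), the values could be derived from the cells dictionary as sketched here. `compute_policy_averages` is a hypothetical helper, not part of the widget module, and the sample data is invented.

```python
from statistics import mean


def compute_policy_averages(cells: dict, metric: str) -> dict:
    """Mean of `metric` per policy, skipping cells that lack the metric."""
    averages = {}
    for policy, evals in cells.items():
        values = [
            cell["metrics"][metric]
            for cell in evals.values()
            if metric in cell.get("metrics", {})
        ]
        if values:  # leave out policies with no data for this metric
            averages[policy] = round(mean(values), 1)
    return averages


# Invented sample data in the same shape as the cells dictionary.
sample_cells = {
    "policy_a": {
        "task/x": {"metrics": {"custom_score": 80.0}, "evalName": "task/x"},
        "task/y": {"metrics": {"custom_score": 90.0}, "evalName": "task/y"},
    },
    "policy_b": {
        "task/x": {"metrics": {"custom_score": 10.0}, "evalName": "task/x"},
        "task/y": {"metrics": {"other": 1.0}, "evalName": "task/y"},
    },
}
sample_averages = compute_policy_averages(sample_cells, "custom_score")
print(sample_averages)
```

The result can then be passed as `policy_average_scores` to `set_data`.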


## Example: Adding Callbacks for Interactivity

You can add Python callbacks to respond to user interactions:


In [None]:
# NOTE: print() does not work inside these callbacks; that is simply how
# Jupyter widgets behave. Once the Python cell finishes running and outputs a
# widget, the widget can no longer affect that cell's output. One way to
# surface messages from a widget callback is to write them to a file (a thread
# might also work). An example follows.

# Create another widget for callback demonstration
callback_widget = create_heatmap_widget()

# Set up the same data as before
callback_widget.set_data(
    cells=cells_data,
    eval_names=eval_names,
    policy_names=policy_names,
    policy_average_scores=policy_averages,
    selected_metric="custom_score"  # must match a metric key present in cells_data
)

# Define callback functions
def handle_cell_selection(cell_info):
    """Called when user hovers over a cell (not 'overall' column)."""
    with open("output_cell_selection.txt", "w") as f:
        f.write(f"📍 Cell selected: {cell_info['policyUri']} on evaluation '{cell_info['evalName']}'")

def handle_replay_opened(replay_info):
    """Called when user clicks to open a replay."""
    with open("output_replay_opened.txt", "w") as f:
        # End each line with a newline so the three fields do not run together
        f.write(f"🎬 Replay opened: {replay_info['replayUrl']}\n")
        f.write(f"   Policy: {replay_info['policyUri']}\n")
        f.write(f"   Evaluation: {replay_info['evalName']}\n")

# Register the callbacks
callback_widget.on_cell_selected(handle_cell_selection)
callback_widget.on_replay_opened(handle_replay_opened)

# Display the widget
callback_widget


In [None]:
# Print the contents of the files created by the callbacks in the previous
# cell, then delete them, if they exist

for fname in ["output_cell_selection.txt", "output_replay_opened.txt"]:
    try:
        with open(fname, "r") as f:
            print(f.read())
        os.remove(fname)
        print(f"File {fname} deleted")
    except FileNotFoundError:
        pass


**Try interacting with the heatmap above, then re-run the previous cell to print the callback messages written to the output files!**
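
If writing to files feels heavy, another option is to collect callback payloads in an ordinary Python list and inspect it from a later cell. This is a minimal sketch, assuming the payloads are dictionaries with keys such as `policyUri` and `evalName`, as in the callbacks above. `make_event_recorder` is a hypothetical helper, and the final call only simulates a widget event.

```python
def make_event_recorder():
    """Return (events, record): `record` appends a copy of each payload."""
    events = []

    def record(info: dict) -> None:
        # Copy the payload so later mutation by the caller does not affect the log.
        events.append(dict(info))

    return events, record


cell_events, record_cell = make_event_recorder()

# Simulated event; in the notebook the widget would invoke the callback.
record_cell({"policyUri": "my_policy_v1", "evalName": "task_a/level1"})
print(cell_events)
```

In the notebook you would register it with `callback_widget.on_cell_selected(record_cell)` and evaluate `cell_events` in a later cell after interacting with the heatmap.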

## Info: Data Cell Format Reference

The heatmap widget expects data in a specific format that matches the
observatory dashboard:

```python
cells = {
    'policy_name': {
        'eval_name': {
            'metrics': {
                'reward': 50,
                'heart.get': 98,
                'action.move.success': 5,
                'ore_red.get': 24.2,
                # ... more metrics
            },
            'replayUrl': str,         # URL to replay file
            'evalName': str,          # Should match the key
        },
        # ... more evaluations
    },
    # ... more policies
}
```
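
Before handing data to the widget, it can help to check it against this format. The sketch below is a hypothetical validator (`validate_cells` is not part of the widget module). Its checks mirror the reference above, and the widget itself may be more lenient or more strict.

```python
def validate_cells(cells: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for policy, evals in cells.items():
        for eval_key, cell in evals.items():
            where = f"{policy} / {eval_key}"
            if not isinstance(cell.get("metrics"), dict):
                problems.append(f"{where}: 'metrics' should be a dict")
            if not isinstance(cell.get("replayUrl"), str):
                problems.append(f"{where}: 'replayUrl' should be a string")
            if cell.get("evalName") != eval_key:
                problems.append(f"{where}: 'evalName' should match the key")
    return problems


# Invented example with one well-formed cell.
example = {
    "policy_name": {
        "task_a/level1": {
            "metrics": {"reward": 50},
            "replayUrl": "https://example.com/replay1.json",
            "evalName": "task_a/level1",
        }
    }
}
print(validate_cells(example) or "No problems found")
```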

**Important notes:**
- Evaluation names with "/" will be grouped by category (the part before "/")
- The heatmap shows policies sorted by average score (worst to best, bottom to top)
- Policy names that contain ":v" will have WandB URLs generated automatically
- Replay URLs should be accessible URLs or file paths
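
To preview how the first two notes play out for your data, the grouping and ordering can be approximated in plain Python. This is only a sketch of the behavior described above, not the widget's own code. Both helper names are hypothetical, and placing names without "/" under `"(uncategorized)"` is an assumption.

```python
from collections import defaultdict


def group_evals_by_category(eval_names):
    """Group evaluation names by the text before the first '/'."""
    groups = defaultdict(list)
    for name in eval_names:
        category, sep, _rest = name.partition("/")
        # Names without '/' go under a placeholder key (an assumption).
        groups[category if sep else "(uncategorized)"].append(name)
    return dict(groups)


def order_policies_by_average(policy_averages):
    """Return policy names sorted from lowest to highest average score."""
    return sorted(policy_averages, key=policy_averages.get)


groups = group_evals_by_category(["task_a/level1", "task_a/level2", "task_b/challenge1"])
order = order_policies_by_average({"policy_x": 70.0, "policy_y": 30.0})
print(groups)
print(order)
```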

This widget provides the same interactive functionality as the observatory dashboard, but in a Python environment, which makes it well suited to exploratory analysis and to sharing results via Jupyter notebooks.
