# Interactive Heatmap Widget Example

This notebook demonstrates how to use the `HeatmapWidget` - an anywidget-based implementation of the observatory heatmap component for Jupyter notebooks.

The widget renders interactive policy evaluation heatmaps and lets you:
- Control the number of policies displayed
- Select which metrics to render
- Choose which runs to pull policies from
- Fetch data dynamically from the Observatory API
- Display heatmaps for custom metrics


## Import and Basic Setup


In [None]:
%load_ext autoreload
%autoreload 2
%load_ext anywidget

from experiments.notebooks.utils.heatmap_widget.heatmap_widget.HeatmapWidget import HeatmapWidget

# Report success only after the import above has run
print("Setup complete! Auto-reload enabled.")

## Real Data from Metta Database

Now let's fetch real evaluation data from Metta's databases. The steps are:

1. Define the API variables.
2. Create a client for the Metta API.
3. Use the client to retrieve policy data.
4. Render the data with the `HeatmapWidget`.


In [None]:
from metta.common.client.metta_client import MettaAPIClient
from experiments.notebooks.utils.heatmap_widget.heatmap_widget.util import fetch_real_heatmap_data

client = MettaAPIClient("https://api.observatory.softmax-research.net")


## Example: Finding policies with search text

Now let's explore what's available in the database and create a heatmap with real data:


In [None]:
# For now, let's try with some common metrics and see what we find:

# You can search for policies with a list of search texts
search_texts = ["zfogg.1753775626", "daveey.arena.rnd."]

# You can get training run policies by exact training run names
# search_texts = [
#     "daveey.arena.rnd.16x4.2",
#     "relh.skypilot.fff.j20.666",
#     "bullm.navigation.low_reward.baseline",
#     "bullm.navigation.low_reward.baseline.07-17", 
#     "bullm.navigation.low_reward.baseline.07-23",
#     "relh.multigpu.fff.1",
#     "relh.skypilot.fff.j21.2",
# ]

# Common metrics that are likely to exist:
# metrics_to_fetch = ["reward", "heart.get", "action.move.success"]
metrics_to_fetch = ["ore_red.get"]

# We're using search_texts to find training run policies to generate heatmaps for.
real_heatmap = await fetch_real_heatmap_data(
    search_texts=search_texts,
    api_base_url=str(client._http_client.base_url),
    metrics=metrics_to_fetch,
    max_policies=50  # Limit display to keep it manageable
)
    
real_heatmap

## Example: Finding policies using training run names

Here's how to create a heatmap with your own training runs and metrics:


In [None]:
# Example: Create a custom heatmap with specific training runs and metrics

# Step 1: List training run names (recommended - uses smart selection)
my_training_runs = [
    # Add your training run names here, for example:
    "gregorypylypovych.navigation.ffa_NAV_DEFAULT_vtm_4rooms_of_4_seed0.07-27",
    "gregorypylypovych.navigation.ffa_DEFAULT_vtm_4rooms_of_4_seed0.07-27",
    "gregorypylypovych.navigation.ffa_DEFAULT_tfn_4rooms_of_4_seed0.07-27",
    "gregorypylypovych.navigation.ffa_DEFAULT_vtb_4rooms_of_4_seed0.07-27",
]

# Step 2: Define metrics you want to compare
my_metrics = [
    "reward",
    "heart.get",           # Example game-specific metric
    "action.move.success", # Example action success rate
    # Add more metrics as needed
]

print("🎯 Creating custom heatmap with best policies from training runs...")
# Select best policies from training runs
custom_heatmap = await fetch_real_heatmap_data(
    api_base_url=str(client._http_client.base_url),
    search_texts=my_training_runs,
    metrics=my_metrics,
    policy_selector="best",
    max_policies=20
)

print("📊 Custom heatmap created! Try:")
print("   - Hovering over cells to see detailed values")
print("   - Changing metrics with: custom_heatmap.update_metric('heart.get')")
print("   - Adjusting policies shown: custom_heatmap.set_num_policies(15)")

custom_heatmap

## Example: Setting the metric

Now let's see `update_metric` in action. Calling it on an existing heatmap changes the values that the widget displays:


In [None]:
# Switch the metric, then scroll up to the heatmap rendered above to see the change.
print("\n🔄 Changing metric to 'action.move.success'...")
custom_heatmap.update_metric('action.move.success')

print("The heatmap rendered by the earlier cell should now show the new metric.")

In [None]:
# And now let's change it to reward.
custom_heatmap.update_metric('reward')

## Example: Custom metrics

Cells can hold any metric data you like, which is useful because we plan to support many kinds of metrics. Let's walk through an example that uses an arbitrary custom metric.

First, you need to know the data format:

### Info: Data Cell Format Reference

The heatmap widget expects data in a specific format that matches the
observatory dashboard:

```python
cells = {
    'policy_name': {
        'eval_name': {
            'metrics': {
                'reward': 50,
                'heart.get': 98,
                'action.move.success': 5,
                'ore_red.get': 24.2,
                # ... more metrics
            },
            'replayUrl': str,         # URL to replay file
            'evalName': str,          # Should match the key
        },
        # ... more evaluations
    },
    # ... more policies
}
```
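A quick structural check can catch mistakes in this format before the data reaches the widget. The sketch below is only an illustration: `validate_cells` is a hypothetical helper, not part of the widget's API, and the rules it enforces (numeric metric values, `evalName` equal to its key) are assumptions drawn from the format shown above.

```python
from numbers import Real


def validate_cells(cells):
    """Return a list of human-readable problems found in a cells mapping."""
    problems = []
    for policy_name, evals in cells.items():
        for eval_key, cell in evals.items():
            where = f"{policy_name} / {eval_key}"
            # The format above says evalName should match the dictionary key.
            if cell.get("evalName") != eval_key:
                problems.append(f"{where}: evalName does not match its key")
            metrics = cell.get("metrics")
            if not isinstance(metrics, dict):
                problems.append(f"{where}: 'metrics' is missing or not a dict")
                continue
            for metric_name, value in metrics.items():
                # bool is a subclass of int, so reject it explicitly.
                if isinstance(value, bool) or not isinstance(value, Real):
                    problems.append(f"{where}: metric {metric_name!r} is not numeric")
    return problems


good_cells = {
    "policy_a": {
        "task_a/level1": {
            "metrics": {"reward": 50},
            "replayUrl": "https://example.com/replay.json",
            "evalName": "task_a/level1",
        },
    },
}
bad_cells = {
    "policy_a": {
        "task_a/level1": {
            "metrics": {"reward": "high"},  # not numeric
            "replayUrl": "https://example.com/replay.json",
            "evalName": "task_a/level2",  # does not match the key
        },
    },
}

print(validate_cells(good_cells))
for problem in validate_cells(bad_cells):
    print(problem)
```

An empty list means the structure passed these particular checks; it does not guarantee that the widget will accept the data.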

**Important notes:**
- Evaluation names with "/" will be grouped by category (the part before "/")
- The heatmap shows policies sorted by average score (worst to best, bottom to top)
- Policy names that contain ":v" will have WandB URLs generated automatically
- Replay URLs should be accessible URLs or file paths

This widget provides the same interactive functionality as the observatory dashboard but in a Python environment, making it perfect for exploratory analysis and sharing results via Jupyter notebooks!
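To make the category rule from the notes concrete, here is a minimal sketch of grouping evaluation names by the part before the first "/". It is not the widget's own code: `group_evals_by_category` is a hypothetical helper, and placing names without a "/" under an empty-string category is an assumption made for illustration.

```python
from collections import defaultdict


def group_evals_by_category(eval_names):
    """Group evaluation names by the text before the first '/'."""
    groups = defaultdict(list)
    for name in eval_names:
        # partition returns (name, "", "") when there is no "/" in the name.
        category, separator, _ = name.partition("/")
        # Assumption: uncategorized names are collected under the "" key.
        groups[category if separator else ""].append(name)
    return dict(groups)


grouped = group_evals_by_category(
    ["task_a/level1", "task_a/level2", "task_b/challenge1", "standalone"]
)
print(grouped)
```

Names within each category keep their input order, so the order of `eval_names` determines the order inside each group.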


### Here's how to create a heatmap with your own data:


In [None]:
# Create a new heatmap widget
from experiments.notebooks.utils.heatmap_widget.heatmap_widget.HeatmapWidget import create_heatmap_widget

custom_widget = create_heatmap_widget()

# Define your data structure
# This should match the format expected by the observatory dashboard
cells_data = {
    'my_policy_v1': {
        'task_a/level1': {
            'metrics': {
                'custom_score': 85.2,
            },
            'replayUrl': 'https://example.com/replay1.json', 
            'evalName': 'task_a/level1'
        },
        'task_a/level2': {
            'metrics': {
                'custom_score': 87.5,
            },
            'replayUrl': 'https://example.com/replay2.json', 
            'evalName': 'task_a/level2'
        },
        'task_b/challenge1': {
            'metrics': {
                'custom_score': 92.5,
            },
            'replayUrl': 'https://example.com/replay3.json', 
            'evalName': 'task_b/challenge1'
        },
    },
    'my_policy_v2': {
        'task_a/level1': {
            'metrics': {
                'custom_score': 22.5,
            },
            'replayUrl': 'https://example.com/replay4.json', 
            'evalName': 'task_a/level1'
        },
        'task_a/level2': {
            'metrics': {
                'custom_score': 42.5,
            },
            'replayUrl': 'https://example.com/replay5.json', 
            'evalName': 'task_a/level2'
        },
        'task_b/challenge1': {
            'metrics': {
                'custom_score': 62.5,
            },
            'replayUrl': 'https://example.com/replay6.json', 
            'evalName': 'task_b/challenge1'
        },
    },
}

eval_names = ['task_a/level1', 'task_a/level2', 'task_b/challenge1']
policy_names = ['my_policy_v1', 'my_policy_v2']
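# Mean custom_score across each policy's three evaluations above
policy_averages = {
    'my_policy_v1': 88.4,  # (85.2 + 87.5 + 92.5) / 3
    'my_policy_v2': 42.5,  # (22.5 + 42.5 + 62.5) / 3
}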

# Set the data
custom_widget.set_data(
    cells=cells_data,
    eval_names=eval_names,
    policy_names=policy_names,
    policy_average_scores=policy_averages,
    selected_metric="custom_score"
)

# Display the widget
custom_widget
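
Rather than typing the `policy_average_scores` values by hand, you can derive them from the cells. The sketch below assumes that a plain arithmetic mean of one metric across a policy's evaluations is what you want to display; `average_scores` is a hypothetical helper, not part of the widget's API, and the sample data is made up for illustration.

```python
def average_scores(cells, metric):
    """Mean of `metric` over each policy's evaluations, rounded to one decimal.

    Evaluations that lack the metric are ignored; policies with no values
    for the metric are left out of the result.
    """
    averages = {}
    for policy_name, evals in cells.items():
        values = [
            cell["metrics"][metric]
            for cell in evals.values()
            if metric in cell.get("metrics", {})
        ]
        if values:
            averages[policy_name] = round(sum(values) / len(values), 1)
    return averages


sample_cells = {
    "policy_a": {
        "task_a/level1": {"metrics": {"custom_score": 10.0}, "evalName": "task_a/level1"},
        "task_a/level2": {"metrics": {"custom_score": 20.0}, "evalName": "task_a/level2"},
    },
    "policy_b": {
        "task_a/level1": {"metrics": {"custom_score": 30.0}, "evalName": "task_a/level1"},
        "task_a/level2": {"metrics": {"other_metric": 1.0}, "evalName": "task_a/level2"},
    },
}

sample_averages = average_scores(sample_cells, "custom_score")
print(sample_averages)
```

The resulting dictionary has the same shape as `policy_averages` above, so it could be passed as `policy_average_scores` in the `set_data` call.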
