# Router Dataset Analysis

This notebook explores the train and test router parquet files to understand the distribution of routing targets and tool families.


In [5]:
from pathlib import Path
import pandas as pd
from IPython.display import display, Markdown

BASE_PATH = Path("dataset") / "rlla_4k"
DATASETS = {
    "train": pd.read_parquet(BASE_PATH / "train_router.parquet"),
    "test": pd.read_parquet(BASE_PATH / "test_router.parquet"),
}

## Dataset overview

With the router splits loaded, confirm the basic schema (row and column counts, column names, ability labels) before diving deeper.


In [6]:
for name, df in DATASETS.items():
    print(f"=== {name} router ===")
    print(f"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}")
    print("Columns:", ", ".join(df.columns))
    print("Ability distribution:", df["ability"].value_counts().to_dict())
    print()

=== train router ===
Rows: 3,920 | Columns: 7
Columns: data_source, prompt, ability, reward_model, extra_info, router_target_action_gt, router_tool_family_gt
Ability distribution: {'math': 3920}

=== test router ===
Rows: 80 | Columns: 7
Columns: data_source, prompt, ability, reward_model, extra_info, router_target_action_gt, router_tool_family_gt
Ability distribution: {'math': 80}
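
The two column lists above match by eye. A minimal sketch of making that check explicit follows; the helper name `assert_same_columns` is hypothetical (it is not part of the notebook), and it takes plain column lists so it runs without the parquet files.

```python
def assert_same_columns(columns_by_split: dict[str, list[str]]) -> list[str]:
    """Return the shared column list, or raise if a split deviates from the first one.

    Columns must match in both name and order.
    """
    items = iter(columns_by_split.items())
    ref_name, ref_cols = next(items)  # the first split is the reference
    for name, cols in items:
        if list(cols) != list(ref_cols):
            missing = sorted(set(ref_cols) - set(cols))
            extra = sorted(set(cols) - set(ref_cols))
            raise ValueError(
                f"{name} differs from {ref_name}: missing={missing}, extra={extra}"
            )
    return list(ref_cols)


# Column names copied from the printed output above.
ROUTER_COLUMNS = [
    "data_source", "prompt", "ability", "reward_model", "extra_info",
    "router_target_action_gt", "router_tool_family_gt",
]

shared = assert_same_columns({"train": ROUTER_COLUMNS, "test": list(ROUTER_COLUMNS)})
print(f"Shared columns: {len(shared)}")
```

Inside the notebook, the same helper could be called as `assert_same_columns({name: list(df.columns) for name, df in DATASETS.items()})`.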



In [7]:
display(Markdown(
    "## Router target distribution\n\n"
    "Quantify how often each routing action (Search Family, Answer, Calculate Family, etc.) "
    "appears in the train and test splits."
))

## Router target distribution

Quantify how often each routing action (Search Family, Answer, Calculate Family, etc.) appears in the train and test splits.

In [8]:
def summarize_actions(df: pd.DataFrame) -> pd.DataFrame:
    """Count each routing action and express it as a percentage of all rows."""
    counts = df["router_target_action_gt"].value_counts().rename("count")
    percent = (counts / counts.sum() * 100).round(2).rename("percent")
    summary = pd.concat([counts, percent], axis=1)
    summary.index.name = "router_target_action_gt"
    return summary

action_summaries = {name: summarize_actions(df) for name, df in DATASETS.items()}
for name, summary in action_summaries.items():
    print(f"=== {name} router ===")
    display(summary)
    print()


=== train router ===


router_target_action_gt,count,percent
SEARCH-FAMILY,2273,57.98
CALCULATE-FAMILY,572,14.59
ANSWER,472,12.04
MIXED,456,11.63
OTHER,147,3.75



=== test router ===


router_target_action_gt,count,percent
SEARCH-FAMILY,46,57.5
CALCULATE-FAMILY,11,13.75
MIXED,10,12.5
ANSWER,9,11.25
OTHER,4,5.0
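
To judge whether the 80-row test split mirrors the training label mix, the two tables can be compared in percentage points. The sketch below uses only the counts printed above; the dictionary and function names are illustrative and do not come from the notebook.

```python
# Counts copied from the two summary tables above.
TRAIN_COUNTS = {
    "SEARCH-FAMILY": 2273, "CALCULATE-FAMILY": 572,
    "ANSWER": 472, "MIXED": 456, "OTHER": 147,
}
TEST_COUNTS = {
    "SEARCH-FAMILY": 46, "CALCULATE-FAMILY": 11,
    "MIXED": 10, "ANSWER": 9, "OTHER": 4,
}


def to_percent(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw label counts to percentages of the split."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


def percentage_point_gaps(train: dict[str, int], test: dict[str, int]) -> dict[str, float]:
    """Return test percent minus train percent for every label seen in either split."""
    train_pct, test_pct = to_percent(train), to_percent(test)
    labels = set(train_pct) | set(test_pct)
    return {lab: test_pct.get(lab, 0.0) - train_pct.get(lab, 0.0) for lab in labels}


gaps = percentage_point_gaps(TRAIN_COUNTS, TEST_COUNTS)
# Print the largest absolute gap first.
for label, gap in sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{label:<17} {gap:+.2f} pp")
```

With these counts the largest gap is `OTHER` (3.75% in train versus 5.00% in test). Since one test example is worth 1.25 percentage points, every gap corresponds to roughly one example or less, so the splits look consistently stratified.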





display(Markdown(
    "## Additional insights\n\n"
    "Look at router-provided tool family metadata to understand how many tool options each "
    "example exposes and which families dominate."
))


In [9]:
import ast
from collections import Counter

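def tool_family_stats(df: pd.DataFrame) -> tuple[pd.Series, pd.DataFrame]:
    """Return per-example tool option counts and overall tool family frequencies."""
    # Each cell is expected to hold the string form of a dict whose values are family labels.
    parsed = df["router_tool_family_gt"].apply(ast.literal_eval)
    # Number of tool options exposed by each example.
    option_counts = parsed.apply(len)
    # Tally family labels across every option of every example.
    fam_counter = Counter()
    for mapping in parsed:
        fam_counter.update(mapping.values())
    fam_df = (
        pd.Series(fam_counter)
        .sort_values(ascending=False)
        .rename("count")
        .to_frame()
    )
    fam_df["percent"] = (fam_df["count"] / fam_df["count"].sum() * 100).round(2)
    return option_counts, fam_df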

for name, df in DATASETS.items():
    option_counts, fam_df = tool_family_stats(df)
    print(f"=== {name} router ===")
    print(f"Average # tool options: {option_counts.mean():.2f}")
    print(f"Median # tool options: {option_counts.median():.0f}")
    display(fam_df.head(5))
    print()


=== train router ===
Average # tool options: 3.12
Median # tool options: 3


tool_family,count,percent
SEARCH,8853,72.46
CALCULATE,2102,17.21
OTHER,1262,10.33



=== test router ===
Average # tool options: 3.34
Median # tool options: 3


tool_family,count,percent
SEARCH,183,68.54
CALCULATE,57,21.35
OTHER,27,10.11
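
Because `tool_family_stats` tallies one family label per tool option, the family counts should sum to the total number of options, and dividing that sum by the row count should reproduce the reported averages. The check below is a quick arithmetic sketch using the printed numbers. It assumes the three families shown are the only ones, which is consistent with `head(5)` returning just three rows; the constant and function names are illustrative.

```python
# Row counts and family counts copied from the outputs above.
ROWS = {"train": 3920, "test": 80}
FAMILY_COUNTS = {
    "train": {"SEARCH": 8853, "CALCULATE": 2102, "OTHER": 1262},
    "test": {"SEARCH": 183, "CALCULATE": 57, "OTHER": 27},
}
# Averages as printed by the notebook, kept as strings to compare formatted values.
REPORTED_MEAN = {"train": "3.12", "test": "3.34"}


def mean_options(family_counts: dict[str, int], n_rows: int) -> float:
    """Average number of tool options per example, derived from family totals."""
    return sum(family_counts.values()) / n_rows


for split, counts in FAMILY_COUNTS.items():
    total = sum(counts.values())
    mean = mean_options(counts, ROWS[split])
    print(f"{split}: {total} options / {ROWS[split]} rows = {mean:.2f}")
    # Fails loudly if the tables and the printed average disagree.
    assert f"{mean:.2f}" == REPORTED_MEAN[split]
```

Both splits pass, which suggests the family table and the per-example option counts describe the same underlying totals.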





display(Markdown(
    "### Notes\n\n"
    "- Both splits are math-only (`ability` is `math` for every row) and share the same "
    "7-column schema.\n"
    "- `SEARCH-FAMILY` routing dominates (≈58% of both train and test). `CALCULATE-FAMILY` is "
    "the next largest group (≈14-15%), followed by `ANSWER` and `MIXED` (≈11-13% each, in "
    "opposite order in the two splits); `OTHER` is the smallest (≈4-5%).\n"
    "- Each example enumerates roughly three tool options (mean ≈3.1 in train and ≈3.3 in "
    "test, median 3 in both), so the router must evaluate multiple plausible actions.\n"
    "- Tool families labeled `SEARCH` (≈72% of train, ≈69% of test) far outnumber `CALCULATE` "
    "(≈17% and ≈21%) and `OTHER` (≈10%), reinforcing the skew observed in the target labels."
))