<a href="https://colab.research.google.com/github/MarkusThill/BitBully/blob/master/notebooks/c4_analyze_runtimes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Benchmark Aggregation: BitBully vs Baseline C4

This notebook aggregates raw runtime measurements from multiple benchmark runs of the **BitBully** Connect-4 solver and a baseline (**Pons-C4**) and produces a compact markdown table for the README.

## What this notebook does

Given a directory of CSV files named like:

- `times_<nply>_ply_<nrepeats>_pos.csv`

(where each file contains paired per-position runtimes in seconds for `Bitbully` and `Pons-C4`), the notebook:

1. **Discovers benchmark files**
   - Scans the current directory and selects files matching the naming pattern.
   - Extracts `nply` and `nrepeats` from each filename and sorts results by `nply`.

2. **Loads paired timing data**
   - Reads each CSV into a pandas DataFrame with columns:
     - `Bitbully`
     - `Pons-C4`

3. **Computes summary statistics per depth**
   - Mean and standard deviation for both solvers.
   - **Speed-up** as: `mean(Pons-C4) / mean(Bitbully)` (values > 1 indicate BitBully is faster).

4. **Runs a paired significance test**
   - Applies the **Wilcoxon signed-rank test** to the paired timings with `alternative="less"`,
     testing whether the per-position differences `Bitbully - Pons-C4` are systematically below zero (i.e., whether BitBully tends to be faster).
   - Reports the resulting **p-value** alongside the timing statistics.

5. **Emits a README-ready markdown table**
   - Formats all results into a single markdown table containing:
     `nply`, `nrepeats`, `Mean ± Std` for both solvers, `Speed-up`, `p-value`, and a `Significant` marker (`*` when p < 0.05).
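Steps 3 and 4 above can be sketched on a small, made-up set of paired timings. The numbers below are hypothetical (not benchmark results). Only the column names match the CSV files. The statistics mirror what the notebook computes: column means, pandas' default sample standard deviation (`ddof=1`), the speed-up ratio, and the one-sided Wilcoxon signed-rank test.

```python
import pandas as pd
from scipy.stats import wilcoxon

# Hypothetical paired runtimes in seconds; row i is the same position for both solvers
df = pd.DataFrame(
    {
        "Bitbully": [0.010, 0.012, 0.008, 0.015, 0.011, 0.009, 0.013, 0.014],
        "Pons-C4": [0.021, 0.026, 0.017, 0.033, 0.023, 0.016, 0.030, 0.034],
    }
)

means = df.mean()  # per-solver mean runtime
stds = df.std()  # per-solver sample standard deviation (pandas default ddof=1)

# Speed-up > 1 means BitBully is faster on average
speed_up = means["Pons-C4"] / means["Bitbully"]

# One-sided paired test: are the differences Bitbully - Pons-C4 shifted below zero?
res = wilcoxon(df["Bitbully"], df["Pons-C4"], alternative="less")

print(f"Bitbully: {means['Bitbully']:.4f} ± {stds['Bitbully']:.4f}")
print(f"Pons-C4:  {means['Pons-C4']:.4f} ± {stds['Pons-C4']:.4f}")
print(f"Speed-up: {speed_up:.2f}, p-value: {res.pvalue:.2e}")
```

Because the test works on paired differences, both columns must list the same positions in the same order. A shuffled CSV would invalidate the p-value even though the means and the speed-up would be unchanged.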

## Outputs

- A printed markdown table that can be pasted directly into the project README.
- (Optional) The intermediate `times` list holds one dict per search depth, with keys `nply`, `nrepeats`, `means`, `std`, and `p-value`, for further plotting or analysis.
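Each entry of `times` is a plain dict, so the list flattens easily into a tidy DataFrame for plotting. The sketch below is illustrative only. The helper `times_to_frame` is a hypothetical name, and `example_times` holds two made-up entries that follow the same shape as the notebook's `times` (these are not real results).

```python
import pandas as pd


def times_to_frame(times: list[dict]) -> pd.DataFrame:
    """Flatten per-depth result dicts into one row per (nply, solver)."""
    rows = []
    for entry in times:
        for solver, mean in entry["means"].items():
            rows.append(
                {
                    "nply": entry["nply"],
                    "nrepeats": entry["nrepeats"],
                    "solver": solver,
                    "mean": mean,
                    "std": entry["std"][solver],
                    "p-value": entry["p-value"],
                }
            )
    return pd.DataFrame(rows)


# Hypothetical entries in the same shape as the notebook's `times` list
example_times = [
    {"nply": 8, "nrepeats": 1000, "means": {"Bitbully": 0.02, "Pons-C4": 0.05},
     "std": {"Bitbully": 0.01, "Pons-C4": 0.02}, "p-value": 1e-10},
    {"nply": 10, "nrepeats": 1000, "means": {"Bitbully": 0.01, "Pons-C4": 0.03},
     "std": {"Bitbully": 0.005, "Pons-C4": 0.01}, "p-value": 1e-12},
]

tidy = times_to_frame(example_times)
# Wide view: one column per solver, indexed by search depth
wide = tidy.pivot(index="nply", columns="solver", values="mean")
print(wide)
```

The long format (one row per solver and depth) is convenient for grouped plots, and `pivot` recovers a side-by-side comparison when needed.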


In [None]:
import re
from operator import itemgetter
from pathlib import Path

import pandas as pd
from scipy.stats import wilcoxon

# Filename pattern times_<nply>_ply_<nrepeats>_pos.csv (compiled once)
FILENAME_RE = re.compile(r"^times_(\d+)_ply_(\d+)_pos\.csv$")


def find_time_files(directory: str | Path) -> list[tuple[Path, int, int]]:
    """Find matching CSV files and extract (nply, nrepeats).

    Returns:
        List of (path, nply, nrepeats)
    """
    directory = Path(directory)
    results: list[tuple[Path, int, int]] = []

    for path in directory.iterdir():
        if not path.is_file():
            continue

        m = FILENAME_RE.match(path.name)
        if not m:
            continue

        nply = int(m.group(1))
        nrepeats = int(m.group(2))
        results.append((path, nply, nrepeats))

    return results
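To see which filenames the discovery step accepts, the same pattern as `FILENAME_RE` above can be applied to a few sample names. The names below are made up for illustration. Because the pattern is anchored with `^` and `$`, it rejects any prefix, trailing suffix, or non-numeric field.

```python
import re

# Same pattern as FILENAME_RE: times_<nply>_ply_<nrepeats>_pos.csv
pattern = re.compile(r"^times_(\d+)_ply_(\d+)_pos\.csv$")

samples = [
    "times_12_ply_1000_pos.csv",  # accepted: nply=12, nrepeats=1000
    "times_8_ply_500_pos.csv",  # accepted: nply=8, nrepeats=500
    "times_12_ply_1000_pos.csv.bak",  # rejected: trailing suffix
    "old_times_12_ply_1000_pos.csv",  # rejected: leading prefix
    "times_x_ply_1000_pos.csv",  # rejected: nply is not numeric
]

# Map each name to (nply, nrepeats), or None if it does not match
parsed = {}
for name in samples:
    m = pattern.match(name)
    parsed[name] = (int(m.group(1)), int(m.group(2))) if m else None

for name, result in parsed.items():
    print(f"{name:32} -> {result}")

# Converting to int makes the later sort numeric (8 before 12), not lexicographic
depths = sorted(v[0] for v in parsed.values() if v is not None)
print(depths)
```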

In [None]:
files = find_time_files(".")
files = sorted(files, key=itemgetter(1))  # sort by nply (tuple index 1)

In [None]:
# files = {i: f"times_{i}_ply_1000_pos.csv" for i in range(8,16)}

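times = []
for path, nply, nrepeats in files:
    # Keep only the two paired runtime columns (seconds)
    df = pd.read_csv(path)[["Bitbully", "Pons-C4"]]

    # One-sided paired Wilcoxon signed-rank test
    # H1: Bitbully runtimes tend to be smaller than Pons-C4 runtimes
    x = df["Bitbully"]
    y = df["Pons-C4"]
    res = wilcoxon(x, y, alternative="less")

    times.append(
        {
            "nrepeats": nrepeats,
            "nply": nply,
            "means": dict(df.mean()),  # per-solver mean runtime
            "std": dict(df.std()),  # per-solver sample std (ddof=1)
            "p-value": res.pvalue,
        }
    )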
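SciPy's `wilcoxon(x, y, ...)` operates on the paired differences `d = x - y`, so `alternative="less"` asks whether `Bitbully - Pons-C4` tends to be negative. The sketch below checks this on hypothetical paired timings (made-up numbers with two positions where Bitbully is slower). Passing the differences directly yields the same p-value, while the opposite alternative yields a large one.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired runtimes in seconds; index i is the same position for both solvers
bitbully = np.array([0.020, 0.018, 0.025, 0.011, 0.030, 0.016, 0.022, 0.019])
pons_c4 = np.array([0.024, 0.024, 0.024, 0.020, 0.033, 0.023, 0.020, 0.024])

# Two-sample form, as used in the notebook
p_pair = wilcoxon(bitbully, pons_c4, alternative="less").pvalue

# One-sample form on the differences d = x - y
p_diff = wilcoxon(bitbully - pons_c4, alternative="less").pvalue

# Opposite alternative: is Bitbully systematically *slower*?
p_greater = wilcoxon(bitbully, pons_c4, alternative="greater").pvalue

print(f"less (pairs): {p_pair:.3e}, less (diffs): {p_diff:.3e}, greater: {p_greater:.3f}")
```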

In [None]:
import pandas as pd


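def _mean_std(means: dict, stds: dict, key: str) -> str:
    """Return 'mean ± std' for one solver, or '-' if a value is missing."""
    if key not in means or key not in stds:
        return "-"
    return f"{means[key]:.4f} ± {stds[key]:.4f}"


def format_as_markdown(data: list[dict]) -> str:
    """Format the timing data as a Markdown table.

    Args:
        data (list[dict]): Timing dicts with keys "nply", "nrepeats",
            "means", "std", and "p-value".

    Returns:
        str: Markdown-formatted table. `DataFrame.to_markdown` requires
        the optional `tabulate` package.
    """
    table_data = []

    for entry in data:
        nply = entry.get("nply", "-")
        nrepeats = entry.get("nrepeats", "-")
        means = entry.get("means", {})
        stds = entry.get("std", {})
        p_value = entry.get("p-value")

        bitbully = _mean_std(means, stds, "Bitbully")
        pons_c4 = _mean_std(means, stds, "Pons-C4")

        # Speed-up = mean(Pons-C4) / mean(Bitbully); > 1 means BitBully is faster
        speed_up = "-"
        if means.get("Pons-C4") and means.get("Bitbully"):
            speed_up = f"{means['Pons-C4'] / means['Bitbully']:.2f}"

        # Mark results significant at the 5% level; tolerate a missing p-value
        if p_value is None:
            p_str, significant = "-", ""
        else:
            p_str = f"{p_value:.2e}"
            significant = "*" if p_value < 0.05 else ""

        table_data.append([nply, nrepeats, bitbully, pons_c4, speed_up, p_str, significant])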

    df = pd.DataFrame(
        table_data,
        columns=[
            "nply",
            "nrepeats",
            "Bitbully (Mean ± Std)",
            "Pons-C4 (Mean ± Std)",
            "Speed-up",
            "p-value",
            "Significant",
        ],
    )

    return df.to_markdown(index=False)


# Print Markdown Table
print(format_as_markdown(times))