# Dashboard (Minimal)

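This notebook is a lightweight dashboard that only reads existing results under `artifacts/`.

- Inputs: `artifacts/predictions/predictions_*.csv`, `artifacts/reports/backtest_*.json`, `artifacts/reports/train_metrics.json`
- For each `*` pattern, the file that sorts last by name is treated as the latest one
- No heavy retraining is done here; the notebook only visualizes and inspects the results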

In [None]:
from pathlib import Path
import json
import pandas as pd
import matplotlib.pyplot as plt

# Project root: the parent directory when running from notebooks/, otherwise the current directory
ROOT = Path.cwd().resolve().parent if Path.cwd().name == "notebooks" else Path.cwd()
ART = ROOT / "artifacts"
PRED_DIR = ART / "predictions"
REP_DIR = ART / "reports"

def latest(path: Path, pattern: str) -> Path:
    """Return the last file matching `pattern` in name order (assumes names sort chronologically)."""
    files = sorted(path.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files: {path}/{pattern}")
    return files[-1]

pred_path = latest(PRED_DIR, "predictions_*.csv")
backtest_path = latest(REP_DIR, "backtest_*.json")
train_path = REP_DIR / "train_metrics.json"

pred_df = pd.read_csv(pred_path)
with backtest_path.open("r", encoding="utf-8") as f:
    backtest = json.load(f)
with train_path.open("r", encoding="utf-8") as f:
    train = json.load(f)

print("pred:", pred_path.name, "rows=", len(pred_df))
print("backtest:", backtest_path.name)
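`latest()` picks files by name order, which is only "latest" when the names embed a zero-padded, sortable stamp. The sketch below is a hypothetical alternative, `latest_by_mtime`, that uses the file modification time instead; the toy filenames show a case where the two disagree (`predictions_9.csv` sorts after `predictions_10.csv`).

```python
import os
import tempfile
from pathlib import Path


def latest_by_mtime(path: Path, pattern: str) -> Path:
    """Hypothetical helper: most recently modified file matching `pattern` in `path`."""
    files = [p for p in path.glob(pattern) if p.is_file()]
    if not files:
        raise FileNotFoundError(f"No files: {path}/{pattern}")
    return max(files, key=lambda p: p.stat().st_mtime)


# Toy example in a throwaway directory: "9" was written first, "10" later.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    old = tmp_dir / "predictions_9.csv"
    new = tmp_dir / "predictions_10.csv"
    old.write_text("race_id,score\n", encoding="utf-8")
    new.write_text("race_id,score\n", encoding="utf-8")
    os.utime(old, (1_000_000, 1_000_000))  # older modification time
    os.utime(new, (2_000_000, 2_000_000))  # newer modification time
    by_name = sorted(tmp_dir.glob("predictions_*.csv"))[-1].name
    by_time = latest_by_mtime(tmp_dir, "predictions_*.csv").name

print("by name:", by_name, "| by mtime:", by_time)
```

Modification time can change when files are copied or re-synced, so name-based selection with zero-padded dates (e.g. `YYYYMMDD`) is usually the more reproducible choice if the pipeline controls the filenames.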

In [None]:
summary = pd.DataFrame([
    {
        "prediction_file": pred_path.name,
        "rows": int(len(pred_df)),
        "races": int(pred_df["race_id"].nunique()) if "race_id" in pred_df.columns else None,
        "top1_hit_rate": backtest.get("top1_hit_rate"),
        "top3_hit_rate": backtest.get("top3_hit_rate"),
        "top5_hit_rate": backtest.get("top5_hit_rate"),
        "simple_top1_win_roi": backtest.get("simple_top1_win_roi"),
        "train_auc": train.get("auc"),
        "train_logloss": train.get("logloss"),
    }
])
summary
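The `top*_hit_rate` values above come straight from the backtest JSON. To sanity-check them against the prediction file, a minimal sketch could recompute a top-k hit rate from `pred_df`, **assuming** `pred_rank` is the predicted rank (1 = highest score), `rank` is the actual finishing position, and a "hit" means the winner was predicted within the top k. `run_backtest.py` may define the metric differently; `topk_hit_rate` is a hypothetical helper.

```python
import pandas as pd


def topk_hit_rate(df: pd.DataFrame, k: int) -> float:
    """Share of races whose actual winner (rank == 1) has pred_rank <= k (assumed definition)."""
    winners = df[df["rank"] == 1]
    if winners.empty:
        return float("nan")
    # One value per race: the best predicted rank among that race's winners (handles dead heats).
    hits = winners.groupby("race_id")["pred_rank"].min() <= k
    return float(hits.mean())


# Toy data: winner predicted 1st in race A, 3rd in race B, 4th in race C.
toy = pd.DataFrame({
    "race_id":   ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
    "pred_rank": [1, 2, 3, 1, 2, 3, 1, 2, 3, 4],
    "rank":      [1, 2, 3, 2, 3, 1, 2, 3, 4, 1],
})
print({k: topk_hit_rate(toy, k) for k in (1, 3, 5)})
```

On the real data this would be `topk_hit_rate(pred_df, 1)` and so on, provided both columns are present.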

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4.5))

axes[0].hist(pred_df["score"].dropna(), bins=25, color="#3b82f6", alpha=0.9)  # drop missing scores before binning
axes[0].set_title("Score Distribution")
axes[0].set_xlabel("score")

if "race_id" in pred_df.columns:
    race_top = pred_df.groupby("race_id", as_index=False)["score"].max()
    axes[1].boxplot(race_top["score"].dropna())  # vertical is the default orientation
    axes[1].set_title("Per-race Top Score")
    axes[1].set_ylabel("score")
else:
    axes[1].text(0.5, 0.5, "race_id not found", ha="center", va="center")

plt.tight_layout()
plt.show()
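The box plot shows only the maximum score per race, not which horse produced it. A small sketch that returns the full top-scoring row for each race (the horse a top-1 strategy would presumably pick) could use `groupby(...).idxmax()`; `top_pick_per_race` is a hypothetical helper, and ties resolve to the first row encountered.

```python
import pandas as pd


def top_pick_per_race(df: pd.DataFrame) -> pd.DataFrame:
    """Highest-scoring row of each race, one row per race_id (hypothetical helper)."""
    scored = df.dropna(subset=["score"])  # idxmax needs valid scores
    idx = scored.groupby("race_id")["score"].idxmax()  # index label of the max per race
    return scored.loc[idx].reset_index(drop=True)


toy = pd.DataFrame({
    "race_id": ["R1", "R1", "R2", "R2", "R2"],
    "horse_id": ["h1", "h2", "h3", "h4", "h5"],
    "score": [0.2, 0.7, 0.5, None, 0.9],
})
picks = top_pick_per_race(toy)
print(picks[["race_id", "horse_id", "score"]])
```

With the notebook data, `top_pick_per_race(pred_df)` gives one row per race that can be joined back to results if a `rank` column is available.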

In [None]:
cols = [c for c in ["race_id", "horse_id", "horse_name", "score", "pred_rank", "rank"] if c in pred_df.columns]
pred_df[cols].sort_values("score", ascending=False).head(20).reset_index(drop=True)
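The table above ranks scores globally, so a few races with high scores can fill all 20 rows. A per-race view is a minimal variation: sort by race, then by score descending, and keep the first `n` rows of each race. `per_race_top_n` is a hypothetical helper, not part of the project scripts.

```python
import pandas as pd


def per_race_top_n(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Top-n rows by score within each race, races in race_id order (hypothetical helper)."""
    ordered = df.sort_values(["race_id", "score"], ascending=[True, False])
    # groupby().head keeps the first n rows per group in the frame's existing order
    return ordered.groupby("race_id", sort=False).head(n).reset_index(drop=True)


toy = pd.DataFrame({
    "race_id": ["R2", "R1", "R1", "R2", "R1", "R1"],
    "horse_id": ["h5", "h1", "h2", "h6", "h3", "h4"],
    "score": [0.4, 0.1, 0.9, 0.8, 0.5, 0.3],
})
out = per_race_top_n(toy, n=2)
print(out)
```

In the notebook, `per_race_top_n(pred_df[cols], n=3)` would show the three highest-scoring horses in every race.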

## Next

- Regenerate from the CLI: `python scripts/run_dashboard.py`
- Compare different prediction target dates: `python scripts/run_predict.py ... --race-date YYYY-MM-DD`
- Then track hit_rate / ROI with `run_backtest.py`
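To track hit rates and ROI across several backtest runs, one option is to load every `backtest_*.json` in the reports directory into a single table. The sketch below assumes each report is a flat JSON object using the same keys read in the summary cell; `backtest_history` is a hypothetical helper, and the dated filenames in the toy example are illustrative only.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

METRICS = ["top1_hit_rate", "top3_hit_rate", "top5_hit_rate", "simple_top1_win_roi"]


def backtest_history(rep_dir: Path, pattern: str = "backtest_*.json") -> pd.DataFrame:
    """One row per backtest report in name order; missing metric keys are left empty."""
    rows = []
    for path in sorted(rep_dir.glob(pattern)):
        with path.open("r", encoding="utf-8") as f:
            report = json.load(f)
        rows.append({"file": path.name, **{m: report.get(m) for m in METRICS}})
    return pd.DataFrame(rows, columns=["file", *METRICS])


# Toy example with two fake reports in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "backtest_20240101.json").write_text(
        json.dumps({"top1_hit_rate": 0.25, "simple_top1_win_roi": 0.8}), encoding="utf-8")
    (tmp_dir / "backtest_20240108.json").write_text(
        json.dumps({"top1_hit_rate": 0.30, "top3_hit_rate": 0.55}), encoding="utf-8")
    history = backtest_history(tmp_dir)

print(history)
```

In the notebook, `backtest_history(REP_DIR)` would list all existing reports side by side.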