
Maldini

Maldini is one of Spain's most prominent football journalists. Every week on his YouTube channel @mundomaldini he makes explicit, probabilistic predictions about upcoming matches. This project captures every prediction, scores it objectively with a Brier score, and surfaces the answer in a live dashboard.

▶ Live dashboard


Is Julio Maldonado ("Maldini") a superforecaster?

A Brier score measures the accuracy of probabilistic predictions: lower is better, and 0 is perfect.

| Benchmark | Brier Score |
| --- | --- |
| Naive baseline (guess 1/3 each outcome) | 0.222 |
| Betting markets | ~0.19 |
| Superforecaster threshold | < 0.20 |
| Perfect forecaster | 0.00 |

Maldini earns the superforecaster badge only when his all-time average Brier score drops below 0.20, and only once he has 100+ scored predictions for statistical reliability.
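The badge rule combines a score threshold with a sample-size floor. A minimal sketch of that check, assuming the per-prediction Brier scores are available as a plain list (the function and constant names are hypothetical, not taken from the package):

```python
from statistics import mean

SUPERFORECASTER_THRESHOLD = 0.20  # threshold from the benchmark table
MIN_SCORED_PREDICTIONS = 100      # sample-size floor stated above


def earns_badge(brier_scores: list[float]) -> bool:
    """Return True only when both conditions hold (hypothetical helper)."""
    if len(brier_scores) < MIN_SCORED_PREDICTIONS:
        return False  # too few scored predictions to be reliable
    return mean(brier_scores) < SUPERFORECASTER_THRESHOLD
```

With this rule, 99 excellent predictions are not enough, and 150 mediocre ones do not qualify either.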

The project tracks 1,500 predictions from 2022-Q4 to 2026-Q2.


Architecture

                ┌──────────────────────────────────┐
                │  data/videos.csv                 │
                │  data/results_overrides.csv  (*) │
                └─────────────────┬────────────────┘
                                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                       maldini.pipeline                          │
│                                                                 │
│    ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐     │
│    │ ingest  │───►│ extract │───►│ results │───►│ scoring │     │
│    └────┬────┘    └────┬────┘    └────┬────┘    └────┬────┘     │
│         ▼              ▼              ▼              ▼          │
│      YouTube         Claude        TheSportsDB     DuckDB       │
│      transcript      Haiku LLM     match data      Brier 2/3w   │
└─────────────────────────────────┬───────────────────────────────┘
                                  ▼
                ┌──────────────────────────────────────┐
                │  data/predictions.parquet            │
                │  one row per prediction; single      │
                │  source of truth, committed to git   │
                └─────────────────┬────────────────────┘
                                  ▼
                ┌──────────────────────────────────────┐
                │            maldini.render            │
                │  DuckDB CTEs  →  summary stats       │
                │  Jinja2       →  dist/index.html     │
                │                  dist/index.en.html  │
                └─────────────────┬────────────────────┘
                                  ▼
                       GitHub Pages (auto)

(*) Manual scoreline fixups for matches TheSportsDB can't auto-resolve.
Schedule: GitHub Actions cron, Sundays 08:00 UTC (.github/workflows/weekly.yml).

Parquet is the single source of truth. It lives in git, so every dashboard build is reproducible from a commit hash. The pipeline is idempotent: re-running it on the same videos.csv only processes new video_ids, and pending predictions (matches not yet played) are persisted with null results so the next run picks them up.
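The idempotency described above can be sketched as a small planning step. This is an illustration only: the `plan_run` helper and the `result` field name are assumptions, and the actual pipeline works against the parquet file rather than a list of dicts.

```python
def plan_run(video_ids, predictions):
    """Decide what one pipeline run has to do (hypothetical helper).

    video_ids:   ids listed in videos.csv
    predictions: already-stored rows, each a dict with "video_id" and
                 "result" (None while the match has not been played)
    """
    seen = {row["video_id"] for row in predictions}
    # Only videos that have never been processed are ingested again.
    new_videos = [v for v in video_ids if v not in seen]
    # Rows stored with a null result are retried on every run.
    pending = [row for row in predictions if row["result"] is None]
    return new_videos, pending
```

Running the plan twice against the same inputs yields the same work list, and once a video's rows are stored it never reappears in `new_videos`.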


Transformation

All SQL runs in DuckDB in-process, embedded inside src/maldini/pipeline.py (scoring) and src/maldini/render.py (summary stats).

Brier score variants:

  • 3-outcome (league matches): ((p_home - I_home)² + (p_draw - I_draw)² + (p_away - I_away)²) / 3
  • 2-outcome (knockout, where pred_draw_pct = 0): renormalise home + away to sum to 1, then ((p_home - I_home)² + (p_away - I_away)²) / 2
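Both variants can be written as one function. The sketch below follows the two formulas above but is not the project's code (the scoring itself runs as SQL in DuckDB); it assumes probabilities are passed as fractions in [0, 1] rather than percentages, and the function name is hypothetical.

```python
def brier_score(p_home: float, p_draw: float, p_away: float, outcome: str) -> float:
    """Brier score for one match; outcome is "home", "draw" or "away"."""
    if p_draw == 0:
        # 2-outcome variant: renormalise home + away so they sum to 1.
        total = p_home + p_away
        if total == 0:
            raise ValueError("home and away probabilities cannot both be 0")
        probs = {"home": p_home / total, "away": p_away / total}
    else:
        probs = {"home": p_home, "draw": p_draw, "away": p_away}
    if outcome not in probs:
        raise ValueError(f"outcome {outcome!r} is not valid for this prediction")
    # Mean squared difference between each probability and its 0/1 indicator.
    return sum(
        (p - (1.0 if name == outcome else 0.0)) ** 2 for name, p in probs.items()
    ) / len(probs)
```

As a check against the benchmark table, the naive forecast `brier_score(1/3, 1/3, 1/3, "home")` gives (4/9 + 1/9 + 1/9) / 3 = 2/9, which is the 0.222 baseline.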

Summary statistics (all-time average, accuracy, monthly trend, competition breakdown, Brier distribution) are computed by maldini.render from the parquet at render time, using a few short CTEs and no separate materialised tables.
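To illustrate the "few short CTEs" approach, here is a sketch of a monthly-trend query. It is not the project's SQL: Python's built-in sqlite3 stands in for DuckDB so the example runs without extra dependencies, and the table and column names (`predictions`, `match_date`, `competition`, `brier`) are assumptions.

```python
import sqlite3

# Toy rows; the last one is a pending prediction with no score yet.
rows = [
    ("2024-01-06", "LaLiga", 0.10),
    ("2024-01-20", "LaLiga", 0.30),
    ("2024-02-03", "Copa del Rey", 0.20),
    ("2024-02-10", "LaLiga", None),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE predictions (match_date TEXT, competition TEXT, brier REAL)")
con.executemany("INSERT INTO predictions VALUES (?, ?, ?)", rows)

MONTHLY_SQL = """
WITH scored AS (            -- drop pending predictions
    SELECT * FROM predictions WHERE brier IS NOT NULL
),
monthly AS (                -- average Brier score per calendar month
    SELECT substr(match_date, 1, 7) AS month,
           AVG(brier)               AS avg_brier,
           COUNT(*)                 AS n
    FROM scored
    GROUP BY substr(match_date, 1, 7)
)
SELECT month, avg_brier, n FROM monthly ORDER BY month
"""

monthly = con.execute(MONTHLY_SQL).fetchall()
all_time = con.execute(
    "SELECT AVG(brier) FROM predictions WHERE brier IS NOT NULL"
).fetchone()[0]
```

Because everything is derived at query time, changing a statistic means editing a query rather than migrating a stored table.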


Stack

| Layer | Technology |
| --- | --- |
| Pipeline | Python package (src/maldini/) |
| Transformations | DuckDB (in-process SQL) |
| Storage | Parquet file in git (data/predictions.parquet) |
| LLM | Anthropic Claude Haiku |
| External APIs | YouTube Data API v3, youtube-transcript-api, TheSportsDB |
| Dashboard | Jinja2 → static HTML |
| Schedule | GitHub Actions (weekly cron) |
| Hosting | GitHub Pages |

How to run locally

For a full step-by-step guide, see docs/SETUP.md. The summary below is enough to get going.

Prerequisites

git clone https://github.com/tomas-ravalli/maldini-stats.git
cd maldini-stats
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
cp .env.example .env   # fill in YOUTUBE_API_KEY and ANTHROPIC_API_KEY

Run

# 1. Ingest, extract, fetch results, score
python -m maldini.pipeline --file data/videos.csv

# 2. Generate static HTML from the parquet
python -m maldini.render

# 3. View
open dist/index.html

Equivalently, the package exposes maldini-pipeline and maldini-render console scripts. Run pytest for the unit tests.

To add new videos: append rows to data/videos.csv and re-run.


Design notes

  • Parquet lives in git: every dashboard build is reproducible from a commit hash. If scoring logic changes, rebuild from data/videos.csv.
  • DuckDB for everything SQL: no warehouse, no credentials, no quotas; the whole pipeline runs on a laptop or a free-tier GitHub Actions runner in under a minute.
  • Fuzzy team matching: normalisation strips accents, common prefixes (Real, Atlético), and applies Spanish→English word substitutions before substring matching against TheSportsDB results.
  • No-date window: predictions without a match_date use a 45-day window from publish_date to find the matching fixture.
  • No-draw handling: when pred_draw_pct == 0, a 2-outcome Brier formula is applied automatically.
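The fuzzy matching and no-date window notes can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the helper names are hypothetical, the substitution table holds made-up example entries, and the window is assumed to run forward from publish_date with both ends inclusive.

```python
import unicodedata
from datetime import date, timedelta

PREFIXES = ("real ", "atletico ")                     # prefixes named above
ES_TO_EN = {"oporto": "porto", "marsella": "marseille"}  # hypothetical examples
WINDOW_DAYS = 45


def normalise_team(name: str) -> str:
    """Strip accents and a leading prefix, then apply word substitutions."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    plain = plain.lower().strip()
    for prefix in PREFIXES:
        if plain.startswith(prefix):
            plain = plain[len(prefix):]
            break
    return " ".join(ES_TO_EN.get(word, word) for word in plain.split())


def teams_match(predicted: str, candidate: str) -> bool:
    """Substring match in either direction on the normalised names."""
    a, b = normalise_team(predicted), normalise_team(candidate)
    return bool(a) and bool(b) and (a in b or b in a)


def in_fixture_window(publish_date: date, fixture_date: date) -> bool:
    """True if the fixture falls within 45 days after the video was published."""
    return publish_date <= fixture_date <= publish_date + timedelta(days=WINDOW_DAYS)
```

One caveat of this naive version: stripping prefixes makes "Real Madrid" and "Atlético Madrid" normalise to the same string, so a working matcher needs some further disambiguation that is not described here.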

Documentation


License

MIT


Built by with AI.
© trm
