End-to-end pipeline that, every day at 12:00 PM ET, scrapes the MLB slate, grades every game using the Sharp Betting System, generates a betting card for each play that clears the edge + confidence filters, and pushes the cards to your phone (Slack / Discord / Telegram / email).
┌─ settle yesterday's pending bets via MLB Stats API
↓
┌─────────────────┐ ┌─────────────────┐ ┌────────────────┐ ┌──────────┐
│ scraper.py │ → │ grader.py │ → │ cards.md │ → │ notify.py│
│ Stats API + │ │ 0–100 grades + │ │ Markdown bet │ │ Slack / │
│ Statcast + │ │ NRFI / F5 / FG │ │ card per play │ │ Discord /│
│ Open-Meteo + │ │ probabilities + │ │ │ │ Telegram/│
│ Odds API │ │ unit sizing │ │ │ │ email │
└─────────────────┘ └─────────────────┘ └────────────────┘ └──────────┘
↓ ↑
┌────────────────┐ ┌──────────────────┐ │
│ GitHub Actions │ → │ bet_tracker.py │ ──────┘
│ daily cron │ │ append → log, │
│ @ 12 PM ET │ │ rebuild record.md│
└────────────────┘ └──────────────────┘
│
bet_log.csv ←─────┴─────→ record.md
(append-only) (overall + by category)
| Market | Notes |
|---|---|
| Full-game Moneyline | sides only, devigged vs Pinnacle for fair |
| Full-game Run Line | derived from win prob × cover translation |
| Full-game Total | normal-distribution model around expected runs |
| First-5 Total | pitching-heavy, no bullpen contribution |
| NRFI / YRFI | starter quality + park + top-of-order proxy |
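The "normal-distribution model around expected runs" for full-game totals can be sketched as below. This is a minimal illustration, not the code in mlb_grader.py: the function name `prob_over` and the `sigma` value are assumptions, and the continuous approximation ignores the push that an integer line can produce.

```python
import math


def prob_over(expected_runs: float, line: float, sigma: float) -> float:
    """P(total > line) under a normal model centred on expected_runs."""
    z = (line - expected_runs) / sigma
    # Standard normal CDF written with the error function.
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - cdf


# Hypothetical inputs: sigma here is a placeholder, not a calibrated constant.
p_over = prob_over(expected_runs=9.1, line=8.5, sigma=4.4)
```

When the expected total equals the line, the function returns exactly 0.5, which is a quick sanity check on any calibrated sigma.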
Books shopped for the best price: FanDuel, DraftKings, BetMGM, Caesars. Pinnacle is included only as the sharp anchor for devig math — you never see it as a "best book to bet at".
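A rough sketch of the devig step follows. It uses proportional normalization, one common way to strip the vig from a two-way market; the source does not say which method the grader uses, so treat the method and the function names as assumptions. The edge is expressed as fair probability minus the implied probability of the best available price.

```python
from typing import Tuple


def american_to_implied(odds: int) -> float:
    """Implied probability (vig included) of an American price."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)


def devig_two_way(odds_a: int, odds_b: int) -> Tuple[float, float]:
    """Scale both implied probabilities so they sum to 1."""
    pa = american_to_implied(odds_a)
    pb = american_to_implied(odds_b)
    total = pa + pb  # greater than 1 when the book holds a margin
    return pa / total, pb / total


def edge(fair_prob: float, best_odds: int) -> float:
    """Fair probability minus the implied probability of the price we can bet."""
    return fair_prob - american_to_implied(best_odds)
```

For example, a sharp line of -110 / -110 devigs to 50% on each side, so a +105 price at a target book would show a small positive edge.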
.
├── README.md this file
├── MLB_Sharp_Betting_System.md the overall playbook
├── README_scraper.md deeper docs on the scraper
├── requirements.txt Python deps
├── mlb_data_scraper.py all data ingestion
├── mlb_grader.py 0-100 grade model + bet card generator
├── notify.py Slack/Discord/Telegram/email push
├── bet_tracker.py append, settle, report (overall + by category + CLV)
├── closing_snapshot.py captures closing-line value hourly during games
├── bankroll_sim.py replays log under flat / ladder / Kelly sizing
├── .github/workflows/
│ ├── daily-bets.yml 12 PM ET cron + commits results
│ └── closing-snapshot.yml hourly cron during MLB game windows
└── mlb_data/
    ├── bet_log.csv ← every bet ever recommended (append-only)
    ├── record.md ← overall + per-market record + CLV (rebuilt daily)
    ├── bankroll_sim.md ← five sizing strategies replayed (rebuilt daily)
    └── <DATE>/
        ├── slate.json
        ├── odds.json
        ├── grades.json
        ├── cards.md ← what gets pushed to your phone
        └── games/<gamePk>.json
Every bet the grader recommends is written to bet_log.csv as a pending row
on the day it's made. The next morning's run settles those bets by pulling
the final box score from the MLB Stats API and computing units P/L for each
market type:
| Market | Settlement |
|---|---|
| moneyline | side wins outright |
| runline | side covers ±1.5 (margin > -line) |
| total | full-game runs over/under, integer line = push |
| f5_total | runs through 5 innings, integer line = push |
| nrfi / yrfi | scoreless 1st inning vs run scored in 1st |
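The settlement rules in the table can be sketched as small pure functions. The function names and the "win" / "loss" / "push" labels are illustrative assumptions, not necessarily what bet_tracker.py uses.

```python
def settle_total(total_runs: int, line: float, side: str) -> str:
    """Settle an over/under; side is 'over' or 'under'. Landing on an integer line pushes."""
    if total_runs == line:
        return "push"
    went_over = total_runs > line
    return "win" if went_over == (side == "over") else "loss"


def settle_runline(margin: int, line: float) -> str:
    """margin = bet side's runs minus opponent's runs; line is -1.5 or +1.5."""
    return "win" if margin > -line else "loss"


def settle_nrfi(first_inning_runs: int, side: str) -> str:
    """side is 'nrfi' (scoreless 1st) or 'yrfi' (at least one run in the 1st)."""
    scoreless = first_inning_runs == 0
    return "win" if scoreless == (side == "nrfi") else "loss"
```

The same `settle_total` works for f5_total if it is passed the runs scored through five innings instead of the full-game total.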
The settled rows are then aggregated into record.md showing:
- Overall record — bets, W-L-P, win %, units risked, units P/L, ROI %, avg edge predicted
- Per-category breakdown — same columns, one row per market (moneyline, runline, total, f5_total, nrfi)
- Time windows — All time, last 30 days, last 7 days
- By confidence tier — 9-10 / 7-8 / 5-6
- By book — fanduel / draftkings / betmgm / caesars
- Pending bets — what's still open
- Last 10 settled — recent results
Run any piece manually:
python bet_tracker.py append --date 2026-04-25 # add today's recs
python bet_tracker.py settle # close out finished games
python bet_tracker.py report # rebuild record.md
python bet_tracker.py daily --date 2026-04-25 # all three in one shot
bet_log.csv is plain CSV: open it in Excel / Google Sheets to slice it
however you want, or load it into a notebook with pandas.read_csv().
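As a small standard-library illustration of slicing the log, the sketch below sums units P/L per market. The column names (`market`, `status`, `units_pl`) and the sample rows are made up for the example; check the real header of bet_log.csv before reusing it.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; the real file's columns may differ.
SAMPLE = """date,market,status,units_pl
2026-04-24,moneyline,settled,1.0
2026-04-24,moneyline,settled,-0.5
2026-04-24,nrfi,settled,-0.5
2026-04-25,total,pending,0
"""


def units_by_market(csv_text: str) -> dict:
    """Sum units P/L per market, skipping rows that are still pending."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"] == "pending":
            continue
        totals[row["market"]] += float(row["units_pl"])
    return dict(totals)


summary = units_by_market(SAMPLE)
```

To run it against the real log, read the file contents with `open("mlb_data/bet_log.csv").read()` and pass the text in, adjusting the column names as needed.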
A second workflow, closing-snapshot.yml, runs hourly during MLB game
windows. Each invocation:
- Loads bet_log.csv and finds pending bets whose first pitch is in the next ~60 minutes and that don't yet have a closing snapshot.
- Pulls the current odds, finds the same market/side/line at our four target books, and records the best available price as closing_price.
- Devigs the same market on Pinnacle to compute closing_fair_prob.
- Writes clv_pct = closing_fair_prob − bet_implied_prob and beat_close.
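The last step above can be sketched as follows. The clv_pct formula comes straight from the list; treating beat_close as "clv_pct is positive" is an assumption, since the source does not define that flag, and the function names are illustrative.

```python
from typing import Tuple


def american_to_implied(odds: int) -> float:
    """Implied probability of an American price (vig included)."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)


def closing_line_value(bet_odds: int, closing_fair_prob: float) -> Tuple[float, bool]:
    """Return (clv_pct, beat_close) for one bet.

    clv_pct = closing_fair_prob - bet_implied_prob, as a probability difference.
    beat_close is assumed here to mean clv_pct > 0.
    """
    clv_pct = closing_fair_prob - american_to_implied(bet_odds)
    return clv_pct, clv_pct > 0


# Hypothetical bet: taken at -105, market closes with a 55% fair probability.
clv_pct, beat_close = closing_line_value(-105, 0.55)
```

A positive clv_pct means the price taken was better than the closing fair price, regardless of whether the bet won.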
Then the daily report adds a CLV section:
## Closing Line Value (All Time)
| Bucket | Bets w/ Close | Beat Close % | Avg CLV % |
|--- |--------------:|-------------:|----------:|
| OVERALL | 9 | 66.7% | +1.04% |
| moneyline | 2 | 50.0% | +0.74% |
| nrfi | 3 | 100.0% | +1.91% |
Why it matters: Win/loss is high-variance over hundreds of bets. CLV is the only short-term proof your process is right. A sharp bettor averages +2-3% CLV over time. If your CLV is consistently negative, the model is miscalibrated — fix that first, regardless of recent W/L.
API budget: The Odds API free tier is 500 requests/month. The hourly closing cron only fires when there are pending bets in the window, and each invocation is one request — fits comfortably.
bankroll_sim.py replays every settled bet in bet_log.csv against five
position-sizing strategies and writes bankroll_sim.md:
| Strategy | What it does |
|---|---|
| flat_1u | Bet exactly 1% of current bankroll on every play |
| flat_2u | Bet exactly 2% — tests bankroll volatility tolerance |
| current_ladder | Sized exactly as the cards recommended (the system default) |
| half_kelly | 0.5 × Kelly fraction (capped at 5% of bankroll per bet) |
| quarter_kelly | 0.25 × Kelly — most conservative, lowest variance |
The report shows ending bankroll, growth %, ROI, and max drawdown per strategy — overall and broken out per market. Use it to answer:
- Is the current ladder beating flat 1u? (If not, simpler is better.)
- Is half-Kelly leaving money on the table or causing too much variance?
- Which markets have produced positive long-run growth and which haven't?
The simulator uses percent-of-current-bankroll sizing, so it auto-rebalances after wins and losses (compounding). This is closer to how sharp bettors actually scale; fixed-dollar simulations underrepresent the upside of a working edge.
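For the Kelly rows in the table, a minimal sketch of fractional Kelly with the 5% cap might look like this. It uses the standard Kelly formula f = (b·p − q) / b with b equal to decimal odds minus 1; the function names and the choice to apply the cap after the multiplier are assumptions about how bankroll_sim.py works.

```python
def kelly_fraction(prob: float, decimal_odds: float) -> float:
    """Full-Kelly fraction of bankroll; 0 when the bet has no positive edge."""
    b = decimal_odds - 1.0  # net payout per unit staked
    f = (b * prob - (1.0 - prob)) / b
    return max(f, 0.0)


def stake(bankroll: float, prob: float, decimal_odds: float,
          multiplier: float = 0.5, cap: float = 0.05) -> float:
    """Stake in currency for a fractional-Kelly bet, capped at `cap` of bankroll."""
    frac = min(multiplier * kelly_fraction(prob, decimal_odds), cap)
    return bankroll * frac


# Hypothetical bet: 55% win probability at even money on a $10,000 bankroll.
half = stake(10_000, 0.55, 2.0, multiplier=0.5)      # half Kelly, hits the 5% cap
quarter = stake(10_000, 0.55, 2.0, multiplier=0.25)  # quarter Kelly
```

Because the stake is a fraction of the current bankroll, calling `stake` with the updated bankroll after each settled bet reproduces the compounding behaviour described above.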
Run manually:
python bankroll_sim.py # default $10,000 bankroll
python bankroll_sim.py --start-bankroll 5000 # custom
The daily workflow rebuilds it automatically after each settlement pass.
git init mlb-sharp
cd mlb-sharp
# copy the files from this folder in
git add .
git commit -m "initial sharp betting system"
git branch -M main
git remote add origin git@github.com:<you>/mlb-sharp.git
git push -u origin main
Required:
| Secret | Why |
|---|---|
| ODDS_API_KEY | Free key from https://the-odds-api.com; enables Pinnacle devig + multi-book shop |
Pick at least one delivery channel:
| Secret(s) | What it does |
|---|---|
| SLACK_WEBHOOK_URL | Slack Incoming Webhook URL |
| DISCORD_WEBHOOK_URL | Discord channel webhook |
| TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID | Telegram bot push |
| EMAIL_SMTP_HOST + EMAIL_SMTP_PORT + EMAIL_SMTP_USER + EMAIL_SMTP_PASS + EMAIL_TO | SMTP email |
Open the Actions tab, find Daily MLB Sharp Cards, click Run workflow to test manually. If the secrets are in place you should:
- see a green build,
- receive the cards in your channel,
- see a new commit "daily cards YYYY-MM-DD" with the data files.
After that, the cron at 12 PM ET handles it automatically.
Configured at the top of mlb_grader.py. Current defaults (calibrated for
the market-anchored model — see "How the model finds edges" below):
UNIT_LADDER = [
(0.060, 8, 1.5, "Strong"), # 6%+ edge AND 8/10 conf → rare
(0.045, 7, 1.0, "Standard"), # 4.5%+ edge, 7/10 conf
(0.030, 6, 0.5, "Lean"), # 3%+ edge, 6/10 conf
]
MIN_EDGE = {"moneyline": 0.025, "runline": 0.035, "total": 0.03, "f5_total": 0.035, "nrfi": 0.035}
MIN_CONFIDENCE = 6
MAX_REASONABLE_EDGE = 0.12 # anything above this is treated as a calibration bug
MAX_CARDS_PER_SLATE = 5 # top by edge × confidence
MAX_UNITS_PER_SLATE = 6.0 # units, bankroll safety
MAX_UNITS_PER_GAME = 1.5 # units
MAX_CARDS_PER_GAME = 1 # never more than one bet per game
1 unit = 1% of bankroll. Recalibrate quarterly using historical_validate.py.
Never increase after a loss.
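One plausible reading of the ladder is "walk the rungs top-down and take the first one whose edge and confidence thresholds are both met". The sketch below implements that reading; the `size_bet` helper and the "Pass" label are hypothetical, and the real grader may apply the rungs differently.

```python
UNIT_LADDER = [
    (0.060, 8, 1.5, "Strong"),    # min edge, min confidence, units, label
    (0.045, 7, 1.0, "Standard"),
    (0.030, 6, 0.5, "Lean"),
]


def size_bet(edge: float, confidence: int, ladder=UNIT_LADDER):
    """Return (units, label) for the first rung whose thresholds are both met."""
    for min_edge, min_conf, units, label in ladder:
        if edge >= min_edge and confidence >= min_conf:
            return units, label
    return 0.0, "Pass"  # no rung cleared: no bet


# Hypothetical play: 5% edge at 7/10 confidence lands on the middle rung.
units, label = size_bet(0.05, 7)
```

Under this reading a large edge with only moderate confidence drops to a lower rung rather than being sized up, which keeps the top rung rare.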
The model is market-anchored. The Pinnacle-devigged fair price is the prior; our model nudges it by a bounded amount based on signals the market may not have fully priced (last-start velo drop, bullpen exhaustion, late-breaking lineup, weather, umpire). Specifically:
| Market | Prior | Max model deviation |
|---|---|---|
| Moneyline | Pinnacle devigged P(home) | ±5pp |
| Total runs | Pinnacle posted line | ±0.8 runs |
| F5 total | Pinnacle posted F5 line | ±0.5 runs |
| NRFI | Pinnacle devigged P(NRFI) | ±6pp |
This is the most important calibration choice in the system. Without the market anchor, a standalone grade model will think every +200 underdog has a 50% chance and report fake 30%+ edges, which is textbook overfitting to your own model. With the anchor, edges of 2-4% are realistic, 6%+ is rare, and >12% is rejected as a sign of a calibration bug.
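The bounded nudge can be pictured as a simple clamp around the market prior. This is a sketch under assumptions: the function name `anchored_prob` is hypothetical, and the default of 0.05 mirrors the ±5pp moneyline row in the table.

```python
def anchored_prob(prior: float, model: float, max_dev: float = 0.05) -> float:
    """Move from the market prior toward the model estimate, by at most max_dev."""
    nudge = max(-max_dev, min(max_dev, model - prior))
    return prior + nudge


# A +200 underdog has a fair prior near 1/3. Even if the raw model says 50%,
# the anchored estimate moves up by only 5 percentage points.
p = anchored_prob(prior=1 / 3, model=0.50)
```

The same clamp applies to the run-based markets by passing the posted line as the prior and the run limit (for example 0.8) as `max_dev`.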
If a market is missing (rare for h2h, common for F5 / 1st-inning) the model
falls back to a league-baseline estimate — those constants are also in
mlb_grader.py and should be refreshed yearly via historical_validate.py.
Run historical_validate.py once a year to refresh the league constants
and park factors against actual MLB outcomes. Default window is 2015-2024
(10 seasons, ~24,000 regular-season games, fetched from statsapi.mlb.com
directly — no API key required).
python historical_validate.py # 2015-2024 default
python historical_validate.py --start 2014 --end 2024
Output: mlb_data/historical_report_<start>_<end>.md with:
- Per-season league total / F5 / NRFI rate / home win pct (so you can spot drift in the constants)
- Park factors recomputed from the last 3 seasons (compare to PARKS dict)
- Naive-predictor RMSE per season (the floor any real model must beat)
- A "Recommended constants" code block ready to paste into the grader
The fetched dataset is cached to mlb_data/_history/ so re-runs are
instant. CLV (closing_snapshot.py) is the forward-looking edge proof;
this report is the backward-looking baseline check.
GitHub Actions cron is in UTC. We schedule two crons:
0 16 * * * (12 PM ET during EDT, March-Nov) and 0 17 * * * (12 PM ET during
EST, Nov-March). Whichever one is "wrong" for the current DST state runs an
idempotent overwrite of the same files — no harm done.
If you want a different push time (e.g., 11 AM or 2 PM), edit
.github/workflows/daily-bets.yml.
pip install -r requirements.txt
export ODDS_API_KEY=... # required
export DISCORD_WEBHOOK_URL=... # or SLACK / TELEGRAM / EMAIL
python mlb_data_scraper.py # writes ./mlb_data/<today>/
python mlb_grader.py # writes ./mlb_data/<today>/cards.md
python notify.py # pushes the card to your channel
cat mlb_data/$(date +%F)/cards.md # eyeball it
### Bet #2 — F5 Over 4.5
| Field | Value |
|---|---|
| Game | Tampa Bay Rays @ Baltimore Orioles (2026-04-25T23:05:00Z) |
| Market | f5_total |
| Best Book | DRAFTKINGS -105 (line 4.5) |
| Fair Odds | -135 (57.5%) |
| Edge | 6.42% |
| Confidence | 7/10 |
| Risk | Strong |
| Unit Size | 1.5u |
**Reasoning**
- Expected F5 total 5.10 vs market 4.5
- away: velo down -7.8 mph — RED FLAG
- away: vs RHB xwOBA 0.345
- hitter park (pf_runs 102)
**Pass triggers**
- Either starter scratched
- Line moves through your fair value
- Weather forecast worsens
- The Odds API free tier is ~500 requests / month. One scrape per day leaves plenty of headroom even with the alternate-market calls.
- F5 totals and 1st-inning totals are not posted by every book for every game. When a market is missing the grader silently skips that bet type for that game.
- The NRFI model is a deliberately simple Bayesian update around the league base rate. Improving it is the highest-ROI place to extend the system: add starter K%/BB% rates, top-3-batter wOBA in the 1st, and umpire K% from UmpScorecards.
- Grades are only as good as the data. If lineups aren't confirmed by the
time the cron runs at noon, the offense + lineup categories grade as
neutral. For the sharpest cards, run the grader again ~30 min before
first pitch with a manual workflow_dispatch (the workflow already supports it).