# 📊 Historical Data Project Checklist

## Schema + Database
- [ ] Forward engineer schema from EER model into MariaDB
- [ ] Export schema as `.sql` and commit to repo
- [ ] Store DB connection config securely (`.env` file, not hardcoded)
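A minimal sketch of reading the connection settings from environment variables. It assumes a `.env` file loaded with the `python-dotenv` package, which is optional here. The variable names `DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD` and `DB_NAME` are hypothetical, so use whatever your `.env` actually defines. Add `.env` to `.gitignore` so it is never committed.

```python
import os
from dataclasses import dataclass, field

try:
    from dotenv import load_dotenv  # python-dotenv (optional dependency)
except ImportError:
    load_dotenv = None


@dataclass(frozen=True)
class DBConfig:
    host: str
    port: int
    user: str
    password: str = field(repr=False)  # keep the secret out of logs/tracebacks
    database: str


def load_db_config(env=None) -> DBConfig:
    """Build a DBConfig from a mapping (defaults to os.environ, after reading .env)."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies KEY=value pairs from .env into os.environ
        env = os.environ
    # Hypothetical variable names; adjust to match your .env
    missing = [k for k in ("DB_USER", "DB_PASSWORD", "DB_NAME") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return DBConfig(
        host=env.get("DB_HOST", "localhost"),
        port=int(env.get("DB_PORT", "3306")),  # 3306 is the MariaDB default port
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
        database=env["DB_NAME"],
    )
```

The resulting fields can then be passed to whichever MariaDB driver or SQLAlchemy URL the ETL uses.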

## Data Ingestion
- [ ] Clean + normalize historical CSVs (consistent columns & datatypes)
- [ ] Write Python ETL:
  - [ ] Load CSVs into staging
  - [ ] Upsert into dimensions
  - [ ] Insert/Update fact table
  - [ ] Add logging (row counts, success messages)
- [ ] Run sanity checks:
  - [ ] Row counts match expected totals (e.g. staged rows equal source CSV rows)
  - [ ] No duplicate facts at the fact table's grain
  - [ ] Foreign keys resolve correctly
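A minimal sketch of the staging → dimensions → fact flow, with row-count logging. It uses Python's built-in `sqlite3` as a stand-in for MariaDB so that it runs without a server. The table and column names (`staging_stats`, `dim_player`, `dim_comp_season`, `fact_player_stats`, `games`, `points`) are hypothetical. The assumed grain is one fact row per player per comp-season. SQLite's `INSERT OR IGNORE` and `ON CONFLICT ... DO UPDATE` roughly correspond to MariaDB's `INSERT IGNORE` and `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import csv
import logging
import sqlite3

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("etl")

# Hypothetical schema; fact grain = one row per player per comp-season
SCHEMA = """
CREATE TABLE IF NOT EXISTS staging_stats (
    player_name TEXT, comp TEXT, season TEXT, games INTEGER, points INTEGER
);
CREATE TABLE IF NOT EXISTS dim_player (
    player_key INTEGER PRIMARY KEY,
    player_name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS dim_comp_season (
    comp_season_key INTEGER PRIMARY KEY,
    comp TEXT NOT NULL,
    season TEXT NOT NULL,
    UNIQUE (comp, season)
);
CREATE TABLE IF NOT EXISTS fact_player_stats (
    player_key INTEGER NOT NULL REFERENCES dim_player (player_key),
    comp_season_key INTEGER NOT NULL REFERENCES dim_comp_season (comp_season_key),
    games INTEGER,
    points INTEGER,
    PRIMARY KEY (player_key, comp_season_key)
);
"""


def run_etl(conn: sqlite3.Connection, csv_file) -> int:
    """Load one cleaned CSV (file-like object) through staging into the star schema."""
    conn.executescript(SCHEMA)

    # 1. Load into staging; clearing it first keeps re-runs idempotent
    conn.execute("DELETE FROM staging_stats")
    rows = [
        (r["player_name"].strip(), r["comp"].strip(), r["season"].strip(),
         int(r["games"]), int(r["points"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO staging_stats VALUES (?, ?, ?, ?, ?)", rows)
    log.info("staged %d rows", len(rows))

    # 2. Upsert dimensions: add only natural keys not already present
    conn.execute("INSERT OR IGNORE INTO dim_player (player_name) "
                 "SELECT DISTINCT player_name FROM staging_stats")
    conn.execute("INSERT OR IGNORE INTO dim_comp_season (comp, season) "
                 "SELECT DISTINCT comp, season FROM staging_stats")

    # 3. Insert new facts and update existing ones at the same grain.
    #    SQLite's parser needs a WHERE clause before ON CONFLICT in INSERT ... SELECT.
    cur = conn.execute("""
        INSERT INTO fact_player_stats (player_key, comp_season_key, games, points)
        SELECT p.player_key, cs.comp_season_key, s.games, s.points
        FROM staging_stats AS s
        JOIN dim_player AS p ON p.player_name = s.player_name
        JOIN dim_comp_season AS cs ON cs.comp = s.comp AND cs.season = s.season
        WHERE 1
        ON CONFLICT (player_key, comp_season_key)
        DO UPDATE SET games = excluded.games, points = excluded.points
    """)
    conn.commit()
    log.info("inserted/updated %d fact rows", cur.rowcount)
    return len(rows)
```

Truncating staging and upserting on the grain's key makes re-runs idempotent. Loading the same CSV twice updates rows in place instead of duplicating them.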

## Analysis Layer
- [ ] Write SQL queries / aggregations:
  - [ ] Comp × season summaries
  - [ ] Player career totals
  - [ ] Team vs player splits
- [ ] Save queries as SQL files or in Jupyter notebooks
- [ ] (Optional) Create SQL views for common aggregations
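A sketch of two of the aggregations above, comp × season summaries and player career totals, with career totals defined as an optional view. To stay self-contained it runs against a tiny in-memory SQLite database. The schema, names and rows are made up for illustration, using the same hypothetical grain of one row per player per comp-season. The queries use only plain `JOIN`/`GROUP BY` and should carry over to MariaDB with little change, but check them against the real schema.

```python
import sqlite3

# Hypothetical star schema (fact grain: player x comp-season)
SCHEMA = """
CREATE TABLE dim_player (player_key INTEGER PRIMARY KEY, player_name TEXT UNIQUE);
CREATE TABLE dim_comp_season (comp_season_key INTEGER PRIMARY KEY, comp TEXT, season TEXT);
CREATE TABLE fact_player_stats (
    player_key INTEGER, comp_season_key INTEGER, games INTEGER, points INTEGER,
    PRIMARY KEY (player_key, comp_season_key)
);
"""

# Optional view: career totals across every comp-season a player appears in
CAREER_VIEW = """
CREATE VIEW IF NOT EXISTS v_player_career AS
SELECT p.player_name,
       COUNT(*)      AS comp_seasons,  -- one fact row per comp-season at this grain
       SUM(f.games)  AS games,
       SUM(f.points) AS points
FROM fact_player_stats AS f
JOIN dim_player AS p ON p.player_key = f.player_key
GROUP BY p.player_name
"""

# Comp x season summary: one output row per competition per season
COMP_SEASON_SUMMARY = """
SELECT cs.comp, cs.season,
       COUNT(*)      AS players,
       SUM(f.points) AS total_points
FROM fact_player_stats AS f
JOIN dim_comp_season AS cs ON cs.comp_season_key = f.comp_season_key
GROUP BY cs.comp, cs.season
ORDER BY cs.season, cs.comp
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up rows, purely for illustration
    conn.executemany("INSERT INTO dim_player VALUES (?, ?)",
                     [(1, "A. Smith"), (2, "B. Jones")])
    conn.executemany("INSERT INTO dim_comp_season VALUES (?, ?, ?)",
                     [(1, "League", "2001"), (2, "Cup", "2001"), (3, "League", "2002")])
    conn.executemany("INSERT INTO fact_player_stats VALUES (?, ?, ?, ?)",
                     [(1, 1, 10, 50), (1, 2, 3, 9), (1, 3, 12, 61), (2, 1, 8, 20)])
    conn.execute(CAREER_VIEW)
    conn.commit()
    return conn
```

Keeping each query in its own string, or its own `.sql` file, makes it easy to reuse the same text from a notebook or from the plotting step.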

## Visualisation
- [ ] Connect queries → pandas → Plotly
- [ ] Export charts:
  - [ ] Static PNGs (`fig.write_image`, requires `kaleido`) for embedding
  - [ ] Interactive HTML (`fig.write_html`) for site
- [ ] Test chart interactivity locally

## Site / Publishing
- [ ] Organize outputs:
  - [ ] `/assets/images/plots/` for PNGs
  - [ ] `/plots/` for HTML
- [ ] Create Jekyll/Quarto pages embedding plots
- [ ] Push to GitHub → verify GitHub Actions build → deploy to Pages
- [ ] Check iframe paths for interactive Plotly HTMLs (they must resolve under the site's base URL on Pages)
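A sketch of sorting exported files into the two folders above and generating an iframe tag for a Jekyll page. It assumes exports land in one flat directory, and the helper names are hypothetical. The path is wrapped in Liquid's `relative_url` filter, which prefixes `site.baseurl`. This matters because a GitHub Pages project site is served from `/<repo>/`, where a bare `/plots/...` path would 404. Quarto does not process Liquid, so a plain relative path is the usual choice there.

```python
import shutil
from pathlib import Path

# Target folders from the checklist, relative to the site root
PNG_DIR = Path("assets/images/plots")
HTML_DIR = Path("plots")


def organize_outputs(export_dir: Path, site_root: Path) -> list:
    """Copy *.png and *.html files from export_dir into the site's plot folders."""
    copied = []
    for f in sorted(export_dir.iterdir()):
        if f.suffix == ".png":
            dest_dir = site_root / PNG_DIR
        elif f.suffix == ".html":
            dest_dir = site_root / HTML_DIR
        else:
            continue  # ignore anything that is not a chart export
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(f, dest_dir / f.name)))
    return copied


def iframe_snippet(html_name: str, height: int = 500) -> str:
    """Return a Jekyll/Liquid iframe tag whose src respects site.baseurl."""
    site_path = "/" + (HTML_DIR / html_name).as_posix()
    src = "{{ '" + site_path + "' | relative_url }}"
    return (f'<iframe src="{src}" width="100%" height="{height}" '
            f'style="border:0;" loading="lazy"></iframe>')
```

For example, `iframe_snippet("career.html")` produces a tag whose `src` is `{{ '/plots/career.html' | relative_url }}`, ready to paste into a Markdown page with front matter.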

## Polish / Extras
- [ ] Add README with project overview + schema diagram
- [ ] Export EER diagram as image and include on site
- [ ] Add validation script for ETL (row counts, duplicates)
- [ ] Document environment (`requirements.txt` or `environment.yml`)
- [ ] (Optional) Add Makefile / batch script for end-to-end run
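A sketch of the ETL validation script, covering the sanity checks from the ingestion section: row counts, duplicates at the grain, and foreign keys that resolve. It uses only DB-API cursor calls, so it should work with a `sqlite3` connection or a MariaDB driver connection alike. Table and column names are interpolated directly into the SQL, so pass only trusted identifiers, never user input. The function names are hypothetical. The overall pass/fail result makes it easy to exit non-zero from a Makefile or batch step.

```python
def _rows(conn, sql):
    """Run a query with a plain DB-API cursor and return all rows."""
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


def check_row_count(conn, table, expected):
    (n,) = _rows(conn, f"SELECT COUNT(*) FROM {table}")[0]
    return n == expected, f"{table}: {n} rows (expected {expected})"


def check_no_duplicates(conn, table, grain):
    """Any group of grain columns appearing more than once is a duplicate fact."""
    cols = ", ".join(grain)
    dupes = _rows(conn, f"SELECT {cols}, COUNT(*) FROM {table} "
                        f"GROUP BY {cols} HAVING COUNT(*) > 1")
    return not dupes, f"{table}: {len(dupes)} duplicate group(s) at grain ({cols})"


def check_fk(conn, child, fk_col, parent, pk_col):
    """Count child rows whose key has no match in the parent (NULL keys count too)."""
    (orphans,) = _rows(conn, f"SELECT COUNT(*) FROM {child} AS c "
                             f"LEFT JOIN {parent} AS p ON p.{pk_col} = c.{fk_col} "
                             f"WHERE p.{pk_col} IS NULL")[0]
    return orphans == 0, f"{child}.{fk_col} -> {parent}.{pk_col}: {orphans} orphan(s)"


def run_checks(results):
    """Print PASS/FAIL per check; return True only if every check passed."""
    ok = True
    for passed, msg in results:
        print(("PASS " if passed else "FAIL ") + msg)
        ok = ok and passed
    return ok
```

A script wrapping these could end with `sys.exit(0 if run_checks([...]) else 1)` so that a failing check stops an end-to-end run.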