# F1 Strategy - Notebook 01: Exploratory Data Analysis

## Objectives
- Verify integrity of OpenF1 aggregates produced by `get_data.py`.
- Build a clean race- and driver-race-level snapshot for 2023-2024.
- Define modeling targets for strategy prediction (pit count, first pit lap, compound sequence).
- Identify data gaps to exclude or impute.

## Inputs (from `data/openf1_full/`)
- `sessions_all.csv`
- `stints_all.csv`
- `pit_all.csv`
- `weather_all.csv`
- `starting_grid_all.csv`
- `session_result_all.csv`
- `race_control_all.csv`
- `meetings_all.csv`
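
A minimal loading sketch, assuming the files above are plain comma-separated CSVs that `pandas.read_csv` can parse with default settings. The helper name `load_tables` and the `TABLES` list are hypothetical and only mirror the file names listed in this section; the function reports missing files instead of raising, so gaps can be logged early.

```python
from pathlib import Path

import pandas as pd

# Table stems taken from the input list above; each file is "<stem>_all.csv".
TABLES = [
    "sessions", "stints", "pit", "weather",
    "starting_grid", "session_result", "race_control", "meetings",
]


def load_tables(data_dir, tables=TABLES):
    """Hypothetical helper: read every available table, list the missing ones."""
    data_dir = Path(data_dir)
    loaded, missing = {}, []
    for name in tables:
        path = data_dir / f"{name}_all.csv"
        if path.exists():
            loaded[name] = pd.read_csv(path)
        else:
            missing.append(path.name)
    return loaded, missing
```

A call such as `tables, missing = load_tables("data/openf1_full")` would then give a dict of DataFrames keyed by table stem plus a list of absent file names.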

## Sanity checks
- Row counts by season and by session type.
- Uniqueness: `session_key × driver_number × stint_number`.
- Consistency: sum of stint lengths ≈ laps completed for finishers.
- Pit-stint coherence: number of pit entries ≈ number of stints − 1; compound changes are only a lower bound, since a stop can refit the same compound.
- Weather coverage: timestamps span the race window.
- Missingness matrix per table and per season.
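
Two of the checks above (key uniqueness and stint-length consistency) could be sketched as follows. The data is a toy stand-in for `stints_all.csv`, and the column names (`lap_start`, `lap_end`, `compound`) are assumed to follow the OpenF1 stint schema; the output of `get_data.py` may differ. The helper names are hypothetical.

```python
import pandas as pd

# Toy data, not real results. The last row duplicates a key on purpose.
stints = pd.DataFrame({
    "session_key":   [1, 1, 1, 1, 1],
    "driver_number": [44, 44, 16, 16, 16],
    "stint_number":  [1, 2, 1, 2, 2],
    "lap_start":     [1, 21, 1, 19, 19],
    "lap_end":       [20, 57, 18, 57, 57],
    "compound":      ["MEDIUM", "HARD", "SOFT", "HARD", "HARD"],
})

KEY = ["session_key", "driver_number", "stint_number"]


def duplicated_keys(df, key=KEY):
    """Return every row whose key occurs more than once."""
    return df[df.duplicated(subset=key, keep=False)]


def stint_lap_totals(df, key=KEY):
    """Sum inclusive stint lengths per driver-race, after dropping duplicate keys."""
    dedup = df.drop_duplicates(subset=key)
    dedup = dedup.assign(stint_laps=dedup["lap_end"] - dedup["lap_start"] + 1)
    return dedup.groupby(["session_key", "driver_number"])["stint_laps"].sum()


dups = duplicated_keys(stints)
totals = stint_lap_totals(stints)
missing_share = stints.isna().mean()  # fraction of missing values per column
```

The totals can then be compared against laps completed from `session_result_all.csv`, with a tolerance of a lap or two because the check is only approximate.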

## Core EDA questions
- Distribution of pit counts per race and per season.
- Stint length by compound and by circuit.
- First-pit lap vs. starting position buckets.
- Safety car (SC) and virtual safety car (VSC) frequency by circuit and its relation to pit timing.
- Track temperature vs. stint length for each compound.
- Outliers: extreme pit durations, micro stints, irregular weather segments.
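
As an illustration of the starting-position question above, the sketch below buckets grid positions with `pandas.cut` and summarizes the first-pit lap per bucket. The values are toy numbers, and the bucket edges (groups of five positions) are an assumption chosen for the example, not a decision made in this plan.

```python
import pandas as pd

# Toy driver-race rows, not real data.
driver_race = pd.DataFrame({
    "grid_position": [1, 3, 6, 9, 12, 15, 18, 20],
    "first_pit_lap": [18, 20, 15, 22, 12, 25, 10, 30],
})

# Right-closed bins: (0, 5], (5, 10], (10, 15], (15, 20].
bins = [0, 5, 10, 15, 20]
labels = ["P1-5", "P6-10", "P11-15", "P16-20"]
driver_race["grid_bucket"] = pd.cut(
    driver_race["grid_position"], bins=bins, labels=labels
)

# Count and median first-pit lap per bucket.
summary = (
    driver_race
    .groupby("grid_bucket", observed=True)["first_pit_lap"]
    .agg(["count", "median"])
)
```

The same grouping pattern applies to the other questions by swapping the grouping key, for example compound or circuit in place of `grid_bucket`.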

## Temporary data models (in-memory)
- **races_master**: one row per race with date, circuit hint, SC/VSC, weather summary.
- **driver_race**: one row per driver-race with targets and covariates:
    - Targets: `pit_count`, `first_pit_lap`, `compound_seq` (e.g. S-M-H).
    - Covariates: `grid_position`, finish status, SC/VSC exposure, simple weather summary.
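
One possible way to derive the three targets from stint data is sketched below. It rests on assumptions that are not fixed by this plan: `pit_count` is taken as the number of stints minus one, `first_pit_lap` as the last lap of the first stint, and `compound_seq` as the first letter of each compound joined by hyphens. The function name `build_targets` is hypothetical, and the rows are toy data.

```python
import pandas as pd

# Toy stand-in for stints_all.csv; column names assumed to follow OpenF1.
stints = pd.DataFrame({
    "session_key":   [1, 1, 1, 1, 1],
    "driver_number": [44, 44, 16, 16, 16],
    "stint_number":  [1, 2, 1, 2, 3],
    "lap_start":     [1, 21, 1, 16, 40],
    "lap_end":       [20, 57, 15, 39, 57],
    "compound":      ["MEDIUM", "HARD", "SOFT", "MEDIUM", "HARD"],
})


def build_targets(stints):
    """Hypothetical helper: one row per driver-race with the three targets."""
    ordered = stints.sort_values(["session_key", "driver_number", "stint_number"])
    grouped = ordered.groupby(["session_key", "driver_number"])
    targets = grouped.agg(
        n_stints=("stint_number", "count"),
        first_stint_end=("lap_end", "first"),
        # Assumes no missing compound values; impute or drop them beforehand.
        compound_seq=("compound", lambda s: "-".join(c[0] for c in s)),
    ).reset_index()
    targets["pit_count"] = targets["n_stints"] - 1
    # A zero-stop race has no first pit lap, so leave it missing.
    targets["first_pit_lap"] = targets["first_stint_end"].where(
        targets["pit_count"] > 0
    )
    cols = ["session_key", "driver_number", "pit_count",
            "first_pit_lap", "compound_seq"]
    return targets[cols]


targets = build_targets(stints)
```

Deriving `first_pit_lap` from `pit_all.csv` instead would be an equally valid choice; comparing both definitions is itself a useful coherence check.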

## Visuals (preview plan)
- Bar: pit count distributions per race.
- Box: stint length by compound.
- Hist: first-pit lap.
- Line: track temperature over race time with vertical lines at SC and pit windows.
- Heatmap: compound usage across grid for a selected race.

## Outputs
- Short EDA conclusions and data caveats.
- Final target definitions for Notebook 02 (feature engineering).
- List of sessions with incomplete data to exclude or impute.

## Next
- Notebook 02: feature engineering and target creation.
- Notebook 03: baseline models and calibration.
- Notebook 04: error analysis per circuit and weather conditions.