# Exploratory Analysis — Motorsports Pit Strategy Optimizer

**Purpose:** This notebook demonstrates the pit strategy optimizer pipeline in an exploratory way: loading race data, computing derived features, fitting tire degradation, running pit window simulations, and comparing recommendations to historical team decisions. It is for exploration and portfolio display, not production use.

**Project overview:** The system models tire degradation and pit loss, recommends a pit lap (or stay out), and explains the recommendation using rule-based reasoning. Scope is single-car, dry races only. See [README](../README.md) and [PRD](../docs/PRD.md) for full scope.

In [None]:
# Imports & Configuration
import sys
from pathlib import Path

# Ensure project root is on path (for notebooks/ subdir)
ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Optional: plotly and FastF1 (used later for interactive plots and data)
try:
    import plotly.graph_objects as go
    import plotly.express as px
    PLOTLY_AVAILABLE = True
except ImportError:
    PLOTLY_AVAILABLE = False

try:
    import fastf1
    FASTF1_AVAILABLE = True
except ImportError:
    FASTF1_AVAILABLE = False

# Project modules
from src.data_pipeline import load_race, add_stint_features
from src.models.tire_degradation import TireDegradationModel, get_degradation_model
from src.models.diagnostics import degradation_curve, degradation_rate_seconds_per_lap, detect_cliffs
from src.strategy import optimize_pit_window, recommended_pit_lap, explain_strategy, get_pit_loss, sensitivity_pit_loss
from src.validation import run_validation, load_validation_results
from src.visualization import (
    plot_predicted_vs_actual,
    plot_degradation_curves_by_compound,
    plot_strategy_timeline_from_laps,
)

# Display options for exploratory output
pd.set_option("display.max_columns", 20)
pd.set_option("display.width", 120)
pd.set_option("display.max_rows", 30)
%matplotlib inline
plt.rcParams["figure.figsize"] = (10, 5)
print("Imports and configuration done.")

---
## 3. Load Sample Race Data

Load one or two dry races using the data pipeline. Requires network on first run (FastF1 API); subsequent runs use cache.

In [None]:
# Load 1–2 sample races (dry only; wet races raise ValueError)
races_to_load = [(2023, "Bahrain"), (2023, "Monaco")]
sample_data = {}

for year, race_name in races_to_load:
    try:
        data = load_race(year, race_name)
        sample_data[(year, race_name)] = data
        print(f"Loaded {year} {race_name}: {len(data.laps)} laps, {len(data.pit_stops)} pit stops")
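    except ValueError as e:
        # The pipeline raises ValueError for wet races (see comment above)
        print(f"Skipping {year} {race_name} (ValueError, e.g. wet race): {e}")
    except Exception as e:
        # Anything else: network, cache, or API problems
        print(f"Could not load {year} {race_name}: {e}")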

# Display sample: first race laps (first 15 rows, key columns)
if sample_data:
    year, name = list(sample_data.keys())[0]
    laps = sample_data[(year, name)].laps
    cols = [c for c in ["DriverNumber", "LapNumber", "Compound", "LapTime"] if c in laps.columns]
    display(laps[cols].head(15) if cols else laps.head(15))

---
## 4. Compute Derived Features

This step derives three features: stint identification (from pit stops), lap number within the stint, and estimated fuel load (linear decay). These features feed the degradation model and the optimizer.
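
To make the three derived features concrete, here is a minimal pure-Python sketch. It is not the project's `add_stint_features` implementation; the function name `stint_features`, the convention that a stop recorded on lap `p` ends the stint at `p`, and the 110 kg starting fuel are all assumptions chosen for illustration.

```python
from bisect import bisect_left


def stint_features(lap_numbers, pit_laps, initial_fuel_kg=110.0, total_laps=None):
    """Return one dict per lap with stint_id, lap_in_stint and estimated_fuel_kg.

    Assumptions (illustrative only):
    - a stop recorded on lap p ends the stint at p; lap p + 1 opens the next stint
    - fuel burns at a constant rate and reaches 0 kg at the end of the last lap
    - estimated_fuel_kg is the fuel on board at the start of the lap
    """
    pit_laps = sorted(pit_laps)
    total_laps = total_laps or max(lap_numbers)
    burn_per_lap = initial_fuel_kg / total_laps
    rows = []
    for lap in lap_numbers:
        # Stops completed strictly before this lap determine the stint index.
        stops_before = bisect_left(pit_laps, lap)
        stint_start = pit_laps[stops_before - 1] + 1 if stops_before else 1
        rows.append({
            "LapNumber": lap,
            "stint_id": stops_before + 1,
            "lap_in_stint": lap - stint_start + 1,
            "estimated_fuel_kg": initial_fuel_kg - burn_per_lap * (lap - 1),
        })
    return rows


# Six-lap toy race with a single stop at the end of lap 3
for row in stint_features(list(range(1, 7)), pit_laps=[3], initial_fuel_kg=60.0):
    print(row)
```

A binary search over the sorted pit laps keeps the stint lookup simple even when a driver stops several times.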

In [None]:
# Add stint features to first loaded race
if sample_data:
    year, name = list(sample_data.keys())[0]
    data = sample_data[(year, name)]
    laps_with_features = add_stint_features(data.laps, data.pit_stops)

    # Table: key derived columns for one driver
    driver = str(laps_with_features["DriverNumber"].iloc[0])
    one_driver = laps_with_features[laps_with_features["DriverNumber"] == driver].head(25)
    cols = ["LapNumber", "Compound", "stint_id", "lap_in_stint", "estimated_fuel_kg"]
    cols = [c for c in cols if c in one_driver.columns]
    display(one_driver[cols])

    # Optional: simple plot of fuel decay and stint segments
    fig, ax = plt.subplots(figsize=(10, 4))
    if "estimated_fuel_kg" in one_driver.columns:
        ax.plot(one_driver["LapNumber"], one_driver["estimated_fuel_kg"], "o-", label="Estimated fuel (kg)", markersize=4)
    ax.set_xlabel("Lap number")
    ax.set_ylabel("Estimated fuel (kg)")
    ax.set_title(f"Derived features — {year} {name}, driver {driver}")
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
else:
    print("No sample data loaded; run the previous cell first.")

---
## 5. Tire Degradation Modeling

A per-track, per-compound linear model predicts lap time from lap-in-stint and fuel load. The degradation curves are reproducible. Cliff detection flags laps where the slope of the curve increases; a linear model has a constant slope, so it typically flags none.
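
The sketch below illustrates why a linear curve produces no cliff candidates. It is a simplified stand-in, not the project's `TireDegradationModel` or `detect_cliffs`: it fits lap time against lap-in-stint only (fuel held fixed), and the helper names `fit_line` and `cliff_candidates` and the 0.05 s/lap threshold are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cliff_candidates(curve, threshold=0.05):
    """Return laps after which the lap-to-lap slope grows by more than `threshold` s/lap.

    `curve` is a list of (lap_in_stint, lap_time_s) pairs on consecutive laps.
    """
    laps = [lap for lap, _ in curve]
    times = [t for _, t in curve]
    slopes = [times[i + 1] - times[i] for i in range(len(times) - 1)]
    return [
        laps[i + 1]
        for i in range(len(slopes) - 1)
        if slopes[i + 1] - slopes[i] > threshold
    ]


# Fit on a short synthetic stint, then evaluate the fitted line over 35 laps
intercept, slope = fit_line([1, 2, 3, 4, 5], [90.1, 90.2, 90.3, 90.4, 90.5])
linear_curve = [(lap, intercept + slope * lap) for lap in range(1, 36)]
print("Linear curve:", cliff_candidates(linear_curve))

# A synthetic curve whose slope steps up after lap 3
kinked_curve = [(1, 90.1), (2, 90.2), (3, 90.3), (4, 90.8), (5, 91.3)]
print("Kinked curve:", cliff_candidates(kinked_curve))
```

Because the fitted line has the same slope between every pair of laps, the slope differences are zero up to floating-point noise and nothing exceeds the threshold.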

In [None]:
# Fit degradation model from first race (if not already fitted) and plot curves
if sample_data:
    year, name = list(sample_data.keys())[0]
    data = sample_data[(year, name)]
    laps_with_features = add_stint_features(data.laps, data.pit_stops)
    track_id = name
    model = get_degradation_model()

    # Fit for each compound present (model fits per track/compound)
    for comp in ["SOFT", "MEDIUM", "HARD"]:
        subset = laps_with_features[laps_with_features["Compound"].str.upper().str.strip() == comp]
        if len(subset) >= 2:
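            try:
                # Probe call: a ValueError here is taken to mean no model is fitted yet for this track/compound
                model.predict_lap_time(track_id, comp, 1, 100.0)
            except ValueError:
                model.fit(
                    laps_with_features,
                    track_id,
                    comp,
                    lap_time_col="LapTime",
                    lap_in_stint_col="lap_in_stint",
                    fuel_col="estimated_fuel_kg",
                )
    try:
        model.save()
    except OSError as e:
        # Saving is best-effort; the fitted model stays usable in memory for this session
        print(f"Could not save degradation model: {e}")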

    # Degradation curves per compound (fixed fuel for simplicity)
    curves = {}
    for comp in ["SOFT", "MEDIUM", "HARD"]:
        try:
            curves[comp] = degradation_curve(track_id, comp, fuel_kg=80.0, lap_in_stint_max=35, model=model)
        except ValueError:
            pass

    if curves:
        ax = plot_degradation_curves_by_compound(
            curves,
            title=f"Tire degradation curves — {year} {name}",
            xlabel="Lap in stint",
            ylabel="Predicted lap time (s)",
        )
        plt.tight_layout()
        plt.show()

    # Cliff detection (linear model often has no cliffs)
    if curves:
        comp = list(curves.keys())[0]
        cliffs_df = detect_cliffs(track_id, comp, 80.0, lap_in_stint_max=35, model=model)
        cliff_laps_list = cliffs_df[cliffs_df["is_cliff_candidate"]]["lap_in_stint"].tolist()
        print(f"Cliff candidates (compound {comp}): {cliff_laps_list if cliff_laps_list else 'None (linear model).'}")
else:
    print("No sample data; run load cell first.")

---
## 6. Pit Window Simulation

Run the optimizer at a sample lap: ranked pit stop options (current lap and N future laps, plus stay-out), total projected time, and time deltas. Timeline plot shows stints, pit stops, and recommended pit window.
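
As a rough illustration of how such a ranking can be produced, the sketch below enumerates "pit at the end of lap p" options plus stay-out and sorts them by projected remaining race time. It is not the project's `optimize_pit_window`: the function `rank_pit_options`, the lap-time model `base_time + deg_rate * (lap_in_stint - 1)`, the identical behaviour of old and new compound, and all numbers are assumptions.

```python
def rank_pit_options(current_lap, lap_in_stint, total_race_laps,
                     base_time, deg_rate, pit_loss, horizon=5):
    """Rank pit-lap options and stay-out by projected remaining time.

    Returns (label, total_time_s, delta_to_best_s) tuples, best first.
    """
    def stint_time(first_age, n_laps):
        # Sum of lap times for n_laps laps starting at tire age first_age.
        return sum(base_time + deg_rate * (first_age + k - 1) for k in range(n_laps))

    remaining = total_race_laps - current_lap + 1  # includes the current lap
    options = [("stay out", stint_time(lap_in_stint, remaining))]

    # Pitting on the final lap gains nothing, so stop one lap earlier.
    last_candidate = min(current_lap + horizon, total_race_laps - 1)
    for pit_lap in range(current_lap, last_candidate + 1):
        laps_before = pit_lap - current_lap + 1
        laps_after = total_race_laps - pit_lap
        total = (stint_time(lap_in_stint, laps_before)
                 + pit_loss
                 + stint_time(1, laps_after))
        options.append((pit_lap, total))

    best = min(total for _, total in options)
    ranked = sorted(options, key=lambda option: option[1])
    return [(label, total, total - best) for label, total in ranked]


ranked = rank_pit_options(current_lap=10, lap_in_stint=10, total_race_laps=30,
                          base_time=90.0, deg_rate=0.2, pit_loss=22.0, horizon=8)
for label, total, delta in ranked[:3]:
    print(label, round(total, 1), round(delta, 1))
```

The delta column plays the same role as the time deltas in the optimizer output: it shows how much each option costs relative to the best one.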

In [None]:
# Pit window simulation at a sample lap
if sample_data:
    year, name = list(sample_data.keys())[0]
    data = sample_data[(year, name)]
    laps_with_features = add_stint_features(data.laps, data.pit_stops)
    track_id = name
    total_race_laps = int(laps_with_features["LapNumber"].max())
    current_lap = min(15, total_race_laps - 5)
    driver = str(laps_with_features["DriverNumber"].iloc[0])
    driver_laps = laps_with_features[laps_with_features["DriverNumber"] == driver].sort_values("LapNumber")
    row_at = driver_laps[driver_laps["LapNumber"] == current_lap]
    if row_at.empty:
        row_at = driver_laps.iloc[:1]
    current_compound = str(row_at["Compound"].iloc[0]).strip().upper()
    lap_in_stint = int(row_at["lap_in_stint"].iloc[0])
    model = get_degradation_model()

    results = optimize_pit_window(
        current_lap=current_lap,
        current_compound=current_compound,
        lap_in_stint=lap_in_stint,
        total_race_laps=total_race_laps,
        track_id=track_id,
        new_compound="MEDIUM",
        degradation_model=model,
    )
    rec = recommended_pit_lap(results)

    # Ranked options and time deltas
    display(results.head(12))
    print(f"Recommended: pit on lap {rec}" if rec is not None else "Recommended: stay out")

    # Timeline with recommended pit window
    pit_window = (rec - 2, rec + 2) if rec is not None else None
    ax = plot_strategy_timeline_from_laps(
        laps_with_features,
        data.pit_stops,
        driver_filter=driver,
        pit_window=pit_window,
        title=f"Strategy timeline — {year} {name}, driver {driver} (pit window shaded)",
    )
    plt.tight_layout()
    plt.show()
else:
    print("No sample data; run load cell first.")

---
## 7. Strategy Explanation

Rule-based reasoning: why the pit window opens, when degradation overtakes pit loss, cost of delaying or advancing. Human-readable text from intermediate calculations only (no ML).
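
One way to picture "degradation overtakes pit loss" is a break-even rule. Under an assumed linear degradation rate $d$ (s/lap) and the same compound before and after the stop, a set of tires with $a$ laps on it is $d \cdot a$ slower per lap than a fresh set, so over $R$ remaining laps a stop pays off when $d \cdot a \cdot R$ exceeds the pit loss. The sketch below turns that rule into text; the function `explain_pit_decision` and its output keys are hypothetical and do not reflect what `explain_strategy` returns.

```python
def explain_pit_decision(tire_age, laps_remaining, deg_rate, pit_loss):
    """Build a rule-based explanation from intermediate calculations only.

    tire_age: laps already completed on the current set.
    Assumes linear degradation and identical old/new compound (illustrative).
    """
    gain_per_lap = deg_rate * tire_age           # fresh-tire advantage per lap
    total_gain = gain_per_lap * laps_remaining   # advantage over the rest of the race
    net = total_gain - pit_loss
    # Laps of fresh-tire running needed to repay the stop
    break_even_laps = pit_loss / gain_per_lap if gain_per_lap > 0 else float("inf")
    verdict = "pit" if net > 0 else "stay out"
    lines = [
        f"- Fresh tires are worth {gain_per_lap:.2f} s/lap at tire age {tire_age}.",
        f"- Over {laps_remaining} remaining laps that is {total_gain:.1f} s "
        f"against a pit loss of {pit_loss:.1f} s.",
        f"- The stop repays itself after about {break_even_laps:.1f} laps.",
        f"- Net effect of pitting now: {net:+.1f} s, so the rule says: {verdict}.",
    ]
    return {
        "verdict": verdict,
        "net_gain_s": net,
        "break_even_laps": break_even_laps,
        "summary": "\n".join(lines),
    }


print(explain_pit_decision(tire_age=15, laps_remaining=15,
                           deg_rate=0.2, pit_loss=22.0)["summary"])
```

Every sentence in the output is derived from a number computed a few lines earlier, which is the property that makes this kind of explanation auditable.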

In [None]:
# Strategy explanation from optimizer results
if not sample_data:
    print("Run pit window and load cells first.")
else:
    required = ("results", "model", "track_id", "current_compound")
    if not all(v in globals() for v in required):
        # Fallback if running cells out of order: same setup as the pit window cell
        year, name = list(sample_data.keys())[0]
        data = sample_data[(year, name)]
        laps_with_features = add_stint_features(data.laps, data.pit_stops)
        track_id = name
        total_race_laps = int(laps_with_features["LapNumber"].max())
        current_lap = min(15, total_race_laps - 5)
        driver = str(laps_with_features["DriverNumber"].iloc[0])
        driver_laps = laps_with_features[laps_with_features["DriverNumber"] == driver].sort_values("LapNumber")
        row_at = driver_laps[driver_laps["LapNumber"] == current_lap]
        if row_at.empty:
            row_at = driver_laps.iloc[:1]
        current_compound = str(row_at["Compound"].iloc[0]).strip().upper()
        lap_in_stint = int(row_at["lap_in_stint"].iloc[0])
        model = get_degradation_model()
        results = optimize_pit_window(
            current_lap=current_lap,
            current_compound=current_compound,
            lap_in_stint=lap_in_stint,
            total_race_laps=total_race_laps,
            track_id=track_id,
            new_compound="MEDIUM",
            degradation_model=model,
        )

    ex = explain_strategy(
        results,
        track_id,
        current_compound,
        pit_loss_sec=get_pit_loss(track_id),
        degradation_model=model,
    )
    # Bullet-point display for portfolio
    print("Strategy explanation:\n")
    print(ex.get("summary_display", ex["summary"]))

---
## 8. Historical Validation Comparison

Compare model recommendation vs actual team decisions at each pit stop: lap delta (recommended − actual) and alignment within ±3 laps. Small plot of lap deltas across decisions.
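
The two metrics can be sketched in a few lines of plain Python. This is not the project's `run_validation`; the function `validation_summary` is hypothetical, the summary keys simply mirror the ones read later in this notebook, and treating a stay-out recommendation (`None`) as not aligned is an assumption.

```python
def validation_summary(decisions, tolerance=3):
    """Compute lap delta and alignment for (actual_pit_lap, recommended_pit_lap) pairs.

    recommended_pit_lap may be None when the model recommends staying out;
    such rows get no lap delta and count as not aligned (assumption).
    """
    rows = []
    for actual, recommended in decisions:
        delta = None if recommended is None else recommended - actual
        rows.append({
            "actual_pit_lap": actual,
            "recommended_pit_lap": recommended,
            "lap_delta": delta,  # recommended minus actual
            "alignment_within_3": delta is not None and abs(delta) <= tolerance,
        })

    valid = [r["lap_delta"] for r in rows if r["lap_delta"] is not None]
    within = sum(r["alignment_within_3"] for r in rows)
    summary = {
        "total_decisions": len(rows),
        "count_within_3": within,
        "pct_within_3": round(100 * within / len(rows), 1) if rows else 0.0,
        "mean_abs_lap_delta": sum(abs(d) for d in valid) / len(valid) if valid else None,
    }
    return rows, summary


# Made-up decisions purely for illustration
rows, summary = validation_summary([(18, 20), (25, 20), (30, None), (12, 9)])
print(summary)
```

A negative lap delta means the model would have stopped earlier than the team did; a positive one means later.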

In [None]:
# Historical validation: run optimizer at each real pit decision, compare to actual
races_for_validation = [(2023, "Bahrain"), (2023, "Monaco")]
model = get_degradation_model()
details = pd.DataFrame()
summary = {}

try:
    details, summary = run_validation(races_for_validation, degradation_model=model)
except Exception as e:
    print(f"Validation failed (ensure models fitted for tracks): {e}")

if not details.empty:
    display(details[["year", "track_id", "driver_number", "actual_pit_lap", "recommended_pit_lap", "lap_delta", "alignment_within_3"]].head(20))
    print("\nSummary:", summary)

    # Small plot: lap delta distribution (valid rows only)
    valid = details[details["lap_delta"].notna()]
    if not valid.empty:
        fig, ax = plt.subplots(figsize=(8, 3))
        ax.bar(range(len(valid)), valid["lap_delta"], color=["green" if a else "gray" for a in valid["alignment_within_3"]], alpha=0.8)
        ax.axhline(y=0, color="black", linewidth=0.5)
        ax.set_xlabel("Decision index")
        ax.set_ylabel("Lap delta (rec − actual)")
        ax.set_title("Model vs actual pit lap (green = within ±3 laps)")
        plt.tight_layout()
        plt.show()
else:
    print("No validation details; run validation on loaded races with fitted models.")

---
## 9. Case Study Summary

Summarize results: race name, predicted vs actual pit lap, outcome delta, and short insights. Derived from validation details above.

In [None]:
# Case study summary table and markdown
if "details" in globals() and "summary" in globals() and not details.empty:
    summary_df = details[["track_id", "year", "driver_number", "actual_pit_lap", "recommended_pit_lap", "lap_delta", "alignment_within_3"]].copy()
    summary_df["outcome_delta"] = summary_df["lap_delta"]
    display(summary_df.head(10))

    # Markdown-style summary
    print("\n--- Case study summary ---")
    print(f"Races: {details['track_id'].unique().tolist()}")
    print(f"Total decisions: {summary.get('total_decisions', 0)}")
    print(f"Within ±3 laps: {summary.get('count_within_3', 0)} ({summary.get('pct_within_3', 0)}%)")
    print(f"Mean |lap delta|: {summary.get('mean_abs_lap_delta', 'N/A')}")
    print("\nInsight: Model recommendations align with team decisions when lap delta is small; alignment metric uses ±3 laps as acceptable window.")
else:
    print("Run historical validation cell first to populate details and summary.")

---
## 10. Interactive Widgets (Optional)

Sliders adjust the current lap, the initial fuel load, and the pit loss; the strategy recommendation updates live. Requires `ipywidgets` (`pip install ipywidgets`).

In [None]:
# Optional: interactive sliders for current lap, initial fuel, pit loss
try:
    import ipywidgets as widgets
    from IPython.display import display as ipy_display

    # Only build widgets if we have sample data and model
    if sample_data and "model" in dir():
        year, name = list(sample_data.keys())[0]
        data = sample_data[(year, name)]
        laps_with_features = add_stint_features(data.laps, data.pit_stops)
        track_id = name
        total_race_laps = int(laps_with_features["LapNumber"].max())
        driver = str(laps_with_features["DriverNumber"].iloc[0])
        driver_laps = laps_with_features[laps_with_features["DriverNumber"] == driver].sort_values("LapNumber")

        lap_slider = widgets.IntSlider(value=15, min=1, max=total_race_laps - 1, description="Current lap")
        fuel_slider = widgets.FloatSlider(value=110.0, min=80, max=120, step=2, description="Initial fuel (kg)")
        pit_loss_slider = widgets.FloatSlider(value=get_pit_loss(track_id), min=18, max=28, step=0.5, description="Pit loss (s)")
        out = widgets.Output()

        def update_strategy(change=None):
            with out:
                out.clear_output(wait=True)
                current_lap = lap_slider.value
                initial_fuel = fuel_slider.value
                pit_override = {track_id.lower(): pit_loss_slider.value}
                row_at = driver_laps[driver_laps["LapNumber"] == current_lap]
                if row_at.empty:
                    row_at = driver_laps.iloc[min(current_lap - 1, len(driver_laps) - 1) : min(current_lap, len(driver_laps))]
                if row_at.empty:
                    print("No lap data for this lap.")
                    return
                current_compound = str(row_at["Compound"].iloc[0]).strip().upper()
                lap_in_stint = int(row_at["lap_in_stint"].iloc[0])
                res = optimize_pit_window(current_lap=current_lap, current_compound=current_compound, lap_in_stint=lap_in_stint, total_race_laps=total_race_laps, track_id=track_id, new_compound="MEDIUM", degradation_model=model, initial_fuel_kg=initial_fuel, pit_loss_overrides=pit_override)
                rec = recommended_pit_lap(res)
                print(f"Recommendation: pit on lap {rec}" if rec is not None else "Recommendation: stay out")
                display(res.head(8))

        lap_slider.observe(update_strategy, names="value")
        fuel_slider.observe(update_strategy, names="value")
        pit_loss_slider.observe(update_strategy, names="value")

        ipy_display(widgets.VBox([widgets.HBox([lap_slider, fuel_slider, pit_loss_slider]), out]))
        update_strategy()
    else:
        print("Load sample data and run degradation/pit window cells first.")
except ImportError:
    print("ipywidgets not installed. Install with: pip install ipywidgets")

---
## 11. Conclusion

This notebook demonstrated the Motorsports Pit Strategy Optimizer pipeline in an exploratory way:

1. **Data:** Loaded dry race data via FastF1 and the project pipeline; displayed sample laps and printed per-race lap and pit stop counts.
2. **Derived features:** Stint identification, lap-in-stint, and estimated fuel load; table and simple plot.
3. **Degradation:** Per-compound degradation curves and optional cliff detection; linear model, reproducible outputs.
4. **Pit window:** Optimizer at a sample lap with ranked options and time deltas; strategy timeline with recommended pit window.
5. **Explanation:** Rule-based strategy text (why pit window opens, when degradation overtakes pit loss, cost of delaying/advancing).
6. **Validation:** Compared model recommendation vs actual team decisions; lap delta and alignment within ±3 laps; small plot.
7. **Case study:** Summary table and insights from validation.
8. **Optional:** Interactive sliders for current lap, fuel, and pit loss with live recommendation update.

The notebook is for exploration and portfolio display only; production use is via the CLI (`run_strategy.py`) and Python API. See [README](../README.md) and [docs/CASE_STUDIES.md](../docs/CASE_STUDIES.md) for full usage and case study reproduction.