# 03 - Practice vs Qualifying times


## Motivation

This notebook investigates how practice and qualifying lap times relate in Formula 1.
Each race weekend features three free‑practice sessions (FP1, FP2, FP3) where drivers test setups and prepare for the qualifying session that determines the grid.
Although practice laps are usually slower than qualifying laps, drivers often set quick “simulation” laps in practice that can foreshadow their qualifying performance and hint at the eventual pecking order.


## Method

1. **Qualifying**: we simply take the fastest qualifying lap for each driver.
2. **Practice**: we collect every lap a driver completes across all practice sessions and keep the quickest one. The mean lap time would be less representative because drivers typically run at least one “qualifying‑simulation” lap, while the number of slower test laps varies widely.
3. **IdealPractice**: for each driver, we sum the fastest sector times recorded during practice. This represents the best possible lap if the driver could combine all their fastest sectors in a single run.
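The difference between **Practice** and **IdealPractice** is easiest to see on a tiny example. The sketch below uses made-up sector times for a single hypothetical driver and plain Python rather than the notebook's pandas pipeline.

```python
# Hypothetical practice laps for one driver: (sector 1, sector 2, sector 3) in ms.
practice_laps = [
    (28_400, 35_100, 24_900),  # lap 1
    (28_150, 35_400, 24_700),  # lap 2
    (28_300, 34_950, 25_050),  # lap 3
]

# Practice: the quickest complete lap.
fastest_lap = min(sum(sectors) for sectors in practice_laps)

# IdealPractice: the best time in each sector, taken from any lap.
best_sectors = [min(sector_times) for sector_times in zip(*practice_laps)]
ideal_lap = sum(best_sectors)

print(fastest_lap, ideal_lap, fastest_lap - ideal_lap)
```

As long as each lap time equals the sum of its three sectors, the ideal lap can never be slower than the fastest real lap, because every sector minimum is at most the corresponding sector of that lap.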


## Gathering data

All required data is available thanks to the FastF1 API. For a given season we:

1. Retrieve the event schedule and keep only events whose qualifying session has finished.
2. For each event, fetch the qualifying session and the three practice sessions (or the single practice session on sprint weekends).

The heavy lifting comes later, when we process the raw lap data.


In [None]:
import logging
from datetime import datetime

import fastf1 as ff1
import pandas as pd

from f1_analysis.fastf1_wrapper import FastF1Wrapper

logging.disable()

fastf1_wrapper = FastF1Wrapper()

season = 2025

events = []

for _, event in ff1.get_event_schedule(season, include_testing=False).iterrows():
    round_number = event["RoundNumber"]

    qualifying = ff1.get_session(season, round_number, "Q")

    if qualifying.date > datetime.now():
        continue

    practice_sessions = [ff1.get_session(season, round_number, "FP1")]

    if event["EventFormat"] == "conventional":
        practice_sessions.append(ff1.get_session(season, round_number, "FP2"))
        practice_sessions.append(ff1.get_session(season, round_number, "FP3"))

    events.append(
        {
            "event": event,
            "qualifying": qualifying,
            "practice_sessions": practice_sessions,
        }
    )

## Gathering data II

With all relevant event sessions identified, we now need to retrieve the lap times of interest. Although the times can be loaded directly from `fastf1`, using the `fastf1_wrapper` caches the data, eliminating repeated load times later on.

After loading the lap data we:

1. Convert the duration columns to milliseconds.
2. Group by driver to determine the fastest laps and sector times.
3. Sum each driver's fastest practice sectors to obtain the ideal practice lap.

Finally, we merge the table of fastest practice laps with the table of fastest qualifying laps. The resulting dataset contains one row per driver per race weekend, showing:

- the driver’s quickest practice lap (**Practice**),
- the driver’s ideal practice lap, constructed from their fastest sectors (**IdealPractice**),
- the driver’s actual qualifying lap (**Qualifying**).
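The merge keeps every driver who set a qualifying time, even if no practice data exists for them. The following sketch mimics that left-join behaviour in plain Python; the driver codes and times are invented, and `left_join` is a hypothetical helper, not part of the notebook or of pandas.

```python
# Hypothetical per-driver results for one round, in milliseconds.
qualifying = {"AAA": 76_732, "BBB": 76_801, "CCC": 77_450}
practice = {
    "AAA": {"Practice": 77_210, "IdealPractice": 77_050},
    "BBB": {"Practice": 77_180, "IdealPractice": 76_990},
    # CCC set no timed practice lap in this made-up example.
}


def left_join(round_number, qualifying, practice):
    """Keep every qualifying entry; practice values are None when missing."""
    rows = []
    for driver, qualifying_ms in qualifying.items():
        practice_row = practice.get(driver, {})
        rows.append(
            {
                "RoundNumber": round_number,
                "Driver": driver,
                "Qualifying": qualifying_ms,
                "Practice": practice_row.get("Practice"),
                "IdealPractice": practice_row.get("IdealPractice"),
            }
        )
    return rows


rows = left_join(1, qualifying, practice)
for row in rows:
    print(row)
```

Rows with missing practice values cannot produce a delta, so they would drop out of any later step that filters on the delta.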


In [None]:
def load_fastest_quali_laps(qualifying):
    qualifying_laps = fastf1_wrapper.load_session_laps(qualifying)
    qualifying_laps["Qualifying"] = (
        pd.to_timedelta(qualifying_laps["LapTime"]).dt.total_seconds() * 1000
    )
    qualifying_laps = qualifying_laps.dropna(subset=["Qualifying"]).reset_index(
        drop=True
    )

    fastest_qualifying_laps_row_indices = qualifying_laps.groupby("Driver")[
        "Qualifying"
    ].idxmin()
    fastest_qualifying_laps = qualifying_laps.loc[fastest_qualifying_laps_row_indices][
        ["Driver", "Qualifying"]
    ]
    fastest_qualifying_laps = fastest_qualifying_laps.reset_index(drop=True)

    return fastest_qualifying_laps


def load_fastest_practice_laps(practices):
    # Stack the laps of every practice session into a single frame.
    practice_laps = pd.concat(
        [fastf1_wrapper.load_session_laps(practice) for practice in practices]
    )

    # Convert the lap and sector durations to milliseconds.
    duration_columns = {
        "Practice": "LapTime",
        "Sector1": "Sector1Time",
        "Sector2": "Sector2Time",
        "Sector3": "Sector3Time",
    }
    for target, source in duration_columns.items():
        practice_laps[target] = (
            pd.to_timedelta(practice_laps[source]).dt.total_seconds() * 1000
        )

    fastest_practice_laps = (
        practice_laps.groupby("Driver")
        .agg(
            {
                "Sector1": "min",
                "Sector2": "min",
                "Sector3": "min",
                "Practice": "min",
            }
        )
        .reset_index()
    )

    fastest_practice_laps["IdealPractice"] = (
        fastest_practice_laps["Sector1"]
        + fastest_practice_laps["Sector2"]
        + fastest_practice_laps["Sector3"]
    )

    return fastest_practice_laps


event_frames = []

for event in events:
    round_number = event["event"]["RoundNumber"]

    fastest_qualifying_laps = load_fastest_quali_laps(event["qualifying"])
    fastest_qualifying_laps["RoundNumber"] = round_number

    fastest_practice_laps = load_fastest_practice_laps(event["practice_sessions"])
    fastest_practice_laps["RoundNumber"] = round_number

    # Left merge: keep every driver who set a qualifying time.
    event_data = pd.merge(
        fastest_qualifying_laps,
        fastest_practice_laps,
        on=["Driver", "RoundNumber"],
        how="left",
    )

    event_frames.append(event_data)

# One row per driver per race weekend.
df = pd.concat(event_frames, ignore_index=True)

## Correlating the data

To get an initial sense of the relationships in our dataset, we examine the correlation among the three metrics we have.

First, we subtract each driver’s **Qualifying** time from their **Practice** and **IdealPractice** times to obtain **PracticeQualifyingDelta** and **IdealPracticeQualifyingDelta**. Large differences often indicate outliers, for example a driver crashing in practice and recording only an unusually slow lap, so we drop every row whose absolute delta is 2000 ms or more before calculating correlations.

The results confirm a strong overall correlation, as expected. However, the **IdealPractice** time shows a slightly stronger relationship with qualifying performance than the raw **Practice** time. Thus, we will adopt the combined sector metric as our primary predictor for qualifying times moving forward.
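To illustrate why the outlier filter matters, here is a small self-contained sketch with invented lap times. It computes the Pearson correlation by hand instead of calling pandas' `corr()`; the `pearson` helper and the `MAX_DELTA_MS` constant are names chosen for this example only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (practice_ms, qualifying_ms) pairs; the last driver never set
# a representative practice lap.
rows = [
    (78_000, 77_500),
    (78_500, 78_100),
    (79_000, 78_400),
    (79_500, 79_100),
    (95_000, 78_000),
]

MAX_DELTA_MS = 2000  # same threshold as the notebook's filter
cleaned = [(p, q) for p, q in rows if abs(p - q) < MAX_DELTA_MS]

r_raw = pearson(*zip(*rows))
r_clean = pearson(*zip(*cleaned))
print(f"raw: {r_raw:.3f}, cleaned: {r_clean:.3f}")
```

In this toy data set a single unrepresentative lap is enough to turn a strong positive correlation negative, which is why the filter is applied first.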


In [None]:
df["PracticeQualifyingDelta"] = df["Practice"] - df["Qualifying"]
df["IdealPracticeQualifyingDelta"] = df["IdealPractice"] - df["Qualifying"]

print("Correlation in original data")
print(df[["Practice", "IdealPractice", "Qualifying"]].corr())

print("\nPractice correlation without outliers")
cleaned_lap = df[df["PracticeQualifyingDelta"].abs() < 2000]
print(cleaned_lap[["Practice", "Qualifying"]].corr())

print("\nIdealPractice correlation without outliers")
cleaned_sectors = df[df["IdealPracticeQualifyingDelta"].abs() < 2000]
print(cleaned_sectors[["IdealPractice", "Qualifying"]].corr())

# Copy so that later column assignments do not write into a slice of the original frame.
df = cleaned_sectors.copy()

## Plotting the data

A box‑plot of **IdealPracticeQualifyingDelta** visualizes the distribution for each race weekend, drawing only the outlying points individually. The plot reveals that practice laps are typically around 500 ms slower than qualifying laps. However, the spread varies significantly between events: some races show much larger gaps, and a few (for example those with adverse qualifying weather) even have qualifying laps that are slower than the ideal practice laps. This highlights the influence of race‑specific conditions on lap‑time differences.
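For readers who want the numbers behind each box, the quartiles can be approximated with the standard library. This is a sketch on synthetic deltas, not the notebook's data, and quartile conventions differ between tools, so the values may not match the box edges Plotly draws exactly.

```python
from collections import defaultdict
from statistics import quantiles

# Synthetic (RoundNumber, IdealPracticeQualifyingDelta in ms) samples.
samples = [
    (1, 300), (1, 450), (1, 500), (1, 650), (1, 700),       # dry weekend
    (2, -900), (2, -600), (2, -400), (2, -300), (2, -100),  # wet qualifying
]

deltas_by_round = defaultdict(list)
for round_number, delta_ms in samples:
    deltas_by_round[round_number].append(delta_ms)

box_summary = {}
for round_number, deltas in sorted(deltas_by_round.items()):
    # n=4 returns the three cut points: first quartile, median, third quartile.
    q1, med, q3 = quantiles(deltas, n=4, method="inclusive")
    box_summary[round_number] = {"q1": q1, "median": med, "q3": q3}

for round_number, stats in box_summary.items():
    print(round_number, stats)
```

A negative median, as in the synthetic second round, corresponds to a weekend where qualifying was slower than the ideal practice lap.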


In [None]:
import plotly.graph_objects as go

delta_by_round_trace = go.Box(
    name="IdealPractice vs. Qualifying",
    x=df["RoundNumber"],
    y=df["IdealPracticeQualifyingDelta"],
    boxpoints="outliers",
    hovertext=df["Driver"],
)

layout = go.Layout(
    title="Ideal practice times (sum of fastest sectors) vs. Qualifying times",
    xaxis_title="Round",
    yaxis_title="Delta in ms",
    xaxis=dict(dtick=1),
    boxmode="group",
    height=800,
)

fig = go.Figure(data=[delta_by_round_trace], layout=layout)
fig.show()

Repeating the box‑plot, but aggregating the time differences by driver instead of by event, yields a noticeably tighter distribution. This indicates that venue‑specific factors (circuit layout, temperature, weather, etc.) have a larger impact on the offset than individual driver performance.

The median offset remains around 500 ms, but the range is much smaller than when the data are grouped by race, which supports the view that venue‑driven variation dominates the observed differences.
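The reasoning can be reproduced on synthetic data. The sketch below builds deltas as a venue offset plus a driver offset, deliberately making the venue effect larger, and then compares how far the group medians move under each grouping. All numbers and the `median_range` helper are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

# Synthetic model: delta = venue offset + driver offset (both in ms).
round_offsets = {1: 200, 2: 500, 3: 900}
driver_offsets = {"AAA": -30, "BBB": 0, "CCC": 30}

rows = [
    {"RoundNumber": rnd, "Driver": drv, "Delta": r_off + d_off}
    for rnd, r_off in round_offsets.items()
    for drv, d_off in driver_offsets.items()
]


def median_range(rows, key):
    """Range (max - min) of the per-group medians when grouping by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["Delta"])
    medians = [median(values) for values in groups.values()]
    return max(medians) - min(medians)


range_by_round = median_range(rows, "RoundNumber")
range_by_driver = median_range(rows, "Driver")
print(range_by_round, range_by_driver)
```

When the venue effect dominates, the medians differ widely between rounds but barely between drivers, which is the pattern described above.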


In [None]:
delta_by_driver_trace = go.Box(
    name="IdealPractice vs. Qualifying",
    x=df["Driver"],
    y=df["IdealPracticeQualifyingDelta"],
    boxpoints="outliers",
    hovertext=df["RoundNumber"],
)

fig = go.Figure(data=[delta_by_driver_trace], layout=layout)
fig.update_layout(xaxis_title="Driver")
fig.show()

To investigate differences between drivers further, we compute a second metric: the gap between each driver’s qualifying lap and the pole‑sitter’s qualifying lap for the same event (**QualifyingPoleDelta**). Plotting both metrics together provides a clear visual comparison:

- Blue bars: mean **IdealPracticeQualifyingDelta** by driver
- Red line: mean **QualifyingPoleDelta** by driver

One might expect the gap between practice and qualifying to differ more for generally quicker drivers than for drivers at the back of the field. Sorting the bars in descending order lets us check this hypothesis: if it holds, the red line should rise or fall steadily across the chart.
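The pole gap itself is a simple per-round subtraction. Here is a minimal plain-Python sketch with invented times; "pole" in this sketch simply means the fastest qualifying lap among the rows for that round.

```python
# Hypothetical qualifying results in milliseconds.
rows = [
    {"RoundNumber": 1, "Driver": "AAA", "Qualifying": 76_500},
    {"RoundNumber": 1, "Driver": "BBB", "Qualifying": 76_850},
    {"RoundNumber": 2, "Driver": "AAA", "Qualifying": 91_300},
    {"RoundNumber": 2, "Driver": "BBB", "Qualifying": 91_100},
]

# Fastest qualifying lap per round.
best_per_round = {}
for row in rows:
    rnd = row["RoundNumber"]
    best_per_round[rnd] = min(best_per_round.get(rnd, row["Qualifying"]), row["Qualifying"])

# Gap of every driver to the fastest lap of the same round.
for row in rows:
    row["QualifyingPoleDelta"] = row["Qualifying"] - best_per_round[row["RoundNumber"]]

for row in rows:
    print(row["RoundNumber"], row["Driver"], row["QualifyingPoleDelta"])
```

By construction the gap is zero for the fastest driver of each round and positive for everyone else.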


In [None]:
best_per_round = df.groupby("RoundNumber")["Qualifying"].transform("min")
df["QualifyingPoleDelta"] = df["Qualifying"] - best_per_round

mean_deltas = (
    df.groupby("Driver")
    .agg(
        {
            "IdealPracticeQualifyingDelta": "mean",
            "QualifyingPoleDelta": "mean",
            "Qualifying": "mean",
        }
    )
    .reset_index()
)

mean_deltas = mean_deltas.sort_values("IdealPracticeQualifyingDelta", ascending=False)

mean_qualifying_delta_trace = go.Bar(
    name="IdealPractice vs. Qualifying",
    x=mean_deltas["Driver"],
    y=mean_deltas["IdealPracticeQualifyingDelta"],
)

mean_pole_delta_trace = go.Scatter(
    name="Qualifying vs. Pole lap",
    x=mean_deltas["Driver"],
    y=mean_deltas["QualifyingPoleDelta"],
    mode="lines+markers",
    line_color="red",
)

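fig = go.Figure(
    data=[mean_qualifying_delta_trace, mean_pole_delta_trace], layout=layout
)
# The shared layout labels the x-axis "Round"; this chart is grouped by driver.
fig.update_layout(xaxis_title="Driver")
fig.show()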

The red line, representing the average gap to the pole‑sitter, is noisy. It stays relatively high and flat toward the right‑hand side of the chart, indicating that drivers who make the smallest improvement from practice to qualifying are also the ones farthest from the leading pace. For the majority of the field, however, the red line shows no clear trend, making it difficult to draw firm conclusions.
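One possible way to put a number on this visual impression, not used in the notebook itself, is a rank correlation between the two per-driver means. The sketch below implements Spearman's coefficient in plain Python on invented values; a result near +1 or -1 would indicate a steady trend, while values near 0 indicate none.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-driver means in ms, sorted by practice delta (descending).
mean_practice_delta = [620, 540, 510, 480, 300]
mean_pole_delta = [400, 900, 350, 700, 1200]

rho = spearman(mean_practice_delta, mean_pole_delta)
print(f"Spearman rho: {rho:.2f}")
```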

Overall, the results suggest that a driver’s practice performance does not consistently predict their qualifying speed. Some drivers already display their true pace in practice, while others only reveal their full potential during qualifying, regardless of whether they are generally fast or slow.
