# Driver and engine performance impact on F1 race results

Łukasz Filo, Klaudiusz Grobelski

## Problem formulation [0-5 pts]:

- is the problem clearly stated [1 pt]
- what is the point of creating the model, are potential use cases defined [1 pt]
- where does the data come from, what does it contain [1 pt]
- DAG has been drawn [1 pt]
- confoundings (pipe, fork, collider) were described [1 pt]

In this notebook, we will develop a Bayesian multilevel binomial regression model to predict driver performance across recent Formula 1 seasons. Specifically, we will use data from the 2019–2024 seasons. The input data includes information about drivers, constructors, and engine suppliers used by each team. The primary objective of this analysis is to understand and predict how various factors—such as driver skill, team changes, and engine suppliers—affect driver performance over time. This model could assist teams in making strategic decisions, such as evaluating whether changes in drivers or engine suppliers might enhance performance. It may also be valuable to fans and analysts seeking to assess the relative impact of technical and human factors on race results.

We use historical race data sourced from FastF1, which includes finishing positions, lap times, and team-driver pairings for each race. Driver skill ratings are obtained from the EA Sports F1 game, while engine usage data (i.e., which power unit each constructor used in a given season) is collected from Wikipedia, covering the 2019–2024 period.

To gain a clearer understanding of the relationships between variables and to identify potential sources of bias, we will construct a Directed Acyclic Graph (DAG). The DAG will include nodes such as Driver Skill, Team, Engine, and Performance, and will help visualize dependencies and causal paths within the data.

In [None]:
from IPython.display import Image
Image("images/model_1_DAG.png")

In [None]:
Image("images/model_2_DAG.png")

The DAGs illustrate the following causal structures:

- Colliders: Present in both models, where multiple variables (e.g., `α`, `constructor`, `β`, `engine`) influence the same downstream node (the linear model). Several arrows converging on one node form a collider; a fork would be a single variable influencing several downstream nodes.

- Pipes: Clear in the chain `model` → `θ` → `y`, representing a direct causal path.
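
The structures described above can also be checked mechanically. The sketch below is only an illustration: the edge list is an assumption reconstructed from the text of this section (it is not read from the DAG images, which may contain more nodes), and the variable names are hypothetical.

```python
from collections import defaultdict

# Assumed edge list, reconstructed from the description above.
# Each pair is (cause, effect).
edges = [
    ("alpha", "model"),
    ("constructor", "model"),
    ("beta", "model"),
    ("engine", "model"),
    ("model", "theta"),
    ("theta", "y"),
]

parents = defaultdict(list)
children = defaultdict(list)
for cause, effect in edges:
    parents[effect].append(cause)
    children[cause].append(effect)

# Nodes on which several arrows converge.
converging = {node: p for node, p in parents.items() if len(p) > 1}

# Chains a -> b -> c, i.e. pipes through a mediator b.
pipes = [(a, b, c) for a, b in edges for c in children[b]]

print("converging arrows:", converging)
print("pipes:", pipes)
```

Listing the chains this way makes it easier to see which paths pass through the linear model before reaching `y`.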

In [None]:
import numpy as np
from cmdstanpy import CmdStanModel
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# import logging
# logging.getLogger("cmdstanpy").setLevel(logging.WARNING)

## Data preprocessing [0-2 pts]:
- is the preprocessing step clearly described [1 pt]
- reasoning and types of actions taken on the dataset have been described [1 pt]

In [None]:
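df = pd.read_csv('data/processed_data/data.csv')

# Map each categorical label to a consecutive integer ID starting at 1 (Stan arrays are 1-indexed).
unique_drivers = df['DriverId'].unique()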
driver_id_map = {driver: idx + 1 for idx, driver in enumerate(unique_drivers)}
df['DriverId'] = df['DriverId'].map(driver_id_map)
drivers = df['DriverId'].values

unique_team = df['TeamId'].unique()
team_id_map = {team: idx + 1 for idx, team in enumerate(unique_team)}
df['TeamId'] = df['TeamId'].map(team_id_map)
teams = df['TeamId'].values

unique_engine = df['Engine'].unique()
engine_id_map = {engine: idx + 1 for idx, engine in enumerate(unique_engine)}
df['Engine'] = df['Engine'].map(engine_id_map)
engines = df['Engine'].values

unique_season = df['Season'].unique()
season_id_map = {season: idx + 1 for idx, season in enumerate(unique_season)}
df['Season'] = df['Season'].map(season_id_map)
seasons = df['Season'].values

In [None]:
# Standardize ratings within each season (z-score).
# transform keeps the original row order and index, so `ratings` stays
# aligned with the `drivers`, `teams`, `engines` and `seasons` arrays.
season_rating = df.groupby('Season')['Rating']
df['Rating'] = (df['Rating'] - season_rating.transform('mean')) / season_rating.transform('std')

ratings = df["Rating"].values
df['Position'] = df['Position'].astype(int)

In [None]:
order_col = ['DriverId', 'Rating', 'TeamId', 'Engine', 'Season','Position']
df = df[order_col]
df.head()
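
To make the per-season standardization concrete, here is a minimal sketch on a toy table. The values are invented for illustration and the column name `RatingStd` is hypothetical; only the idea (z-scoring `Rating` within each `Season`) comes from the preprocessing above.

```python
import numpy as np
import pandas as pd

# Toy data (made up): two seasons whose ratings live on different scales.
toy = pd.DataFrame({
    "Season": [1, 1, 1, 2, 2, 2],
    "Rating": [80.0, 90.0, 100.0, 70.0, 75.0, 80.0],
})

# Z-score within each season; transform keeps the original row order.
grouped = toy.groupby("Season")["Rating"]
toy["RatingStd"] = (toy["Rating"] - grouped.transform("mean")) / grouped.transform("std")

print(toy)
```

After this step a rating of 0 means "average driver of that season", which is how the `driver_rating = 0` case is used in the prior predictive checks.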

## Model [0-4 pts]
- are two different models specified [1 pt]
- are the differences between the two models explained [1 pt]
- is the difference in the models justified (e.g. does adding an additional parameter make sense?) [1 pt]
- are models sufficiently described (what are formulas, what are parameters, what data are required ) [1 pt]

### Model 1: Yearly driver rating and team performance.
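
Judging by the plot titles and the data dictionaries used later in this notebook, the linear predictors of the two models appear to have the form below. This is a sketch, not the definitive specification: the Stan files are not shown here, so the binomial likelihood with 19 trials, the logit link, and the index notation ($d[i]$ driver, $c[i]$ constructor, $e[i]$ engine, $t[i]$ season) are assumptions.

```latex
\begin{aligned}
\text{Model 1:}\quad \theta_i &= \mathrm{constructor}_{c[i]} - \alpha_{d[i]} \cdot \mathrm{rating}_i \\
\text{Model 2:}\quad \theta_i &= \mathrm{engine}_{e[i]} + \mathrm{year\_constructor}_{c[i],\,t[i]} - \alpha_{d[i]} \cdot \mathrm{rating}_i \\
y_i &\sim \mathrm{Binomial}\big(19,\ \mathrm{logit}^{-1}(\theta_i)\big), \qquad y_i = \mathrm{position}_i - 1 \\
\alpha_d,\ \mathrm{constructor}_c,\ \mathrm{engine}_e,\ \mathrm{year\_constructor}_{c,t} &\sim \mathrm{Normal}(0, \sigma)
\end{aligned}
```

Under this reading, a larger $\theta_i$ means a worse expected finishing position, and the minus sign lets a higher driver rating pull $\theta_i$ down when $\alpha$ is positive.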

## Priors [0-4 pts]
- Is it explained why particular priors for parameters were selected [1 pt]
- Have prior predictive checks been done for parameters (do parameters simulated from priors make sense) [1 pt]
- Have prior predictive checks been done for measurements (do measurements simulated from priors make sense) [1 pt]
- How were prior parameters selected [1 pt]

The prior predictive checks were prepared for the best, average, and weakest driver.

### Model 1 PPC

In [None]:
model_1_ppc = CmdStanModel(stan_file='stan/model_1_ppc.stan')

In [None]:
def draw_plots_ppc_model_1(sigmas, driver_rating):
    fig, axes = plt.subplots(3, 4, figsize=(20, 15))
    colors = ["#130582", "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]

    for s_i in range(3):
        sigma = {'sigma': sigmas[s_i], 'driver_rating': driver_rating}
        model_1_ppc_sim = model_1_ppc.sample(data=sigma, iter_warmup=1, fixed_param=True, seed=25062025)
        
        axes[s_i, 0].hist(model_1_ppc_sim.stan_variable('alpha_driver').flatten(), bins=100, density=True, color=colors[1], alpha=0.8)
        axes[s_i, 0].set_yticks([])
        axes[s_i, 0].set_title(f'alpha_driver ~ Normal(0, {sigmas[s_i]})', fontweight='bold')

        axes[s_i, 1].hist(model_1_ppc_sim.stan_variable('constructor').flatten(), bins=100, density=True, color=colors[2], alpha=0.8)
        axes[s_i, 1].set_yticks([])
        axes[s_i, 1].set_title(f'constructor ~ Normal(0, {sigmas[s_i]})', fontweight='bold')

        axes[s_i, 2].hist(model_1_ppc_sim.stan_variable('theta').flatten(), bins=100, density=True, color=colors[4], alpha=0.8)
        axes[s_i, 2].set_yticks([])
        axes[s_i, 2].set_title('theta = constructor - alpha_driver * driver_rating', fontweight='bold')

        positions = model_1_ppc_sim.stan_variable('y_ppc').flatten() + 1
        n_bins = np.arange(22) - 0.5
        axes[s_i, 3].hist(positions, bins=n_bins, rwidth=0.85, density=True, color=colors[5], alpha=0.85, label="Simulated Positions")
        axes[s_i, 3].set_xticks(range(22))
        axes[s_i, 3].set_xlim([0, 21])
        axes[s_i, 3].set_yticks([])
        axes[s_i, 3].set_title('Position', fontweight='bold')

    for i in range(4):
        axes[2, i].set_xlabel(['alpha_driver', 'constructor', 'theta', 'Position'][i], fontsize=13, fontweight='bold')

    for ax_row in axes:
        for ax in ax_row:
            ax.tick_params(axis='both', which='major', labelsize=11)
            ax.spines['top'].set_visible(False)
            ax.spines['right'].set_visible(False)

    fig.suptitle("Prior Predictive Checks for Model 1", fontsize=18, fontweight='bold')
    fig.tight_layout(pad=2.0)
    plt.show()

In [None]:
sigmas = [0.8, 1, 1.2]

#### The driver with the best results.

In [None]:
driver_rating = 2.2

draw_plots_ppc_model_1(sigmas, driver_rating)

#### The driver with the average results.

In [None]:
driver_rating = 0

draw_plots_ppc_model_1(sigmas, driver_rating)

#### The driver with the weakest results.

In [None]:
driver_rating = -3

draw_plots_ppc_model_1(sigmas, driver_rating)

### Model 2 PPC

In [None]:
model_2_ppc = CmdStanModel(stan_file='stan/model_2_ppc.stan')

In [None]:
def draw_plots_ppc_model_2(sigmas, driver_rating):
    fig, axes = plt.subplots(3, 5, figsize=(24, 15))
    colors = ["#130582", "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]

    for s_i in range(3):
        sigma = {'sigma': sigmas[s_i], 'driver_rating': driver_rating}
        model_2_ppc_sim = model_2_ppc.sample(data=sigma, iter_warmup=1, fixed_param=True, seed=25062025)

        axes[s_i, 0].hist(model_2_ppc_sim.stan_variable('engine').flatten(), bins=100, density=True, color=colors[0], alpha=0.8)
        axes[s_i, 0].set_yticks([])
        axes[s_i, 0].set_title(f'engine ~ Normal(0, {sigmas[s_i]})', fontweight='bold')

        axes[s_i, 1].hist(model_2_ppc_sim.stan_variable('alpha_driver').flatten(), bins=100, density=True, color=colors[1], alpha=0.8)
        axes[s_i, 1].set_yticks([])
        axes[s_i, 1].set_title(f'alpha_driver ~ Normal(0, {sigmas[s_i]})', fontweight='bold')

        axes[s_i, 2].hist(model_2_ppc_sim.stan_variable('year_constructor').flatten(), bins=100, density=True, color=colors[2], alpha=0.8)
        axes[s_i, 2].set_yticks([])
        axes[s_i, 2].set_title(f'year_constructor ~ Normal(0, {sigmas[s_i]})', fontweight='bold')

        axes[s_i, 3].hist(model_2_ppc_sim.stan_variable('theta').flatten(), bins=100, density=True, color=colors[4], alpha=0.8)
        axes[s_i, 3].set_yticks([])
        axes[s_i, 3].set_title('theta = engine + year_constructor \n - alpha_driver * driver_rating', fontweight='bold')

        positions = model_2_ppc_sim.stan_variable('y_ppc').flatten() + 1
        n_bins = np.arange(22) - 0.5
        axes[s_i, 4].hist(positions, bins=n_bins, rwidth=0.85, density=True, color=colors[5], alpha=0.85, label="Simulated Positions")
        axes[s_i, 4].set_xticks(range(22))
        axes[s_i, 4].set_xlim([0, 21])
        axes[s_i, 4].set_yticks([])
        axes[s_i, 4].set_title('Position', fontweight='bold')

    for i in range(5):
        axes[2, i].set_xlabel(['engine', 'alpha_driver', 'year_constructor', 'theta', 'Position'][i], fontsize=13, fontweight='bold')

    for ax_row in axes:
        for ax in ax_row:
            ax.tick_params(axis='both', which='major', labelsize=11)
            ax.spines['top'].set_visible(False)
            ax.spines['right'].set_visible(False)

    fig.suptitle("Prior Predictive Checks for Model 2", fontsize=18, fontweight='bold')
    fig.tight_layout(pad=2.0)
    plt.show()

In [None]:
sigmas = [0.8, 1, 1.2]

#### The driver with the best results.

In [None]:
driver_rating = 2.2

draw_plots_ppc_model_2(sigmas, driver_rating)

#### An average driver from the middle of the field.

In [None]:
driver_rating = 0

draw_plots_ppc_model_2(sigmas, driver_rating)

#### The driver with the worst results.

In [None]:
driver_rating = -3

draw_plots_ppc_model_2(sigmas, driver_rating)

Based on the prior predictive checks above, we choose `sigma = 1.0` for our models.
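
To see how the prior scale shapes the simulated positions without running Stan, the Model 1 check can be approximated in NumPy. This is a sketch under stated assumptions: Normal(0, sigma) priors as in the plot titles, and a Binomial(19, inv_logit(theta)) likelihood, which is a guess at what `model_1_ppc.stan` does. The function name `simulate_positions` is hypothetical.

```python
import numpy as np

def simulate_positions(sigma, driver_rating, n_draws=4000, n_trials=19, seed=0):
    """Prior predictive draws of finishing positions (1..n_trials + 1).

    Assumes Normal(0, sigma) priors and a Binomial(n_trials, inv_logit(theta))
    likelihood; the real Stan model may differ.
    """
    rng = np.random.default_rng(seed)
    alpha_driver = rng.normal(0.0, sigma, n_draws)
    constructor = rng.normal(0.0, sigma, n_draws)
    theta = constructor - alpha_driver * driver_rating
    p = 1.0 / (1.0 + np.exp(-theta))  # inverse logit
    return rng.binomial(n_trials, p) + 1

for sigma in (0.8, 1.0, 1.2):
    pos = simulate_positions(sigma, driver_rating=2.2)
    print(f"sigma={sigma}: mean={pos.mean():.2f}, sd={pos.std():.2f}")
```

Under these assumed zero-mean priors, `driver_rating` changes only the spread of `theta`, not its center, so a larger `sigma` pushes more prior mass toward the extreme positions.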

## Posterior analysis (model 1) [0-4 pts]
- were there any issues with the sampling? if there were what kind of ideas for mitigation were used [1 pt]
- are the samples from posterior predictive distribution analyzed [1 pt]
- are the data consistent with posterior predictive samples and is it sufficiently commented (if they are not then is the justification provided) [1 pt]
- have parameter marginal distributions been analyzed (histograms of individual parameters plus summaries, are they diffuse or concentrated, what can we say about values) [1 pt]

In [None]:
model_1 = CmdStanModel(stan_file='stan/model_1.stan')

In [None]:
model_1_data = {'N': len(df),
                'C': len(team_id_map),
                'D': len(driver_id_map),
                'driver_rating': ratings,
                'constructor': teams,
                'driver': drivers,
                'position': df['Position'].values - 1}  # zero-based positions

model_1_fit = model_1.sample(data=model_1_data, seed=25062025, iter_warmup=1000)

In [None]:
drivers_names = ['hamilton', 'norris', 'leclerc']
fig, axes = plt.subplots(1, len(drivers_names), figsize=(5 * len(drivers_names), 4), sharey=True)

n_bins = np.arange(22) - 0.5

for d_i, d_name in enumerate(drivers_names):
    ax = axes[d_i]
    driver_id = driver_id_map[d_name]
    results = df[df['DriverId'] == driver_id]
    results_idx = results.index

    ax.hist((results['Position']).tolist(),
            bins=n_bins,
            rwidth=0.9,
            histtype='step',
            edgecolor='black',
            density=True,
            label='Observed')

    ax.hist(model_1_fit.stan_variable('y_hat').T[results_idx].flatten() + 1,
            bins=n_bins,
            rwidth=0.9,
            color='cornflowerblue',
            edgecolor='royalblue',
            alpha=0.7,
            density=True,
            label='Simulated')

    ax.set_xticks(range(22))
    ax.set_xlim([0, 21])
    ax.set_yticks([])
    ax.set_title(d_name.upper() + '\nfinishing positions (2020-2024)', fontsize=11)
    ax.set_xlabel('Position')
    ax.legend(loc='upper right', fontsize=8)

fig.tight_layout()
plt.show()
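
Besides the visual comparison, the agreement between observed and simulated positions can be summarized numerically. The sketch below uses synthetic arrays as stand-ins for `model_1_fit.stan_variable('y_hat') + 1` and `df['Position']`; the helper `ppc_summary` and the 90% interval level are assumptions made for illustration.

```python
import numpy as np

def ppc_summary(y_hat, observed, level=0.9):
    """Compare posterior predictive draws with observed positions.

    y_hat: array (n_draws, n_obs) of simulated positions.
    observed: array (n_obs,) of observed positions.
    Returns the share of observations inside the central `level` interval
    and the mean absolute error of the posterior predictive mean.
    """
    lo, hi = np.quantile(y_hat, [(1 - level) / 2, (1 + level) / 2], axis=0)
    coverage = np.mean((observed >= lo) & (observed <= hi))
    mae = np.mean(np.abs(y_hat.mean(axis=0) - observed))
    return coverage, mae

# Synthetic stand-in data (not the notebook's real results).
rng = np.random.default_rng(1)
p = rng.uniform(0.1, 0.9, size=200)
observed = rng.binomial(19, p) + 1
y_hat = rng.binomial(19, p, size=(1000, 200)) + 1

coverage, mae = ppc_summary(y_hat, observed)
print(f"coverage={coverage:.3f}, mae={mae:.3f}")
```

A coverage far below the nominal level would suggest that the model is overconfident about finishing positions.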

## Posterior analysis (model 2) [0-4 pts]
- were there any issues with the sampling? if there were what kind of ideas for mitigation were used [1 pt]
- are the samples from posterior predictive distribution analyzed [1 pt]
- are the data consistent with posterior predictive samples and is it sufficiently commented (if they are not then is the justification provided) [1 pt]
- have parameter marginal distributions been analyzed (histograms of individual parameters plus summaries, are they diffuse or concentrated, what can we say about values) [1 pt]

In [None]:
model_2 = CmdStanModel(stan_file='stan/model_2.stan')

In [None]:
model_2_data = {'N': len(df),
                'C': len(team_id_map),
                'E': len(engine_id_map),
                'D': len(driver_id_map),
                'Y': len(season_id_map),
                'driver_rating': ratings,
                'engine': engines,
                'constructor': teams,
                'driver': drivers,
                'year': seasons,
                'position': df['Position'].values - 1}  # zero-based positions

model_2_fit = model_2.sample(data=model_2_data, seed=25062025, iter_warmup=1000)

In [None]:
drivers_names = ['hamilton', 'norris', 'max_verstappen']
fig, axes = plt.subplots(1, len(drivers_names), figsize=(5 * len(drivers_names), 4), sharey=True)

n_bins = np.arange(22) - 0.5

for d_i, d_name in enumerate(drivers_names):
    ax = axes[d_i]
    driver_id = driver_id_map[d_name]
    results = df[df['DriverId'] == driver_id]
    results_idx = results.index

    ax.hist((results['Position']).tolist(),
            bins=n_bins,
            rwidth=0.9,
            histtype='step',
            edgecolor='black',
            density=True,
            label='Observed')

    ax.hist(model_2_fit.stan_variable('y_hat').T[results_idx].flatten() + 1,
            bins=n_bins,
            rwidth=0.9,
            color='cornflowerblue',
            edgecolor='royalblue',
            alpha=0.7,
            density=True,
            label='Simulated')

    ax.set_xticks(range(22))
    ax.set_xlim([0, 21])
    ax.set_yticks([])
    ax.set_title(d_name.upper() + '\nfinishing positions (2020–2024)', fontsize=11)
    ax.set_xlabel('Position')
    ax.legend(loc='upper right', fontsize=8)

fig.tight_layout()
plt.show()

## Model comparison [0-4 pts]
- Have models been compared using information criteria [1 pt]
- Have result for WAIC been discussed (is there a clear winner, or is there an overlap, were there any warnings) [1 pt]
- Have result for PSIS-LOO been discussed (is there a clear winner, or is there an overlap, were there any warnings) [1 pt]
- Was the model comparison discussed? Do authors agree with information criteria? Why in your opinion is one model better than another [1 pt]
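
As a starting point for the comparison, WAIC can be computed directly from a matrix of pointwise log-likelihood draws. The sketch below is a plain NumPy version on synthetic input. It assumes that both Stan models would expose such a matrix (for example as a `log_lik` generated quantity, which is not confirmed by this notebook), and the helper name `waic` is hypothetical. PSIS-LOO is not implemented here; a dedicated library such as ArviZ is the usual choice for that.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from pointwise log-likelihood draws.

    log_lik: array (n_draws, n_obs). Returns (waic, p_waic).
    """
    # Log of the mean likelihood per observation, computed stably.
    max_ll = log_lik.max(axis=0)
    lppd = np.sum(max_ll + np.log(np.mean(np.exp(log_lik - max_ll), axis=0)))
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), p_waic

# Synthetic check: identical draws give zero penalty.
constant = np.full((100, 5), np.log(0.5))
waic_const, p_const = waic(constant)
print(waic_const, p_const)

# Synthetic noisy draws: the penalty term becomes positive.
rng = np.random.default_rng(0)
noisy = constant + rng.normal(0.0, 0.1, size=constant.shape)
waic_noisy, p_noisy = waic(noisy)
print(waic_noisy, p_noisy)
```

Lower WAIC indicates better expected out-of-sample fit, and the difference between two models should be read together with its uncertainty rather than as a single number.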