# 01: Drivers vs. Teammate


## Motivation

In Formula 1, one of the best indicators of a driver’s performance is how he fares against his teammate. Modern F1 teams each field two drivers who share essentially identical machinery. While inter‑team comparisons are complicated by differing car performance, intra‑team comparisons are far more straightforward and reliable. If a driver consistently finishes ahead of his teammate in equal machinery, he is very likely the stronger driver.


## Method

To assess how each driver measures up against his teammates over a career, we first compute a **WinRatio** (wins ÷ total head‑to‑head matchups). However, a given win ratio means far more when it is sustained over hundreds of races than when it comes from only a handful: a driver who beat his teammate in 3 of 4 races may simply have been lucky. Therefore, we also factor in **NumberOfRaces** and plot both metrics on a scatter chart. This shows each **WinRatio** in the context of its sample size and highlights drivers who consistently outperform their teammates across many races.
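The sample-size argument can be made concrete with the normal-approximation standard error of a proportion, `sqrt(p * (1 - p) / n)`. The notebook itself does not compute this. The sketch below is only an illustration, and its win counts are made-up numbers, not real driver records.

```python
import math


def win_ratio_with_se(wins: int, matchups: int) -> tuple[float, float]:
    """Return (win ratio, standard error) for a head-to-head record.

    Uses the normal-approximation standard error sqrt(p * (1 - p) / n),
    which is only a rough guide when n is very small.
    """
    if matchups <= 0:
        raise ValueError("matchups must be positive")
    p = wins / matchups
    se = math.sqrt(p * (1 - p) / matchups)
    return p, se


# Hypothetical records: the same 75 % win ratio, very different sample sizes
short_career = win_ratio_with_se(wins=3, matchups=4)
long_career = win_ratio_with_se(wins=150, matchups=200)

print(f"short career: ratio={short_career[0]:.2f}, se={short_career[1]:.3f}")
print(f"long career:  ratio={long_career[0]:.2f}, se={long_career[1]:.3f}")
```

Both records share the same ratio, but the short record's standard error is about seven times larger (a factor of `sqrt(200 / 4)`), which is why the ratio is always read together with the race count.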


## Gathering data

Thanks to a database wrapper and a prepared SQL query, we can easily obtain an up‑to‑date table of every race result in F1 history. The query returns one row for each driver‑teammate pairing in every Grand Prix they entered. Each row includes information on:

- race
- team
- driver
- teammate
- result (`win`, `loss` or `draw`), indicating whether the driver finished _ahead of_, _behind_, or _level with_ his teammate
  - if only the teammate does not finish (DNF), the race is recorded as a `win` for the driver; if both DNF, it is recorded as a `draw`
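The classification rules above can be written as a small function. This is a hedged sketch, not the project's SQL: it assumes a finishing position is an integer (1 = winner) and that a DNF is represented as `None`. The `loss` case for a driver who retires while his teammate finishes is not stated explicitly above; it is inferred by symmetry with the `win` rule.

```python
from typing import Optional


def head_to_head_result(driver_pos: Optional[int], teammate_pos: Optional[int]) -> str:
    """Classify one driver-vs-teammate matchup as 'win', 'loss' or 'draw'.

    Assumption: positions are integers (1 = race winner) and None means DNF.
    """
    if driver_pos is None and teammate_pos is None:
        return "draw"  # both cars retired
    if teammate_pos is None:
        return "win"  # only the teammate retired
    if driver_pos is None:
        return "loss"  # only the driver retired (inferred by symmetry)
    if driver_pos < teammate_pos:
        return "win"  # driver finished ahead
    if driver_pos > teammate_pos:
        return "loss"  # driver finished behind
    return "draw"  # identical positions; not expected for two classified finishers


print(head_to_head_result(1, 2))
print(head_to_head_result(None, None))
```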

**Example**

If, in a particular race, **Verstappen** finishes first and his teammate **Pérez** finishes second, the result set will contain two rows:

| RaceID | Team     | Driver     | Teammate   | Result |
| ------ | -------- | ---------- | ---------- | ------ |
| 1036   | Red Bull | Verstappen | Pérez      | win    |
| 1036   | Red Bull | Pérez      | Verstappen | loss   |
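The prepared query (`driver_vs_teammate_h2h.sql`) is not shown here. One way to produce such mirrored rows, sketched in pandas under the assumption of a simple per-race results table, is a self-join on race and team followed by dropping the rows where a driver is paired with himself. The table and its column names (`race_id`, `team`, `driver`, `position`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical per-race results table: one row per car, 1 = race winner
results = pd.DataFrame(
    {
        "race_id": [1036, 1036],
        "team": ["Red Bull", "Red Bull"],
        "driver": ["Verstappen", "Pérez"],
        "position": [1, 2],
    }
)

# Self-join: pair each driver with every car of the same team in the same race
pairs = results.merge(results, on=["race_id", "team"], suffixes=("", "_teammate"))

# Remove the rows where a driver was paired with himself
pairs = pairs[pairs["driver"] != pairs["driver_teammate"]].reset_index(drop=True)

# Finishing ahead of the teammate counts as a win (DNF handling omitted here)
pairs["result"] = np.where(pairs["position"] < pairs["position_teammate"], "win", "loss")

print(pairs[["race_id", "team", "driver", "driver_teammate", "result"]])
```

A team with n cars in a race yields n − 1 rows per driver under this join, which is how multi-driver teams end up with several matchup rows for a single race.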


In [None]:
import pandas as pd

from f1_analysis.f1db import F1DB

f1db = F1DB()

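# Run the prepared head-to-head query: one row per driver-teammate pairing per race
df = pd.DataFrame(f1db.execute_sql_query("driver_vs_teammate_h2h.sql"))
# Name the columns in the order the query returns them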
df.columns = [
    "RaceId",
    "Year",
    "ConstructorId",
    "ConstructorName",
    "DriverId",
    "DriverAbbreviation",
    "DriverName",
    "TeammateId",
    "TeammateAbbreviation",
    "TeammateName",
    "Result",
]

## Processing the data

To compute each driver’s **WinRatio**, we first group the data by **DriverId**. For each driver, we calculate:

| Metric               | Definition                                                           |
| -------------------- | -------------------------------------------------------------------- |
| **NumberOfWins**     | Count of rows where `Result = 'win'`                                 |
| **NumberOfMatchups** | Total number of rows for the driver (each teammate‑race combination) |
| **NumberOfRaces**    | Count of distinct races the driver actually participated in          |

The distinction between **NumberOfMatchups** and **NumberOfRaces** matters because, in some eras of Formula 1, a team fielded more than two drivers. Counting every row would inflate the race total, since a single race can generate multiple matchup rows for the same driver.

### Example

In race 1, driver **Fagioli** faces three teammates — **Farina**, **Parnell**, and **Fangio**. The result set contains three rows for Fagioli:

| RaceID | Team       | Driver  | Teammate | Result |
| ------ | ---------- | ------- | -------- | ------ |
| 1      | Alfa Romeo | Fagioli | Farina   | loss   |
| 1      | Alfa Romeo | Fagioli | Parnell  | win    |
| 1      | Alfa Romeo | Fagioli | Fangio   | win    |

- **NumberOfMatchups** = 3 (one row per teammate)
- **NumberOfWins** = 2
- **WinRatio** = 2 / 3 ≈ 0.67
- **NumberOfRaces** = 1

When counting **NumberOfRaces**, all three rows correspond to a single race, so they contribute **1** to the race total. This ensures the win ratio reflects performance per matchup, while the race count reflects actual participation.
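Running essentially the same aggregation as the next cell on just these three rows reproduces the numbers above. The `DriverId` value `"fagioli"` is a hypothetical placeholder; the real identifiers come from the database.

```python
import pandas as pd

# The three Fagioli rows from the example; "fagioli" is a placeholder ID
rows = pd.DataFrame(
    {
        "RaceId": [1, 1, 1],
        "DriverId": ["fagioli"] * 3,
        "TeammateName": ["Farina", "Parnell", "Fangio"],
        "Result": ["loss", "win", "win"],
    }
)

summary = rows.groupby("DriverId").agg(
    NumberOfMatchups=("Result", "count"),  # one row per teammate
    NumberOfWins=("Result", lambda r: (r == "win").sum()),
    NumberOfRaces=("RaceId", "nunique"),  # the three rows share one race
)
summary["WinRatio"] = (summary["NumberOfWins"] / summary["NumberOfMatchups"]).round(4)

print(summary)
```

The result is 3 matchups, 2 wins and 1 race, with a **WinRatio** of 0.6667.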


In [None]:
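# Collapse the matchup rows into one summary row per driver
df = (
    df.groupby("DriverId")
    .agg(
        DriverName=("DriverName", "first"),
        # One row per teammate per race
        NumberOfMatchups=("DriverId", "count"),
        NumberOfWins=("Result", lambda r: (r == "win").sum()),
        # Several rows from the same race count only once
        NumberOfRaces=("RaceId", pd.Series.nunique),
    )
    .reset_index()
)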

df["WinRatio"] = (df["NumberOfWins"] / df["NumberOfMatchups"]).round(4)

## Plotting the data

To get a quick overview, we plot each driver's **WinRatio** against his **NumberOfRaces** on a scatter chart.


In [None]:
import plotly.graph_objects as go

all_drivers = go.Scatter(
    name="Drivers",
    x=df["NumberOfRaces"],
    y=df["WinRatio"],
    text=df["DriverName"],
    mode="markers",
)

layout = go.Layout(
    xaxis_title="Number of races",
    yaxis_title="Win ratio",
    xaxis=dict(dtick=25),
    yaxis=dict(dtick=0.1),
    yaxis_range=[-0.1, 1.1],  # headroom so markers at 1.0 are not clipped
    height=800,
)

go.Figure(data=[all_drivers], layout=layout).show()

As expected, several notable drivers appear as outliers in the upper‑right corner, having competed in many races and achieved above‑average win ratios.

We also see that drivers from more recent eras cluster toward the right, reflecting the steady increase in the number of races per season.

However, it is difficult to pick out individual drivers, or those with consistently strong track records, so let us refine the plot.


## Refining the data

First, we exclude drivers with fewer than 25 races in the dataset. A modern Formula 1 season has around 24 Grands Prix, so drivers with less than roughly one season of head‑to‑head data provide limited insight into overall trends.

Next, we highlight drivers who have won at least one race. A race win is a notable achievement, and comparing the **WinRatio** of race winners to that of the broader driver pool could reveal performance differences.

Finally, we highlight world champions. Only a small number of drivers have won a championship, and since a champion outscores his teammate over at least one full season, we expect champions to cluster in the upper‑right region of the plot. It will also be interesting to see whether any champions appear among the more centrally located, average‑performing drivers.
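Because every world champion has also won at least one race, the race-winner and world-champion groups overlap, and the plot relies on drawing order to keep champions on top. An alternative, sketched below on made-up data, is to give each driver a single category in which the highest distinction takes precedence. The driver IDs, numbers and the lists `race_winner_ids` and `world_champion_ids` are hypothetical stand-ins for the real query results.

```python
import numpy as np
import pandas as pd

# Hypothetical drivers; IDs and numbers are made up for illustration
drivers = pd.DataFrame(
    {
        "DriverId": ["a", "b", "c", "d"],
        "NumberOfRaces": [120, 40, 10, 200],
        "WinRatio": [0.70, 0.45, 0.80, 0.55],
    }
)
race_winner_ids = ["a", "b", "c"]
world_champion_ids = ["a"]

# Keep drivers with at least 25 races (roughly one modern season)
seasoned = drivers[drivers["NumberOfRaces"] >= 25].copy()

# np.select uses the first matching condition, so champions take precedence
seasoned["Category"] = np.select(
    [
        seasoned["DriverId"].isin(world_champion_ids),
        seasoned["DriverId"].isin(race_winner_ids),
    ],
    ["World Champion", "Race Winner"],
    default="Other",
)

print(seasoned[["DriverId", "Category"]])
```

Each category could then be drawn as its own trace, for example by looping over `seasoned.groupby("Category")`, so that no driver's marker is drawn twice.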


In [None]:
# IDs of every driver who has won at least one Grand Prix
race_winner_list = (
    pd.DataFrame(
        f1db.execute_raw_sql_query(
            "SELECT driver_id FROM race_result WHERE position_number = 1"
        )
    )[0]
    .unique()
    .tolist()
)

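# IDs of every world champion; the current season is excluded because its standings may not be final
world_champion_list = (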
    pd.DataFrame(
        f1db.execute_raw_sql_query(
            "SELECT driver_id FROM season_driver WHERE position_number = 1 AND year != strftime('%Y', 'now')"
        )
    )[0]
    .unique()
    .tolist()
)

# Keep drivers with at least 25 races (roughly one modern season)
seasoned_drivers = df[df["NumberOfRaces"] >= 25]

race_winners = seasoned_drivers[seasoned_drivers["DriverId"].isin(race_winner_list)]
world_champions = seasoned_drivers[
    seasoned_drivers["DriverId"].isin(world_champion_list)
]

seasoned_drivers_scatter = go.Scatter(
    name="Seasoned Drivers",
    x=seasoned_drivers["NumberOfRaces"],
    y=seasoned_drivers["WinRatio"],
    text=seasoned_drivers["DriverName"],
    mode="markers",
)

race_winners_scatter = go.Scatter(
    name="Race Winners",
    x=race_winners["NumberOfRaces"],
    y=race_winners["WinRatio"],
    text=race_winners["DriverName"],
    mode="markers",
    marker=dict(color="orange"),
)

world_champions_scatter = go.Scatter(
    name="World Champions",
    x=world_champions["NumberOfRaces"],
    y=world_champions["WinRatio"],
    text=world_champions["DriverName"],
    mode="markers",
    marker=dict(color="red"),
)

go.Figure(
    data=[seasoned_drivers_scatter, race_winners_scatter, world_champions_scatter],
    layout=layout,
).show()