**This project builds an F1 Tyre Degradation & Pit Strategy Decision Model using real Formula 1 data.**

The objective is not to predict who wins a race.
Instead, the goal is to support race strategy decisions, specifically:

 1. Should a team pit or stay out?

 2. How much time is lost due to tyre degradation?

 3. How much time is lost due to a pit stop?

 4. At what point does pitting become worthwhile?

To answer this, we combine two independent real-world data sources:

FastF1 data → to model tyre degradation

ERGAST historical data → to estimate pit stop time loss

This mirrors how real F1 strategy teams combine data from multiple systems.

In [1]:
import pandas as pd

lap_times = pd.read_csv('/content/lap_times.csv')
pit_stops = pd.read_csv('/content/pitstops.csv')

lap_times.head()

Unnamed: 0,season,round,lapNumber,driverId,position,time
0,1996,1,1,villeneuve,1,1:43.702
1,1996,1,1,damon_hill,2,1:44.243
2,1996,1,1,irvine,3,1:44.981
3,1996,1,1,michael_schumacher,4,1:45.188
4,1996,1,1,alesi,5,1:46.506


DATASET 2 (FastF1)

In [2]:
!pip install fastf1



In [3]:
import os
os.makedirs('/content/fastf1_cache', exist_ok=True)

Loading Real Formula 1 Race Data (FastF1)

In this step, we load real Formula 1 lap-by-lap race data using the FastF1 Python library.

FastF1 provides access to official F1 timing feeds and allows us to retrieve:

 - Lap times

 - Tyre compounds

 - Stint information

 - Pit in / pit out laps

 - Track status (green flag, safety car, etc.)

This dataset forms the core input for modelling tyre degradation and race strategy.

In [4]:
import fastf1

fastf1.Cache.enable_cache('/content/fastf1_cache')

session = fastf1.get_session(2023, 'Bahrain', 'R')
session.load()

laps = session.laps
laps.head()


core           INFO 	Loading data for Bahrain Grand Prix - Race [v3.7.0]
req            INFO 	No cached data found for session_info. Loading data...
_api           INFO 	Fetching session info data...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for driver_info. Loading data...
_api           INFO 	Fetching driver list...
req            INFO 	Data has been written to cache!
req            INFO 	No cached data found for session_status_data. Loading data...

Unnamed: 0,Time,Driver,DriverNumber,LapTime,LapNumber,Stint,PitOutTime,PitInTime,Sector1Time,Sector2Time,...,FreshTyre,Team,LapStartTime,LapStartDate,TrackStatus,Position,Deleted,DeletedReason,FastF1Generated,IsAccurate
0,0 days 01:04:15.902000,VER,1,0 days 00:01:39.019000,1.0,1.0,NaT,NaT,NaT,0 days 00:00:42.414000,...,False,Red Bull Racing,0 days 01:02:36.652000,2023-03-05 15:03:38.501,12,1.0,False,,False,False
1,0 days 01:05:53.876000,VER,1,0 days 00:01:37.974000,2.0,1.0,NaT,NaT,0 days 00:00:31.342000,0 days 00:00:42.504000,...,False,Red Bull Racing,0 days 01:04:15.902000,2023-03-05 15:05:17.751,12,1.0,False,,False,True
2,0 days 01:07:31.882000,VER,1,0 days 00:01:38.006000,3.0,1.0,NaT,NaT,0 days 00:00:31.388000,0 days 00:00:42.469000,...,False,Red Bull Racing,0 days 01:05:53.876000,2023-03-05 15:06:55.725,1,1.0,False,,False,True
3,0 days 01:09:09.858000,VER,1,0 days 00:01:37.976000,4.0,1.0,NaT,NaT,0 days 00:00:31.271000,0 days 00:00:42.642000,...,False,Red Bull Racing,0 days 01:07:31.882000,2023-03-05 15:08:33.731,1,1.0,False,,False,True
4,0 days 01:10:47.893000,VER,1,0 days 00:01:38.035000,5.0,1.0,NaT,NaT,0 days 00:00:31.244000,0 days 00:00:42.724000,...,False,Red Bull Racing,0 days 01:09:09.858000,2023-03-05 15:10:11.707,1,1.0,False,,False,True


We will preserve raw data and work on a clean version.

In [5]:
laps_raw = laps.copy()


KEEP ONLY ACCURATE LAPS

In [6]:
laps_clean = laps_raw[laps_raw['IsAccurate'] == True]


REMOVE PIT IN / PIT OUT LAPS

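The model should answer one question: “How does lap time increase because of tyre wear, not traffic or safety cars?”

So we must keep ONLY:
 - Normal racing laps
 - Laps within the same tyre stint
 - Laps with accurate timing

If pit in / pit out laps are left in, their inflated lap times distort the degradation estimate.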

In [7]:
laps_clean = laps_clean[
    laps_clean['PitInTime'].isna() &
    laps_clean['PitOutTime'].isna()
]

REMOVE SAFETY CAR / VSC LAPS

In [8]:
laps_clean = laps_clean[laps_clean['TrackStatus'] == '1']


REMOVE DELETED LAPS

In [10]:
laps_clean = laps_clean[laps_clean['Deleted'] == False]

KEEP ONLY LAPS WITH LAPTIME

In [11]:
laps_clean = laps_clean[laps_clean['LapTime'].notna()]
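The five filters above can also be written as one combined boolean mask, which keeps all the cleaning rules in one place. The sketch below is illustrative only: it runs on a small made-up table that borrows the FastF1 column names shown earlier, not on the real session data, and it assumes the boolean columns contain no missing values.

```python
import pandas as pd

nan = float("nan")

# Toy stand-in for session.laps (values are invented for illustration).
laps_raw = pd.DataFrame({
    "LapNumber":   [1, 2, 3, 4, 5, 6, 7],
    "LapTime":     pd.to_timedelta([99.0, 98.0, 121.0, 118.0, 98.1, 98.2, nan], unit="s"),
    "PitInTime":   pd.to_timedelta([nan, nan, 300.0, nan, nan, nan, nan], unit="s"),
    "PitOutTime":  pd.to_timedelta([nan] * 7, unit="s"),
    "TrackStatus": ["1", "1", "1", "4", "1", "1", "1"],
    "IsAccurate":  [False, True, True, True, True, True, True],
    "Deleted":     [False, False, False, False, False, True, False],
})

# Same conditions as the individual cells, combined with logical AND.
mask = (
    laps_raw["IsAccurate"].eq(True)          # accurate timing only
    & laps_raw["PitInTime"].isna()           # not an in-lap
    & laps_raw["PitOutTime"].isna()          # not an out-lap
    & laps_raw["TrackStatus"].eq("1")        # track clear for the whole lap
    & laps_raw["Deleted"].eq(False)          # lap time not deleted
    & laps_raw["LapTime"].notna()            # lap time present
)

# .copy() gives an independent frame, so later column assignments are safe.
laps_clean = laps_raw.loc[mask].copy()
print(laps_clean["LapNumber"].tolist())
```

In this toy table each dropped lap fails exactly one condition, which makes it easy to confirm that every rule is doing its job.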


Lap in Stint Feature

Tyre degradation depends on how many laps a tyre has already completed, not the absolute lap number of the race. To capture tyre age accurately, a new feature LapInStint is created, which counts the number of laps completed within each stint for a driver.

This variable acts as a direct proxy for tyre wear and is a key independent variable in the tyre degradation model.

In [12]:
# Sort laps chronologically within each stint
laps_clean = laps_clean.sort_values(
    ['Driver', 'Stint', 'LapNumber']
)

# Create lap count within each stint (tyre age)
laps_clean['LapInStint'] = (
    laps_clean.groupby(['Driver', 'Stint'])
    .cumcount() + 1
)
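Because LapInStint is counted after the filters have been applied, any lap removed earlier (pit in/out, safety car, deleted) is skipped, so the count can run behind the true tyre age. A minimal sketch of one alternative is to derive tyre age from LapNumber relative to the first lap of the stint in the unfiltered data. The toy table and the column names LapInStint_cumcount and LapInStint_age are hypothetical and used only to show the difference. FastF1 lap data also carries a TyreLife column, which may be a more direct measure; whether it suits this analysis is not checked here.

```python
import pandas as pd

# Toy raw laps for one driver and one stint; lap 3 ran under a non-green status.
raw = pd.DataFrame({
    "Driver": ["VER"] * 5,
    "Stint": [1, 1, 1, 1, 1],
    "LapNumber": [1, 2, 3, 4, 5],
    "TrackStatus": ["1", "1", "4", "1", "1"],
})
clean = raw[raw["TrackStatus"] == "1"].copy()

# Approach used above: count the rows that survived filtering.
clean["LapInStint_cumcount"] = clean.groupby(["Driver", "Stint"]).cumcount() + 1

# Alternative: laps since the stint started, measured on the raw data,
# so filtered-out laps still age the tyre.
stint_start = raw.groupby(["Driver", "Stint"])["LapNumber"].transform("min")
clean["LapInStint_age"] = clean["LapNumber"] - stint_start.loc[clean.index] + 1

print(clean[["LapNumber", "LapInStint_cumcount", "LapInStint_age"]])
```

After the removed lap, the two columns diverge by one, which is the offset that would otherwise be passed to the degradation model.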


Lap time is stored as a timedelta (e.g. 0 days 00:01:38.006000), which is not directly usable in most statistical and machine-learning models.
Therefore, lap time is converted into total seconds to create a numeric, modelling-ready variable.

In [14]:
# Convert lap time from timedelta to seconds
laps_clean['LapTimeSeconds'] = (
    laps_clean['LapTime'].dt.total_seconds()
)


To summarise: the raw F1 lap-level timing data was cleaned by removing pit laps, safety-car laps, and inaccurate laps, and a lap-in-stint feature was engineered to model tyre degradation.

TYRE DEGRADATION MODEL (CORE MODEL)

How much slower does the tyre get per lap?

In [17]:
from sklearn.linear_model import LinearRegression

soft = laps_clean[laps_clean['Compound'] == 'SOFT']

X = soft[['LapInStint']]
y = soft['LapTimeSeconds']

model_soft = LinearRegression()
model_soft.fit(X, y)

soft_deg = model_soft.coef_[0]
soft_base = model_soft.intercept_

soft_deg, soft_base


(np.float64(-0.024426283503507857), np.float64(99.14554565419044))

In this step, we build the core tyre degradation model of the project.

The objective is to quantify how tyre performance deteriorates over time by estimating:

 - The base lap pace on a fresh tyre

 - The rate at which lap time increases as the tyre wears

This model allows us to translate raw lap data into actionable strategy insights.

How many seconds slower does a tyre get for every additional lap it is used?

In [18]:
results = []

# Fit one linear model per compound: LapTimeSeconds ~ base + deg * LapInStint
for tyre in ['SOFT', 'MEDIUM', 'HARD']:
    data = laps_clean[laps_clean['Compound'] == tyre]
    X = data[['LapInStint']]
    y = data['LapTimeSeconds']

    model = LinearRegression().fit(X, y)

    results.append({
        'Tyre': tyre,
        # Intercept: pace extrapolated to LapInStint = 0
        'BasePace_sec': model.intercept_,
        # Slope: seconds gained or lost per additional lap of tyre age
        'Degradation_sec_per_lap': model.coef_[0]
    })

degradation_results = pd.DataFrame(results)
degradation_results


Unnamed: 0,Tyre,BasePace_sec,Degradation_sec_per_lap
0,SOFT,99.145546,-0.024426
1,MEDIUM,96.941607,0.235393
2,HARD,98.202934,0.00743


In [20]:
degradation_results.to_csv('tyre_degradation_results.csv', index=False)


In this step, we load historical Formula 1 pit stop timing data from the ERGAST dataset.

This dataset contains raw records of pit stop durations across multiple races and seasons. The duration column is recorded in seconds (for example 26.898), with longer stops written as minutes:seconds.

The purpose of loading this data is to estimate the time cost of a pit stop, which is a crucial input for evaluating race strategy decisions.

In [21]:
pit_stops = pd.read_csv('/content/pitstops.csv')
pit_stops.head()

Unnamed: 0,season,round,driverId,lap,stop,time,duration
0,2011,1,alguersuari,1,1,17:05:23,26.898
1,2011,1,michael_schumacher,1,1,17:05:52,25.021
2,2011,1,webber,11,1,17:20:48,23.426
3,2011,1,alonso,12,1,17:22:34,23.251
4,2011,1,massa,13,1,17:24:10,23.842


Convert duration to seconds (because formats are mixed)

Our duration column contains:

26.898 → seconds

16:44.718 → minutes:seconds.milliseconds

In [32]:
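def duration_to_seconds(x):
    """Convert a 'SS.mmm' or 'MM:SS.mmm' duration string to seconds."""
    x = str(x)
    if ':' in x:
        # minutes:seconds.milliseconds, e.g. '16:44.718'
        m, s = x.split(':')
        return int(m) * 60 + float(s)
    else:
        # plain seconds, e.g. '26.898'
        return float(x)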

pit_stops['duration_seconds'] = pit_stops['duration'].apply(duration_to_seconds)


Calculate Average Pit Stop Loss (Strategy Cost)

In [33]:
PIT_LOSS = pit_stops['duration_seconds'].mean()
PIT_LOSS


np.float64(85.23049555887785)
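The mean of about 85.2 seconds is well above the 23 to 27 second durations in the first rows of the table, which suggests that a small number of very long stops (such as the 16:44.718 format noted above) pull the average up. A minimal sketch of two more robust estimates follows, using the sample durations quoted in this notebook. The 60-second cutoff MAX_NORMAL_STOP_S is a hypothetical threshold chosen for illustration, not a value taken from the data.

```python
from statistics import mean, median

# Durations in seconds: the five rows shown by head(), plus one long stop
# written as 16:44.718 in the mixed-format example.
durations_s = [26.898, 25.021, 23.426, 23.251, 23.842, 16 * 60 + 44.718]

mean_loss = mean(durations_s)       # sensitive to the single long stop
median_loss = median(durations_s)   # barely affected by it

# Alternative: drop stops above an assumed cutoff before averaging.
MAX_NORMAL_STOP_S = 60.0  # hypothetical threshold
normal_stops = [d for d in durations_s if d <= MAX_NORMAL_STOP_S]
trimmed_mean_loss = mean(normal_stops)

print(f"mean:         {mean_loss:.2f} s")
print(f"median:       {median_loss:.2f} s")
print(f"trimmed mean: {trimmed_mean_loss:.2f} s")
```

On the full dataset the same idea would apply to pit_stops['duration_seconds'], and the choice of estimate feeds directly into PIT_LOSS and therefore into the strategy comparison.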

PIT STRATEGY DECISION MODEL
QUESTION

“Pit now or stay out?”

Logic

Staying out → tyre degrades

Pitting → lose PIT_LOSS but reset tyre

In [34]:
def total_time(stint_laps, base, deg):
    return sum(base + deg * lap for lap in range(1, stint_laps+1))

# Example: 20-lap stint on soft
stay_out_time = total_time(20, soft_base, soft_deg)

# Pit after 10 laps
pit_time = (
    total_time(10, soft_base, soft_deg) +
    PIT_LOSS +
    total_time(10, soft_base, soft_deg)
)

stay_out_time, pit_time


(np.float64(1977.7813935480722), np.float64(2065.454517457301))

In this step, we compare alternative race strategies using the outputs of:

 - The tyre degradation model

 - The pit stop time loss estimation

Rather than predicting race results, this analysis focuses on decision-making under trade-offs.

“Given tyre degradation and pit stop cost, is it faster to pit or to stay out?”

In [35]:
strategy = pd.DataFrame({
    'Strategy': ['No Pit','One Pit'],
    'TotalTime_sec': [stay_out_time, pit_time]
})

strategy


Unnamed: 0,Strategy,TotalTime_sec
0,No Pit,1977.781394
1,One Pit,2065.454517
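Because both strategies use the same linear lap-time model, the comparison reduces to a closed form. For a window of N laps with a stop after lap k, the base-pace terms cancel and the difference (stay out minus pit) is deg × k × (N − k) − PIT_LOSS, so pitting only pays off when deg × k × (N − k) exceeds PIT_LOSS. The sketch below checks that identity against the numbers reported above. The helper names pit_gain and break_even_deg are hypothetical, and like the original cell it assumes the same compound before and after the stop, with tyre age reset to lap 1.

```python
def total_time(stint_laps, base, deg):
    # Same definition as the strategy cell: sum of modelled lap times.
    return sum(base + deg * lap for lap in range(1, stint_laps + 1))

def pit_gain(total_laps, pit_lap, deg, pit_loss):
    # Seconds saved by pitting after pit_lap (negative means pitting is slower).
    return deg * pit_lap * (total_laps - pit_lap) - pit_loss

def break_even_deg(total_laps, pit_lap, pit_loss):
    # Degradation rate (s/lap) at which pitting and staying out take equal time.
    return pit_loss / (pit_lap * (total_laps - pit_lap))

# Fitted soft-tyre values and average pit loss reported earlier in the notebook.
base = 99.14554565419044
deg = -0.024426283503507857
pit_loss = 85.23049555887785

stay = total_time(20, base, deg)
pit = total_time(10, base, deg) + pit_loss + total_time(10, base, deg)

gain = pit_gain(20, 10, deg, pit_loss)
required = break_even_deg(20, 10, pit_loss)

print(f"stay - pit (summed):      {stay - pit:.4f} s")
print(f"stay - pit (closed form): {gain:.4f} s")
print(f"break-even degradation:   {required:.4f} s/lap")
```

With PIT_LOSS at about 85.23 seconds and a stop at the midpoint of a 20-lap window, the break-even degradation rate is about 0.85 seconds per lap, well above any of the fitted rates in the degradation table.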


In [36]:
import os

os.listdir('/content')


['.config',
 'pitstops.csv',
 'lap_times.csv',
 'fastf1_cache',
 'circuits.csv',
 'tyre_degradation_results.csv',
 'races.csv',
 'drivers.csv',
 'sample_data']

**Conclusion: Strategy Decision Insight**

Based on the tyre degradation and pit stop cost models, the No-Pit strategy produced a lower total time than the One-Pit strategy for the modeled scenario: a 20-lap soft-tyre window, with the pit stop taken after lap 10.

**Key Result**

| Strategy | Total Time (sec) |
|---|---|
| No Pit | 1977.78 |
| One Pit | 2065.45 |

**Interpretation of the Result**

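This result indicates that, under the assumptions used in this model:

 - The fitted degradation rate for the soft compound was slightly negative (about -0.024 sec per lap), so the model shows no measurable time loss from tyre wear

 - There was therefore no cumulative tyre-wear loss to set against the average pit stop loss of about 85.2 sec

 - As a result, taking a pit stop was not justified from a total-time perspective

In practical terms, the performance gain from fresh tyres did not outweigh the pit stop time loss. A negative slope most likely reflects effects the model does not separate out, such as the car getting lighter as fuel burns off, rather than tyres improving with age.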

**Business Insight**

This finding highlights an important strategy principle:

 - A pit stop should only be taken when the time saved through reduced tyre degradation exceeds the time lost during the pit stop

 - Strategy decisions should be based on quantified trade-offs rather than intuition

This mirrors the type of decision logic used by professional Formula 1 strategy teams.

**Scope and Assumptions**

This conclusion is specific to the modeled scenario and depends on:

 - Estimated tyre degradation rates

 - Average pit stop time loss

 - Selected tyre compounds

 - Absence of external race events such as safety cars

The model is intentionally simplified to establish a clear baseline decision framework.

**Final Takeaway**

In the baseline scenario analyzed, remaining on track without pitting was the faster of the two strategies compared, as tyre degradation alone did not justify the time cost of a pit stop.