Single Race Exploratory Data Analysis

The purpose of this notebook is to wrangle and explore data from the FastF1 API for a single race. It serves as a starting point for establishing a clean workflow for data acquisition, preparation, and early exploration before scaling to a multi-race analysis. I will focus on building a data dictionary, assessing data quality, generating descriptive statistics, and creating preliminary visualizations to understand the structure and sufficiency of the data. The outcome of this notebook will be a reproducible workflow, baseline insights, and a foundation for future feature engineering and multi-race EDA.

The code below adds the parent directory to Python’s module search path, so that later cells can import the project's src package, and raises the **FastF1** logger's level so that messages below the warning level are suppressed. This keeps the cell output clean and easy to read.

In [19]:
import sys
import os
import logging

# Add the root directory to sys.path
root = os.path.abspath("..")
sys.path.append(root)

# Show only FastF1 log messages at WARNING level or above
logging.getLogger('fastf1').setLevel(logging.WARNING)
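
As a minimal sketch of the same setup, the variant below avoids appending a duplicate path entry when the cell is re-run and checks that INFO messages are filtered once the logger level is WARNING. The helper name `add_to_sys_path` is hypothetical and not part of the project.

```python
import logging
import os
import sys


def add_to_sys_path(path):
    """Append the absolute form of `path` to sys.path once (hypothetical helper)."""
    resolved = os.path.abspath(path)
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Re-running this cell does not create duplicate entries
root = add_to_sys_path("..")
root_again = add_to_sys_path("..")

# With the level at WARNING, INFO records are dropped and warnings still pass
fastf1_logger = logging.getLogger("fastf1")
fastf1_logger.setLevel(logging.WARNING)

info_enabled = fastf1_logger.isEnabledFor(logging.INFO)
warning_enabled = fastf1_logger.isEnabledFor(logging.WARNING)
```

If the INFO lines still appear after this change, one possible cause is that the library resets its own logger level when it is first imported; in that case the level would need to be set after the import instead.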

In this section, I import Python libraries for data visualization, numerical analysis, and working with the pandas DataFrames that FastF1 returns for most of its data. I also import custom modules for accessing preprocessed F1 data and constants. To support exploration and cleaning, I configure the pandas display options to show all rows and columns, so that DataFrames are displayed in full without truncation.

In [20]:
from src.data import f1data
from src.utils import f1constants

from matplotlib.collections import LineCollection
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Set pandas display options
pd.set_option('display.max_rows', None)  # show all rows
pd.set_option('display.max_columns', None)  # show all columns
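
Setting the display options globally affects every later cell. Where full output is only needed for a single inspection, a scoped alternative is `pd.option_context`, which restores the previous settings on exit. The sketch below uses a placeholder DataFrame rather than FastF1 data.

```python
import pandas as pd

# Placeholder frame with more rows than the default display limit
frame = pd.DataFrame({"lap": range(100)})

rows_before = pd.get_option("display.max_rows")

# Inside the context, nothing is truncated
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    rows_inside = pd.get_option("display.max_rows")
    full_repr = repr(frame)

# Outside the context, the earlier setting applies again
rows_after = pd.get_option("display.max_rows")
```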


The following code initializes a single F1 session by defining the year, location, and session type (here, qualifying for the 2022 Abu Dhabi Grand Prix). These values are passed to the custom F1Session class (from f1data.py), which creates a session object built on top of FastF1. This object provides access to key session data and to custom functions I’ve implemented. A driver is also selected by three-letter code for the per-driver analysis that follows.

In [21]:
# Define session parameters
year = 2022
grand_prix = f1constants.F1Constants.LOCATIONS["Abu Dhabi"]
session_type = f1constants.F1Constants.SESSIONS["Q"]

# Select a driver by three-letter code
driver = "LEC"

# Create the session object (loads the session data)
session = f1data.F1Session(year, grand_prix, session_type)

core           INFO 	Loading data for Abu Dhabi Grand Prix - Qualifying [v3.6.0]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['1', '11', '16', '55', '44', '63', '4', '31', '5', '3', '14', '22', '47', '18', '24', '20', '10', '77', '23', '6']
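
The F1Session class lives in f1data.py and is not shown in this notebook. As a rough sketch of how such a wrapper could sit on top of FastF1, the class below mirrors the method names used in this notebook (`get_laps`, `get_fastest_lap`, `get_telemetry`); their bodies are assumptions, not the project's actual implementation. The class name `F1SessionSketch` and the `get_session` parameter are hypothetical, the latter added only so the wrapper can be exercised without network access.

```python
class F1SessionSketch:
    """Hypothetical stand-in for the project's F1Session wrapper."""

    def __init__(self, year, grand_prix, session_type, get_session=None):
        if get_session is None:
            # Imported lazily so the class definition does not require FastF1
            import fastf1
            get_session = fastf1.get_session
        self.session = get_session(year, grand_prix, session_type)
        # Downloads the data, or reads it from the FastF1 cache if present
        self.session.load()

    def get_laps(self, driver):
        """Return all laps for one driver, given a three-letter code."""
        return self.session.laps.pick_drivers(driver)

    def get_fastest_lap(self, driver):
        """Return the driver's fastest lap of the session."""
        return self.get_laps(driver).pick_fastest()

    def get_telemetry(self, lap):
        """Return merged car and position telemetry for a single lap."""
        return lap.get_telemetry()


# Example usage (requires FastF1 and network access or a populated cache):
# session = F1SessionSketch(2022, "Abu Dhabi Grand Prix", "Q")
# fastest = session.get_fastest_lap("LEC")
```

Keeping the FastF1 calls behind a thin wrapper like this means the rest of the notebook depends on only a few method names, which should make the later move to multi-race analysis easier.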


In [23]:
# Split the driver's laps into Q1, Q2, and Q3, then display the Q2 laps
q1, q2, q3 = session.get_laps(driver).split_qualifying_sessions()

q2

Unnamed: 0,Time,Driver,DriverNumber,LapTime,LapNumber,Stint,PitOutTime,PitInTime,Sector1Time,Sector2Time,Sector3Time,Sector1SessionTime,Sector2SessionTime,Sector3SessionTime,SpeedI1,SpeedI2,SpeedFL,SpeedST,IsPersonalBest,Compound,TyreLife,FreshTyre,Team,LapStartTime,LapStartDate,TrackStatus,Position,Deleted,DeletedReason,FastF1Generated,IsAccurate
40,0 days 00:40:31.394000,LEC,16,NaT,6.0,2.0,0 days 00:38:03.952000,NaT,NaT,0 days 00:00:54.923000,0 days 00:00:53.008000,NaT,0 days 00:39:38.446000,0 days 00:40:31.394000,256.0,200.0,223.0,202.0,False,SOFT,6.0,False,Ferrari,0 days 00:23:57.375000,2022-11-19 14:13:26.527,1,,False,,False,False
41,0 days 00:41:56.739000,LEC,16,0 days 00:01:25.345000,7.0,2.0,NaT,NaT,0 days 00:00:17.478000,0 days 00:00:36.900000,0 days 00:00:30.967000,0 days 00:40:48.872000,0 days 00:41:25.772000,0 days 00:41:56.739000,290.0,320.0,218.0,325.0,True,SOFT,7.0,False,Ferrari,0 days 00:40:31.394000,2022-11-19 14:30:00.546,1,,False,,False,True
42,0 days 00:43:42.074000,LEC,16,0 days 00:01:45.335000,8.0,2.0,NaT,0 days 00:43:40.965000,0 days 00:00:20.190000,0 days 00:00:44.982000,0 days 00:00:40.163000,0 days 00:42:16.929000,0 days 00:43:01.911000,0 days 00:43:42.074000,238.0,225.0,,247.0,False,SOFT,8.0,False,Ferrari,0 days 00:41:56.739000,2022-11-19 14:31:25.891,1,,False,,False,False
43,0 days 00:49:09.271000,LEC,16,NaT,9.0,3.0,0 days 00:46:41.741000,NaT,NaT,0 days 00:00:50.480000,0 days 00:00:47.826000,NaT,0 days 00:48:21.479000,0 days 00:49:09.451000,219.0,236.0,224.0,195.0,False,SOFT,1.0,True,Ferrari,0 days 00:43:42.074000,2022-11-19 14:33:11.226,1,,False,,False,False
44,0 days 00:50:33.788000,LEC,16,0 days 00:01:24.517000,10.0,3.0,NaT,NaT,0 days 00:00:17.316000,0 days 00:00:36.382000,0 days 00:00:30.819000,0 days 00:49:26.587000,0 days 00:50:02.969000,0 days 00:50:33.788000,291.0,320.0,219.0,325.0,True,SOFT,2.0,True,Ferrari,0 days 00:49:09.271000,2022-11-19 14:38:38.423,1,,False,,False,True
45,0 days 00:52:23.909000,LEC,16,0 days 00:01:50.121000,11.0,3.0,NaT,0 days 00:52:22.767000,0 days 00:00:22.795000,0 days 00:00:47.022000,0 days 00:00:40.304000,0 days 00:50:56.583000,0 days 00:51:43.605000,0 days 00:52:23.909000,217.0,218.0,,204.0,False,SOFT,3.0,True,Ferrari,0 days 00:50:33.788000,2022-11-19 14:40:02.940,1,,False,,False,False
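
The Q2 output above mixes out-laps, in-laps, and timed laps, so a first data-quality pass can count the missing lap times and keep only the laps flagged as accurate. The sketch below rebuilds three columns of the table by hand (values copied from the output above) so that it runs without FastF1; on the real data the same filters would be applied to `q2` directly.

```python
import pandas as pd

# Subset of the Q2 table above: lap number, lap time, accuracy flag
laps = pd.DataFrame({
    "LapNumber": [6, 7, 8, 9, 10, 11],
    "LapTime": pd.to_timedelta([
        pd.NaT, "0 days 00:01:25.345000", "0 days 00:01:45.335000",
        pd.NaT, "0 days 00:01:24.517000", "0 days 00:01:50.121000",
    ]),
    "IsAccurate": [False, True, False, False, True, False],
})

# Laps without a recorded lap time (the out-laps here)
missing_lap_times = int(laps["LapTime"].isna().sum())

# Keep only laps flagged as accurate, then pick the quickest
timed_laps = laps[laps["IsAccurate"]]
fastest = timed_laps.loc[timed_laps["LapTime"].idxmin()]

fastest_lap_number = int(fastest["LapNumber"])
fastest_lap_seconds = fastest["LapTime"].total_seconds()
```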


In [8]:
# Get the fastest lap for the specified driver
fastest_lap = session.get_fastest_lap(driver)

# Uncomment to display the lap (default scrollable notebook output)
# fastest_lap

In [9]:
# Retrieve telemetry data for the fastest lap
telemetry_of_fastest = session.get_telemetry(fastest_lap)

# telemetry_of_fastest

In [10]:
# Retrieve car data for the specified driver's fastest lap
car_data = fastest_lap.get_car_data()

# car_data
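
One preliminary visualization that the earlier LineCollection import would support is a track map colored by speed. The sketch below is an assumption about how that could look, not the notebook's actual plot: it uses a synthetic oval with made-up speeds in place of real telemetry, and assumes the telemetry frame exposes `X`, `Y`, and `Speed` columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

# Synthetic stand-in for a telemetry frame (placeholder values, not real data)
theta = np.linspace(0, 2 * np.pi, 200)
telemetry = pd.DataFrame({
    "X": 1000 * np.cos(theta),
    "Y": 600 * np.sin(theta),
    "Speed": 200 + 100 * np.cos(2 * theta),
})

# Build one short line segment between each pair of consecutive points
points = telemetry[["X", "Y"]].to_numpy().reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Color each segment by the speed at its starting point
line_collection = LineCollection(segments, cmap="plasma", linewidth=3)
line_collection.set_array(telemetry["Speed"].to_numpy()[:-1])

fig, ax = plt.subplots()
ax.add_collection(line_collection)
ax.autoscale()
ax.set_aspect("equal")
fig.colorbar(line_collection, ax=ax, label="Speed (km/h)")
```

With real data, `telemetry` would be replaced by `telemetry_of_fastest`, provided it contains the position columns.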