In [1]:
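import fastf1
import fastf1.plotting  # needed for fastf1.plotting.setup_mpl() used below
import pandas as pd     # used to read the CSV files below
from utils import *     # project data-preparation helpers
from plots import *     # project plotting functions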

# Introduction
The aim of this project was to investigate, and where possible improve, some of the visualisation techniques commonly used in F1. The raw data was retrieved with the FastF1 Python package. Although the data was not always accurate, it was sufficient for this project, which focuses on visualisation rather than accuracy. The data was managed with pandas DataFrames, which FastF1 uses natively (its lap and telemetry objects are DataFrame subclasses).

Plotly was used for the visualisation, in particular `plotly.graph_objects`, which offers more customisation than Plotly Express; the latter is better suited to standard plots that need only a few lines of code. The design of the plots took into account the concepts covered during the course and the end users who will be using them (ordinary fans of the sport, engineers and drivers).

Finally, Dash was used to build an interactive app containing all the plots, using data fetched from the API in real time. Unfortunately, this means the application becomes very slow after some use, because the API starts timing out after a certain number of requests.
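
One way to soften this limitation is to keep already-loaded sessions in memory, so that repeated Dash callbacks for the same event do not trigger new requests. The `Using cached data` lines in the logs below suggest FastF1's own on-disk cache (`fastf1.Cache.enable_cache`) is already active; the sketch below adds an in-process layer on top with `functools.lru_cache`. The `loader` argument is a hypothetical stand-in for whatever function wraps `fastf1.get_session(...)` and `.load()` in the app.

```python
from functools import lru_cache


def make_cached_loader(loader):
    """Wrap a session loader so each (year, event, session) is fetched only once.

    `loader` is hypothetical: any callable taking hashable arguments, e.g. a
    thin wrapper around fastf1.get_session(year, event, kind) followed by .load().
    """
    @lru_cache(maxsize=32)
    def cached(year, event, session_name):
        return loader(year, event, session_name)
    return cached


# Usage with a stand-in loader that records how often it is really called
calls = []


def fake_loader(year, event, session_name):
    calls.append((year, event, session_name))
    return f"{year} {event} {session_name}"


load_session = make_cached_loader(fake_loader)
load_session(2024, "Monaco Grand Prix", "Q")
load_session(2024, "Monaco Grand Prix", "Q")  # served from the in-memory cache
print(len(calls))  # the underlying loader ran only once
```

Note that `lru_cache` lives in the memory of a single process, so a Dash server running several workers would keep one cache per worker.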

To make this report easy to read, I will describe each plot, show the shape of the data behind it, and finally show the plot with the selected data.

## 1 - Choropleth Maps: Races per Country
This plot and the next were the first ones I made, to get familiar with Plotly. The choropleth map shows the countries that hosted races from 1980 to 2024.
The colour scale runs from white for the countries that hosted the fewest races to blue for the country that hosted the most. Hovering over a country shows its name and the exact number of races.

The data was retrieved from the API and stored in a CSV file.

The visualisation is designed for the general public, who do not need much background knowledge to understand it.
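
The project helper `get_country_counts_ISO` is not shown here, so the following is only a hypothetical sketch of the same idea: count races per country, attach ISO-3 codes and describe a Plotly choropleth as a plain figure dictionary (which `plotly.graph_objects.Figure(fig)` accepts). The `ISO3` lookup and the input races are illustrative assumptions, not the project's data.

```python
from collections import Counter

# Hypothetical, partial country -> ISO-3 lookup (the real one would cover every host)
ISO3 = {"Italy": "ITA", "Monaco": "MCO", "Japan": "JPN"}


def choropleth_figure(races):
    """races: iterable of (season, country) pairs, one per race."""
    counts = Counter(country for _, country in races)
    countries = sorted(counts, key=counts.get, reverse=True)
    return {
        "data": [{
            "type": "choropleth",
            "locationmode": "ISO-3",
            "locations": [ISO3[c] for c in countries],
            "z": [counts[c] for c in countries],
            "text": countries,
            # White for the fewest races, blue for the most
            "colorscale": [[0.0, "white"], [1.0, "blue"]],
            "hovertemplate": "%{text}: %{z} races<extra></extra>",
        }],
        "layout": {"geo": {"projection": {"type": "natural earth"}}},
    }


fig = choropleth_figure([(2023, "Italy"), (2024, "Italy"), (2024, "Monaco"), (2024, "Japan")])
print(fig["data"][0]["locations"], fig["data"][0]["z"])
```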

In [2]:
df = get_country_counts_ISO(1980, 2024)
df

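| | Country | Count | ISO |
|---|---|---|---|
| 0 | Italy | 76 | ITA |
| 1 | Germany | 52 | DEU |
| 2 | Spain | 50 | ESP |
| 3 | Brazil | 44 | BRA |
| 4 | Monaco | 44 | MCO |
| 5 | Belgium | 43 | BEL |
| 7 | Canada | 41 | CAN |
| 8 | Hungary | 39 | HUN |
| 9 | Australia | 38 | AUS |
| 10 | Japan | 38 | JPN |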


<img src="static_plots/1_races_per_country.png" alt="Races per country" width="1000"/>

## 2 - Bar Plot: Wins by Team
This graph shows the cumulative wins of each team from 1980 to 2024 (although the API is missing some data between 1991 and 2017). The x-axis shows the team name and the y-axis the number of wins. The bars are sorted from highest to lowest, and the number of wins is written inside each bar to make it easier to read. The colour of each bar is the team colour extracted from the API; where a colour is missing, the default is white.

This graph was also designed for the general public.
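
A minimal sketch of the bar chart logic described above, assuming rows shaped like the CSV below (season, race, winner, team): wins are counted per team, sorted from highest to lowest, written inside the bars, and coloured with a white fallback. The team colour passed in the usage example is an illustrative hex value, not one read from the API.

```python
from collections import Counter

DEFAULT_COLOUR = "#FFFFFF"  # white fallback when a team colour is unknown


def wins_bar_figure(winners, team_colours):
    """winners: iterable of (season, race, driver, team) rows."""
    wins = Counter(team for _, _, _, team in winners)
    teams = [team for team, _ in wins.most_common()]  # highest to lowest
    return {
        "data": [{
            "type": "bar",
            "x": teams,
            "y": [wins[t] for t in teams],
            "text": [wins[t] for t in teams],
            "textposition": "inside",  # win count drawn inside each bar
            "marker": {"color": [team_colours.get(t, DEFAULT_COLOUR) for t in teams]},
        }],
        "layout": {"xaxis": {"title": {"text": "Team"}},
                   "yaxis": {"title": {"text": "Wins"}}},
    }


rows = [
    (2024, "Emilia Romagna Grand Prix", "Max Verstappen", "Red Bull Racing"),
    (2024, "Monaco Grand Prix", "Charles Leclerc", "Ferrari"),
    (2024, "Canadian Grand Prix", "Max Verstappen", "Red Bull Racing"),
]
fig = wins_bar_figure(rows, {"Red Bull Racing": "#3671C6"})  # illustrative colour
```

Here Ferrari has no entry in the colour map, so its bar falls back to white, mirroring the behaviour described for missing API colours.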


In [4]:
race_winners_df = pd.read_csv('data/race_winners_1980_to_2024.csv', sep=',', encoding='utf-8')
race_winners_df

| | Season | Race | Winner | Team |
|---|---|---|---|---|
| 0 | 1980 | Argentine Grand Prix | Alan Jones | Williams |
| 1 | 1980 | Brazilian Grand Prix | René Arnoux | Renault |
| 2 | 1980 | South African Grand Prix | René Arnoux | Renault |
| 3 | 1980 | United States Grand Prix West | Nelson Piquet | Brabham |
| 4 | 1980 | Belgian Grand Prix | Didier Pironi | Ligier |
| ... | ... | ... | ... | ... |
| 300 | 2024 | Emilia Romagna Grand Prix | Max Verstappen | Red Bull Racing |
| 301 | 2024 | Monaco Grand Prix | Charles Leclerc | Ferrari |
| 302 | 2024 | Canadian Grand Prix | Max Verstappen | Red Bull Racing |
| 303 | 2024 | Spanish Grand Prix | Max Verstappen | Red Bull Racing |


<img src="static_plots/2_wins_by_team.png" alt="Wins by team" width="1000"/>

## 2 - Scatter plot (line mode): Telemetry Data
This graph is more technical: it shows the speed of two drivers over their fastest laps. The x-axis is the distance travelled by the car and the y-axis is the speed in km/h. This is a very common graph in the F1 world, so to improve on the standard visualisation I added another channel marking every corner, labelled C1, C2, etc.
I then added a custom hover to the lines, showing the exact value at each point and the driver's name. Finally, a simple legend shows each driver's abbreviated name with their line colour, which is taken from the team colour provided by the API.
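
The corner channel and the custom hover can be sketched as Plotly layout shapes, annotations and a trace dictionary. This is an assumption-laden sketch, not the project's `plots` code: the corner distances and the team colour in the usage lines are illustrative values. In recent FastF1 versions, `session.get_circuit_info().corners` exposes corner `Number` and `Distance` columns that could feed `corner_markers`, but whether the project obtains corners that way is not stated here.

```python
def corner_markers(corners):
    """corners: iterable of (corner_number, distance_m). Returns Plotly layout
    shapes (dotted vertical lines) and annotations ('C1', 'C2', ...)."""
    shapes, annotations = [], []
    for number, dist in corners:
        shapes.append({"type": "line", "x0": dist, "x1": dist,
                       "yref": "paper", "y0": 0, "y1": 1,
                       "line": {"dash": "dot", "width": 1, "color": "grey"}})
        annotations.append({"x": dist, "y": 1, "yref": "paper", "yanchor": "bottom",
                            "text": f"C{number}", "showarrow": False})
    return shapes, annotations


def speed_trace(distance, speed, driver, colour):
    """One line per driver; the hover shows the exact value and the driver code."""
    return {"type": "scatter", "mode": "lines", "x": distance, "y": speed,
            "name": driver, "line": {"color": colour},
            "hovertemplate": "%{x:.0f} m, %{y} km/h<extra>" + driver + "</extra>"}


# Illustrative values only (not real Monaco corner distances or official colours)
shapes, annotations = corner_markers([(1, 150.0), (2, 420.0)])
layout = {"shapes": shapes, "annotations": annotations,
          "xaxis": {"title": {"text": "Distance (m)"}},
          "yaxis": {"title": {"text": "Speed (km/h)"}}}
trace = speed_trace([0, 100, 200], [280, 250, 120], "LEC", "#E8002D")
```

The same `speed_trace` shape works for the throttle variant by passing throttle percentages as `y` and adjusting the hover unit.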

The next one is the same concept but instead of the speed it shows the throttle percentage on the y-axis.

This graph is designed for more advanced fans of the sport, engineers and drivers, for post-session review.

In [6]:
#Loading the data for the 2024 Monaco Grand Prix Qualifying session from the API
fastf1.plotting.setup_mpl(misc_mpl_mods=False)
session = fastf1.get_session(2024, 'Monaco Grand Prix', 'Q')
session.load()

core           INFO 	Loading data for Monaco Grand Prix - Qualifying [v3.3.5]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['16', '81', '55', '4', '63', '1', '44', '22', '23', '10', '31', '27', '3', '18', '20', '14', '2', '11', '77', '24']


In [7]:
# Selecting each driver's quick laps (.pick_fastest() would return only the single fastest one)
driver1_laps = session.laps.pick_driver("LEC").pick_quicklaps().reset_index()
driver2_laps = session.laps.pick_driver("PIA").pick_quicklaps().reset_index()
driver1_laps.head()

| | index | Time | Driver | DriverNumber | LapTime | LapNumber | Stint | PitOutTime | PitInTime | Sector1Time | ... | FreshTyre | Team | LapStartTime | LapStartDate | TrackStatus | Position | Deleted | DeletedReason | FastF1Generated | IsAccurate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 days 00:18:50.588000 | LEC | 16 | 0 days 00:01:12.839000 | 2.0 | 1.0 | NaT | NaT | 0 days 00:00:19.297000 | ... | True | Ferrari | 0 days 00:17:37.749000 | 2024-05-25 14:04:42.432 | 1 | | False | | False | True |
| 1 | 3 | 0 days 00:21:40.073000 | LEC | 16 | 0 days 00:01:12.452000 | 4.0 | 1.0 | NaT | NaT | 0 days 00:00:18.794000 | ... | True | Ferrari | 0 days 00:20:27.621000 | 2024-05-25 14:07:32.304 | 1 | | False | | False | True |
| 2 | 6 | 0 days 00:26:30.492000 | LEC | 16 | 0 days 00:01:11.653000 | 7.0 | 2.0 | NaT | NaT | 0 days 00:00:18.702000 | ... | False | Ferrari | 0 days 00:25:18.839000 | 2024-05-25 14:12:23.522 | 12 | | False | | False | True |
| 3 | 8 | 0 days 00:29:21.868000 | LEC | 16 | 0 days 00:01:11.584000 | 9.0 | 2.0 | NaT | NaT | 0 days 00:00:18.597000 | ... | False | Ferrari | 0 days 00:28:10.284000 | 2024-05-25 14:15:14.967 | 1 | | False | | False | True |
| 4 | 11 | 0 days 00:41:19.188000 | LEC | 16 | 0 days 00:01:11.356000 | 12.0 | 3.0 | NaT | NaT | 0 days 00:00:18.703000 | ... | True | Ferrari | 0 days 00:40:07.832000 | 2024-05-25 14:27:12.515 | 1 | | False | | False | True |


In [8]:
#Retrieving the telemetry data for the first driver
driver1_telemetry = driver1_laps.get_car_data().add_distance()
driver1_telemetry.head()

| | Date | RPM | Speed | nGear | Throttle | Brake | DRS | Source | Time | SessionTime | Distance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-05-25 14:04:42.652 | 11135 | 277 | 7 | 100 | False | 12 | car | 0 days 00:00:00.220000 | 0 days 00:17:37.969000 | 16.927778 |
| 1 | 2024-05-25 14:04:43.132 | 11245 | 282 | 7 | 100 | False | 8 | car | 0 days 00:00:00.700000 | 0 days 00:17:38.449000 | 54.527778 |
| 2 | 2024-05-25 14:04:43.453 | 10935 | 282 | 7 | 100 | True | 8 | car | 0 days 00:00:01.021000 | 0 days 00:17:38.770000 | 79.672778 |
| 3 | 2024-05-25 14:04:43.853 | 10032 | 252 | 7 | 3 | True | 8 | car | 0 days 00:00:01.421000 | 0 days 00:17:39.170000 | 107.672778 |
| 4 | 2024-05-25 14:04:44.093 | 10485 | 212 | 6 | 1 | True | 8 | car | 0 days 00:00:01.661000 | 0 days 00:17:39.410000 | 121.806111 |
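
The `Distance` values in the output above are consistent with a simple integration of speed over time: each sample's speed, converted from km/h to m/s, is multiplied by the time elapsed since the previous sample (measured from the lap start for the first one) and accumulated. The sketch below reproduces the first five values from the `Time` and `Speed` columns; it illustrates what `add_distance()` appears to compute and is not FastF1's source code.

```python
def integrate_distance(times_s, speeds_kmh):
    """Cumulative distance (m), treating each sample's speed as constant over
    the interval since the previous sample (the lap starts at 0 s)."""
    distance, total, previous_t = [], 0.0, 0.0
    for t, v in zip(times_s, speeds_kmh):
        total += v / 3.6 * (t - previous_t)  # km/h -> m/s, times elapsed seconds
        distance.append(total)
        previous_t = t
    return distance


# Time (s) and Speed (km/h) of the first five rows shown above
result = integrate_distance([0.220, 0.700, 1.021, 1.421, 1.661], [277, 282, 282, 252, 212])
print([round(d, 6) for d in result])  # should match the Distance column above
```

Because the result depends on the sampling intervals, two cars logged at different rates accumulate slightly different distances for the same stretch of track.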


<img src="static_plots/3_fastest_lap.png" alt="Fastest lap speed comparison" width="1000"/>

Same concept as the previous graph, but with the throttle percentage on the y-axis (made specifically for use in the interactive dashboard).

<img src="static_plots/4_fastest_lap_throttle_speed.png" alt="Fastest lap throttle comparison" width="1000"/>

## 3 - Bar Plot with negative values: Speed difference between two drivers
This graph was an experiment in visualisation using a bar graph with negative values. The x-axis shows every corner number of the track, while the y-axis shows the difference in average speed through each corner (positive or negative) between the reference driver and a second driver. The important channels in this graph are the colour, green where the reference driver is gaining on the opponent and red where he is losing, and the length of the bars. The idea behind this graph is that a driver who comes back into the box has only a few seconds to work out where he can improve before he goes out again, and the same applies to his engineers, who have to pass this information on. Writing the speed difference inside each bar makes it faster to read, and the legend makes it clear who the reference driver is.
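
A sketch of the per-corner comparison, assuming each corner is represented by a distance segment as in the `start_dist`/`end_dist` columns of the table below: average each driver's speed inside every segment, subtract, and colour the bars by sign. The segment boundaries passed to `avg_speed_per_segment` are a hypothetical input; the average speeds in the usage line are the first three rows of that table.

```python
from statistics import mean


def avg_speed_per_segment(distance, speed, boundaries):
    """Mean speed of the samples whose distance falls in each [start, end) segment.
    `boundaries` are corner segment limits in metres (hypothetical input)."""
    averages = []
    for start, end in zip(boundaries, boundaries[1:]):
        inside = [v for d, v in zip(distance, speed) if start <= d < end]
        averages.append(mean(inside) if inside else float("nan"))
    return averages


def speed_diff_trace(ref_avg, other_avg, ref_driver, other_driver):
    """Bar trace of reference-minus-other average speed per corner:
    green where the reference driver is faster, red where slower."""
    diffs = [round(r - o, 2) for r, o in zip(ref_avg, other_avg)]
    return {"type": "bar",
            "x": [f"C{i}" for i in range(1, len(diffs) + 1)],
            "y": diffs,
            "text": diffs, "textposition": "inside",
            "marker": {"color": ["green" if d >= 0 else "red" for d in diffs]},
            "name": f"{ref_driver} vs {other_driver} (reference: {ref_driver})",
            "showlegend": True}


# Average speeds for corners 1-3 taken from the table below
trace = speed_diff_trace([214.0, 197.76, 242.64], [219.08, 206.28, 241.89], "LEC", "PIA")
```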

Disclaimer: \
Because the telemetry sampling rate differs between cars (in this API), the graph is not very accurate, but the concept is still valid.
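
One possible way to reduce the effect described in the disclaimer, which is not what the current plot does, is to interpolate both drivers' speed traces onto the same distance grid before averaging. `numpy.interp` performs this kind of linear interpolation; the pure-Python sketch below makes the logic explicit and, like `numpy.interp` with default arguments, clamps points outside the sampled range to the first or last value.

```python
from bisect import bisect_left


def resample(distance, values, grid):
    """Linearly interpolate `values` (sampled at increasing `distance`) onto `grid`."""
    out = []
    for g in grid:
        i = bisect_left(distance, g)  # first sample at or beyond g
        if i == 0:
            out.append(values[0])
        elif i == len(distance):
            out.append(values[-1])
        else:
            d0, d1 = distance[i - 1], distance[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (g - d0) / (d1 - d0))
    return out


# Both drivers could be resampled onto e.g. a 1 m grid before comparing them
print(resample([0, 10, 20], [100, 200, 300], [0, 5, 15, 20, 25]))
```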

In [11]:
df = get_avg_speed_diff_drivers(session, 'LEC', 'PIA')
df.head()

| | LEC_avg_speed | corner_number | start_dist | end_dist | PIA_avg_speed | start_dist.1 | end_dist.1 | speed_diff |
|---|---|---|---|---|---|---|---|---|
| 1 | 214.0 | 1.0 | 0.0 | 180.722222 | 219.08 | 0.0 | 176.430556 | -5.08 |
| 2 | 197.76 | 2.0 | 180.722222 | 546.755 | 206.28 | 176.430556 | 570.538333 | -8.52 |
| 3 | 242.64 | 3.0 | 546.755 | 726.677222 | 241.89 | 570.538333 | 725.205 | 0.75 |
| 4 | 174.92 | 4.0 | 726.677222 | 864.989167 | 177.4 | 725.205 | 862.940556 | -2.48 |
| 5 | 166.25 | 5.0 | 864.989167 | 1091.059167 | 167.41 | 862.940556 | 1090.975833 | -1.16 |


<img src="static_plots/5_fastest_lap_speed_diff.png" alt="Speed difference per corner" width="1000"/>

## 4 - Scatter plot: Lap times and compounds (tyre selection)

### A brief introduction to compounds
The tyre compound is one of the most important factors in F1, because each compound offers different performance.
The available ones are:
* Soft (red)
* Medium (yellow)
* Hard (white)
* Intermediate (green)
* Full wet (blue)

The softer a compound is, the more performance it gives in the short term and the faster it degrades. The last two, intermediate and full wet, are used in light and heavy rain respectively.

### Visualisation explanation

This graph shows the lap times and tyre compounds of one driver over a session. The x-axis shows the lap number and the y-axis the lap time. Another important channel of information is the colour of the points, which represents the compound, as explained in the legend.

The shape of the point cloud also gives information about tyre degradation and fuel consumption. If lap times get slower every lap, forming an upward diagonal pattern (as in the example I chose), tyre degradation is costing more time than is gained from burning fuel, which makes the car lighter every lap.
Conversely, a downward diagonal pattern means that the gain from the car getting lighter as fuel is burned outweighs the tyre degradation.
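
The upward or downward pattern can be quantified with the least-squares slope of lap time against lap number. This is a sketch under two assumptions: it should be applied to a single stint (a pit stop resets tyre age), and lap times are in seconds (FastF1 stores `LapTime` as a timedelta, which `laps["LapTime"].dt.total_seconds()` converts). The `tolerance` threshold and the example lap times are hypothetical.

```python
def lap_time_trend(lap_numbers, lap_times_s):
    """Least-squares slope of lap time against lap number, in seconds per lap."""
    n = len(lap_numbers)
    mean_x = sum(lap_numbers) / n
    mean_y = sum(lap_times_s) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lap_numbers, lap_times_s))
    sxx = sum((x - mean_x) ** 2 for x in lap_numbers)
    return sxy / sxx


def describe_trend(slope, tolerance=0.05):
    """Interpretation used in the text; `tolerance` (s/lap) is an arbitrary choice."""
    if slope > tolerance:
        return "upward: tyre degradation outweighs the fuel-burn gain"
    if slope < -tolerance:
        return "downward: fuel-burn gain outweighs tyre degradation"
    return "flat: the two effects roughly cancel out"


# Hypothetical single-stint lap times (s)
slope = lap_time_trend([10, 11, 12, 13], [80.0, 80.2, 80.4, 80.6])
print(round(slope, 3), describe_trend(slope))
```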

To make it easier to interpret, I added a hover annotation to each point showing the exact lap number and lap time.
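
A minimal sketch of the single-driver plot: laps are grouped by compound so that each compound becomes its own trace, coloured as in the list above and with its own legend entry and hover. The upper-case compound names match those in the strategy table further down; `"WET"` for the full wet is an assumption, and the example laps are made up.

```python
# Colours from the compound list above; "WET" is an assumed name for the full wet
COMPOUND_COLOURS = {"SOFT": "red", "MEDIUM": "yellow", "HARD": "white",
                    "INTERMEDIATE": "green", "WET": "blue"}


def compound_traces(laps):
    """laps: iterable of (lap_number, lap_time_s, compound).
    Returns one scatter trace per compound, so each gets its own legend entry."""
    grouped = {}
    for lap, time_s, compound in laps:
        grouped.setdefault(compound, []).append((lap, time_s))
    return [{
        "type": "scatter", "mode": "markers", "name": compound,
        "x": [lap for lap, _ in points],
        "y": [t for _, t in points],
        "marker": {"color": COMPOUND_COLOURS.get(compound, "grey"), "size": 9},
        "hovertemplate": "Lap %{x}<br>%{y:.3f} s<extra>" + compound + "</extra>",
    } for compound, points in grouped.items()]


traces = compound_traces([(2, 81.4, "MEDIUM"), (3, 81.6, "MEDIUM"), (20, 79.9, "HARD")])
```

White markers for the hard compound would vanish on a light background, so a dark layout such as Plotly's built-in `"plotly_dark"` template is one reasonable pairing.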

This graph serves as the basis for the next one, the actual experiment, which compares two drivers.

<img src="static_plots/6_compound_type.png" alt="Lap times by compound" width="1000"/>

### Merged Scatter Plot: Driver Laptime/Compounds Comparison
The aim of this graph is the same as the previous one, but this time with two drivers. With the same scheme as before, we could not distinguish between the drivers, because the colour of the dots only represents the compound, and we also want to keep the compound information.

The simplest solution would have been to make the dots of each driver different colours and add the type of compound in the hover annotation. 

The more advanced solution I came up with, which still makes the plot fairly readable, is to use the colour of the dots to represent the driver, while an outer edge of the dot is used for the compound type. This solution was also inspired by how tyre compounds look in the real world (image below). 

<img src="images/tyrecompounds.jpg" alt="TyreCompounds" width="300"/>



This makes it possible to compare one driver's performance with another's just by looking at the shapes. The hover annotation, which shows exactly "DRIVER-COMPOUND-LAP-LAPTIME", the legend, which pairs each driver colour with each compound border, and the ability to click legend entries to show or hide individual compounds together make this graph quite readable, even though it contains a lot of information.
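
The fill/outline encoding maps directly onto Plotly marker attributes: `marker.color` for the driver and `marker.line.color` with a wide `marker.line.width` for the compound. The sketch below, with hypothetical driver colours and made-up laps, builds one trace per (driver, compound) pair so that each legend entry can be toggled on its own, and formats the hover as DRIVER-COMPOUND-LAP-LAPTIME.

```python
COMPOUND_EDGES = {"SOFT": "red", "MEDIUM": "yellow", "HARD": "white",
                  "INTERMEDIATE": "green", "WET": "blue"}


def driver_compound_traces(laps_by_driver, driver_colours):
    """laps_by_driver: {driver: [(lap, lap_time_s, compound), ...]}.
    Fill colour encodes the driver, the marker outline encodes the compound,
    and each (driver, compound) pair is its own trace and legend entry."""
    traces = []
    for driver, laps in laps_by_driver.items():
        compounds = dict.fromkeys(c for _, _, c in laps)  # unique, first-seen order
        for compound in compounds:
            points = [(lap, t) for lap, t, c in laps if c == compound]
            traces.append({
                "type": "scatter", "mode": "markers",
                "name": f"{driver} {compound}",
                "x": [lap for lap, _ in points],
                "y": [t for _, t in points],
                "marker": {"color": driver_colours[driver], "size": 10,
                           "line": {"color": COMPOUND_EDGES.get(compound, "grey"),
                                    "width": 3}},
                "hovertemplate": f"{driver}-{compound}-Lap %{{x}}-%{{y:.3f}} s<extra></extra>",
            })
    return traces


traces = driver_compound_traces(
    {"LEC": [(2, 81.4, "MEDIUM"), (20, 79.9, "HARD")],
     "PIA": [(2, 81.2, "MEDIUM"), (3, 81.3, "MEDIUM")]},
    {"LEC": "#E8002D", "PIA": "#FF8000"},  # hypothetical team colours
)
```

Setting a shared `legendgroup` per driver would instead make a legend click toggle all of that driver's compounds at once, which is why it is left out here.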

This plot targets both enthusiasts of the sport and engineers who want to review the session.

<img src="static_plots/7_compound_type_lap_times.png" alt="Driver lap times and compounds comparison" width="1000"/>

## 5 - Parallel Coordinates: Multidata Strategy Analysis
The last graph was the most challenging in terms of data preparation. The aim of this experiment is to represent the data and extract useful patterns about the optimal strategy; a parallel coordinates plot was used for this.
The data represented are:
* Qualifying position (starting position)
* Compound strategy 
* Number of pit stops
* Finish position of the race 

Each unique strategy used in the race was extracted into the CompoundStrategy column, keeping only the first letter of each compound to improve readability. The same procedure was used for the number of pit stops. The line colours were adjusted to show first place in red and the last positions in white.
This makes it easy to spot patterns, showing which compound strategies and numbers of stops lead to the best finishing positions. The ability to reorder the axes and filter ranges on each axis further improves readability.
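
Parallel coordinates axes in Plotly take numeric values, so a categorical column such as the compound strategy needs an integer encoding plus `tickvals`/`ticktext` labels. The sketch below shows one way to do that, with the first-letter abbreviation described above and a red-to-white colour scale on the finish position; it is an illustration, not the project's `get_parallel_coordinates_plot_dataset` or plotting code. The four rows are copied from the 2024 British Grand Prix dataset below, where the number of stops equals the number of compounds minus one.

```python
def abbreviate(strategy):
    """['MEDIUM', 'INTERMEDIATE', 'SOFT'] -> 'MIS': first letter of each stint's compound."""
    return "".join(compound[0] for compound in strategy)


def parcoords_figure(rows):
    """rows: (driver, stops, strategy, quali_position, finish_position).
    Categorical strategies are mapped to integer axis ticks."""
    labels = sorted({abbreviate(r[2]) for r in rows})
    code = {label: i for i, label in enumerate(labels)}
    finish = [r[4] for r in rows]
    return {"data": [{
        "type": "parcoords",
        # Lowest value (1st place) -> red, highest (last place) -> white
        "line": {"color": finish, "colorscale": [[0, "red"], [1, "white"]]},
        "dimensions": [
            {"label": "Quali position", "values": [r[3] for r in rows]},
            {"label": "Strategy", "values": [code[abbreviate(r[2])] for r in rows],
             "tickvals": list(range(len(labels))), "ticktext": labels},
            {"label": "Stops", "values": [r[1] for r in rows]},
            {"label": "Finish position", "values": finish},
        ],
    }]}


rows = [("RUS", 1, ["MEDIUM", "INTERMEDIATE"], 1, 19),
        ("HAM", 2, ["MEDIUM", "INTERMEDIATE", "SOFT"], 2, 1),
        ("VER", 2, ["MEDIUM", "INTERMEDIATE", "HARD"], 4, 2),
        ("SAI", 3, ["MEDIUM", "INTERMEDIATE", "HARD", "SOFT"], 7, 5)]
fig = parcoords_figure(rows)
```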

The target user of this graph is an engineer who knows what a parallel coordinates plot is.

In [15]:
df = get_parallel_coordinates_plot_dataset("2024: British Grand Prix")
df

core           INFO 	Loading data for British Grand Prix - Race [v3.3.5]
req            INFO 	Using cached data for session_info
req            INFO 	Using cached data for driver_info
req            INFO 	Using cached data for session_status_data
req            INFO 	Using cached data for lap_count
req            INFO 	Using cached data for track_status_data
req            INFO 	Using cached data for _extended_timing_data
req            INFO 	Using cached data for timing_app_data
core           INFO 	Processing timing data...
req            INFO 	Using cached data for car_data
req            INFO 	Using cached data for position_data
req            INFO 	Using cached data for weather_data
req            INFO 	Using cached data for race_control_messages
core           INFO 	Finished loading data for 20 drivers: ['44', '1', '4', '81', '55', '27', '18', '14', '23', '22', '2', '20', '3', '16', '77', '31', '11', '24', '63', '10']


| | Driver | Stops | CompoundStrategy | QualiPosition | FinishPosition |
|---|---|---|---|---|---|
| 18 | RUS | 1 | [MEDIUM, INTERMEDIATE] | 1 | 19 |
| 0 | HAM | 2 | [MEDIUM, INTERMEDIATE, SOFT] | 2 | 1 |
| 2 | NOR | 2 | [MEDIUM, INTERMEDIATE, SOFT] | 3 | 3 |
| 1 | VER | 2 | [MEDIUM, INTERMEDIATE, HARD] | 4 | 2 |
| 3 | PIA | 2 | [MEDIUM, INTERMEDIATE, MEDIUM] | 5 | 4 |
| 5 | HUL | 2 | [MEDIUM, INTERMEDIATE, SOFT] | 6 | 6 |
| 4 | SAI | 3 | [MEDIUM, INTERMEDIATE, HARD, SOFT] | 7 | 5 |
| 6 | STR | 2 | [MEDIUM, INTERMEDIATE, MEDIUM] | 8 | 7 |
| 8 | ALB | 2 | [MEDIUM, INTERMEDIATE, MEDIUM] | 9 | 9 |
| 7 | ALO | 2 | [MEDIUM, INTERMEDIATE, MEDIUM] | 10 | 8 |


<img src="static_plots/8_parallel_coordinates.png" alt="Parallel coordinates strategy analysis" width="1000"/>

## Dash app
All these graphs can be viewed interactively with the dashboard.py script. Note that it can be slow because of the API limitations.
After the app starts, the selected event defaults to the most recent event in the calendar that has already taken place, since the data is retrieved in real time (this can of course be changed). Next, select the session, which loads the available drivers into the dropdowns below. Finally, select the two drivers to compare; the plots can take from a few seconds to a few minutes to appear while the data is retrieved.
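
The default-event rule can be sketched as "the latest event dated on or before today". The input here is a hypothetical list of `(event_name, event_date)` pairs; in FastF1, `fastf1.get_event_schedule(year)` returns a schedule with `EventName` and `EventDate` columns that could be converted into this form, though how `dashboard.py` actually does it is not shown in this report.

```python
from datetime import date


def default_event(schedule, today=None):
    """schedule: iterable of (event_name, event_date). Returns the most recent
    event on or before `today`, or None if the season has not started yet."""
    today = today or date.today()
    past = [(event_date, name) for name, event_date in schedule if event_date <= today]
    return max(past)[1] if past else None


schedule = [("Monaco Grand Prix", date(2024, 5, 26)),
            ("Canadian Grand Prix", date(2024, 6, 9)),
            ("Spanish Grand Prix", date(2024, 6, 23))]
print(default_event(schedule, today=date(2024, 6, 15)))
```

Passing `today` explicitly keeps the function easy to test; the app itself would simply call it without that argument.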