# Exploring the evolution of lap and pole times for Silverstone, Monza and Monaco

### Table of Contents

* [Introduction](#Intro)
* [Data](#Data)
    * [Limiting Kaggle Data to Silverstone, Monza and Monaco](#section2_1)
    * [gp racing stats Dataset Heads](#section1_2)
* [Qualifying Times](#Quali)
    * [Kaggle Data](#section3_1)
    * [Kaggle Data Plots](#section3_2)
* [Data Cleaning](#clean)
* [Project Questions and Outline](#Project)

### Introduction <a class="anchor" id="Intro"></a>

In this notebook we explore fastest-lap and qualifying pole times for three tracks: Silverstone, Monza and Monaco.

These tracks were chosen because they have seen almost continuous racing year on year since Formula 1's inception in 1950.

Questions we are looking to answer include:
- How has the pace of the cars evolved?
- Have they evolved at the same pace for each track?

### Data <a class="anchor" id="Data"></a>
First, we load the required libraries and data.

In [112]:
#Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [113]:
## Loading in the datasets

#Kaggle Data
circuits = pd.read_csv("kaggle_data//circuits.csv")
constructor_results = pd.read_csv("kaggle_data//constructor_results.csv")
constructor_standings = pd.read_csv("kaggle_data//constructor_standings.csv")
constructors = pd.read_csv("kaggle_data//constructors.csv")
driver_standings = pd.read_csv("kaggle_data//driver_standings.csv")
drivers = pd.read_csv("kaggle_data//drivers.csv")
lap_times = pd.read_csv("kaggle_data//lap_times.csv")
qualifying = pd.read_csv("kaggle_data//qualifying.csv")
results = pd.read_csv("kaggle_data//results.csv")
races = pd.read_csv("kaggle_data//races.csv")

#gp racing stats Data
silverstone_poles_data = pd.read_csv("gpracingstats_data//silverstone_poles_data.csv")
monza_poles_data = pd.read_csv("gpracingstats_data//monza_poles_data.csv")
monaco_poles_data = pd.read_csv("gpracingstats_data//monaco_poles_data.csv")
monza_fastest_lap_data = pd.read_csv("gpracingstats_data//monza_fastest_lap_data.csv")
silverstone_fastest_lap_data = pd.read_csv("gpracingstats_data//silverstone_fastest_lap_data.csv")
monaco_fastest_lap_data = pd.read_csv("gpracingstats_data//monaco_fastest_lap_data.csv")

#### Limiting to Silverstone, Monza and Monaco <a class="anchor" id="section2_1"></a>

This step is only required for the Kaggle datasets, since our gp racing stats data are already split by circuit.

In [114]:
#Find the Silverstone, Monza and Monaco IDs and limit data to these tracks
circuits.circuitRef.unique()

array(['albert_park', 'sepang', 'bahrain', 'catalunya', 'istanbul',
       'monaco', 'villeneuve', 'magny_cours', 'silverstone',
       'hockenheimring', 'hungaroring', 'valencia', 'spa', 'monza',
       'marina_bay', 'fuji', 'shanghai', 'interlagos', 'indianapolis',
       'nurburgring', 'imola', 'suzuka', 'vegas', 'yas_marina', 'galvez',
       'jerez', 'estoril', 'okayama', 'adelaide', 'kyalami', 'donington',
       'rodriguez', 'phoenix', 'ricard', 'yeongam', 'jacarepagua',
       'detroit', 'brands_hatch', 'zandvoort', 'zolder', 'dijon',
       'dallas', 'long_beach', 'las_vegas', 'jarama', 'watkins_glen',
       'anderstorp', 'mosport', 'montjuic', 'nivelles', 'charade',
       'tremblant', 'essarts', 'lemans', 'reims', 'george', 'zeltweg',
       'aintree', 'boavista', 'riverside', 'avus', 'monsanto', 'sebring',
       'ain-diab', 'pescara', 'bremgarten', 'pedralbes', 'buddh',
       'americas', 'red_bull_ring', 'sochi', 'baku', 'portimao',
       'mugello', 'jeddah', 'losail', 

In [115]:
circuits[(circuits.circuitRef == "monaco") | (circuits.circuitRef == "monza") | (circuits.circuitRef == "silverstone") ]

Unnamed: 0,circuitId,circuitRef,name,location,country,lat,lng,alt,url
5,6,monaco,Circuit de Monaco,Monte-Carlo,Monaco,43.7347,7.42056,7,http://en.wikipedia.org/wiki/Circuit_de_Monaco
8,9,silverstone,Silverstone Circuit,Silverstone,UK,52.0786,-1.01694,153,http://en.wikipedia.org/wiki/Silverstone_Circuit
13,14,monza,Autodromo Nazionale di Monza,Monza,Italy,45.6156,9.28111,162,http://en.wikipedia.org/wiki/Autodromo_Naziona...


Now we know that Monaco has circuit ID 6, Monza has 14 and Silverstone has 9. Next we limit the races table to only these tracks.

In [116]:
#Limiting races to these tracks
races_limited = races[(races.circuitId.isin([6,9,14]))]

#Drop down races_limited to just the columns we need
races_limited = races_limited[["raceId", "year", "circuitId", "name"]]
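Rather than reading the IDs off the table by eye, they can also be looked up from `circuitRef`. The following is a minimal sketch on toy frames: `circuits_demo` and `races_demo` are hypothetical stand-ins for `circuits` and `races`, and the race rows are made up.

```python
import pandas as pd

# Toy stand-in for circuits.csv; the IDs match the values shown in this notebook.
circuits_demo = pd.DataFrame({
    "circuitId": [1, 6, 9, 14],
    "circuitRef": ["albert_park", "monaco", "silverstone", "monza"],
})

# Made-up race rows, one per circuit.
races_demo = pd.DataFrame({
    "raceId": [100, 101, 102, 103],
    "year": [2009, 2009, 2009, 2009],
    "circuitId": [1, 6, 9, 14],
})

# Look up the IDs by reference name instead of hard-coding them.
wanted_refs = ["monaco", "silverstone", "monza"]
wanted_ids = circuits_demo.loc[
    circuits_demo["circuitRef"].isin(wanted_refs), "circuitId"
].tolist()

# Keep only the races held at those circuits.
races_demo_limited = races_demo[races_demo["circuitId"].isin(wanted_ids)]
print(wanted_ids)
print(races_demo_limited)
```

This keeps the filter correct even if the IDs were to differ in another version of the dataset.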

### Qualifying Pole Times <a class="anchor" id="Quali"></a>

#### Kaggle Data <a class="anchor" id="section3_1"></a>
We will first explore the Kaggle data. Its qualifying times only start in 1994, which will become apparent as we work with the data.

In [117]:
#Inspect layout of qualifying dataframe
print(qualifying.head())

   qualifyId  raceId  driverId  constructorId  number  position        q1  \
0          1      18         1              1      22         1  1:26.572   
1          2      18         9              2       4         2  1:26.103   
2          3      18         5              1      23         3  1:25.664   
3          4      18        13              6       2         4  1:25.994   
4          5      18         2              2       3         5  1:25.960   

         q2        q3  
0  1:25.187  1:26.714  
1  1:25.315  1:26.869  
2  1:25.452  1:27.079  
3  1:25.691  1:27.178  
4  1:25.518  1:27.236  


To continue our analysis, we need to convert the q1 and q3 qualifying times from minutes:seconds strings into seconds. To do this we split on the colon, convert both parts to floats, and add 60 times the minutes to the seconds.

In [118]:
#Converting q3 to a time in seconds and making column float
q3_split = qualifying.q3.str.split(":")
qualifying["q3min"] = q3_split.str.get(0)
qualifying["q3sec"] = q3_split.str.get(1)
qualifying["q3min"] = qualifying["q3min"].astype("float")
qualifying["q3sec"] = qualifying["q3sec"].astype("float")
qualifying["q3"] = 60*qualifying["q3min"] + qualifying["q3sec"]

q1_split = qualifying.q1.str.split(":")
qualifying["q1min"] = q1_split.str.get(0)
qualifying["q1sec"] = q1_split.str.get(1)
qualifying["q1min"] = qualifying["q1min"].astype("float")
qualifying["q1sec"] = qualifying["q1sec"].astype("float")
qualifying["q1"] = 60*qualifying["q1min"] + qualifying["q1sec"]

#Dropping qualifying down to required columns
qualifying = qualifying[["qualifyId", "raceId", "driverId", "constructorId", "q1", "q3", "position"]]
qualifying.head()



Unnamed: 0,qualifyId,raceId,driverId,constructorId,q1,q3,position
0,1,18,1,1,86.572,86.714,1
1,2,18,9,2,86.103,86.869,2
2,3,18,5,1,85.664,87.079,3
3,4,18,13,6,85.994,87.178,4
4,5,18,2,2,85.96,87.236,5
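The same minutes:seconds conversion recurs for every timing column in this notebook, so it could be wrapped in a small helper. The sketch below is one possible version, not the notebook's own code: the function name `lap_time_to_seconds` is hypothetical, and returning NaN for missing or malformed entries is an assumption about how gaps should be handled.

```python
import math


def lap_time_to_seconds(value):
    """Convert an 'm:ss.sss' string to seconds as a float.

    Returns NaN for anything that is not a string containing a colon,
    or whose parts cannot be parsed as numbers (assumed handling of gaps).
    """
    if not isinstance(value, str) or ":" not in value:
        return math.nan
    minutes, seconds = value.split(":", 1)
    try:
        return 60 * int(minutes) + float(seconds)
    except ValueError:
        return math.nan


print(lap_time_to_seconds("1:26.572"))
print(lap_time_to_seconds(None))
```

Applied to a column that still holds the original strings, this would look like `qualifying["q3"].map(lap_time_to_seconds)`.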


Next we identify the pole sitters by keeping only the rows where position equals 1, before narrowing down to Monaco, Monza and Silverstone.

In [119]:
#To find pole times, we limit to position = 1
qualifying_poles = qualifying[qualifying.position == 1]

#remove position column
qualifying_poles = qualifying_poles[["qualifyId", "raceId", "driverId", "constructorId", "q1", "q3"]]

In [120]:
#Drop down to Monaco, Monza and Silverstone
merged_df = qualifying_poles.merge(races_limited, how = "inner", on = "raceId")   

In the mid-2000s, Formula 1 changed its qualifying format, going from a single session ("q1") to three sessions ("q1", "q2" and "q3").

To work around this, we create a new column called pole_time. We initially assign it the q1 values; then, wherever a q3 value is present (i.e. the new qualifying format), we use q3 as the pole time instead.

In [121]:

#Create the pole time column: take q3 where it is non-null, otherwise fall back to q1
#(this accounts for the change in qualifying formats)
merged_df["pole_time"] = merged_df["q3"].fillna(merged_df["q1"])
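As a small illustration of the fallback rule (q3 when present, otherwise q1), the sketch below applies it to a few rows without an explicit loop. The frame `demo` is hypothetical, and the q1 values for 2006 and 2007 are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative rows: the first two mimic the single-session era (no q3 value).
demo = pd.DataFrame({
    "year": [1994, 2000, 2006, 2007],
    "q1": [84.96, 85.703, 81.0, 80.5],
    "q3": [np.nan, np.nan, 80.253, 79.997],
})

# Use q3 where it exists; otherwise fall back to q1.
demo["pole_time"] = demo["q3"].fillna(demo["q1"])
print(demo)
```

Selecting columns by name rather than by position also means the result does not depend on the column order left behind by the merge.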

In [122]:
#Rename merged_df and drop down to required columns
qualifying_pole_times = merged_df[["year", "circuitId", "name", "pole_time"]]

#Sort by year
qualifying_pole_times=qualifying_pole_times.sort_values(by=['year'])

#Split into Monaco, Silverstone and Monza
monaco_qualifying_pole_times = qualifying_pole_times[qualifying_pole_times.circuitId == 6]
monza_qualifying_pole_times = qualifying_pole_times[qualifying_pole_times.circuitId == 14]
silverstone_qualifying_pole_times = qualifying_pole_times[qualifying_pole_times.circuitId == 9]

In [144]:
silverstone_qualifying_pole_times


Unnamed: 0,year,circuitId,name,pole_time
20,1994,9,British Grand Prix,84.96
18,1995,9,British Grand Prix,88.124
15,1997,9,British Grand Prix,81.598
13,1998,9,British Grand Prix,83.271
12,2000,9,British Grand Prix,85.703
29,2003,9,British Grand Prix,81.209
32,2004,9,British Grand Prix,78.233
9,2005,9,British Grand Prix,79.905
7,2006,9,British Grand Prix,80.253
4,2007,9,British Grand Prix,79.997
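The three per-circuit filters could also be produced in one pass with `groupby`. This is only a sketch: the `circuit_names` mapping and the frame `demo` are hypothetical, and the Monaco and Monza times are made up.

```python
import pandas as pd

# Hypothetical mapping from circuitId to a short name.
circuit_names = {6: "monaco", 14: "monza", 9: "silverstone"}

# Toy pole-time table; Monaco and Monza values are made up.
demo = pd.DataFrame({
    "year": [2007, 2006, 2006, 2006],
    "circuitId": [9, 6, 9, 14],
    "pole_time": [79.997, 73.962, 80.253, 81.484],
})

# One sub-frame per circuit, each sorted by year.
by_circuit = {
    circuit_names[circuit_id]: group.sort_values("year")
    for circuit_id, group in demo.groupby("circuitId")
}
print(by_circuit["silverstone"])
```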


#### Exploring kaggle pole times plots <a class="anchor" id="section3_2"></a>

In [145]:
#plt.figure(figsize = (15,4))
plt.plot(monaco_qualifying_pole_times.year,monaco_qualifying_pole_times.pole_time, label = "Monaco")
#plt.plot(monza_qualifying_pole_times.year,monza_qualifying_pole_times.pole_time, label = "Monza")
#plt.plot(silverstone_qualifying_pole_times.year,silverstone_qualifying_pole_times.pole_time, label = "Silverstone")

plt.legend()
plt.title("Evolution of Qualifying Pole times 1994 to Present")
plt.xlabel("Year")
plt.ylabel("Qualifying Pole Time in Seconds")
plt.show()
plt.clf()

In [124]:
silverstone_poles_data_split = silverstone_poles_data['pole_time'].str.split(":")
silverstone_poles_data["mins"] = silverstone_poles_data_split.str.get(0)
silverstone_poles_data["secs"] = silverstone_poles_data_split.str.get(1)
silverstone_poles_data["mins"] = silverstone_poles_data["mins"].astype("float")
silverstone_poles_data["secs"] = silverstone_poles_data["secs"].astype("float")
silverstone_poles_data["pole_time"] = 60*silverstone_poles_data["mins"] + silverstone_poles_data["secs"]
silverstone_poles_data = silverstone_poles_data[["year", "pole_time"]]


#rename column
silverstone_poles_data = silverstone_poles_data.rename(columns = {"pole_time": "silver_pole_time"})
print(silverstone_poles_data.head())


   year  silver_pole_time
0  1950             110.8
1  1951             103.4
2  1952             110.0
3  1953             108.0
4  1954             105.0


In [125]:
monza_poles_data_split = monza_poles_data['pole_time'].str.split(":")
monza_poles_data["mins"] = monza_poles_data_split.str.get(0)
monza_poles_data["secs"] = monza_poles_data_split.str.get(1)
monza_poles_data["mins"] = monza_poles_data["mins"].astype("float")
monza_poles_data["secs"] = monza_poles_data["secs"].astype("float")
monza_poles_data["pole_time"] = 60*monza_poles_data["mins"] + monza_poles_data["secs"]
monza_poles_data = monza_poles_data[["year", "driver", "pole_time"]]

#rename column
monza_poles_data = monza_poles_data.rename(columns = {"pole_time": "monza_pole_time"})

In [126]:
monaco_poles_data_split = monaco_poles_data['pole_time'].str.split(":")
monaco_poles_data["mins"] = monaco_poles_data_split.str.get(0)
monaco_poles_data["secs"] = monaco_poles_data_split.str.get(1)
monaco_poles_data["mins"] = monaco_poles_data["mins"].astype("float")
monaco_poles_data["secs"] = monaco_poles_data["secs"].astype("float")
monaco_poles_data["pole_time"] = 60*monaco_poles_data["mins"] + monaco_poles_data["secs"]
monaco_poles_data = monaco_poles_data[["year", "driver", "pole_time"]]

#rename column
monaco_poles_data = monaco_poles_data.rename(columns = {"pole_time": "monaco_pole_time"})

In [127]:
plt.figure(figsize = (15,4))
plt.plot(monaco_poles_data.year,monaco_poles_data.monaco_pole_time, label = "Monaco")
plt.plot(monza_poles_data.year,monza_poles_data.monza_pole_time, label = "Monza")
plt.plot(silverstone_poles_data.year,silverstone_poles_data.silver_pole_time, label = "Silverstone")

plt.legend()
plt.title("Evolution of Qualifying Pole times 1950 to Present")
plt.xlabel("Year")
plt.ylabel("Qualifying Pole Time in Seconds")
plt.show()
plt.clf()


### Fastest lap speeds

In [128]:
#Changing fastest lap speed to float
results.fastestLapSpeed = results.fastestLapSpeed.astype("float64")

#Grouping per race and selecting the fastest speed
#(selecting the column before aggregating keeps only fastestLapSpeed)
grouped_results = results.groupby("raceId")[["fastestLapSpeed"]].max()
grouped_results.head()

Unnamed: 0_level_0,fastestLapSpeed
raceId,Unnamed: 1_level_1
1,217.668
2,206.483
3,174.289
4,206.049
5,202.484


In [129]:
results_races_merged = grouped_results.merge(races, on = "raceId")
print(results_races_merged)

#Dropping down to required columns
results_races_merged = results_races_merged[["raceId", "fastestLapSpeed", "year", "circuitId", "name"]]
results_races_merged = results_races_merged.sort_values(by = "year")
results_races_merged.head()

      raceId  fastestLapSpeed  year  round  circuitId                   name  \
0          1          217.668  2009      1          1  Australian Grand Prix   
1          2          206.483  2009      2          2   Malaysian Grand Prix   
2          3          174.289  2009      3         17     Chinese Grand Prix   
3          4          206.049  2009      4          3     Bahrain Grand Prix   
4          5          202.484  2009      5          4     Spanish Grand Prix   
...      ...              ...   ...    ...        ...                    ...   
1086    1106          210.786  2023      8          7    Canadian Grand Prix   
1087    1107          231.970  2023      9         70    Austrian Grand Prix   
1088    1108          234.922  2023     10          9     British Grand Prix   
1089    1109          195.910  2023     11         11   Hungarian Grand Prix   
1090    1110          234.978  2023     12         13     Belgian Grand Prix   

            date      time             

Unnamed: 0,raceId,fastestLapSpeed,year,circuitId,name
838,839,,1950,14,Italian Grand Prix
837,838,,1950,55,French Grand Prix
836,837,,1950,13,Belgian Grand Prix
835,836,,1950,66,Swiss Grand Prix
834,835,,1950,19,Indianapolis 500


In [130]:
results_races_merged.isna().sum()

raceId               0
fastestLapSpeed    714
year                 0
circuitId            0
name                 0
dtype: int64
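A count of 714 missing speeds suggests that fastest-lap speed is simply not recorded for a large share of races. One way to see where coverage starts is to count non-null values per year. The sketch below does this on made-up rows; the frame `demo` and its numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows: early seasons have no speed recorded.
demo = pd.DataFrame({
    "year": [1950, 1951, 2004, 2005, 2005],
    "fastestLapSpeed": [np.nan, np.nan, 210.0, np.nan, 205.5],
})

# count() ignores NaN, so this is the number of races with a speed per year.
coverage = demo.groupby("year")["fastestLapSpeed"].count()

# Earliest season that has at least one recorded speed.
first_year_with_data = coverage[coverage > 0].index.min()
print(coverage)
print(first_year_with_data)
```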

In [131]:
fastest_lap_speed_monaco = results_races_merged[results_races_merged.circuitId == 6]
fastest_lap_speed_monza = results_races_merged[results_races_merged.circuitId == 14]
fastest_lap_speed_silverstone = results_races_merged[results_races_merged.circuitId == 9]

fastest_lap_speed_silverstone = fastest_lap_speed_silverstone.rename(columns = {"fastestLapSpeed": "silver_fastest_lap_speed"})
fastest_lap_speed_monza = fastest_lap_speed_monza.rename(columns = {"fastestLapSpeed": "monza_fastest_lap_speed"})
fastest_lap_speed_monaco = fastest_lap_speed_monaco.rename(columns = {"fastestLapSpeed": "monaco_fastest_lap_speed"})


fastest_lap_speed_silverstone.head()

Unnamed: 0,raceId,silver_fastest_lap_speed,year,circuitId,name
832,833,,1950,9,British Grand Prix
828,829,,1951,9,British Grand Prix
820,821,,1952,9,British Grand Prix
812,813,,1953,9,British Grand Prix
802,803,,1954,9,British Grand Prix


In [132]:
plt.figure(figsize=(15,4))
plt.plot(fastest_lap_speed_monaco.year,fastest_lap_speed_monaco.monaco_fastest_lap_speed, label = "Monaco")
plt.plot(fastest_lap_speed_monza.year,fastest_lap_speed_monza.monza_fastest_lap_speed, label = "Monza")
plt.plot(fastest_lap_speed_silverstone.year,fastest_lap_speed_silverstone.silver_fastest_lap_speed, label = "Silverstone")

plt.legend()
plt.title("Evolution of Fastest Lap Average Speeds")
plt.xlabel("Year")
plt.ylabel("Average speed during fastest lap (km/h)")
plt.show()
plt.clf()


In [133]:
results_races_merged[(results_races_merged.year == 2000) & (results_races_merged.circuitId == 6)]

Unnamed: 0,raceId,fastestLapSpeed,year,circuitId,name
163,164,,2000,6,Monaco Grand Prix


### Fastest Lap Times

In [134]:
monza_fastest_lap_data_split = monza_fastest_lap_data['lap_time'].str.split(":")
monza_fastest_lap_data["mins"] = monza_fastest_lap_data_split.str.get(0)
monza_fastest_lap_data["secs"] = monza_fastest_lap_data_split.str.get(1)
monza_fastest_lap_data["mins"] = monza_fastest_lap_data["mins"].astype("float")
monza_fastest_lap_data["secs"] = monza_fastest_lap_data["secs"].astype("float")
monza_fastest_lap_data["lap_time"] = 60*monza_fastest_lap_data["mins"] + monza_fastest_lap_data["secs"]
monza_fastest_lap_data = monza_fastest_lap_data[["year", "lap_time"]]
monza_fastest_lap_data = monza_fastest_lap_data.rename(columns = {"lap_time": "monza_fastest_lap_time"})


In [135]:
monaco_fastest_lap_data_split = monaco_fastest_lap_data['lap_time'].str.split(":")
monaco_fastest_lap_data["mins"] = monaco_fastest_lap_data_split.str.get(0)
monaco_fastest_lap_data["secs"] = monaco_fastest_lap_data_split.str.get(1)
monaco_fastest_lap_data["mins"] = monaco_fastest_lap_data["mins"].astype("float")
monaco_fastest_lap_data["secs"] = monaco_fastest_lap_data["secs"].astype("float")
monaco_fastest_lap_data["lap_time"] = 60*monaco_fastest_lap_data["mins"] + monaco_fastest_lap_data["secs"]
monaco_fastest_lap_data = monaco_fastest_lap_data[["year", "lap_time"]]
monaco_fastest_lap_data = monaco_fastest_lap_data.rename(columns = {"lap_time": "monaco_fastest_lap_time"})


In [136]:
silverstone_fastest_lap_data_split = silverstone_fastest_lap_data['lap_time'].str.split(":")
silverstone_fastest_lap_data["mins"] = silverstone_fastest_lap_data_split.str.get(0)
silverstone_fastest_lap_data["secs"] = silverstone_fastest_lap_data_split.str.get(1)
silverstone_fastest_lap_data["mins"] = silverstone_fastest_lap_data["mins"].astype("float")
silverstone_fastest_lap_data["secs"] = silverstone_fastest_lap_data["secs"].astype("float")
silverstone_fastest_lap_data["lap_time"] = 60*silverstone_fastest_lap_data["mins"] + silverstone_fastest_lap_data["secs"]
silverstone_fastest_lap_data = silverstone_fastest_lap_data[["year", "lap_time"]]
silverstone_fastest_lap_data = silverstone_fastest_lap_data.rename(columns = {"lap_time": "silver_fastest_lap_time"})


In [137]:
plt.figure(figsize = (15,4))
plt.plot(monaco_fastest_lap_data.year,monaco_fastest_lap_data.monaco_fastest_lap_time, label = "Monaco")
plt.plot(monza_fastest_lap_data.year,monza_fastest_lap_data.monza_fastest_lap_time, label = "Monza")
plt.plot(silverstone_fastest_lap_data.year,silverstone_fastest_lap_data.silver_fastest_lap_time, label = "Silverstone")

plt.legend()
plt.title("Evolution of Fastest Lap Times 1950 to Present")
plt.xlabel("Year")
plt.ylabel("Fastest Lap Time in Seconds")
plt.show()
plt.clf()


### Creating Combined Table

In [138]:
#Start with silverstone fastest laps
fastest_laps_and_poles = silverstone_fastest_lap_data
fastest_laps_and_poles.head()

Unnamed: 0,year,silver_fastest_lap_time
0,1950,110.6
1,1951,104.0
2,1952,112.0
3,1953,110.0
4,1954,110.0


In [139]:
#Merge with silverstone fastest lap speed
fastest_laps_and_poles = fastest_laps_and_poles.merge(fastest_lap_speed_silverstone[["year", "silver_fastest_lap_speed"]], on = "year", how = "left")

fastest_laps_and_poles.head()

Unnamed: 0,year,silver_fastest_lap_time,silver_fastest_lap_speed
0,1950,110.6,
1,1951,104.0,
2,1952,112.0,
3,1953,110.0,
4,1954,110.0,


In [140]:
#Merge with silverstone pole times
fastest_laps_and_poles = fastest_laps_and_poles.merge(silverstone_poles_data[["year", "silver_pole_time"]], on = "year", how = "left")

fastest_laps_and_poles.head()

Unnamed: 0,year,silver_fastest_lap_time,silver_fastest_lap_speed,silver_pole_time
0,1950,110.6,,110.8
1,1951,104.0,,103.4
2,1952,112.0,,110.0
3,1953,110.0,,108.0
4,1954,110.0,,105.0


In [141]:
#Repeat with monza and monaco
fastest_laps_and_poles = fastest_laps_and_poles.merge(monza_fastest_lap_data[["year", "monza_fastest_lap_time"]], on = "year", how = "left")
fastest_laps_and_poles = fastest_laps_and_poles.merge(fastest_lap_speed_monza[["year", "monza_fastest_lap_speed"]], on = "year", how = "left")
fastest_laps_and_poles = fastest_laps_and_poles.merge(monza_poles_data[["year", "monza_pole_time"]], on = "year", how = "left")

fastest_laps_and_poles = fastest_laps_and_poles.merge(monaco_fastest_lap_data[["year", "monaco_fastest_lap_time"]], on = "year", how = "left")
fastest_laps_and_poles = fastest_laps_and_poles.merge(fastest_lap_speed_monaco[["year", "monaco_fastest_lap_speed"]], on = "year", how = "left")
fastest_laps_and_poles = fastest_laps_and_poles.merge(monaco_poles_data[["year", "monaco_pole_time"]], on = "year", how = "left")



fastest_laps_and_poles.head()

Unnamed: 0,year,silver_fastest_lap_time,silver_fastest_lap_speed,silver_pole_time,monza_fastest_lap_time,monza_fastest_lap_speed,monza_pole_time,monaco_fastest_lap_time,monaco_fastest_lap_speed,monaco_pole_time
0,1950,110.6,,110.8,120.0,,118.3,111.0,,110.2
1,1951,104.0,,103.4,116.5,,113.2,,,
2,1952,112.0,,110.0,126.1,,125.7,,,
3,1953,110.0,,108.0,124.5,,122.7,,,
4,1954,110.0,,105.0,120.8,,119.0,,,


In [142]:
fastest_laps_and_poles.shape

(58, 10)
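Because every merge in this table is keyed on `year` alone, a season in which a circuit hosts two races would produce duplicate rows after a left merge. Whether that happens in these datasets is not checked here, so the sketch below only demonstrates the effect on made-up frames, together with the `validate` argument of `merge`, which raises an error instead of silently duplicating rows.

```python
import pandas as pd

# Made-up frames; two rows for 2020 simulate two races at one circuit in a season.
lap_times_demo = pd.DataFrame({
    "year": [2019, 2020],
    "silver_fastest_lap_time": [87.4, 88.0],
})
speeds_demo = pd.DataFrame({
    "year": [2019, 2020, 2020],
    "silver_fastest_lap_speed": [240.0, 238.0, 239.0],
})

# A plain left merge silently repeats the 2020 lap time for each matching row.
merged = lap_times_demo.merge(speeds_demo, on="year", how="left")
print(len(lap_times_demo), len(merged))

# With validate="one_to_one", pandas raises MergeError on duplicate keys.
try:
    lap_times_demo.merge(speeds_demo, on="year", how="left", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

If duplicates did turn up, one option would be to merge on both `year` and a race identifier rather than `year` alone.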

In [143]:
fastest_laps_and_poles.to_csv("fastest_laps_and_poles.csv", index=False)