High speeds, breakneck overtakes at some of the most notorious corners in racing, the sheer adrenaline of watching a driver extract every last unit of power from the engine, and the technology and engineering poured into these cars: together they make F1 a brilliant sport. The Williams of 1992, with which Nigel Mansell swept aside the competition, owed its dominance primarily to the team's groundbreaking active suspension, which adapted the car to changing track conditions. From intricate technologies like active suspension to simple features like rear-view mirrors and steering-wheel buttons, F1 engineering has found its way into our road cars, making them more efficient, safer and faster.

The following is an exploratory analysis of historical F1 data on circuits, drivers and races, aiming to understand what makes them the best or the worst.

In [2183]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
import datetime 

### Reading the csv files 

In [680]:
circuits = pd.read_csv('circuits.csv')
laptimes = pd.read_csv('lap_times.csv')
pitstops = pd.read_csv('pit_stops.csv')
seasons = pd.read_csv('seasons.csv',parse_dates=['year'])
status = pd.read_csv('status.csv')

In [6]:
constructor_standings = pd.read_csv('constructor_standings.csv')
constructors = pd.read_csv('constructors.csv')
driver_standings = pd.read_csv('driver_standings.csv')
drivers = pd.read_csv('drivers.csv')

In [1902]:
races = pd.read_csv('races.csv',parse_dates=['year'])
constructor_results = pd.read_csv('constructor_results.csv')
results = pd.read_csv('results.csv')
qualifying = pd.read_csv('qualifying.csv')

### Most successful constructors 

Many well-known, established names have competed at the highest level of racing with little success. One reason is that, from a carmaker's point of view, F1 is not the most efficient way to create value: the costs and returns of running an F1 team roughly break even at best. Basic finance says that an investment whose return on equity falls below its cost of capital destroys value. That may not be the whole story, though, because F1 is also a global platform for advertising and marketing a carmaker's brand, so the expense can be justified as an investment in top-line growth. Let's look at two charts to see which constructors have consistently put in the money and engineering prowess to stay at the top. If you're a die-hard fan of the sport, you already know the answer.
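The value-creation rule above can be made concrete with the residual-income formula: value created in a year ≈ (return on equity − cost of equity) × equity invested, using the cost of equity as the relevant cost of capital. The sketch below is only an illustration of that arithmetic. The figures are purely hypothetical and are not real team financials, and `residual_income` is a made-up helper name.

```python
def residual_income(roe: float, cost_of_equity: float, equity: float) -> float:
    """Value created (positive) or destroyed (negative) in one year.

    roe and cost_of_equity are annual rates (0.04 means 4%);
    equity is the capital tied up in the venture.
    """
    return (roe - cost_of_equity) * equity

# Hypothetical team: 400m of equity earning 4% against a 9% cost of equity
print(residual_income(0.04, 0.09, 400e6))
```

A negative result means the venture destroys value on paper. The marketing argument in the paragraph is the claim that part of the real return shows up elsewhere, in road-car sales, rather than in the team's own return on equity.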

In [493]:
#merging the constructors dataframe with race results

team = constructors.merge(results,on='constructorId',how = 'left')

In [2146]:
#extracting the columns needed, grouping by constructor name and counting the distinct races entered
#keeping only constructors with at least 100 race entries

best = team[['name','points','raceId']]
best = best.groupby('name')['raceId'].nunique().sort_values(ascending=False).reset_index(name = 'races')
best = best[best['races'] >= 100]
best.head() 

|   | name | races |
|---|------|-------|
| 0 | Ferrari | 1043 |
| 1 | McLaren | 872 |
| 2 | Williams | 786 |
| 3 | Tyrrell | 433 |
| 4 | Renault | 403 |


In [510]:
#calculating points per race: total points divided by the number of distinct races entered

func = lambda x: x.points.sum()/x.raceId.nunique()
data = team[team['name'].isin(best.name)].groupby('name').apply(func).sort_values(ascending=False).reset_index(name = 'points_per_race')
data.head(10)

|   | name | points_per_race |
|---|------|-----------------|
| 0 | Mercedes | 25.611538 |
| 1 | Red Bull | 17.667656 |
| 2 | Ferrari | 9.479386 |
| 3 | McLaren | 6.961009 |
| 4 | Force India | 5.179245 |
| 5 | Williams | 4.569975 |
| 6 | Renault | 4.409429 |
| 7 | Benetton | 3.311538 |
| 8 | BRM | 2.581731 |
| 9 | Team Lotus | 2.518987 |
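The `groupby(...).apply(func)` call above works. Named aggregation can compute the same points-per-race figure in one pass, without a Python lambda per group. The sketch below uses made-up rows with the same columns as the merged `team` dataframe. `nunique` on `raceId` counts a race once even when both of a team's cars scored in it.

```python
import pandas as pd

# toy rows shaped like the merged `team` dataframe (values are made up)
team_toy = pd.DataFrame({
    'name':   ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta'],
    'raceId': [1, 1, 2, 1, 2],
    'points': [25.0, 18.0, 10.0, 0.0, 4.0],
})

# total points and distinct races per constructor in a single aggregation
per_team = team_toy.groupby('name').agg(points=('points', 'sum'),
                                        races=('raceId', 'nunique'))
per_team['points_per_race'] = per_team['points'] / per_team['races']
print(per_team.sort_values('points_per_race', ascending=False))
```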


In [601]:
#plotting the results

fig = go.Figure(
    data=[go.Bar(x = data.name, y=data['points_per_race'])],
    layout_title_text="Constructors' Points per Race"
    
)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.show()

Mercedes and Red Bull have been highly consistent over the past decade, which is reflected in their points-per-race figures. Ferrari, by contrast, has not won a world championship since 2008. Force India is an interesting case: despite being a small, low-budget team compared with giants like Mercedes and Ferrari, it did fantastic work on the track, averaging about five points per race.

In [508]:
#calculating historic overall points of top 10 constructors

historic_points = team.groupby('name').agg({'points':'sum'}).sort_values('points',ascending=False).reset_index().head(10)
historic_points

|   | name | points |
|---|------|--------|
| 0 | Ferrari | 9887.0 |
| 1 | Mercedes | 6659.0 |
| 2 | McLaren | 6070.0 |
| 3 | Red Bull | 5954.0 |
| 4 | Williams | 3592.0 |
| 5 | Renault | 1777.0 |
| 6 | Force India | 1098.0 |
| 7 | Team Lotus | 995.0 |
| 8 | Benetton | 861.0 |
| 9 | Tyrrell | 711.0 |


In [600]:
#plotting a bar chart

fig = go.Figure(
    data=[go.Bar(x = historic_points.name, y=historic_points['points'])],
    layout_title_text="Constructors' Historic Points"
)
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.show()

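The most impressive detail in this chart is Mercedes-AMG Petronas in second place, considering the team only returned to Formula One as a works constructor in 2010. In twelve years it has accumulated about two-thirds of Ferrari's all-time points (6,659 against 9,887). That is an astonishing feat, even allowing for the fact that the 2010 switch to 25 points for a win inflates modern totals.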
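The points system has changed several times, and since 2010 the top ten have scored 25-18-15-12-10-8-6-4-2-1. As a result, raw historic totals favour recent teams. One way to compare eras is to re-score every classified result with a single table. The sketch below assumes that only the finishing position matters, so it ignores fastest-lap bonuses, sprint points and shared drives. The column names follow `results.csv`, the toy rows are made up, and `rescore` is a hypothetical helper.

```python
import pandas as pd

# 2010-onwards points for P1..P10; any other position scores nothing
POINTS_2010 = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10, 6: 8, 7: 6, 8: 4, 9: 2, 10: 1}

def rescore(results: pd.DataFrame) -> pd.DataFrame:
    """Total re-scored points per constructor using the 2010 table."""
    scored = results.assign(
        # NaN (non-classified) is never a dict key, so it scores 0
        points_2010=results['position'].map(lambda p: POINTS_2010.get(p, 0))
    )
    return (scored.groupby('constructorId', as_index=False)['points_2010']
                  .sum()
                  .sort_values('points_2010', ascending=False, ignore_index=True))

# toy data: two constructors, two results each
toy = pd.DataFrame({
    'constructorId': [1, 1, 2, 2],
    'position':      [1.0, 2.0, 3.0, None],
})
print(rescore(toy))
```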

### Do higher altitude circuits cause more engine failures? 

In [520]:
#merging circuits, races, results and race status dataframes

df = circuits.merge(races,how='left',left_on = 'circuitid',right_on = 'circuitId')
df2 = df.merge(results,how='left',on='raceId')
status_df = df2.merge(status,how='inner',left_on = 'statusId',right_on= 'statusid')

In [521]:
#cosmetic changes: dropping columns and renaming

status_df.drop(['name_y','url_y','url_x','time_y'],axis=1,inplace=True)
status_df.rename(columns={'name_x':'name','time_x':'time'},inplace=True)

In [574]:
#keeping failures plausibly linked to the thin air at altitude, from 2015 onwards (when the Mexican GP returned to the calendar)

altitude = status_df[status_df['status'].isin(['Transmission','Engine','Overheating'])]
altitude = altitude[altitude['year'].dt.year >= 2015]   #'year' was parsed as a datetime when reading races.csv
altitude.head()

|   | circuitid | circuitref | name | location | country | lat | lng | alt | raceId | year | ... | points | laps | milliseconds | fastestLap | rank | fastestLapTime | fastestLapSpeed | statusId | statusid | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14065 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.8497 | 144.968 | 10 | 926 | 2015 | ... | 0.0 | 32.0 |  | 30.0 | 13.0 | 01:34.3 | 202.458 | 5.0 | 5 | Engine |
| 14066 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.8497 | 144.968 | 10 | 948 | 2016 | ... | 0.0 | 38.0 |  | 15.0 | 21.0 | 01:33.9 | 203.327 | 5.0 | 5 | Engine |
| 14067 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.8497 | 144.968 | 10 | 948 | 2016 | ... | 0.0 | 21.0 |  | 21.0 | 5.0 | 01:30.7 | 210.48 | 5.0 | 5 | Engine |
| 14068 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.8497 | 144.968 | 10 | 989 | 2018 | ... | 0.0 | 13.0 |  | 13.0 | 18.0 | 01:30.6 | 210.601 | 5.0 | 5 | Engine |
| 14069 | 1 | albert_park | Albert Park Grand Prix Circuit | Melbourne | Australia | -37.8497 | 144.968 | 10 | 1010 | 2019 | ... | 0.0 | 9.0 |  | 9.0 | 20.0 | 01:30.9 | 210.022 | 5.0 | 5 | Engine |


In [1301]:
#grouping by track name and altitude and renaming columns 

circuit_altitudes = altitude.groupby(['name','alt'])['status'].count().sort_values(ascending = False).reset_index().head(10)
circuit_altitudes.rename(columns={'status':'engine & transmission failures'},inplace=True)
circuit_altitudes

|   | name | alt | engine & transmission failures |
|---|------|-----|--------------------------------|
| 0 | Autódromo Hermanos Rodríguez | 2227 | 7 |
| 1 | Bahrain International Circuit | 7 | 7 |
| 2 | Red Bull Ring | 678 | 6 |
| 3 | Albert Park Grand Prix Circuit | 10 | 5 |
| 4 | Autodromo Nazionale di Monza | 162 | 5 |
| 5 | Circuit de Spa-Francorchamps | 401 | 4 |
| 6 | Marina Bay Street Circuit | 18 | 4 |
| 7 | Yas Marina Circuit | 3 | 4 |
| 8 | Circuit de Barcelona-Catalunya | 109 | 3 |
| 9 | Hockenheimring | 103 | 3 |


In [2182]:
#plotting a bubble chart: the larger the bubble, the higher the altitude

df = circuit_altitudes

fig = px.scatter(df, x="alt", y="engine & transmission failures",
                 size="alt", color="name",
                 log_x=True, size_max=80)
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.show()
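Raw failure counts favour circuits that have hosted more races since 2015. A fairer comparison divides each circuit's failures by the number of races it held in the same window. The sketch below uses made-up numbers. The column names `name`, `alt`, `failures` and `races` are hypothetical stand-ins for the notebook's dataframes. With only a handful of circuits, the correlation is indicative at best.

```python
import pandas as pd

# toy per-circuit counts (hypothetical values, for illustration only)
counts = pd.DataFrame({
    'name':     ['High circuit', 'Low circuit A', 'Low circuit B'],
    'alt':      [2200, 10, 20],
    'failures': [6, 6, 4],
    'races':    [5, 8, 8],
})

# failures per race removes the effect of simply hosting more events
counts['failures_per_race'] = counts['failures'] / counts['races']

# Pearson correlation between altitude and the failure rate
rate_corr = counts['alt'].corr(counts['failures_per_race'])
print(counts.sort_values('failures_per_race', ascending=False))
print(rate_corr)
```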

### The case for the best F1 drivers  

Seventy-two years of F1 history have produced 34 different world champions. The cars have changed tremendously over that time, from the screaming V10s of the early 2000s to today's exceptionally engineered and safer V6 turbo hybrids, and the sport has transformed in the last 20 years alone. Many drivers had their best years in one era before giving way to a younger generation. Let's start with the nationality of every driver to have raced in F1, and then look at the distribution of champions by nation.

#### Distribution by Geography

In [2165]:
# grouping by nationality, counting the drivers and plotting a pie chart of the ten most represented nations

driver_nationality = drivers.groupby('nationality')['nationality'].count().sort_values(ascending = False).reset_index(name = 'number of drivers').head(10)
fig = go.Figure(data=[go.Pie(labels=driver_nationality.nationality, values=driver_nationality['number of drivers'])])
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(
    title="Historical Driver Nationality Distribution since 1950")
fig.show()

In [2155]:
#merging drivers, driver standings and race data 

driver_position = drivers.merge(driver_standings,left_on='driverid',right_on='driverId',how = 'left')
driver_position = driver_position.merge(races,on = 'raceId',how = 'left')

In [2156]:
#grouping by nationality, year and surname to get each driver's max points every season, then keeping the top scorer (the champion) per year

champions = driver_position.groupby(['nationality','year','surname'])[['points','wins']
                                            ].max().sort_values('points',ascending = False).reset_index()
champions.drop_duplicates(subset=['year'], inplace=True)

In [2157]:
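# counting the number of titles won by each nation and plotting a pie chart
#(rename_axis + reset_index gives the same column names in pandas 1.x and 2.x)

champion_nations = champions['nationality'].value_counts().rename_axis('nationality').reset_index(name = 'titles')
fig = go.Figure(data=[go.Pie(labels=champion_nations['nationality'], values=champion_nations['titles'])])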
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(
    title="Distribution of Historic Champions by Nation")
fig.show()

To understand why F1 has seen the most British drivers and champions, we have to go back to WWII and the air battles fought against Germany over the English Channel. To wage that air war, Britain built a large number of airfields. After the war and the fall of Nazi Germany, many of these airfields fell out of use, until a band of British motoring enthusiasts decided to turn them into makeshift race tracks. Soon enough, this attracted racing drivers, as well as engineers who had worked on complex aircraft engines during the war, to build the best race cars and test them on the converted circuits. One of these airfields went on to become the "mecca" of racing: the Silverstone Circuit. Over the years, the concentration of racing talent in Britain has led many F1 teams to set up their headquarters there; 6 of the 10 constructors on the 2022 grid are based in the United Kingdom.

In [1420]:
#grouping by nationality, year and driver to get the max points achieved every season and dropping year duplicates
#(grouping on driverid as well keeps namesakes such as Graham and Damon Hill apart)

champion_drivers = driver_position.groupby(['nationality','year','driverid','surname'])[['points','wins']
                                            ].max().sort_values('points',ascending = False).reset_index()
champion_drivers.drop_duplicates(subset=['year'], inplace=True)

#grouping by nationality and counting the distinct champions

final = champion_drivers.groupby('nationality')['driverid'].nunique().reset_index(name = 'champions').sort_values(
    by='champions',ascending = False)

#counting drivers per nationality over the whole grid history (not only the top 10 shown in the pie chart)

all_nations = drivers.groupby('nationality')['nationality'].count().reset_index(name = 'number of drivers')

#merging both the datasets and creating a column to calculate the ratio

ratios = final.merge(all_nations,on='nationality',how='inner')
ratios['perc_winners'] = (ratios.champions/ratios['number of drivers']*100).round(2)
ratios = ratios.sort_values('perc_winners',ascending = False)
ratios.head(5) 

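|    | nationality | champions | number of drivers | perc_winners |
|----|-------------|-----------|-------------------|--------------|
| 2  | Finnish | 3 | 9 | 33.33 |
| 6  | Austrian | 2 | 15 | 13.33 |
| 5  | Australian | 2 | 17 | 11.76 |
| 12 | New Zealander | 1 | 9 | 11.11 |
| 1  | Brazilian | 3 | 32 | 9.38 |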
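Counting champions by surname alone merges namesakes such as Graham Hill (1962, 1968) and Damon Hill (1996), who are both British champions. The toy example below compares distinct surnames with distinct driver IDs. The title years are real, but the driver IDs are made up for illustration.

```python
import pandas as pd

# one row per title: year, champion id, surname and nationality (ids are hypothetical)
titles = pd.DataFrame({
    'year':        [1962, 1968, 1996, 1982, 1998, 1999],
    'driverid':    [101, 101, 102, 201, 202, 202],
    'surname':     ['Hill', 'Hill', 'Hill', 'Rosberg', 'Häkkinen', 'Häkkinen'],
    'nationality': ['British', 'British', 'British', 'Finnish', 'Finnish', 'Finnish'],
})

by_surname = titles.groupby('nationality')['surname'].nunique()   # merges the two Hills
by_driver = titles.groupby('nationality')['driverid'].nunique()   # keeps them apart
print(pd.DataFrame({'by_surname': by_surname, 'by_driver': by_driver}))
```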


In [1431]:
#creating a bar chart

df = ratios
fig = px.bar(df, x='nationality', y='perc_winners',
         hover_data=['champions','number of drivers'], color='number of drivers',
         height=400)
fig.update_traces(textfont_size=20,
              marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(
    title="Share of each nation's drivers who became world champions (%)")
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

Can we put the success of Finnish drivers in F1 down to chance and luck, with just nine of them ever reaching the grid, or is there more to it? Digging deeper reveals why a country of only about five and a half million people produces such good racing drivers. One reason is early exposure to cars, because driving in Finland's harsh, cold conditions demands skills that have to be learned as early as possible. Kimi Räikkönen (a Finn, and Ferrari's most recent world champion) put it this way: "Our roads and long winters. You really have to be a good driver to survive in Finland. It is always slippery and bumpy." A second reason is Finland's lively racing culture among old and young alike. The country has nearly 20 official karting tracks, one of the highest numbers per capita in the world.

Motor racing also demands a level of composure that everyday life rarely calls for, not least when blasting down straights and through corners at over 250 km/h. The Finns seem to have a knack for this in their blood, and they call it sisu. It loosely translates into English as a stoic determination and sense of purpose, combined with accepting the outcome without judgement. Kimi Räikkönen is not called "The Iceman" for nothing, and Finnish drivers are racers of few words. When his car caught fire at the 2006 Monaco GP while he was fighting for the lead, he simply strolled away from the car and onto his yacht, where he watched the rest of the race with beers and champagne.

#### Most wins by a driver in a single season

In [710]:
#merging driver data, their standings and race data

driver_position = drivers.merge(driver_standings,left_on='driverid',right_on='driverId',how = 'left')
driver_position = driver_position.merge(races,on = 'raceId',how = 'left')

In [733]:
#filtering the dataset to include only where the position is 1 and grouping by name, year and extracting the max wins

positions = driver_position[driver_position['position'] == 1].groupby(
    ['surname','year'])['wins'].max().sort_values(ascending=False).reset_index(name = 'Wins')
positions.year = positions.year.dt.year
positions.rename(columns={'surname':'name'},inplace=True)
positions.Wins = positions.Wins.astype('int64')

positions.head(20)

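|   | name | year | Wins |
|---|------|------|------|
| 0 | Schumacher | 2004 | 13 |
| 1 | Vettel | 2013 | 13 |
| 2 | Hamilton | 2019 | 11 |
| 3 | Vettel | 2011 | 11 |
| 4 | Schumacher | 2002 | 11 |
| 5 | Hamilton | 2020 | 11 |
| 6 | Hamilton | 2018 | 11 |
| 7 | Hamilton | 2014 | 11 |
| 8 | Hamilton | 2015 | 10 |
| 9 | Verstappen | 2021 | 10 |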
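The `wins` column in `driver_standings.csv` is a running total that counts a driver's wins so far after each round. Taking the maximum per driver and season therefore yields the season total. The toy standings below are made up for illustration.

```python
import pandas as pd

# cumulative standings after each round (toy data)
standings = pd.DataFrame({
    'surname': ['A', 'B', 'A', 'B', 'A', 'B'],
    'round':   [1, 1, 2, 2, 3, 3],
    'wins':    [1, 0, 1, 1, 2, 1],   # running total, never decreases within a season
})

# the last (and largest) running total is the season's win count
season_wins = standings.groupby('surname')['wins'].max()
print(season_wins)
```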


In [735]:
#plotting a bubble chart

fig = px.scatter(positions.head(30), x="year", y="Wins", color="name",
                 title="Most wins by a driver in a single season",size = 'Wins')
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.update_xaxes(showgrid=False)
fig.show()

#### Most competitive seasons by points difference

In [889]:
#top five drivers of 1991 by their highest cumulative points total

competition = driver_position[driver_position['year'].dt.year == 1991
               ].groupby(['surname','year']).points.max().sort_values(ascending = False).reset_index().head(5)
competition.year = competition.year.dt.year
competition

|   | surname | year | points |
|---|---------|------|--------|
| 0 | Senna | 1991 | 96.0 |
| 1 | Mansell | 1991 | 72.0 |
| 2 | Patrese | 1991 | 53.0 |
| 3 | Berger | 1991 | 43.0 |
| 4 | Prost | 1991 | 34.0 |


In [1412]:
def rivalry(season):
    """Print a verdict on how close the title fight was in `season` and plot the top five drivers."""
    # each driver's highest cumulative points and wins in the season (full name keeps namesakes apart)
    season_df = driver_position[driver_position['year'].dt.year == season].copy()
    season_df['driver'] = season_df['forename'] + ' ' + season_df['surname']
    top_five = (season_df.groupby('driver')[['points','wins']].max()
                .sort_values('points', ascending = False).head(5).reset_index())

    # the points gap between P1 and P2 decides the verdict
    gap = top_five.loc[0, 'points'] - top_five.loc[1, 'points']
    if gap <= 10:
        verdict = 'A rivalry in the history books!'
    elif gap <= 20:
        verdict = 'Spicy!'
    elif gap < 30:
        verdict = 'Meh!'
    else:
        verdict = 'Snore Fest!'
    print('\033[1m' + verdict + '\033[0m')   #ANSI codes: bold on, then reset
    print('----------------------------------')

    fig = px.bar(top_five, x='driver', y='points',
                 hover_data=['wins'], color='points',
                 height=400, color_continuous_scale = 'turbo')
    fig.update_traces(marker=dict(line=dict(color='#000000', width=2)))
    fig.update_xaxes(showgrid=False)
    fig.update_yaxes(showgrid=False)
    fig.show()
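The verdict thresholds in `rivalry` can be pulled out into a small pure function, which makes them easy to test independently of the data and the plot. This is a sketch: `classify_gap` is a hypothetical helper name, and the thresholds are the ones used in the function above.

```python
def classify_gap(gap: float) -> str:
    """Map the points gap between P1 and P2 to a verdict."""
    if gap <= 10:
        return 'A rivalry in the history books!'
    if gap <= 20:
        return 'Spicy!'
    if gap < 30:
        return 'Meh!'
    return 'Snore Fest!'

# 2007 was decided by a single point (Räikkönen 110, Hamilton 109)
print(classify_gap(110 - 109))   # A rivalry in the history books!
```

For 1991, the table above gives Senna 96 and Mansell 72. That is a gap of 24, which lands in the "Meh!" band.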

In [2173]:
rivalry(2007)      

A rivalry in the history books!
----------------------------------


### Who has the fastest lap time in every circuit?

In [1651]:
#merging and extraction of important columns

fast = circuits.merge(races,left_on = 'circuitid',right_on='circuitId',how = 'left')
fast = fast.merge(results,on='raceId',how = 'left')
fast = fast.merge(drivers,left_on='driverId',right_on='driverid',how = 'inner')
fast.rename(columns={'name_x':'circuit_name'},inplace = True)
fast = fast[['circuit_name','country','surname','fastestLapTime','nationality','year']]

# dropping null values and converting fastestlaptime into seconds

fast = fast.dropna()
fast['fastestLapTime_seconds']=fast['fastestLapTime'].apply(lambda x: float(x.split(':')[0])*60+float(x.split(':')[1])) 
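The lambda above assumes that every lap time has the form `MM:SS.s`. The slightly more defensive sketch below also accepts times with an hours field or with seconds only, and it splits the string just once. `lap_to_seconds` is a hypothetical helper name.

```python
def lap_to_seconds(lap: str) -> float:
    """Convert 'H:MM:SS.sss', 'MM:SS.sss' or 'SS.sss' to seconds."""
    seconds = 0.0
    for part in lap.strip().split(':'):
        # each colon-separated field is worth 60 of the next field
        seconds = seconds * 60 + float(part)
    return seconds

print(lap_to_seconds('01:20.3'))
```

It can replace the lambda directly: `fast['fastestLapTime_seconds'] = fast['fastestLapTime'].apply(lap_to_seconds)`.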

In [1652]:
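#comparing numeric seconds rather than strings; note that circuits are grouped by name only,
#so different layouts of the same venue (e.g. Bahrain's outer loop used in 2020) are compared together
fast['fastest_recorded_lap'] = fast.groupby(['circuit_name'])['fastestLapTime_seconds'].transform('min')
fastest = fast[fast['fastest_recorded_lap']==fast['fastestLapTime_seconds']].sort_values('country').reset_index(drop=True)
fastest.drop(['fastest_recorded_lap'],inplace=True,axis = 1)
fastest['year'] = fastest.year.dt.year
fastest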

|   | circuit_name | country | surname | fastestLapTime | nationality | year | fastestLapTime_seconds |
|---|---|---|---|---|---|---|---|
| 0 | Albert Park Grand Prix Circuit | Australia | Leclerc | 01:20.3 | Monegasque | 2022 | 80.3 |
| 1 | Red Bull Ring | Austria | Sainz | 01:05.6 | Spanish | 2020 | 65.6 |
| 2 | Baku City Circuit | Azerbaijan | Leclerc | 01:43.0 | Monegasque | 2019 | 103.0 |
| 3 | Bahrain International Circuit | Bahrain | Russell | 00:55.4 | British | 2020 | 55.4 |
| 4 | Circuit de Spa-Francorchamps | Belgium | Räikkönen | 01:45.1 | Finnish | 2004 | 105.1 |
| 5 | Autódromo José Carlos Pace | Brazil | Bottas | 01:10.5 | Finnish | 2018 | 70.5 |
| 6 | Circuit Gilles Villeneuve | Canada | Bottas | 01:13.1 | Finnish | 2019 | 73.1 |
| 7 | Shanghai International Circuit | China | Schumacher | 01:32.2 | German | 2004 | 92.2 |
| 8 | Circuit Paul Ricard | France | Vettel | 01:32.7 | German | 2019 | 92.7 |
| 9 | Circuit de Nevers Magny-Cours | France | Schumacher | 01:15.4 | German | 2004 | 75.4 |
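`transform('min')` keeps every row that ties the record, so a circuit can list more than one record holder. If exactly one row per circuit is wanted, `groupby(...).idxmin()` returns the index label of the first minimum instead. The lap times in the toy sketch below are made up.

```python
import pandas as pd

laps = pd.DataFrame({
    'circuit_name': ['X', 'X', 'Y', 'Y'],
    'surname':      ['A', 'B', 'A', 'C'],
    'fastestLapTime_seconds': [80.3, 81.0, 65.6, 65.6],
})

# transform('min'): every row equal to its circuit's minimum (ties all kept)
ties = laps[laps.groupby('circuit_name')['fastestLapTime_seconds'].transform('min')
            == laps['fastestLapTime_seconds']]

# idxmin: one row per circuit; on a tie the earliest row wins
best = laps.loc[laps.groupby('circuit_name')['fastestLapTime_seconds'].idxmin()]
print(ties)
print(best)
```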


In [2195]:
fastest_viz = fastest.surname.value_counts().rename_axis('driver').reset_index(name= 'fastest laps') 

df = fastest_viz
fig = px.bar(df, x='driver', y='fastest laps',
         hover_data=['fastest laps'], color='fastest laps',
        height=400,color_continuous_scale= 'Blues')
fig.update_layout(
    title="Drivers holding the most circuit fastest-lap records")
fig.update_traces(textfont_size=20,
              marker=dict(line=dict(color='#000000', width=2)))
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)


### Qualifying Pole = Race Pole? 

In F1, taking pole position on Saturday does not guarantee a win on race day. What is the mark of a great driver? Is it winning every race he starts from pole, a perfect ratio of 1? Or is it winning more races than he starts from pole? Below, starts from P1 on the grid are called grid poles, and race wins are called race poles.
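A ratio of total wins to pole starts mixes wins from pole with wins from further back. The literal question, how often a driver converts a P1 start into a win, can be answered by keeping only results where `grid` equals 1 and checking whether `position` also equals 1. This is a minimal sketch on made-up rows. The column names follow `results.csv`, and `full_name` mirrors the column built in the cells below.

```python
import pandas as pd

# toy results (made up): driver B wins three races but starts from pole only once
res = pd.DataFrame({
    'full_name': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'grid':      [1, 1, 3, 1, 1, 2, 5],
    'position':  [1, 2, 1, 1, 1, 1, 1],
})

from_pole = res[res['grid'] == 1]
conversion = (from_pole.assign(won=from_pole['position'] == 1)
                       .groupby('full_name')
                       .agg(pole_starts=('won', 'size'), wins_from_pole=('won', 'sum')))
conversion['conversion_rate'] = conversion['wins_from_pole'] / conversion['pole_starts']
print(conversion)
```

Driver B has a wins-to-poles ratio of 3, but a conversion rate of 1.0. Both numbers are valid, but they answer different questions.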

##### Calculating grid pole positions

In [1799]:
#(1) merging race results with driver names ('grid' is the starting position) and (2) counting starts from P1 on the grid

driver_quali = results.merge(drivers,left_on = 'driverId',right_on='driverid',how = 'left')
driver_quali['full_name'] = driver_quali['forename'] + ' ' + driver_quali['surname']
driver_quali = driver_quali[['full_name','grid','position']]


quali_wins = driver_quali[driver_quali['grid'] == 1].groupby('full_name').size().reset_index(
                                        name = 'grid poles').sort_values('grid poles',ascending = False)

#resetting index numbers 

quali_wins = quali_wins.reset_index(drop=True)
quali_wins.head(20)

|   | full_name | grid poles |
|---|-----------|------------|
| 0 | Lewis Hamilton | 103 |
| 1 | Michael Schumacher | 68 |
| 2 | Ayrton Senna | 65 |
| 3 | Sebastian Vettel | 57 |
| 4 | Jim Clark | 34 |
| 5 | Alain Prost | 33 |
| 6 | Nigel Mansell | 32 |
| 7 | Nico Rosberg | 30 |
| 8 | Juan Fangio | 29 |
| 9 | Mika Häkkinen | 26 |


##### Calculating race wins  

In [1798]:
#merging (1) dataframes [drivers, results], (2) creating a full name column and (3) selecting the important columns 

race_wins = drivers.merge(results,left_on='driverid',right_on='driverId',how = 'left')
race_wins['full_name'] = race_wins['forename'] + ' ' + race_wins['surname']

race_wins = race_wins[['full_name','position']]

# grouping by full name and counting the number of races won

highest_rw = race_wins[race_wins['position'] == 1.0].groupby('full_name').count().sort_values(
    'position', ascending = False).reset_index()
highest_rw.head(20)

|   | full_name | position |
|---|-----------|----------|
| 0 | Lewis Hamilton | 103 |
| 1 | Michael Schumacher | 91 |
| 2 | Sebastian Vettel | 53 |
| 3 | Alain Prost | 51 |
| 4 | Ayrton Senna | 41 |
| 5 | Fernando Alonso | 32 |
| 6 | Nigel Mansell | 31 |
| 7 | Jackie Stewart | 27 |
| 8 | Max Verstappen | 26 |
| 9 | Niki Lauda | 25 |


##### Calculating RacePole / GridPole 

In [1821]:
#merging the race wins alongside starting at grid position 1 

racexpole = highest_rw.merge(quali_wins,on = 'full_name',how = 'left')
racexpole = racexpole[racexpole['grid poles'] > 10].copy()              #more than 10 grid poles; this also drops drivers with no poles (NaN)

#cosmetic changes and calculation of racexgrid pole ratio

racexpole.rename(columns={'position':'race poles'},inplace=True)
racexpole['grid poles'] = racexpole['grid poles'].astype(int)
racexpole['racexgrid'] = (racexpole['race poles']/racexpole['grid poles']).round(2)
racexpole = racexpole.sort_values('racexgrid',ascending=False).reset_index(drop=True)
racexpole.head(15)

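|   | full_name | race poles | grid poles | racexgrid |
|---|-----------|------------|------------|-----------|
| 0 | Max Verstappen | 26 | 16 | 1.62 |
| 1 | Jackie Stewart | 27 | 17 | 1.59 |
| 2 | Alain Prost | 51 | 33 | 1.55 |
| 3 | Fernando Alonso | 32 | 22 | 1.45 |
| 4 | Michael Schumacher | 91 | 68 | 1.34 |
| 5 | Kimi Räikkönen | 21 | 18 | 1.17 |
| 6 | Damon Hill | 22 | 20 | 1.1 |
| 7 | David Coulthard | 13 | 12 | 1.08 |
| 8 | Jack Brabham | 14 | 13 | 1.08 |
| 9 | Graham Hill | 14 | 13 | 1.08 |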


In [1833]:
df = racexpole.head(12)
fig = px.bar(df, x='full_name', y='racexgrid',
         hover_data=['race poles','grid poles'], color='racexgrid',
        height=400,color_continuous_scale= 'gray')
fig.update_layout(
    title="Race Poles/Grid Poles Ratio")
fig.update_traces(textfont_size=20,
              marker=dict(line=dict(color='#000000', width=2)))
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

The chart above shows that Max Verstappen has more wins (26) than starts from pole (16). His ability to thrive under pressure, as we witnessed on the final lap of the 2021 season finale in Abu Dhabi, and his aggressive driving style help him win races even when he isn't the fastest in qualifying. The ratio also supports the case for Michael Schumacher as the best ever. With 91 wins from 68 pole starts (1.34), he appears to have extracted more wins from less favourable grid positions than Lewis Hamilton, whose 103 wins exactly match his 103 pole starts (a ratio of 1.00).

### Worst tracks based on overtaking action

In [2190]:
tracks = circuits.merge(races, left_on='circuitid',right_on='circuitId',how = 'inner') 
tracks = tracks.merge(results,on = 'raceId',how = 'left')
tracks = tracks[['name_x','circuitid','driverId','position','grid','raceId','year']]
tracks.rename(columns={'name_x':'circuit'},inplace=True)
tracks.dropna(inplace = True)


tracks = tracks.loc[(tracks['year'] >= '2010-01-01')]
tracks

|   | circuit | circuitid | driverId | position | grid | raceId | year |
|---|---|---|---|---|---|---|---|
| 302 | Albert Park Grand Prix Circuit | 1 | 18.0 | 1.0 | 4.0 | 338 | 2010-01-01 |
| 303 | Albert Park Grand Prix Circuit | 1 | 9.0 | 2.0 | 9.0 | 338 | 2010-01-01 |
| 304 | Albert Park Grand Prix Circuit | 1 | 13.0 | 3.0 | 5.0 | 338 | 2010-01-01 |
| 305 | Albert Park Grand Prix Circuit | 1 | 4.0 | 4.0 | 3.0 | 338 | 2010-01-01 |
| 306 | Albert Park Grand Prix Circuit | 1 | 3.0 | 5.0 | 6.0 | 338 | 2010-01-01 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 25623 | Miami International Autodrome | 79 | 817.0 | 13.0 | 14.0 | 1078 | 2022-01-01 |
| 25624 | Miami International Autodrome | 79 | 849.0 | 14.0 | 19.0 | 1078 | 2022-01-01 |
| 25625 | Miami International Autodrome | 79 | 854.0 | 15.0 | 15.0 | 1078 | 2022-01-01 |
| 25626 | Miami International Autodrome | 79 | 825.0 | 16.0 | 16.0 | 1078 | 2022-01-01 |


In [2191]:
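#flagging drivers who finished exactly where they started (1) versus those who gained or lost places (0)
tracks['position_status'] = np.where(tracks['position'] == tracks['grid'],1,0)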
tracks

|   | circuit | circuitid | driverId | position | grid | raceId | year | position_status |
|---|---|---|---|---|---|---|---|---|
| 302 | Albert Park Grand Prix Circuit | 1 | 18.0 | 1.0 | 4.0 | 338 | 2010-01-01 | 0 |
| 303 | Albert Park Grand Prix Circuit | 1 | 9.0 | 2.0 | 9.0 | 338 | 2010-01-01 | 0 |
| 304 | Albert Park Grand Prix Circuit | 1 | 13.0 | 3.0 | 5.0 | 338 | 2010-01-01 | 0 |
| 305 | Albert Park Grand Prix Circuit | 1 | 4.0 | 4.0 | 3.0 | 338 | 2010-01-01 | 0 |
| 306 | Albert Park Grand Prix Circuit | 1 | 3.0 | 5.0 | 6.0 | 338 | 2010-01-01 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25623 | Miami International Autodrome | 79 | 817.0 | 13.0 | 14.0 | 1078 | 2022-01-01 | 0 |
| 25624 | Miami International Autodrome | 79 | 849.0 | 14.0 | 19.0 | 1078 | 2022-01-01 | 0 |
| 25625 | Miami International Autodrome | 79 | 854.0 | 15.0 | 15.0 | 1078 | 2022-01-01 | 1 |
| 25626 | Miami International Autodrome | 79 | 825.0 | 16.0 | 16.0 | 1078 | 2022-01-01 | 1 |
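The 0/1 flag only records whether a driver's position changed. A finer-grained proxy for on-track action is the number of places gained or lost, `grid - position`. The toy sketch below illustrates that idea with made-up rows. It assumes, as in the Ergast-style data used here, that `grid == 0` marks a pit-lane start, and it simply excludes those rows. Position changes also come from pit stops and retirements, so this remains a proxy for overtaking rather than a direct count.

```python
import pandas as pd

rows = pd.DataFrame({
    'circuit':  ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'grid':     [1, 2, 3, 1, 2, 0],
    'position': [1, 2, 3, 3, 1, 2],
})

# assumption: grid 0 means a pit-lane start, which has no meaningful grid slot
valid = rows[rows['grid'] > 0].copy()
valid['places_gained'] = valid['grid'] - valid['position']   # negative = places lost
valid['places_changed'] = valid['places_gained'].abs()

# mean absolute change in position per finisher: higher suggests more movement
movement = valid.groupby('circuit')['places_changed'].mean()
print(movement)
```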


In [2192]:
#counting total number of races held at each track since 2010 and attaching it to the tracks dataframe
#(rename_axis + reset_index gives the same column names in pandas 1.x and 2.x)

total_races = races.loc[(races['year'] >= '2010-01-01')]
total_races = total_races['circuitId'].value_counts().rename_axis('circuitid').reset_index(name = 'num_races')
tracks = tracks.merge(total_races,on='circuitid',how = 'left')  
tracks

|   | circuit | circuitid | driverId | position | grid | raceId | year | position_status | num_races |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Albert Park Grand Prix Circuit | 1 | 18.0 | 1.0 | 4.0 | 338 | 2010-01-01 | 0 | 11 |
| 1 | Albert Park Grand Prix Circuit | 1 | 9.0 | 2.0 | 9.0 | 338 | 2010-01-01 | 0 | 11 |
| 2 | Albert Park Grand Prix Circuit | 1 | 13.0 | 3.0 | 5.0 | 338 | 2010-01-01 | 0 | 11 |
| 3 | Albert Park Grand Prix Circuit | 1 | 4.0 | 4.0 | 3.0 | 338 | 2010-01-01 | 0 | 11 |
| 4 | Albert Park Grand Prix Circuit | 1 | 3.0 | 5.0 | 6.0 | 338 | 2010-01-01 | 0 | 11 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4384 | Miami International Autodrome | 79 | 817.0 | 13.0 | 14.0 | 1078 | 2022-01-01 | 0 | 1 |
| 4385 | Miami International Autodrome | 79 | 849.0 | 14.0 | 19.0 | 1078 | 2022-01-01 | 0 | 1 |
| 4386 | Miami International Autodrome | 79 | 854.0 | 15.0 | 15.0 | 1078 | 2022-01-01 | 1 | 1 |
| 4387 | Miami International Autodrome | 79 | 825.0 | 16.0 | 16.0 | 1078 | 2022-01-01 | 1 | 1 |


In [2193]:
#counting, per circuit, finishers who changed position (0) and who did not (1), for tracks with at least 5 races
circuit_rating = (tracks[tracks['num_races'] >= 5]
                  .groupby(['circuit','num_races','position_status']).size().reset_index(name = 'count'))

#checking for tracks that had the least amount of overtaking action:
#boring_score = average number of finishers per race who ended exactly where they started
boring = circuit_rating[circuit_rating['position_status'] == 1].copy()   #.copy() avoids SettingWithCopyWarning
boring['boring_score'] = boring['count'].divide(boring['num_races'])
boring.rename(columns={'count':'count_unchanged_position'},inplace=True)
boring_tracks = boring.sort_values('boring_score',ascending=False).reset_index(drop=True)
boring_tracks



|   | circuit | num_races | position_status | count_unchanged_position | boring_score |
|---|---|---|---|---|---|
| 0 | Yas Marina Circuit | 13 | 1 | 48 | 3.692308 |
| 1 | Hockenheimring | 6 | 1 | 19 | 3.166667 |
| 2 | Circuit de Barcelona-Catalunya | 13 | 1 | 41 | 3.153846 |
| 3 | Circuit of the Americas | 10 | 1 | 30 | 3.0 |
| 4 | Circuit de Monaco | 12 | 1 | 35 | 2.916667 |
| 5 | Marina Bay Street Circuit | 11 | 1 | 32 | 2.909091 |
| 6 | Autódromo Hermanos Rodríguez | 7 | 1 | 20 | 2.857143 |
| 7 | Shanghai International Circuit | 10 | 1 | 28 | 2.8 |
| 8 | Autódromo José Carlos Pace | 12 | 1 | 33 | 2.75 |
| 9 | Circuit Gilles Villeneuve | 11 | 1 | 30 | 2.727273 |
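`boring_score` averages a count, so a race with a bigger field (24 cars in 2010 versus 20 in 2022) has more chances to produce unchanged positions. Dividing by the number of classified finishers in each race turns the score into a share. The toy sketch below uses `raceId` and `position_status` as in the notebook, with made-up values.

```python
import pandas as pd

finishers = pd.DataFrame({
    'circuit':         ['X'] * 5 + ['Y'] * 3,
    'raceId':          [1, 1, 1, 2, 2, 3, 3, 3],
    'position_status': [1, 1, 0, 1, 0, 0, 0, 1],
})

# share of each race's finishers who ended where they started...
per_race = finishers.groupby(['circuit', 'raceId'])['position_status'].mean()
# ...then averaged over the races held at each circuit
share_unchanged = per_race.groupby(level='circuit').mean()
print(share_unchanged)
```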


In [2194]:
df = boring_tracks.head(10)
fig = px.bar(df, x='circuit', y='boring_score',
         hover_data=['num_races','count_unchanged_position'], color='num_races',
        height=400,color_continuous_scale= 'ice')
fig.update_layout(
    title="Worst Tracks to Overtake")
fig.update_traces(textfont_size=20,
              marker=dict(line=dict(color='#000000', width=2)))
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)

The data is filtered to include only races from 2010 onwards. The boring score is the average number of classified finishers per race who finished exactly where they started. Monaco, one of the most iconic venues in F1 history, makes this list. The circuit is poorly suited to modern F1 cars, which are much bigger than the race cars of the 1990s, and its narrow streets leave drivers very little room to overtake on race day.