<a href="https://www.kaggle.com/code/sanketdevhare98/f1-eda?scriptVersionId=120051663" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# **What is F1?**

Formula One is one of the most popular sports in the world. It is the highest class of international racing for single-seater formula racing cars. Formula One is sanctioned by the Fédération Internationale de l’Automobile (FIA), which was established on 20 June 1904. The championship was inaugurated on 13 May 1950 at Silverstone in the United Kingdom as the World Drivers’ Championship, and in 1981 it became known as the FIA Formula One World Championship.

A season consists of a series of races, each called a Grand Prix, held all over the world. The word “formula” refers to the set of rules that all participating teams have to adhere to. Grand Prix is a French term that translates as “grand prize” in English. The races are run on tracks that are graded “1” by the FIA, hence the name Formula One.

Most races take place on purpose-built tracks certified by the FIA. Most tracks are situated in remote locations that are well connected with cities. A few races, such as the Monaco Grand Prix and the Singapore Grand Prix, are held on closed public roads. Formula One is one of the premier forms of racing in the world and draws huge audiences.

A driver participating in a Formula One race must hold a valid Super Licence issued by the FIA. The performance of the drivers and of the constructors of the cars is evaluated at the end of each race through a points system. At the end of a season, the FIA totals the points scored by each and awards two annual World Championships: one for the drivers and one for the constructors.

# **Importing Required Libraries**

In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
import datetime 

# **Getting the Data**

In [2]:
circuits = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/circuits.csv')
laptimes = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/lap_times.csv')
pitstops = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/pit_stops.csv')
seasons = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/seasons.csv' , parse_dates = ['year'])
status = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/status.csv')
constructor_standings = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/constructor_standings.csv')
constructors = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/constructors.csv')
driver_standings = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/driver_standings.csv')
drivers = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/drivers.csv')
races = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/races.csv', parse_dates = ['year'])
constructor_results = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/constructor_results.csv')
results = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/results.csv')
qualifying = pd.read_csv('/kaggle/input/formula-1-world-championship-1950-2020/qualifying.csv')

# **Who are the most successful constructors?**

Constructors in Formula 1 are the companies, entities, or manufacturers that are responsible for developing and constructing the cars used for racing in F1. Each constructor is represented by a racing team and two racing drivers that compete in every Grand Prix to win World Championship points.

Constructors are a vital part of the sport: they are the teams behind the cars and the drivers, and without the constructors' championship no team would push as hard for success as it does.

In [3]:
#merging the constructors dataframe with race results
team = constructors.merge(results,on = 'constructorId',how = 'left')
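A left join like this one keeps every constructor, including any that never appear in the results, and silently multiplies rows if the key is duplicated on the left. As a minimal sketch on made-up toy frames (the team names and numbers below are invented, not taken from the dataset), pandas' `validate` and `indicator` options can make both cases visible:

```python
import pandas as pd

# Toy stand-ins for constructors.csv and results.csv (all values are made up).
constructors = pd.DataFrame({
    "constructorId": [1, 2, 3],
    "name": ["Alpha", "Beta", "Gamma"],
})
results = pd.DataFrame({
    "resultId": [10, 11, 12],
    "raceId": [100, 100, 101],
    "constructorId": [1, 2, 1],
    "points": [25.0, 18.0, 25.0],
})

# validate raises MergeError if constructorId is not unique on the left side;
# indicator adds a _merge column recording where each row came from.
team = constructors.merge(
    results,
    on="constructorId",
    how="left",
    validate="one_to_many",
    indicator=True,
)

# Constructors with no results survive the left join with NaN result columns.
unmatched = team.loc[team["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

In this toy data only "Gamma" has no results. Whether the real `constructors.csv` contains such rows is something the check would reveal rather than something assumed here.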

In [4]:
#extracting the columns needed and grouping it by constructor name, extracting the total races entered
best = team[['name','points','raceId']]
best = best.groupby('name')['raceId'].nunique().sort_values(ascending = False).reset_index(name = 'races')
best = best[best['races'] >= 100]
best.head()

| | name | races |
|---|---|---|
| 0 | Ferrari | 1054 |
| 1 | McLaren | 883 |
| 2 | Williams | 797 |
| 3 | Tyrrell | 433 |
| 4 | Renault | 403 |


In [5]:
#building a formula to calculate points per race 
func = lambda x:x.points.sum()/x.raceId.nunique()
data = team[team['name'].isin(best.name)].groupby('name').apply(func).sort_values(ascending = False).reset_index(name = 'points_per_race')
data.head(10)

| | name | points_per_race |
|---|---|---|
| 0 | Mercedes | 25.548487 |
| 1 | Red Bull | 18.235632 |
| 2 | Ferrari | 9.615057 |
| 3 | McLaren | 6.962061 |
| 4 | Force India | 5.179245 |
| 5 | Williams | 4.514429 |
| 6 | Renault | 4.409429 |
| 7 | Benetton | 3.313462 |
| 8 | BRM | 2.584135 |
| 9 | Team Lotus | 2.518987 |


In [6]:
#plotting the results
fig = go.Figure(data = [go.Bar(x = data.name,y = data['points_per_race'])],layout_title_text = "Constructors' Points per Race")
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.show()

Mercedes and Red Bull have been highly consistent over the past decade, and their points-per-race figures reflect it. Ferrari, by contrast, has not won a world championship since 2008. An interesting case is Force India: a small-budget team compared with giants like Mercedes and Ferrari, it did fantastic work on the track, averaging about five points per race.
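One caveat on the points-per-race comparison: the scoring system itself has changed. A win was worth 10 points or fewer until 2009 and has been worth 25 since 2010, which flatters teams whose results fall mostly in the recent era. A rough way to level the field is to re-score every finish on a single scale. The sketch below does that on made-up toy rows. It assumes the merged frame carries a `positionOrder` column holding the finishing order, and it ignores fastest-lap points, sprint points and half-points races, so treat it as an illustration rather than a definitive adjustment.

```python
import pandas as pd

# Current top-ten scale (25-18-15-12-10-8-6-4-2-1), applied to every era.
MODERN_POINTS = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10, 6: 8, 7: 6, 8: 4, 9: 2, 10: 1}

# Toy rows shaped like the merged `team` frame; team names and values are made up.
toy = pd.DataFrame({
    "name": ["Old Team", "Old Team", "New Team", "New Team"],
    "raceId": [1, 2, 3, 4],
    "positionOrder": [1, 2, 1, 11],   # assumed column: finishing order
    "points": [9.0, 6.0, 25.0, 0.0],  # 9 for a win in an older era, 25 today
})

# Re-score each finish on the modern scale; positions outside the top ten get 0.
toy["adj_points"] = toy["positionOrder"].map(MODERN_POINTS).fillna(0)

adjusted = (
    toy.groupby("name")
    .agg(
        raw_points=("points", "sum"),
        adj_points=("adj_points", "sum"),
        races=("raceId", "nunique"),
    )
    .assign(adj_points_per_race=lambda d: d["adj_points"] / d["races"])
    .sort_values("adj_points_per_race", ascending=False)
)
print(adjusted)
```

In the toy data, "New Team" leads on raw points (25 vs 15), but "Old Team" comes out ahead once both are scored on the same scale.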

In [7]:
#calculating historic overall points of top 10 constructors
historic_points = team.groupby('name').agg({'points':'sum'}).sort_values('points',ascending = False).reset_index().head(10)
historic_points

| | name | points |
|---|---|---|
| 0 | Ferrari | 10134.27 |
| 1 | Mercedes | 6923.64 |
| 2 | Red Bull | 6346.0 |
| 3 | McLaren | 6147.5 |
| 4 | Williams | 3598.0 |
| 5 | Renault | 1777.0 |
| 6 | Force India | 1098.0 |
| 7 | Team Lotus | 995.0 |
| 8 | Benetton | 861.5 |
| 9 | Tyrrell | 711.0 |


In [8]:
#plotting a bar chart
fig = go.Figure(
    data=[go.Bar(x = historic_points.name, y=historic_points['points'])],
    layout_title_text="Constructors' Historic Points"
)
fig.update_traces(textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=2)))
fig.show()

The most impressive result in the chart is Mercedes AMG Petronas in second place, considering that the team has been on the grid in its current form only since 2010. In twelve years it has scored about two-thirds of Ferrari's historic points total, an astonishing feat.

# **Who are some of the best F1 drivers?**

Seventy-two years of F1 history have produced 34 different champions. The cars have changed tremendously over that time: from the screaming V10s of the early 2000s to today's exceptionally engineered and safer V6s, the sport has changed considerably in the last 20 years alone. Some drivers on the grid enjoyed their best years in one era before giving way to younger drivers. Let's start with charts that show how drivers and champions are distributed by nation.



**Distribution by Geography**

In [9]:
# grouping by nationality, counting the drivers and plotting a pie chart of the ten largest groups

driver_nationality = drivers.groupby('nationality')['nationality'].count().sort_values(ascending = False).reset_index(name = 'number of drivers')
top_nationalities = driver_nationality.head(10)  # slice once so labels and values have the same length
fig = go.Figure(data=[go.Pie(labels=top_nationalities['nationality'], values=top_nationalities['number of drivers'])])
fig.update_traces(textfont_size=18,marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(title="Historical Driver Nationality Distribution since 1950 (Top 10)")
fig.show()

In [10]:
#merging drivers, driver standings and race data 

driver_position = drivers.merge(driver_standings,left_on='driverId',right_on='driverId',how = 'left')
driver_position = driver_position.merge(races,on = 'raceId',how = 'left')


In [11]:
#grouping by nationality year and surname to get the max points achieved every season

champions = driver_position.groupby(['nationality','year','surname'])[['points','wins']].max().sort_values('points',ascending = False).reset_index()
champions.drop_duplicates(subset=['year'], inplace=True)
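Grouping by surname is enough to find each season's top scorer, but any later count of distinct surnames will merge champions who share one: Graham and Damon Hill are both British champions, yet they collapse into a single "Hill". Here is a small sketch of the difference, keyed on `driverId` instead. It uses a toy frame with one season-end row per driver and illustrative point totals, whereas the notebook's `driver_position` holds standings after every race, so the per-season maximum would still need to be taken first.

```python
import pandas as pd

# Toy season-end standings; the point totals are illustrative.
standings = pd.DataFrame({
    "driverId": [1, 2, 3, 4],
    "forename": ["Graham", "Damon", "Jim", "Michael"],
    "surname": ["Hill", "Hill", "Clark", "Schumacher"],
    "nationality": ["British", "British", "British", "German"],
    "year": [1962, 1996, 1962, 1996],
    "points": [42.0, 97.0, 30.0, 59.0],
})

# Pick the top scorer of each season by row, so driverId travels with the result.
top_rows = standings.loc[standings.groupby("year")["points"].idxmax()]

# Counting surnames merges the two Hills; counting driverId keeps them apart.
by_surname = top_rows.groupby("nationality")["surname"].nunique()
by_driver = top_rows.groupby("nationality")["driverId"].nunique()

print(by_surname["British"], by_driver["British"])
```

With this toy data the surname count gives one British champion and the `driverId` count gives two.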

In [12]:
# counting the number of times a nation ended in P1 and plotting a pie chart

# value_counts() returns a Series indexed by nationality; using it directly avoids
# depending on the column name that to_frame() produces, which differs between pandas versions
champion_nations = champions['nationality'].value_counts()
fig = go.Figure(data=[go.Pie(labels=champion_nations.index, values=champion_nations.values)])
fig.update_traces(textfont_size=18,marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(title="Distribution of Historic Champions by Nation")
fig.show()

To understand why F1 has seen so many British drivers and champions, we have to go back to WWII and the aerial battles fought against Germany over the English Channel. The constant air war forced the British to build a large number of airfields. After WWII and the fall of Nazi Germany, these airfields were left without a purpose, until a band of British motor enthusiasts decided to turn them into improvised race tracks. Soon enough, this attracted racing drivers, along with engineers who had worked on complex fighter aircraft engines during the war and now set out to build the best race cars and test them on the converted airfields. One of these airfields went on to become the "Mecca" of racing: the Silverstone Circuit. Over the years, the influx of racing talent to Britain has led many F1 teams to set up their headquarters in the region. Six out of ten constructors in 2022 have their offices in the United Kingdom.

In [13]:
#grouping by nationality year and surname to get the max points achieved every season and dropping year duplicates

champion_drivers = driver_position.groupby(['nationality','year','surname'])[['points','wins']].max().sort_values('points',ascending = False).reset_index()
champion_drivers.drop_duplicates(subset=['year'], inplace=True)

#grouping by nationality and counting the surname of drivers 

final = champion_drivers.groupby('nationality')['surname'].nunique().reset_index(name = 'champions').sort_values(by='champions',ascending = False)

#merging both the datasets and creating a column to calculate the ratio

ratios = final.merge(driver_nationality,on='nationality',how='inner')
ratios['perc_winners'] = (ratios.champions/ratios['number of drivers']*100).round(2)
ratios = ratios.sort_values('perc_winners',ascending = False)
ratios.head(5) 

| | nationality | champions | number of drivers | perc_winners |
|---|---|---|---|---|
| 2 | Finnish | 3 | 9 | 33.33 |
| 6 | Austrian | 2 | 15 | 13.33 |
| 5 | Australian | 2 | 18 | 11.11 |
| 12 | New Zealander | 1 | 9 | 11.11 |
| 1 | Brazilian | 3 | 32 | 9.38 |


In [14]:
#creating a bar chart

df = ratios
fig = px.bar(df, x='nationality', y='perc_winners',hover_data=['champions','number of drivers'], color='number of drivers',height=400)
fig.update_traces(textfont_size=20,marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(title="Champions from a nation with respect to total drivers from the nation")
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.show()

Can we put Finnish success in F1 down to chance, given that only nine Finnish drivers have entered the grid, or is there more to it? Digging deeper suggests a few reasons why a country of just five million people produces such good racing drivers.

One reason could be early exposure to cars: driving in Finland's harsh, cold conditions demands skills that are best learned young. Kimi Räikkönen, a Finn and Ferrari's most recent world champion, put it this way: "Our roads and long winters. You really have to be a good driver to survive in Finland. It is always slippery and bumpy." The second reason is Finland's lively racing culture among young and old alike; the country has nearly 20 official karting tracks, one of the highest numbers per capita in the world.

Motor racing also requires a level of composure that is rarely called for in everyday life, mainly because drivers blast down straights and through curves at over 250 km/h. The Finns seem to have this in their blood; they call it sisu. In English, it loosely translates as a stoic sense of determination and purpose, combined with accepting the outcome without judgement. Kimi Räikkönen is not called "The Iceman" for nothing; Finnish drivers are racers of few words. When Kimi's car caught fire at the 2006 Monaco GP while he was fighting at the front, he simply strolled away from the car and onto his yacht, and watched the rest of the race over beers and champagne.
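The chance question can also be probed with a little arithmetic. As a rough sketch, treat every driver from a nation as an independent trial with the same probability of becoming champion. That is a strong simplifying assumption, made here only for illustration. A Wilson score interval then shows how wide the uncertainty around "3 out of 9" really is. The `wilson_interval` helper below is written for this example and is not part of the notebook.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Counts taken from the ratios table: 3 champions among 9 Finnish drivers,
# 3 champions among 32 Brazilian drivers.
for nation, champs, total in [("Finnish", 3, 9), ("Brazilian", 3, 32)]:
    low, high = wilson_interval(champs, total)
    print(f"{nation}: {champs}/{total} -> {low:.1%} to {high:.1%}")
```

For Finland the interval runs from roughly 12% to 65%. The 33% headline figure is striking, but nine drivers are too few to pin the underlying rate down with any precision.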

# **Most wins by a driver in a single season**

In [15]:
#merging driver data, their standings and race data

driver_position = drivers.merge(driver_standings,left_on='driverId',right_on='driverId',how = 'left')
driver_position = driver_position.merge(races,on = 'raceId',how = 'left')

In [16]:
#filtering the dataset to rows where the driver leads the standings (position 1), grouping by name and year, and extracting the max wins

positions = driver_position[driver_position['position'] == 1].groupby(['surname','year'])['wins'].max().sort_values(ascending=False).reset_index(name = 'Wins')
positions['year'] = positions['year'].dt.year  # 'year' was parsed as a datetime when races.csv was loaded
positions = positions.rename(columns={'surname':'name'})
positions['Wins'] = positions['Wins'].astype('int64')

positions.head(20)

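| | name | year | Wins |
|---|---|---|---|
| 0 | Verstappen | 2022 | 15 |
| 1 | Vettel | 2013 | 13 |
| 2 | Schumacher | 2004 | 13 |
| 3 | Hamilton | 2019 | 11 |
| 4 | Hamilton | 2020 | 11 |
| 5 | Hamilton | 2018 | 11 |
| 6 | Hamilton | 2014 | 11 |
| 7 | Vettel | 2011 | 11 |
| 8 | Schumacher | 2002 | 11 |
| 9 | Hamilton | 2015 | 10 |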


In [17]:
#plotting a bubble chart

fig = px.scatter(positions.head(30), x="year", y="Wins", color="name",title="Most wins by a driver in a single season",size = 'Wins')
fig.update_traces(textfont_size=20,marker=dict(line=dict(color='#000000', width=2)))
fig.update_xaxes(showgrid=False)
fig.show()
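Raw win counts favour longer calendars, so it can help to look at wins as a share of the races held that season. The sketch below does this for the top three rows of the table. The win counts come from the table above, while the races-per-season figures (22, 19 and 18) are typed in by hand for this example. In the notebook they could instead be counted from `races`, for instance by grouping `raceId` by year.

```python
import pandas as pd

# Wins come from the table above; season_races is entered by hand here and
# would be derived from the `races` frame in the notebook.
top = pd.DataFrame({
    "name": ["Verstappen", "Vettel", "Schumacher"],
    "year": [2022, 2013, 2004],
    "Wins": [15, 13, 13],
    "season_races": [22, 19, 18],
})

# Share of the season's races that the driver won.
top["win_rate"] = (top["Wins"] / top["season_races"]).round(3)
print(top.sort_values("win_rate", ascending=False))
```

On this measure the order flips: Schumacher's 2004 season comes out ahead of Vettel's 2013, with Verstappen's 2022 third.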

Verstappen's 15 wins in 2022 top the table, ahead of the 13 wins that Schumacher and Vettel each took in their utterly dominant 2004 and 2013 seasons, respectively. The case for the best F1 driver is strong for both Schumacher (blue) and Hamilton (green), and it wouldn't be wrong to choose either as the best of all time, but is the number of championships all that matters? Some argue that what makes a champion the best is the calibre of the rivals they faced: Senna vs Prost, Hamilton vs Rosberg. The best-ever debate is subjective in most modern sports, so I'll leave that up to you to decide. I've written a Python script (link below the blog) that takes a year as input and classifies that season as either one of the greatest rivalries or simply meh. Maybe that will help you decide who the greatest ever is.