# F1 2022 Data Analysis
### (with a little data science)

This Jupyter notebook analyzes driver statistics from the 2022 Formula 1 season. It currently uses the __Race Result__ data from the Formula 1 website (e.g. https://www.formula1.com/en/results.html/2022/races/1125/saudi-arabia/race-result.html).

The race results are saved in the `data` folder, with a separate CSV file for each race.
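Since every race has its own CSV file, the files can also be read in a single loop rather than with one `read_csv` call per race. Below is a minimal sketch. It assumes every `.csv` file in `data/` is a race result with the same columns. `race_name_from_path` and `load_race_results` are hypothetical helpers, not part of this notebook:

```python
from pathlib import Path

import pandas as pd


def race_name_from_path(path):
    """Return the file name without its extension, e.g. 'BAHRAIN' for 'data/BAHRAIN.csv'."""
    return Path(path).stem


def load_race_results(folder="data"):
    """Read every CSV file in `folder` into a dict of DataFrames keyed by race name.

    Assumes each CSV holds one race result with the same column layout.
    """
    results = {}
    # sorted() gives a stable (alphabetical) load order
    for csv_path in sorted(Path(folder).glob("*.csv")):
        results[race_name_from_path(csv_path)] = pd.read_csv(csv_path)
    return results
```

For example, `races = load_race_results('data')` followed by `races['BAHRAIN'].head()`. The files load in alphabetical order, not calendar order, so label or sort them explicitly if the race order matters.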

__HAVE YOU READ THE README FILE? PLEASE DO BEFORE USING THIS JUPYTER NOTEBOOK!__

In [6]:
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import cufflinks as cf
```

In [7]:
```python
%matplotlib inline
```

In [8]:
```python
cf.go_offline()  # switch cufflinks to offline mode so plots render in the notebook without Plotly's online service
```

In [40]:
```python
# empty dataframe for all race data; the columns match the race CSV files
race_results = pd.DataFrame(columns=['Pos', 'No', 'Driver', 'Car', 'Laps', 'Time/Retired', 'PTS'])

race_results.head()
```

|  | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|


In [42]:
```python
# example of loading CSV data for one race
bahrain_df = pd.read_csv('data/BAHRAIN.csv')
bahrain_df.head()
```

|  | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 16 | Charles Leclerc | Ferrari | 57 | 37:33.6 | 26 |
| 1 | 2 | 55 | Carlos Sainz | Ferrari | 57 | +5.598s | 18 |
| 2 | 3 | 44 | Lewis Hamilton | Mercedes | 57 | +9.675s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 57 | +11.211s | 12 |
| 4 | 5 | 20 | Kevin Magnussen | Haas Ferrari | 57 | +14.754s | 10 |
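The `Time/Retired` column mixes two formats. The winner's row holds the race time, while the other rows hold the gap to the winner as text such as `+5.598s`. To compare gaps numerically, they have to be parsed. Below is a minimal sketch that assumes the `+N.NNNs` pattern seen above. Any other value becomes `NaN`, including lapped-car or retirement labels, which this output does not show. The function name `gap_to_winner_seconds` is hypothetical:

```python
import re

import numpy as np
import pandas as pd

# matches gaps such as "+5.598s", the format shown in the Time/Retired column
GAP_PATTERN = re.compile(r"^\+(\d+(?:\.\d+)?)s$")


def gap_to_winner_seconds(value):
    """Return the gap to the winner in seconds, or NaN if `value` is not a '+N.NNNs' gap."""
    match = GAP_PATTERN.match(str(value).strip())
    return float(match.group(1)) if match else np.nan


# sample values copied from the Bahrain output above
times = pd.Series(["37:33.6", "+5.598s", "+9.675s", "+11.211s", "+14.754s"])
gaps = times.map(gap_to_winner_seconds)
gaps.iloc[0] = 0.0  # the winner's row holds the race time, so its gap is zero
print(gaps.tolist())
```

On a full race, the equivalent call would be `bahrain_df['Time/Retired'].map(gap_to_winner_seconds)`, with the winner's row set to 0 separately.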


In [49]:
```python
bahrain_df.describe()
```

|  | No | Laps | PTS |
|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 |
| mean | 25.8 | 56.15 | 5.1 |
| std | 21.142623 | 2.942877 | 7.503683 |
| min | 1.0 | 44.0 | 0.0 |
| 25% | 10.75 | 57.0 | 0.0 |
| 50% | 21.0 | 57.0 | 0.5 |
| 75% | 34.25 | 57.0 | 8.5 |
| max | 77.0 | 57.0 | 26.0 |


In [50]:
```python
bahrain_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Pos           20 non-null     object
 1   No            20 non-null     int64 
 2   Driver        20 non-null     object
 3   Car           20 non-null     object
 4   Laps          20 non-null     int64 
 5   Time/Retired  20 non-null     object
 6   PTS           20 non-null     int64 
dtypes: int64(3), object(4)
memory usage: 1.2+ KB
```
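`info()` lists `Pos` as `object` rather than `int64`, which means at least one position in the file is not a plain number. On the Formula 1 results pages, drivers who are not classified are typically labelled with text such as `NC`. Check the CSV to see which labels actually occur. Below is a minimal sketch of converting the column to integers while turning any text label into a missing value, using a hypothetical sample:

```python
import pandas as pd

# hypothetical sample: numeric positions stored as text, plus one non-numeric label
positions = pd.Series(["1", "2", "3", "NC"])

# errors="coerce" turns anything that is not a number into NaN;
# the nullable "Int64" dtype keeps the remaining values as integers instead of floats
numeric_positions = pd.to_numeric(positions, errors="coerce").astype("Int64")
print(numeric_positions.tolist())
```

On the real data, this would be `bahrain_df['Pos'] = pd.to_numeric(bahrain_df['Pos'], errors='coerce').astype('Int64')`.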


In [10]:
```python
saudi_arabia_df = pd.read_csv('data/SAUDI Arabia.csv')
saudi_arabia_df.head()
```

|  | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 50 | 24:19.3 | 25 |
| 1 | 2 | 16 | Charles Leclerc | Ferrari | 50 | +0.549s | 19 |
| 2 | 3 | 55 | Carlos Sainz | Ferrari | 50 | +8.097s | 15 |
| 3 | 4 | 11 | Sergio Perez | Red Bull Racing RBPT | 50 | +10.800s | 12 |
| 4 | 5 | 63 | George Russell | Mercedes | 50 | +32.732s | 10 |


In [51]:
```python
saudi_arabia_df.describe()
```

|  | No | Laps | PTS |
|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 |
| mean | 25.8 | 40.8 | 5.1 |
| std | 21.142623 | 16.637307 | 7.454423 |
| min | 1.0 | 0.0 | 0.0 |
| 25% | 10.75 | 35.75 | 0.0 |
| 50% | 21.0 | 50.0 | 0.5 |
| 75% | 34.25 | 50.0 | 8.5 |
| max | 77.0 | 50.0 | 25.0 |
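The `describe()` output for Saudi Arabia shows a minimum of 0 laps, so at least one car did not complete a lap. The 25th percentile of 35.75 shows that several drivers completed fewer than the full 50 laps. Below is a minimal sketch of measuring how many laps each driver finished behind the winner. It uses hypothetical driver names and lap counts in the same shape as `saudi_arabia_df`, and the `Laps down` column name is only an illustration:

```python
import pandas as pd

# hypothetical sample in the same shape as saudi_arabia_df
sample = pd.DataFrame({
    "Driver": ["Driver A", "Driver B", "Driver C", "Driver D"],
    "Laps": [50, 50, 49, 0],
})

# the winner's lap count is the race distance; anyone below it was lapped or retired
race_distance = sample["Laps"].max()
sample["Laps down"] = race_distance - sample["Laps"]
print(sample[sample["Laps down"] > 0])
```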


In [52]:
```python
saudi_arabia_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Pos           20 non-null     object
 1   No            20 non-null     int64 
 2   Driver        20 non-null     object
 3   Car           20 non-null     object
 4   Laps          20 non-null     int64 
 5   Time/Retired  20 non-null     object
 6   PTS           20 non-null     int64 
dtypes: int64(3), object(4)
memory usage: 1.2+ KB
```


In [13]:
```python
australia_df = pd.read_csv('data/AUSTRALIA.csv')
australia_df.head()
```

|  | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 16 | Charles Leclerc | Ferrari | 58 | 27:46.5 | 26 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 58 | +20.524s | 18 |
| 2 | 3 | 63 | George Russell | Mercedes | 58 | +25.593s | 15 |
| 3 | 4 | 44 | Lewis Hamilton | Mercedes | 58 | +28.543s | 12 |
| 4 | 5 | 4 | Lando Norris | McLaren Mercedes | 58 | +53.303s | 10 |


In [53]:
```python
australia_df.describe()
```

|  | No | Laps | PTS |
|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 |
| mean | 24.7 | 52.1 | 5.1 |
| std | 21.64328 | 14.962761 | 7.503683 |
| min | 1.0 | 1.0 | 0.0 |
| 25% | 9.0 | 57.0 | 0.0 |
| 50% | 19.0 | 58.0 | 0.5 |
| 75% | 34.25 | 58.0 | 8.5 |
| max | 77.0 | 58.0 | 26.0 |


In [54]:
```python
australia_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Pos           20 non-null     object
 1   No            20 non-null     int64 
 2   Driver        20 non-null     object
 3   Car           20 non-null     object
 4   Laps          20 non-null     int64 
 5   Time/Retired  20 non-null     object
 6   PTS           20 non-null     int64 
dtypes: int64(3), object(4)
memory usage: 1.2+ KB
```


In [16]:
```python
# collect the per-race DataFrames so they can be combined in one step
df_list = [bahrain_df, saudi_arabia_df, australia_df]
```
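Once the races are concatenated, no column records which race a row came from. One option is `pd.concat` with `keys`, which labels each input DataFrame. Below is a minimal sketch. The race labels are assumptions, and the two small DataFrames stand in for `bahrain_df` and `saudi_arabia_df`, using values from the outputs above:

```python
import pandas as pd

# stand-ins for bahrain_df and saudi_arabia_df (values taken from the outputs above)
bahrain = pd.DataFrame({"Driver": ["Charles Leclerc", "Carlos Sainz"], "PTS": [26, 18]})
saudi = pd.DataFrame({"Driver": ["Max Verstappen", "Charles Leclerc"], "PTS": [25, 19]})

# keys= labels each block; names= gives the two resulting index levels readable names
combined = pd.concat([bahrain, saudi], keys=["Bahrain", "Saudi Arabia"], names=["Race", "Row"])

# move the race label into an ordinary column and drop the per-race row numbers
combined = combined.reset_index(level="Race").reset_index(drop=True)
print(combined)
```

Keeping the race as a column, rather than in the index, makes later filtering such as `combined[combined['Race'] == 'Bahrain']` straightforward.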

In [41]:
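```python
# pd.concat returns a new DataFrame and leaves its inputs unchanged,
# so the combined result must be assigned back to race_results
race_results = pd.concat(df_list, ignore_index=True)

race_results.head()
```

|  | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 16 | Charles Leclerc | Ferrari | 57 | 37:33.6 | 26 |
| 1 | 2 | 55 | Carlos Sainz | Ferrari | 57 | +5.598s | 18 |
| 2 | 3 | 44 | Lewis Hamilton | Mercedes | 57 | +9.675s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 57 | +11.211s | 12 |
| 4 | 5 | 20 | Kevin Magnussen | Haas Ferrari | 57 | +14.754s | 10 |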
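Once the races are in one DataFrame, each driver's total comes from a group-by on `Driver` that sums `PTS`. Below is a minimal sketch on a handful of rows copied from the race outputs above, so the totals are illustrative rather than the real standings:

```python
import pandas as pd

# a few rows from the Bahrain, Saudi Arabia and Australia outputs above
rows = pd.DataFrame({
    "Driver": ["Charles Leclerc", "Carlos Sainz", "Max Verstappen",
               "Charles Leclerc", "Charles Leclerc", "Sergio Perez"],
    "PTS": [26, 18, 25, 19, 26, 18],
})

# total points per driver, highest first
standings = rows.groupby("Driver")["PTS"].sum().sort_values(ascending=False)
print(standings)
```

On the combined data, this would be `race_results.groupby('Driver')['PTS'].sum().sort_values(ascending=False)`.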
