# F1 2022 Data Analysis
### (with a little data science)

This Jupyter notebook analyses driver stats from the 2022 Formula 1 season. It currently uses the __Race Result__ data from the Formula 1 website (e.g. https://www.formula1.com/en/results.html/2022/races/1125/saudi-arabia/race-result.html).

The race results are saved in the data folder with a separate CSV file for each race.

__HAVE YOU READ THE README FILE? PLEASE DO BEFORE USING THIS JUPYTER NOTEBOOK!__

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import cufflinks as cf
import re
from datetime import datetime
import time

In [2]:
%matplotlib inline

In [3]:
cf.go_offline()  # run cufflinks in offline mode so plots render inside the notebook

In [4]:
# empty data frame that will hold the results of every race
race_results = pd.DataFrame(columns=['POS', 'Driver', 'Car', 'Laps', 'Time/Retired', 'PTS', 'Race'])

race_results.head()

|   | POS | Driver | Car | Laps | Time/Retired | PTS | Race |
|---|---|---|---|---|---|---|---|


In [39]:
race_results.info()

<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   POS           0 non-null      object
 1   Driver        0 non-null      object
 2   Car           0 non-null      object
 3   Laps          0 non-null      object
 4   Time/Retired  0 non-null      object
 5   PTS           0 non-null      object
 6   Race          0 non-null      object
dtypes: object(7)
memory usage: 0.0+ bytes
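The info() output shows every column stored as object (text). A minimal sketch for converting the numeric columns before doing arithmetic on them, assuming POS, Laps and PTS should hold numbers. The helper name to_numeric_columns and the sample rows are hypothetical; errors="coerce" turns any entry that is not a number (for example a position code such as NC, if one appears) into NaN instead of raising.

```python
import pandas as pd


def to_numeric_columns(df, columns=("POS", "Laps", "PTS")):
    """Return a copy of df with the given columns converted to numbers."""
    converted = df.copy()
    for column in columns:
        # errors="coerce" replaces unparseable entries with NaN
        converted[column] = pd.to_numeric(converted[column], errors="coerce")
    return converted


# Hypothetical rows in the race_results layout, stored as text
sample = pd.DataFrame({
    "POS": ["1", "2", "NC"],
    "Driver": ["Charles Leclerc", "Carlos Sainz", "Driver C"],
    "Car": ["Ferrari", "Ferrari", "Car C"],
    "Laps": ["57", "57", "44"],
    "Time/Retired": ["37:33.6", "+5.598s", "DNF"],
    "PTS": ["26", "18", "0"],
    "Race": ["BAHRAIN", "BAHRAIN", "BAHRAIN"],
})
numeric = to_numeric_columns(sample)
print(numeric.dtypes)
```

In the notebook this would be `race_results = to_numeric_columns(race_results)`, run once all the races are loaded. The original frame is left untouched because the helper works on a copy.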


# Export race_results data frame to CSV
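This writes the race_results data frame to data/compiled-data/race-results.csv. Run it after all the races below have been added; otherwise the file will only contain the races added so far.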

In [49]:
race_results.to_csv('data/compiled-data/race-results.csv')
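The cell above expects the data/compiled-data folder to exist already, and by default to_csv also writes the row index as an unnamed first column. A sketch of a slightly more defensive export, assuming the index is not needed downstream (drop index=False if something relies on it); export_race_results is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_race_results(race_results, path="data/compiled-data/race-results.csv"):
    """Write race_results to CSV, creating the output folder if it is missing."""
    output = Path(path)
    # to_csv does not create missing folders, so create them first
    output.parent.mkdir(parents=True, exist_ok=True)
    # index=False leaves out the row labels, which hold no race information
    race_results.to_csv(output, index=False)
    return output


# Example: write to a temporary folder and read the file back
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({"POS": [1], "Driver": ["Charles Leclerc"], "Race": ["BAHRAIN"]})
    written = export_race_results(demo, Path(tmp) / "compiled-data" / "race-results.csv")
    round_trip = pd.read_csv(written)
```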

# Add Data Function

This function adds the results of one race to the race_results data frame.

It takes the following arguments:
* _country_df_ - the data frame holding the results of the race to be added
* _country_name_ - the label stored in the Race column, usually the host country (e.g. BAHRAIN, or ITALY2 for the second race in Italy)
* _race_results_ - the data frame that stores the data from every race
* _commit_mode_ - set to False to print the rows that would be added, without adding them; set to True to add them to race_results
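With commit_mode set to True, every call appends rows, so re-running a race cell adds that race a second time. A small guard that checks the Race column first; race_already_added is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def race_already_added(race_results, country_name):
    """Return True if race_results already holds rows for country_name."""
    return bool((race_results["Race"] == country_name).any())


results = pd.DataFrame({"Driver": ["Charles Leclerc"], "Race": ["BAHRAIN"]})
print(race_already_added(results, "BAHRAIN"))  # True
print(race_already_added(results, "MIAMI"))    # False
```

Used before a commit, it would look like `if not race_already_added(race_results, "BAHRAIN"): addRaceData(bahrain_df, "BAHRAIN", race_results, True)`.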

In [5]:
def addRaceData(country_df, country_name, race_results, commit_mode):
    """Adds race data to the race_results data frame.

    With commit_mode=False the rows are only printed and race_results is
    left unchanged.
    """
    print("Adding data for " + country_name)
    print("Commit mode is set to " + str(commit_mode))

    try:
        # country_df is expected to have the default 0..n-1 index from read_csv
        for driver_count in range(len(country_df.index)):
            pos = country_df.loc[driver_count, 'Pos']
            name = country_df.loc[driver_count, 'Driver']
            car = country_df.loc[driver_count, 'Car']
            laps = country_df.loc[driver_count, 'Laps']
            race_time = country_df.loc[driver_count, 'Time/Retired']
            points = country_df.loc[driver_count, 'PTS']

            if commit_mode:
                # append the row using the next unused integer label
                race_results.loc[len(race_results.index)] = [pos, name, car, laps, race_time, points, country_name]
            else:
                # preview only: print the row without touching race_results
                print(pos, name, car, laps, race_time, points, country_name)

    except (KeyError, ValueError) as err:
        # KeyError: a column is missing from country_df
        # ValueError: race_results does not have the expected 7 columns
        print("ERROR! Double check the arguments provided for the function.\nHave you imported the race data CSV?\nHas the race_results data frame been created?")
        print("Details:", err)

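addRaceData grows race_results one row at a time with .loc, which works for a season of data but tends to get slower as the frame grows. A sketch of an alternative that renames the columns and appends a whole race with pd.concat. add_race_data_concat is a hypothetical name, and unlike addRaceData it returns a new frame instead of modifying race_results in place.

```python
import pandas as pd

RESULT_COLUMNS = ["POS", "Driver", "Car", "Laps", "Time/Retired", "PTS", "Race"]


def add_race_data_concat(country_df, country_name, race_results):
    """Return race_results with all rows of one race appended.

    Assumes country_df has the Pos, Driver, Car, Laps, Time/Retired and PTS
    columns shown in the per-race CSVs.
    """
    new_rows = (
        country_df.rename(columns={"Pos": "POS"})  # match race_results' header
        .assign(Race=country_name)
        .loc[:, RESULT_COLUMNS]  # drops the No column, as addRaceData does
    )
    if race_results.empty:
        return new_rows.reset_index(drop=True)
    return pd.concat([race_results, new_rows], ignore_index=True)


# Example with the top two Bahrain rows from the table below
empty = pd.DataFrame(columns=RESULT_COLUMNS)
bahrain_top2 = pd.DataFrame({
    "Pos": [1, 2], "No": [16, 55],
    "Driver": ["Charles Leclerc", "Carlos Sainz"],
    "Car": ["Ferrari", "Ferrari"], "Laps": [57, 57],
    "Time/Retired": ["37:33.6", "+5.598s"], "PTS": [26, 18],
})
combined = add_race_data_concat(bahrain_top2, "BAHRAIN", empty)
```

Because it returns a new frame, the call pattern becomes `race_results = add_race_data_concat(bahrain_df, "BAHRAIN", race_results)`.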
# Bahrain Data

In [6]:
# example of loading csv data
bahrain_df = pd.read_csv("data/BAHRAIN.csv")
bahrain_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 16 | Charles Leclerc | Ferrari | 57 | 37:33.6 | 26 |
| 1 | 2 | 55 | Carlos Sainz | Ferrari | 57 | +5.598s | 18 |
| 2 | 3 | 44 | Lewis Hamilton | Mercedes | 57 | +9.675s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 57 | +11.211s | 12 |
| 4 | 5 | 20 | Kevin Magnussen | Haas Ferrari | 57 | +14.754s | 10 |


In [7]:
addRaceData(bahrain_df, "BAHRAIN", race_results, True)

Adding data for BAHRAIN
Commit mode is set to True
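In the Bahrain table the winner's row holds the race time (37:33.6) while every other finisher has a gap to the winner such as +5.598s. A minimal sketch for turning those gaps into seconds so they can be compared or plotted; gap_to_seconds is a hypothetical helper, and anything that is not a +N.NNNs gap (the winner's time, or other text in the column) becomes NaN.

```python
import math
import re

# A gap to the winner, e.g. "+5.598s" or "+71.679s"
GAP_PATTERN = re.compile(r"^\+(\d+(?:\.\d+)?)s$")


def gap_to_seconds(value):
    """Convert a gap such as '+5.598s' to seconds; return NaN otherwise."""
    match = GAP_PATTERN.match(str(value).strip())
    return float(match.group(1)) if match else math.nan


gaps = [gap_to_seconds(v) for v in ["+5.598s", "+71.679s", "37:33.6"]]
print(gaps)  # [5.598, 71.679, nan]
```

Applied to a race, this would be `bahrain_df['Time/Retired'].map(gap_to_seconds)`.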


# Saudi Arabia Data

In [8]:
saudi_arabia_df = pd.read_csv('data/SAUDI Arabia.csv')
saudi_arabia_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 50 | 24:19.3 | 25 |
| 1 | 2 | 16 | Charles Leclerc | Ferrari | 50 | +0.549s | 19 |
| 2 | 3 | 55 | Carlos Sainz | Ferrari | 50 | +8.097s | 15 |
| 3 | 4 | 11 | Sergio Perez | Red Bull Racing RBPT | 50 | +10.800s | 12 |
| 4 | 5 | 63 | George Russell | Mercedes | 50 | +32.732s | 10 |


In [9]:
addRaceData(saudi_arabia_df, "SAUDI ARABIA", race_results, True)

Adding data for SAUDI ARABIA
Commit mode is set to True
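Each race in this notebook repeats the same two cells: read the CSV, then call addRaceData. A sketch that loads the whole season in one pass instead, assuming the file names used in the cells of this notebook (the Belgium file keeps the BELGUIM.csv spelling the notebook reads). RACE_FILES and load_season are hypothetical names. The function builds a new frame rather than calling addRaceData, so use it in place of the per-race cells, not alongside them, or each race will appear twice.

```python
from pathlib import Path

import pandas as pd

RESULT_COLUMNS = ["POS", "Driver", "Car", "Laps", "Time/Retired", "PTS", "Race"]

# Race label -> CSV file name, as used in the cells of this notebook
RACE_FILES = {
    "BAHRAIN": "BAHRAIN.csv",
    "SAUDI ARABIA": "SAUDI Arabia.csv",
    "AUSTRALIA": "AUSTRALIA.csv",
    "ITALY": "ITALY.csv",
    "MIAMI": "MIAMI.csv",
    "SPAIN": "SPAIN.csv",
    "MONACO": "MONACO.csv",
    "AZERBAIJAN": "AZERBAIJAN.csv",
    "CANADA": "CANADA.csv",
    "GREAT BRITAIN": "GREATBRITAIN.csv",
    "AUSTRIA": "AUSTRIA.csv",
    "FRANCE": "FRANCE.csv",
    "HUNGARY": "HUNGARY.csv",
    "BELGIUM": "BELGUIM.csv",
    "NETHERLANDS": "NETHERLANDS.csv",
    "ITALY2": "ITALY2.csv",
    "SINGAPORE": "SINGAPORE.csv",
    "JAPAN": "JAPAN.csv",
    "USA": "USA.csv",
    "MEXICO": "MEXICO.csv",
    "BRAZIL": "BRAZIL.csv",
    "ABU DHABI": "ABU DHABI.csv",
}


def load_season(data_dir="data", race_files=RACE_FILES):
    """Read each race CSV and stack them into one race_results-style frame."""
    frames = []
    for race_name, file_name in race_files.items():
        race_df = pd.read_csv(Path(data_dir) / file_name)
        frames.append(
            race_df.rename(columns={"Pos": "POS"})  # match race_results' header
            .assign(Race=race_name)
            .loc[:, RESULT_COLUMNS]  # drops No and any Unnamed index column
        )
    return pd.concat(frames, ignore_index=True)
```

In the notebook this would replace every read/add pair with a single `race_results = load_season()`.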


# Australia Data

In [10]:
australia_df = pd.read_csv('data/AUSTRALIA.csv')
australia_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 16 | Charles Leclerc | Ferrari | 58 | 27:46.5 | 26 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 58 | +20.524s | 18 |
| 2 | 3 | 63 | George Russell | Mercedes | 58 | +25.593s | 15 |
| 3 | 4 | 44 | Lewis Hamilton | Mercedes | 58 | +28.543s | 12 |
| 4 | 5 | 4 | Lando Norris | McLaren Mercedes | 58 | +53.303s | 10 |


In [11]:
addRaceData(australia_df, "AUSTRALIA", race_results, True)

Adding data for AUSTRALIA
Commit mode is set to True


# Italy Data (Emilia Romagna Grand Prix, Imola)

In [12]:
italy_df = pd.read_csv('data/ITALY.csv')
italy_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 63 | 32:08.0 | 26 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 63 | +16.527s | 18 |
| 2 | 3 | 4 | Lando Norris | McLaren Mercedes | 63 | +34.834s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 63 | +42.506s | 12 |
| 4 | 5 | 77 | Valtteri Bottas | Alfa Romeo Ferrari | 63 | +43.181s | 10 |


In [13]:
addRaceData(italy_df, "ITALY", race_results, True)

Adding data for ITALY
Commit mode is set to True


# Miami Data

In [14]:
miami_df = pd.read_csv('data/MIAMI.csv')
miami_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 57 | 34:24.3 | 26 |
| 1 | 2 | 16 | Charles Leclerc | Ferrari | 57 | +3.786s | 18 |
| 2 | 3 | 55 | Carlos Sainz | Ferrari | 57 | +8.229s | 15 |
| 3 | 4 | 11 | Sergio Perez | Red Bull Racing RBPT | 57 | +10.638s | 12 |
| 4 | 5 | 63 | George Russell | Mercedes | 57 | +18.582s | 10 |


In [15]:
addRaceData(miami_df, "MIAMI", race_results, True)

Adding data for MIAMI
Commit mode is set to True


# Spain Data

In [16]:
spain_df = pd.read_csv('data/SPAIN.csv')
spain_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 66 | 37:20.5 | 25 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 66 | +13.072s | 19 |
| 2 | 3 | 63 | George Russell | Mercedes | 66 | +32.927s | 15 |
| 3 | 4 | 55 | Carlos Sainz | Ferrari | 66 | +45.208s | 12 |
| 4 | 5 | 44 | Lewis Hamilton | Mercedes | 66 | +54.534s | 10 |


In [17]:
addRaceData(spain_df, "SPAIN", race_results, True)

Adding data for SPAIN
Commit mode is set to True


# Monaco Data

In [18]:
monaco_df = pd.read_csv('data/MONACO.csv')
monaco_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 11 | Sergio Perez | Red Bull Racing RBPT | 64 | 56:30.3 | 25 |
| 1 | 2 | 55 | Carlos Sainz | Ferrari | 64 | +1.154s | 18 |
| 2 | 3 | 1 | Max Verstappen | Red Bull Racing RBPT | 64 | +1.491s | 15 |
| 3 | 4 | 16 | Charles Leclerc | Ferrari | 64 | +2.922s | 12 |
| 4 | 5 | 63 | George Russell | Mercedes | 64 | +11.968s | 10 |


In [19]:
addRaceData(monaco_df, "MONACO", race_results, True)

Adding data for MONACO
Commit mode is set to True


# Azerbaijan Data

In [20]:
azerbaijan_df = pd.read_csv('data/AZERBAIJAN.csv')
azerbaijan_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 51 | 34:05.9 | 25 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 51 | +20.823s | 19 |
| 2 | 3 | 63 | George Russell | Mercedes | 51 | +45.995s | 15 |
| 3 | 4 | 44 | Lewis Hamilton | Mercedes | 51 | +71.679s | 12 |
| 4 | 5 | 10 | Pierre Gasly | AlphaTauri RBPT | 51 | +77.299s | 10 |


In [21]:
addRaceData(azerbaijan_df, "AZERBAIJAN", race_results, True)

Adding data for AZERBAIJAN
Commit mode is set to True


# Canada Data

In [22]:
canada_df = pd.read_csv('data/CANADA.csv')
canada_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 70 | 36:21.8 | 25 |
| 1 | 2 | 55 | Carlos Sainz | Ferrari | 70 | +0.993s | 19 |
| 2 | 3 | 44 | Lewis Hamilton | Mercedes | 70 | +7.006s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 70 | +12.313s | 12 |
| 4 | 5 | 16 | Charles Leclerc | Ferrari | 70 | +15.168s | 10 |


In [23]:
addRaceData(canada_df, "CANADA", race_results, True)

Adding data for CANADA
Commit mode is set to True


# Great Britain

In [24]:
gb_df = pd.read_csv('data/GREATBRITAIN.csv')
gb_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 55 | Carlos Sainz | Ferrari | 52 | 17:50.3 | 25 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 52 | +3.779s | 18 |
| 2 | 3 | 44 | Lewis Hamilton | Mercedes | 52 | +6.225s | 16 |
| 3 | 4 | 16 | Charles Leclerc | Ferrari | 52 | +8.546s | 12 |
| 4 | 5 | 14 | Fernando Alonso | Alpine Renault | 52 | +9.571s | 10 |


In [25]:
addRaceData(gb_df, "GREAT BRITAIN", race_results, True)

Adding data for GREAT BRITAIN
Commit mode is set to True


# Austria

In [26]:
austria_df = pd.read_csv('data/AUSTRIA.csv')
austria_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 16 | Charles Leclerc | Ferrari | 71 | 24:24.3 | 25 |
| 1 | 2 | 1 | Max Verstappen | Red Bull Racing RBPT | 71 | +1.532s | 19 |
| 2 | 3 | 44 | Lewis Hamilton | Mercedes | 71 | +41.217s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 71 | +58.972s | 12 |
| 4 | 5 | 31 | Esteban Ocon | Alpine Renault | 71 | +68.436s | 10 |


In [27]:
addRaceData(austria_df, "AUSTRIA", race_results, True)

Adding data for AUSTRIA
Commit mode is set to True


# France

In [28]:
france_df = pd.read_csv('data/FRANCE.csv')
france_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 53 | 30:02.1 | 25 |
| 1 | 2 | 44 | Lewis Hamilton | Mercedes | 53 | +10.587s | 18 |
| 2 | 3 | 63 | George Russell | Mercedes | 53 | +16.495s | 15 |
| 3 | 4 | 11 | Sergio Perez | Red Bull Racing RBPT | 53 | +17.310s | 12 |
| 4 | 5 | 55 | Carlos Sainz | Ferrari | 53 | +28.872s | 11 |


In [29]:
addRaceData(france_df, "FRANCE", race_results, True)

Adding data for FRANCE
Commit mode is set to True


# Hungary

In [30]:
hungary_df = pd.read_csv('data/HUNGARY.csv')
hungary_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 70 | 39:35.9 | 25 |
| 1 | 2 | 44 | Lewis Hamilton | Mercedes | 70 | +7.834s | 19 |
| 2 | 3 | 63 | George Russell | Mercedes | 70 | +12.337s | 15 |
| 3 | 4 | 55 | Carlos Sainz | Ferrari | 70 | +14.579s | 12 |
| 4 | 5 | 11 | Sergio Perez | Red Bull Racing RBPT | 70 | +15.688s | 10 |


In [31]:
addRaceData(hungary_df, "HUNGARY", race_results, True)

Adding data for HUNGARY
Commit mode is set to True


# Belgium

In [32]:
belguim_df = pd.read_csv('data/BELGUIM.csv')
belguim_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 44 | 1:25:52.894 | 26 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 44 | +17.841s | 18 |
| 2 | 3 | 55 | Carlos Sainz | Ferrari | 44 | +26.886s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 44 | +29.140s | 12 |
| 4 | 5 | 14 | Fernando Alonso | Alpine Renault | 44 | +73.256s | 10 |


In [33]:
addRaceData(belguim_df, "BELGIUM", race_results, True)

Adding data for BELGIUM
Commit mode is set to True
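The winning times come in two shapes: from Belgium onwards they read like 1:25:52.894 (hours:minutes:seconds), while earlier races such as Bahrain show 37:33.6. The 2022 Bahrain Grand Prix took about 1 hour 37 minutes, so those shorter values appear to have lost their hour field somewhere between the website and the CSV. A sketch that converts both shapes to seconds, assuming two-field values are read literally as minutes:seconds; race_time_to_seconds is a hypothetical helper, and results for the earlier races should be treated with caution until the source CSVs are checked.

```python
import math


def race_time_to_seconds(value):
    """Convert a winner's race time to seconds.

    Accepts 'h:mm:ss.sss' (e.g. '1:25:52.894') and 'mm:ss.s' (e.g. '37:33.6').
    Anything else, such as a '+17.841s' gap, returns NaN.
    """
    parts = str(value).strip().split(":")
    try:
        numbers = [float(part) for part in parts]
    except ValueError:
        return math.nan
    if len(numbers) == 3:
        hours, minutes, seconds = numbers
    elif len(numbers) == 2:
        # read literally as minutes:seconds (possible missing hour field)
        hours, minutes, seconds = 0.0, numbers[0], numbers[1]
    else:
        return math.nan
    return hours * 3600 + minutes * 60 + seconds


print(race_time_to_seconds("1:25:52.894"))
print(race_time_to_seconds("37:33.6"))
```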


# Netherlands

In [34]:
netherlands_df = pd.read_csv('data/NETHERLANDS.csv')
netherlands_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 72 | 1:36:42.773 | 26 |
| 1 | 2 | 63 | George Russell | Mercedes | 72 | +4.071s | 18 |
| 2 | 3 | 16 | Charles Leclerc | Ferrari | 72 | +10.929s | 15 |
| 3 | 4 | 44 | Lewis Hamilton | Mercedes | 72 | +13.016s | 12 |
| 4 | 5 | 11 | Sergio Perez | Red Bull Racing RBPT | 72 | +18.168s | 10 |


In [35]:
addRaceData(netherlands_df, "NETHERLANDS", race_results, True)

Adding data for NETHERLANDS
Commit mode is set to True


# Italy 2 (Italian Grand Prix, Monza)

In [36]:
italy2_df = pd.read_csv('data/ITALY2.csv')
italy2_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 53 | 1:20:27.511 | 25 |
| 1 | 2 | 16 | Charles Leclerc | Ferrari | 53 | +2.446s | 18 |
| 2 | 3 | 63 | George Russell | Mercedes | 53 | +3.405s | 15 |
| 3 | 4 | 55 | Carlos Sainz | Ferrari | 53 | +5.061s | 12 |
| 4 | 5 | 44 | Lewis Hamilton | Mercedes | 53 | +5.380s | 10 |


In [37]:
addRaceData(italy2_df, "ITALY2", race_results, True)

Adding data for ITALY2
Commit mode is set to True


# Singapore

In [38]:
singapore_df = pd.read_csv('data/SINGAPORE.csv')
singapore_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 11 | Sergio Perez | Red Bull Racing RBPT | 59 | 2:02:20.238 | 25 |
| 1 | 2 | 16 | Charles Leclerc | Ferrari | 59 | +2.595s | 18 |
| 2 | 3 | 55 | Carlos Sainz | Ferrari | 59 | +10.305s | 15 |
| 3 | 4 | 4 | Lando Norris | McLaren Mercedes | 59 | +21.133s | 12 |
| 4 | 5 | 3 | Daniel Ricciardo | McLaren Mercedes | 59 | +53.282s | 10 |


In [39]:
addRaceData(singapore_df, "SINGAPORE", race_results, True)

Adding data for SINGAPORE
Commit mode is set to True


# Japan

In [40]:
japan_df = pd.read_csv('data/JAPAN.csv')
japan_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 28 | 3:01:44.004 | 25 |
| 1 | 2 | 11 | Sergio Perez | Red Bull Racing RBPT | 28 | +27.066s | 18 |
| 2 | 3 | 16 | Charles Leclerc | Ferrari | 28 | +31.763s | 15 |
| 3 | 4 | 31 | Esteban Ocon | Alpine Renault | 28 | +39.685s | 12 |
| 4 | 5 | 44 | Lewis Hamilton | Mercedes | 28 | +40.326s | 10 |


In [41]:
addRaceData(japan_df, "JAPAN", race_results, True)

Adding data for JAPAN
Commit mode is set to True


# USA

In [42]:
usa_df = pd.read_csv('data/USA.csv')
usa_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 56 | 1:42:11.687 | 25 |
| 1 | 2 | 44 | Lewis Hamilton | Mercedes | 56 | +5.023s | 18 |
| 2 | 3 | 16 | Charles Leclerc | Ferrari | 56 | +7.501s | 15 |
| 3 | 4 | 11 | Sergio Perez | Red Bull Racing RBPT | 56 | +8.293s | 12 |
| 4 | 5 | 63 | George Russell | Mercedes | 56 | +44.815s | 11 |


In [43]:
addRaceData(usa_df, "USA", race_results, True)

Adding data for USA
Commit mode is set to True


# Mexico

In [44]:
mexico_df = pd.read_csv('data/MEXICO.csv')
mexico_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 71 | 1:38:36.729 | 25 |
| 1 | 2 | 44 | Lewis Hamilton | Mercedes | 71 | +15.186s | 18 |
| 2 | 3 | 11 | Sergio Perez | Red Bull Racing RBPT | 71 | +18.097s | 15 |
| 3 | 4 | 63 | George Russell | Mercedes | 71 | +49.431s | 13 |
| 4 | 5 | 55 | Carlos Sainz | Ferrari | 71 | +58.123s | 10 |


In [45]:
addRaceData(mexico_df, "MEXICO", race_results, True)

Adding data for MEXICO
Commit mode is set to True


# Brazil

In [46]:
brazil_df = pd.read_csv('data/BRAZIL.csv')
brazil_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 63 | George Russell | Mercedes | 71 | 1:38:34.044 | 26 |
| 1 | 2 | 44 | Lewis Hamilton | Mercedes | 71 | +1.529s | 18 |
| 2 | 3 | 55 | Carlos Sainz | Ferrari | 71 | +4.051s | 15 |
| 3 | 4 | 16 | Charles Leclerc | Ferrari | 71 | +8.441s | 12 |
| 4 | 5 | 14 | Fernando Alonso | Alpine Renault | 71 | +9.561s | 10 |


In [48]:
addRaceData(brazil_df, 'BRAZIL', race_results, True)

Adding data for BRAZIL
Commit mode is set to True


# Abu Dhabi

In [50]:
abu_dhabi_df = pd.read_csv('data/ABU DHABI.csv')
abu_dhabi_df.head()

|   | Pos | No | Driver | Car | Laps | Time/Retired | PTS |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Max Verstappen | Red Bull Racing RBPT | 58 | 1:27:45.914 | 25 |
| 1 | 2 | 16 | Charles Leclerc | Ferrari | 58 | +8.771s | 18 |
| 2 | 3 | 11 | Sergio Perez | Red Bull Racing RBPT | 58 | +10.093s | 15 |
| 3 | 4 | 55 | Carlos Sainz | Ferrari | 58 | +24.892s | 12 |
| 4 | 5 | 63 | George Russell | Mercedes | 58 | +35.888s | 10 |


In [52]:
addRaceData(abu_dhabi_df, 'ABU DHABI', race_results, True)

Adding data for ABU DHABI
Commit mode is set to True
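Once every race is in race_results, driver stats can be drawn from it. A minimal sketch for total points per driver; points_by_driver is a hypothetical helper, and the sample rows are a few entries copied from the Bahrain and Saudi Arabia tables above, so the totals are illustrative rather than season standings. If the CSVs hold only Grand Prix results, the totals will also leave out the sprint points awarded in 2022.

```python
import pandas as pd


def points_by_driver(race_results):
    """Total points per driver, highest first."""
    # PTS may be stored as text, so convert it; unparseable entries count as 0
    points = pd.to_numeric(race_results["PTS"], errors="coerce").fillna(0)
    return points.groupby(race_results["Driver"]).sum().sort_values(ascending=False)


sample = pd.DataFrame({
    "Driver": ["Charles Leclerc", "Carlos Sainz", "Max Verstappen",
               "Charles Leclerc", "Carlos Sainz"],
    "PTS": ["26", "18", "25", "19", "15"],
    "Race": ["BAHRAIN", "BAHRAIN", "SAUDI ARABIA", "SAUDI ARABIA", "SAUDI ARABIA"],
})
totals = points_by_driver(sample)
print(totals)
```

On the full data this would be `points_by_driver(race_results)`; the same groupby pattern works for other per-driver stats, such as counting wins where POS equals 1.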
