# Dataset Preparation

## Libraries

For the dataset preparation, since we will be doing web scrapping from several sites, requesting api access and also using Azure databases we will require several libraries stated bellow:

In [258]:
from bs4 import BeautifulSoup
import requests
import re
import urllib.request

import pyodbc

import numpy as np
import pandas as pd

In [259]:
# settings to display columns and rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)

1. Web Scrappping: bs4, requests, re and  urllib
2. API requests: requests
3. Azure access: pyobdc

## Functions

To gather information from the web we will require to repeat the same code several times, meaning that we will extract the same information for each year (2015 to 2021). The reason why we haven't coded only one function is because the website structures do change all over the years and this way we will have more control over the information we will be extracting.

We have prepared 4 functions to help extracting the information each year:
1. Links_Extraction
2. fia_f1_data
3. fia_f1_session
4. f1_gp_circuits

Let's cover a brief description on what they do.

### Links_Extraction

This function is used to extract all the relevant links to pass to another function for extraction. The output is a list.

We have 2 paramenters **url** which is the page we want to know the links available, lets say https://www.fia.com/f1-archives?season=1108 and since these pages have a lot of links we are only interested in a few, we added another parameter called **url_string**, where we pass a string contained in the url that we want. In case we want the mais information from the race we know thta those links contain the string *'race-classification'* so we would search only for those.

The function is defined below.


In [260]:

def Links_Extraction(url, url_string):
    '''
    Description: Function to extract links from a given url and store it to a list
    Parameters:
        url: Webpage to search for any given links
        url_string: Search for links that contain a specific string
    Usage: 
        Links_Extraction(url = 'https://www.fia.com/f1-archives', url_string = 'session')
    '''
    
    response = requests.get(url)
    html_document = response.text
    soup = BeautifulSoup(html_document, 'html.parser')
    
    links = []
    for link in soup.find_all('a', attrs={'href': re.compile(url_string)}):
        links.append(link.get('href'))
    
    return links

So let us try this code with the link provided above and for the *race classification* links:

In [261]:
Links_Extraction(url = 'https://www.fia.com/f1-archives?season=1108', url_string = 'race-classification')

['/events/fia-formula-one-world-championship/season-2021/bahrain-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/emilia-romagna-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/portuguese-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/spanish-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/monaco-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/azerbaijan-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/french-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/styrian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/austrian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/british-grand-prix/race-classification',
 '/event

If need we can tweek the output to just the relevant pages, meaning that during the COVID outbreak there were some cancelations and some grand prix pages were setup, but no data was added since they were cancelled. 

This function can the be apllyed to any page be applied to any page.

### f1_gp_circuits

This function extracts the date of the actual Grand Prix (GP) and the actual name of the race from the espn.com website - https://www.espn.com/f1/schedule/_/year/2022. With this information, we will use it to check the links we will feed on the other functions.

This function will do a small prerocessing on the race field in order to prepare a colummn called GP with a simple string to identify the Grand Prix, which will be used to merge more data at a later time.

This function only uses one parameter - season and by default is 2022

In [262]:
# grand prix country list taken by date of ocorrence
def f1_gp_circuits(season = 2022):
    '''
    Description: Extracts the Race Name and GP and ordered by date of occurence from espn.com
    Parameters:
        season: Year to which we want to retrieve information on the Grand Prix. By default season is set to 2022
    Usage:
        f1_gp_circuits(season = 2021)
    '''
    print('Season: ' + str(season) + ' | Source: espn.com')
    url = f'https://www.espn.com/f1/schedule/_/year/{season}'

    source = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(source,'lxml')
    table = soup.find_all('table')[0]
    df = pd.read_html(str(table), flavor='bs4', header=[0])[0]
    df.drop(['Winner/Lights Out','TV'],axis=1, inplace=True)

    gp = df['Race'].str.split(' GP',n = 1, expand = True)
    df['GP'] = gp[0].str.lower()
    # correction on the names that will feed the links
    df['GP'] = df['GP'].str.replace('socar ', '')
    df['GP'] = df['GP'].str.replace('vtb ', '')
    df['GP'] = df['GP'].str.replace('arabian', 'arabia')
    df['GP'] = df['GP'].str.replace('etihad airways ', '')
    df['GP'] = df['GP'].str.replace(' ', '-')
    df['GP'] = df['GP'].str.replace('singapore-airlines-singapore-gmarina-bay-street-circuit', 'singapore')
    df['GP'] = df['GP'].str.replace('mercedes-benz-german', 'german')
    df['GP'] = df['GP'].str.replace('rolex-british','british')
    return df

We can use this function by simply choosing a season.

In [263]:
f1_gp_circuits(season = 2019)

Season: 2019 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Mar 15 - 17,Australian GPMelbourne Grand Prix Circuit,australian
1,Mar 29 - 31,Bahrain GPBahrain International Circuit,bahrain
2,Apr 12 - 14,Chinese GPShanghai International Circuit,chinese
3,Apr 26 - 28,Socar Azerbaijan GPBaku City Circuit,azerbaijan
4,May 10 - 12,Spanish GPCircuit de Barcelona-Catalunya,spanish
...,...,...,...
16,Oct 11 - 13,Japanese GPSuzuka International Racing Course,japanese
17,Oct 25 - 27,Mexican GPAutodromo Hermanos Rodriguez,mexican
18,Nov 1 - 3,United States GPCircuit of the Americas,united-states
19,Nov 15 - 17,Brazilian GPAutodromo Jose Carlos Pace,brazilian


### fia_f1 data and season functions

Although, these are two different functions they work on the same principle, we feed them with the **season**, **gp_city** and the **gp_links** list we have prepared previously with the function Links_Extraction.

#### fia_f1_data

This function will extract data from the race classification page, regarding team, pilots, classification fastest laps, best sector times, speed traps, maximum speeds and pit stops for each GP.

#### fia_f1_season

This function will extract data from the season classification page, regarding initial qualification and grid for each GP.

Both functions use BeautifulSoup package to extract information from the tables the page holds. That information is selected and renamed appropriately for each GP link we feed. In the end we have a data set that holds all this information for all the GP contained in that season.

Check the code below for the two functions:

In [264]:
# function to prepare dataset by season
def fia_f1_data(season, gp_city, gp_link):
    '''
    Description: Prepare dataset by season for team, pilots, classification fastest laps, best sector times,
    speed traps, maximum speeds and pit stops for each GP
    Parmeters:
        season: corresponding season (year)
        gp_city: list generated by f1_gp_circuits function
        gp_links: list of links corresponding to different races for the same season
    Usage:
        df_2021_data = fia_f1_session(2021, gp_city = city_21, gp_link = links_data_21)
    '''
    print('Season: ' + str(season) + ' | Source: fia.com')

    dataset = pd.DataFrame()
    n = 0
    for i in gp_link:
        url = f'https://www.fia.com{i}'
        print('Circuit: ' + str(gp_city[n]) + ' | ' + url)
        
        source = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(source,'lxml')

        # classification
        table_class = soup.find_all('table')[0] 
        df_class = pd.read_html(str(table_class), flavor='bs4', header=[1])[0]
        df_class.drop(['Unnamed: 5'],axis=1, inplace=True)
        name = df_class['DRIVER'].str.split(' ',n = 1, expand = True)
        df_class['DRIVER_SNAME'] = name[0].astype(str).str[0] + '. ' + name[1]
        df_class['GRAND_PRIX'] = gp_city[n].capitalize()
        df_class['SEASON'] = season
        df_class = df_class.rename (columns={'POS': 'CLASS', 'TIME': 'RACE_TIME'}) 
        n = n + 1

        # fastest laps
        table_flaps = soup.find_all('table')[1] 
        df_flaps = pd.read_html(str(table_flaps), flavor='bs4', header=[1])[0]
        df_flaps.drop(['Unnamed: 7','Unnamed: 8'],axis=1, inplace=True)
        df_flaps = df_flaps.rename(columns={'POS': 'FLAP_POS', 'LAP TIME': 'FLAP_TIME', 'LAP': 'F_LAP', 'GAP': 'FLAP_GAP', 'KM/H': 'FLAP_KM/H', 'TIME': 'FLAP_HOUR'}) 

        # best sector times
        table_bs = soup.find_all('table')[2] 
        df_bs = pd.read_html(str(table_bs), flavor='bs4', header=[2])[0]
        df_bs.drop(['Unnamed: 7'],axis=1, inplace=True)
        df_bs_1 = df_bs[['POS', 'DRIVER',   'TIME']]
        df_bs_1 = df_bs_1.rename(columns={'POS': 'BS1_POS', 'DRIVER':   'BS1_DRIVER', 'TIME':   'BS1_TIME'}) 
        df_bs_2 = df_bs[['POS', 'DRIVER.1', 'TIME.1']]
        df_bs_2 = df_bs_2.rename(columns={'POS': 'BS2_POS', 'DRIVER.1': 'BS2_DRIVER', 'TIME.1': 'BS2_TIME'}) 
        df_bs_3 = df_bs[['POS', 'DRIVER.2', 'TIME.2']]
        df_bs_3 = df_bs_3.rename(columns={'POS': 'BS3_POS', 'DRIVER.2': 'BS3_DRIVER', 'TIME.2': 'BS3_TIME'}) 

        # speed traps
        table_straps = soup.find_all('table')[3] 
        df_straps = pd.read_html(str(table_straps), flavor='bs4', header=[1])[0]
        df_straps.drop(['TEAM','TIME','Unnamed: 5'],axis=1, inplace=True)
        df_straps = df_straps.rename(columns={'POS': 'ST_POS', 'KM/H': 'ST_KM/H'})

        # maximum speeds
        table_mspeeds = soup.find_all('table')[4] 
        df_mspeeds = pd.read_html(str(table_mspeeds), flavor='bs4', header=[2])[0]
        df_mspeeds.drop(['Unnamed: 7'],axis=1, inplace=True)
        df_mspeeds_1 = df_mspeeds[['POS', 'DRIVER',   'KM/H']]
        df_mspeeds_1 = df_mspeeds_1.rename(columns={'POS': 'I1_POS', 'DRIVER':   'I1_DRIVER', 'KM/H':   'I1_KM/H'}) 
        df_mspeeds_2 = df_mspeeds[['POS', 'DRIVER.1', 'KM/H.1']]
        df_mspeeds_2 = df_mspeeds_2.rename(columns={'POS': 'I2_POS', 'DRIVER.1': 'I2_DRIVER', 'KM/H.1': 'I2_KM/H'}) 
        df_mspeeds_3 = df_mspeeds[['POS', 'DRIVER.2', 'KM/H.2']]
        df_mspeeds_3 = df_mspeeds_3.rename(columns={'POS': 'FL_POS', 'DRIVER.2': 'FL_DRIVER', 'KM/H.2': 'FL_KM/H'}) 

        # pit stops
        table_pstops = soup.find_all('table')[5] 
        df_pstops = pd.read_html(str(table_pstops), flavor='bs4', header=[1])[0]
        df_pstops = df_pstops.rename(columns={'NO': 'DRIVER_NO', 'TOTAL TIME':   'PS_TOTAL_TIME'})
        t = df_pstops.index[df_pstops['DRIVER_NO'] == 'RACE - PIT STOP - DETAIL'].to_list() 
        df_pstops = df_pstops[:t[0]]
        df_pstops = df_pstops[['DRIVER_NO','DRIVER','STOPS','PS_TOTAL_TIME']]

        # merge information for maximum speeds and best sector tables 
        fia_df = pd.merge(df_class, df_flaps, how='left', on=['DRIVER'])

        fia_df = pd.merge(fia_df, df_bs_1,  how='left', left_on='DRIVER_SNAME', right_on='BS1_DRIVER')
        fia_df = pd.merge(fia_df, df_bs_2,  how='left', left_on='DRIVER_SNAME', right_on='BS2_DRIVER')
        fia_df = pd.merge(fia_df, df_bs_3,  how='left', left_on='DRIVER_SNAME', right_on='BS3_DRIVER')
        fia_df.drop(['BS1_DRIVER','BS2_DRIVER','BS3_DRIVER'],axis=1, inplace=True)

        fia_df = pd.merge(fia_df,   df_straps, how='left', on=['DRIVER'])

        fia_df = pd.merge(fia_df, df_mspeeds_1,  how='left', left_on='DRIVER_SNAME', right_on='I1_DRIVER')
        fia_df = pd.merge(fia_df, df_mspeeds_2,  how='left', left_on='DRIVER_SNAME', right_on='I2_DRIVER')
        fia_df = pd.merge(fia_df, df_mspeeds_3,  how='left', left_on='DRIVER_SNAME', right_on='FL_DRIVER')
        fia_df.drop(['I1_DRIVER','I2_DRIVER','FL_DRIVER'],axis=1, inplace=True)

        fia_df = pd.merge(fia_df, df_pstops, how='left', on=['DRIVER'])

        dataset = dataset.append(fia_df)
        
    return dataset

In [265]:
# function to prepare dataset by season - extracts qualification and grid position
def fia_f1_session(season, gp_city, gp_link):
    '''Insert the season = year and grand_prix as a list with fia links'''
    print('Season: ' + str(season) + ' | Source: fia.com')

    dataset = pd.DataFrame()
    n = 0
    for i in gp_link:
        url = f'https://www.fia.com{i}'
        print('Circuit: ' + str(gp_city[n]) + ' | ' + url)
        
        source = urllib.request.urlopen(url).read()
        soup = BeautifulSoup(source,'lxml')    
        
        #qualification
        table_qual = soup.find_all('table')[3] 
        df_qual = pd.read_html(str(table_qual), flavor='bs4', header=[1])[0]
        df_qual = df_qual[['POS','DRIVER','Q1','LAPS','Q2','LAPS.1','Q3','LAPS.2']]
        df_qual = df_qual.rename (columns={'POS': 'QL_CLASS', 'Q1': 'QL_TIME1','Q2': 'QL_TIME2','Q3': 'QL_TIME3',
                                           'LAPS':'QL_LAPS1','LAPS.1':'QL_LAPS2','LAPS.2':'QL_LAPS3'})
        df_qual['GRAND_PRIX'] = gp_city[n].capitalize()
        df_qual['SEASON'] = season

        #grid
        table_grid = soup.find_all('table')[4] 
        df_grid = pd.read_html(str(table_grid), flavor='bs4', header=[1])[0]
        df_grid = df_grid[['POS','DRIVER','TIME']]
        df_grid = df_grid.rename (columns={'POS': 'GD_CLASS', 'TIME': 'GD_TIME'})

        fia_df = pd.merge(df_qual, df_grid, how='left', on=['DRIVER'])
        n = n + 1
        dataset = dataset.append(fia_df)

    return dataset

Please see the usage example on the race classification for the 2017 season:

In [266]:
links_race_17 = Links_Extraction(url = 'https://www.fia.com/f1-archives?season=679', url_string = 'race-class')
c_17 = f1_gp_circuits(season = 2017)
city_17 = c_17['GP'].unique().tolist()

df_2021_data = fia_f1_data(2017, gp_city = city_17, gp_link = links_race_17)
df_2021_data.head()


Season: 2017 | Source: espn.com
Season: 2017 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-7
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-0
Circuit: russian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-8
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-1
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-2
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-9
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-10
Circuit: aus

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME
0,1,Sebastian Vettel,Scuderia Ferrari,57,1:24:11.672,S. Vettel,Australian,2017,3.0,1:26.638,53.0,0.1,220.351,17:24:49,6.0,29.083,2.0,23.164,1.0,34.385,9.0,314.6,4.0,280.8,3.0,298.8,5.0,305.1,5,1,21.988
1,2,Lewis Hamilton,Mercedes AMG Petronas F1 Team,57,1:24:21.647,L. Hamilton,Australian,2017,6.0,1:27.033,44.0,0.495,219.351,17:11:49,3.0,28.989,6.0,23.243,6.0,34.584,4.0,318.0,2.0,283.9,1.0,300.0,2.0,308.3,44,1,21.709
2,3,Valtteri Bottas,Mercedes AMG Petronas F1 Team,57,1:24:22.922,V. Bottas,Australian,2017,2.0,1:26.593,56.0,0.055,220.465,17:29:23,1.0,28.885,3.0,23.168,4.0,34.453,6.0,316.3,1.0,284.9,5.0,297.9,6.0,304.9,77,1,21.44
3,4,Kimi Raikkonen,Scuderia Ferrari,57,1:24:34.065,K. Raikkonen,Australian,2017,1.0,1:26.538,56.0,0.0,220.605,17:29:33,2.0,28.903,5.0,23.225,2.0,34.41,13.0,304.3,11.0,277.2,13.0,293.2,14.0,297.7,7,1,22.033
4,5,Max Verstappen,Red Bull Racing,57,1:24:40.499,M. Verstappen,Australian,2017,5.0,1:26.964,43.0,0.426,219.525,17:10:39,7.0,29.103,1.0,23.071,5.0,34.552,8.0,315.3,3.0,281.6,2.0,299.4,9.0,302.8,33,1,22.208


No examples areprovide for the fia_f1_season function, but the usage is the same as the example above and we can see the aplication bellow.

After we run both functions we just need to merge both dataset by driver and GP, and move on to another season.

# Web Scrapping

We will now aplly the functions above for seasons from 2015 to 2021 on the fia and espn websites.

## 2021

In [267]:
links_race_21 = Links_Extraction(url = 'https://www.fia.com/f1-archives', url_string = 'race-classification')

In [268]:
links_race_21

['/events/fia-formula-one-world-championship/season-2021/bahrain-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/emilia-romagna-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/portuguese-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/spanish-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/monaco-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/azerbaijan-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/french-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/styrian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/austrian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2021/british-grand-prix/race-classification',
 '/event

In [269]:
links_session_21 = Links_Extraction(url = 'https://www.fia.com/f1-archives', url_string = 'session')

In [270]:
c_21 = f1_gp_circuits(season = 2021)
city_21 = c_21['GP'].tolist()
c_21

Season: 2021 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Mar 26 - 28,Bahrain GPBahrain International Circuit,bahrain
1,Apr 16 - 18,Emilia Romagna GPAutodromo Enzo e Dino Ferrari,emilia-romagna
2,Apr 30 - May 2,Portuguese GPAutódromo Internacional Do Algarve,portuguese
3,May 7 - 9,Spanish GPCircuit de Barcelona-Catalunya,spanish
4,May 20 - 23,Monaco GPCircuit de Monaco,monaco
...,...,...,...
17,Nov 5 - 7,Mexican GPAutodromo Hermanos Rodriguez,mexican
18,Nov 12 - 14,Brazilian GPAutodromo Jose Carlos Pace,brazilian
19,Nov 19 - 21,Qatar GPLosail International Circuit,qatar
20,Dec 3 - 5,Saudi Arabian GPJeddah Street Circuit,saudi-arabia


In [271]:
df_2021 = fia_f1_data(2021, gp_city = city_21, gp_link = links_race_21)

Season: 2021 | Source: fia.com
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/bahrain-grand-prix/race-classification
Circuit: emilia-romagna | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/emilia-romagna-grand-prix/race-classification
Circuit: portuguese | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/portuguese-grand-prix/race-classification
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/spanish-grand-prix/race-classification
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/monaco-grand-prix/race-classification
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/azerbaijan-grand-prix/race-classification
Circuit: french | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/french-grand-prix/race-classification
Circuit: austrian | htt

In [272]:
df_2021

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME
0,1,Lewis Hamilton,Mercedes-AMG Petronas F1 Team,56,1:32:03.897,L. Hamilton,Bahrain,2021,4.0,1:34.015,44.0,1.925,207.235,19:20:17,2.0,29.944,4.0,40.418,3.0,23.141,18.0,315.2,10.0,239.5,2.0,267.7,13.0,290.4,44,2,48.915
1,2,Max Verstappen,Red Bull Racing Honda,56,1:32:04.642,M. Verstappen,Bahrain,2021,2.0,1:33.228,41.0,1.138,208.984,19:15:41,4.0,30.009,2.0,40.159,2.0,22.995,4.0,327.0,15.0,237.9,6.0,266.1,12.0,291.1,33,2,48.615
2,3,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,56,1:32:41.280,V. Bottas,Bahrain,2021,1.0,1:32.090,56.0,0.000,211.566,19:39:51,1.0,29.640,1.0,39.508,1.0,22.942,15.0,318.9,11.0,239.5,5.0,266.6,16.0,290.0,77,3,1:21.725
3,4,Lando Norris,McLaren F1 Team,56,1:32:50.363,L. Norris,Bahrain,2021,6.0,1:34.396,38.0,2.306,206.398,19:11:28,7.0,30.206,6.0,40.525,5.0,23.207,9.0,324.2,16.0,237.6,7.0,263.4,8.0,291.8,4,2,50.539
4,5,Sergio Perez,Red Bull Racing Honda,56,1:32:55.944,S. Perez,Bahrain,2021,3.0,1:33.970,44.0,1.880,207.334,19:21:10,6.0,30.180,3.0,40.395,6.0,23.231,8.0,324.7,5.0,240.5,4.0,267.0,17.0,290.0,11,3,1:12.289
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14,15,Sergio Perez,Red Bull Racing Honda,55,1:24:12.571,S. Perez,Abu-dhabi,2021,2.0,1:26.419,51.0,0.316,219.993,18:19:56,2.0,17.423,4.0,37.415,3.0,31.473,18.0,318.4,5.0,296.4,11.0,316.0,7.0,221.9,11,3,1:04.269
15,16,Nicholas Latifi,Williams Racing,50,1:16:55.491,N. Latifi,Abu-dhabi,2021,15.0,1:29.293,30.0,3.190,212.912,17:49:38,15.0,17.837,14.0,37.987,17.0,32.845,6.0,328.3,10.0,294.1,9.0,321.8,14.0,218.8,6,1,21.241
16,17,Antonio Giovinazzi,Alfa Romeo Racing ORLEN,33,0:50:20.298,A. Giovinazzi,Abu-dhabi,2021,16.0,1:29.442,33.0,3.339,212.557,17:53:42,16.0,17.979,16.0,38.036,16.0,32.845,4.0,331.7,12.0,291.6,6.0,325.7,17.0,215.4,99,1,22.283
17,18,George Russell,Williams Racing,26,0:41:10.855,G. Russell,Abu-dhabi,2021,19.0,1:30.647,23.0,4.544,209.732,17:38:42,19.0,18.205,19.0,38.401,19.0,33.772,17.0,321.7,15.0,288.3,16.0,309.5,18.0,214.7,,,


In [273]:
df_2021_session = fia_f1_session(2021, gp_city = city_21, gp_link = links_session_21)

Season: 2021 | Source: fia.com
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/bahrain-grand-prix/session-classifications
Circuit: emilia-romagna | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/emilia-romagna-grand-prix/session
Circuit: portuguese | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/portuguese-grand-prix/session-classifications
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/spanish-grand-prix/session-classifications
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/monaco-grand-prix/session-classifications
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/azerbaijan-grand-prix/session-classifications
Circuit: french | https://www.fia.com/events/fia-formula-one-world-championship/season-2021/french-grand-prix/session-classifications
Circuit: au

In [274]:
df_2021_session

Unnamed: 0,QL_CLASS,DRIVER,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,GRAND_PRIX,SEASON,GD_CLASS,GD_TIME
0,1,Max Verstappen,1:30.499,3,1:30.318,6.0,1:28.997,6.0,Bahrain,2021,1.0,1:28.997
1,2,Lewis Hamilton,1:30.617,6,1:30.085,6.0,1:29.385,6.0,Bahrain,2021,2.0,1:29.385
2,3,Valtteri Bottas,1:31.200,5,1:30.186,6.0,1:29.586,6.0,Bahrain,2021,3.0,1:29.586
3,4,Charles Leclerc,1:30.691,6,1:30.010,6.0,1:29.678,3.0,Bahrain,2021,4.0,1:29.678
4,5,Pierre Gasly,1:30.848,3,1:30.513,6.0,1:29.809,6.0,Bahrain,2021,5.0,1:29.809
...,...,...,...,...,...,...,...,...,...,...,...,...
15,16,Nicholas Latifi,1:24.338,8,,,,,Abu-dhabi,2021,16.0,1:24.338
16,17,George Russell,1:24.423,7,,,,,Abu-dhabi,2021,17.0,1:24.423
17,18,Kimi Raikkonen,1:24.779,8,,,,,Abu-dhabi,2021,18.0,1:24.779
18,19,Mick Schumacher,1:24.906,9,,,,,Abu-dhabi,2021,19.0,1:24.906


In [275]:
f1_2021 = pd.merge(df_2021, df_2021_session, how='left', left_on=['DRIVER','GRAND_PRIX'], right_on = ['DRIVER','GRAND_PRIX'])

In [276]:
f1_2021

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON_x,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,SEASON_y,GD_CLASS,GD_TIME
0,1,Lewis Hamilton,Mercedes-AMG Petronas F1 Team,56,1:32:03.897,L. Hamilton,Bahrain,2021,4.0,1:34.015,44.0,1.925,207.235,19:20:17,2.0,29.944,4.0,40.418,3.0,23.141,18.0,315.2,10.0,239.5,2.0,267.7,13.0,290.4,44,2,48.915,2.0,1:30.617,6.0,1:30.085,6.0,1:29.385,6.0,2021.0,2.0,1:29.385
1,2,Max Verstappen,Red Bull Racing Honda,56,1:32:04.642,M. Verstappen,Bahrain,2021,2.0,1:33.228,41.0,1.138,208.984,19:15:41,4.0,30.009,2.0,40.159,2.0,22.995,4.0,327.0,15.0,237.9,6.0,266.1,12.0,291.1,33,2,48.615,1.0,1:30.499,3.0,1:30.318,6.0,1:28.997,6.0,2021.0,1.0,1:28.997
2,3,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,56,1:32:41.280,V. Bottas,Bahrain,2021,1.0,1:32.090,56.0,0.000,211.566,19:39:51,1.0,29.640,1.0,39.508,1.0,22.942,15.0,318.9,11.0,239.5,5.0,266.6,16.0,290.0,77,3,1:21.725,3.0,1:31.200,5.0,1:30.186,6.0,1:29.586,6.0,2021.0,3.0,1:29.586
3,4,Lando Norris,McLaren F1 Team,56,1:32:50.363,L. Norris,Bahrain,2021,6.0,1:34.396,38.0,2.306,206.398,19:11:28,7.0,30.206,6.0,40.525,5.0,23.207,9.0,324.2,16.0,237.6,7.0,263.4,8.0,291.8,4,2,50.539,7.0,1:30.902,6.0,1:30.099,6.0,1:29.974,6.0,2021.0,7.0,1:29.974
4,5,Sergio Perez,Red Bull Racing Honda,56,1:32:55.944,S. Perez,Bahrain,2021,3.0,1:33.970,44.0,1.880,207.334,19:21:10,6.0,30.180,3.0,40.395,6.0,23.231,8.0,324.7,5.0,240.5,4.0,267.0,17.0,290.0,11,3,1:12.289,11.0,1:31.165,5.0,1:30.659,6.0,,,2021.0,11.0,1:30.659
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
474,15,Sergio Perez,Red Bull Racing Honda,55,1:24:12.571,S. Perez,Abu-dhabi,2021,2.0,1:26.419,51.0,0.316,219.993,18:19:56,2.0,17.423,4.0,37.415,3.0,31.473,18.0,318.4,5.0,296.4,11.0,316.0,7.0,221.9,11,3,1:04.269,4.0,1:23.350,9.0,1:23.135,8.0,1:22.947,7.0,2021.0,4.0,1:22.947
475,16,Nicholas Latifi,Williams Racing,50,1:16:55.491,N. Latifi,Abu-dhabi,2021,15.0,1:29.293,30.0,3.190,212.912,17:49:38,15.0,17.837,14.0,37.987,17.0,32.845,6.0,328.3,10.0,294.1,9.0,321.8,14.0,218.8,6,1,21.241,16.0,1:24.338,8.0,,,,,2021.0,16.0,1:24.338
476,17,Antonio Giovinazzi,Alfa Romeo Racing ORLEN,33,0:50:20.298,A. Giovinazzi,Abu-dhabi,2021,16.0,1:29.442,33.0,3.339,212.557,17:53:42,16.0,17.979,16.0,38.036,16.0,32.845,4.0,331.7,12.0,291.6,6.0,325.7,17.0,215.4,99,1,22.283,14.0,1:24.118,9.0,1:24.251,8.0,,,2021.0,14.0,1:24.251
477,18,George Russell,Williams Racing,26,0:41:10.855,G. Russell,Abu-dhabi,2021,19.0,1:30.647,23.0,4.544,209.732,17:38:42,19.0,18.205,19.0,38.401,19.0,33.772,17.0,321.7,15.0,288.3,16.0,309.5,18.0,214.7,,,,17.0,1:24.423,7.0,,,,,2021.0,17.0,1:24.423


# 2020

In [277]:
Links_Extraction(url = 'https://www.fia.com/f1-archives?season=1059', url_string = 'race')

['/events/karting/season-2022/races-calendar',
 '/race-against-manipulation',
 '/events/fia-formula-one-world-championship/season-2020/singapore-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/abu-dhabi-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/australian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/bahrain-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/vietnamese-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/dutch-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/monaco-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/azerbaijan-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2020/canadian-grand-prix/race-classification',
 '/events/fia-formula-one-wor

In [278]:
links_race_20 = [
'/events/fia-formula-one-world-championship/season-2020/austrian-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/styrian-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/hungarian-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/british-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/formula-1-70th-anniversary-grand-prix/race',
'/events/fia-formula-one-world-championship/season-2020/spanish-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/belgian-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/italian-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/tuscan-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/russian-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/eifel-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/portuguese-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/emilia-romagna-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/turkish-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/bahrain-grand-prix/race-classification-0',
'/events/fia-formula-one-world-championship/season-2020/sakhir-grand-prix/race-classification',
'/events/fia-formula-one-world-championship/season-2020/abu-dhabi-grand-prix/race-classification-0']

In [279]:
Links_Extraction(url = 'https://www.fia.com/f1-archives?season=1059', url_string = 'session')

['/events/fia-formula-one-world-championship/season-2020/singapore-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/abu-dhabi-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/australian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/bahrain-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/vietnamese-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/dutch-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/monaco-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/azerbaijan-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/canadian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2020/french-grand-p

In [280]:
links_session_20 = [
 '/events/fia-formula-one-world-championship/season-2020/austrian-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/styrian-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/hungarian-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/british-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/formula-1-70th-anniversary-grand-prix/session'
,'/events/fia-formula-one-world-championship/season-2020/spanish-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/belgian-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/italian-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/tuscan-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/russian-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/eifel-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/portuguese-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/emilia-romagna-grand-prix/session'
,'/events/fia-formula-one-world-championship/season-2020/turkish-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/bahrain-grand-prix/session-classifications-0'
,'/events/fia-formula-one-world-championship/season-2020/sakhir-grand-prix/session-classifications'
,'/events/fia-formula-one-world-championship/season-2020/abu-dhabi-grand-prix/session-classifications-0']

In [281]:
c_20 = f1_gp_circuits(season = 2020)
city_20 = c_20['GP'].tolist()
c_20['GP'][4] = '70th-anniversary'
c_20

Season: 2020 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Jul 3 - 5,Austrian GPRed Bull Ring,austrian
1,Jul 10 - 12,Austrian GP 2Red Bull Ring,austrian
2,Jul 17 - 19,Hungarian GPHungaroring,hungarian
3,Jul 31 - Aug 2,Rolex British GPSilverstone Circuit,british
4,Aug 7 - 9,Rolex British GP 2Silverstone Circuit,70th-anniversary
5,Aug 14 - 16,Spanish GPCircuit de Barcelona-Catalunya,spanish
6,Aug 28 - 30,Belgian GPCircuit de Spa-Francorchamps,belgian
7,Sep 4 - 6,Italian GPAutodromo Nazionale Monza,italian
8,Sep 11 - 13,Tuscan GPAutodromo Internazionale del Mugello,tuscan
9,Sep 25 - 27,Russian GPSochi Autodrom,russian


In [282]:
df_2020 = fia_f1_data(2020, gp_city = city_20, gp_link = links_race_20)

Season: 2020 | Source: fia.com
Circuit: austrian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/austrian-grand-prix/race-classification
Circuit: austrian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/styrian-grand-prix/race-classification
Circuit: hungarian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/hungarian-grand-prix/race-classification
Circuit: british | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/british-grand-prix/race-classification
Circuit: british | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/formula-1-70th-anniversary-grand-prix/race
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/spanish-grand-prix/race-classification
Circuit: belgian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/belgian-grand-prix/race-classification
Circuit: italian | https://www.fia

In [283]:
df_2020

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME
0,1,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,71,1:30:55.739,V. Bottas,Austrian,2020,2.0,1:07.657,68.0,0.182,229.758,16:40:16,14.0,17.079,3.0,30.181,3.0,20.210,19.0,299.9,19.0,304.5,2.0,241.9,16.0,280.7,77,2,38.344
1,2,Charles Leclerc,Scuderia Ferrari,71,1:30:58.439,C. Leclerc,Austrian,2020,4.0,1:07.901,64.0,0.426,228.933,16:35:48,2.0,16.715,2.0,30.169,4.0,20.457,11.0,318.2,13.0,322.7,1.0,243.5,10.0,284.7,16,3,1:03.476
2,3,Lando Norris,McLaren F1 Team,71,1:31:01.230,L. Norris,Austrian,2020,1.0,1:07.475,71.0,0.000,230.378,16:43:48,5.0,16.825,1.0,29.982,1.0,20.170,17.0,313.2,12.0,323.1,7.0,239.5,4.0,287.7,4,3,1:03.721
3,4,Lewis Hamilton,Mercedes-AMG Petronas F1 Team,71,1:31:01.428,L. Hamilton,Austrian,2020,3.0,1:07.712,67.0,0.237,229.572,16:39:10,4.0,16.813,5.0,30.316,2.0,20.204,5.0,324.0,9.0,326.0,4.0,240.5,3.0,287.7,44,2,38.215
4,5,Carlos Sainz,McLaren F1 Team,71,1:31:04.642,C. Sainz,Austrian,2020,5.0,1:07.974,63.0,0.499,228.687,16:34:41,1.0,16.632,4.0,30.199,6.0,20.574,2.0,326.5,2.0,334.8,8.0,238.5,1.0,291.4,55,3,1:05.696
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15,16,Antonio Giovinazzi,Alfa Romeo Racing ORLEN,54,1:37:02.381,A. Giovinazzi,Abu-dhabi,2020,7.0,1:41.675,29.0,0.749,196.650,18:07:27,8.0,17.649,12.0,43.348,9.0,40.507,3.0,333.1,2.0,291.8,12.0,306.4,10.0,221.0,99,1,21.480
16,17,Nicholas Latifi,Williams Racing,54,1:37:11.562,N. Latifi,Abu-dhabi,2020,16.0,1:42.497,49.0,1.571,195.073,18:42:04,15.0,17.734,19.0,43.621,13.0,40.736,9.0,329.3,11.0,286.7,10.0,308.0,4.0,222.5,6,2,43.741
17,18,Kevin Magnussen,Haas F1 Team,54,1:38:00.387,K. Magnussen,Abu-dhabi,2020,13.0,1:41.999,50.0,1.073,196.025,18:44:39,6.0,17.602,16.0,43.568,8.0,40.266,18.0,311.2,16.0,285.0,18.0,299.8,12.0,220.4,20,2,45.812
18,19,Pietro Fittipaldi,Haas F1 Team,53,1:36:38.988,P. Fittipaldi,Abu-dhabi,2020,8.0,1:41.707,50.0,0.781,196.588,18:44:58,10.0,17.672,13.0,43.359,12.0,40.676,16.0,320.5,15.0,285.5,15.0,303.9,9.0,221.1,51,3,1:18.047


In [284]:
df_2020_session = fia_f1_session(2020, gp_city = city_20, gp_link = links_session_20)

Season: 2020 | Source: fia.com
Circuit: austrian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/austrian-grand-prix/session-classifications
Circuit: austrian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/styrian-grand-prix/session-classifications
Circuit: hungarian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/hungarian-grand-prix/session-classifications
Circuit: british | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/british-grand-prix/session-classifications
Circuit: british | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/formula-1-70th-anniversary-grand-prix/session
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/spanish-grand-prix/session-classifications
Circuit: belgian | https://www.fia.com/events/fia-formula-one-world-championship/season-2020/belgian-grand-prix/session-classifications
Circuit

In [285]:
df_2020_session

Unnamed: 0,QL_CLASS,DRIVER,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,GRAND_PRIX,SEASON,GD_CLASS,GD_TIME
0,1,Valtteri Bottas,1:04.111,6,1:03.015,6.0,1:02.939,7.0,Austrian,2020,1,1:02.939
1,2,Lewis Hamilton,1:04.198,8,1:03.096,6.0,1:02.951,7.0,Austrian,2020,5,1:02.951
2,3,Max Verstappen,1:04.024,8,1:04.000,8.0,1:03.477,7.0,Austrian,2020,2,1:03.477
3,4,Lando Norris,1:04.606,6,1:03.819,6.0,1:03.626,5.0,Austrian,2020,3,1:03.626
4,5,Alexander Albon,1:04.661,6,1:03.746,6.0,1:03.868,6.0,Austrian,2020,4,1:03.868
...,...,...,...,...,...,...,...,...,...,...,...,...
15,16,Kimi Raikkonen,1:37.555,6,,,,,Abu-dhabi,2020,15,1:37.555
16,17,Kevin Magnussen,1:37.863,9,,,,,Abu-dhabi,2020,20,1:37.863
17,18,George Russell,1:38.045,8,,,,,Abu-dhabi,2020,16,1:38.045
18,19,Pietro Fittipaldi,1:38.173,9,,,,,Abu-dhabi,2020,17,1:38.173


In [286]:
f1_2020 = pd.merge(df_2020, df_2020_session, how='left', left_on=['DRIVER','GRAND_PRIX'], right_on = ['DRIVER','GRAND_PRIX'])

In [287]:
f1_2020

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON_x,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,SEASON_y,GD_CLASS,GD_TIME
0,1,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,71,1:30:55.739,V. Bottas,Austrian,2020,2.0,1:07.657,68.0,0.182,229.758,16:40:16,14.0,17.079,3.0,30.181,3.0,20.210,19.0,299.9,19.0,304.5,2.0,241.9,16.0,280.7,77,2,38.344,1,1:04.111,6,1:03.015,6.0,1:02.939,7.0,2020,1,1:02.939
1,1,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,71,1:30:55.739,V. Bottas,Austrian,2020,2.0,1:07.657,68.0,0.182,229.758,16:40:16,14.0,17.079,3.0,30.181,3.0,20.210,19.0,299.9,19.0,304.5,2.0,241.9,16.0,280.7,77,2,38.344,4,1:18.791,13,1:18.657,11.0,1:20.701,10.0,2020,4,1:20.701
2,2,Charles Leclerc,Scuderia Ferrari,71,1:30:58.439,C. Leclerc,Austrian,2020,4.0,1:07.901,64.0,0.426,228.933,16:35:48,2.0,16.715,2.0,30.169,4.0,20.457,11.0,318.2,13.0,322.7,1.0,243.5,10.0,284.7,16,3,1:03.476,7,1:04.500,8,1:04.041,6.0,1:03.923,6.0,2020,7,1:03.923
3,2,Charles Leclerc,Scuderia Ferrari,71,1:30:58.439,C. Leclerc,Austrian,2020,4.0,1:07.901,64.0,0.426,228.933,16:35:48,2.0,16.715,2.0,30.169,4.0,20.457,11.0,318.2,13.0,322.7,1.0,243.5,10.0,284.7,16,3,1:03.476,11,1:20.871,12,1:19.628,12.0,,,2020,14,1:19.628
4,3,Lando Norris,McLaren F1 Team,71,1:31:01.230,L. Norris,Austrian,2020,1.0,1:07.475,71.0,0.000,230.378,16:43:48,5.0,16.825,1.0,29.982,1.0,20.170,17.0,313.2,12.0,323.1,7.0,239.5,4.0,287.7,4,3,1:03.721,4,1:04.606,6,1:03.819,6.0,1:03.626,5.0,2020,3,1:03.626
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
415,16,Antonio Giovinazzi,Alfa Romeo Racing ORLEN,54,1:37:02.381,A. Giovinazzi,Abu-dhabi,2020,7.0,1:41.675,29.0,0.749,196.650,18:07:27,8.0,17.649,12.0,43.348,9.0,40.507,3.0,333.1,2.0,291.8,12.0,306.4,10.0,221.0,99,1,21.480,14,1:37.075,6,1:38.248,6.0,,,2020,14,1:38.248
416,17,Nicholas Latifi,Williams Racing,54,1:37:11.562,N. Latifi,Abu-dhabi,2020,16.0,1:42.497,49.0,1.571,195.073,18:42:04,15.0,17.734,19.0,43.621,13.0,40.736,9.0,329.3,11.0,286.7,10.0,308.0,4.0,222.5,6,2,43.741,20,1:38.443,7,,,,,2020,18,1:38.443
417,18,Kevin Magnussen,Haas F1 Team,54,1:38:00.387,K. Magnussen,Abu-dhabi,2020,13.0,1:41.999,50.0,1.073,196.025,18:44:39,6.0,17.602,16.0,43.568,8.0,40.266,18.0,311.2,16.0,285.0,18.0,299.8,12.0,220.4,20,2,45.812,17,1:37.863,9,,,,,2020,20,1:37.863
418,19,Pietro Fittipaldi,Haas F1 Team,53,1:36:38.988,P. Fittipaldi,Abu-dhabi,2020,8.0,1:41.707,50.0,0.781,196.588,18:44:58,10.0,17.672,13.0,43.359,12.0,40.676,16.0,320.5,15.0,285.5,15.0,303.9,9.0,221.1,51,3,1:18.047,19,1:38.173,9,,,,,2020,17,1:38.173


# 2019

In [288]:
Links_Extraction(url = 'https://www.fia.com/f1-archives?season=971', url_string = 'race-cl')

['/events/fia-formula-one-world-championship/season-2019/australian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/bahrain-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/chinese-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/azerbaijan-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/spanish-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/monaco-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/canadian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/french-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/british-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/german-grand-prix/race-classification',
 '/events/fia-fo

In [289]:
links_race_19 = ['/events/fia-formula-one-world-championship/season-2019/australian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/bahrain-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/chinese-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/azerbaijan-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/spanish-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/monaco-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/canadian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/french-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/british-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/german-grand-prix/race-classification-german',
 '/events/fia-formula-one-world-championship/season-2019/hungarian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/belgian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/italian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/singapore-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/russian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/japanese-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/united-states-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/brazilian-grand-prix/race-classification',
 '/events/fia-formula-one-world-championship/season-2019/abu-dhabi-grand-prix/race-classification']

In [290]:
Links_Extraction(url = 'https://www.fia.com/f1-archives?season=971', url_string = 'session')

['/events/fia-formula-one-world-championship/season-2019/session-classifications-0',
 '/events/fia-formula-one-world-championship/season-2019/australian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/bahrain-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/chinese-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/spanish-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/monaco-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/canadian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/french-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/british-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/german-grand-prix/session-classification

In [291]:
links_session_19 = ['/events/fia-formula-one-world-championship/season-2019/australian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/bahrain-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/chinese-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/spanish-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/monaco-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/canadian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/french-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/british-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/german-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/hungarian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/belgian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/italian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/singapore-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/russian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/japanese-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/united-states-grand-prix/session',
 '/events/fia-formula-one-world-championship/season-2019/brazilian-grand-prix/session-classifications',
 '/events/fia-formula-one-world-championship/season-2019/abu-dhabi-grand-prix/session-classifications']

In [292]:
c_19 = f1_gp_circuits(season = 2019)
c_19 = c_19.drop(index=8)
c_19 = c_19.drop(index=17)
city_19 = c_19['GP'].unique().tolist()
c_19_2 = c_19.drop(index=3)
c_19

Season: 2019 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Mar 15 - 17,Australian GPMelbourne Grand Prix Circuit,australian
1,Mar 29 - 31,Bahrain GPBahrain International Circuit,bahrain
2,Apr 12 - 14,Chinese GPShanghai International Circuit,chinese
3,Apr 26 - 28,Socar Azerbaijan GPBaku City Circuit,azerbaijan
4,May 10 - 12,Spanish GPCircuit de Barcelona-Catalunya,spanish
5,May 23 - 26,Monaco GPCircuit de Monaco,monaco
6,Jun 7 - 9,Canadian GPCircuit Gilles-Villeneuve,canadian
7,Jun 21 - 23,French GPCircuit Paul Ricard,french
9,Jul 12 - 14,Rolex British GPSilverstone Circuit,british
10,Jul 26 - 28,Mercedes-Benz German GPHockenheimring,german


In [293]:
city_19_2 = c_19_2['GP'].tolist()

In [294]:
df_2019 = fia_f1_data(2019, gp_city = city_19, gp_link = links_race_19)

Season: 2019 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/australian-grand-prix/race-classification
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/bahrain-grand-prix/race-classification
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/chinese-grand-prix/race-classification
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/azerbaijan-grand-prix/race-classification
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/spanish-grand-prix/race-classification
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/monaco-grand-prix/race-classification
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/canadian-grand-prix/race-classification
Circuit: french | https://www.fia

In [295]:
#df_2019

In [296]:
df_2019_session = fia_f1_session(2019, gp_city = city_19_2, gp_link = links_session_19)

Season: 2019 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/australian-grand-prix/session-classifications
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/bahrain-grand-prix/session-classifications
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/chinese-grand-prix/session-classifications
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/spanish-grand-prix/session-classifications
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/monaco-grand-prix/session-classifications
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/canadian-grand-prix/session-classifications
Circuit: french | https://www.fia.com/events/fia-formula-one-world-championship/season-2019/french-grand-prix/session-classifications
Circuit: brit

In [297]:
f1_2019 = pd.merge(df_2019, df_2019_session, how='left', left_on=['DRIVER','GRAND_PRIX'], right_on = ['DRIVER','GRAND_PRIX'])

In [298]:
f1_2019

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON_x,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,SEASON_y,GD_CLASS,GD_TIME
0,1,Valtteri Bottas,Mercedes AMG Petronas Motorsport,58,1:25:27.325,V. Bottas,Australian,2019,1,1:25.580,57,0.000,223.075,17:37:32,1,28.640,1,22.839,1,34.101,11,311.3,3,280.3,9,290.9,11,301.4,77,1,22.014,2.0,1:22.367,7.0,1:21.193,6.0,1:20.598,6.0,2019.0,2.0,1:20.598
1,2,Lewis Hamilton,Mercedes AMG Petronas Motorsport,58,1:25:48.211,L. Hamilton,Australian,2019,2,1:26.057,57,0.477,221.839,17:37:54,2,28.701,3,23.055,2,34.269,14,309.3,9,278.2,16,287.2,17,296.1,44,1,21.515,1.0,1:22.043,6.0,1:21.014,6.0,1:20.486,6.0,2019.0,1.0,1:20.486
2,3,Max Verstappen,Aston Martin Red Bull Racing,58,1:25:49.845,M. Verstappen,Australian,2019,3,1:26.256,57,0.676,221.327,17:37:55,3,28.850,2,22.948,3,34.363,2,319.9,1,281.7,4,292.9,6,304.7,33,1,21.157,4.0,1:22.876,5.0,1:21.678,6.0,1:21.320,6.0,2019.0,4.0,1:21.320
3,4,Sebastian Vettel,Scuderia Ferrari,58,1:26:24.434,S. Vettel,Australian,2019,8,1:27.954,16,2.374,217.054,16:37:49,7,29.271,9,23.458,8,34.986,17,303.7,13,274.6,19,283.1,19,291.2,5,1,21.995,3.0,1:22.885,5.0,1:21.912,5.0,1:21.190,6.0,2019.0,3.0,1:21.190
4,5,Charles Leclerc,Scuderia Ferrari,58,1:26:25.555,C. Leclerc,Australian,2019,4,1:26.926,58,1.346,219.621,17:39:58,4,29.006,4,23.138,7,34.753,19,297.4,11,277.4,13,288.7,12,301.2,16,1,22.306,5.0,1:22.017,8.0,1:21.739,3.0,1:21.442,6.0,2019.0,5.0,1:21.442
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,16,Antonio Giovinazzi,Alfa Romeo Racing,54,1:35:32.482,A. Giovinazzi,Abu-dhabi,2019,15,1:43.256,28,3.973,193.639,18:03:50,19,17.932,16,43.785,17,41.464,5,334.9,14,284.8,13,308.9,13,219.3,99,2,44.165,17.0,1:38.114,6.0,,,,,2019.0,16.0,1:38.114
376,17,George Russell,ROKiT Williams Racing,54,1:35:43.499,G. Russell,Abu-dhabi,2019,13,1:43.074,50,3.791,193.981,18:42:21,13,17.732,14,43.662,14,41.258,13,329.3,11,286.7,12,309.1,14,219.1,63,1,22.272,19.0,1:38.717,8.0,,,,,2019.0,18.0,1:38.717
377,18,Pierre Gasly,Red Bull Toro Rosso Honda,53,1:34:17.001,P. Gasly,Abu-dhabi,2019,10,1:42.414,53,3.131,195.231,18:47:48,12,17.723,15,43.768,10,40.923,14,327.7,5,290.7,3,324.4,5,223.0,10,1,1:17.467,12.0,1:37.198,6.0,1:37.089,6.0,,,2019.0,11.0,1:37.089
378,19,Robert Kubica,ROKiT Williams Racing,53,1:34:29.473,R. Kubica,Abu-dhabi,2019,20,1:44.500,51,5.217,191.333,18:44:29,20,17.984,20,44.415,20,41.935,18,317.8,20,281.9,19,302.8,16,218.2,88,1,21.688,20.0,1:39.236,6.0,,,,,2019.0,19.0,1:39.236


# 2018

In [299]:
Links_Extraction(url = 'https://www.fia.com/f1-archives?season=866', url_string = 'race-class')

['/events/fia-formula-one-world-championship/season-2018/race-classification',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-0',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-1',
 '/events/fia-formula-one-world-championship/season-2019/race-classification',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-25',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-3',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-23',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-5',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-6',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-7',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-8',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-24',
 '/events/fia-formula-one-wor

In [300]:
links_race_18 = ['/events/fia-formula-one-world-championship/season-2018/race-classification',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-0',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-1',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-25',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-3',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-23',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-5',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-6',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-7',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-8',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-24',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-26',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-27',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-28',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-13',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-29',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-30',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-22',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-17',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-18',
 '/events/fia-formula-one-world-championship/season-2018/race-classification-20']

In [301]:
Links_Extraction(url = 'https://www.fia.com/f1-archives?season=866', url_string = 'session')

['/events/fia-formula-one-world-championship/season-2018/session-classifications',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-0',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-1',
 '/events/fia-formula-one-world-championship/season-2019/session-classifications',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-25',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-3',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-23',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-5',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-6',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-7',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-8',
 '/events/fia-formula-one-world-championship/season-2018/session-cl

In [302]:
links_session_18 = ['/events/fia-formula-one-world-championship/season-2018/session-classifications',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-0',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-1',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-25',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-3',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-23',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-5',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-6',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-7',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-8',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-24',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-26',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-27',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-28',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-13',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-29',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-30',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-22',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-17',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-18',
 '/events/fia-formula-one-world-championship/season-2018/session-classifications-19']

In [303]:
c_18 = f1_gp_circuits(season = 2018)
city_18 = c_18['GP'].unique().tolist()
c_18

Season: 2018 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Feb 23 - Mar 25,Australian GPMelbourne Grand Prix Circuit,australian
1,Apr 6 - 8,Bahrain GPBahrain International Circuit,bahrain
2,Apr 13 - 15,Chinese GPShanghai International Circuit,chinese
3,Apr 27 - 29,Azerbaijan GPBaku City Circuit,azerbaijan
4,May 11 - 13,Spanish GPCircuit de Barcelona-Catalunya,spanish
...,...,...,...
16,Oct 5 - 7,Japanese GPSuzuka International Racing Course,japanese
17,Oct 19 - 21,United States GPCircuit of the Americas,united-states
18,Oct 26 - 28,Mexican GPAutodromo Hermanos Rodriguez,mexican
19,Nov 9 - 11,Brazilian GPAutodromo Jose Carlos Pace,brazilian


In [304]:
df_2018 = fia_f1_data(2018, gp_city = city_18, gp_link = links_race_18)

Season: 2018 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification-0
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification-1
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification-25
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification-3
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification-23
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification-5
Circuit: french | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/race-classification-6
Circuit: austrian | https://www.fia.com/even

In [305]:
df_2018_session = fia_f1_session(2018, gp_city = city_18, gp_link = links_session_18)

Season: 2018 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications-0
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications-1
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications-25
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications-3
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications-23
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications-5
Circuit: french | https://www.fia.com/events/fia-formula-one-world-championship/season-2018/session-classifications-6
Circuit: aus

In [306]:
f1_2018 = pd.merge(df_2018, df_2018_session, how='left', left_on=['DRIVER','GRAND_PRIX'], right_on = ['DRIVER','GRAND_PRIX'])

In [307]:
f1_2018

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON_x,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,SEASON_y,GD_CLASS,GD_TIME
0,1,Sebastian Vettel,Scuderia Ferrari,58,1:29:33.283,S. Vettel,Australian,2018,4.0,1:26.469,53.0,0.524,220.782,17:35:35,4.0,28.891,4.0,23.198,3.0,34.334,,,8.0,278.9,18.0,294.0,17.0,287.3,5,1,21.787,3.0,1:23.348,7.0,1:21.944,6.0,1:21.838,7.0,2018.0,3.0,1:21.838
1,2,Lewis Hamilton,Mercedes AMG Petronas Motorsport,58,1:29:38.319,L. Hamilton,Australian,2018,3.0,1:26.444,50.0,0.499,220.845,17:31:16,3.0,28.760,2.0,23.092,2.0,34.247,,,3.0,281.7,3.0,303.8,8.0,297.8,44,1,21.821,1.0,1:22.824,7.0,1:22.051,5.0,1:21.164,8.0,2018.0,1.0,1:21.164
2,3,Kimi Raikkonen,Scuderia Ferrari,58,1:29:39.592,K. Raikkonen,Australian,2018,2.0,1:26.373,57.0,0.428,221.027,17:41:31,2.0,28.698,3.0,23.101,4.0,34.347,,,1.0,285.0,11.0,299.5,16.0,288.6,7,1,21.421,2.0,1:23.096,5.0,1:22.507,5.0,1:21.828,7.0,2018.0,2.0,1:21.828
3,4,Daniel Ricciardo,Aston Martin Red Bull Racing,58,1:29:40.352,D. Ricciardo,Australian,2018,1.0,1:25.945,54.0,0.000,222.128,17:37:13,1.0,28.651,1.0,23.044,1.0,34.089,,,4.0,280.5,6.0,302.0,7.0,298.2,3,1,21.440,5.0,1:23.494,5.0,1:22.897,5.0,1:22.152,7.0,2018.0,8.0,1:22.152
4,5,Fernando Alonso,McLaren F1 Team,58,1:30:01.169,F. Alonso,Australian,2018,7.0,1:26.978,57.0,1.033,219.489,17:41:52,8.0,28.976,8.0,23.406,5.0,34.454,,,17.0,272.5,17.0,295.3,15.0,294.0,14,1,22.573,11.0,1:23.597,8.0,1:23.692,6.0,,,2018.0,10.0,1:23.692
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
415,16,Pierre Gasly,Red Bull Toro Rosso Honda,46,1:26:09.517,P. Gasly,Abu-dhabi,2018,15.0,1:43.988,42.0,3.121,192.276,18:32:42,17.0,18.007,15.0,43.798,14.0,41.724,5,332.8,7.0,290.9,5.0,315.6,18.0,217.8,10,1,22.493,17.0,1:38.166,9.0,,,,,2018.0,17.0,1:38.166
416,17,Esteban Ocon,Racing Point Force India F1 Team,44,1:22:30.745,E. Ocon,Abu-dhabi,2018,13.0,1:43.591,41.0,2.724,193.012,18:30:48,14.0,17.861,12.0,43.721,15.0,41.751,4,333.8,14.0,288.6,1.0,322.3,14.0,218.9,31,1,21.979,9.0,1:36.936,6.0,1:36.814,6.0,1:36.540,6.0,2018.0,9.0,1:36.540
417,18,Marcus Ericsson,Alfa Romeo Sauber F1 Team,24,0:47:06.407,M. Ericsson,Abu-dhabi,2018,19.0,1:46.077,22.0,5.210,188.489,17:57:07,18.0,18.151,19.0,44.262,19.0,43.191,14,322.6,13.0,288.9,12.0,306.1,17.0,218.1,,,,12.0,1:37.619,8.0,1:37.132,6.0,,,2018.0,12.0,1:37.132
418,19,Kimi Raikkonen,Scuderia Ferrari,6,0:14:06.737,K. Raikkonen,Abu-dhabi,2018,18.0,1:45.198,5.0,4.331,190.064,17:25:45,19.0,18.265,14.0,43.798,18.0,43.072,20,308.8,6.0,291.9,19.0,285.7,19.0,216.0,,,,4.0,1:37.010,5.0,1:36.735,7.0,1:35.365,6.0,2018.0,4.0,1:35.365


# 2017

In [308]:
links_race_17 = Links_Extraction(url = 'https://www.fia.com/f1-archives?season=679', url_string = 'race-class')

In [309]:
links_race_17

['/events/fia-formula-one-world-championship/season-2017/race-classification',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-7',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-0',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-8',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-1',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-2',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-9',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-10',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-11',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-3',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-12',
 '/events/fia-formula-one-world-championship/season-2017/race-classification-4',
 '/events/fia-formula-one-w

In [310]:
links_session_17 = Links_Extraction(url = 'https://www.fia.com/f1-archives?season=679', url_string = 'session')

In [311]:
links_session_17

['/events/fia-formula-one-world-championship/season-2017/session-classifications',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-7',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-0',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-8',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-1',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-2',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-9',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-10',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-11',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-3',
 '/events/fia-formula-one-world-championship/season-2017/session-classifications-12',
 '/events/fia-formula-one-world-championship/season-2017/session

In [312]:
c_17 = f1_gp_circuits(season = 2017)
city_17 = c_17['GP'].unique().tolist()
c_17

Season: 2017 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Mar 24 - 26,Australian GPMelbourne Grand Prix Circuit,australian
1,Apr 7 - 9,Chinese GPShanghai International Circuit,chinese
2,Apr 14 - 16,Bahrain GPBahrain International Circuit,bahrain
3,Apr 28 - 30,Russian GPSochi Autodrom,russian
4,May 12 - 14,Spanish GPCircuit de Barcelona-Catalunya,spanish
5,May 25 - 28,Monaco GPCircuit de Monaco,monaco
6,Jun 9 - 11,Canadian GPCircuit Gilles-Villeneuve,canadian
7,Jun 23 - 25,Azerbaijan GPBaku City Circuit,azerbaijan
8,Jul 7 - 9,Austrian GPRed Bull Ring,austrian
9,Jul 14 - 16,British GPSilverstone Circuit,british


In [313]:
df_2017 = fia_f1_data(2017, gp_city = city_17, gp_link = links_race_17)

Season: 2017 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-7
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-0
Circuit: russian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-8
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-1
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-2
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-9
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/race-classification-10
Circuit: austrian | https://www.fia.com/even

In [314]:
#df_2017

In [315]:
df_2017_session = fia_f1_session(2017, gp_city = city_17, gp_link = links_session_17)

Season: 2017 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications-7
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications-0
Circuit: russian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications-8
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications-1
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications-2
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications-9
Circuit: azerbaijan | https://www.fia.com/events/fia-formula-one-world-championship/season-2017/session-classifications-10
Circuit: aus

In [316]:
#df_2017_session

In [317]:
f1_2017 = pd.merge(df_2017, df_2017_session, how='left', left_on=['DRIVER','GRAND_PRIX'], right_on = ['DRIVER','GRAND_PRIX'])

In [318]:
f1_2017

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON_x,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,SEASON_y,GD_CLASS,GD_TIME
0,1,Sebastian Vettel,Scuderia Ferrari,57,1:24:11.672,S. Vettel,Australian,2017,3.0,1:26.638,53.0,0.100,220.351,17:24:49,6.0,29.083,2.0,23.164,1.0,34.385,9.0,314.6,4.0,280.8,3.0,298.8,5.0,305.1,5,1,21.988,2.0,1:25.210,5.0,1:23.401,6.0,1:22.456,6.0,2017.0,2.0,1:22.456
1,2,Lewis Hamilton,Mercedes AMG Petronas F1 Team,57,1:24:21.647,L. Hamilton,Australian,2017,6.0,1:27.033,44.0,0.495,219.351,17:11:49,3.0,28.989,6.0,23.243,6.0,34.584,4.0,318.0,2.0,283.9,1.0,300.0,2.0,308.3,44,1,21.709,1.0,1:24.191,5.0,1:23.251,3.0,1:22.188,6.0,2017.0,1.0,1:22.188
2,3,Valtteri Bottas,Mercedes AMG Petronas F1 Team,57,1:24:22.922,V. Bottas,Australian,2017,2.0,1:26.593,56.0,0.055,220.465,17:29:23,1.0,28.885,3.0,23.168,4.0,34.453,6.0,316.3,1.0,284.9,5.0,297.9,6.0,304.9,77,1,21.440,3.0,1:24.514,4.0,1:23.215,3.0,1:22.481,6.0,2017.0,3.0,1:22.481
3,4,Kimi Raikkonen,Scuderia Ferrari,57,1:24:34.065,K. Raikkonen,Australian,2017,1.0,1:26.538,56.0,0.000,220.605,17:29:33,2.0,28.903,5.0,23.225,2.0,34.410,13.0,304.3,11.0,277.2,13.0,293.2,14.0,297.7,7,1,22.033,4.0,1:24.352,10.0,1:23.376,3.0,1:23.033,6.0,2017.0,4.0,1:23.033
4,5,Max Verstappen,Red Bull Racing,57,1:24:40.499,M. Verstappen,Australian,2017,5.0,1:26.964,43.0,0.426,219.525,17:10:39,7.0,29.103,1.0,23.071,5.0,34.552,8.0,315.3,3.0,281.6,2.0,299.4,9.0,302.8,33,1,22.208,5.0,1:24.482,7.0,1:24.092,6.0,1:23.485,5.0,2017.0,5.0,1:23.485
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
395,16,Pierre Gasly,SCUDERIA TORO ROSSO,54,1:35:33.961,P. Gasly,Abu-dhabi,2017,16.0,1:43.844,33.0,3.194,192.542,18:02:27,20.0,18.109,19.0,44.035,17.0,41.538,15.0,316.2,20.0,278.5,18.0,300.5,16.0,218.6,10,1,21.519,17.0,1:39.724,9.0,,,,,2017.0,17.0,1:39.724
396,17,Marcus Ericsson,Sauber F1 Team,54,1:35:36.526,M. Ericsson,Abu-dhabi,2017,15.0,1:43.567,36.0,2.917,193.057,18:07:48,13.0,17.873,15.0,43.796,14.0,41.497,6.0,325.7,14.0,284.4,7.0,307.3,18.0,218.0,9,1,22.176,19.0,1:39.994,9.0,,,,,2017.0,19.0,1:39.994
397,18,Lance Stroll,Williams Martini Racing,54,1:35:44.704,L. Stroll,Abu-dhabi,2017,6.0,1:42.324,52.0,1.674,195.402,18:35:47,9.0,17.766,6.0,43.34,10.0,41.108,8.0,324.9,6.0,288.3,11.0,306.1,6.0,220.4,18,3,1:05.064,15.0,1:39.503,8.0,1:39.646,6.0,,,2017.0,15.0,1:39.646
398,19,Carlos Sainz Jr.,Renault Sport F1 Team,31,0:54:20.683,C. Sainz Jr.,Abu-dhabi,2017,14.0,1:43.378,26.0,2.728,193.410,17:49:10,16.0,17.937,17.0,43.919,12.0,41.184,12.0,319.0,18.0,281.0,17.0,301.9,19.0,216.4,55,1,21.956,12.0,1:38.810,3.0,1:38.725,6.0,,,2017.0,12.0,1:38.725


# 2016

In [319]:
links_race_16 = Links_Extraction(url = 'https://www.fia.com/f1-archives?season=644', url_string = 'race-class')

In [320]:
links_race_16

['/events/fia-formula-one-world-championship/season-2016/race-classification',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-0',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-1',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-2',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-3',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-4',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-5',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-6',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-7',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-8',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-9',
 '/events/fia-formula-one-world-championship/season-2016/race-classification-10',
 '/events/fia-formula-one-wor

In [321]:
links_session_16 = Links_Extraction(url = 'https://www.fia.com/f1-archives?season=644', url_string = 'session')

In [322]:
links_session_16

['/events/fia-formula-one-world-championship/season-2016/session-classifications',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-0',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-1',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-2',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-3',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-4',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-5',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-6',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-7',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-8',
 '/events/fia-formula-one-world-championship/season-2016/session-classifications-9',
 '/events/fia-formula-one-world-championship/season-2016/session-cl

In [323]:
c_16 = f1_gp_circuits(season = 2016)
city_16 = c_16['GP'].unique().tolist()
c_16

Season: 2016 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Mar 18 - 20,Australian GPMelbourne Grand Prix Circuit,australian
1,Apr 1 - 3,Bahrain GPBahrain International Circuit,bahrain
2,Apr 15 - 17,Chinese GPShanghai International Circuit,chinese
3,Apr 29 - May 1,Russian GPSochi Autodrom,russian
4,May 13 - 15,Spanish GPCircuit de Barcelona-Catalunya,spanish
...,...,...,...
16,Oct 7 - 9,Japanese GPSuzuka International Racing Course,japanese
17,Oct 21 - 23,United States GPCircuit of the Americas,united-states
18,Oct 28 - 30,Mexican GPAutodromo Hermanos Rodriguez,mexican
19,Nov 11 - 13,Brazilian GPAutodromo Jose Carlos Pace,brazilian


In [324]:
df_2016 = fia_f1_data(2016, gp_city = city_16, gp_link = links_race_16)

Season: 2016 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification-0
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification-1
Circuit: russian | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification-2
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification-3
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification-4
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification-5
Circuit: european | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/race-classification-6
Circuit: austrian | https://www.fia.com/events/

In [325]:
#df_2016

In [326]:
df_2016_session = fia_f1_session(2016, gp_city = city_16, gp_link = links_session_16)

Season: 2016 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications-0
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications-1
Circuit: russian | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications-2
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications-3
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications-4
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications-5
Circuit: european | https://www.fia.com/events/fia-formula-one-world-championship/season-2016/session-classifications-6
Circuit: austri

In [327]:
f1_2016 = pd.merge(df_2016, df_2016_session, how='left', left_on=['DRIVER','GRAND_PRIX'], right_on = ['DRIVER','GRAND_PRIX'])

In [328]:
f1_2016

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON_x,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,SEASON_y,GD_CLASS,GD_TIME
0,1,Nico Rosberg,Mercedes AMG Petronas F1 Team,57,1:48:15.565,N. Rosberg,Australian,2016,3.0,1:30.557,21.0,1.560,210.815,17:00:31,3.0,30.204,1.0,23.574,1.0,35.638,1.0,315.6,4.0,277.9,6.0,301.3,15.0,302.9,6,2,18:30.834,2.0,1:26.934,4.0,1:24.796,3.0,1:24.197,6.0,2016.0,2.0,1:24.197
1,2,Lewis Hamilton,Mercedes AMG Petronas F1 Team,57,1:48:23.625,L. Hamilton,Australian,2016,4.0,1:30.646,48.0,1.649,210.608,17:41:53,4.0,30.315,4.0,23.829,5.0,36.352,2.0,314.8,1.0,278.7,1.0,305.2,2.0,309.2,44,2,18:32.027,1.0,1:25.351,5.0,1:24.605,3.0,1:23.837,6.0,2016.0,1.0,1:23.837
2,3,Sebastian Vettel,Scuderia Ferrari,57,1:48:25.208,S. Vettel,Australian,2016,2.0,1:29.951,23.0,0.954,212.235,17:03:29,2.0,30.058,3.0,23.626,3.0,35.715,5.0,312.1,8.0,276.5,5.0,301.9,3.0,308.8,5,3,18:55.199,3.0,1:26.945,5.0,1:25.257,6.0,1:24.675,3.0,2016.0,3.0,1:24.675
3,4,Daniel Ricciardo,Red Bull Racing,57,1:48:39.895,D. Ricciardo,Australian,2016,1.0,1:28.997,49.0,0.000,214.510,17:43:43,1.0,29.718,2.0,23.613,2.0,35.666,8.0,309.7,3.0,278.4,15.0,295.0,1.0,309.8,3,3,18:54.048,8.0,1:26.945,6.0,1:25.599,6.0,1:25.589,3.0,2016.0,8.0,1:25.589
4,5,Felipe Massa,Williams Martini Racing,57,1:49:14.544,F. Massa,Australian,2016,9.0,1:32.288,39.0,3.291,206.861,17:28:28,10.0,30.843,9.0,24.225,11.0,37.054,10.0,308.2,9.0,274.8,8.0,298.0,13.0,303.2,19,2,18:31.355,6.0,1:25.918,6.0,1:25.644,3.0,1:25.458,3.0,2016.0,6.0,1:25.458
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
457,18,Carlos Sainz Jr.,Scuderia Toro Rosso,41,1:15:22.129,C. Sainz Jr.,Abu-dhabi,2016,18.0,1:46.591,30.0,2.862,187.580,17:58:54,18.0,18.686,20.0,44.552,13.0,43.237,17.0,323.6,21.0,275.1,19.0,299.7,17.0,216.9,55,2,43.696,21.0,1:42.393,9.0,,,,,2016.0,21.0,1:42.393
458,19,Daniil Kvyat,Scuderia Toro Rosso,14,0:26:02.917,D. Kvyat,Abu-dhabi,2016,21.0,1:48.752,13.0,5.023,183.853,17:27:47,22.0,19.131,19.0,44.51,20.0,44.261,16.0,326.8,22.0,271.4,15.0,303.1,21.0,213.6,26,1,21.896,17.0,1:42.003,9.0,,,,,2016.0,17.0,1:42.003
459,20,Jenson Button,McLaren Honda,12,0:22:42.839,J. Button,Abu-dhabi,2016,22.0,1:48.753,4.0,5.024,183.851,17:10:57,21.0,19.061,21.0,44.834,21.0,44.406,21.0,316.0,17.0,284.3,21.0,298.2,20.0,214.0,,,,12.0,1:41.158,8.0,1:41.272,6.0,,,2016.0,12.0,1:41.272
460,21,Valtteri Bottas,Williams Martini Racing,6,0:11:03.071,V. Bottas,Abu-dhabi,2016,19.0,1:47.837,4.0,4.108,185.413,17:10:53,20.0,19.020,12.0,43.961,22.0,44.731,20.0,316.9,20.0,277.8,10.0,309.4,13.0,217.6,,,,11.0,1:41.192,3.0,1:41.084,6.0,,,2016.0,11.0,1:41.084


# 2015

In [329]:
links_race_15 = Links_Extraction(url = 'https://www.fia.com/f1-archives?season=249', url_string = 'race-class')

In [330]:
links_race_15 = ['/events/fia-formula-one-world-championship/season-2015/race-classification',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-0',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-1',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-2',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-3',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-4',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-5',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-6',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-7',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-9',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-10',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-11',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-12',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-13',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-14',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-15',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-16',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-17',
 '/events/fia-formula-one-world-championship/season-2015/race-classification-18']

In [331]:
links_session_15 = Links_Extraction(url = 'https://www.fia.com/f1-archives?season=249', url_string = 'session')

In [332]:
links_session_15 = ['/events/fia-formula-one-world-championship/season-2015/session-classifications-0',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-1',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-2',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-3',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-4',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-5',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-6',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-7',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-8',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-10',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-11',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-12',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-13',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-14',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-15',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-16',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-17',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications-18',
 '/events/fia-formula-one-world-championship/season-2015/session-classifications']

In [333]:
c_15 = f1_gp_circuits(season = 2015)
city_15 = c_15['GP'].unique().tolist()
c_15

Season: 2015 | Source: espn.com


Unnamed: 0,Date,Race,GP
0,Mar 13 - 15,Australian GPMelbourne Grand Prix Circuit,australian
1,Mar 27 - 29,Malaysian GPSepang International Circuit,malaysian
2,Apr 10 - 12,Chinese GPShanghai International Circuit,chinese
3,Apr 17 - 19,Bahrain GPBahrain International Circuit,bahrain
4,May 8 - 10,Spanish GPCircuit de Barcelona-Catalunya,spanish
5,May 21 - 24,Monaco GPCircuit de Monaco,monaco
6,Jun 5 - 7,Canadian GPCircuit Gilles-Villeneuve,canadian
7,Jun 19 - 21,Austrian GPRed Bull Ring,austrian
8,Jul 3 - 5,British GPSilverstone Circuit,british
9,Jul 24 - 26,Hungarian GPHungaroring,hungarian


In [334]:
df_2015 = fia_f1_data(2015, gp_city = city_15, gp_link = links_race_15)

Season: 2015 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification
Circuit: malaysian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification-0
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification-1
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification-2
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification-3
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification-4
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification-5
Circuit: austrian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/race-classification-6
Circuit: british | https://www.fia.com/events

In [335]:
df_2015_session = fia_f1_session(2015, gp_city = city_15, gp_link = links_session_15)

Season: 2015 | Source: fia.com
Circuit: australian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-0
Circuit: malaysian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-1
Circuit: chinese | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-2
Circuit: bahrain | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-3
Circuit: spanish | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-4
Circuit: monaco | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-5
Circuit: canadian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-6
Circuit: austrian | https://www.fia.com/events/fia-formula-one-world-championship/season-2015/session-classifications-7
Circuit: br

In [336]:
f1_2015 = pd.merge(df_2015, df_2015_session, how='left', left_on=['DRIVER','GRAND_PRIX'], right_on = ['DRIVER','GRAND_PRIX'])

In [337]:
data = f1_2021.append(f1_2020, ignore_index = True)
data = data.append(f1_2019, ignore_index = True)
data = data.append(f1_2018, ignore_index = True)
data = data.append(f1_2017, ignore_index = True)
data = data.append(f1_2016, ignore_index = True)
data = data.append(f1_2015, ignore_index = True)

In [338]:
print('Shape: ' + str(data.shape))
data.head()

Shape: (2940, 41)


Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON_x,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,SEASON_y,GD_CLASS,GD_TIME
0,1,Lewis Hamilton,Mercedes-AMG Petronas F1 Team,56,1:32:03.897,L. Hamilton,Bahrain,2021,4.0,1:34.015,44.0,1.925,207.235,19:20:17,2.0,29.944,4.0,40.418,3.0,23.141,18.0,315.2,10.0,239.5,2.0,267.7,13.0,290.4,44,2,48.915,2.0,1:30.617,6.0,1:30.085,6.0,1:29.385,6.0,2021.0,2.0,1:29.385
1,2,Max Verstappen,Red Bull Racing Honda,56,1:32:04.642,M. Verstappen,Bahrain,2021,2.0,1:33.228,41.0,1.138,208.984,19:15:41,4.0,30.009,2.0,40.159,2.0,22.995,4.0,327.0,15.0,237.9,6.0,266.1,12.0,291.1,33,2,48.615,1.0,1:30.499,3.0,1:30.318,6.0,1:28.997,6.0,2021.0,1.0,1:28.997
2,3,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,56,1:32:41.280,V. Bottas,Bahrain,2021,1.0,1:32.090,56.0,0.0,211.566,19:39:51,1.0,29.64,1.0,39.508,1.0,22.942,15.0,318.9,11.0,239.5,5.0,266.6,16.0,290.0,77,3,1:21.725,3.0,1:31.200,5.0,1:30.186,6.0,1:29.586,6.0,2021.0,3.0,1:29.586
3,4,Lando Norris,McLaren F1 Team,56,1:32:50.363,L. Norris,Bahrain,2021,6.0,1:34.396,38.0,2.306,206.398,19:11:28,7.0,30.206,6.0,40.525,5.0,23.207,9.0,324.2,16.0,237.6,7.0,263.4,8.0,291.8,4,2,50.539,7.0,1:30.902,6.0,1:30.099,6.0,1:29.974,6.0,2021.0,7.0,1:29.974
4,5,Sergio Perez,Red Bull Racing Honda,56,1:32:55.944,S. Perez,Bahrain,2021,3.0,1:33.970,44.0,1.88,207.334,19:21:10,6.0,30.18,3.0,40.395,6.0,23.231,8.0,324.7,5.0,240.5,4.0,267.0,17.0,290.0,11,3,1:12.289,11.0,1:31.165,5.0,1:30.659,6.0,,,2021.0,11.0,1:30.659


The first part of the dataset preparation is concluded. We can export the data on to excel and take a first glance at the extracted data.

On the next chapter we will do some preprocessing on some data.

## Pre Processing

Update the pilots names to be consistent throught the dataset, since names change from season to season. Also, there are also some caracters that weren't correct on the website and we need to correct it.

These updates will allow us to merge more data from other data sources in order to complete our dataset.

In [339]:
df = data.copy() #checkpoint

In [340]:
df.drop(['SEASON_y'], axis = 1, inplace=True)
df.rename(columns= {'SEASON_x':'SEASON'}, inplace=True)

In [341]:
df['DRIVER'] = df['DRIVER'].str.replace('Sergio Perez Mendoza','Sergio Perez')
df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('S. Perez Mendoza','S. Perez')

  df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('S. Perez Mendoza','S. Perez')


In [342]:
df['DRIVER'] = df['DRIVER'].str.replace(r'[^\x00-\x7f]', '')
df['DRIVER'] = df['DRIVER'].str.replace('Esteban Gutirrez','Esteban Gutierrez')
df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace(r'[^\x00-\x7f]', '')
df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('E. Gutirrez','E. Gutierrez')

  df['DRIVER'] = df['DRIVER'].str.replace(r'[^\x00-\x7f]', '')
  df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace(r'[^\x00-\x7f]', '')
  df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('E. Gutirrez','E. Gutierrez')


In [343]:
df['DRIVER'] = df['DRIVER'].str.replace('Carlos Sainz Jr.','Carlos Sainz')
df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('C. Sainz Jr.','C. Sainz')

  df['DRIVER'] = df['DRIVER'].str.replace('Carlos Sainz Jr.','Carlos Sainz')
  df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('C. Sainz Jr.','C. Sainz')


In [344]:
df['DRIVER'] = df['DRIVER'].str.replace('Roberto Merhi Muntan','Roberto Merhi')
df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('R. Merhi Muntan','R. Merhi')

  df['DRIVER_SNAME'] = df['DRIVER_SNAME'].str.replace('R. Merhi Muntan','R. Merhi')


In [345]:
df['GRAND_PRIX'] = df['GRAND_PRIX'].str.lower()

In [346]:
web_data = df.copy() #checkpoint 2

In [347]:
web_data

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,GD_CLASS,GD_TIME
0,1,Lewis Hamilton,Mercedes-AMG Petronas F1 Team,56,1:32:03.897,L. Hamilton,bahrain,2021,4.0,1:34.015,44.0,1.925,207.235,19:20:17,2.0,29.944,4.0,40.418,3.0,23.141,18.0,315.2,10.0,239.5,2.0,267.7,13.0,290.4,44,2,48.915,2.0,1:30.617,6.0,1:30.085,6.0,1:29.385,6.0,2.0,1:29.385
1,2,Max Verstappen,Red Bull Racing Honda,56,1:32:04.642,M. Verstappen,bahrain,2021,2.0,1:33.228,41.0,1.138,208.984,19:15:41,4.0,30.009,2.0,40.159,2.0,22.995,4.0,327.0,15.0,237.9,6.0,266.1,12.0,291.1,33,2,48.615,1.0,1:30.499,3.0,1:30.318,6.0,1:28.997,6.0,1.0,1:28.997
2,3,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,56,1:32:41.280,V. Bottas,bahrain,2021,1.0,1:32.090,56.0,0.000,211.566,19:39:51,1.0,29.640,1.0,39.508,1.0,22.942,15.0,318.9,11.0,239.5,5.0,266.6,16.0,290.0,77,3,1:21.725,3.0,1:31.200,5.0,1:30.186,6.0,1:29.586,6.0,3.0,1:29.586
3,4,Lando Norris,McLaren F1 Team,56,1:32:50.363,L. Norris,bahrain,2021,6.0,1:34.396,38.0,2.306,206.398,19:11:28,7.0,30.206,6.0,40.525,5.0,23.207,9.0,324.2,16.0,237.6,7.0,263.4,8.0,291.8,4,2,50.539,7.0,1:30.902,6.0,1:30.099,6.0,1:29.974,6.0,7.0,1:29.974
4,5,Sergio Perez,Red Bull Racing Honda,56,1:32:55.944,S. Perez,bahrain,2021,3.0,1:33.970,44.0,1.880,207.334,19:21:10,6.0,30.180,3.0,40.395,6.0,23.231,8.0,324.7,5.0,240.5,4.0,267.0,17.0,290.0,11,3,1:12.289,11.0,1:31.165,5.0,1:30.659,6.0,,,11.0,1:30.659
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2935,16,Max Verstappen,Scuderia Toro Rosso,54,1:39:09.064,M. Verstappen,abu-dhabi,2015,6.0,1:45.746,40.0,1.229,189.079,18:17:12,10.0,18.509,14.0,44.509,5.0,42.675,7.0,325.6,10.0,285.5,12.0,300.7,13.0,215.3,33,3,1:07.121,11.0,1:42.889,8.0,1:42.521,6.0,,,11.0,1:42.521
2936,17,Fernando Alonso,McLaren Honda,53,1:39:17.721,F. Alonso,abu-dhabi,2015,3.0,1:44.796,52.0,0.279,190.793,18:40:55,4.0,18.250,4.0,43.904,2.0,42.407,17.0,306.7,11.0,285.2,15.0,298.5,2.0,221.7,14,4,1:45.741,17.0,1:43.187,5.0,,,,,16.0,1:43.187
2937,18,Will Stevens,Manor Marussia F1 Team,53,1:39:54.204,W. Stevens,abu-dhabi,2015,18.0,1:49.610,53.0,5.093,182.414,18:43:26,18.0,19.423,18.0,45.603,18.0,44.416,19.0,297.6,19.0,268.4,19.0,284.3,16.0,213.2,28,2,45.767,19.0,1:46.297,7.0,,,,,19.0,1:46.297
2938,19,Roberto Merhi,Manor Marussia F1 Team,52,1:39:11.622,R. Merhi,abu-dhabi,2015,19.0,1:51.213,26.0,6.696,179.784,17:53:35,19.0,19.681,19.0,46.215,19.0,45.132,18.0,305.1,18.0,270.4,18.0,285.1,19.0,208.4,98,1,24.314,20.0,1:47.434,8.0,,,,,,


## Azure Database

We would also like to demonstrate the connection from python notebooks to an Azure SQL database to retrive some information to add to our dataset.

On the following site http://ergast.com/mrd/db/#csv we have downloaded the MySQL (5.7) data dumps and convert it to T-SQL in order to run this on Azure SQL Server database.

After that we just need to setup the connection by getting the server, database, username and password.

In [349]:
server = 'f1server.database.windows.net'
database = 'WackyRacesF1'
username = 'PDS'
password = '{Formula1}'  

We now needd to establish a connection with the database.

In [350]:
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+
                      database +';UID='+username+';PWD='+ password)

Now, let's test the connection by doing a simple query on any table, for exampl select the top 5 rows from the table circuits. We setup our variable query and the results from the query to the variable df and then we just need to call it.

In [351]:
query = 'select top 5 * from dbo.circuits'

df = pd.read_sql_query(query, cnxn)
df

Unnamed: 0,circuitId,circuitRef,name,location,country,lat,lng,alt,url
0,1,albert_park,Albert Park Grand Prix Circuit,Melbourne,Australia,-378.497009,144.968002,10,http://en.wikipedia.org/wiki/Melbourne_Grand_P...
1,2,sepang,Sepang International Circuit,Kuala Lumpur,Malaysia,276.083008,101.737999,18,http://en.wikipedia.org/wiki/Sepang_Internatio...
2,3,bahrain,Bahrain International Circuit,Sakhir,Bahrain,260.325012,505.105988,7,http://en.wikipedia.org/wiki/Bahrain_Internati...
3,4,catalunya,Circuit de Barcelona-Catalunya,Montmeló,Spain,41.57,226.110992,109,http://en.wikipedia.org/wiki/Circuit_de_Barcel...
4,5,istanbul,Istanbul Park,Istanbul,Turkey,409.516998,29.405001,130,http://en.wikipedia.org/wiki/Istanbul_Park


<div class="alert alert-block alert-info">
<b>Note:</b> read_sql_query will allow the information returned to be displayed as an actual table, that we are used to use in pandas.
</div>

### Retrieve Information from Azure SQL Server

We want to retrive the list of all drivers with the corresponding **date of birth** and **nationality** we might have on our scrapped dataset.

In [352]:
query = '''
    SELECT [driverId] AS DRIVER_ID
        , [forename] + ' '+ [surname] AS DRIVER
        , [driverRef] as DRIVER_REF
        , [code] AS DRIVER_CODE
        , [dob] AS DOB
        , [nationality] AS NATIONALITY
    FROM [dbo].[drivers]
    WHERE [number] IS NOT NULL
    '''
driver = pd.read_sql_query(query, cnxn)

In [353]:
driver.head()

Unnamed: 0,DRIVER_ID,DRIVER,DRIVER_REF,DRIVER_CODE,DOB,NATIONALITY
0,1,Lewis Hamilton,hamilton,HAM,1985-01-07,British
1,3,Nico Rosberg,rosberg,ROS,1985-06-27,German
2,4,Fernando Alonso,alonso,ALO,1981-07-29,Spanish
3,8,Kimi Räikkönen,raikkonen,RAI,1979-10-17,Finnish
4,9,Robert Kubica,kubica,KUB,1984-12-07,Polish


Additionally, we have retrived DRIVERREF, and DRIVER_CODE, which may come in handy to merge or display information in some graphics in the exploration phase.

Before we can use this data we nee to perform some preprocessing to the names of the drivers to make them consistent among all datsets.

In [354]:
driver['DRIVER'] = driver['DRIVER'].str.replace('Kimi Räikkönen','Kimi Raikkonen')
driver['DRIVER'] = driver['DRIVER'].str.replace('Nico Hülkenberg','Nico Hulkenberg')
driver['DRIVER'] = driver['DRIVER'].str.replace('Sergio Pérez','Sergio Perez')
driver['DRIVER'] = driver['DRIVER'].str.replace('Jean-Éric Vergne','Jean-Eric Vergne')
driver['DRIVER'] = driver['DRIVER'].str.replace('Esteban Gutiérrez','Esteban Gutierrez')

In [355]:
driver.head()

Unnamed: 0,DRIVER_ID,DRIVER,DRIVER_REF,DRIVER_CODE,DOB,NATIONALITY
0,1,Lewis Hamilton,hamilton,HAM,1985-01-07,British
1,3,Nico Rosberg,rosberg,ROS,1985-06-27,German
2,4,Fernando Alonso,alonso,ALO,1981-07-29,Spanish
3,8,Kimi Raikkonen,raikkonen,RAI,1979-10-17,Finnish
4,9,Robert Kubica,kubica,KUB,1984-12-07,Polish


We have processed the data to use at a later date. We can now store this data in the database under a new schema.

we will create a cursor to create a new schema in the database

In [None]:
#cursor = cnxn.cursor()

In [None]:
cursor = cnxn.cursor()
cursor.execute('CREATE SCHEMA pfds')
cnxn.commit()

Before uploading the data into the database we need to create the table that will hold our information.

In [None]:
# No need to recreate the table, we will just truncate the table.

#cursor.execute('''
#	CREATE TABLE pfds.driver (
#	DRIVER_ID int,
#	DRIVER nvarchar(50),
#	DRIVER_REF nvarchar(50),
#	DRIVER_CODE nvarchar(3),
#	NATIONALITY nvarchar(50)
#	)
#    ''')
#cnxn.commit()

In [None]:
# will remove all data from pfds.driver table in the Azure SQL server

#cursor.execute('TRUNCATE TABLE pfds.driver')
#cnxn.commit()

In [None]:
# Inserts the results from trhe query back to the Azure SQL Server database

#for row in driver.itertuples():
#    cursor.execute('''
#                INSERT INTO pfds.driver (DRIVER_ID, DRIVER, DRIVER_REF, DRIVER_CODE, NATIONALITY)
#                VALUES (?,?,?,?,?)
#                ''',
#                row.DRIVER_ID, 
#                row.DRIVER,
#                row.DRIVER_REF,
#                row.DRIVER_CODE,
#                row.NATIONALITY
#                )
#cnxn.commit()

Note that this code was commented out, since this will drop and recreate tables in the server and this does take some time. Please feel free to do so in order to test these features.

After we have imported the data into Azure SQL server we can query that data by just:

In [356]:
query = 'SELECT * FROM pfds.driver'

azure_driver = pd.read_sql_query(query, cnxn)

In [357]:
azure_driver

Unnamed: 0,DRIVER_ID,DRIVER,DRIVER_REF,DRIVER_CODE,NATIONALITY
0,1,Lewis Hamilton,hamilton,HAM,British
1,3,Nico Rosberg,rosberg,ROS,German
2,4,Fernando Alonso,alonso,ALO,Spanish
3,8,Kimi Raikkonen,raikkonen,RAI,Finnish
4,9,Robert Kubica,kubica,KUB,Polish
...,...,...,...,...,...
46,851,Jack Aitken,aitken,AIT,British
47,852,Yuki Tsunoda,tsunoda,TSU,Japanese
48,853,Nikita Mazepin,mazepin,MAZ,Russian
49,854,Mick Schumacher,mick_schumacher,MSC,German


Let us save our data into an excel file.

In [358]:
driver.to_excel('f1_drivers_2015_2021.xlsx', index = False)

Now we need to retrieve information regarding the status of the race for each racer, meaning, we need to know if a driver finished the race or if had a malfunction and if so, which one was.

Let us do another query, retrieving information from several tables including the one we have just uploaded.

In [359]:
query = ''';WITH one AS (SELECT 
      drivers.driverId AS DRIVER_ID
    , D.DRIVER
    , results.number AS NUMBER
    , results.raceId AS RACE_ID
    , races.name as RACE_NAME
    , replace(races.name,' Grand Prix', '') AS RACE 
    , results.StatusId AS STATUS_ID
    , status.status AS DRIVER_STATUS
    , circuits.circuitId AS CIRCUIT_ID
    , circuits.circuitREf AS CIRCUIT_REF
    , races.year AS SEASON
    FROM drivers
    left join results
        ON drivers.driverId = results.driverId
    left join races
        ON results.raceId = races.raceId
    left join seasons
        ON races.year = seasons.year
    left join circuits
        ON  races.circuitId = circuits.circuitId
    left join status
        ON  results.statusId = status.statusId
    left join pfds.driver D
        ON  drivers.driverId  = D.driver_Id)
    , two as (
    SELECT distinct * , replace(lower(RACE),' ', '-') AS GRAND_PRIX
    FROM one 
    WHERE SEASON IN ('2015','2016','2017','2018','2019','2020','2021'))
    SELECT * FROM two
    '''

status = pd.read_sql_query(query, cnxn)

When looking at the data we need to correct some information from the data base, so we run the following code

In [360]:
status['RACE_NAME'] = status['RACE_NAME'].str.replace(r'[^\x00-\x7f]', '')
status['RACE_NAME'] = status['RACE_NAME'].str.replace('So Paulo Grand Prix','Sao Paulo Grand Prix')

status['RACE'] = status['RACE'].str.replace(r'[^\x00-\x7f]', '')
status['RACE'] = status['RACE'].str.replace('So Paulo','So Paulo')

status['GRAND_PRIX'] = status['GRAND_PRIX'].str.replace(r'[^\x00-\x7f]', '')
status['GRAND_PRIX'] = status['GRAND_PRIX'].str.replace('So Paulo','brazilian')

status['GRAND_PRIX'] = status['GRAND_PRIX'].str.replace('mexico-city','mexican')

  status['RACE_NAME'] = status['RACE_NAME'].str.replace(r'[^\x00-\x7f]', '')
  status['RACE'] = status['RACE'].str.replace(r'[^\x00-\x7f]', '')
  status['GRAND_PRIX'] = status['GRAND_PRIX'].str.replace(r'[^\x00-\x7f]', '')


The data should be now ready to be merged with our main data set.

In [361]:
status

Unnamed: 0,DRIVER_ID,DRIVER,NUMBER,RACE_ID,RACE_NAME,RACE,STATUS_ID,DRIVER_STATUS,CIRCUIT_ID,CIRCUIT_REF,SEASON,GRAND_PRIX
0,1,Lewis Hamilton,44,926,Australian Grand Prix,Australian,1,Finished,1,albert_park,2015,australian
1,1,Lewis Hamilton,44,927,Malaysian Grand Prix,Malaysian,1,Finished,2,sepang,2015,malaysian
2,1,Lewis Hamilton,44,928,Chinese Grand Prix,Chinese,1,Finished,17,shanghai,2015,chinese
3,1,Lewis Hamilton,44,929,Bahrain Grand Prix,Bahrain,1,Finished,3,bahrain,2015,bahrain
4,1,Lewis Hamilton,44,930,Spanish Grand Prix,Spanish,1,Finished,4,catalunya,2015,spanish
...,...,...,...,...,...,...,...,...,...,...,...,...
2855,854,Mick Schumacher,47,1069,United States Grand Prix,United States,12,+2 Laps,69,americas,2021,united-states
2856,854,Mick Schumacher,47,1070,Mexico City Grand Prix,Mexico City,4,Collision,32,rodriguez,2021,mexican
2857,854,Mick Schumacher,47,1071,Sao Paulo Grand Prix,So Paulo,12,+2 Laps,18,interlagos,2021,so-paulo
2858,854,Mick Schumacher,47,1072,Saudi Arabian Grand Prix,Saudi Arabian,3,Accident,77,jeddah,2021,saudi-arabian


In [None]:
# No need to recreate the table, we will just truncate the table.

#cursor.execute('''
#	CREATE TABLE pfds.status (
#      DRIVER_ID	int
#    , DRIVER	nvarchar(50)
#    , NUMBER	int
#    , RACE_ID	int
#    , RACE_NAME	nvarchar(100)
#    , RACE	nvarchar(50)
#    , STATUS_ID	int
#    , DRIVER_STATUS	nvarchar(50)
#    , CIRCUIT_ID	int
#    , CIRCUIT_REF	nvarchar(50)
#    , SEASON	nvarchar(4)
#    , GRAND_PRIX	nvarchar(50)
#	)
#    ''')
#cnxn.commit()

In [None]:
#cursor.execute('TRUNCATE TABLE pfds.status')
#cnxn.commit()

In [None]:
#for row in status.itertuples():
#    cursor.execute('''
#                INSERT INTO pfds.status (
#                                          DRIVER_ID	
#                                        , DRIVER
#                                        , NUMBER
#                                        , RACE_ID
#                                        , RACE_NAME	
#                                        , RACE
#                                        , STATUS_ID	
#                                        , DRIVER_STATUS
#                                        , CIRCUIT_ID	
#                                        , CIRCUIT_REF	
#                                        , SEASON
#                                        , GRAND_PRIX	
#                                        )
#                VALUES (?,?,?,?,?,?,?,?,?,?,?,?)
#                ''',
#                row.DRIVER_ID, 
#                row.DRIVER,
#                row.NUMBER,
#                row.RACE_ID,
#                row.RACE_NAME,
#                row.RACE,
#                row.STATUS_ID,	
#                row.DRIVER_STATUS,
#                row.CIRCUIT_ID,	
#                row.CIRCUIT_REF,
#                row.SEASON,
#                row.GRAND_PRIX,	
#                )
#cnxn.commit()

In [362]:
query = 'SELECT * FROM pfds.status'

azure_status = pd.read_sql_query(query, cnxn)

In [363]:
azure_status

Unnamed: 0,DRIVER_ID,DRIVER,NUMBER,RACE_ID,RACE_NAME,RACE,STATUS_ID,DRIVER_STATUS,CIRCUIT_ID,CIRCUIT_REF,SEASON,GRAND_PRIX
0,1,Lewis Hamilton,44,926,Australian Grand Prix,Australian,1,Finished,1,albert_park,2015,australian
1,1,Lewis Hamilton,44,927,Malaysian Grand Prix,Malaysian,1,Finished,2,sepang,2015,malaysian
2,1,Lewis Hamilton,44,928,Chinese Grand Prix,Chinese,1,Finished,17,shanghai,2015,chinese
3,1,Lewis Hamilton,44,929,Bahrain Grand Prix,Bahrain,1,Finished,3,bahrain,2015,bahrain
4,1,Lewis Hamilton,44,930,Spanish Grand Prix,Spanish,1,Finished,4,catalunya,2015,spanish
...,...,...,...,...,...,...,...,...,...,...,...,...
2855,854,Mick Schumacher,47,1069,United States Grand Prix,United States,12,+2 Laps,69,americas,2021,united-states
2856,854,Mick Schumacher,47,1070,Mexico City Grand Prix,Mexico City,4,Collision,32,rodriguez,2021,mexico-city
2857,854,Mick Schumacher,47,1071,So Paulo Grand Prix,São Paulo,12,+2 Laps,18,interlagos,2021,são-paulo
2858,854,Mick Schumacher,47,1072,Saudi Arabian Grand Prix,Saudi Arabian,3,Accident,77,jeddah,2021,saudi-arabian


In [364]:
f1_full_data = pd.merge(web_data, status, how='left', left_on=['DRIVER','GRAND_PRIX','SEASON'], 
                                                      right_on = ['DRIVER','GRAND_PRIX','SEASON'])

In [365]:
f1_full_data.head()

Unnamed: 0,CLASS,DRIVER,TEAM,LAPS,RACE_TIME,DRIVER_SNAME,GRAND_PRIX,SEASON,FLAP_POS,FLAP_TIME,F_LAP,FLAP_GAP,FLAP_KM/H,FLAP_HOUR,BS1_POS,BS1_TIME,BS2_POS,BS2_TIME,BS3_POS,BS3_TIME,ST_POS,ST_KM/H,I1_POS,I1_KM/H,I2_POS,I2_KM/H,FL_POS,FL_KM/H,DRIVER_NO,STOPS,PS_TOTAL_TIME,QL_CLASS,QL_TIME1,QL_LAPS1,QL_TIME2,QL_LAPS2,QL_TIME3,QL_LAPS3,GD_CLASS,GD_TIME,DRIVER_ID,NUMBER,RACE_ID,RACE_NAME,RACE,STATUS_ID,DRIVER_STATUS,CIRCUIT_ID,CIRCUIT_REF
0,1,Lewis Hamilton,Mercedes-AMG Petronas F1 Team,56,1:32:03.897,L. Hamilton,bahrain,2021,4.0,1:34.015,44.0,1.925,207.235,19:20:17,2.0,29.944,4.0,40.418,3.0,23.141,18.0,315.2,10.0,239.5,2.0,267.7,13.0,290.4,44,2,48.915,2.0,1:30.617,6.0,1:30.085,6.0,1:29.385,6.0,2.0,1:29.385,1.0,44.0,1052.0,Bahrain Grand Prix,Bahrain,1.0,Finished,3.0,bahrain
1,2,Max Verstappen,Red Bull Racing Honda,56,1:32:04.642,M. Verstappen,bahrain,2021,2.0,1:33.228,41.0,1.138,208.984,19:15:41,4.0,30.009,2.0,40.159,2.0,22.995,4.0,327.0,15.0,237.9,6.0,266.1,12.0,291.1,33,2,48.615,1.0,1:30.499,3.0,1:30.318,6.0,1:28.997,6.0,1.0,1:28.997,830.0,33.0,1052.0,Bahrain Grand Prix,Bahrain,1.0,Finished,3.0,bahrain
2,3,Valtteri Bottas,Mercedes-AMG Petronas F1 Team,56,1:32:41.280,V. Bottas,bahrain,2021,1.0,1:32.090,56.0,0.0,211.566,19:39:51,1.0,29.64,1.0,39.508,1.0,22.942,15.0,318.9,11.0,239.5,5.0,266.6,16.0,290.0,77,3,1:21.725,3.0,1:31.200,5.0,1:30.186,6.0,1:29.586,6.0,3.0,1:29.586,822.0,77.0,1052.0,Bahrain Grand Prix,Bahrain,1.0,Finished,3.0,bahrain
3,4,Lando Norris,McLaren F1 Team,56,1:32:50.363,L. Norris,bahrain,2021,6.0,1:34.396,38.0,2.306,206.398,19:11:28,7.0,30.206,6.0,40.525,5.0,23.207,9.0,324.2,16.0,237.6,7.0,263.4,8.0,291.8,4,2,50.539,7.0,1:30.902,6.0,1:30.099,6.0,1:29.974,6.0,7.0,1:29.974,846.0,4.0,1052.0,Bahrain Grand Prix,Bahrain,1.0,Finished,3.0,bahrain
4,5,Sergio Perez,Red Bull Racing Honda,56,1:32:55.944,S. Perez,bahrain,2021,3.0,1:33.970,44.0,1.88,207.334,19:21:10,6.0,30.18,3.0,40.395,6.0,23.231,8.0,324.7,5.0,240.5,4.0,267.0,17.0,290.0,11,3,1:12.289,11.0,1:31.165,5.0,1:30.659,6.0,,,11.0,1:30.659,815.0,11.0,1052.0,Bahrain Grand Prix,Bahrain,1.0,Finished,3.0,bahrain


In [366]:
f1_full_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2940 entries, 0 to 2939
Data columns (total 49 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   CLASS          2940 non-null   object 
 1   DRIVER         2940 non-null   object 
 2   TEAM           2940 non-null   object 
 3   LAPS           2940 non-null   object 
 4   RACE_TIME      2846 non-null   object 
 5   DRIVER_SNAME   2940 non-null   object 
 6   GRAND_PRIX     2940 non-null   object 
 7   SEASON         2940 non-null   int64  
 8   FLAP_POS       2697 non-null   object 
 9   FLAP_TIME      2697 non-null   object 
 10  F_LAP          2697 non-null   object 
 11  FLAP_GAP       2684 non-null   float64
 12  FLAP_KM/H      2696 non-null   float64
 13  FLAP_HOUR      2696 non-null   object 
 14  BS1_POS        2728 non-null   object 
 15  BS1_TIME       2728 non-null   float64
 16  BS2_POS        2728 non-null   object 
 17  BS2_TIME       2728 non-null   object 
 18  BS3_POS 

Let us save our data set in excel format.

In [367]:
f1_full_data.to_excel('f1_race_data_2015_2021.xlsx')

# API

We can also make use of API functionality that will let us get the most updated information if we would require the dat to be updated constantely, which is not the case but we will make a demonstration of it.

We will use the API to request all the circuits in use for seasons 2015 to 2021, this will retrieve all the circuits locations that we require for our work.

In [368]:
# query API
races = {'season': [],
        'circuit_id': [],
        'lat': [],
        'long': [],
        'country': []}

for year in list(range(2015,2021)):
    url = 'https://ergast.com/api/f1/{}.json' 
    r = requests.get(url.format(year))
    json = r.json()
    for item in json['MRData']['RaceTable']['Races']:
        try:
            races['season'].append(int(item['season']))
        except:
            races['season'].append(None)
        try:
            races['circuit_id'].append(item['Circuit']['circuitId'])
        except:
            races['circuit_id'].append(None)
        try:
            races['lat'].append(float(item['Circuit']['Location']['lat']))
        except:
            races['lat'].append(None)
        try:
            races['long'].append(float(item['Circuit']['Location']['long']))
        except:
            races['long'].append(None)
        try:
            races['country'].append(item['Circuit']['Location']['country'])
        except:
            races['country'].append(None)
        
races = pd.DataFrame(races)

Let us check the results from our query.

In [369]:
races

Unnamed: 0,season,circuit_id,lat,long,country
0,2015,albert_park,-37.84970,144.96800,Australia
1,2015,sepang,2.76083,101.73800,Malaysia
2,2015,shanghai,31.33890,121.22000,China
3,2015,bahrain,26.03250,50.51060,Bahrain
4,2015,catalunya,41.57000,2.26111,Spain
...,...,...,...,...,...
114,2020,imola,44.34390,11.71670,Italy
115,2020,istanbul,40.95170,29.40500,Turkey
116,2020,bahrain,26.03250,50.51060,Bahrain
117,2020,bahrain,26.03250,50.51060,Bahrain


In [370]:
f1_circuits = races
f1_circuits.to_excel('f1_circuits_2015_2021.xlsx', index= False) 

# Data extracted 

Our main data is comprised of 3 datasets

1. Drivers
2. Circuits
3. Race data

### 1. Drivers
- **DRIVER_ID**:	    Driver Identification
- **DRIVER**:	        Driver name
- **DRIVER_REF**:	    Driver reference (last name)
- **DRIVER_CODE**:	    3 letter code based on the driver name
- **DOB**:	            Date of birth
- **NATIONALITY**:	    Nationality
    
### 2. Circuits 
- **season**:           Season
- **circuit_id**:       Circuit identification
- **lat**:              Latitude
- **long**:             Longitude
- **country**:          Country

### 3. Race data

- **CLASS**:	        Race position
- **DRIVER**:	        Driver name
- **TEAM**:	            F1 Team
- **LAPS**:	            Laps completed
- **RACE_TIME**:	    Race total time
- **DRIVER_SNAME**:	    Driver short name
- **GRAND_PRIX**:	    Grand Prix 
- **SEASON**:	        Season year
- **FLAP_POS**:	        Fastest lap position
- **FLAP_TIME**:	    Fastest lap time
- **F_LAP**:	        Fastest lap
- **FLAP_GAP**:	        Fastest lap gap
- **FLAP_KM/H**:	    Fastest lap max speed
- **FLAP_HOUR**:	    Fastest lap time of the day
- **BS1_POS**:	        Sector 1 position
- **BS1_TIME**:	        Sector 1 time
- **BS2_POS**:	        Sector 2 position
- **BS2_TIME**:	        Sector 2 time
- **BS3_POS**:	        Sector 3 position
- **BS3_TIME**:	        Sector 3 time
- **ST_POS**:	        Speed trap position
- **STKM/H**:	        Speed trap speed
- **I1_POS**:	        Maximum speed intermediate 1 position
- **I1_KM/H**:	        Maximum speed intermediate 1 speed
- **I2_POS**:	        Maximum speed intermediate 2 position
- **I2_KM/H**:	        Maximum speed intermediate 2 speed
- **FL_POS**:	        Maximum speed finish line position
- **FL_KM/H**:	        Maximum speed finish line speed
- **DRIVER_NO**:	    Driver number
- **STOPS**:	        Number of pit stops
- **PS_TOTAL_TIME**:	Pit stops total time
- **QL_CLASS**:	        Session classification position
- **QL_TIME1**:	        Qualification session 1 time
- **QL_LAPS1**:	        Qualification session 1 laps
- **QL_TIME2**:	        Qualification session 2 time
- **QL_LAPS2**:	        Qualification session 2 laps
- **QL_TIME3**:	        Qualification session 3 time
- **QL_LAPS3**:	        Qualification session 3 laps
- **GD_CLASS**:	        Grid position
- **GD_TIME**:	        Grid time
- **DRIVER_ID**:	    Driver identification
- **DRIVER**:	        Driver name
- **NUMBER**:	        Driver number
- **RACE_ID**:          Race identification
- **RACE_NAME**:	    Race name
- **RACE**:	            Race short name
- **STATUS_ID**:	    Driver status for each race
- **DRIVER_STATUS**:	Decode for Status ID
- **CIRCUIT_ID**:	    Circuit identification
- **CIRCUIT_REF**:	    Circuit reference
