## Intro

This notebook contains basic importing and light formatting of data to support my F1 Predictor project. The data below is queried from the [Ergast API](http://ergast.com/mrd/).

The Ergast database contains information on races, drivers, qualifying and final Grand Prix results, and more, going back to the 1950s. Some features have more historical data than others for a variety of reasons. For example, though the first F1 World Championship was in 1950, the first Constructors' Champion wasn't crowned until 1958.

Importantly, Ergast caps how much data a single API call can return. Responses are paginated: the `limit` query parameter sets the number of rows returned, up to a maximum of 1000, and the `offset` query parameter shifts the starting point of those rows. In my project, I'm interested in making predictions about current F1 drivers and teams, so I attach query strings to fetch the most recent race data, going back to 2008.
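
To make the paging concrete, here is a minimal sketch of how the `limit` and `offset` query strings could be generated for a table of a given size. The helper names `page_offsets` and `build_url` are my own illustrations, not part of Ergast or `requests`, and the row count of 1,079 is a made-up figure used only for the demonstration.

```python
from urllib.parse import urlencode

BASE_URL = 'http://ergast.com/api/f1.json'  # race schedule endpoint used in this notebook
MAX_LIMIT = 1000  # largest page size, per the limit described above


def page_offsets(total, limit=MAX_LIMIT):
    """Return the offsets needed to cover `total` rows in pages of `limit` rows."""
    return list(range(0, total, limit))


def build_url(offset, limit=MAX_LIMIT, base_url=BASE_URL):
    """Attach limit and offset to the base URL as a query string."""
    query = urlencode({'limit': limit, 'offset': offset})
    return f'{base_url}?{query}'


# hypothetical table of 1,079 rows: two pages are enough to cover it
urls = [build_url(offset) for offset in page_offsets(1079)]
```

Each entry in `urls` can then be passed to `requests.get` one page at a time.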

### Races

The first table to import is Ergast's base list of races (aka, Grand Prix events) per season. This is the top-most level of Ergast's database, "Race Schedule".

In [91]:
# import the basic packages, including requests to handle API calls and results
import pandas as pd
import numpy as np
import requests
import json

In [119]:
# fetch the race schedule from Ergast as JSON, then turn the response into a DataFrame

# My first attempt used a loop to move through the API results 1000 rows at a time:

# limit = 1000
# offset = [0, 1000, 2000]
#
# race_dict = {}
#
# for interval in offset:
#     url = f'http://ergast.com/api/f1.json?limit={limit}&offset={offset}'
#     res = requests.get(url)
#     json = res.json()
#
#     race_dict.update(json)

# That didn't work, for two reasons: the f-string interpolates the whole `offset` list instead of
# the loop variable `interval`, and `dict.update` overwrites the single top-level 'MRData' key on
# every pass instead of accumulating races.
# So instead I went the long way: two separate API calls, one DataFrame each, merged afterwards.

url = 'http://ergast.com/api/f1.json?limit=1000&offset=0'
res = requests.get(url)
res.raise_for_status()  # fail loudly on an HTTP error
race_1 = res.json()  # named race_1, not json, to avoid shadowing the imported json module

df_race_1 = pd.DataFrame.from_dict(race_1['MRData']['RaceTable']['Races'], orient="columns")
df_race_1

Unnamed: 0,season,round,url,raceName,Circuit,date,time
0,1950,1,http://en.wikipedia.org/wiki/1950_British_Gran...,British Grand Prix,"{'circuitId': 'silverstone', 'url': 'http://en...",1950-05-13,
1,1950,2,http://en.wikipedia.org/wiki/1950_Monaco_Grand...,Monaco Grand Prix,"{'circuitId': 'monaco', 'url': 'http://en.wiki...",1950-05-21,
2,1950,3,http://en.wikipedia.org/wiki/1950_Indianapolis...,Indianapolis 500,"{'circuitId': 'indianapolis', 'url': 'http://e...",1950-05-30,
3,1950,4,http://en.wikipedia.org/wiki/1950_Swiss_Grand_...,Swiss Grand Prix,"{'circuitId': 'bremgarten', 'url': 'http://en....",1950-06-04,
4,1950,5,http://en.wikipedia.org/wiki/1950_Belgian_Gran...,Belgian Grand Prix,"{'circuitId': 'spa', 'url': 'http://en.wikiped...",1950-06-18,
...,...,...,...,...,...,...,...
995,2018,20,https://en.wikipedia.org/wiki/2018_Brazilian_G...,Brazilian Grand Prix,"{'circuitId': 'interlagos', 'url': 'http://en....",2018-11-11,17:10:00Z
996,2018,21,https://en.wikipedia.org/wiki/2018_Abu_Dhabi_G...,Abu Dhabi Grand Prix,"{'circuitId': 'yas_marina', 'url': 'http://en....",2018-11-25,13:10:00Z
997,2019,1,https://en.wikipedia.org/wiki/2019_Australian_...,Australian Grand Prix,"{'circuitId': 'albert_park', 'url': 'http://en...",2019-03-17,05:10:00Z
998,2019,2,https://en.wikipedia.org/wiki/2019_Bahrain_Gra...,Bahrain Grand Prix,"{'circuitId': 'bahrain', 'url': 'http://en.wik...",2019-03-31,15:10:00Z


In [120]:
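# second page: the remaining races, starting at row 1000
url = 'http://ergast.com/api/f1.json?limit=1000&offset=1000'
res = requests.get(url)
res.raise_for_status()  # fail loudly on an HTTP error
race_2 = res.json()  # named race_2, not json, to avoid shadowing the imported json module

df_race_2 = pd.DataFrame.from_dict(race_2['MRData']['RaceTable']['Races'], orient="columns")
df_race_2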

Unnamed: 0,season,round,url,raceName,Circuit,date,time
0,2019,4,https://en.wikipedia.org/wiki/2019_Azerbaijan_...,Azerbaijan Grand Prix,"{'circuitId': 'BAK', 'url': 'http://en.wikiped...",2019-04-28,12:10:00Z
1,2019,5,https://en.wikipedia.org/wiki/2019_Spanish_Gra...,Spanish Grand Prix,"{'circuitId': 'catalunya', 'url': 'http://en.w...",2019-05-12,13:10:00Z
2,2019,6,https://en.wikipedia.org/wiki/2019_Monaco_Gran...,Monaco Grand Prix,"{'circuitId': 'monaco', 'url': 'http://en.wiki...",2019-05-26,13:10:00Z
3,2019,7,https://en.wikipedia.org/wiki/2019_Canadian_Gr...,Canadian Grand Prix,"{'circuitId': 'villeneuve', 'url': 'http://en....",2019-06-09,18:10:00Z
4,2019,8,https://en.wikipedia.org/wiki/2019_French_Gran...,French Grand Prix,"{'circuitId': 'ricard', 'url': 'http://en.wiki...",2019-06-23,13:10:00Z
...,...,...,...,...,...,...,...
75,2022,19,https://en.wikipedia.org/wiki/2022_Japanese_Gr...,Japanese Grand Prix,"{'circuitId': 'suzuka', 'url': 'http://en.wiki...",2022-10-09,05:10:00Z
76,2022,20,https://en.wikipedia.org/wiki/2022_United_Stat...,United States Grand Prix,"{'circuitId': 'americas', 'url': 'http://en.wi...",2022-10-23,19:00:00Z
77,2022,21,https://en.wikipedia.org/wiki/2022_Mexican_Gra...,Mexico City Grand Prix,"{'circuitId': 'rodriguez', 'url': 'http://en.w...",2022-10-30,19:00:00Z
78,2022,22,https://en.wikipedia.org/wiki/2022_S%C3%A3o_Pa...,São Paulo Grand Prix,"{'circuitId': 'interlagos', 'url': 'http://en....",2022-11-13,17:00:00Z


In [122]:
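# stack the two pages; the index restarts at 0 for the second page (pass ignore_index=True to renumber)
df_races = pd.concat([df_race_1, df_race_2])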
df_races

Unnamed: 0,season,round,url,raceName,Circuit,date,time
0,1950,1,http://en.wikipedia.org/wiki/1950_British_Gran...,British Grand Prix,"{'circuitId': 'silverstone', 'url': 'http://en...",1950-05-13,
1,1950,2,http://en.wikipedia.org/wiki/1950_Monaco_Grand...,Monaco Grand Prix,"{'circuitId': 'monaco', 'url': 'http://en.wiki...",1950-05-21,
2,1950,3,http://en.wikipedia.org/wiki/1950_Indianapolis...,Indianapolis 500,"{'circuitId': 'indianapolis', 'url': 'http://e...",1950-05-30,
3,1950,4,http://en.wikipedia.org/wiki/1950_Swiss_Grand_...,Swiss Grand Prix,"{'circuitId': 'bremgarten', 'url': 'http://en....",1950-06-04,
4,1950,5,http://en.wikipedia.org/wiki/1950_Belgian_Gran...,Belgian Grand Prix,"{'circuitId': 'spa', 'url': 'http://en.wikiped...",1950-06-18,
...,...,...,...,...,...,...,...
75,2022,19,https://en.wikipedia.org/wiki/2022_Japanese_Gr...,Japanese Grand Prix,"{'circuitId': 'suzuka', 'url': 'http://en.wiki...",2022-10-09,05:10:00Z
76,2022,20,https://en.wikipedia.org/wiki/2022_United_Stat...,United States Grand Prix,"{'circuitId': 'americas', 'url': 'http://en.wi...",2022-10-23,19:00:00Z
77,2022,21,https://en.wikipedia.org/wiki/2022_Mexican_Gra...,Mexico City Grand Prix,"{'circuitId': 'rodriguez', 'url': 'http://en.w...",2022-10-30,19:00:00Z
78,2022,22,https://en.wikipedia.org/wiki/2022_S%C3%A3o_Pa...,São Paulo Grand Prix,"{'circuitId': 'interlagos', 'url': 'http://en....",2022-11-13,17:00:00Z
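
The loop I originally attempted can be made to work by paging until the API's reported `total` is reached and by collecting the `Races` lists rather than updating one dictionary. The following is a sketch under assumptions, not a tested replacement for the cells above: `fetch_all_races`, `get_page`, and `fake_page` are hypothetical names, the rows served by `fake_page` are made up, and I'm assuming `total` arrives as a numeric string under `MRData`.

```python
def fetch_all_races(get_page, limit=1000):
    """Collect the Races list from every page until `total` rows are covered.

    `get_page(limit, offset)` must return one parsed JSON response (a dict).
    """
    races = []
    offset = 0
    while True:
        data = get_page(limit, offset)['MRData']
        races.extend(data['RaceTable']['Races'])
        offset += limit
        if offset >= int(data['total']):
            return races


def fake_page(limit, offset):
    """Stand-in for the API so the sketch runs offline: serves slices of 5 made-up rows."""
    rows = [{'season': '2022', 'round': str(n)} for n in range(1, 6)]
    return {'MRData': {'total': str(len(rows)),
                       'RaceTable': {'Races': rows[offset:offset + limit]}}}


races = fetch_all_races(fake_page, limit=2)  # three pages: 2 + 2 + 1 rows
```

Against the real API, `get_page` could be `lambda limit, offset: requests.get('http://ergast.com/api/f1.json', params={'limit': limit, 'offset': offset}).json()`, and `pd.DataFrame(races)` would then give the combined table in one step.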


In [135]:
# Before I leave this DataFrame, I extract each circuit's identifier (circuitId) from the nested Circuit dictionary
# into a column named circuitName. In F1 terms the circuit is often, but not always, synonymous with the raceName:
# for example, the United States Grand Prix has been held at six different circuits over the years.

df_races['circuitName'] = df_races['Circuit'].apply(lambda k: k['circuitId'])
df_races

Unnamed: 0,season,round,url,raceName,Circuit,date,time,circuitName
0,1950,1,http://en.wikipedia.org/wiki/1950_British_Gran...,British Grand Prix,"{'circuitId': 'silverstone', 'url': 'http://en...",1950-05-13,,silverstone
1,1950,2,http://en.wikipedia.org/wiki/1950_Monaco_Grand...,Monaco Grand Prix,"{'circuitId': 'monaco', 'url': 'http://en.wiki...",1950-05-21,,monaco
2,1950,3,http://en.wikipedia.org/wiki/1950_Indianapolis...,Indianapolis 500,"{'circuitId': 'indianapolis', 'url': 'http://e...",1950-05-30,,indianapolis
3,1950,4,http://en.wikipedia.org/wiki/1950_Swiss_Grand_...,Swiss Grand Prix,"{'circuitId': 'bremgarten', 'url': 'http://en....",1950-06-04,,bremgarten
4,1950,5,http://en.wikipedia.org/wiki/1950_Belgian_Gran...,Belgian Grand Prix,"{'circuitId': 'spa', 'url': 'http://en.wikiped...",1950-06-18,,spa
...,...,...,...,...,...,...,...,...
75,2022,19,https://en.wikipedia.org/wiki/2022_Japanese_Gr...,Japanese Grand Prix,"{'circuitId': 'suzuka', 'url': 'http://en.wiki...",2022-10-09,05:10:00Z,suzuka
76,2022,20,https://en.wikipedia.org/wiki/2022_United_Stat...,United States Grand Prix,"{'circuitId': 'americas', 'url': 'http://en.wi...",2022-10-23,19:00:00Z,americas
77,2022,21,https://en.wikipedia.org/wiki/2022_Mexican_Gra...,Mexico City Grand Prix,"{'circuitId': 'rodriguez', 'url': 'http://en.w...",2022-10-30,19:00:00Z,rodriguez
78,2022,22,https://en.wikipedia.org/wiki/2022_S%C3%A3o_Pa...,São Paulo Grand Prix,"{'circuitId': 'interlagos', 'url': 'http://en....",2022-11-13,17:00:00Z,interlagos
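
The `Circuit` column holds a nested dictionary, and a plain `k['circuitId']` lookup raises a `KeyError` if the key is missing (or a `TypeError` if the cell is not a dictionary at all). A slightly more defensive sketch is below. `circuit_field` is a hypothetical helper, and the `circuitName` key in the sample record is an assumption on my part, since the truncated output above only shows `circuitId` and `url`.

```python
def circuit_field(circuit, key, default=None):
    """Return circuit[key] if circuit is a dict containing key, else default."""
    if isinstance(circuit, dict):
        return circuit.get(key, default)
    return default


# sample record shaped like one Circuit cell; 'circuitName' is an assumed key
sample = {'circuitId': 'silverstone', 'circuitName': 'Silverstone Circuit'}

circuit_id = circuit_field(sample, 'circuitId')         # key present
missing = circuit_field(sample, 'country', 'unknown')   # key absent, falls back to the default
not_a_dict = circuit_field(float('nan'), 'circuitId')   # missing cell, returns None
```

In the notebook this could be applied as `df_races['Circuit'].apply(lambda c: circuit_field(c, 'circuitId'))`.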


### Results

The results of a race include the finishing position of each driver (a `position` of 1 indicates the winner) and their starting `grid` position, as well as important race data including the number of laps completed, each driver's finishing `status` (whether or not they successfully completed the course), and a race time for drivers who finished on the lead lap.

In [138]:
# fetch one page of race results (500 rows, starting at row 800)
url = 'http://ergast.com/api/f1/results.json?limit=500&offset=800'
res = requests.get(url)
res.raise_for_status()  # fail loudly on an HTTP error
results_json = res.json()  # named results_json to avoid shadowing the imported json module

In [139]:
# Races[0] selects only the first race in this page, so the table below covers a single race
df_results = pd.DataFrame.from_dict(results_json['MRData']['RaceTable']['Races'][0]['Results'], orient="columns")
df_results

Unnamed: 0,number,position,positionText,points,Driver,Constructor,grid,laps,status,Time
0,2,1,1,8,"{'driverId': 'fangio', 'url': 'http://en.wikip...","{'constructorId': 'maserati', 'url': 'http://e...",3,87,Finished,"{'millis': '10855800', 'time': '3:00:55.8'}"
1,10,2,2,6,"{'driverId': 'farina', 'url': 'http://en.wikip...","{'constructorId': 'ferrari', 'url': 'http://en...",1,87,Finished,"{'millis': '10934800', 'time': '+1:19.0'}"
2,12,3,3,5,"{'driverId': 'gonzalez', 'url': 'http://en.wik...","{'constructorId': 'ferrari', 'url': 'http://en...",2,87,Finished,"{'millis': '10976800', 'time': '+2:01.0'}"
3,26,4,4,3,"{'driverId': 'trintignant', 'url': 'http://en....","{'constructorId': 'ferrari', 'url': 'http://en...",5,86,+1 Lap,
4,20,5,5,2,"{'driverId': 'bayol', 'url': 'http://en.wikipe...","{'constructorId': 'gordini', 'url': 'http://en...",15,85,+2 Laps,
5,28,6,6,0,"{'driverId': 'schell', 'url': 'http://en.wikip...","{'constructorId': 'maserati', 'url': 'http://e...",11,84,+3 Laps,
6,8,7,7,0,"{'driverId': 'bira', 'url': 'http://en.wikiped...","{'constructorId': 'maserati', 'url': 'http://e...",10,83,+4 Laps,
7,30,8,8,0,"{'driverId': 'graffenried', 'url': 'http://en....","{'constructorId': 'maserati', 'url': 'http://e...",13,83,+4 Laps,
8,16,9,9,0,"{'driverId': 'maglioli', 'url': 'http://en.wik...","{'constructorId': 'ferrari', 'url': 'http://en...",12,82,+5 Laps,
9,18,10,D,0,"{'driverId': 'behra', 'url': 'http://en.wikipe...","{'constructorId': 'gordini', 'url': 'http://en...",17,61,Disqualified,


In [62]:
# print the top-level key of the response, then the keys nested one level below it
for i in results_json.keys():
    print(i)
    for j in results_json[i].keys():
        print(j)

MRData
xmlns
series
url
limit
offset
total
RaceTable
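
The key listing shows that the races sit under `MRData` and then `RaceTable`, and the results cell reads `Races[0]['Results']`, which is only the first race in the response. To keep every race in a page, the nested lists can be flattened into one row per driver per race. This is a sketch with a hypothetical helper, `flatten_results`. The sample payload is hand-built from keys visible in the outputs above, and its drivers, teams, and finishing orders are placeholders, not real results.

```python
def flatten_results(payload):
    """Return one flat dict per driver per race from an Ergast-style results payload."""
    rows = []
    for race in payload['MRData']['RaceTable']['Races']:
        for result in race.get('Results', []):
            rows.append({
                'season': int(race['season']),
                'round': int(race['round']),
                'raceName': race['raceName'],
                'driverId': result['Driver']['driverId'],
                'constructorId': result['Constructor']['constructorId'],
                'grid': int(result['grid']),          # starting position
                'position': int(result['position']),  # finishing position
                'status': result['status'],
            })
    return rows


def _result(driver, team, grid, position, status):
    """Build one placeholder result entry for the sample payload."""
    return {'Driver': {'driverId': driver}, 'Constructor': {'constructorId': team},
            'grid': grid, 'position': position, 'status': status}


# placeholder payload: two races, three results in total
sample = {'MRData': {'RaceTable': {'Races': [
    {'season': '1950', 'round': '1', 'raceName': 'British Grand Prix',
     'Results': [_result('driver_a', 'team_x', '2', '1', 'Finished'),
                 _result('driver_b', 'team_y', '1', '2', 'Finished')]},
    {'season': '1950', 'round': '2', 'raceName': 'Monaco Grand Prix',
     'Results': [_result('driver_b', 'team_y', '3', '1', 'Finished')]},
]}}}

rows = flatten_results(sample)
```

`pd.DataFrame(rows)` then gives a results table with one row per driver per race, with `season` and `round` available for joining back to the race schedule.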


In [132]:
# pull each driver's family name out of the nested Driver dictionary
df_results['DriverLastName'] = df_results['Driver'].apply(lambda k: k['familyName'])

In [133]:
df_results

Unnamed: 0,number,position,positionText,points,Driver,Constructor,grid,laps,status,Time,DriverLastName
0,2,1,1,8,"{'driverId': 'fangio', 'url': 'http://en.wikip...","{'constructorId': 'maserati', 'url': 'http://e...",3,87,Finished,"{'millis': '10855800', 'time': '3:00:55.8'}",Fangio
1,10,2,2,6,"{'driverId': 'farina', 'url': 'http://en.wikip...","{'constructorId': 'ferrari', 'url': 'http://en...",1,87,Finished,"{'millis': '10934800', 'time': '+1:19.0'}",Farina
2,12,3,3,5,"{'driverId': 'gonzalez', 'url': 'http://en.wik...","{'constructorId': 'ferrari', 'url': 'http://en...",2,87,Finished,"{'millis': '10976800', 'time': '+2:01.0'}",González
3,26,4,4,3,"{'driverId': 'trintignant', 'url': 'http://en....","{'constructorId': 'ferrari', 'url': 'http://en...",5,86,+1 Lap,,Trintignant
4,20,5,5,2,"{'driverId': 'bayol', 'url': 'http://en.wikipe...","{'constructorId': 'gordini', 'url': 'http://en...",15,85,+2 Laps,,Bayol
5,28,6,6,0,"{'driverId': 'schell', 'url': 'http://en.wikip...","{'constructorId': 'maserati', 'url': 'http://e...",11,84,+3 Laps,,Schell
6,8,7,7,0,"{'driverId': 'bira', 'url': 'http://en.wikiped...","{'constructorId': 'maserati', 'url': 'http://e...",10,83,+4 Laps,,Bira
7,30,8,8,0,"{'driverId': 'graffenried', 'url': 'http://en....","{'constructorId': 'maserati', 'url': 'http://e...",13,83,+4 Laps,,de Graffenried
8,16,9,9,0,"{'driverId': 'maglioli', 'url': 'http://en.wik...","{'constructorId': 'ferrari', 'url': 'http://en...",12,82,+5 Laps,,Maglioli
9,18,10,D,0,"{'driverId': 'behra', 'url': 'http://en.wikipe...","{'constructorId': 'gordini', 'url': 'http://en...",17,61,Disqualified,,Behra
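
Note that `Time` is empty for every driver who did not finish on the lead lap, so a lambda like the one used for `Driver` would fail on those rows. A sketch of a more tolerant extractor follows. `time_millis` is a hypothetical helper, and I'm assuming the empty cells arrive as `NaN` or `None` rather than as dictionaries.

```python
def time_millis(time_field):
    """Return the race time in milliseconds as an int, or None when no time was recorded."""
    if isinstance(time_field, dict) and 'millis' in time_field:
        return int(time_field['millis'])
    return None


# the winner's Time cell from the table above, and an empty cell for a lapped driver
winner = time_millis({'millis': '10855800', 'time': '3:00:55.8'})
lapped = time_millis(float('nan'))
```

Applied to the table, `df_results['Time'].apply(time_millis)` would yield a column of race times with missing values for lapped, retired, or disqualified drivers.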
