## Intro

This notebook covers basic importing and light formatting of the data behind my F1 Predictor project. The data below is queried from the [Ergast API](http://ergast.com/mrd/).

The Ergast database contains information on races, drivers, qualifying, final Grand Prix results, and more, going back to the 1950s. Some features have more historical data than others. For example, although the first F1 World Championship was held in 1950, the first Constructors' Champion wasn't crowned until 1958.

Importantly, Ergast caps how much data a single API call returns. The `limit` query parameter sets the number of rows in the response (default 30, maximum 1000), and the `offset` parameter sets the row to start from. In my project, I'm interested in making predictions about current F1 drivers and teams, so I attach query strings that fetch the most recent race data, going back to 2008.
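Because one call returns at most 1000 rows, pulling a complete table means stepping `offset` forward until it passes the `total` reported in `MRData`. The sketch below shows one way to do that with the standard library only. `fetch_json`, `fetch_all_pages`, and the `pause` between calls are hypothetical helpers, not part of Ergast or this project, and the code assumes the response shape shown later in this notebook (`total` arrives as a string).

```python
import json
import time
import urllib.request
from urllib.parse import urlencode


def fetch_json(url):
    """Download `url` and decode its JSON body (standard library only)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fetch_all_pages(base_url, limit=1000, pause=0.25, fetch=fetch_json):
    """Yield each MRData page of an Ergast-style endpoint, advancing `offset` by `limit`.

    Hypothetical helper: `pause` (seconds between calls) is an assumed courtesy
    delay, not a documented Ergast setting.
    """
    offset = 0
    while True:
        query = urlencode({"limit": limit, "offset": offset})
        page = fetch(f"{base_url}?{query}")["MRData"]
        yield page
        offset += limit
        if offset >= int(page["total"]):
            break  # every row up to `total` has been requested
        time.sleep(pause)
```

For example, `[race for page in fetch_all_pages("http://ergast.com/api/f1.json") for race in page["RaceTable"]["Races"]]` would collect the whole race schedule, assuming the endpoint is reachable.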

### Races

The first table to import is Ergast's base list of races (i.e., Grand Prix events) per season. This is the top-most level of Ergast's database, the "Race Schedule".
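The query URLs in this notebook share one pattern: `http://ergast.com/api/f1`, optional path segments (a season, a round, a resource such as `results`), a `.json` suffix, and optional `limit`/`offset` query strings. Ergast also accepts a season in the path (for example `/api/f1/2008.json` for the 2008 schedule), which is an alternative to offsetting into the full list. The helper below is a hypothetical convenience, not part of any library; it reproduces the two URLs used in this notebook.

```python
from urllib.parse import urlencode

ERGAST_BASE = "http://ergast.com/api/f1"


def ergast_url(*path, limit=None, offset=None):
    """Build an Ergast-style JSON URL, e.g. ergast_url(2008, 5, "results").

    Hypothetical helper: path segments are joined with "/", ".json" is appended,
    and limit/offset are added as query-string parameters only when given.
    """
    url = "/".join([ERGAST_BASE, *(str(part) for part in path)]) + ".json"
    params = {name: value for name, value in (("limit", limit), ("offset", offset))
              if value is not None}
    return f"{url}?{urlencode(params)}" if params else url


print(ergast_url(limit=500, offset=800))             # race schedule page used below
print(ergast_url("results", limit=500, offset=800))  # results page used below
print(ergast_url(2008))                              # 2008 season schedule only
```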

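```python
# Import the basic packages, including requests to handle API calls and responses.
# The standard-library json module isn't needed: res.json() parses the response,
# and the cells below reuse the name `json` for that parsed result.
import pandas as pd
import numpy as np
import requests
```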

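```python
# Fetch the race schedule from Ergast as JSON.
# offset=800 skips the first 800 races; limit=500 caps how many rows come back.
url = 'http://ergast.com/api/f1.json?limit=500&offset=800'
res = requests.get(url)
res.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
json = res.json()       # parsed response (note: reuses the name of the json module)

json
```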


```
{'MRData': {'xmlns': 'http://ergast.com/mrd/1.4',
  'series': 'f1',
  'url': 'http://ergast.com/api/f1.json',
  'limit': '500',
  'offset': '800',
  'total': '1080',
  'RaceTable': {'Races': [{'season': '2008',
     'round': '16',
     'url': 'http://en.wikipedia.org/wiki/2008_Japanese_Grand_Prix',
     'raceName': 'Japanese Grand Prix',
     'Circuit': {'circuitId': 'fuji',
      'url': 'http://en.wikipedia.org/wiki/Fuji_Speedway',
      'circuitName': 'Fuji Speedway',
      'Location': {'lat': '35.3717',
       'long': '138.927',
       'locality': 'Oyama',
       'country': 'Japan'}},
     'date': '2008-10-12',
     'time': '04:30:00Z'},
    {'season': '2008',
     'round': '17',
     'url': 'http://en.wikipedia.org/wiki/2008_Chinese_Grand_Prix',
     'raceName': 'Chinese Grand Prix',
     'Circuit': {'circuitId': 'shanghai',
      'url': 'http://en.wikipedia.org/wiki/Shanghai_International_Circuit',
      'circuitName': 'Shanghai International Circuit',
      'Location': {'lat': '3
```
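In the output above, the first race returned is round 16 of 2008, so `offset=800` skips the first fifteen rounds of the 2008 season. One way to start exactly at a season boundary is to request each season by path (Ergast serves `/api/f1/{season}.json`) and concatenate the races. This sketch assumes that URL pattern and that `limit=100` covers the longest calendar; `fetch_json` and `fetch_seasons` are hypothetical helpers, and the approach costs one request per season.

```python
import json
import urllib.request


def fetch_json(url):
    """Download `url` and decode its JSON body (standard library only)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fetch_seasons(first_season, last_season, fetch=fetch_json):
    """Collect race records for every season from first_season to last_season inclusive.

    Hypothetical helper: makes one request per season and assumes limit=100
    is enough rows for any single season.
    """
    races = []
    for season in range(first_season, last_season + 1):
        page = fetch(f"http://ergast.com/api/f1/{season}.json?limit=100")["MRData"]
        races.extend(page["RaceTable"]["Races"])
    return races
```

The resulting list has the same record shape as `json['MRData']['RaceTable']['Races']`, so something like `pd.DataFrame(fetch_seasons(2008, 2022))` would slot into the next cell.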

```python
# Turn the list of race records into a DataFrame (one row per race).
df_races = pd.DataFrame.from_dict(json['MRData']['RaceTable']['Races'], orient="columns")
df_races
```

| | season | round | url | raceName | Circuit | date | time |
|---|---|---|---|---|---|---|---|
| 0 | 2008 | 16 | http://en.wikipedia.org/wiki/2008_Japanese_Gra... | Japanese Grand Prix | {'circuitId': 'fuji', 'url': 'http://en.wikipe... | 2008-10-12 | 04:30:00Z |
| 1 | 2008 | 17 | http://en.wikipedia.org/wiki/2008_Chinese_Gran... | Chinese Grand Prix | {'circuitId': 'shanghai', 'url': 'http://en.wi... | 2008-10-19 | 07:00:00Z |
| 2 | 2008 | 18 | http://en.wikipedia.org/wiki/2008_Brazilian_Gr... | Brazilian Grand Prix | {'circuitId': 'interlagos', 'url': 'http://en.... | 2008-11-02 | 17:00:00Z |
| 3 | 2009 | 1 | http://en.wikipedia.org/wiki/2009_Australian_G... | Australian Grand Prix | {'circuitId': 'albert_park', 'url': 'http://en... | 2009-03-29 | 06:00:00Z |
| 4 | 2009 | 2 | http://en.wikipedia.org/wiki/2009_Malaysian_Gr... | Malaysian Grand Prix | {'circuitId': 'sepang', 'url': 'http://en.wiki... | 2009-04-05 | 09:00:00Z |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 275 | 2022 | 19 | https://en.wikipedia.org/wiki/2022_Japanese_Gr... | Japanese Grand Prix | {'circuitId': 'suzuka', 'url': 'http://en.wiki... | 2022-10-09 | 05:10:00Z |
| 276 | 2022 | 20 | https://en.wikipedia.org/wiki/2022_United_Stat... | United States Grand Prix | {'circuitId': 'americas', 'url': 'http://en.wi... | 2022-10-23 | 19:00:00Z |
| 277 | 2022 | 21 | https://en.wikipedia.org/wiki/2022_Mexican_Gra... | Mexico City Grand Prix | {'circuitId': 'rodriguez', 'url': 'http://en.w... | 2022-10-30 | 19:00:00Z |
| 278 | 2022 | 22 | https://en.wikipedia.org/wiki/2022_S%C3%A3o_Pa... | São Paulo Grand Prix | {'circuitId': 'interlagos', 'url': 'http://en.... | 2022-11-13 | 17:00:00Z |
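The `Circuit` column still holds nested dictionaries, and every value arrives as a string. A sketch of one way to flatten and type the table uses `pandas.json_normalize`, which expands nested dicts into dotted column names such as `Circuit.circuitId`, plus `pd.to_numeric` and `pd.to_datetime`. The two sample records are copied (trimmed) from the API output above; in the notebook you would pass `json['MRData']['RaceTable']['Races']` instead. Merging `date` and `time` into one UTC timestamp is a design choice, and it assumes every row has a `time` (the 1954 race in the results output below has only a `date`).

```python
import pandas as pd

# Two race records copied from the output above, trimmed to a few fields.
races = [
    {"season": "2008", "round": "16", "raceName": "Japanese Grand Prix",
     "Circuit": {"circuitId": "fuji", "circuitName": "Fuji Speedway"},
     "date": "2008-10-12", "time": "04:30:00Z"},
    {"season": "2008", "round": "17", "raceName": "Chinese Grand Prix",
     "Circuit": {"circuitId": "shanghai", "circuitName": "Shanghai International Circuit"},
     "date": "2008-10-19", "time": "07:00:00Z"},
]

# Nested dicts become dotted columns: Circuit.circuitId, Circuit.circuitName.
flat = pd.json_normalize(races)

# Every Ergast value arrives as a string; convert the numeric ones.
flat["season"] = pd.to_numeric(flat["season"])
flat["round"] = pd.to_numeric(flat["round"])

# Join date and time into one ISO 8601 string and parse it as a UTC timestamp
# (the trailing "Z" marks UTC).
flat["start"] = pd.to_datetime(flat["date"] + "T" + flat["time"], utc=True)

print(flat[["season", "round", "raceName", "Circuit.circuitId", "start"]])
```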


### Results

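The results of a race include each driver's starting grid position (`grid`) and finishing position (`position`, where 1 is the winner), along with race data such as laps completed (`laps`), points scored, total race time, and a `status` field showing whether the driver finished, was lapped (e.g. `+1 Lap`), or did not finish or was disqualified. Results for recent seasons also record each driver's fastest lap, including its time and average speed.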
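Since `grid` is the starting slot and `position` the finishing place, places gained is `grid - position`, and the lapped `status` strings seen in the output below (`+1 Lap`, `+2 Laps`, and so on) reduce to a laps-down count. In this sketch, `positions_gained` and `laps_down` are hypothetical helpers; it assumes lapped statuses always follow the `+N Lap(s)` pattern and that a grid value of `"0"` marks a pit-lane start (worth checking against Ergast's documentation).

```python
import re

# Matches lapped-finisher statuses such as "+1 Lap" or "+4 Laps".
LAPPED = re.compile(r"^\+(\d+) Laps?$")


def positions_gained(result):
    """Places gained from start to finish (negative means places lost).

    Assumes grid "0" marks a pit-lane start, which has no slot to compare, so None.
    """
    grid, position = int(result["grid"]), int(result["position"])
    return None if grid == 0 else grid - position


def laps_down(status):
    """0 for "Finished", N for "+N Lap(s)", None for any other status."""
    if status == "Finished":
        return 0
    match = LAPPED.match(status)
    return int(match.group(1)) if match else None


# Two result rows from the 1954 Argentine Grand Prix output below.
fangio = {"grid": "3", "position": "1", "status": "Finished"}
trintignant = {"grid": "5", "position": "4", "status": "+1 Lap"}
print(positions_gained(fangio), laps_down(fangio["status"]))            # 2 0
print(positions_gained(trintignant), laps_down(trintignant["status"]))  # 1 1
```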

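```python
# Fetch race results the same way. Results have one row per driver per race,
# so offset=800 still lands in the 1950s rather than in 2008.
url = 'http://ergast.com/api/f1/results.json?limit=500&offset=800'
res = requests.get(url)
res.raise_for_status()
json = res.json()

json
```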

```
{'MRData': {'xmlns': 'http://ergast.com/mrd/1.4',
  'series': 'f1',
  'url': 'http://ergast.com/api/f1/results.json',
  'limit': '500',
  'offset': '800',
  'total': '25400',
  'RaceTable': {'Races': [{'season': '1954',
     'round': '1',
     'url': 'http://en.wikipedia.org/wiki/1954_Argentine_Grand_Prix',
     'raceName': 'Argentine Grand Prix',
     'Circuit': {'circuitId': 'galvez',
      'url': 'http://en.wikipedia.org/wiki/Aut%C3%B3dromo_Oscar_Alfredo_G%C3%A1lvez',
      'circuitName': 'Autódromo Juan y Oscar Gálvez',
      'Location': {'lat': '-34.6943',
       'long': '-58.4593',
       'locality': 'Buenos Aires',
       'country': 'Argentina'}},
     'date': '1954-01-17',
     'Results': [{'number': '2',
       'position': '1',
       'positionText': '1',
       'points': '8',
       'Driver': {'driverId': 'fangio',
        'url': 'http://en.wikipedia.org/wiki/Juan_Manuel_Fangio',
        'givenName': 'Juan',
        'familyName': 'Fangio',
        'dateOfBirth': '1911-06-24',
```

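```python
# Build a DataFrame from the results of the first race on this page (1954 Argentine GP).
# Use a new name so the race schedule in df_races isn't overwritten.
df_results = pd.DataFrame.from_dict(json['MRData']['RaceTable']['Races'][0]['Results'], orient="columns")
df_results
```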

| | number | position | positionText | points | Driver | Constructor | grid | laps | status | Time |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 1 | 8 | {'driverId': 'fangio', 'url': 'http://en.wikip... | {'constructorId': 'maserati', 'url': 'http://e... | 3 | 87 | Finished | {'millis': '10855800', 'time': '3:00:55.8'} |
| 1 | 10 | 2 | 2 | 6 | {'driverId': 'farina', 'url': 'http://en.wikip... | {'constructorId': 'ferrari', 'url': 'http://en... | 1 | 87 | Finished | {'millis': '10934800', 'time': '+1:19.0'} |
| 2 | 12 | 3 | 3 | 5 | {'driverId': 'gonzalez', 'url': 'http://en.wik... | {'constructorId': 'ferrari', 'url': 'http://en... | 2 | 87 | Finished | {'millis': '10976800', 'time': '+2:01.0'} |
| 3 | 26 | 4 | 4 | 3 | {'driverId': 'trintignant', 'url': 'http://en.... | {'constructorId': 'ferrari', 'url': 'http://en... | 5 | 86 | +1 Lap | |
| 4 | 20 | 5 | 5 | 2 | {'driverId': 'bayol', 'url': 'http://en.wikipe... | {'constructorId': 'gordini', 'url': 'http://en... | 15 | 85 | +2 Laps | |
| 5 | 28 | 6 | 6 | 0 | {'driverId': 'schell', 'url': 'http://en.wikip... | {'constructorId': 'maserati', 'url': 'http://e... | 11 | 84 | +3 Laps | |
| 6 | 8 | 7 | 7 | 0 | {'driverId': 'bira', 'url': 'http://en.wikiped... | {'constructorId': 'maserati', 'url': 'http://e... | 10 | 83 | +4 Laps | |
| 7 | 30 | 8 | 8 | 0 | {'driverId': 'graffenried', 'url': 'http://en.... | {'constructorId': 'maserati', 'url': 'http://e... | 13 | 83 | +4 Laps | |
| 8 | 16 | 9 | 9 | 0 | {'driverId': 'maglioli', 'url': 'http://en.wik... | {'constructorId': 'ferrari', 'url': 'http://en... | 12 | 82 | +5 Laps | |
| 9 | 18 | 10 | D | 0 | {'driverId': 'behra', 'url': 'http://en.wikipe... | {'constructorId': 'gordini', 'url': 'http://en... | 17 | 61 | Disqualified | |
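The cell above keeps only `Races[0]`, so the results of every other race in the same response are dropped. A sketch of flattening all of them at once with `pandas.json_normalize`: `record_path="Results"` emits one row per driver, `meta` copies race-level fields onto each row, and nested `Driver`/`Constructor` dicts become dotted columns such as `Driver.driverId`. The sample is trimmed from the 1954 Argentine Grand Prix output above; in the notebook you would pass `json['MRData']['RaceTable']['Races']`.

```python
import pandas as pd

# One race trimmed to three results, copied from the output above.
races = [{
    "season": "1954", "round": "1", "raceName": "Argentine Grand Prix",
    "Results": [
        {"position": "1", "grid": "3", "status": "Finished",
         "Driver": {"driverId": "fangio"}, "Constructor": {"constructorId": "maserati"}},
        {"position": "2", "grid": "1", "status": "Finished",
         "Driver": {"driverId": "farina"}, "Constructor": {"constructorId": "ferrari"}},
        {"position": "4", "grid": "5", "status": "+1 Lap",
         "Driver": {"driverId": "trintignant"}, "Constructor": {"constructorId": "ferrari"}},
    ],
}]

# One row per result; season/round/raceName are repeated from the parent race.
results = pd.json_normalize(races, record_path="Results",
                            meta=["season", "round", "raceName"])

# Convert the numeric string columns so grid and position can be compared.
for col in ["season", "round", "position", "grid"]:
    results[col] = pd.to_numeric(results[col])

print(results[["season", "raceName", "Driver.driverId", "Constructor.constructorId",
               "grid", "position", "status"]])
```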


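```python
# Print each top-level key, then the keys one level below it (inside MRData).
for key in json.keys():
    print(key)
    for sub_key in json[key].keys():
        print(sub_key)
```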

```
MRData
xmlns
series
url
limit
offset
total
RaceTable
```
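The loop above reaches only one level below `MRData`, so it stops at `RaceTable` without showing what is inside. A recursive sketch that walks nested dicts, and the first element of any list, prints the whole shape of a response. `key_tree` and its `max_depth` cut-off are hypothetical names, and sampling only the first list element assumes every record in a list shares the same keys.

```python
def key_tree(obj, depth=0, max_depth=6):
    """Return indented key names for nested dicts, sampling the first list item."""
    lines = []
    if depth >= max_depth:
        return lines
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append("  " * depth + str(key))
            lines.extend(key_tree(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Lists of records: describe the first one, assuming the rest match it.
        lines.extend(key_tree(obj[0], depth, max_depth))
    return lines


# A trimmed response with the same nesting as the race schedule above.
sample = {"MRData": {"total": "1080",
                     "RaceTable": {"Races": [{"season": "2008", "round": "16",
                                              "Circuit": {"circuitId": "fuji"}}]}}}
print("\n".join(key_tree(sample)))
```

In the notebook, `print("\n".join(key_tree(json)))` would show the full structure of whichever response is currently loaded.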
