The original Formula 1 dataset is split into multiple CSV files, each containing different information about races, drivers, teams, and circuits. To build a predictive model, all relevant attributes must exist in a single dataset. Therefore, I have merged:

- race_results with races (on season and round) to add race context (raceName, circuitId, date)
- the result with circuits (on circuitId) to include geographical track features (lat, long, locality, country)
- race_results with drivers (on driverId) to attach driver demographics
- race_results with constructors (on constructorId) to include team information

This merging process produces one unified dataset that combines the race, circuit, driver, and constructor variables I plan to use for predicting a driver's finishing position.

Please check the code below. Thank you.

In [1]:
import pandas as pd

In [2]:
old_results = pd.read_csv("race_results.csv")
races = pd.read_csv("races.csv")
circuits = pd.read_csv("circuits.csv")
drivers = pd.read_csv("drivers.csv")
constructors = pd.read_csv("constructors.csv")

In [3]:
old_results.head()

Unnamed: 0,season,round,driverId,driverName,constructorId,constructorName,number,position,positionText,points,grid,laps,status,time,fastestLapRank,fastestLap_lap,fastestLapTime,averageSpeed
0,1950,1,farina,Nino Farina,alfa,Alfa Romeo,2.0,1,1,9.0,1,70,Finished,2:13:23.6,,,,
1,1950,1,fagioli,Luigi Fagioli,alfa,Alfa Romeo,3.0,2,2,6.0,2,70,Finished,+2.6,,,,
2,1950,1,reg_parnell,Reg Parnell,alfa,Alfa Romeo,4.0,3,3,4.0,4,70,Finished,+52.0,,,,
3,1950,1,cabantous,Yves Cabantous,lago,Talbot-Lago,14.0,4,4,3.0,6,68,+2 Laps,,,,,
4,1950,1,rosier,Louis Rosier,lago,Talbot-Lago,15.0,5,5,2.0,9,68,+2 Laps,,,,,


In [4]:
# Attach race context (raceName, circuitId, date, ...) to every result row.
results1 = old_results.merge(races, on=["season", "round"], how="left")

In [5]:
results1.head()

Unnamed: 0,season,round,driverId,driverName,constructorId,constructorName,number,position,positionText,points,...,circuitId,circuitName,date,time_y,firstPractice,secondPractice,thirdPractice,qualifying,sprint,url
0,1950,1,farina,Nino Farina,alfa,Alfa Romeo,2.0,1,1,9.0,...,silverstone,Silverstone Circuit,1950-05-13,,,,,,,http://en.wikipedia.org/wiki/1950_British_Gran...
1,1950,1,fagioli,Luigi Fagioli,alfa,Alfa Romeo,3.0,2,2,6.0,...,silverstone,Silverstone Circuit,1950-05-13,,,,,,,http://en.wikipedia.org/wiki/1950_British_Gran...
2,1950,1,reg_parnell,Reg Parnell,alfa,Alfa Romeo,4.0,3,3,4.0,...,silverstone,Silverstone Circuit,1950-05-13,,,,,,,http://en.wikipedia.org/wiki/1950_British_Gran...
3,1950,1,cabantous,Yves Cabantous,lago,Talbot-Lago,14.0,4,4,3.0,...,silverstone,Silverstone Circuit,1950-05-13,,,,,,,http://en.wikipedia.org/wiki/1950_British_Gran...
4,1950,1,rosier,Louis Rosier,lago,Talbot-Lago,15.0,5,5,2.0,...,silverstone,Silverstone Circuit,1950-05-13,,,,,,,http://en.wikipedia.org/wiki/1950_British_Gran...
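
A left merge keeps unmatched rows without complaint and can duplicate rows if the key is not unique on the right-hand side. The following is a minimal sketch of how the step above could be checked, using small made-up frames in place of the real CSVs. The toy values and the `validate`/`indicator` checks are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for race_results.csv and races.csv (illustrative values only).
toy_results = pd.DataFrame({
    "season": [1950, 1950, 1950],
    "round": [1, 1, 2],
    "driverId": ["farina", "fagioli", "fangio"],
})
toy_races = pd.DataFrame({
    "season": [1950, 1950],
    "round": [1, 2],
    "circuitId": ["silverstone", "monaco"],
})

# validate="many_to_one" raises MergeError if (season, round) repeats in toy_races;
# indicator=True adds a _merge column recording where each row came from.
checked = toy_results.merge(
    toy_races,
    on=["season", "round"],
    how="left",
    validate="many_to_one",
    indicator=True,
)

# A left merge against a unique right-hand key should preserve the row count.
same_row_count = len(checked) == len(toy_results)
# Rows marked "left_only" found no matching race.
unmatched = int((checked["_merge"] == "left_only").sum())
print(same_row_count, unmatched)  # True 0
```

The same two checks could be repeated for the circuits, drivers, and constructors merges, since each of them joins many result rows to one lookup row.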


In [6]:
# Attach circuit location (lat, long, locality, country); shared names get _x/_y suffixes.
results2 = results1.merge(circuits, on="circuitId", how="left")

In [7]:
results2.head()

Unnamed: 0,season,round,driverId,driverName,constructorId,constructorName,number,position,positionText,points,...,thirdPractice,qualifying,sprint,url_x,circuitName_y,lat,long,locality,country,url_y
0,1950,1,farina,Nino Farina,alfa,Alfa Romeo,2.0,1,1,9.0,...,,,,http://en.wikipedia.org/wiki/1950_British_Gran...,Silverstone Circuit,52.0786,-1.01694,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit
1,1950,1,fagioli,Luigi Fagioli,alfa,Alfa Romeo,3.0,2,2,6.0,...,,,,http://en.wikipedia.org/wiki/1950_British_Gran...,Silverstone Circuit,52.0786,-1.01694,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit
2,1950,1,reg_parnell,Reg Parnell,alfa,Alfa Romeo,4.0,3,3,4.0,...,,,,http://en.wikipedia.org/wiki/1950_British_Gran...,Silverstone Circuit,52.0786,-1.01694,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit
3,1950,1,cabantous,Yves Cabantous,lago,Talbot-Lago,14.0,4,4,3.0,...,,,,http://en.wikipedia.org/wiki/1950_British_Gran...,Silverstone Circuit,52.0786,-1.01694,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit
4,1950,1,rosier,Louis Rosier,lago,Talbot-Lago,15.0,5,5,2.0,...,,,,http://en.wikipedia.org/wiki/1950_British_Gran...,Silverstone Circuit,52.0786,-1.01694,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit


In [8]:
# Attach driver details (name parts, code, permanentNumber, dateOfBirth, nationality).
results3 = results2.merge(drivers, on="driverId", how="left")

In [9]:
results3.head()

Unnamed: 0,season,round,driverId,driverName,constructorId,constructorName,number,position,positionText,points,...,locality,country,url_y,givenName,familyName,code,permanentNumber,dateOfBirth,nationality,url
0,1950,1,farina,Nino Farina,alfa,Alfa Romeo,2.0,1,1,9.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Nino,Farina,,,1906-10-30,Italian,http://en.wikipedia.org/wiki/Nino_Farina
1,1950,1,fagioli,Luigi Fagioli,alfa,Alfa Romeo,3.0,2,2,6.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Luigi,Fagioli,,,1898-06-09,Italian,http://en.wikipedia.org/wiki/Luigi_Fagioli
2,1950,1,reg_parnell,Reg Parnell,alfa,Alfa Romeo,4.0,3,3,4.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Reg,Parnell,,,1911-07-02,British,http://en.wikipedia.org/wiki/Reg_Parnell
3,1950,1,cabantous,Yves Cabantous,lago,Talbot-Lago,14.0,4,4,3.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Yves,Cabantous,,,1904-10-08,French,http://en.wikipedia.org/wiki/Yves_Giraud_Caban...
4,1950,1,rosier,Louis Rosier,lago,Talbot-Lago,15.0,5,5,2.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Louis,Rosier,,,1905-11-05,French,http://en.wikipedia.org/wiki/Louis_Rosier


In [10]:
# url_x is the race URL from races.csv; give it a descriptive name before the next merge.
results3 = results3.rename(columns={"url_x": "race_url"})

In [11]:
results3.head()

Unnamed: 0,season,round,driverId,driverName,constructorId,constructorName,number,position,positionText,points,...,locality,country,url_y,givenName,familyName,code,permanentNumber,dateOfBirth,nationality,url
0,1950,1,farina,Nino Farina,alfa,Alfa Romeo,2.0,1,1,9.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Nino,Farina,,,1906-10-30,Italian,http://en.wikipedia.org/wiki/Nino_Farina
1,1950,1,fagioli,Luigi Fagioli,alfa,Alfa Romeo,3.0,2,2,6.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Luigi,Fagioli,,,1898-06-09,Italian,http://en.wikipedia.org/wiki/Luigi_Fagioli
2,1950,1,reg_parnell,Reg Parnell,alfa,Alfa Romeo,4.0,3,3,4.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Reg,Parnell,,,1911-07-02,British,http://en.wikipedia.org/wiki/Reg_Parnell
3,1950,1,cabantous,Yves Cabantous,lago,Talbot-Lago,14.0,4,4,3.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Yves,Cabantous,,,1904-10-08,French,http://en.wikipedia.org/wiki/Yves_Giraud_Caban...
4,1950,1,rosier,Louis Rosier,lago,Talbot-Lago,15.0,5,5,2.0,...,Silverstone,UK,http://en.wikipedia.org/wiki/Silverstone_Circuit,Louis,Rosier,,,1905-11-05,French,http://en.wikipedia.org/wiki/Louis_Rosier


In [12]:
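# Attach constructor details; constructorName, nationality, and url overlap and get _x/_y suffixes.
results = results3.merge(constructors, on="constructorId", how="left")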

In [13]:
results.head()

Unnamed: 0,season,round,driverId,driverName,constructorId,constructorName_x,number,position,positionText,points,...,givenName,familyName,code,permanentNumber,dateOfBirth,nationality_x,url_x,constructorName_y,nationality_y,url_y
0,1950,1,farina,Nino Farina,alfa,Alfa Romeo,2.0,1,1,9.0,...,Nino,Farina,,,1906-10-30,Italian,http://en.wikipedia.org/wiki/Nino_Farina,Alfa Romeo,Swiss,http://en.wikipedia.org/wiki/Alfa_Romeo_in_For...
1,1950,1,fagioli,Luigi Fagioli,alfa,Alfa Romeo,3.0,2,2,6.0,...,Luigi,Fagioli,,,1898-06-09,Italian,http://en.wikipedia.org/wiki/Luigi_Fagioli,Alfa Romeo,Swiss,http://en.wikipedia.org/wiki/Alfa_Romeo_in_For...
2,1950,1,reg_parnell,Reg Parnell,alfa,Alfa Romeo,4.0,3,3,4.0,...,Reg,Parnell,,,1911-07-02,British,http://en.wikipedia.org/wiki/Reg_Parnell,Alfa Romeo,Swiss,http://en.wikipedia.org/wiki/Alfa_Romeo_in_For...
3,1950,1,cabantous,Yves Cabantous,lago,Talbot-Lago,14.0,4,4,3.0,...,Yves,Cabantous,,,1904-10-08,French,http://en.wikipedia.org/wiki/Yves_Giraud_Caban...,Talbot-Lago,French,http://en.wikipedia.org/wiki/Talbot-Lago
4,1950,1,rosier,Louis Rosier,lago,Talbot-Lago,15.0,5,5,2.0,...,Louis,Rosier,,,1905-11-05,French,http://en.wikipedia.org/wiki/Louis_Rosier,Talbot-Lago,French,http://en.wikipedia.org/wiki/Talbot-Lago
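
In the output above, `nationality_x` is the driver's nationality and `nationality_y` is the constructor's, which is easy to misread later. One possible way to avoid the `_x`/`_y` suffixes is to prefix the lookup table's columns before merging. This is a sketch on toy frames: `with_prefix` is a hypothetical helper and the prefixed names are my own choice, not something the notebook uses.

```python
import pandas as pd

# Toy stand-ins for the results, drivers, and constructors tables (illustrative values only).
toy_results = pd.DataFrame({
    "driverId": ["farina", "rosier"],
    "constructorId": ["alfa", "lago"],
})
toy_drivers = pd.DataFrame({
    "driverId": ["farina", "rosier"],
    "nationality": ["Italian", "French"],
    "url": ["driver-url-1", "driver-url-2"],
})
toy_constructors = pd.DataFrame({
    "constructorId": ["alfa", "lago"],
    "nationality": ["Swiss", "French"],
    "url": ["team-url-1", "team-url-2"],
})


def with_prefix(df, prefix, key):
    """Hypothetical helper: prefix every column except the join key."""
    return df.rename(columns={c: f"{prefix}_{c}" for c in df.columns if c != key})


# Because the lookup columns are renamed first, no two tables share a non-key name.
merged = (
    toy_results
    .merge(with_prefix(toy_drivers, "driver", "driverId"), on="driverId", how="left")
    .merge(with_prefix(toy_constructors, "constructor", "constructorId"),
           on="constructorId", how="left")
)
print(list(merged.columns))
```

With this approach every column says which table it came from, so no rename step is needed after the merge.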


In [16]:
results.columns

Index(['season', 'round', 'driverId', 'driverName', 'constructorId',
       'constructorName_x', 'number', 'position', 'positionText', 'points',
       'grid', 'laps', 'status', 'time_x', 'fastestLapRank', 'fastestLap_lap',
       'fastestLapTime', 'averageSpeed', 'raceName', 'circuitId',
       'circuitName_x', 'date', 'time_y', 'firstPractice', 'secondPractice',
       'thirdPractice', 'qualifying', 'sprint', 'race_url', 'circuitName_y',
       'lat', 'long', 'locality', 'country', 'url_y', 'givenName',
       'familyName', 'code', 'permanentNumber', 'dateOfBirth', 'nationality_x',
       'url_x', 'constructorName_y', 'nationality_y', 'url_y'],
      dtype='object')
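
The listing above contains `url_y` twice (once from the circuits merge and once from the constructors merge). Repeated column names make `df["url_y"]` return a DataFrame instead of a Series, so it can be worth detecting them. A small sketch on a toy frame, assuming nothing beyond standard pandas:

```python
import pandas as pd

# Toy frame with a repeated column name, mimicking the two 'url_y' entries.
toy = pd.DataFrame([[1, "a", "b"]], columns=["season", "url_y", "url_y"])

# Index.duplicated() flags the second and later occurrences of each name.
dupes = toy.columns[toy.columns.duplicated()].tolist()
print(dupes)  # ['url_y']

# Keep only the first occurrence of each column name.
deduped = toy.loc[:, ~toy.columns.duplicated()]
print(list(deduped.columns))  # ['season', 'url_y']
```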

In [17]:
results.shape

(26759, 45)

In [14]:
results.to_csv('unclean_results.csv', index=False)

In [18]:
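# Drop URL columns, redundant text/name columns, and race/session time columns; errors="ignore" skips any name that is absent.
results = results.drop(columns=["race_url", "url_x", "url_y", "positionText", "constructorName_y", "circuitName_y", "driverName", "time_x", "time_y", "firstPractice", "secondPractice", "thirdPractice", "qualifying", "sprint"], errors="ignore")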


In [19]:
results.columns

Index(['season', 'round', 'driverId', 'constructorId', 'constructorName_x',
       'number', 'position', 'points', 'grid', 'laps', 'status',
       'fastestLapRank', 'fastestLap_lap', 'fastestLapTime', 'averageSpeed',
       'raceName', 'circuitId', 'circuitName_x', 'date', 'lat', 'long',
       'locality', 'country', 'givenName', 'familyName', 'code',
       'permanentNumber', 'dateOfBirth', 'nationality_x', 'nationality_y'],
      dtype='object')
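
Four suffixed names remain after the drop (`constructorName_x`, `circuitName_x`, `nationality_x`, `nationality_y`). A possible clean-up is to rename them to something self-explanatory. The sketch below uses a one-row toy frame, and the new names are my own suggestions rather than names used elsewhere in this notebook. The mapping assumes the merge order shown earlier, where drivers were merged before constructors, so `nationality_x` belongs to the driver.

```python
import pandas as pd

# Toy frame carrying the suffixed names left after the drop (illustrative values only).
toy = pd.DataFrame({
    "constructorName_x": ["Brawn"],
    "circuitName_x": ["Albert Park Grand Prix Circuit"],
    "nationality_x": ["British"],
    "nationality_y": ["British"],
})

# Assumed mapping: _x came from the left-hand table at the time of each merge.
readable_names = {
    "constructorName_x": "constructorName",
    "circuitName_x": "circuitName",
    "nationality_x": "driverNationality",
    "nationality_y": "constructorNationality",
}
toy = toy.rename(columns=readable_names)
print(list(toy.columns))
```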

In [20]:
results.head()

Unnamed: 0,season,round,driverId,constructorId,constructorName_x,number,position,points,grid,laps,...,long,locality,country,givenName,familyName,code,permanentNumber,dateOfBirth,nationality_x,nationality_y
0,1950,1,farina,alfa,Alfa Romeo,2.0,1,9.0,1,70,...,-1.01694,Silverstone,UK,Nino,Farina,,,1906-10-30,Italian,Swiss
1,1950,1,fagioli,alfa,Alfa Romeo,3.0,2,6.0,2,70,...,-1.01694,Silverstone,UK,Luigi,Fagioli,,,1898-06-09,Italian,Swiss
2,1950,1,reg_parnell,alfa,Alfa Romeo,4.0,3,4.0,4,70,...,-1.01694,Silverstone,UK,Reg,Parnell,,,1911-07-02,British,Swiss
3,1950,1,cabantous,lago,Talbot-Lago,14.0,4,3.0,6,68,...,-1.01694,Silverstone,UK,Yves,Cabantous,,,1904-10-08,French,French
4,1950,1,rosier,lago,Talbot-Lago,15.0,5,2.0,9,68,...,-1.01694,Silverstone,UK,Louis,Rosier,,,1905-11-05,French,French


In [21]:
results = results[results["season"] >= 2009]

In [23]:
results.head(20)

Unnamed: 0,season,round,driverId,constructorId,constructorName_x,number,position,points,grid,laps,...,long,locality,country,givenName,familyName,code,permanentNumber,dateOfBirth,nationality_x,nationality_y
19983,2009,1,button,brawn,Brawn,22.0,1,10.0,1,58,...,144.968,Melbourne,Australia,Jenson,Button,BUT,22.0,1980-01-19,British,British
19984,2009,1,barrichello,brawn,Brawn,23.0,2,8.0,2,58,...,144.968,Melbourne,Australia,Rubens,Barrichello,BAR,,1972-05-23,Brazilian,British
19985,2009,1,trulli,toyota,Toyota,9.0,3,6.0,20,58,...,144.968,Melbourne,Australia,Jarno,Trulli,TRU,,1974-07-13,Italian,Japanese
19986,2009,1,glock,toyota,Toyota,10.0,4,5.0,19,58,...,144.968,Melbourne,Australia,Timo,Glock,GLO,,1982-03-18,German,Japanese
19987,2009,1,alonso,renault,Renault,7.0,5,4.0,10,58,...,144.968,Melbourne,Australia,Fernando,Alonso,ALO,14.0,1981-07-29,Spanish,French
19988,2009,1,rosberg,williams,Williams,16.0,6,3.0,5,58,...,144.968,Melbourne,Australia,Nico,Rosberg,ROS,6.0,1985-06-27,German,British
19989,2009,1,buemi,toro_rosso,Toro Rosso,12.0,7,2.0,13,58,...,144.968,Melbourne,Australia,Sébastien,Buemi,BUE,,1988-10-31,Swiss,Italian
19990,2009,1,bourdais,toro_rosso,Toro Rosso,11.0,8,1.0,17,58,...,144.968,Melbourne,Australia,Sébastien,Bourdais,BOU,,1979-02-28,French,Italian
19991,2009,1,sutil,force_india,Force India,20.0,9,0.0,16,58,...,144.968,Melbourne,Australia,Adrian,Sutil,SUT,99.0,1983-01-11,German,Indian
19992,2009,1,heidfeld,bmw_sauber,BMW Sauber,6.0,10,0.0,9,58,...,144.968,Melbourne,Australia,Nick,Heidfeld,HEI,,1977-05-10,German,German


In [24]:
results.shape

(6776, 30)
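
The filtered frame keeps the original row labels (the first row is still labelled 19983) and is a subset of the larger frame. If new columns are going to be added afterwards, one option is to take an explicit copy and reset the index. A sketch on toy data, with made-up rows:

```python
import pandas as pd

# Toy frame spanning the 2009 cut-off (illustrative values only).
toy = pd.DataFrame({
    "season": [2008, 2009, 2010],
    "driverId": ["massa", "button", "vettel"],
})

# .copy() makes the subset an independent frame, which avoids SettingWithCopyWarning
# on later column assignments in pandas versions that emit it.
# reset_index(drop=True) restarts the row labels at 0.
modern = toy[toy["season"] >= 2009].copy().reset_index(drop=True)
print(modern.index.tolist(), modern["season"].tolist())  # [0, 1] [2009, 2010]
```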

In [25]:
drivers.head()

Unnamed: 0,driverId,givenName,familyName,code,permanentNumber,dateOfBirth,nationality,url
0,abate,Carlo,Abate,,,1932-07-10,Italian,http://en.wikipedia.org/wiki/Carlo_Mario_Abate
1,abecassis,George,Abecassis,,,1913-03-21,British,http://en.wikipedia.org/wiki/George_Abecassis
2,acheson,Kenny,Acheson,,,1957-11-27,British,http://en.wikipedia.org/wiki/Kenny_Acheson
3,adams,Philippe,Adams,,,1969-11-19,Belgian,http://en.wikipedia.org/wiki/Philippe_Adams
4,ader,Walt,Ader,,,1913-12-15,American,http://en.wikipedia.org/wiki/Walt_Ader


In [26]:
constructors.head()

Unnamed: 0,constructorId,constructorName,nationality,url
0,adams,Adams,American,http://en.wikipedia.org/wiki/Adams_(constructor)
1,afm,AFM,German,http://en.wikipedia.org/wiki/Alex_von_Falkenha...
2,ags,AGS,French,http://en.wikipedia.org/wiki/Automobiles_Gonfa...
3,alfa,Alfa Romeo,Swiss,http://en.wikipedia.org/wiki/Alfa_Romeo_in_For...
4,alphatauri,AlphaTauri,Italian,http://en.wikipedia.org/wiki/Scuderia_AlphaTauri


In [27]:
races.head()

Unnamed: 0,season,round,raceName,circuitId,circuitName,date,time,firstPractice,secondPractice,thirdPractice,qualifying,sprint,url
0,1950,1,British Grand Prix,silverstone,Silverstone Circuit,1950-05-13,,,,,,,http://en.wikipedia.org/wiki/1950_British_Gran...
1,1950,2,Monaco Grand Prix,monaco,Circuit de Monaco,1950-05-21,,,,,,,http://en.wikipedia.org/wiki/1950_Monaco_Grand...
2,1950,3,Indianapolis 500,indianapolis,Indianapolis Motor Speedway,1950-05-30,,,,,,,http://en.wikipedia.org/wiki/1950_Indianapolis...
3,1950,4,Swiss Grand Prix,bremgarten,Circuit Bremgarten,1950-06-04,,,,,,,http://en.wikipedia.org/wiki/1950_Swiss_Grand_...
4,1950,5,Belgian Grand Prix,spa,Circuit de Spa-Francorchamps,1950-06-18,,,,,,,http://en.wikipedia.org/wiki/1950_Belgian_Gran...


In [28]:
circuits.head()

Unnamed: 0,circuitId,circuitName,lat,long,locality,country,url
0,adelaide,Adelaide Street Circuit,-34.9272,138.617,Adelaide,Australia,http://en.wikipedia.org/wiki/Adelaide_Street_C...
1,ain-diab,Ain Diab,33.5786,-7.6875,Casablanca,Morocco,http://en.wikipedia.org/wiki/Ain-Diab_Circuit
2,aintree,Aintree,53.4769,-2.94056,Liverpool,UK,http://en.wikipedia.org/wiki/Aintree_Motor_Rac...
3,albert_park,Albert Park Grand Prix Circuit,-37.8497,144.968,Melbourne,Australia,http://en.wikipedia.org/wiki/Melbourne_Grand_P...
4,americas,Circuit of the Americas,30.1328,-97.6411,Austin,USA,http://en.wikipedia.org/wiki/Circuit_of_the_Am...
