# Constructors Old and New

### Table of Contents
* [Introduction](#Intro)
* [Importing the Data](#Data)
* [Cleaning Up and Reformatting the Data](#Clean)
* [Constructor Results by Year](#create)
    * [Exporting Constructor Results](#create_1)
* [New Constructors Since 2000](#New)
    * [Exporting New Constructors](#New_1)
* [Conclusion](#Conc)

### Introduction <a class="anchor" id="Intro"></a>

In this notebook we will explore the data on Fastest Laps and Qualifying pole times for three tracks: Silverstone, Monza, and Monaco.

These tracks were chosen because they have seen almost continuous racing year on year since Formula 1's inception in 1950.

Questions we are looking to answer include:
- How has the pace of the cars evolved?
- Have they evolved at the same pace for each track?

### Importing the Data <a class="anchor" id="Data"></a>
First up, we will load in the required libraries and data.

In [53]:
#Import Libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [54]:
## Loading in the datasets

#Kaggle Data
circuits = pd.read_csv("kaggle_data//circuits.csv")
constructor_results = pd.read_csv("kaggle_data//constructor_results.csv")
constructor_standings = pd.read_csv("kaggle_data//constructor_standings.csv")
constructors = pd.read_csv("kaggle_data//constructors.csv")
driver_standings = pd.read_csv("kaggle_data//driver_standings.csv")
drivers = pd.read_csv("kaggle_data//drivers.csv")
lap_times = pd.read_csv("kaggle_data//lap_times.csv")
qualifying = pd.read_csv("kaggle_data//qualifying.csv")
results = pd.read_csv("kaggle_data//results.csv")
races = pd.read_csv("kaggle_data//races.csv")

#gp racing stats Data
silverstone_poles_data = pd.read_csv("gpracingstats_data//silverstone_poles_data.csv")
monza_poles_data = pd.read_csv("gpracingstats_data//monza_poles_data.csv")
monaco_poles_data = pd.read_csv("gpracingstats_data//monaco_poles_data.csv")
monza_fastest_lap_data = pd.read_csv("gpracingstats_data//monza_fastest_lap_data.csv")
silverstone_fastest_lap_data = pd.read_csv("gpracingstats_data//silverstone_fastest_lap_data.csv")
monaco_fastest_lap_data = pd.read_csv("gpracingstats_data//monaco_fastest_lap_data.csv")

## Cleaning Up and Reformatting the Constructors Data <a class="anchor" id="Clean"></a>

We have 3 tables to consider on the constructors front:
- constructors: Which lists the constructors
- constructor_results: Which holds the results for each constructor for each race
- constructor_standings: Which has the standings in the constructors table after each race

Let's first take a look at the heads of the dataframes.

In [55]:
constructor_results.head()

Unnamed: 0,constructorResultsId,raceId,constructorId,points,status
0,1,18,1,14.0,
1,2,18,2,8.0,
2,3,18,3,9.0,
3,4,18,4,5.0,
4,5,18,5,2.0,


In [56]:
constructor_standings.head()

Unnamed: 0,constructorStandingsId,raceId,constructorId,points,position,positionText,wins
0,1,18,1,14.0,1,1,1
1,2,18,2,8.0,3,3,0
2,3,18,3,9.0,2,2,0
3,4,18,4,5.0,4,4,0
4,5,18,5,2.0,5,5,0


In [57]:
constructors.head()

Unnamed: 0,constructorId,constructorRef,name,nationality,url
0,1,mclaren,McLaren,British,http://en.wikipedia.org/wiki/McLaren
1,2,bmw_sauber,BMW Sauber,German,http://en.wikipedia.org/wiki/BMW_Sauber
2,3,williams,Williams,British,http://en.wikipedia.org/wiki/Williams_Grand_Pr...
3,4,renault,Renault,French,http://en.wikipedia.org/wiki/Renault_in_Formul...
4,5,toro_rosso,Toro Rosso,Italian,http://en.wikipedia.org/wiki/Scuderia_Toro_Rosso


What we want to do first is merge the constructor names into the results and standings, since the names are absent from those tables.

In [58]:
#Rename name in constructors to constructor_name to remove ambiguity
constructors = constructors.rename(columns = {"name": "constructor_name"})

#Adding constructor name and reference to constructor results and standings
constructor_results = constructor_results.merge(constructors[["constructorId", "constructorRef", "constructor_name"]], on = "constructorId", how="left")
constructor_standings = constructor_standings.merge(constructors[["constructorId", "constructorRef", "constructor_name"]], on = "constructorId", how="left")

#Remove unneeded columns
constructor_results.drop(columns = ["status"], inplace=True)
constructor_standings.drop(columns = ["positionText"], inplace=True)
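
A left merge keeps every row of the left table, so it is worth checking that the row count is unchanged and that every row found a matching constructor. The sketch below is a minimal illustration on made-up frames (the `toy_` names and their values are assumptions, not taken from the Kaggle data). It uses the `validate` and `indicator` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are illustrative only)
toy_results = pd.DataFrame(
    {"raceId": [18, 18, 18], "constructorId": [1, 2, 99], "points": [14.0, 8.0, 0.0]}
)
toy_constructors = pd.DataFrame(
    {"constructorId": [1, 2], "constructor_name": ["McLaren", "BMW Sauber"]}
)

# validate="many_to_one" raises if constructorId is duplicated in the right table;
# indicator=True adds a "_merge" column saying where each row came from
merged = toy_results.merge(
    toy_constructors,
    on="constructorId",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Rows with no matching constructor are flagged "left_only" and have a missing name
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged) == len(toy_results))
print(unmatched["constructorId"].tolist())
```

If `unmatched` is empty on the real tables, every result row has picked up a constructor name.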

In [59]:
constructor_results.head()

Unnamed: 0,constructorResultsId,raceId,constructorId,points,constructorRef,constructor_name
0,1,18,1,14.0,mclaren,McLaren
1,2,18,2,8.0,bmw_sauber,BMW Sauber
2,3,18,3,9.0,williams,Williams
3,4,18,4,5.0,renault,Renault
4,5,18,5,2.0,toro_rosso,Toro Rosso


In [60]:
constructor_standings.head()

Unnamed: 0,constructorStandingsId,raceId,constructorId,points,position,wins,constructorRef,constructor_name
0,1,18,1,14.0,1,1,mclaren,McLaren
1,2,18,2,8.0,3,0,bmw_sauber,BMW Sauber
2,3,18,3,9.0,2,0,williams,Williams
3,4,18,4,5.0,4,0,renault,Renault
4,5,18,5,2.0,5,0,toro_rosso,Toro Rosso


Next we want to add in the race information to constructor results and standings.

In [61]:
#Rename column name to track name in races
races = races.rename(columns = {"name": "track_name"})

#Combining with the columns from the races table we want
constructor_results = constructor_results.merge(races[["raceId", "year", "circuitId", "track_name", "date"]], on = "raceId", how = "left")
constructor_standings = constructor_standings.merge(races[["raceId", "year", "circuitId", "track_name", "date"]], on = "raceId", how = "left")

Next we check the data types for constructor results and standings. The dates are stored as objects, so we convert them to datetime in both tables.

In [62]:
constructor_results.dtypes

constructorResultsId      int64
raceId                    int64
constructorId             int64
points                  float64
constructorRef           object
constructor_name         object
year                      int64
circuitId                 int64
track_name               object
date                     object
dtype: object

In [63]:
constructor_standings.dtypes

constructorStandingsId      int64
raceId                      int64
constructorId               int64
points                    float64
position                    int64
wins                        int64
constructorRef             object
constructor_name           object
year                        int64
circuitId                   int64
track_name                 object
date                       object
dtype: object

In [64]:
#Convert the date columns from strings to datetime
constructor_results["date"] = pd.to_datetime(constructor_results["date"])
constructor_standings["date"] = pd.to_datetime(constructor_standings["date"])
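
If some date strings might be malformed, the conversion can be made defensive. The sketch below is a minimal example on a made-up column, assuming the dates follow the `YYYY-MM-DD` pattern seen in the table outputs. `pd.to_datetime` with `errors="coerce"` turns unparseable values into `NaT` so they can be counted instead of raising an error.

```python
import pandas as pd

# Toy column with one deliberately bad value (illustrative only)
toy = pd.DataFrame({"date": ["1958-01-19", "1958-05-18", "not a date"]})

# Parse with an explicit format; values that do not match become NaT
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d", errors="coerce")

# Count the rows that failed to parse
n_bad = toy["date"].isna().sum()
print(toy.dtypes)
print(n_bad)
```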

Following this, we order the tables by date, drop unwanted columns and reorder the columns into a more sensible order.

In [65]:
#Order by date
constructor_results.sort_values(by=["date"],inplace=True)
constructor_results=constructor_results.reset_index()

#Drop unwanted columns
constructor_results = constructor_results.drop(columns = ["index", "constructorResultsId" ])

#Rearrange columns
constructor_results = constructor_results[["date", "track_name", "circuitId" ,"year", "constructorId","constructorRef", "constructor_name", "raceId", "points"]]

constructor_results.head()

Unnamed: 0,date,track_name,circuitId,year,constructorId,constructorRef,constructor_name,raceId,points
0,1956-01-22,Argentine Grand Prix,25,1956,105,maserati,Maserati,784,13.0
1,1956-01-22,Argentine Grand Prix,25,1956,6,ferrari,Ferrari,784,12.0
2,1956-08-05,German Grand Prix,20,1956,128,gordini,Gordini,790,0.0
3,1956-08-05,German Grand Prix,20,1956,105,maserati,Maserati,790,15.0
4,1956-08-05,German Grand Prix,20,1956,6,ferrari,Ferrari,790,9.0


In [66]:
#Same process again
constructor_standings.sort_values(by=["date"],inplace=True)
constructor_standings=constructor_standings.reset_index()
constructor_standings = constructor_standings.rename(columns = {"points": "total_points"})

constructor_standings = constructor_standings[["date", "track_name", "circuitId" ,"year", "constructorId","constructorRef", "constructor_name", "raceId", "total_points", "position", "wins"]]

constructor_standings.head()

Unnamed: 0,date,track_name,circuitId,year,constructorId,constructorRef,constructor_name,raceId,total_points,position,wins
0,1958-01-19,Argentine Grand Prix,25,1958,105,maserati,Maserati,765,3.0,3,0
1,1958-01-19,Argentine Grand Prix,25,1958,6,ferrari,Ferrari,765,6.0,2,0
2,1958-01-19,Argentine Grand Prix,25,1958,87,cooper,Cooper,765,8.0,1,1
3,1958-05-18,Monaco Grand Prix,6,1958,87,cooper,Cooper,766,16.0,1,2
4,1958-05-18,Monaco Grand Prix,6,1958,32,team_lotus,Team Lotus,766,0.0,5,0


Although Formula 1 started in 1950, the constructors championship has only run since 1958. From here on, all the data refers to 1958 onwards.

## Creating Constructor Results By Year Table <a class="anchor" id="create"></a>
We want to make a table showing each team's result at the end of each year. To do this, we find the last race of each year. The standings after that race give each team's final position and total points scored.

We will do this by cycling through the years one by one, starting with 1958.

In [67]:
#Finding the row index of the last race in 1958
constructor_standings[constructor_standings.year == 1958].date.idxmax()

83

In [68]:
#Finding the race id of this last race by looking it up in the data frame
constructor_standings.loc[83, "raceId"]

775

In [69]:
#We know the last race in 1958 has a race id of 775, so we filter down to just this race, drop unwanted columns and sort
standings_1958 = constructor_standings[constructor_standings.raceId == 775]
standings_1958 = standings_1958[["year","constructorId", "constructorRef" ,"constructor_name", "total_points","position", "wins"]].reset_index(drop=True)
standings_1958.sort_values(by = ["position"], inplace= True)
standings_1958 = standings_1958.reset_index(drop=True)

In [70]:
#Lets take a look at the standings for 1958
standings_1958

Unnamed: 0,year,constructorId,constructorRef,constructor_name,total_points,position,wins
0,1958,118,vanwall,Vanwall,48.0,1,6
1,1958,6,ferrari,Ferrari,40.0,2,2
2,1958,87,cooper,Cooper,31.0,3,2
3,1958,66,brm,BRM,18.0,4,0
4,1958,105,maserati,Maserati,6.0,5,0
5,1958,32,team_lotus,Team Lotus,3.0,6,0
6,1958,95,porsche,Porsche,0.0,7,0
7,1958,125,connaught,Connaught,0.0,8,0
8,1958,127,osca,OSCA,0.0,9,0


Now we cycle through every year up to 2022, performing the same process and appending each year's standings to the bottom of the table.

In [71]:
#Results for every year up to 2022, the last full year
constructors_results_by_year = standings_1958
for i in range(1959,2023):
    #Grab the race id of the last race in the season
    last_race_index = constructor_standings[constructor_standings.year == i].date.idxmax()
    raceId = constructor_standings.loc[last_race_index, "raceId"]
    ith_year_df = constructor_standings[constructor_standings.raceId == raceId]
    ith_year_df = ith_year_df[["year","constructorId", "constructorRef" ,"constructor_name", "total_points","position", "wins"]].reset_index(drop=True)
    ith_year_df.sort_values(by = ["position"], inplace= True)
    constructors_results_by_year = pd.concat([constructors_results_by_year,ith_year_df])

constructors_results_by_year

Unnamed: 0,year,constructorId,constructorRef,constructor_name,total_points,position,wins
0,1958,118,vanwall,Vanwall,48.0,1,6
1,1958,6,ferrari,Ferrari,40.0,2,2
2,1958,87,cooper,Cooper,31.0,3,2
3,1958,66,brm,BRM,18.0,4,0
4,1958,105,maserati,Maserati,6.0,5,0
...,...,...,...,...,...,...,...
6,2022,51,alfa,Alfa Romeo,55.0,6,0
0,2022,117,aston_martin,Aston Martin,55.0,7,0
7,2022,210,haas,Haas F1 Team,37.0,8,0
4,2022,213,alphatauri,AlphaTauri,35.0,9,0
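
The same end-of-season selection can also be written without a loop. The sketch below is one possible alternative, not the method used in this notebook. It runs on a made-up standings frame (team names, race ids and dates are illustrative) and assumes that no two races in a year share the same date. It uses `groupby(...).transform("max")` to attach each year's latest race date to every row, then keeps only the rows from that race.

```python
import pandas as pd

# Toy standings: two races in 2001, one in 2002 (values are illustrative only)
toy_standings = pd.DataFrame({
    "year": [2001, 2001, 2001, 2001, 2002, 2002],
    "date": pd.to_datetime([
        "2001-03-04", "2001-03-04", "2001-10-14", "2001-10-14",
        "2002-03-03", "2002-03-03",
    ]),
    "raceId": [1, 1, 2, 2, 3, 3],
    "constructor_name": ["Team A", "Team B", "Team A", "Team B", "Team A", "Team B"],
    "position": [1, 2, 2, 1, 1, 2],
})

# Latest race date within each year, broadcast back to every row
last_date = toy_standings.groupby("year")["date"].transform("max")

# Keep only rows from each year's final race, ordered by year then position
final_standings = (
    toy_standings[toy_standings["date"] == last_date]
    .sort_values(["year", "position"])
    .reset_index(drop=True)
)
print(final_standings)
```

Resetting the index at the end gives a clean 0..n-1 index rather than repeated per-year indices.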


In [73]:
#Checking the data types
constructors_results_by_year.dtypes

year                  int64
constructorId         int64
constructorRef       object
constructor_name     object
total_points        float64
position              int64
wins                  int64
dtype: object

### Exporting Constructor Results <a class="anchor" id="create_1"></a>
First we simply export the results table as is to a csv file.

In [75]:
constructors_results_by_year.to_csv("constructors_results_by_year.csv", index = False)

It would also be handy to have a table of just the Constructors' winners of each year.

In [76]:
#Creating winners table by selecting only the winners, then dropping unwanted columns
constructors_winners = (
    constructors_results_by_year[constructors_results_by_year.position == 1]
    .drop(columns = ["position", "wins", "total_points"])
    .reset_index(drop=True)
)

constructors_winners.head()

Export the winners table.

In [78]:
#Export csv
constructors_winners.to_csv("constructors_winners.csv", index = False)


## New Constructors Since 2000 <a class="anchor" id="New"></a>

We have picked out 9 constructors who have entered the sport since 2000. We will form new tables just containing these teams.

- Red Bull
- Haas
- Force India/Racing Point/Aston Martin*
- Mercedes
- Caterham Racing
- Virgin/Marussia/Manor*
- Toyota
- Super Aguri
- HRT

We have decided to group Force India, Racing Point, Aston Martin under the name "Aston Grouped", and similarly, Virgin/Marussia/Manor under "Manor Grouped".

First we will group the Aston Martin bunch.

In [90]:
#Pull out force india data from our constructor results by year table. then assign new constructor reference and name
force_india = constructors_results_by_year[constructors_results_by_year.constructorRef == "force_india"]
force_india = force_india.assign(constructor_name = "Aston Grouped")
force_india = force_india.assign(constructorRef = "aston_grouped")

force_india.head()

Unnamed: 0,year,constructorId,constructorRef,constructor_name,total_points,position,wins
3,2008,10,aston_grouped,Aston Grouped,0.0,10,0
9,2009,10,aston_grouped,Aston Grouped,13.0,9,0
9,2010,10,aston_grouped,Aston Grouped,68.0,7,0
2,2011,10,aston_grouped,Aston Grouped,69.0,6,0
4,2012,10,aston_grouped,Aston Grouped,109.0,7,0


In [91]:
#We repeat the same process for racing point and aston_martin
racing_point = constructors_results_by_year[constructors_results_by_year.constructorRef == "racing_point"]
racing_point = racing_point.assign(constructor_name = "Aston Grouped")
racing_point = racing_point.assign(constructorRef = "aston_grouped")

aston_martin = constructors_results_by_year[(constructors_results_by_year.constructorRef == "aston_martin") & (constructors_results_by_year.year >= 2000)]
aston_martin = aston_martin.assign(constructor_name = "Aston Grouped")
aston_martin = aston_martin.assign(constructorRef = "aston_grouped")

Now we can concatenate the dataframes into one for the Aston Martin group. We call this new_constructors since it will be the basis for all of our data on our new constructors.

In [95]:
#Concat the frames and create new constructors dataframe
new_constructors = pd.concat([force_india,racing_point,aston_martin ])

From here we simply repeat exactly the same process for the Manor grouping.

In [85]:
#Same Process for Virgin/Marussia/Manor
virgin = constructors_results_by_year[constructors_results_by_year.constructorRef == "virgin"]
virgin = virgin.assign(constructor_name = "Manor Grouped")
virgin = virgin.assign(constructorRef = "manor_grouped")

marussia = constructors_results_by_year[constructors_results_by_year.constructorRef == "marussia"]
marussia = marussia.assign(constructor_name = "Manor Grouped")
marussia = marussia.assign(constructorRef = "manor_grouped")

manor = constructors_results_by_year[constructors_results_by_year.constructorRef == "manor"]
manor = manor.assign(constructor_name = "Manor Grouped")
manor = manor.assign(constructorRef = "manor_grouped")


#Then we add this data onto our new constructors frame
new_constructors = pd.concat([new_constructors, virgin,marussia,manor ])
new_constructors

Unnamed: 0,year,constructorId,constructorRef,constructor_name,total_points,position,wins
3,2008,10,aston_grouped,Aston Grouped,0.0,10,0
9,2009,10,aston_grouped,Aston Grouped,13.0,9,0
9,2010,10,aston_grouped,Aston Grouped,68.0,7,0
2,2011,10,aston_grouped,Aston Grouped,69.0,6,0
4,2012,10,aston_grouped,Aston Grouped,109.0,7,0
5,2013,10,aston_grouped,Aston Grouped,77.0,6,0
6,2014,10,aston_grouped,Aston Grouped,155.0,6,0
8,2015,10,aston_grouped,Aston Grouped,136.0,5,0
5,2016,10,aston_grouped,Aston Grouped,173.0,4,0
4,2017,10,aston_grouped,Aston Grouped,187.0,4,0
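
The per-team filter-and-assign steps can also be expressed with a lookup table. The sketch below is a hedged alternative on a made-up frame. The `ref_to_group` and `group_names` dictionaries are hypothetical helpers introduced here, and the sketch does not apply the `year >= 2000` filter used for `aston_martin` above. It maps each `constructorRef` to its group with `Series.replace` and then fills in the group display name.

```python
import pandas as pd

# Toy yearly results (illustrative rows only)
toy = pd.DataFrame({
    "year": [2017, 2019, 2021, 2021],
    "constructorRef": ["force_india", "racing_point", "aston_martin", "red_bull"],
    "constructor_name": ["Force India", "Racing Point", "Aston Martin", "Red Bull"],
})

# Hypothetical lookup tables: original reference -> group reference -> display name
ref_to_group = {
    "force_india": "aston_grouped",
    "racing_point": "aston_grouped",
    "aston_martin": "aston_grouped",
    "virgin": "manor_grouped",
    "marussia": "manor_grouped",
    "manor": "manor_grouped",
}
group_names = {"aston_grouped": "Aston Grouped", "manor_grouped": "Manor Grouped"}

grouped = toy.copy()
# References not in the mapping (e.g. red_bull) are left unchanged
grouped["constructorRef"] = grouped["constructorRef"].replace(ref_to_group)
# Use the group name where one exists, otherwise keep the original name
grouped["constructor_name"] = (
    grouped["constructorRef"].map(group_names).fillna(grouped["constructor_name"])
)
print(grouped)
```

Keeping the grouping in a dictionary makes it easy to add another rebranded team later without repeating the filter code.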


For the remaining 7 constructors we'd like to study, we can take a simpler approach and pull the data for these teams in one step.

In [86]:
#Creating a new frame called other teams with the data from our 7 remaining constructors
other_teams = constructors_results_by_year[
    constructors_results_by_year.constructorRef.isin(
        ["red_bull", "haas", "mercedes", "hrt", "caterham", "toyota", "super_aguri"]
    )
]


In [96]:
#All we need to do from here is add our other teams data into the new constructors frame and sort the lot by year
new_constructors = pd.concat([new_constructors,other_teams  ]).reset_index(drop=True)
new_constructors = new_constructors.sort_values("year").reset_index(drop=True)
new_constructors.head()

Unnamed: 0,year,constructorId,constructorRef,constructor_name,total_points,position,wins
0,2002,7,toyota,Toyota,2.0,10,0
1,2003,7,toyota,Toyota,16.0,8,0
2,2004,7,toyota,Toyota,9.0,8,0
3,2005,9,red_bull,Red Bull,34.0,7,0
4,2005,7,toyota,Toyota,88.0,4,0


### Exporting New Constructors <a class="anchor" id="New_1"></a>
Finally, we export the new constructors data.

In [88]:
#Export new_constructors data
new_constructors.to_csv("new_constructors.csv", index = False)

## Conclusion <a class="anchor" id="Conc"></a>

In conclusion, one clear highlight of the data is apparent: Formula 1 saw a sizable increase in speeds and performance in its formative years, from 1950 to the 1980s. Since then, safety changes to the cars and circuits have surely played a large part in keeping speeds within a tight margin.

If cars continued to get ever quicker, the risk to drivers and spectators would only increase. Technical changes allow these speeds to be kept at a level deemed reasonably safe.