# Decision Modeling Mini Sprint
This notebook documents the Decision Modeling Mini Sprint from IBM Skills Build, a 3-week sprint in which we analyze Formula 1 race data and build a decision model. To run the notebook, first install the requirements by following the instructions in the [README.md](../README.md) file at the root of this repository.

## Problem Statement
We have been tasked with analyzing the data to predict which constructor will win a race, given data from previous years. We will build a decision model to predict the winner.
## Hypothesis
We hypothesize that the constructor with the fastest lap times on the circuit provided will win.
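
One way we could check this hypothesis is to compare, race by race, the constructor that set the fastest lap with the constructor of the race winner. The sketch below uses toy data rather than the real CSVs, and it assumes a `milliseconds` column in the lap times and a `constructorId` column in the results; neither column is confirmed elsewhere in this notebook.

```python
import pandas as pd

# Toy data only. Column names are assumptions about the real CSV layout.
lap_times = pd.DataFrame({
    "raceId": [1, 1, 1, 2, 2, 2],
    "driverId": [10, 11, 12, 10, 11, 12],
    "milliseconds": [90500, 90100, 91000, 88000, 88400, 87900],
})
results = pd.DataFrame({
    "raceId": [1, 1, 1, 2, 2, 2],
    "driverId": [10, 11, 12, 10, 11, 12],
    "constructorId": [1, 1, 2, 1, 1, 2],
    "positionOrder": [2, 1, 3, 1, 2, 3],
})

# Attach each lap to the constructor that ran it.
laps = lap_times.merge(
    results[["raceId", "driverId", "constructorId"]], on=["raceId", "driverId"]
)

# Constructor that set the single fastest lap in each race.
fastest = laps.loc[
    laps.groupby("raceId")["milliseconds"].idxmin(), ["raceId", "constructorId"]
].rename(columns={"constructorId": "fastestConstructor"})

# Constructor of the race winner (positionOrder == 1).
winners = results.loc[
    results["positionOrder"] == 1, ["raceId", "constructorId"]
].rename(columns={"constructorId": "winningConstructor"})

# Share of races in which the fastest-lap constructor also won.
comparison = fastest.merge(winners, on="raceId")
hit_rate = (comparison["fastestConstructor"] == comparison["winningConstructor"]).mean()
print(comparison)
print(f"hit rate: {hit_rate:.2f}")
```

A hit rate close to 1 on the real data would support the hypothesis; a value near the share of races won by the dominant constructor would suggest fastest laps add little information on their own.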

In [1]:
data_path = '../data/decision-modeling-sprint/'

In [None]:
zip_file = 'data.zip'
!unzip -o {data_path + zip_file} -d {data_path}

In [2]:
# Here we import all our libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Data Cleaning and Preprocessing

In [3]:
drivers = pd.read_csv(data_path + 'drivers.csv').replace('\\N', np.nan)
constructors = pd.read_csv(data_path + 'constructors.csv').replace('\\N', np.nan)
races = pd.read_csv(data_path + 'races.csv').replace('\\N', np.nan)
status = pd.read_csv(data_path + 'status.csv').replace('\\N', np.nan)
circuits = pd.read_csv(data_path + 'circuits.csv').replace('\\N', np.nan)

standings = pd.read_csv(data_path + 'driver_standings.csv').replace('\\N', np.nan)
constructor_standings = pd.read_csv(data_path + 'constructor_standings.csv').replace('\\N', np.nan)
qualifying = pd.read_csv(data_path + 'qualifying.csv').replace('\\N', np.nan)
results = pd.read_csv(data_path + 'results.csv').replace('\\N', np.nan)
sprint_results = pd.read_csv(data_path + 'sprint_results.csv').replace('\\N', np.nan)
constructor_results = pd.read_csv(data_path + 'constructor_results.csv').replace('\\N', np.nan)
lap_times = pd.read_csv(data_path + 'lap_times.csv').replace('\\N', np.nan)
pit_stops = pd.read_csv(data_path + 'pit_stops.csv').replace('\\N', np.nan)
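
The `\N` marker can also be handled while reading, through the `na_values` parameter of `pd.read_csv`. This is a minimal sketch with an in-memory CSV; `load_table` is a hypothetical helper name and the sample columns are made up for illustration.

```python
import io

import pandas as pd


def load_table(source):
    """Read one CSV, treating the literal \\N marker as a missing value."""
    return pd.read_csv(source, na_values=["\\N"])


# In-memory stand-in for one of the data files.
sample = io.StringIO("driverId,number,code\n1,44,HAM\n2,\\N,HEI\n")
demo = load_table(sample)
print(demo.dtypes)
print(demo)
```

With `na_values`, pandas can still infer a numeric dtype for a column that contains the marker, whereas replacing the marker after loading leaves such a column as `object`.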

In [4]:
constructors.info()
constructors.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 211 entries, 0 to 210
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   constructorId   211 non-null    int64 
 1   constructorRef  211 non-null    object
 2   name            211 non-null    object
 3   nationality     211 non-null    object
 4   url             211 non-null    object
dtypes: int64(1), object(4)
memory usage: 8.4+ KB


| | constructorId | constructorRef | name | nationality | url |
|---|---|---|---|---|---|
| 0 | 1 | mclaren | McLaren | British | http://en.wikipedia.org/wiki/McLaren |
| 1 | 2 | bmw_sauber | BMW Sauber | German | http://en.wikipedia.org/wiki/BMW_Sauber |
| 2 | 3 | williams | Williams | British | http://en.wikipedia.org/wiki/Williams_Grand_Pr... |
| 3 | 4 | renault | Renault | French | http://en.wikipedia.org/wiki/Renault_in_Formul... |
| 4 | 5 | toro_rosso | Toro Rosso | Italian | http://en.wikipedia.org/wiki/Scuderia_Toro_Rosso |


In [5]:
constructor_standings.info()
constructor_standings.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13051 entries, 0 to 13050
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   constructorStandingsId  13051 non-null  int64  
 1   raceId                  13051 non-null  int64  
 2   constructorId           13051 non-null  int64  
 3   points                  13051 non-null  float64
 4   position                13051 non-null  int64  
 5   positionText            13051 non-null  object 
 6   wins                    13051 non-null  int64  
dtypes: float64(1), int64(5), object(1)
memory usage: 713.9+ KB


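| | constructorStandingsId | raceId | constructorId | points | position | positionText | wins |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 18 | 1 | 14.0 | 1 | 1 | 1 |
| 1 | 2 | 18 | 2 | 8.0 | 3 | 3 | 0 |
| 2 | 3 | 18 | 3 | 9.0 | 2 | 2 | 0 |
| 3 | 4 | 18 | 4 | 5.0 | 4 | 4 | 0 |
| 4 | 5 | 18 | 5 | 2.0 | 5 | 5 | 0 |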


In [8]:
# the lookup frame must include the join key, otherwise the merge raises KeyError: 'constructorId'
constructor_standings = constructor_standings.merge(constructors[['constructorId', 'name']], on='constructorId', how='left')
constructor_standings.drop(columns=['constructorId'], inplace=True)
constructor_standings.head()
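
A merge on `constructorId` needs that column on both sides, so selecting only `constructors[['name']]` drops the key and raises `KeyError: 'constructorId'`. The toy frames below (not the real data) reproduce the failure and show a variant that keeps the key. The `validate='many_to_one'` argument is an optional extra check we add here, not something the original cell used.

```python
import pandas as pd

# Toy stand-ins for constructor_standings and constructors.
standings = pd.DataFrame(
    {"raceId": [18, 18], "constructorId": [1, 2], "points": [14.0, 8.0]}
)
lookup = pd.DataFrame({"constructorId": [1, 2], "name": ["McLaren", "BMW Sauber"]})

# Selecting only 'name' removes the join key from the right-hand frame.
try:
    standings.merge(lookup[["name"]], on="constructorId")
    key_error_raised = False
except KeyError:
    key_error_raised = True

# Keeping the key lets the merge succeed; validate guards against
# duplicate constructorId rows in the lookup table.
named = standings.merge(
    lookup[["constructorId", "name"]],
    on="constructorId",
    how="left",
    validate="many_to_one",
)
print(key_error_raised)
print(named)
```

Re-running a cell that drops `constructorId` in place will also fail on the second run, because the key is already gone from the left-hand frame.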

In [None]:
drivers.info()
drivers.head()

In [None]:
# convert date of birth to age
drivers['dob'] = pd.to_datetime(drivers['dob'])
drivers['age'] = (pd.to_datetime('today') - drivers['dob']).dt.days // 365
drivers.drop(columns=['dob', 'url'], inplace=True)
drivers.head()
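
Dividing a day count by 365 ignores leap days, and using `'today'` makes the result change from run to run. A possible alternative, sketched here on made-up dates with a fixed reference date, counts only the birthdays that have already passed. `age_on` is a hypothetical helper, and the sketch assumes every `dob` value parses to a valid date.

```python
import pandas as pd


def age_on(dob, reference):
    """Whole years between dob and reference, counting only completed birthdays."""
    had_birthday = (reference.month, reference.day) >= (dob.month, dob.day)
    return reference.year - dob.year - (0 if had_birthday else 1)


# Fixed reference date so that repeated runs give the same ages.
reference = pd.Timestamp("2024-01-01")
dobs = pd.to_datetime(pd.Series(["1985-01-07", "1981-07-29"]))
ages = dobs.apply(lambda dob: age_on(dob, reference))
print(ages.tolist())
```

For a race-level model, the race date would probably be a more meaningful reference than a single fixed date, since it gives each driver's age at the time of the race.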

In [None]:
standings.info()
standings.head()

In [None]:
# convert driver id to driver ref and drop driver id
standings = standings.merge(drivers[['driverId', 'driverRef']], on='driverId', how='left')
standings.drop(columns=['driverId', 'positionText'], inplace=True)
# sort by wins, then points (both descending)
standings = standings.sort_values(by=['wins', 'points'], ascending=False).reset_index(drop=True)
standings = standings.groupby(['raceId', 'driverRef']).first()
standings.head()

In [None]:
results.replace({'statusId': status.set_index('statusId')['status'].to_dict()}, inplace=True)
sprint_results.replace({'statusId': status.set_index('statusId')['status'].to_dict()}, inplace=True)
results.rename(columns={'statusId': 'status'}, inplace=True)
sprint_results.rename(columns={'statusId': 'status'}, inplace=True)
display(results.head(10), sprint_results.head(10))
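
The same lookup can be written with `Series.map`, which builds the `status` column directly instead of overwriting `statusId` and renaming it afterwards. This is a sketch on toy rows, not the real tables.

```python
import pandas as pd

# Toy stand-ins for the status and results tables.
status = pd.DataFrame(
    {"statusId": [1, 3, 4], "status": ["Finished", "Accident", "Collision"]}
)
results = pd.DataFrame({"resultId": [1, 2, 3], "statusId": [1, 4, 1]})

# A Series indexed by statusId acts as the lookup table for map().
status_lookup = status.set_index("statusId")["status"]
results["status"] = results["statusId"].map(status_lookup)
results = results.drop(columns="statusId")
print(results)
```

One difference to keep in mind: `map` yields `NaN` for an id that is missing from the lookup, while `replace` leaves the original id in place.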

In [None]:
# add each driver's finishing position to the pit stops
pit_stops = pit_stops.merge(results[['raceId', 'driverId', 'positionOrder']], on=['raceId', 'driverId'])
pit_stops = pit_stops.rename(columns={'positionOrder': 'driverPosition'})
# keep one row per driver per race (first non-null value in each column)
pit_stops = pit_stops.groupby(['raceId', 'driverId']).first().reset_index()
pit_stops.head()

In [None]:
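# add each driver's finishing position to the lap times
lap_times = lap_times.merge(results[['raceId', 'driverId', 'positionOrder']], on=['raceId', 'driverId'])
lap_times = lap_times.rename(columns={'positionOrder': 'driverPosition'})
# NOTE: this keeps a single row per race (first non-null value in each column), not one row per driver
lap_times = lap_times.groupby('raceId').first().reset_index()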
lap_times.head()
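
Because `.first()` keeps one row per group, it discards most of the lap data. If the goal is a per-driver summary for each race, a named aggregation may be a better fit. The sketch below uses toy laps and assumes `lap` and `milliseconds` columns; the output column names are our own.

```python
import pandas as pd

# Toy lap data. 'lap' and 'milliseconds' are assumed column names.
lap_times = pd.DataFrame({
    "raceId": [1, 1, 1, 1],
    "driverId": [10, 10, 11, 11],
    "lap": [1, 2, 1, 2],
    "milliseconds": [95000, 90500, 94000, 90100],
})

# One row per (race, driver) with the fastest lap, mean lap and lap count.
lap_summary = lap_times.groupby(["raceId", "driverId"], as_index=False).agg(
    fastestLapMs=("milliseconds", "min"),
    meanLapMs=("milliseconds", "mean"),
    laps=("lap", "count"),
)
print(lap_summary)
```

The resulting frame has one row per driver per race, so it can be merged with `results` on `['raceId', 'driverId']` without multiplying rows.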

# Data Analysis
* Fastest lap times
* Fastest pit
* Least wins?
* Standings
* Top performing racers or constructors?
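
As a starting point for the "top performing constructors" question, race wins per constructor can be counted from the results table. This is a sketch on toy rows, and it assumes `results` carries a `constructorId` column, which this notebook has not confirmed.

```python
import pandas as pd

# Toy data. 'constructorId' in results is an assumed column.
results = pd.DataFrame({
    "raceId": [1, 1, 2, 2, 3, 3],
    "constructorId": [1, 2, 1, 2, 1, 2],
    "positionOrder": [1, 2, 2, 1, 1, 2],
})
constructors = pd.DataFrame({"constructorId": [1, 2], "name": ["McLaren", "Williams"]})

# Keep the winning rows, attach constructor names, and count wins per name.
wins = (
    results[results["positionOrder"] == 1]
    .merge(constructors, on="constructorId")
    .groupby("name")
    .size()
    .sort_values(ascending=False)
)
print(wins)
```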