# 5. F1 Prediction Project - Feature Engineering

# Table of Contents
- [Introduction to Feature Engineering](#introduction-to-feature-engineering)
- [Data Preprocessing](#data-preprocessing)
  - [Importing Libraries and Notebook Setup](#importing-libraries-and-notebook-setup)
- [Changes to the dataset for Feature Engineering](#changes-to-the-dataset-for-feature-engineering)
- [One-Hot Encoding](#one-hot-encoding)
- [Null Values](#null-values)
- [Drop columns](#drop-columns)
- [Conclusion](#conclusion)


## Introduction to Feature Engineering

In this part of the project, we turn to feature engineering to see whether it can improve the performance of our model. For example, we will decompose the 'Status' of race outcomes (that is, how the previous race ended for each driver/row of data) into categories such as 'Mechanical Issues' and 'Driver Issues'. This should give the model a clearer picture of team reliability and driver consistency. We will also incorporate 'Weather Conditions', given their impact on strategy and performance. This data was compiled from both a dedicated Formula 1 dataset and detailed race reports.

Moreover, 'Circuit Characteristics' will be added to account for the different challenges posed by different tracks, whether they favor high-speed performance or the technical precision suited to street circuits. We have also collated data on 'Recent Form' to capture the momentum of drivers and teams, on the premise that past performance can indicate future results.
The pit lane has not been overlooked either: 'Team Strategy and Pit Crew Performance' data will be reintegrated to reflect its influence on race outcomes.

Through these feature engineering steps, we aim to give the model a richer view of each race than the basic results alone provide, and with it a better chance of identifying what makes a race winner.
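The status decomposition described above can be sketched as follows. This is a minimal illustration, not the notebook's actual method: the function `decompose_status` and the two groupings `MECHANICAL` and `DRIVER` are hypothetical, and the status strings are only examples of the kind of values found in Ergast-style results data.

```python
import pandas as pd

# Hypothetical groupings; the category lists actually used are not shown here.
MECHANICAL = {"Engine", "Gearbox", "Hydraulics", "Brakes", "Suspension"}
DRIVER = {"Accident", "Collision", "Spun off"}


def decompose_status(status: pd.Series) -> pd.DataFrame:
    """Turn a free-text status column into three 0/1 indicator columns."""
    status = status.str.strip()
    return pd.DataFrame(
        {
            "Mechanical": status.isin(MECHANICAL).astype(int),
            "Driver_Issue": status.isin(DRIVER).astype(int),
            # Lapped finishes are assumed to look like "+1 Lap" or "+2 Laps".
            "Lapped": status.str.match(r"^\+\d+ Laps?$").astype(int),
        },
        index=status.index,
    )


example = pd.Series(["Finished", "Engine", "Collision", "+1 Lap", "+2 Laps"])
flags = decompose_status(example)
print(flags)
```

Keeping the groupings in plain sets makes it easy to review which statuses count as mechanical and which as driver-related.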

## Data Preprocessing

### Importing Libraries and Notebook Setup

In [2]:
# Import libraries
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

In [3]:
process_df = pd.read_csv('C:/Users/Alex/OneDrive/BrainStation/Data_Science_Bootcamp/Capstone_Project/capstone-Aboard89/data/f1_data_capstone_v2.csv')

## Changes to the dataset for Feature Engineering

It's important to note that this data looks different from "model_data.csv", which was used for our first modelling notebook. Many of these changes were made in Excel, using data gathered from several different sources, so I'll talk you through them.

The additional columns in the "f1_data_capstone_v2" dataset, compared to "model_data," have been introduced to enhance the robustness and accuracy of our F1 race winner prediction model. Here's why each one is important:

- **Reindex**: This is a technical column likely used for data management, ensuring alignment and consistency in the dataset when pulling data from multiple Excel sheets.
- **Avg_time_in_pits**: Reflects the average time a driver spends in the pits, which can significantly influence race outcomes due to strategic pit stops. This data was scraped from F1.com; for an example of a Pit Stop Summary page, see https://www.formula1.com/en/results.html/2024/races/1229/bahrain/pit-stop-summary.html
- **Total_time_in_pits**: Sum of all time spent in the pits, which, like average time, can influence the race outcome. This data was scraped from F1.com; for an example of a Pit Stop Summary page, see https://www.formula1.com/en/results.html/2024/races/1229/bahrain/pit-stop-summary.html
- **Number_Of_Stops**: The number of pit stops can impact race strategy and thus the final position. This data was scraped from F1.com; for an example of a Pit Stop Summary page, see https://www.formula1.com/en/results.html/2024/races/1229/bahrain/pit-stop-summary.html
- **Circuit_Type**: Provides context on the type of circuit, whether it's a street, road, or oval track, as certain drivers or teams may perform better on specific track types. This data came from a mixture of Wikipedia, my own knowledge of the tracks, and Gemini/Bing Copilot/ChatGPT (to verify the results).
- **Driver_Issue**: Captures any problems specific to the driver, such as penalties or errors, which could affect their chances of winning. This data was taken from the original data pulls from the Ergast F1 API (via Kaggle - https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2023).
- **Grand_Prix**: The name of the Grand Prix is crucial because historical data can show trends and patterns for particular races. This data was taken from the original data pulls from the Ergast F1 API (via Kaggle - https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2023).
- **Lapped**: Indicates whether a driver was lapped during a race, as being lapped typically correlates with lower chances of winning. This data was taken from the original data pulls from the Ergast F1 API (via Kaggle - https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2023).
- **Mechanical**: Details about mechanical failures inform us of the reliability issues that could preclude a driver from finishing, let alone winning, a race. This data was taken from the original data pulls from the Ergast F1 API (via Kaggle - https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2023).
- **Weather_Conditions**: Weather has a profound impact on race dynamics, affecting tire choice and driving conditions. This data was taken from the original data pulls from the Ergast F1 API (via Kaggle - https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2023).
- **age**: The driver's age could influence performance and experience, affecting race results. This data was taken from the original data pulls from the Ergast F1 API (via Kaggle - https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2023).
- **races_with_each_team_since_1995**: Indicates experience with the current team, which could relate to driver and team synergy. This data was created in Excel. The count starts at 1 for a driver's first race with a team and increases by 1 for every further race with that team, until the driver moves to another team or is no longer in F1.
- **years_in_f1**: Reflects the driver's experience level in F1, which can be a factor in their ability to win races. This data was gathered from Wikipedia (e.g. their F1 debut year). The value is then simply the year column (which indicates the year of the race) minus the year the driver started in F1. For some drivers, manual adjustments had to be made (for example, Kevin Magnussen lost his place at Haas at the end of the 2020 season but returned in the 2022 season).
- **driverId**: A unique identifier for each driver. This data was taken from the original data pulls from the Ergast F1 API (via Kaggle - https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2023).

By including these columns, we aim to capture a comprehensive set of factors that influence a race's outcome, providing a nuanced view that goes beyond basic statistics to model the complexities of F1 racing.
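Several of the columns above were built by hand in Excel. As a rough sketch of how the same logic could be expressed in pandas, the example below uses toy data. The frames `races` and `pit_stops`, the `seconds` column, and the debut years are all made up for illustration, and the per-team counter assumes a driver's count simply continues if they later return to the same team, which may not match the Excel version.

```python
import pandas as pd

# Toy data: one row per driver per race, already in chronological order.
races = pd.DataFrame(
    {
        "driverId": [1, 1, 1, 2, 2],
        "constructorId": [10, 10, 20, 10, 10],
        "year": [1995, 1995, 1996, 1995, 1996],
    }
)
debut_year = pd.Series({1: 1991, 2: 1995})  # hypothetical debut years

# Running count of races per (driver, team) pair, starting at 1.
races["races_with_each_team_since_1995"] = (
    races.groupby(["driverId", "constructorId"]).cumcount() + 1
)
# Race year minus debut year.
races["years_in_f1"] = races["year"] - races["driverId"].map(debut_year)

# Toy pit-stop log: one row per stop, duration in seconds.
pit_stops = pd.DataFrame(
    {
        "raceId": [1, 1, 1, 1, 1],
        "driverId": [1, 1, 1, 2, 2],
        "seconds": [30.0, 29.5, 29.458, 28.0, 28.85],
    }
)
pit_features = (
    pit_stops.groupby(["raceId", "driverId"])["seconds"]
    .agg(Number_Of_Stops="count", Total_time_in_pits="sum", Avg_time_in_pits="mean")
    .reset_index()
)
print(races)
print(pit_features)
```

Drivers with gaps in their career, such as the Magnussen case mentioned above, would still need a manual correction to `years_in_f1`.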

In [4]:
process_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10921 entries, 0 to 10920
Columns: 137 entries, Reindex to Avg_time_in_pits
dtypes: float64(7), int64(127), object(3)
memory usage: 11.4+ MB


In [5]:
process_df.set_index('Reindex', inplace=True)
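Because `Reindex` becomes the row index, it can be worth confirming that its values are unique first. The sketch below shows one way to do that on a toy frame; `toy_df` is a stand-in for `process_df`, and whether the real column is unique is not something this example can confirm.

```python
import pandas as pd

# Toy stand-in for process_df with a few 'Reindex' values.
toy_df = pd.DataFrame(
    {"Reindex": [2199517, 219955, 219951], "year": [1995, 1995, 1995]}
)

# Flag every row whose 'Reindex' value appears more than once.
duplicated_rows = toy_df[toy_df["Reindex"].duplicated(keep=False)]
print(f"Rows with a repeated Reindex: {len(duplicated_rows)}")

# verify_integrity=True makes set_index raise if duplicates slip through.
toy_df = toy_df.set_index("Reindex", verify_integrity=True)
```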

In [6]:
pd.set_option('display.max_columns', None)
process_df.head()

Reindex,race_index,Grand_Prix,year,age,years_in_f1,races_with_each_team_since_1995,F2_champion,Former_F1_World_Champion,home_race,constructorId,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,race_70th Anniversary Grand Prix,race_Abu Dhabi Grand Prix,race_Argentine Grand Prix,race_Australian Grand Prix,race_Austrian Grand Prix,race_Azerbaijan Grand Prix,race_Bahrain Grand Prix,race_Belgian Grand Prix,race_Brazilian Grand Prix,race_British Grand Prix,race_Canadian Grand Prix,race_Chinese Grand Prix,race_Dutch Grand Prix,race_Eifel Grand Prix,race_Emilia Romagna Grand Prix,race_European Grand Prix,race_French Grand Prix,race_German Grand Prix,race_Hungarian Grand Prix,race_Indian Grand Prix,race_Italian Grand Prix,race_Japanese Grand Prix,race_Korean Grand Prix,race_Luxembourg Grand Prix,race_Malaysian Grand Prix,race_Mexican Grand Prix,race_Mexico City Grand Prix,race_Miami Grand Prix,race_Monaco Grand Prix,race_Pacific Grand Prix,race_Portuguese Grand Prix,race_Qatar Grand Prix,race_Russian Grand Prix,race_Sakhir Grand Prix,race_San Marino Grand Prix,race_Saudi Arabian Grand Prix,race_Singapore Grand Prix,race_Spanish Grand Prix,race_Styrian Grand Prix,race_SÃ£o Paulo Grand Prix,race_Turkish Grand Prix,race_Tuscan Grand Prix,race_United States Grand Prix,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Weather_Conditions,Circuit_Type,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits
2199517,2,Argentine Grand Prix,1995,29,4,1,0,0,0,1,17,1.0,70,0,4.0,1.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,1,0,1.0,,,
219955,2,Argentine Grand Prix,1995,26,4,1,0,0,0,1,5,3.0,70,0,4.0,3.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,0,1,1.0,,,
219951,2,Argentine Grand Prix,1995,24,1,1,0,0,0,3,1,6.0,71,0,16.0,6.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,1,0,1.0,,,
219952,2,Argentine Grand Prix,1995,34,3,1,0,1,0,3,2,0.0,30,1,16.0,10.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,0,0,0.0,3.0,88.958,29.652667
219956,2,Argentine Grand Prix,1995,30,6,1,0,0,0,6,6,2.0,70,0,13.0,8.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,0,0,0.0,2.0,56.85,28.425


## One-Hot Encoding

In [7]:
# One-hot encode "Weather_Conditions" column
process_df = pd.get_dummies(process_df, columns=['Weather_Conditions'], drop_first=True)


In [8]:
# One-hot encode "Circuit_Type" column
process_df = pd.get_dummies(process_df, columns=['Circuit_Type'], drop_first=True)

In [9]:
# Cast 'constructorId' to string (object dtype) so it is treated as a category label rather than a number
process_df['constructorId'] = process_df['constructorId'].astype(str)

# One-hot encode "constructorId" column
process_df = pd.get_dummies(process_df, columns=['constructorId'], drop_first=True)
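To make the effect of `drop_first=True` concrete, here is a small sketch on toy data. The label "Cloudy" is hypothetical and is used only so that one category sorts before "Dry"; the real `Weather_Conditions` categories may differ. The `dtype=int` argument is an optional alternative that yields 0/1 columns directly instead of True/False.

```python
import pandas as pd

# Toy column; "Cloudy" is a made-up label that sorts first alphabetically.
toy = pd.DataFrame({"Weather_Conditions": ["Dry", "Rain", "Cloudy", "Dry"]})

encoded = pd.get_dummies(
    toy,
    columns=["Weather_Conditions"],
    drop_first=True,  # drops the first category in sorted order ("Cloudy")
    dtype=int,        # emit 0/1 integers rather than booleans
)
print(encoded)
```

The dropped category acts as the baseline: a row with zeros in every remaining dummy column belongs to it.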

In [10]:
pd.set_option('display.max_columns', None)
process_df.head()

Reindex,race_index,Grand_Prix,year,age,years_in_f1,races_with_each_team_since_1995,F2_champion,Former_F1_World_Champion,home_race,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,race_70th Anniversary Grand Prix,race_Abu Dhabi Grand Prix,race_Argentine Grand Prix,race_Australian Grand Prix,race_Austrian Grand Prix,race_Azerbaijan Grand Prix,race_Bahrain Grand Prix,race_Belgian Grand Prix,race_Brazilian Grand Prix,race_British Grand Prix,race_Canadian Grand Prix,race_Chinese Grand Prix,race_Dutch Grand Prix,race_Eifel Grand Prix,race_Emilia Romagna Grand Prix,race_European Grand Prix,race_French Grand Prix,race_German Grand Prix,race_Hungarian Grand Prix,race_Indian Grand Prix,race_Italian Grand Prix,race_Japanese Grand Prix,race_Korean Grand Prix,race_Luxembourg Grand Prix,race_Malaysian Grand Prix,race_Mexican Grand Prix,race_Mexico City Grand Prix,race_Miami Grand Prix,race_Monaco Grand Prix,race_Pacific Grand Prix,race_Portuguese Grand Prix,race_Qatar Grand Prix,race_Russian Grand Prix,race_Sakhir Grand Prix,race_San Marino Grand Prix,race_Saudi Arabian Grand Prix,race_Singapore Grand Prix,race_Spanish Grand Prix,race_Styrian Grand Prix,race_SÃ£o Paulo Grand Prix,race_Turkish Grand Prix,race_Tuscan Grand Prix,race_United States Grand Prix,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street Circuit,constructorId_10,constructorId_11,constructorId_117,constructorId_12,constructorId_13,constructorId_131,constructorId_14,constructorId_15,constructorId_16,constructorId_164,constructorId_166,constructorId_17,constructorId_18,constructorId_19,constructorId_2,constructorId_20,constructorId_205,constructorId_206,constructorId_207,constructorId_208,constructorId_209,constructorId_21,constructorId_210,constructorId_211,constructorId_213,constructorId_214,constructorId_22,constructorId_23,constructorId_24,constructorId_25,constructorId_26,constructorId_27,constructorId_28,constructorId_29,constructorId_3,constructorId_30,constructorId_31,constructorId_4,constructorId_5,constructorId_51,constructorId_6,constructorId_7,constructorId_8,constructorId_9
2199517,2,Argentine Grand Prix,1995,29,4,1,0,0,0,17,1.0,70,0,4.0,1.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219955,2,Argentine Grand Prix,1995,26,4,1,0,0,0,5,3.0,70,0,4.0,3.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,,,,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219951,2,Argentine Grand Prix,1995,24,1,1,0,0,0,1,6.0,71,0,16.0,6.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219952,2,Argentine Grand Prix,1995,34,3,1,0,1,0,2,0.0,30,1,16.0,10.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219956,2,Argentine Grand Prix,1995,30,6,1,0,0,0,6,2.0,70,0,13.0,8.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False


In [11]:
process_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10921 entries, 2199517 to 527202314
Columns: 183 entries, race_index to constructorId_9
dtypes: bool(50), float64(7), int64(125), object(1)
memory usage: 11.7+ MB
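The encoded header above lists what looks like two separate "Street Circuit" columns for `Circuit_Type`. One possible explanation, and this is only an assumption, is that some labels differ by stray whitespace or a similar invisible character, so `get_dummies` treats them as distinct categories. A minimal sketch of normalising labels before encoding, using made-up label variants:

```python
import pandas as pd

# Made-up variants: a double space and a trailing space.
toy = pd.DataFrame(
    {"Circuit_Type": ["Street Circuit", "Street  Circuit", "Permanent Race Track "]}
)

columns_before = pd.get_dummies(toy, columns=["Circuit_Type"]).shape[1]

# Split on any run of whitespace and re-join with single spaces;
# this also trims leading and trailing whitespace.
toy["Circuit_Type"] = toy["Circuit_Type"].str.split().str.join(" ")

columns_after = pd.get_dummies(toy, columns=["Circuit_Type"]).shape[1]
print(columns_before, columns_after)
```

Checking `process_df['Circuit_Type'].unique()` before encoding would show whether this applies to the real data.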


In [12]:
# Get columns whose names start with 'Weather_Conditions' or 'Circuit_Type'
columns_to_convert = [col for col in process_df.columns if col.startswith(('Weather_Conditions', 'Circuit_Type'))]

# Convert True/False to 1/0 for each column in the list
for column in columns_to_convert:
    process_df[column] = process_df[column].astype(int)
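The loop above only touches the weather and circuit columns, so the `constructorId_*` dummies stay as True/False. If the intent is for every dummy column to be 0/1, one option is to select columns by dtype instead of by name. The sketch below uses a toy frame whose column names mirror those in the dataset:

```python
import pandas as pd

# Toy frame mixing boolean dummy columns with an ordinary integer column.
toy = pd.DataFrame(
    {
        "Weather_Conditions_Dry": [True, False],
        "constructorId_3": [False, True],
        "year": [1995, 1996],
    }
)

# Pick out every boolean column, whatever its name, and cast to 0/1.
bool_cols = toy.select_dtypes(include="bool").columns
toy[bool_cols] = toy[bool_cols].astype(int)
print(toy.dtypes)
```

Most models accept boolean columns as they are, so this is largely a matter of consistency in the displayed data.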

In [13]:
pd.set_option('display.max_columns', None)
process_df.head()

Reindex,race_index,Grand_Prix,year,age,years_in_f1,races_with_each_team_since_1995,F2_champion,Former_F1_World_Champion,home_race,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,race_70th Anniversary Grand Prix,race_Abu Dhabi Grand Prix,race_Argentine Grand Prix,race_Australian Grand Prix,race_Austrian Grand Prix,race_Azerbaijan Grand Prix,race_Bahrain Grand Prix,race_Belgian Grand Prix,race_Brazilian Grand Prix,race_British Grand Prix,race_Canadian Grand Prix,race_Chinese Grand Prix,race_Dutch Grand Prix,race_Eifel Grand Prix,race_Emilia Romagna Grand Prix,race_European Grand Prix,race_French Grand Prix,race_German Grand Prix,race_Hungarian Grand Prix,race_Indian Grand Prix,race_Italian Grand Prix,race_Japanese Grand Prix,race_Korean Grand Prix,race_Luxembourg Grand Prix,race_Malaysian Grand Prix,race_Mexican Grand Prix,race_Mexico City Grand Prix,race_Miami Grand Prix,race_Monaco Grand Prix,race_Pacific Grand Prix,race_Portuguese Grand Prix,race_Qatar Grand Prix,race_Russian Grand Prix,race_Sakhir Grand Prix,race_San Marino Grand Prix,race_Saudi Arabian Grand Prix,race_Singapore Grand Prix,race_Spanish Grand Prix,race_Styrian Grand Prix,race_SÃ£o Paulo Grand Prix,race_Turkish Grand Prix,race_Tuscan Grand Prix,race_United States Grand Prix,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street Circuit,constructorId_10,constructorId_11,constructorId_117,constructorId_12,constructorId_13,constructorId_131,constructorId_14,constructorId_15,constructorId_16,constructorId_164,constructorId_166,constructorId_17,constructorId_18,constructorId_19,constructorId_2,constructorId_20,constructorId_205,constructorId_206,constructorId_207,constructorId_208,constructorId_209,constructorId_21,constructorId_210,constructorId_211,constructorId_213,constructorId_214,constructorId_22,constructorId_23,constructorId_24,constructorId_25,constructorId_26,constructorId_27,constructorId_28,constructorId_29,constructorId_3,constructorId_30,constructorId_31,constructorId_4,constructorId_5,constructorId_51,constructorId_6,constructorId_7,constructorId_8,constructorId_9
2199517,2,Argentine Grand Prix,1995,29,4,1,0,0,0,17,1.0,70,0,4.0,1.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219955,2,Argentine Grand Prix,1995,26,4,1,0,0,0,5,3.0,70,0,4.0,3.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,,,,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219951,2,Argentine Grand Prix,1995,24,1,1,0,0,0,1,6.0,71,0,16.0,6.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219952,2,Argentine Grand Prix,1995,34,3,1,0,1,0,2,0.0,30,1,16.0,10.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219956,2,Argentine Grand Prix,1995,30,6,1,0,0,0,6,2.0,70,0,13.0,8.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False


In [14]:
process_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10921 entries, 2199517 to 527202314
Columns: 183 entries, race_index to constructorId_9
dtypes: bool(44), float64(7), int32(6), int64(125), object(1)
memory usage: 11.9+ MB
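
The summary above reports 44 `bool` columns next to 125 `int64` ones. The boolean columns appear to be the 44 `constructorId_*` dummies, which print as `True`/`False` in the table while the other one-hot columns print as `0`/`1`. Most estimators should handle both, but if a uniform 0/1 integer encoding is preferred, a minimal sketch on a small hypothetical frame (`toy_df` is illustrative, not `process_df`) could look like this:

```python
import pandas as pd

# Hypothetical toy frame standing in for the pre-encoding data
toy_df = pd.DataFrame({"year": [1995, 1995, 1996], "constructorId": [3, 6, 3]})

# Option 1: request integer dummies at encoding time
encoded_int = pd.get_dummies(toy_df, columns=["constructorId"], dtype=int)

# Option 2: convert boolean dummy columns that already exist
encoded_bool = pd.get_dummies(toy_df, columns=["constructorId"], dtype=bool)
bool_cols = encoded_bool.select_dtypes(include="bool").columns
converted = encoded_bool.astype({col: int for col in bool_cols})

print(encoded_int.dtypes)
print(converted.dtypes)
```

Either option leaves the information unchanged; it only makes the dummy columns consistent with each other.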


### Null Values

In [15]:
# Count null values per column
null_values_per_column = process_df.isnull().sum()

# Display the counts
print(null_values_per_column)

race_index          0
Grand_Prix          0
year                0
age                 0
years_in_f1         0
                   ..
constructorId_51    0
constructorId_6     0
constructorId_7     0
constructorId_8     0
constructorId_9     0
Length: 183, dtype: int64


In [16]:
# Filter and show only columns that have null values
null_values_per_column = null_values_per_column[null_values_per_column > 0]

# Display the filtered counts
print(null_values_per_column)

Lapped                 280
Number_Of_Stops       1597
Total_time_in_pits    1597
Avg_time_in_pits      1597
dtype: int64
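
Raw counts are easier to judge as a share of the 10,921 rows: 1,597 missing pit-stop values is roughly 14.6% of the data, while 280 missing 'Lapped' values is roughly 2.6%. A small sketch of how such shares could be computed, using a hypothetical toy frame rather than `process_df`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values
toy_df = pd.DataFrame({
    "Lapped": [1.0, np.nan, 0.0, 0.0],
    "Number_Of_Stops": [np.nan, np.nan, 2.0, 3.0],
    "year": [1995, 1995, 1996, 1996],
})

# isnull().mean() gives the fraction of missing rows per column
null_share = toy_df.isnull().mean()

# Keep only columns with missing values, largest share first
null_share = null_share[null_share > 0].sort_values(ascending=False)

print((null_share * 100).round(1))
```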


In [17]:
# Impute missing values with the column mean

# List of columns with missing values to impute with mean
columns_to_impute = ['Lapped', 'Number_Of_Stops', 'Total_time_in_pits', 'Avg_time_in_pits']

# Loop through each column and impute missing values with mean of that column
for column in columns_to_impute:
    process_df[column] = process_df[column].fillna(process_df[column].mean())


In our data preprocessing for the F1 race winner prediction project, we impute the missing values in these four columns with the column mean. The mean is a simple, typical value that leaves each column's average unchanged and lets us keep all 10,921 rows. Columns such as 'Number_Of_Stops', 'Total_time_in_pits', and 'Avg_time_in_pits' are operational metrics that could directly affect race outcomes, so dropping the 1,597 rows with missing pit-stop data would discard a large share of the dataset. The trade-off is that mean imputation narrows the spread of the imputed columns, because every filled row receives the same value. 'Lapped' appears to be a 0/1 flag in the rows shown above, so its imputed mean would be a fractional value rather than 0 or 1.
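
To see what mean imputation does to a column, the following sketch compares the mean and standard deviation before and after filling. The values are made up for illustration and are not taken from the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical pit-stop counts with two missing entries
stops = pd.Series([1.0, 2.0, 3.0, np.nan, 2.0, np.nan], name="Number_Of_Stops")

# Same approach as the cell above: fill gaps with the column mean
imputed = stops.fillna(stops.mean())

# The mean is unchanged, but the standard deviation shrinks because
# every imputed row sits exactly at the mean
print("mean before:", stops.mean(), "after:", imputed.mean())
print("std before:", stops.std(), "after:", imputed.std())
```

The more rows are imputed, the stronger this shrinkage becomes, which is worth keeping in mind for the pit-stop columns.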

With more time (and post-bootcamp), I would like to find better ways of imputing this data (e.g. for rows with missing values, imputing the average pit-stop data for that team at that circuit). However, due to time pressure, this will have to do for the time being.
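
As a rough illustration of the team-and-circuit idea, the sketch below fills each gap with the mean of its (team, circuit) group and falls back to the overall column mean when a group has no observed values. The `team` and `circuit` columns and all values are assumptions for illustration; in `process_df` these are already one-hot encoded, so this would need to run before encoding:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data; 'team' and 'circuit' are assumed column names
pit_df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "C"],
    "circuit": ["Monza", "Monza", "Spa", "Monza", "Monza", "Spa"],
    "Number_Of_Stops": [2.0, np.nan, 1.0, 3.0, np.nan, np.nan],
})

col = "Number_Of_Stops"

# Mean within each (team, circuit) group, aligned to the original rows;
# this is NaN where the whole group is missing
group_mean = pit_df.groupby(["team", "circuit"])[col].transform("mean")

# Fill from the group mean first, then fall back to the mean of the
# observed values in the whole column
pit_df[col] = pit_df[col].fillna(group_mean).fillna(pit_df[col].mean())

print(pit_df)
```

The fallback matters here because pit-stop data may be missing for entire groups, in which case a group mean does not exist.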

In [18]:
# Verify the imputation
print(process_df[columns_to_impute].isnull().sum())

Lapped                0
Number_Of_Stops       0
Total_time_in_pits    0
Avg_time_in_pits      0
dtype: int64


In [19]:
pd.set_option('display.max_columns', None)
process_df.head()

Unnamed: 0_level_0,race_index,Grand_Prix,year,age,years_in_f1,races_with_each_team_since_1995,F2_champion,Former_F1_World_Champion,home_race,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,race_70th Anniversary Grand Prix,race_Abu Dhabi Grand Prix,race_Argentine Grand Prix,race_Australian Grand Prix,race_Austrian Grand Prix,race_Azerbaijan Grand Prix,race_Bahrain Grand Prix,race_Belgian Grand Prix,race_Brazilian Grand Prix,race_British Grand Prix,race_Canadian Grand Prix,race_Chinese Grand Prix,race_Dutch Grand Prix,race_Eifel Grand Prix,race_Emilia Romagna Grand Prix,race_European Grand Prix,race_French Grand Prix,race_German Grand Prix,race_Hungarian Grand Prix,race_Indian Grand Prix,race_Italian Grand Prix,race_Japanese Grand Prix,race_Korean Grand Prix,race_Luxembourg Grand Prix,race_Malaysian Grand Prix,race_Mexican Grand Prix,race_Mexico City Grand Prix,race_Miami Grand Prix,race_Monaco Grand Prix,race_Pacific Grand Prix,race_Portuguese Grand Prix,race_Qatar Grand Prix,race_Russian Grand Prix,race_Sakhir Grand Prix,race_San Marino Grand Prix,race_Saudi Arabian Grand Prix,race_Singapore Grand Prix,race_Spanish Grand Prix,race_Styrian Grand Prix,race_SÃ£o Paulo Grand Prix,race_Turkish Grand Prix,race_Tuscan Grand Prix,race_United States Grand Prix,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro 
Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street 
Circuit,constructorId_10,constructorId_11,constructorId_117,constructorId_12,constructorId_13,constructorId_131,constructorId_14,constructorId_15,constructorId_16,constructorId_164,constructorId_166,constructorId_17,constructorId_18,constructorId_19,constructorId_2,constructorId_20,constructorId_205,constructorId_206,constructorId_207,constructorId_208,constructorId_209,constructorId_21,constructorId_210,constructorId_211,constructorId_213,constructorId_214,constructorId_22,constructorId_23,constructorId_24,constructorId_25,constructorId_26,constructorId_27,constructorId_28,constructorId_29,constructorId_3,constructorId_30,constructorId_31,constructorId_4,constructorId_5,constructorId_51,constructorId_6,constructorId_7,constructorId_8,constructorId_9
2199517,2,Argentine Grand Prix,1995,29,4,1,0,0,0,17,1.0,70,0,4.0,1.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219955,2,Argentine Grand Prix,1995,26,4,1,0,0,0,5,3.0,70,0,4.0,3.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219951,2,Argentine Grand Prix,1995,24,1,1,0,0,0,1,6.0,71,0,16.0,6.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219952,2,Argentine Grand Prix,1995,34,3,1,0,1,0,2,0.0,30,1,16.0,10.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219956,2,Argentine Grand Prix,1995,30,6,1,0,0,0,6,2.0,70,0,13.0,8.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False


### Drop columns

In [20]:
# Drop 'race_index' and 'Grand_Prix' columns from the DataFrame
process_df = process_df.drop(columns=['race_index', 'Grand_Prix'])

# Verify the columns are dropped
print(process_df.columns)

Index(['year', 'age', 'years_in_f1', 'races_with_each_team_since_1995',
       'F2_champion', 'Former_F1_World_Champion', 'home_race',
       'starting_grid_position', 'points_in_previous_race',
       'laps_in_previous_race',
       ...
       'constructorId_3', 'constructorId_30', 'constructorId_31',
       'constructorId_4', 'constructorId_5', 'constructorId_51',
       'constructorId_6', 'constructorId_7', 'constructorId_8',
       'constructorId_9'],
      dtype='object', length=181)


In [21]:
pd.set_option('display.max_columns', None)
process_df.head()

Unnamed: 0_level_0,year,age,years_in_f1,races_with_each_team_since_1995,F2_champion,Former_F1_World_Champion,home_race,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,race_70th Anniversary Grand Prix,race_Abu Dhabi Grand Prix,race_Argentine Grand Prix,race_Australian Grand Prix,race_Austrian Grand Prix,race_Azerbaijan Grand Prix,race_Bahrain Grand Prix,race_Belgian Grand Prix,race_Brazilian Grand Prix,race_British Grand Prix,race_Canadian Grand Prix,race_Chinese Grand Prix,race_Dutch Grand Prix,race_Eifel Grand Prix,race_Emilia Romagna Grand Prix,race_European Grand Prix,race_French Grand Prix,race_German Grand Prix,race_Hungarian Grand Prix,race_Indian Grand Prix,race_Italian Grand Prix,race_Japanese Grand Prix,race_Korean Grand Prix,race_Luxembourg Grand Prix,race_Malaysian Grand Prix,race_Mexican Grand Prix,race_Mexico City Grand Prix,race_Miami Grand Prix,race_Monaco Grand Prix,race_Pacific Grand Prix,race_Portuguese Grand Prix,race_Qatar Grand Prix,race_Russian Grand Prix,race_Sakhir Grand Prix,race_San Marino Grand Prix,race_Saudi Arabian Grand Prix,race_Singapore Grand Prix,race_Spanish Grand Prix,race_Styrian Grand Prix,race_SÃ£o Paulo Grand Prix,race_Turkish Grand Prix,race_Tuscan Grand Prix,race_United States Grand Prix,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro 
Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street 
Circuit,constructorId_10,constructorId_11,constructorId_117,constructorId_12,constructorId_13,constructorId_131,constructorId_14,constructorId_15,constructorId_16,constructorId_164,constructorId_166,constructorId_17,constructorId_18,constructorId_19,constructorId_2,constructorId_20,constructorId_205,constructorId_206,constructorId_207,constructorId_208,constructorId_209,constructorId_21,constructorId_210,constructorId_211,constructorId_213,constructorId_214,constructorId_22,constructorId_23,constructorId_24,constructorId_25,constructorId_26,constructorId_27,constructorId_28,constructorId_29,constructorId_3,constructorId_30,constructorId_31,constructorId_4,constructorId_5,constructorId_51,constructorId_6,constructorId_7,constructorId_8,constructorId_9
2199517,1995,29,4,1,0,0,0,17,1.0,70,0,4.0,1.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219955,1995,26,4,1,0,0,0,5,3.0,70,0,4.0,3.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
219951,1995,24,1,1,0,0,0,1,6.0,71,0,16.0,6.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219952,1995,34,3,1,0,1,0,2,0.0,30,1,16.0,10.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False
219956,1995,30,6,1,0,0,0,6,2.0,70,0,13.0,8.0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,1,0,0,0,0,0,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False


### Send to CSV

In [22]:
process_df.to_csv('model_data_v2.csv', index=True)
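
Because the file is written with `index=True`, the index becomes the first column of the CSV. When loading it in a later notebook, passing `index_col=0` to `pd.read_csv` restores it as the index instead of an extra unnamed column. A minimal round-trip sketch, using an in-memory buffer and a hypothetical two-row frame (the index name is assumed for illustration):

```python
import io

import pandas as pd

# Hypothetical frame with a named index, standing in for process_df
toy_df = pd.DataFrame(
    {"year": [1995, 1995], "race_win": [0, 1]},
    index=pd.Index([2199517, 219955], name="Reindex"),
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy_df.to_csv(buffer, index=True)
buffer.seek(0)

# index_col=0 turns the first CSV column back into the index
restored = pd.read_csv(buffer, index_col=0)

print(restored)
```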

# Conclusion

In this notebook, we extended the feature set for our F1 race winner prediction model. By decomposing the 'Status' column into the more descriptive 'Mechanical Issues' and 'Driver Issues' factors (the 'Mechanical' and 'Driver_Issue' columns), we’ve given the model clearer indicators of team reliability and driver consistency. We also integrated 'Weather Conditions' and 'Circuit Characteristics' to capture conditions that affect race strategy and performance.

Further, we addressed missing data by imputing null values with column means, which keeps every row and preserves each column's average at the cost of reduced variance in the imputed columns. We encoded categorical variables through one-hot encoding and prepared the dataset for modeling by dropping the non-essential 'race_index' and 'Grand_Prix' columns (the latter is already captured by the one-hot 'race_*' columns). These steps lay the groundwork for the modeling stage, where we can test whether the new features bring us closer to accurately forecasting F1 race winners.

NB: although this data brings in new features, much of the data collection and processing for this notebook was done in Excel and via various internet sources. One weakness of this study in its current form is that there wasn't an easy, systematic way of pulling this data. Post-bootcamp, I want to revisit this section to document how the data was collected (step by step) and to automate the collection process. Due to time pressure with the bootcamp, I couldn't get this done before the submission date, but the CSV is on this GitHub, so people can check and verify the information at their leisure.