# F1 Win Prediction Project
#### Alex Boardman - BrainStation

### Introduction to Feature Engineering

In the high-octane world of Formula 1 racing, where fractions of a second can be the difference between victory and defeat, the art and science of predictive modeling take on a thrilling edge. At the heart of this endeavor is Feature Engineering, a critical process where we carefully select and transform raw data into informative inputs (features) that our predictive models can understand and utilize. This process is not just helpful but essential in tailoring our data to reflect the nuances of the sport, allowing us to capture the complexities of race dynamics that influence a driver's likelihood of winning.

In our quest to predict F1 race winners, we've refined our features to reflect the multifaceted nature of the sport. By decomposing the 'Status' of race outcomes into categories like 'Mechanical Issues' and 'Driver Issues', we aim to give the model a clearer picture of team reliability and driver consistency. We've also incorporated 'Weather Conditions', recognizing their impact on strategy and performance. The weather data was compiled from both a dedicated Formula 1 dataset and detailed race reports.

Moreover, 'Circuit Characteristics' were evaluated to account for the diverse challenges posed by different tracks, whether they favor high-speed performance or the technical prowess suited to street circuits. We've collated data on 'Recent Form' to catch the momentum of drivers and teams, understanding that past performance can be a harbinger of future results.

The unsung heroes in the pit lane have not been overlooked; 'Team Strategy and Pit Crew Performance' data has been reintegrated to spotlight their influence on race outcomes. And lastly, the 'Engine and Tyre Performance' feature draws on historical data to estimate the durability of these critical components, which often decide the fate of a race.

Through these enhancements in feature engineering, we've set the stage for a model that doesn't just process data but reflects the dynamics of the race, giving us clearer insight into what makes a champion.

## Data Preprocessing

### Importing Libraries and Notebook Setup

In [1]:
# Import libraries
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

In [3]:
process_df = pd.read_csv('C:/Users/Alex/OneDrive/BrainStation/Data_Science_Bootcamp/Capstone_Project/capstone-Aboard89/data/f1_data_capstone_v2.csv')

In [4]:
process_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10921 entries, 0 to 10920
Data columns (total 91 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   New_column                               10921 non-null  int64  
 1   race_index                               10921 non-null  int64  
 2   Grand_Prix                               10921 non-null  object 
 3   year                                     10921 non-null  int64  
 4   F2_champion                              10921 non-null  int64  
 5   Former_F1_World_Champion                 10921 non-null  int64  
 6   home_race                                10921 non-null  int64  
 7   constructorId                            10921 non-null  int64  
 8   starting_grid_position                   10921 non-null  int64  
 9   points_in_previous_race                  10921 non-null  float64
 10  laps_in_previous_race                    10921

In [5]:
pd.set_option('display.max_columns', None)
process_df.head()

Unnamed: 0,New_column,race_index,Grand_Prix,year,F2_champion,Former_F1_World_Champion,home_race,constructorId,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Weather_Conditions,Circuit_Type,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits
0,2117,2,Argentine Grand Prix,1995,0,0,0,1,17,1.0,70,0,4.0,1.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,1,0,1.0,,,
1,215,2,Argentine Grand Prix,1995,0,0,0,1,5,3.0,70,0,4.0,3.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,0,1,1.0,,,
2,231,2,Argentine Grand Prix,1995,0,0,0,3,1,6.0,71,0,16.0,6.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,1,0,1.0,,,
3,232,2,Argentine Grand Prix,1995,0,1,0,3,2,0.0,30,1,16.0,10.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,0,0,0.0,3.0,88.958,29.652667
4,266,2,Argentine Grand Prix,1995,0,0,0,6,6,2.0,70,0,13.0,8.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Dry,Permanent Race Track,0,0,0.0,2.0,56.85,28.425



### One-Hot Encoding

In [6]:
# One-hot encode "Weather_Conditions" column
process_df = pd.get_dummies(process_df, columns=['Weather_Conditions'], drop_first=True)


In [7]:
# One-hot encode "Circuit_Type" column
process_df = pd.get_dummies(process_df, columns=['Circuit_Type'], drop_first=True)

In [8]:
pd.set_option('display.max_columns', None)
process_df.head()

Unnamed: 0,New_column,race_index,Grand_Prix,year,F2_champion,Former_F1_World_Champion,home_race,constructorId,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street Circuit.1
0,2117,2,Argentine Grand Prix,1995,0,0,0,1,17,1.0,70,0,4.0,1.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,True,False,False,False,False,False
1,215,2,Argentine Grand Prix,1995,0,0,0,1,5,3.0,70,0,4.0,3.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,,,,True,False,False,False,False,False
2,231,2,Argentine Grand Prix,1995,0,0,0,3,1,6.0,71,0,16.0,6.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,True,False,False,False,False,False
3,232,2,Argentine Grand Prix,1995,0,1,0,3,2,0.0,30,1,16.0,10.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,True,False,False,False,False,False
4,266,2,Argentine Grand Prix,1995,0,0,0,6,6,2.0,70,0,13.0,8.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,True,False,False,False,False,False
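
The encoded frame contains two columns that both display as `Circuit_Type_Street Circuit`. One plausible cause, which is an assumption and not something the notebook confirms, is stray whitespace in the raw category labels. The sketch below uses a made-up toy frame (the `Hybrid` label and the trailing space are invented for illustration) to show how stripping whitespace before `pd.get_dummies` collapses such near-duplicates, and how `dtype=int` yields 0/1 columns directly.

```python
import pandas as pd

# Toy data: the second label carries a trailing space (hypothetical cause of the duplicate column)
toy = pd.DataFrame(
    {"Circuit_Type": ["Street Circuit", "Street Circuit ", "Permanent Race Track", "Hybrid"]}
)

# Encoding the raw labels keeps both spellings as separate dummy columns
raw_encoded = pd.get_dummies(toy, columns=["Circuit_Type"], drop_first=True)

# Normalise the labels first, then encode straight to integers
cleaned = toy.assign(Circuit_Type=toy["Circuit_Type"].str.strip())
clean_encoded = pd.get_dummies(cleaned, columns=["Circuit_Type"], drop_first=True, dtype=int)

print(list(raw_encoded.columns))
print(list(clean_encoded.columns))
```

Checking `process_df['Circuit_Type'].unique()` before encoding would show whether this explanation applies to the real data.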


In [10]:
process_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10921 entries, 0 to 10920
Data columns (total 95 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   New_column                               10921 non-null  int64  
 1   race_index                               10921 non-null  int64  
 2   Grand_Prix                               10921 non-null  object 
 3   year                                     10921 non-null  int64  
 4   F2_champion                              10921 non-null  int64  
 5   Former_F1_World_Champion                 10921 non-null  int64  
 6   home_race                                10921 non-null  int64  
 7   constructorId                            10921 non-null  int64  
 8   starting_grid_position                   10921 non-null  int64  
 9   points_in_previous_race                  10921 non-null  float64
 10  laps_in_previous_race                    10921

In [11]:
# Get the one-hot columns whose names start with 'Weather_Conditions' or 'Circuit_Type'
columns_to_convert = [col for col in process_df.columns if col.startswith(('Weather_Conditions', 'Circuit_Type'))]

# Convert True/False to 1/0 for each column in the list
for column in columns_to_convert:
    process_df[column] = process_df[column].astype(int)

In [12]:
pd.set_option('display.max_columns', None)
process_df.head()

Unnamed: 0,New_column,race_index,Grand_Prix,year,F2_champion,Former_F1_World_Champion,home_race,constructorId,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street Circuit.1
0,2117,2,Argentine Grand Prix,1995,0,0,0,1,17,1.0,70,0,4.0,1.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,1,0,0,0,0,0
1,215,2,Argentine Grand Prix,1995,0,0,0,1,5,3.0,70,0,4.0,3.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,,,,1,0,0,0,0,0
2,231,2,Argentine Grand Prix,1995,0,0,0,3,1,6.0,71,0,16.0,6.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,,,,1,0,0,0,0,0
3,232,2,Argentine Grand Prix,1995,0,1,0,3,2,0.0,30,1,16.0,10.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,1,0,0,0,0,0
4,266,2,Argentine Grand Prix,1995,0,0,0,6,6,2.0,70,0,13.0,8.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,1,0,0,0,0,0


In [13]:
process_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10921 entries, 0 to 10920
Data columns (total 95 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   New_column                               10921 non-null  int64  
 1   race_index                               10921 non-null  int64  
 2   Grand_Prix                               10921 non-null  object 
 3   year                                     10921 non-null  int64  
 4   F2_champion                              10921 non-null  int64  
 5   Former_F1_World_Champion                 10921 non-null  int64  
 6   home_race                                10921 non-null  int64  
 7   constructorId                            10921 non-null  int64  
 8   starting_grid_position                   10921 non-null  int64  
 9   points_in_previous_race                  10921 non-null  float64
 10  laps_in_previous_race                    10921

### Null Values

In [15]:
# Count null values per column
null_values_per_column = process_df.isnull().sum()

# Display the counts
print(null_values_per_column)

New_column                            0
race_index                            0
Grand_Prix                            0
year                                  0
F2_champion                           0
                                     ..
Weather_Conditions_Rain               0
Weather_Conditions_Very changeable    0
Circuit_Type_Permanent Race Track     0
Circuit_Type_Street Circuit           0
Circuit_Type_Street Circuit           0
Length: 95, dtype: int64


In [16]:
# Filter and show only columns that have null values
null_values_per_column = null_values_per_column[null_values_per_column > 0]

# Display the filtered counts
print(null_values_per_column)

Lapped                 280
Number_Of_Stops       1597
Total_time_in_pits    1597
Avg_time_in_pits      1597
dtype: int64
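
To put these counts in context, the sketch below expresses them as a percentage of the 10921 rows reported by `info()`. The counts are copied from the output above; the variable names are illustrative only.

```python
import pandas as pd

# Null counts copied from the filtered output above
null_counts = pd.Series(
    {
        "Lapped": 280,
        "Number_Of_Stops": 1597,
        "Total_time_in_pits": 1597,
        "Avg_time_in_pits": 1597,
    }
)
n_rows = 10921  # RangeIndex length reported by process_df.info()

# Share of rows with a missing value, in percent
null_share_pct = (null_counts / n_rows * 100).round(2)
print(null_share_pct)
```

Roughly 2.56% of rows lack a `Lapped` value and about 14.62% lack pit-stop data. On the live DataFrame, `process_df.isnull().mean() * 100` gives the same shares without copying numbers by hand.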


In [17]:
# Impute missing values with each column's mean

# List of columns with missing values to impute with mean
columns_to_impute = ['Lapped', 'Number_Of_Stops', 'Total_time_in_pits', 'Avg_time_in_pits']

# Loop through each column and impute missing values with mean of that column
for column in columns_to_impute:
    process_df[column] = process_df[column].fillna(process_df[column].mean())


In [18]:
# Verify the imputation
print(process_df[columns_to_impute].isnull().sum())

Lapped                0
Number_Of_Stops       0
Total_time_in_pits    0
Avg_time_in_pits      0
dtype: int64
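
Mean imputation can leave count-like columns with values that no real race could produce, because the mean of `Number_Of_Stops` is generally not a whole number. As a hedged alternative, and not the approach this notebook takes, the sketch below fills with the median and records which rows were filled. The toy values and the `_was_missing` suffix are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for two of the imputed columns
toy = pd.DataFrame(
    {
        "Lapped": [1.0, 0.0, np.nan, 1.0, 1.0],
        "Number_Of_Stops": [np.nan, 2.0, 3.0, np.nan, 2.0],
    }
)

for column in ["Lapped", "Number_Of_Stops"]:
    # Keep a 0/1 flag so a model can still see that the value was absent
    toy[f"{column}_was_missing"] = toy[column].isna().astype(int)
    # The median of the observed values stays on the column's natural scale here
    toy[column] = toy[column].fillna(toy[column].median())

print(toy)
```

Whether the indicator columns help depends on the model, so this is worth comparing against the mean-imputed version before adopting it.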


In [19]:
pd.set_option('display.max_columns', None)
process_df.head()

Unnamed: 0,New_column,race_index,Grand_Prix,year,F2_champion,Former_F1_World_Champion,home_race,constructorId,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street Circuit.1
0,2117,2,Argentine Grand Prix,1995,0,0,0,1,17,1.0,70,0,4.0,1.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
1,215,2,Argentine Grand Prix,1995,0,0,0,1,5,3.0,70,0,4.0,3.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
2,231,2,Argentine Grand Prix,1995,0,0,0,3,1,6.0,71,0,16.0,6.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
3,232,2,Argentine Grand Prix,1995,0,1,0,3,2,0.0,30,1,16.0,10.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,1,0,0,0,0,0
4,266,2,Argentine Grand Prix,1995,0,0,0,6,6,2.0,70,0,13.0,8.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,1,0,0,0,0,0


### Set the DataFrame index

In [20]:
# Set 'New_column' as the new index of the DataFrame
process_df = process_df.set_index('New_column')

# Verify the new index
print(process_df.index)

Index([    2117,      215,      231,      232,      266,      268,    21521,
           2159,    21710,     2174,
       ...
       52711710,  5271179,  5271313,  5271318, 52721016,  5272100, 52721319,
       52721311, 52721412, 52721414],
      dtype='int64', name='New_column', length=10921)
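
If `New_column` is meant to identify each row uniquely (an assumption, since the notebook does not say so), pandas can enforce that when the index is set. The sketch below uses toy frames to show `verify_integrity=True` accepting unique keys and raising `ValueError` on duplicates.

```python
import pandas as pd

# Unique identifiers: set_index succeeds
toy = pd.DataFrame({"New_column": [2117, 215, 231], "year": [1995, 1995, 1995]})
indexed = toy.set_index("New_column", verify_integrity=True)

# Duplicate identifiers: set_index raises instead of silently creating a non-unique index
dupes = pd.DataFrame({"New_column": [2117, 2117], "year": [1995, 1996]})
try:
    dupes.set_index("New_column", verify_integrity=True)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(indexed.index.is_unique, duplicate_rejected)
```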


In [21]:
pd.set_option('display.max_columns', None)
process_df.head()

New_column,race_index,Grand_Prix,year,F2_champion,Former_F1_World_Champion,home_race,constructorId,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street Circuit
2117,2,Argentine Grand Prix,1995,0,0,0,1,17,1.0,70,0,4.0,1.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
215,2,Argentine Grand Prix,1995,0,0,0,1,5,3.0,70,0,4.0,3.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
231,2,Argentine Grand Prix,1995,0,0,0,3,1,6.0,71,0,16.0,6.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
232,2,Argentine Grand Prix,1995,0,1,0,3,2,0.0,30,1,16.0,10.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,1,0,0,0,0,0
266,2,Argentine Grand Prix,1995,0,0,0,6,6,2.0,70,0,13.0,8.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,1,0,0,0,0,0


### Drop columns

In [22]:
# Drop 'race_index' and 'Grand_Prix' columns from the DataFrame
process_df = process_df.drop(columns=['race_index', 'Grand_Prix'])

# Verify the columns are dropped
print(process_df.columns)

Index(['year', 'F2_champion', 'Former_F1_World_Champion', 'home_race',
       'constructorId', 'starting_grid_position', 'points_in_previous_race',
       'laps_in_previous_race', 'race_win',
       'constructorId_points_at_stage_of_season',
       'driver_points_at_stage_of_season', 'engine_manufacturer_Acer',
       'engine_manufacturer_Arrows', 'engine_manufacturer_Asiatech',
       'engine_manufacturer_BMW', 'engine_manufacturer_Cosworth',
       'engine_manufacturer_Ferrari', 'engine_manufacturer_Ford',
       'engine_manufacturer_Hart', 'engine_manufacturer_Honda',
       'engine_manufacturer_Mecachrome', 'engine_manufacturer_Mercedes',
       'engine_manufacturer_Mugen-Honda', 'engine_manufacturer_Petronas',
       'engine_manufacturer_Peugeot', 'engine_manufacturer_Playlife',
       'engine_manufacturer_Red Bull', 'engine_manufacturer_Renault',
       'engine_manufacturer_Supertec', 'engine_manufacturer_Toro Rosso',
       'engine_manufacturer_Toyota', 'engine_manufacturer_Yama
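
Printing the columns confirms the drop by eye; an `assert` makes the same check fail loudly if the cell is ever re-run against a different frame. A minimal sketch on toy data, with illustrative variable names:

```python
import pandas as pd

# Toy frame with the two columns that the notebook drops
toy = pd.DataFrame(
    {
        "race_index": [2, 2],
        "Grand_Prix": ["Argentine Grand Prix", "Argentine Grand Prix"],
        "year": [1995, 1995],
    }
)

columns_to_drop = ["race_index", "Grand_Prix"]
toy = toy.drop(columns=columns_to_drop)

# Fails with a clear message if any dropped column is still present
remaining = set(columns_to_drop) & set(toy.columns)
assert not remaining, f"Columns still present: {remaining}"
```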

In [23]:
pd.set_option('display.max_columns', None)
process_df.head()

New_column,year,F2_champion,Former_F1_World_Champion,home_race,constructorId,starting_grid_position,points_in_previous_race,laps_in_previous_race,race_win,constructorId_points_at_stage_of_season,driver_points_at_stage_of_season,engine_manufacturer_Acer,engine_manufacturer_Arrows,engine_manufacturer_Asiatech,engine_manufacturer_BMW,engine_manufacturer_Cosworth,engine_manufacturer_Ferrari,engine_manufacturer_Ford,engine_manufacturer_Hart,engine_manufacturer_Honda,engine_manufacturer_Mecachrome,engine_manufacturer_Mercedes,engine_manufacturer_Mugen-Honda,engine_manufacturer_Petronas,engine_manufacturer_Peugeot,engine_manufacturer_Playlife,engine_manufacturer_Red Bull,engine_manufacturer_Renault,engine_manufacturer_Supertec,engine_manufacturer_Toro Rosso,engine_manufacturer_Toyota,engine_manufacturer_Yamaha,constructor_nationality_American,constructor_nationality_Austrian,constructor_nationality_British,constructor_nationality_Dutch,constructor_nationality_French,constructor_nationality_German,constructor_nationality_Indian,constructor_nationality_Irish,constructor_nationality_Italian,constructor_nationality_Japanese,constructor_nationality_Malaysian,constructor_nationality_Russian,constructor_nationality_Spanish,constructor_nationality_Swiss,Nationality_American,Nationality_Argentine,Nationality_Australian,Nationality_Austrian,Nationality_Belgian,Nationality_Brazilian,Nationality_British,Nationality_Canadian,Nationality_Chinese,Nationality_Colombian,Nationality_Czech,Nationality_Danish,Nationality_Dutch,Nationality_Finnish,Nationality_French,Nationality_German,Nationality_Hungarian,Nationality_Indian,Nationality_Indonesian,Nationality_Irish,Nationality_Italian,Nationality_Japanese,Nationality_Malaysian,Nationality_Mexican,Nationality_Monegasque,Nationality_New Zealander,Nationality_Polish,Nationality_Portuguese,Nationality_Russian,Nationality_Spanish,Nationality_Swedish,Nationality_Swiss,Nationality_Thai,Nationality_Venezuelan,Mechanical,Driver_Issue,Lapped,Number_Of_Stops,Total_time_in_pits,Avg_time_in_pits,Weather_Conditions_Dry,Weather_Conditions_Rain,Weather_Conditions_Very changeable,Circuit_Type_Permanent Race Track,Circuit_Type_Street Circuit,Circuit_Type_Street Circuit
2117,1995,0,0,0,1,17,1.0,70,0,4.0,1.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
215,1995,0,0,0,1,5,3.0,70,0,4.0,3.0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
231,1995,0,0,0,3,1,6.0,71,0,16.0,6.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1.0,2.107465,54.14118,26.069149,1,0,0,0,0,0
232,1995,0,1,0,3,2,0.0,30,1,16.0,10.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,3.0,88.958,29.652667,1,0,0,0,0,0
266,1995,0,0,0,6,6,2.0,70,0,13.0,8.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,2.0,56.85,28.425,1,0,0,0,0,0


### Send to CSV

In [49]:
process_df.to_csv('model_data_v2.csv', index=False)
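
Because `New_column` was set as the index earlier, `index=False` writes the file without it. That may be intentional if the identifier should not reach the model. If it needs to survive the round trip, the sketch below (toy data, in-memory buffers instead of files) contrasts the two options.

```python
import io

import pandas as pd

# Toy frame indexed the same way as process_df
toy = pd.DataFrame(
    {"New_column": [2117, 215], "year": [1995, 1995], "race_win": [0, 0]}
).set_index("New_column")

# index=False: the identifier is not written at all
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
columns_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()

# Default index=True: the identifier is written and can be restored on load
with_index = io.StringIO()
toy.to_csv(with_index)
restored = pd.read_csv(io.StringIO(with_index.getvalue()), index_col="New_column")

print(columns_without)
print(restored.index.tolist())
```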