**Further data preprocessing for machine learning algorithms**
---

---
**Importing necessary libraries and datasets**

In [8]:
import pandas as pd 
import numpy as np 

df = pd.read_csv('mdf7.csv')

---
**Needed modifications**

In [9]:
'''
This cell adjusts the cumulative win and pole counts for drivers and constructors.
The counters already include the result of the current race, so 1 is subtracted on the
rows of the race winner and of the pole sitter. Each row then holds the totals before the
race, and a win or pole position only becomes visible from the following race onward.
'''

df.rename(columns={'updated_wins': 'constructors_wins', 'updated_poles': 'constructors_poles'}, inplace=True)

# One entry per race, identified by season, circuit and round.
races = df[['season', 'circuit_id', 'round']].drop_duplicates().values

for season, circuit, rnd in races:
    # Rows belonging to the current race.
    race_mask = ((df['season'] == season) &
                 (df['circuit_id'] == circuit) &
                 (df['round'] == rnd))

    # Constructor of the race winner (Rank 1) and of the pole sitter (grid 1).
    winning_constructor = df.loc[race_mask & (df['Rank'] == 1), 'constructor_id'].values[0]
    pole_constructor = df.loc[race_mask & (df['grid'] == 1), 'constructor_id'].values[0]

    # Remove the current race's win and pole from every row of that constructor in this race.
    df.loc[race_mask & (df['constructor_id'] == winning_constructor), 'constructors_wins'] -= 1
    df.loc[race_mask & (df['constructor_id'] == pole_constructor), 'constructors_poles'] -= 1

# Same adjustment for the drivers: remove the current race's win or pole from the driver's count.
df.loc[df['Rank'] == 1, 'updated_driver_wins'] -= 1
df.loc[df['grid'] == 1, 'updated_driver_poles'] -= 1
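
The constructor adjustment can also be written without an explicit loop. The block below is a minimal sketch on invented toy data, not the notebook's own method: the frame `toy`, the helper column `is_win` and the output column `pre_race_wins` are assumptions made for illustration. It flags the winning row, spreads that flag to every row of the same constructor in the same race with `groupby(...).transform('max')`, and subtracts it from the cumulative count.

```python
import pandas as pd

# Toy data (invented): two races, two constructors with two cars each.
# 'constructors_wins' already includes the result of the current race.
toy = pd.DataFrame({
    'season': [2020] * 8,
    'circuit_id': ['a'] * 4 + ['b'] * 4,
    'round': [1] * 4 + [2] * 4,
    'constructor_id': ['x', 'x', 'y', 'y'] * 2,
    'Rank': [1, 2, 3, 4, 3, 4, 1, 2],
    'constructors_wins': [1, 1, 0, 0, 1, 1, 1, 1],
})

race_keys = ['season', 'circuit_id', 'round']

# 1 on the winning row, 0 elsewhere (hypothetical helper column).
toy['is_win'] = (toy['Rank'] == 1).astype(int)

# Spread the flag to every row of the winning constructor in that race.
team_won = toy.groupby(race_keys + ['constructor_id'])['is_win'].transform('max')

# Totals as they stood before the race started.
toy['pre_race_wins'] = toy['constructors_wins'] - team_won

print(toy[race_keys + ['constructor_id', 'constructors_wins', 'pre_race_wins']])
```

The same pattern would apply to pole positions by flagging `grid == 1` instead of `Rank == 1`.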

    

In [10]:
'''
This cell one-hot encodes the categorical variables for the machine learning models and
renames the driver win and pole columns. Rows with Finished equal to 0 are filtered out.
Finally, columns that are directly correlated with the target variable are dropped so that
they cannot leak information into the predictions, along with a few columns that are not
kept as features.
'''

columns_to_encode = ['driver_id', 'circuit_id', 'grid', 'constructor_id', 'configuration',
                     'drs_zones', 'Engine Manufacturer']

# One-hot encode the categorical columns.
df1 = pd.get_dummies(df, columns=columns_to_encode)

df1.rename(columns={'updated_driver_wins': 'driver_wins', 'updated_driver_poles': 'driver_poles'}, inplace=True)

# Keep only the rows where Finished is not 0.
df1 = df1[df1['Finished'] != 0]

# Drop the columns that are not used as features.
df1 = df1.drop(columns=['Finished', 'Points', 'circuit_country', 'year_of_birth', 'Laps',
                        'constructor_country_of_origin', 'driver_country_of_origin',
                        'Engine', 'Displacement', 'Cylinders', 'Valves'])
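
To make the encoding step concrete, here is a small sketch on invented data showing what `pd.get_dummies` produces. The frame `toy` and its values are assumptions for illustration only. Each listed column is replaced by one indicator column per distinct value, named `<column>_<value>`, and a numeric column such as `grid` is treated as categorical once it appears in `columns`.

```python
import pandas as pd

# Toy data (invented): three results with a grid slot and a constructor.
toy = pd.DataFrame({
    'grid': [1, 2, 1],
    'constructor_id': ['x', 'y', 'y'],
    'Rank': [1, 2, 3],
})

# Columns not listed in `columns` (here 'Rank') are left untouched.
encoded = pd.get_dummies(toy, columns=['grid', 'constructor_id'])

print(sorted(encoded.columns))
print(encoded.astype(int))
```

Because every distinct value becomes its own column, high-cardinality fields such as `driver_id` can widen the frame considerably.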




In [11]:
df1.to_csv('df1.csv', index=False)