### F1 Emilia Romagna GP Prediction 

The 2024 Emilia Romagna GP prediction model was built from 20 mini models, one per driver, each projecting that driver's finishing position with a linear regression. The 20 predictions were then gathered and sorted from lowest to highest.

The following variables were used in the model:

Explained variable (Y):
- End of race position (pozycja)

Explanatory variables (X):
- Starting grid position (s)
- Qualifying result (q)
- Final Practice result (f)
- Did Not Finish indicator (dnf)
- Team Constructor standing (c)
- Position in the last race at the circuit (l)
- Did Not Finish indicator for the last race at the circuit (ld)

Note 1: As the 2023 Emilia Romagna GP was cancelled, the race results used for the l variable were taken from the 2023 Spanish GP, which was the 8th GP of last season. Imola 2024 is the 7th GP of the season; however, the 7th GP of the 2023 season was the Monaco GP, which is a very specific street circuit, hence the switch to the Spanish GP.
The situation is similar for the Chinese GP: the 2023 edition was cancelled, so the results from the 2023 Azerbaijan GP were used (both races were the 4th GP of their respective seasons).

Note 2: Not every driver was in F1 at the beginning of the 2023 season. In those cases, the results of the driver who occupied the same car were used (e.g. Nyck de Vries started the 2023 season but was later replaced by Daniel Ricciardo).

Note 3: In every race, each driver may suffer car problems, sometimes resulting in a DNF. The occurrence of such an event was therefore randomised for each driver. As on average 2 to 3 drivers have car problems each race, the probability was set at 12.5%.
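The 12.5% figure can be sanity-checked with a little arithmetic. The sketch below assumes that each of the 20 drivers has an independent 12.5% chance of a car problem (the independence assumption is not stated in the note), so the number of affected drivers follows a binomial distribution. The names `n_drivers`, `p_problem` and `prob_k_problems` are illustrative only.

```python
from math import comb

n_drivers = 20      # drivers on the grid
p_problem = 0.125   # per-driver probability of a car problem (from Note 3)

# Expected number of drivers with car problems in one race
expected_problems = n_drivers * p_problem
print(expected_problems)  # 2.5


def prob_k_problems(k: int) -> float:
    """Binomial probability that exactly k drivers have car problems."""
    return comb(n_drivers, k) * p_problem**k * (1 - p_problem) ** (n_drivers - k)


# Probability that exactly 2 or exactly 3 drivers are affected
prob_2_or_3 = prob_k_problems(2) + prob_k_problems(3)
print(round(prob_2_or_3, 3))
```

An expected value of 2.5 sits in the middle of the "2 to 3 drivers" range quoted in Note 3.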

### Randomising car problems

In [269]:
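import numpy as np
import pandas as pd

num_drivers = 20

# Fixing the seed so the draw is reproducible
np.random.seed(10)

# Drawing one uniform random value in [0, 1) per driver
random_values = np.random.rand(num_drivers)

# Flagging a car problem (1) when the draw falls below the 12.5% threshold, else 0
car_problem = (random_values < 0.125).astype(int)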

drivers = [f'Driver {i+1}' for i in range(num_drivers)]
cp = pd.DataFrame({'Driver': drivers, 'DNF': car_problem})

print(cp)

       Driver  DNF
0    Driver 1    0
1    Driver 2    1
2    Driver 3    0
3    Driver 4    0
4    Driver 5    0
5    Driver 6    0
6    Driver 7    0
7    Driver 8    0
8    Driver 9    0
9   Driver 10    1
10  Driver 11    0
11  Driver 12    0
12  Driver 13    1
13  Driver 14    0
14  Driver 15    0
15  Driver 16    0
16  Driver 17    0
17  Driver 18    0
18  Driver 19    0
19  Driver 20    0


### Importing libraries and data

In [247]:
import pandas as pd
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

os.chdir("/Users/sbbogdyn/Downloads")

df = pd.read_excel('Zeszyt1.xlsx')

#Removing the last row of df which is the Imola GP
df_pre = df[:-1]

### Performing forecasts for each driver. The methodology is showcased on the Max Verstappen example.

### Y1 - Max Verstappen

In [273]:
#Selecting Max Verstappen variables from the dataframe

y1_pre = df_pre['MV pozycja']
columns_to_select = ['smv', 'qmv', 'fmv', 'dnfmv', 'cmv', 'lmv', 'ldmv']

x1_pre = df_pre[columns_to_select]

# Splitting the data into training and test sets
x1_pre_train, x1_pre_test, y1_pre_train, y1_pre_test = train_test_split(x1_pre, y1_pre, test_size=0.16, random_state=1)

# Performing linear regression
model = LinearRegression()
model.fit(x1_pre_train, y1_pre_train)
y1_pred = model.predict(x1_pre_test)

#Extracting the Emilia Romagna GP row (the last row of df) to perform the forecast
df_prognose = df.iloc[[-1]]

y1_pro = df_prognose['MV pozycja']
columns_to_select = ['smv', 'qmv', 'fmv', 'dnfmv', 'cmv', 'lmv', 'ldmv']

x1_pro = df_prognose[columns_to_select]

#Forecasting driver final position
y1_act_pred = model.predict(x1_pro)

print(y1_act_pred)

[0.5]
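The cell above is repeated almost verbatim for each of the remaining 19 drivers, with only the column suffix changing. As a hedged sketch, the same steps could be wrapped in one function. The helper name `forecast_driver`, the `suffix` argument and the synthetic demo DataFrame are all assumptions introduced for illustration; the real notebook reads its data from `Zeszyt1.xlsx`. The sketch also reports the mean absolute error on the held-out rows, for which the original cell computes predictions (`y1_pred`) but never scores them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

PREFIXES = ['s', 'q', 'f', 'dnf', 'c', 'l', 'ld']  # variable prefixes listed above


def forecast_driver(df: pd.DataFrame, suffix: str):
    """Fit one driver's regression on all rows but the last and forecast the last row.

    Hypothetical helper: `suffix` is the driver code used in the column names,
    e.g. 'mv' for the columns 'smv', 'qmv', ... and 'MV pozycja'.
    Returns (forecast for the last row, MAE on the held-out test rows).
    """
    x_cols = [prefix + suffix for prefix in PREFIXES]
    y_col = f'{suffix.upper()} pozycja'

    history = df.iloc[:-1]    # past races used for fitting
    upcoming = df.iloc[[-1]]  # the race to forecast (kept as a one-row DataFrame)

    x_train, x_test, y_train, y_test = train_test_split(
        history[x_cols], history[y_col], test_size=0.16, random_state=1
    )
    model = LinearRegression().fit(x_train, y_train)

    test_mae = mean_absolute_error(y_test, model.predict(x_test))
    forecast = float(model.predict(upcoming[x_cols])[0])
    return forecast, test_mae


# Synthetic stand-in data (made up, NOT the real Zeszyt1.xlsx contents)
rng = np.random.default_rng(0)
n_races = 30
demo = pd.DataFrame({
    'smv': rng.integers(1, 21, n_races),
    'qmv': rng.integers(1, 21, n_races),
    'fmv': rng.integers(1, 21, n_races),
    'dnfmv': rng.integers(0, 2, n_races),
    'cmv': rng.integers(1, 11, n_races),
    'lmv': rng.integers(1, 21, n_races),
    'ldmv': rng.integers(0, 2, n_races),
})
demo['MV pozycja'] = rng.integers(1, 21, n_races)

forecast, test_mae = forecast_driver(demo, 'mv')
print(f'forecast={forecast:.2f}, hold-out MAE={test_mae:.2f}')
```

With the real data, a call such as `forecast_driver(df, 'sp')` would correspond to the Sergio Perez cell, assuming the column names follow the same pattern for every driver.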


### Y2 - Sergio Perez

In [274]:
y2_pre = df_pre['SP pozycja']
columns_to_select = ['ssp', 'qsp', 'fsp','dnfsp', 'csp', 'lsp', 'ldsp']

x2_pre = df_pre[columns_to_select]


x2_pre_train, x2_pre_test, y2_pre_train, y2_pre_test = train_test_split(x2_pre, y2_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x2_pre_train, y2_pre_train)
y2_pred = model.predict(x2_pre_test)

df_prognose = df.iloc[[-1]]

y2_pro = df_prognose['SP pozycja']
columns_to_select = ['ssp', 'qsp', 'fsp', 'dnfsp', 'csp', 'lsp', 'ldsp']

x2_pro = df_prognose[columns_to_select]

y2_act_pred = model.predict(x2_pro)

print(y2_act_pred)

[8.4]


### Y3 - Lewis Hamilton

In [275]:
y3_pre = df_pre['LH pozycja']
columns_to_select = ['slh', 'qlh', 'flh','dnflh', 'clh', 'llh', 'ldlh']

x3_pre = df_pre[columns_to_select]


x3_pre_train, x3_pre_test, y3_pre_train, y3_pre_test = train_test_split(x3_pre, y3_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x3_pre_train, y3_pre_train)
y3_pred = model.predict(x3_pre_test)

df_prognose = df.iloc[[-1]]

y3_pro = df_prognose['LH pozycja']
columns_to_select = ['slh', 'qlh', 'flh', 'dnflh', 'clh', 'llh', 'ldlh']

x3_pro = df_prognose[columns_to_select]

y3_act_pred = model.predict(x3_pro)

print(y3_act_pred)

[15.59558824]


### Y4 - George Russell

In [276]:
y4_pre = df_pre['GR pozycja']
columns_to_select = ['sgr', 'qgr', 'fgr','dnfgr', 'cgr', 'lgr', 'ldgr']

x4_pre = df_pre[columns_to_select]


x4_pre_train, x4_pre_test, y4_pre_train, y4_pre_test = train_test_split(x4_pre, y4_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x4_pre_train, y4_pre_train)
y4_pred = model.predict(x4_pre_test)

df_prognose = df.iloc[[-1]]

y4_pro = df_prognose['GR pozycja']
columns_to_select = ['sgr', 'qgr', 'fgr', 'dnfgr', 'cgr', 'lgr', 'ldgr']

x4_pro = df_prognose[columns_to_select]

y4_act_pred = model.predict(x4_pro)

print(y4_act_pred)

[14.42857143]


### Y5 - Charles Leclerc

In [278]:
y5_pre = df_pre['CL pozycja']
columns_to_select = ['scl', 'qcl', 'fcl','dnfcl', 'ccl', 'lcl', 'ldcl']

x5_pre = df_pre[columns_to_select]


x5_pre_train, x5_pre_test, y5_pre_train, y5_pre_test = train_test_split(x5_pre, y5_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x5_pre_train, y5_pre_train)
y5_pred = model.predict(x5_pre_test)

df_prognose = df.iloc[[-1]]

y5_pro = df_prognose['CL pozycja']
columns_to_select = ['scl', 'qcl', 'fcl','dnfcl', 'ccl', 'lcl', 'ldcl']

x5_pro = df_prognose[columns_to_select]

y5_act_pred = model.predict(x5_pro)

print(y5_act_pred)

[3.38625576]


### Y6 - Carlos Sainz

In [279]:
y6_pre = df_pre['CS pozycja']
columns_to_select = ['scs', 'qcs', 'fcs','dnfcs', 'ccs', 'lcs', 'ldcs']

x6_pre = df_pre[columns_to_select]


x6_pre_train, x6_pre_test, y6_pre_train, y6_pre_test = train_test_split(x6_pre, y6_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x6_pre_train, y6_pre_train)
y6_pred = model.predict(x6_pre_test)

df_prognose = df.iloc[[-1]]

y6_pro = df_prognose['CS pozycja']
columns_to_select = ['scs', 'qcs', 'fcs','dnfcs', 'ccs', 'lcs', 'ldcs']

x6_pro = df_prognose[columns_to_select]

y6_act_pred = model.predict(x6_pro)

print(y6_act_pred)

[9.21052632]


### Y7 - Lando Norris

In [280]:
y7_pre = df_pre['LN pozycja']
columns_to_select = ['sln', 'qln', 'fln','dnfln', 'cln', 'lln', 'ldln']

x7_pre = df_pre[columns_to_select]


x7_pre_train, x7_pre_test, y7_pre_train, y7_pre_test = train_test_split(x7_pre, y7_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x7_pre_train, y7_pre_train)
y7_pred = model.predict(x7_pre_test)

df_prognose = df.iloc[[-1]]

y7_pro = df_prognose['LN pozycja']
columns_to_select = ['sln', 'qln', 'fln','dnfln', 'cln', 'lln', 'ldln']

x7_pro = df_prognose[columns_to_select]

y7_act_pred = model.predict(x7_pro)

print(y7_act_pred)

[7.46938776]


### Y8 - Oscar Piastri

In [281]:
y8_pre = df_pre['OP pozycja']
columns_to_select = ['sop', 'qop', 'fop','dnfop', 'cop', 'lop', 'ldop']

x8_pre = df_pre[columns_to_select]


x8_pre_train, x8_pre_test, y8_pre_train, y8_pre_test = train_test_split(x8_pre, y8_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x8_pre_train, y8_pre_train)
y8_pred = model.predict(x8_pre_test)

df_prognose = df.iloc[[-1]]

y8_pro = df_prognose['OP pozycja']
columns_to_select = ['sop', 'qop', 'fop','dnfop', 'cop', 'lop', 'ldop']

x8_pro = df_prognose[columns_to_select]

y8_act_pred = model.predict(x8_pro)

print(y8_act_pred)

[19.15]


### Y9 - Fernando Alonso

In [282]:
y9_pre = df_pre['FA pozycja']
columns_to_select = ['sfa', 'qfa', 'ffa','dnffa', 'cfa', 'lfa', 'ldfa']

x9_pre = df_pre[columns_to_select]


x9_pre_train, x9_pre_test, y9_pre_train, y9_pre_test = train_test_split(x9_pre, y9_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x9_pre_train, y9_pre_train)
y9_pred = model.predict(x9_pre_test)

df_prognose = df.iloc[[-1]]

y9_pro = df_prognose['FA pozycja']
columns_to_select = ['sfa', 'qfa', 'ffa', 'dnffa', 'cfa', 'lfa', 'ldfa']

x9_pro = df_prognose[columns_to_select]

y9_act_pred = model.predict(x9_pro)

print(y9_act_pred)

[17.32060544]


### Y10 - Lance Stroll

In [283]:
y10_pre = df_pre['LS pozycja']
columns_to_select = ['sls', 'qls', 'fls','dnfls', 'cls', 'lls', 'ldls']

x10_pre = df_pre[columns_to_select]


x10_pre_train, x10_pre_test, y10_pre_train, y10_pre_test = train_test_split(x10_pre, y10_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x10_pre_train, y10_pre_train)
y10_pred = model.predict(x10_pre_test)

df_prognose = df.iloc[[-1]]

y10_pro = df_prognose['LS pozycja']
columns_to_select = ['sls', 'qls', 'fls', 'dnfls', 'cls', 'lls', 'ldls']

x10_pro = df_prognose[columns_to_select]

y10_act_pred = model.predict(x10_pro)

print(y10_act_pred)

[18.01276742]


### Y11 - Nico Hulkenberg

In [284]:
y11_pre = df_pre['NH pozycja']
columns_to_select = ['snh', 'qnh', 'fnh','dnfnh', 'cnh', 'lnh', 'ldnh']

x11_pre = df_pre[columns_to_select]


x11_pre_train, x11_pre_test, y11_pre_train, y11_pre_test = train_test_split(x11_pre, y11_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x11_pre_train, y11_pre_train)
y11_pred = model.predict(x11_pre_test)

df_prognose = df.iloc[[-1]]

y11_pro = df_prognose['NH pozycja']
columns_to_select = ['snh', 'qnh', 'fnh', 'dnfnh', 'cnh', 'lnh', 'ldnh']

x11_pro = df_prognose[columns_to_select]

y11_act_pred = model.predict(x11_pro)

print(y11_act_pred)

[10.74712644]


### Y12 - Kevin Magnussen

In [285]:
y12_pre = df_pre['KM pozycja']
columns_to_select = ['skm', 'qkm', 'fkm','dnfkm', 'ckm', 'lkm', 'ldkm']

x12_pre = df_pre[columns_to_select]


x12_pre_train, x12_pre_test, y12_pre_train, y12_pre_test = train_test_split(x12_pre, y12_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x12_pre_train, y12_pre_train)
y12_pred = model.predict(x12_pre_test)

df_prognose = df.iloc[[-1]]

y12_pro = df_prognose['KM pozycja']
columns_to_select = ['skm', 'qkm', 'fkm', 'dnfkm', 'ckm', 'lkm', 'ldkm']

x12_pro = df_prognose[columns_to_select]

y12_act_pred = model.predict(x12_pro)

print(y12_act_pred)

[9.63679784]


### Y13 - Yuki Tsunoda

In [286]:
y13_pre = df_pre['YT pozycja']
columns_to_select = ['syt', 'qyt', 'fyt','dnfyt', 'cyt', 'lyt', 'ldyt']

x13_pre = df_pre[columns_to_select]


x13_pre_train, x13_pre_test, y13_pre_train, y13_pre_test = train_test_split(x13_pre, y13_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x13_pre_train, y13_pre_train)
y13_pred = model.predict(x13_pre_test)

df_prognose = df.iloc[[-1]]

y13_pro = df_prognose['YT pozycja']
columns_to_select = ['syt', 'qyt', 'fyt', 'dnfyt', 'cyt', 'lyt', 'ldyt']

x13_pro = df_prognose[columns_to_select]

y13_act_pred = model.predict(x13_pro)

print(y13_act_pred)

[18.68487395]


### Y14 - Daniel Ricciardo

In [287]:
y14_pre = df_pre['DR pozycja']
columns_to_select = ['sdr', 'qdr', 'fdr','dnfdr', 'cdr', 'ldr', 'lddr']

x14_pre = df_pre[columns_to_select]


x14_pre_train, x14_pre_test, y14_pre_train, y14_pre_test = train_test_split(x14_pre, y14_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x14_pre_train, y14_pre_train)
y14_pred = model.predict(x14_pre_test)

df_prognose = df.iloc[[-1]]

y14_pro = df_prognose['DR pozycja']
columns_to_select = ['sdr', 'qdr', 'fdr', 'dnfdr', 'cdr', 'ldr', 'lddr']

x14_pro = df_prognose[columns_to_select]

y14_act_pred = model.predict(x14_pro)

print(y14_act_pred)

[16.59345458]


### Y15 - Alexander Albon

In [288]:
y15_pre = df_pre['AA pozycja']
columns_to_select = ['saa', 'qaa', 'faa','dnfaa', 'caa', 'laa', 'ldaa']

x15_pre = df_pre[columns_to_select]


x15_pre_train, x15_pre_test, y15_pre_train, y15_pre_test = train_test_split(x15_pre, y15_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x15_pre_train, y15_pre_train)
y15_pred = model.predict(x15_pre_test)

df_prognose = df.iloc[[-1]]

y15_pro = df_prognose['AA pozycja']
columns_to_select = ['saa', 'qaa', 'faa', 'dnfaa', 'caa', 'laa', 'ldaa']

x15_pro = df_prognose[columns_to_select]

y15_act_pred = model.predict(x15_pro)

print(y15_act_pred)

[26.05084746]


### Y16 - Logan Sargeant

In [289]:
y16_pre = df_pre['LSR pozycja']
columns_to_select = ['slsr', 'qlsr', 'flsr','dnflsr', 'clsr', 'llsr', 'ldlsr']

x16_pre = df_pre[columns_to_select]


x16_pre_train, x16_pre_test, y16_pre_train, y16_pre_test = train_test_split(x16_pre, y16_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x16_pre_train, y16_pre_train)
y16_pred = model.predict(x16_pre_test)

df_prognose = df.iloc[[-1]]

y16_pro = df_prognose['LSR pozycja']
columns_to_select = ['slsr', 'qlsr', 'flsr', 'dnflsr', 'clsr', 'llsr', 'ldlsr']

x16_pro = df_prognose[columns_to_select]

y16_act_pred = model.predict(x16_pro)

print(y16_act_pred)

[16.60911592]


### Y17 - Valtteri Bottas

In [290]:
y17_pre = df_pre['VB pozycja']
columns_to_select = ['svb', 'qvb', 'fvb','dnfvb', 'cvb', 'lvb', 'ldvb']

x17_pre = df_pre[columns_to_select]


x17_pre_train, x17_pre_test, y17_pre_train, y17_pre_test = train_test_split(x17_pre, y17_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x17_pre_train, y17_pre_train)
y17_pred = model.predict(x17_pre_test)

df_prognose = df.iloc[[-1]]

y17_pro = df_prognose['VB pozycja']
columns_to_select = ['svb', 'qvb', 'fvb', 'dnfvb', 'cvb', 'lvb', 'ldvb']

x17_pro = df_prognose[columns_to_select]

y17_act_pred = model.predict(x17_pro)

print(y17_act_pred)

[15.13505747]


### Y18 - Zhou Guanyu 

In [291]:
y18_pre = df_pre['GZ pozycja']
columns_to_select = ['sgz', 'qgz', 'fgz','dnfgz', 'cgz', 'lgz', 'ldgz']

x18_pre = df_pre[columns_to_select]


x18_pre_train, x18_pre_test, y18_pre_train, y18_pre_test = train_test_split(x18_pre, y18_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x18_pre_train, y18_pre_train)
y18_pred = model.predict(x18_pre_test)

df_prognose = df.iloc[[-1]]

y18_pro = df_prognose['GZ pozycja']
columns_to_select = ['sgz', 'qgz', 'fgz', 'dnfgz', 'cgz', 'lgz', 'ldgz']

x18_pro = df_prognose[columns_to_select]

y18_act_pred = model.predict(x18_pro)

print(y18_act_pred)

[10.83100734]


### Y19 - Esteban Ocon

In [292]:
y19_pre = df_pre['EO pozycja']
columns_to_select = ['seo', 'qeo', 'feo','dnfeo', 'ceo', 'leo', 'ldeo']

x19_pre = df_pre[columns_to_select]


x19_pre_train, x19_pre_test, y19_pre_train, y19_pre_test = train_test_split(x19_pre, y19_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x19_pre_train, y19_pre_train)
y19_pred = model.predict(x19_pre_test)

df_prognose = df.iloc[[-1]]

y19_pro = df_prognose['EO pozycja']
columns_to_select = ['seo', 'qeo', 'feo', 'dnfeo', 'ceo', 'leo', 'ldeo']

x19_pro = df_prognose[columns_to_select]

y19_act_pred = model.predict(x19_pro)

print(y19_act_pred)

[18.66044776]


### Y20 - Pierre Gasly

In [293]:
y20_pre = df_pre['PG pozycja']
columns_to_select = ['spg', 'qpg', 'fpg','dnfpg', 'cpg', 'lpg', 'ldpg']

x20_pre = df_pre[columns_to_select]


x20_pre_train, x20_pre_test, y20_pre_train, y20_pre_test = train_test_split(x20_pre, y20_pre, test_size=0.16, random_state=1)


model = LinearRegression()
model.fit(x20_pre_train, y20_pre_train)
y20_pred = model.predict(x20_pre_test)

df_prognose = df.iloc[[-1]]

y20_pro = df_prognose['PG pozycja']
columns_to_select = ['spg', 'qpg', 'fpg', 'dnfpg', 'cpg', 'lpg', 'ldpg']

x20_pro = df_prognose[columns_to_select]

y20_act_pred = model.predict(x20_pro)

print(y20_act_pred)

[10.46599787]


### Final Results

In [294]:
data = [
    ('Max Verstappen', y1_act_pred),
    ('Sergio Perez', y2_act_pred),
    ('Lewis Hamilton', y3_act_pred),
    ('George Russell', y4_act_pred),
    ('Charles Leclerc', y5_act_pred),
    ('Carlos Sainz', y6_act_pred),
    ('Lando Norris', y7_act_pred),
    ('Oscar Piastri', y8_act_pred),
    ('Fernando Alonso', y9_act_pred),
    ('Lance Stroll', y10_act_pred),
    ('Nico Hulkenberg', y11_act_pred),
    ('Kevin Magnussen', y12_act_pred),
    ('Yuki Tsunoda', y13_act_pred),
    ('Daniel Ricciardo', y14_act_pred),
    ('Alexander Albon', y15_act_pred),
    ('Logan Sargeant', y16_act_pred),
    ('Valtteri Bottas', y17_act_pred),
    ('Zhou Guanyu', y18_act_pred),
    ('Esteban Ocon', y19_act_pred),
    ('Pierre Gasly', y20_act_pred),
]

# Sorting the list based on the y_act_pred value
sorted_data = sorted(data, key=lambda x: x[1])

# Converting the list into a DataFrame
results = pd.DataFrame(sorted_data, columns=['Driver', 'Power Ranking'])

print(results)

              Driver          Power Ranking
0     Max Verstappen  [0.49999999999999933]
1    Charles Leclerc    [3.386255761161175]
2       Lando Norris    [7.469387755101922]
3       Sergio Perez    [8.399999999999967]
4       Carlos Sainz     [9.21052631578948]
5    Kevin Magnussen    [9.636797840307343]
6       Pierre Gasly   [10.465997866758913]
7    Nico Hulkenberg   [10.747126436781615]
8        Zhou Guanyu   [10.831007335447602]
9     George Russell   [14.428571428571427]
10   Valtteri Bottas   [15.135057471264407]
11    Lewis Hamilton   [15.595588235294109]
12  Daniel Ricciardo   [16.593454575930274]
13    Logan Sargeant    [16.60911592230045]
14   Fernando Alonso   [17.320605438795045]
15      Lance Stroll     [18.0127674157703]
16      Esteban Ocon   [18.660447761194035]
17      Yuki Tsunoda    [18.68487394957984]
18     Oscar Piastri   [19.149999999999988]
19   Alexander Albon   [26.050847457627118]
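
Because `model.predict` returns a one-element array, the 'Power Ranking' column above holds bracketed arrays rather than plain numbers. A minimal sketch of one way to tidy this up: flatten each prediction to a float and add an explicit predicted finishing position. The `Predicted position` column name is an assumption of this sketch, and the values are copied from the table above, rounded to four decimals.

```python
import numpy as np
import pandas as pd

# Predictions copied from the table above (rounded), wrapped in one-element
# arrays to mimic what model.predict returns
data = [
    ('Max Verstappen', np.array([0.5])),
    ('Sergio Perez', np.array([8.4])),
    ('Lewis Hamilton', np.array([15.5956])),
    ('George Russell', np.array([14.4286])),
    ('Charles Leclerc', np.array([3.3863])),
    ('Carlos Sainz', np.array([9.2105])),
    ('Lando Norris', np.array([7.4694])),
    ('Oscar Piastri', np.array([19.15])),
    ('Fernando Alonso', np.array([17.3206])),
    ('Lance Stroll', np.array([18.0128])),
    ('Nico Hulkenberg', np.array([10.7471])),
    ('Kevin Magnussen', np.array([9.6368])),
    ('Yuki Tsunoda', np.array([18.6849])),
    ('Daniel Ricciardo', np.array([16.5935])),
    ('Alexander Albon', np.array([26.0508])),
    ('Logan Sargeant', np.array([16.6091])),
    ('Valtteri Bottas', np.array([15.1351])),
    ('Zhou Guanyu', np.array([10.8310])),
    ('Esteban Ocon', np.array([18.6604])),
    ('Pierre Gasly', np.array([10.4660])),
]

# Flattening each one-element array to a plain float
flat = [(driver, float(np.ravel(pred)[0])) for driver, pred in data]

results = (
    pd.DataFrame(flat, columns=['Driver', 'Power Ranking'])
    .sort_values('Power Ranking')
    .reset_index(drop=True)
)

# Predicted finishing position: 1 for the lowest score, 20 for the highest
results['Predicted position'] = results.index + 1

print(results.round(2))
```

The raw regression scores are not bounded to the 1 to 20 range (0.5 for Verstappen, 26.05 for Albon), so the rank column is the easier one to read as a predicted finishing order.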
