# F1: Compute the F1 ranking

In this notebook, I try to predict the ranking of an F1 race based on results from earlier seasons, using the dataset [formula-1-world-championship-1950-2020](https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020).
What makes this an interesting subject is that 2022 brought the biggest rule changes in years. The dataset covers 1950 to 2022, but only the seasons 2011 to 2021 are used to train the model and 2022 to test it.
The model uses three features: the position after qualifying, the number of pit stops in the race and the constructor's position. The constructor's position is the team's place in the constructors' championship, which is based on the combined points of both of the team's drivers.

First, I download the dataset using the Kaggle API, installing the kaggle library first if it is not installed yet. Lastly, I unzip the archive.

In [1]:
# If you have not downloaded the dataset folder yourself, remove the triple quotes below
# and fill in your Kaggle username and API key.
"""
import importlib.util
import os
import subprocess
import sys
import zipfile

# The Kaggle API reads the credentials from these environment variables
os.environ['KAGGLE_USERNAME'] = 'api_name'
os.environ['KAGGLE_KEY'] = 'api_key'

# Install the kaggle library if it is not installed yet.
# Running pip as a subprocess ('python -m pip') is the supported way to call pip from a script.
def install(package):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])

if importlib.util.find_spec('kaggle') is None:
    install('kaggle')

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()

api.dataset_download_files('rohanrao/formula-1-world-championship-1950-2020', path='.')

# Unzip the archive into the folder that the rest of the notebook reads from
with zipfile.ZipFile('formula-1-world-championship-1950-2020.zip', 'r') as zip_ref:
    zip_ref.extractall('formula-1-world-championship-1950-2020')
"""


Now that the dataset has been downloaded, we can start using it.

## Files


In [2]:
import pandas as pd
pd.set_option('mode.chained_assignment', None)
def openFile(file):
    df = pd.read_csv(file)
    return df

In [3]:
file_name_qualifying = 'formula-1-world-championship-1950-2020/qualifying.csv'
qualifying = openFile(file_name_qualifying)
file_name_pitstops = 'formula-1-world-championship-1950-2020/pit_stops.csv'
pitstops = openFile(file_name_pitstops)
file_name_constructors_standing = 'formula-1-world-championship-1950-2020/constructor_standings.csv'
constructorsstanding = openFile(file_name_constructors_standing)
file_name_driverstanding = 'formula-1-world-championship-1950-2020/driver_standings.csv'
driverstanding = openFile(file_name_driverstanding)
file_name_races = 'formula-1-world-championship-1950-2020/races.csv'
races = openFile(file_name_races)
file_name_driver = 'formula-1-world-championship-1950-2020/drivers.csv'
drivers = openFile(file_name_driver)
file_name_results = 'formula-1-world-championship-1950-2020/results.csv'
result = openFile(file_name_results)
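The seven `openFile` calls above could also be written as one loop. The sketch below assumes the CSV files sit in the folder created by the download cell; `load_tables` is a hypothetical helper, not part of the notebook. It additionally passes `na_values='\\N'`, because this dataset writes missing values as `\N` (visible in the `q2` and `q3` columns of the merged data further down), which pandas does not treat as missing by default and keeps as a literal string.

```python
import os

import pandas as pd


def load_tables(folder, names):
    """Read <folder>/<name>.csv for every name into a dict of DataFrames.

    The dataset marks missing values with a backslash followed by N,
    so that marker is declared as NaN here.
    """
    return {
        name: pd.read_csv(os.path.join(folder, name + '.csv'), na_values='\\N')
        for name in names
    }


# Hypothetical usage with the folder from the download cell:
# tables = load_tables('formula-1-world-championship-1950-2020',
#                      ['qualifying', 'pit_stops', 'constructor_standings',
#                       'driver_standings', 'races', 'drivers', 'results'])
# qualifying = tables['qualifying']
```

With the real folder, the gaps in `q2`/`q3` would then show up as NaN instead of the string `\N`.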

## Data
The data used starts in 2011, because the pit stop data only starts at raceId 841, which is in 2011. That still leaves over 200 races with about 20 drivers each, so there are more than 4,000 samples.
Besides, data from the early seasons is less reliable, so a larger dataset is not automatically a better one.
In the code cell below, you can see that the pit stop data starts at raceId 841.
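`head()` only shows the first rows, so a more direct check is to look up the smallest `raceId` in the pit stop data in the races table. A sketch with two tiny made-up frames standing in for `pitstops` and `races` (same column names as the Kaggle files, invented values); `first_year_with_data` is a hypothetical helper name.

```python
import pandas as pd


def first_year_with_data(pitstops, races):
    """Return the season of the earliest race that has pit stop records."""
    first_race = pitstops['raceId'].min()
    return int(races.loc[races['raceId'] == first_race, 'year'].iloc[0])


# Made-up stand-ins for races.csv and pit_stops.csv
races_demo = pd.DataFrame({'raceId': [1, 2, 3], 'year': [2010, 2011, 2011]})
pitstops_demo = pd.DataFrame({'raceId': [2, 2, 3], 'driverId': [153, 30, 1]})

print(first_year_with_data(pitstops_demo, races_demo))  # 2011
```

Called on the notebook's own `pitstops` and `races` frames, this should return the season of raceId 841.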

In [4]:
pitstops.head()

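| | raceId | driverId | stop | lap | time | duration | milliseconds |
|---|---|---|---|---|---|---|---|
| 0 | 841 | 153 | 1 | 1 | 17:05:23 | 26.898 | 26898 |
| 1 | 841 | 30 | 1 | 1 | 17:05:52 | 25.021 | 25021 |
| 2 | 841 | 17 | 1 | 11 | 17:20:48 | 23.426 | 23426 |
| 3 | 841 | 4 | 1 | 12 | 17:22:34 | 23.251 | 23251 |
| 4 | 841 | 13 | 1 | 13 | 17:24:10 | 23.842 | 23842 |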


In [5]:
qualifying.head()

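| | qualifyId | raceId | driverId | constructorId | number | position | q1 | q2 | q3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 18 | 1 | 1 | 22 | 1 | 1:26.572 | 1:25.187 | 1:26.714 |
| 1 | 2 | 18 | 9 | 2 | 4 | 2 | 1:26.103 | 1:25.315 | 1:26.869 |
| 2 | 3 | 18 | 5 | 1 | 23 | 3 | 1:25.664 | 1:25.452 | 1:27.079 |
| 3 | 4 | 18 | 13 | 6 | 2 | 4 | 1:25.994 | 1:25.691 | 1:27.178 |
| 4 | 5 | 18 | 2 | 2 | 3 | 5 | 1:25.960 | 1:25.518 | 1:27.236 |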


## Constructor position
As stated earlier, the constructor's position is based on the combined points of both drivers of the team. That makes it a good indicator of the team's overall performance, and also of its reliability, because a car that does not finish scores no points.
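To illustrate how the constructor's position relates to the two drivers: in the seasons used here, a team's championship points are the sum of its drivers' points, and the standing ranks the teams by that total. A sketch on made-up numbers; it ignores the real championship's tie-break rules and penalties, and `constructor_positions` is a hypothetical helper name.

```python
import pandas as pd


def constructor_positions(driver_points):
    """Sum the points of each team's drivers and rank the teams (1 = most points)."""
    team_points = driver_points.groupby('constructorId')['points'].sum()
    # method='min' gives teams with equal points the same position (no tie-break)
    positions = team_points.rank(ascending=False, method='min').astype(int)
    return pd.DataFrame({'points': team_points, 'position': positions})


# Made-up points for three teams with two drivers each
drivers_demo = pd.DataFrame({
    'driverId':      [1, 2, 3, 4, 5, 6],
    'constructorId': [10, 10, 20, 20, 30, 30],
    'points':        [25.0, 10.0, 18.0, 15.0, 0.0, 1.0],
})
print(constructor_positions(drivers_demo))
```

Team 10 leads with 35 points even though its second driver scored less than both drivers of team 20, which is exactly the "both drivers count" effect the feature is meant to capture.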


In [6]:
constructorsstanding.head()

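| | constructorStandingsId | raceId | constructorId | points | position | positionText | wins |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 18 | 1 | 14.0 | 1 | 1 | 1 |
| 1 | 2 | 18 | 2 | 8.0 | 3 | 3 | 0 |
| 2 | 3 | 18 | 3 | 9.0 | 2 | 2 | 0 |
| 3 | 4 | 18 | 4 | 5.0 | 4 | 4 | 0 |
| 4 | 5 | 18 | 5 | 2.0 | 5 | 5 | 0 |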


## Pitstops
In a dry race, each driver must use at least two different tyre compounds. The available tyres are soft, medium and hard for dry conditions, and intermediate and wet for when the track is wet or it is raining. For each race, Pirelli picks three of its five dry compounds (C1 to C5) to serve as the hard, medium and soft tyre. A driver who finishes a dry race without having driven on at least two different compounds is disqualified from that race. It follows that each driver who finishes a dry race has to make at least one pit stop. For a more in-depth explanation, see [Wikipedia](https://en.wikipedia.org/wiki/Formula_One_tyres).

In [7]:
pitstops.head()

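| | raceId | driverId | stop | lap | time | duration | milliseconds |
|---|---|---|---|---|---|---|---|
| 0 | 841 | 153 | 1 | 1 | 17:05:23 | 26.898 | 26898 |
| 1 | 841 | 30 | 1 | 1 | 17:05:52 | 25.021 | 25021 |
| 2 | 841 | 17 | 1 | 11 | 17:20:48 | 23.426 | 23426 |
| 3 | 841 | 4 | 1 | 12 | 17:22:34 | 23.251 | 23251 |
| 4 | 841 | 13 | 1 | 13 | 17:24:10 | 23.842 | 23842 |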


## Pooling data
To combine and compare the data from the different files, I merge them into a single dataframe.

In [8]:
# pit_stops.csv has one row per stop, numbered from 1 upwards per driver and race,
# so the highest stop number is the number of pit stops a driver made in that race.
filtered_pitstops_stop = pitstops.groupby(['raceId', 'driverId']).agg({'stop': 'max'}).reset_index()

# Combine qualifying, driver, race, pit stop and standings data into one dataframe.
# qualifying.csv and driver_standings.csv both have a 'position' column: pandas renames them
# to position_x (qualifying) and position_y (driver standings). The constructor standing,
# merged last, keeps the plain name 'position'.
merged_csv = pd.merge(qualifying, drivers, on=['driverId'], how='inner')
merged_csv = pd.merge(merged_csv, races, on=['raceId'], how='inner')
merged_csv = pd.merge(merged_csv, filtered_pitstops_stop, on=['raceId', 'driverId'], how='inner')
merged_csv = pd.merge(merged_csv, driverstanding, on=['raceId', 'driverId'], how='inner')
merged_csv = pd.merge(merged_csv, constructorsstanding, on=['raceId', 'constructorId'], how='inner')
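Several of these files have a column called `position`, and by default pandas keeps overlapping columns apart with the suffixes `_x` (left frame) and `_y` (right frame). In the merge chain above, the qualifying `position` is on the left when `driver_standings.csv` is merged in, so it becomes `position_x` and the driver standing becomes `position_y`. A minimal sketch, on made-up frames, of passing explicit `suffixes` to `pd.merge` so each column's origin is visible in its name; the suffix strings are an arbitrary choice.

```python
import pandas as pd

# Made-up rows with the same key and column names as the Kaggle files
qualifying_demo = pd.DataFrame({'raceId': [1, 1], 'driverId': [7, 8], 'position': [5, 9]})
driver_standing_demo = pd.DataFrame({'raceId': [1, 1], 'driverId': [7, 8], 'position': [3, 4]})

merged_demo = pd.merge(
    qualifying_demo,
    driver_standing_demo,
    on=['raceId', 'driverId'],
    how='inner',
    # Name the overlapping columns after the file they came from
    suffixes=('_qualifying', '_driver_standing'),
)
print(merged_demo.columns.tolist())
# ['raceId', 'driverId', 'position_qualifying', 'position_driver_standing']
```

With named suffixes, a later `rename` or column selection no longer depends on remembering which frame was on the left.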

In [9]:
merged_csv.head()

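| | qualifyId | raceId | driverId | constructorId | number_x | position_x | q1 | q2 | q3 | driverRef | ... | driverStandingsId | points_x | position_y | positionText_x | wins_x | constructorStandingsId | points_y | position | positionText_y | wins_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4538 | 841 | 1 | 1 | 3 | 2 | 1:25.384 | 1:24.595 | 1:24.307 | hamilton | ... | 64691 | 18.0 | 2 | 2 | 0 | 24661 | 26.0 | 2 | 2 | 0 |
| 1 | 4540 | 841 | 18 | 1 | 4 | 4 | 1:25.886 | 1:24.957 | 1:24.779 | button | ... | 64695 | 8.0 | 6 | 6 | 0 | 24661 | 26.0 | 2 | 2 | 0 |
| 2 | 4555 | 841 | 5 | 205 | 20 | 19 | 1:29.254 | \N | \N | kovalainen | ... | 64707 | 0.0 | 18 | 18 | 0 | 24666 | 0.0 | 7 | 7 | 0 |
| 3 | 4556 | 841 | 15 | 205 | 21 | 20 | 1:29.342 | \N | \N | trulli | ... | 64702 | 0.0 | 13 | 13 | 0 | 24666 | 0.0 | 7 | 7 | 0 |
| 4 | 4544 | 841 | 13 | 6 | 6 | 8 | 1:26.031 | 1:25.611 | 1:25.599 | massa | ... | 64696 | 6.0 | 7 | 7 | 0 | 24663 | 18.0 | 3 | 3 | 0 |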


## Split into X and Y, train and test

### X
Train and test

In [10]:
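# Select the features plus the columns that identify each row.
# After the merges, 'position_y' comes from driver_standings.csv and 'position' from constructor_standings.csv.
X_full = merged_csv[
    ['year', 'circuitId', 'name', 'driverId', 'forename', 'surname', 'position_y', 'stop', 'position']].rename(
    columns={'position_y': 'position_qualy', 'position': 'position_constructor'})
# Train on 2011-2021 and test on 2022, keeping only the features position_qualy, stop and position_constructor
X_train = X_full[X_full['year'] <= 2021]
X_train_filtered = X_train.filter(['position_qualy', 'stop', 'position_constructor'])
# .copy() makes X_test an independent frame, so the columns added to it later do not write into X_full
X_test = X_full[X_full['year'] > 2021].copy()
X_test_filtered = X_test.filter(['position_qualy', 'stop', 'position_constructor'])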

In [11]:
X_test_filtered.head()

| | position_qualy | stop | position_constructor |
|---|---|---|---|
| 4242 | 3 | 3 | 2 |
| 4243 | 4 | 3 | 2 |
| 4244 | 9 | 3 | 5 |
| 4245 | 7 | 3 | 5 |
| 4246 | 17 | 3 | 7 |


In [12]:
X_test

| | year | circuitId | name | driverId | forename | surname | position_qualy | stop | position_constructor |
|---|---|---|---|---|---|---|---|---|---|
| 4242 | 2022 | 3 | Bahrain Grand Prix | 1 | Lewis | Hamilton | 3 | 3 | 2 |
| 4243 | 2022 | 3 | Bahrain Grand Prix | 847 | George | Russell | 4 | 3 | 2 |
| 4244 | 2022 | 3 | Bahrain Grand Prix | 4 | Fernando | Alonso | 9 | 3 | 5 |
| 4245 | 2022 | 3 | Bahrain Grand Prix | 839 | Esteban | Ocon | 7 | 3 | 5 |
| 4246 | 2022 | 3 | Bahrain Grand Prix | 807 | Nico | Hülkenberg | 17 | 3 | 7 |
| 4247 | 2022 | 3 | Bahrain Grand Prix | 840 | Lance | Stroll | 12 | 3 | 7 |
| 4248 | 2022 | 3 | Bahrain Grand Prix | 815 | Sergio | Pérez | 18 | 3 | 10 |
| 4249 | 2022 | 3 | Bahrain Grand Prix | 830 | Max | Verstappen | 19 | 3 | 10 |
| 4250 | 2022 | 3 | Bahrain Grand Prix | 817 | Daniel | Ricciardo | 14 | 3 | 9 |
| 4251 | 2022 | 3 | Bahrain Grand Prix | 846 | Lando | Norris | 15 | 3 | 9 |


### Y
Train and test

In [13]:
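# Target: after the merges, 'position_x' is the position column from qualifying.csv
Y = merged_csv[['year', 'position_x']].rename({'position_x': 'position_finish'}, axis=1)
# Split into train (2011-2021) and test (2022)
Y_train = Y[Y['year'] <= 2021]
Y_test = Y[Y['year'] > 2021]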

In [14]:
Y_test.head()

| | year | position_finish |
|---|---|---|
| 4242 | 2022 | 5 |
| 4243 | 2022 | 9 |
| 4244 | 2022 | 8 |
| 4245 | 2022 | 11 |
| 4246 | 2022 | 17 |


## (De)normalize data
Before fitting the model, I normalize the data with min-max scaling, so that every column lies between 0 and 1. With all features on the same scale, the regression coefficients can be compared with each other.
I do this using this formula:
$x_{norm} = \frac{x - \min(x)}{\max(x) - \min(x)}$
To show a clear picture of the result and ranking later on, I also denormalize the data.
For that, I use the same formula, rearranged:
$x = x_{norm} \cdot (\max(x) - \min(x)) + \min(x)$

In [15]:
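def normalize(data):
    # Min-max scale every column to [0, 1]. The pandas methods work per column,
    # whereas np.min/np.max on a DataFrame rely on axis=None behaviour that recent pandas versions warn will change.
    return (data - data.min()) / (data.max() - data.min())

def denormalize(data, data_min, data_max):
    # Inverse of normalize: map values in [0, 1] back to the range [data_min, data_max]
    return (data * (data_max - data_min)) + data_min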

I will apply the normalize function to the data.



In [16]:
import numpy as np
# Training set
X_norm_train = normalize(X_train_filtered)
# Column of ones: its coefficient acts as the intercept (alpha) of the regression
X_norm_train.insert(0, 'ones', 1)
Y_norm_train = normalize(Y_train)

# Test set (normalized with its own minimum and maximum)
X_norm_test = normalize(X_test_filtered)
X_norm_test.insert(0, 'ones', 1)
Y_norm_test = normalize(Y_test)

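The normalization cell above scales the training set and the test set each with their own minimum and maximum. A common alternative, sketched here as an assumption and not what this notebook does, is to take the minimum and maximum from the training data only, reuse them for the test data, and denormalize predictions with the range the model was trained on. The function names `fit_min_max`, `apply_min_max` and `invert_min_max` and the small frames are hypothetical.

```python
import pandas as pd


def fit_min_max(train):
    """Remember the per-column minimum and maximum of the training data."""
    return train.min(), train.max()


def apply_min_max(data, col_min, col_max):
    """Scale with the training range; test values may fall outside [0, 1].

    A column that is constant in the training data would divide by zero here.
    """
    return (data - col_min) / (col_max - col_min)


def invert_min_max(scaled, col_min, col_max):
    """Map scaled values back to the original units."""
    return scaled * (col_max - col_min) + col_min


# Made-up feature values
train = pd.DataFrame({'position_qualy': [1, 11, 21], 'stop': [1, 2, 3]})
test = pd.DataFrame({'position_qualy': [6, 16], 'stop': [2, 4]})

col_min, col_max = fit_min_max(train)
test_scaled = apply_min_max(test, col_min, col_max)
# position_qualy -> [0.25, 0.75], stop -> [0.5, 1.5] (4 stops lies outside the training range)
print(test_scaled)
print(invert_min_max(test_scaled, col_min, col_max))  # back to the original values
```

For the target, the same idea would mean denormalizing predictions with `Y_train['position_finish'].min()` and `.max()`, so that the test set never influences the scaling.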


## Regression
Now, I write the function that calculates the ranking with the linear regression algorithm. I do this without a machine learning library, using only NumPy for the matrix operations.
I have to make sure I don't overfit the data by using too many degrees of freedom (free parameters).
With a single feature, the model is $Y_i = \alpha + \beta X_i + \varepsilon_i$, where $X_i$ is the input, $Y_i$ is the output and $\varepsilon_i$ is the error; $\beta$ is the coefficient and $\alpha$ is the intercept.
With several features, the model becomes $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \varepsilon_i$, where $\beta_0$ takes the role of the intercept. This is multiple linear regression; as long as the features are linearly independent, it has a unique optimal (least-squares) set of parameters.
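Writing the multi-feature equation for all $n$ samples at once gives the matrix form that the optimal-parameter formula works with. The first column of $X$ is all ones, which is what the `X_norm_train.insert(0, 'ones', 1)` line adds, so $\beta_0$ plays the role of the intercept $\alpha$. With the three features used here, $p = 3$ and $\beta$ has four entries.

```latex
\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix}
1 & X_{11} & \cdots & X_{1p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & X_{n1} & \cdots & X_{np}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\qquad \text{i.e.} \qquad
Y = X\beta + \varepsilon
```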

### Unique optimal parameters
Using $\hat{\beta} = (X^T X)^{-1} X^T Y$, I can calculate the optimal parameters for the regression. When the columns of $X$ are linearly independent, $(X^T X)^{-1} X^T$ is exactly the Moore-Penrose pseudo-inverse of $X$, which NumPy provides as np.linalg.pinv(X) [np.linalg.pinv](https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html). Then, I multiply this pseudo-inverse by Y, using the built-in function np.matmul() [np.matmul](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html).

In [17]:
def unique_optimal_parameters(X, Y):
    return np.matmul(np.linalg.pinv(X), Y)
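One way to check that the pseudo-inverse gives the least-squares solution is to generate data from known coefficients and compare `np.linalg.pinv` with solving the normal equations $X^T X \beta = X^T Y$ directly and with `np.linalg.lstsq`. A sketch on synthetic data; the coefficients, sample size and noise level are arbitrary choices.

```python
import numpy as np

# Synthetic data with known coefficients
rng = np.random.default_rng(0)
n = 200
# Design matrix: a column of ones for the intercept plus three features drawn from [0, 1)
X = np.column_stack([np.ones(n), rng.random((n, 3))])
true_beta = np.array([0.1, 0.5, 0.0, 0.4])
Y = X @ true_beta + rng.normal(scale=0.05, size=n)

# 1) Pseudo-inverse, as in unique_optimal_parameters
beta_pinv = np.matmul(np.linalg.pinv(X), Y)
# 2) Normal equations (X^T X) beta = X^T Y, solved without forming the inverse explicitly
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)
# 3) NumPy's dedicated least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_pinv, beta_normal), np.allclose(beta_pinv, beta_lstsq))  # True True
print(np.round(beta_pinv, 2))  # close to true_beta, up to the added noise
```

All three routes agree for a full-rank $X$; `pinv` and `lstsq` have the advantage that they still return a (minimum-norm) answer when columns are collinear, where `solve` on $X^T X$ would fail.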

## Beta
Using the formula above, I calculate beta, the vector of coefficients used in the regression formula.

In [18]:
beta = unique_optimal_parameters(X_norm_train, Y_norm_train['position_finish'])
beta

array([0.07973602, 0.46047464, 0.00195947, 0.40552228])

## Test
Let's complete the puzzle by putting all the pieces together.
As before, I use np.matmul, this time to multiply the test feature matrix X by the coefficient vector beta.
To make the result easier to interpret, I also denormalize the predicted values.

In [19]:
def predict(X, beta):
    return np.matmul(X, beta)

In [20]:
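# Predict the normalized values for the 2022 races (beta is 1-D, so no transpose is needed)
X_pred = predict(X_norm_test, beta)
# Map the predictions back to positions, using the range of position_qualy in the test set
X_pred_denorm = denormalize(X_pred, np.min(X_test_filtered['position_qualy']), np.max(X_test_filtered['position_qualy'])).values.reshape(-1).astype(float)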

Next, I add an extra column with the raw prediction values.
I also rename the column 'name' (the name of the Grand Prix) to 'circuit'.
Lastly, I add the column 'actual' to compare the actual and the predicted ranking.

In [21]:
X_test['prediction'] = X_pred_denorm.tolist()
# Rename 'name' (the Grand Prix name) to 'circuit'
X_test.rename(columns={'name':'circuit'}, inplace=True)
X_test['actual'] = Y_test['position_finish'].values.reshape(-1).astype(int)

In [22]:
X_test.head()

| | year | circuitId | circuit | driverId | forename | surname | position_qualy | stop | position_constructor | prediction | actual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4242 | 2022 | 3 | Bahrain Grand Prix | 1 | Lewis | Hamilton | 3 | 3 | 2 | 4.329266 | 5 |
| 4243 | 2022 | 3 | Bahrain Grand Prix | 847 | George | Russell | 4 | 3 | 2 | 4.789741 | 9 |
| 4244 | 2022 | 3 | Bahrain Grand Prix | 4 | Fernando | Alonso | 9 | 3 | 5 | 9.660422 | 8 |
| 4245 | 2022 | 3 | Bahrain Grand Prix | 839 | Esteban | Ocon | 7 | 3 | 5 | 8.739472 | 11 |
| 4246 | 2022 | 3 | Bahrain Grand Prix | 807 | Nico | Hülkenberg | 17 | 3 | 7 | 15.056424 | 17 |


## Result analysis
Let's calculate the difference between the predicted and the actual ranking, and express it as a percentage of the actual position.

In [23]:
def result_analysis(X_test):
    # Add, in place, the difference between prediction and actual position (raw and ranked)
    # and both differences as a percentage of the actual position
    X_test['difference_prediction'] = X_test['prediction'] - X_test['actual']
    X_test['difference_ranking'] = X_test['predicted_ranking'] - X_test['actual']
    X_test['percentage_difference_prediction'] = ((X_test['difference_prediction'] / X_test['actual'] * 100).round(2)).astype(str) + '%'
    X_test['percentage_difference_ranking'] = ((X_test['difference_ranking'] / X_test['actual'] * 100).round(2)).astype(str) + '%'

In [24]:
# Rank the predictions per circuit, i.e. per race (1 = lowest predicted value)
X_test['predicted_ranking'] = X_test.groupby(['circuitId'])['prediction'].rank(ascending=True).astype(int)
result_analysis(X_test)

In [25]:
X_test.head()

| | year | circuitId | circuit | driverId | forename | surname | position_qualy | stop | position_constructor | prediction | actual | predicted_ranking | difference_prediction | difference_ranking | percentage_difference_prediction | percentage_difference_ranking |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4242 | 2022 | 3 | Bahrain Grand Prix | 1 | Lewis | Hamilton | 3 | 3 | 2 | 4.329266 | 5 | 3 | -0.670734 | -2 | -13.41% | -40.0% |
| 4243 | 2022 | 3 | Bahrain Grand Prix | 847 | George | Russell | 4 | 3 | 2 | 4.789741 | 9 | 4 | -4.210259 | -5 | -46.78% | -55.56% |
| 4244 | 2022 | 3 | Bahrain Grand Prix | 4 | Fernando | Alonso | 9 | 3 | 5 | 9.660422 | 8 | 10 | 1.660422 | 2 | 20.76% | 25.0% |
| 4245 | 2022 | 3 | Bahrain Grand Prix | 839 | Esteban | Ocon | 7 | 3 | 5 | 8.739472 | 11 | 7 | -2.260528 | -4 | -20.55% | -36.36% |
| 4246 | 2022 | 3 | Bahrain Grand Prix | 807 | Nico | Hülkenberg | 17 | 3 | 7 | 15.056424 | 17 | 14 | -1.943576 | -3 | -11.43% | -17.65% |


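Let's make the set easier to read.
The predictions have already been converted to positions from 1 to 20 per race in the predicted_ranking column; below, I sort the rows by race and predicted ranking and show that ranking next to the actual result.
Several drivers can have a prediction close to the same position, but only one driver can finish in each place. At the Australian Grand Prix, for example, Lewis Hamilton (5.21) and Sergio Pérez (5.61) both have a predicted value between 5 and 6. Because only two predictions in that race are smaller than theirs, they are ranked 3rd and 4th, and neither of them is ranked 5th.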
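With `rank(ascending=True)`, two exactly equal predictions in the same race both get the average of their positions (for example 3.5), and `.astype(int)` then truncates that to 3, so two drivers could share a predicted position. A sketch of an alternative, assuming ties may simply be broken by row order: `method='first'` hands out each position from 1 to n exactly once per race. The frame below is made up.

```python
import pandas as pd

# Made-up predictions for two races; race 1 contains an exact tie
results = pd.DataFrame({
    'raceId':     [1, 1, 1, 1, 2, 2],
    'prediction': [5.2, 3.8, 5.2, 2.5, 7.0, 1.0],
})

grouped = results.groupby('raceId')['prediction']
# Default method='average': both 5.2 values get 3.5, which astype(int) turns into 3
results['rank_average'] = grouped.rank(ascending=True).astype(int)
# method='first': ties are broken by order of appearance, so positions are unique per race
results['rank_first'] = grouped.rank(ascending=True, method='first').astype(int)
print(results)
```

In race 1, `rank_average` gives `[3, 2, 3, 1]` while `rank_first` gives `[3, 2, 4, 1]`.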

In [26]:
# Sort by circuit name, then by predicted ranking, and keep the columns used in the comparison
X_test = X_test.sort_values(by=['circuit','predicted_ranking'])
X_test = X_test[['year','circuit','forename', 'surname', 'position_qualy', 'stop', 'position_constructor', 'prediction', 'predicted_ranking','actual', 'difference_prediction', 'difference_ranking', 'percentage_difference_prediction', 'percentage_difference_ranking']]

In [27]:
X_test

| | year | circuit | forename | surname | position_qualy | stop | position_constructor | prediction | predicted_ranking | actual | difference_prediction | difference_ranking | percentage_difference_prediction | percentage_difference_ranking |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4294 | 2022 | Australian Grand Prix | Charles | Leclerc | 1 | 1 | 1 | 2.514984 | 1 | 1 | 1.514984 | 0 | 151.5% | 0.0% |
| 4280 | 2022 | Australian Grand Prix | George | Russell | 2 | 1 | 2 | 3.831562 | 2 | 6 | -2.168438 | -4 | -36.14% | -66.67% |
| 4279 | 2022 | Australian Grand Prix | Lewis | Hamilton | 5 | 1 | 2 | 5.212986 | 3 | 5 | 0.212986 | -2 | 4.26% | -40.0% |
| 4283 | 2022 | Australian Grand Prix | Sergio | Pérez | 4 | 1 | 3 | 5.608613 | 4 | 3 | 2.608613 | 1 | 86.95% | 33.33% |
| 4284 | 2022 | Australian Grand Prix | Max | Verstappen | 6 | 1 | 3 | 6.529563 | 5 | 2 | 4.529563 | 3 | 226.48% | 150.0% |
| 4286 | 2022 | Australian Grand Prix | Lando | Norris | 8 | 1 | 4 | 8.306615 | 6 | 4 | 4.306615 | 2 | 107.67% | 50.0% |
| 4282 | 2022 | Australian Grand Prix | Esteban | Ocon | 7 | 1 | 5 | 8.702243 | 7 | 8 | 0.702243 | -1 | 8.78% | -12.5% |
| 4285 | 2022 | Australian Grand Prix | Daniel | Ricciardo | 11 | 1 | 4 | 9.688039 | 8 | 7 | 2.688039 | 1 | 38.4% | 14.29% |
| 4287 | 2022 | Australian Grand Prix | Valtteri | Bottas | 10 | 1 | 6 | 10.939769 | 9 | 12 | -1.060231 | -3 | -8.84% | -25.0% |
| 4289 | 2022 | Australian Grand Prix | Kevin | Magnussen | 9 | 1 | 7 | 11.335397 | 10 | 17 | -5.664603 | -7 | -33.32% | -41.18% |


The prediction is done. Judging by the prediction, ranking, difference and percentage difference, the prediction is decent. Formula 1 is a sport with many important details and human errors, which makes it hard to predict the results exactly. This shows in the rows where the percentage differences are extremely high; for those, I would have to look into the race to analyse what happened.
The positions and the constructor standings fluctuate a lot at the beginning of the season. We can see this for Max Verstappen and Red Bull: he started the season either winning or DNF'ing (did not finish, because of a problem with his car). I expect the prediction to get better further into the season, as the points will be more spread out.

The predictions never go below 2 or above 18, so the differences at those positions are bigger than in the middle of the field.

## Compare with sklearn.linear_model.LinearRegression
If delta is positive, my model was closer to the real value. If delta is negative, the sklearn model was closer to the real value.

In [29]:
from sklearn.linear_model import LinearRegression
# LinearRegression fits its own intercept (fit_intercept=True by default),
# so the coefficient it reports for the 'ones' column is 0
linreg = LinearRegression().fit(X_norm_train, Y_norm_train['position_finish'])
print('linear model coeff (w): {}'.format(linreg.coef_))
sklearn_pred = denormalize(linreg.predict(X_norm_test), np.min(X_test_filtered['position_qualy']), np.max(X_test_filtered['position_qualy']))
# Put the sklearn prediction, my own prediction and the actual position side by side.
# Note: linreg.predict returns rows in the order of X_norm_test, while X_test was re-sorted in In [26],
# and .tolist() pairs the rows by position rather than by index.
sklearn_pred = pd.DataFrame(sklearn_pred, columns=['skl_prediction'])
sklearn_pred['own_prediction'] = X_test['prediction'].tolist()
sklearn_pred['actual'] = X_test['actual'].tolist()
sklearn_pred['difference_skl'] = sklearn_pred['skl_prediction'] - sklearn_pred['actual']
sklearn_pred['difference_own'] = sklearn_pred['own_prediction'] - sklearn_pred['actual']
# delta > 0: my model was closer to the actual position; delta < 0: sklearn was closer
sklearn_pred['delta'] = np.abs(sklearn_pred['difference_skl']) - np.abs(sklearn_pred['difference_own'])
skl_count = (sklearn_pred['delta'] < 0).sum()
own_count = (sklearn_pred['delta'] > 0).sum()
print('sklearn model was closer to the real value {} times.'.format(skl_count), 'My own model was closer to the real value {} times'.format(own_count), 'My own model was better than sklearn {} times.'.format(own_count - skl_count))
sklearn_pred

linear model coeff (w): [0.         0.46047464 0.00195947 0.40552228]
sklearn model was closer to the real value 11 times. My own model was closer to the real value 44 times My own model was better than sklearn 33 times


| | skl_prediction | own_prediction | actual | difference_skl | difference_own | delta |
|---|---|---|---|---|---|---|
| 0 | 4.329266 | 2.514984 | 1 | 3.329266 | 1.514984 | 1.814282 |
| 1 | 4.789741 | 3.831562 | 6 | -1.210259 | -2.168438 | -0.958179 |
| 2 | 9.660422 | 5.212986 | 5 | 4.660422 | 0.212986 | 4.447436 |
| 3 | 8.739472 | 5.608613 | 3 | 5.739472 | 2.608613 | 3.130859 |
| 4 | 15.056424 | 6.529563 | 2 | 13.056424 | 4.529563 | 8.526861 |
| 5 | 12.754051 | 8.306615 | 4 | 8.754051 | 4.306615 | 4.447436 |
| 6 | 18.085206 | 8.702243 | 8 | 10.085206 | 0.702243 | 9.382964 |
| 7 | 18.545681 | 9.688039 | 7 | 11.545681 | 2.688039 | 8.857643 |
| 8 | 15.387205 | 10.939769 | 12 | 3.387205 | -1.060231 | 2.326974 |
| 9 | 15.84768 | 11.335397 | 17 | -1.15232 | -5.664603 | -4.512283 |


Of the 55 data points, the sklearn model was closer to the real value 11 times, while my own model was closer 44 times.
As stated earlier, the prediction should get better further into the season, and this also shows in the delta values.
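`X_test` was re-sorted by circuit and predicted ranking in In [26], while `linreg.predict` returns its values in the original row order of `X_norm_test`, so copying columns with `.tolist()` pairs rows by position rather than by driver. A sketch, with made-up values, of giving the predictions their original index first so that pandas lines the rows up by label:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins. original_index is the row order of X_norm_test (and of the
# sklearn predictions); X_test holds the same rows but has been re-sorted.
original_index = pd.Index([4242, 4243, 4244])
X_test = pd.DataFrame(
    {'prediction': [3.0, 8.0, 5.0], 'actual': [4, 9, 5]}, index=original_index
).sort_values('prediction', ascending=False)
skl_values = np.array([3.0, 8.0, 5.0])  # stands in for the denormalized linreg.predict(X_norm_test)

# Positional pairing would use X_test['prediction'].tolist(), which is [8.0, 5.0, 3.0] here.
# Label-based pairing: give the predictions the index of the rows they were computed from ...
comparison = pd.DataFrame({'skl_prediction': skl_values}, index=original_index)
# ... and assign Series rather than lists, so pandas matches rows by index label
comparison['own_prediction'] = X_test['prediction']
comparison['actual'] = X_test['actual']
comparison['delta'] = (comparison['skl_prediction'] - comparison['actual']).abs() - (
    comparison['own_prediction'] - comparison['actual']).abs()
print(comparison)
```

Since both fits report the same coefficients for the three features (sklearn replaces the `ones` coefficient by its own intercept), their predictions would be expected to agree closely once the rows are paired by index.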

## Conclusion
I started this project hoping to simulate a full F1 season. With the features I chose (pit stops and the driver and team scores), I was not able to predict the results of races that have not been run yet. That makes sense: the scores change from race to race (which I could fill in with the predicted ranking), but I cannot predict the number of pit stops. Under the new regulations, the cars use new 18-inch tyres, which might change the number of pit stops. Taking the average over previous years would not be fair either, because a single outlier can shift it a lot, and new circuits have little data. Tyre management is a very important part of Formula 1, and since some drivers changed teams, others are in their first F1 season, and much more has changed, the number of pit stops would not be a trustworthy feature.

## Sources
- [Formula 1: The Complete Guide to the Season](https://www.formula1.com/en/results.html)
- [Formula 1 Race Predictor](https://towardsdatascience.com/formula-1-race-predictor-5d4bfae887da)
- [Multivariate Linear Regression in Python Without Scikit-Learn using Normal Equation](https://medium.com/@siddhantagarwal99/multivariate-linear-regression-in-python-without-scikit-learn-using-normal-equation-bc3ab4334f11)
- [formula-1-world-championship-1950-2020](https://www.kaggle.com/datasets/rohanrao/formula-1-world-championship-1950-2020)