# The baseline model
We start by creating a baseline model against which we can compare our later, improved models. First we import the necessary packages and fix the random seed for reproducibility.

In [11]:
import numpy as np
from utils import DataAggregator

In [12]:
np.random.seed(42)
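
np.random.seed fixes NumPy's global legacy random state, which every later np.random call in the notebook then shares. As a minimal sketch of an alternative, a local Generator created with np.random.default_rng keeps the seeded state in one object. The names rng_a, rng_b and the probabilities below are illustrative placeholders, not values from this notebook, and switching to a Generator would change the specific draws, so the figures reported here would not be reproduced exactly.

```python
import numpy as np

# Two generators built from the same seed produce identical draws,
# independent of any global state set through np.random.seed.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

outcomes = ["H", "D", "A"]
placeholder_probabilities = [0.5, 0.25, 0.25]  # illustrative only

sample_a = rng_a.choice(outcomes, size=5, p=placeholder_probabilities)
sample_b = rng_b.choice(outcomes, size=5, p=placeholder_probabilities)

# Same seed, same sequence of draws.
print((sample_a == sample_b).all())
```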

In [13]:
data_aggregator = DataAggregator()
data = data_aggregator.get_data(["E0"])
data.head()

Unnamed: 0,Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR,...,AF,HC,AC,HY,AY,HR,AR,B365H,B365D,B365A
0,E0,15/08/09,Aston Villa,Wigan,0,2,A,0,1,A,...,14,4,6,2,2,0,0,1.67,3.6,5.5
1,E0,15/08/09,Blackburn,Man City,0,2,A,0,1,A,...,9,5,4,2,1,0,0,3.6,3.25,2.1
2,E0,15/08/09,Bolton,Sunderland,0,1,A,0,1,A,...,10,4,7,2,1,0,0,2.25,3.25,3.25
3,E0,15/08/09,Chelsea,Hull,2,1,H,1,1,D,...,15,12,4,1,2,0,0,1.17,6.5,21.0
4,E0,15/08/09,Everton,Arsenal,1,6,A,0,3,A,...,13,4,9,0,0,0,0,3.2,3.25,2.3


In [14]:
# value_counts sorts by frequency, not by label, so select each outcome by label.
home_percentage, draw_percentage, away_percentage = data["FTR"].value_counts(normalize=True)[["H", "D", "A"]]

In [15]:
print(f"The share of games won by the home team is {home_percentage:.2%}")
print(f"The share of games ending in a draw is {draw_percentage:.2%}")
print(f"The share of games won by the away team is {away_percentage:.2%}")

The share of games won by the home team is 45.95%
The share of games ending in a draw is 29.49%
The share of games won by the away team is 24.56%


Now we can implement the baseline model. For each game it picks an outcome at random, weighted by the probabilities computed above. This is easily done with NumPy's np.random.choice(a, p=probabilities).

In [None]:
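possible_outcomes = ["H", "D", "A"]
outcome_probabilities = [home_percentage, draw_percentage, away_percentage]
# Draw one prediction per game in a single vectorized call instead of looping over rows.
data["prediction"] = np.random.choice(possible_outcomes, size=len(data), p=outcome_probabilities)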

In [17]:
accuracy, won = data_aggregator.calculate_accuracy(data, "FTR", "prediction")

In [None]:
print(f"""The accuracy of the model is {accuracy:.2%}
The model predicted {accuracy*len(data):.0f} out of {len(data)} games correctly
""")

The accuracy of the model is 35.47%
The model predicted 2994 out of 8440 games correctly
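
As a sanity check on this figure: if predictions are drawn independently of the real result, and with the same probability for each outcome as its observed share, the expected accuracy is the sum of the squared shares. The sketch below plugs in the three percentages printed earlier (rounded, so the result is approximate). The variable names are illustrative.

```python
import numpy as np

# Outcome shares as printed above (rounded to four decimals).
shares = np.array([0.4595, 0.2949, 0.2456])

# P(correct) = sum over outcomes of P(actual = o) * P(predicted = o).
expected_accuracy = float(np.sum(shares ** 2))

# Standard error of the measured accuracy over n independent games.
n_games = 8440
standard_error = float(np.sqrt(expected_accuracy * (1 - expected_accuracy) / n_games))

print(f"Expected accuracy: {expected_accuracy:.2%} +/- {standard_error:.2%}")
```

Under these assumptions the expected accuracy is roughly 35.8% with a standard error of about 0.5 percentage points, so the measured 35.47% is what a random baseline of this kind should produce.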



In [19]:
print(f"With this model, the total return would be {won:.2f}€")

With this model, the total return would be -5009.83€
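
How won is computed is not shown here, since it happens inside DataAggregator.calculate_accuracy. The sketch below illustrates one common convention, which is an assumption and not necessarily what the helper does: stake 1€ on the predicted outcome of every game at the Bet365 decimal odds (B365H, B365D, B365A), win the odds minus the stake when the prediction is correct, and lose the stake otherwise. The function flat_stake_profit is hypothetical. The rows are the five games shown by data.head() above.

```python
# (FTR, B365H, B365D, B365A) for the five games shown by data.head().
games = [
    ("A", 1.67, 3.60, 5.50),
    ("A", 3.60, 3.25, 2.10),
    ("A", 2.25, 3.25, 3.25),
    ("H", 1.17, 6.50, 21.0),
    ("A", 3.20, 3.25, 2.30),
]

# Position of each outcome's odds inside a game tuple.
ODDS_POSITION = {"H": 1, "D": 2, "A": 3}


def flat_stake_profit(games, predictions, stake=1.0):
    """Hypothetical helper: net profit from staking `stake` on each prediction."""
    profit = 0.0
    for game, predicted in zip(games, predictions):
        actual = game[0]
        if predicted == actual:
            # Decimal odds include the stake, so the gain is odds minus one.
            profit += stake * (game[ODDS_POSITION[predicted]] - 1.0)
        else:
            profit -= stake
    return profit


always_home = flat_stake_profit(games, ["H"] * len(games))
always_away = flat_stake_profit(games, ["A"] * len(games))
print(f"Always home: {always_home:.2f}€, always away: {always_away:.2f}€")
```

Five games are far too few to say anything about a strategy. The point is only to show how a single accuracy figure and a single profit figure can come out of the same set of predictions.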


For later comparison, let's save the model's metrics.

In [20]:
data_aggregator.save_metrics("baseline", accuracy, won)