# Odds model
For our second model, we want to take advantage of the fact that our dataset includes the odds set by a betting company for every match. What happens if we simply bet on the outcome with the lowest odds in each match?

In [1]:
import numpy as np
from utils import DataAggregator

In [2]:
np.random.seed(42)

In [3]:
data_aggregator = DataAggregator()
data = data_aggregator.get_data(["E0"])
data.head()

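| Unnamed: 0 | Div | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR | HTHG | HTAG | HTR | ... | AF | HC | AC | HY | AY | HR | AR | B365H | B365D | B365A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E0 | 13/08/05 | Aston Villa | Bolton | 2 | 2 | D | 2 | 2 | D | ... | 16 | 7 | 8 | 0 | 2 | 0 | 0 | 2.3 | 3.25 | 3.0 |
| 1 | E0 | 13/08/05 | Everton | Man United | 0 | 2 | A | 0 | 1 | A | ... | 14 | 8 | 6 | 3 | 1 | 0 | 0 | 5.0 | 3.4 | 1.72 |
| 2 | E0 | 13/08/05 | Fulham | Birmingham | 0 | 0 | D | 0 | 0 | D | ... | 13 | 6 | 6 | 1 | 2 | 0 | 0 | 2.37 | 3.25 | 2.87 |
| 3 | E0 | 13/08/05 | Man City | West Brom | 0 | 0 | D | 0 | 0 | D | ... | 11 | 3 | 6 | 2 | 3 | 0 | 0 | 1.72 | 3.4 | 5.0 |
| 4 | E0 | 13/08/05 | Middlesbrough | Liverpool | 0 | 0 | D | 0 | 0 | D | ... | 11 | 5 | 0 | 2 | 3 | 1 | 0 | 2.87 | 3.2 | 2.4 |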


Now we can implement the model. As mentioned, it is based on the betting company's odds for each match (the `B365H`, `B365D` and `B365A` columns). We simply choose the outcome with the lowest odds, whether that is a home win, a draw, or an away win.

In [4]:
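def predict_result(row):
    # Bet on the outcome with the lowest decimal odds (the bookmaker's favourite).
    # Ties are broken in favour of a home win first, then an away win.
    if row["B365H"] <= row["B365D"] and row["B365H"] <= row["B365A"]:
        return "H"
    elif row["B365D"] < row["B365H"] and row["B365D"] < row["B365A"]:
        return "D"
    return "A"

# Apply the rule to every match, one row at a time.
data["prediction"] = data.apply(predict_result, axis=1)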

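The row-by-row `apply` can also be written in vectorized form. The sketch below is an alternative, not the notebook's own code: the helper name `predict_lowest_odds` and the `RESULT_FOR_COLUMN` mapping are hypothetical, and it assumes every match has all three odds present. Ordering the columns as H, A, D matters, because `DataFrame.idxmin(axis=1)` returns the first column holding the row minimum, which reproduces the tie-breaking of `predict_result`.

```python
import pandas as pd

# Hypothetical mapping from Bet365 odds column to the predicted result.
RESULT_FOR_COLUMN = {"B365H": "H", "B365A": "A", "B365D": "D"}

def predict_lowest_odds(df: pd.DataFrame) -> pd.Series:
    """Vectorized sketch of predict_result (assumes no missing odds)."""
    # Column order H, A, D: on ties idxmin picks the first column,
    # so a home win beats any tie and an away win beats a draw.
    odds = df[["B365H", "B365A", "B365D"]]
    return odds.idxmin(axis=1).map(RESULT_FOR_COLUMN)
```

Used as `data["prediction"] = predict_lowest_odds(data)`, it should give the same predictions as the `apply` version while avoiding a Python-level loop over the rows.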

In [5]:
accuracy, won = data_aggregator.calculate_accuracy(data, "FTR", "prediction")

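`DataAggregator` comes from the project's own `utils` module, so its implementation is not shown in this notebook. As a hypothetical stand-in, the sketch below shows one plausible way to compute both numbers: accuracy as the share of matches where `prediction` equals `FTR`, and the return under an assumed flat stake on every match using the Bet365 decimal odds. The function name `accuracy_and_return` and the stake of 1 are assumptions, and the real `calculate_accuracy` may compute the return differently.

```python
import pandas as pd

# Which Bet365 odds column pays out for each predicted result.
ODDS_COLUMN = {"H": "B365H", "D": "B365D", "A": "B365A"}

def accuracy_and_return(df, target="FTR", prediction="prediction", stake=1.0):
    """Hypothetical stand-in for DataAggregator.calculate_accuracy.

    Assumes a flat stake per match and decimal odds: a correct bet earns
    stake * (odds - 1) in profit, a wrong bet loses the stake.
    """
    correct = df[target] == df[prediction]
    # Decimal odds of the outcome that was actually bet on, row by row.
    odds_taken = df.apply(lambda row: row[ODDS_COLUMN[row[prediction]]], axis=1)
    # Keep the profit where the bet won, otherwise record the lost stake.
    profit = (stake * (odds_taken - 1)).where(correct, -stake)
    return correct.mean(), profit.sum()
```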
In [6]:
print(f"""The accuracy of the model is {accuracy:.2%}
The model predicted {accuracy*len(data):.0f} out of {len(data)} games correctly.
""")

The accuracy of the model is 54.54%
The model predicted 4603 out of 8440 games correctly.



In [7]:
print(f"With this model, the return on investment would be {won:.2f}€")

With this model, the return on investment would be -2464.29€


As we can see from the results above, this model is actually quite accurate: it predicts 54.54% of the matches correctly. However, since it always bets on the outcome with the lowest odds, the return on investment is negative. This makes sense: even though the model wins 54.54% of its bets, the favourite's low odds mean that each winning bet pays out only a little, and these small wins do not make up for the stakes lost on the remaining 45.46% of bets. If this model had yielded a positive return on investment, the betting companies would have had to make a lot of big mistakes in their odds calculations.
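The argument can be made concrete with a little arithmetic. This is an illustrative sketch, not part of the original analysis: it reuses the Bet365 odds of the first match in `data.head()` and the accuracy reported above, and assumes a flat stake on every bet. With decimal odds, the implied probability of an outcome is 1/odds; the three implied probabilities sum to more than 1, and the excess (the overround) is the bookmaker's margin. With flat stakes, a model that wins a fraction p of its bets only breaks even if the average odds of its winning bets are at least 1/p.

```python
# Bet365 decimal odds for Aston Villa vs Bolton, the first row of data.head().
odds = {"H": 2.3, "D": 3.25, "A": 3.0}

# Implied probability of each outcome is 1 / decimal odds.
implied = {outcome: 1 / o for outcome, o in odds.items()}

# The implied probabilities sum to more than 1; the excess is the margin.
overround = sum(implied.values()) - 1

# Flat stakes: profit = sum(odds of winning bets) - number of bets,
# so break-even needs mean winning odds >= 1 / win rate.
accuracy = 0.5454
break_even_odds = 1 / accuracy

print(f"overround: {overround:.2%}, break-even odds: {break_even_odds:.2f}")
# prints: overround: 7.58%, break-even odds: 1.83
```

So the favourites this model backs would need average odds of about 1.83 just to break even, while the built-in margin pushes the odds offered below the fair value of each outcome.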

In [8]:
data_aggregator.save_metrics("odds", accuracy, won)