# Inference

We have previously trained a model. Now we give it new data and use it to generate predictions.

First, load the previously trained model.

In [1]:
import models

mod = models.loadModel("model.joblib")
mod
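The `models` module is not shown, but the `.joblib` extension suggests the model was saved with joblib. The following is a minimal sketch of a save/load round trip, assuming `models.loadModel` wraps `joblib.load`. The helpers `save_model` and `load_model` are hypothetical, and a small LDA model fitted on toy data stands in for the real one.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def save_model(model, path):
    """Persist a fitted estimator to disk (hypothetical helper)."""
    joblib.dump(model, path)


def load_model(path):
    """Load an estimator saved with joblib.dump (assumed to mirror models.loadModel)."""
    return joblib.load(path)


# Fit a tiny model on toy data, save it, then load it back.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)
original = LinearDiscriminantAnalysis().fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
save_model(original, path)
restored = load_model(path)

# The restored model makes exactly the same predictions as the original.
print(np.array_equal(original.predict(X), restored.predict(X)))  # True
```

joblib files are pickles, so only load models from sources you trust. Loading is also most reliable with the same scikit-learn version that saved the model.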

Next, load the new dataset.

In [2]:
trainData, testData, cols = models.loadData("cleaned_results.csv")

print(f"Columns: {', '.join(cols)}")
trainData["x"].head()

Columns: Home_Team, Away_Team, Season, Round, Elo_home, Elo_away


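| Unnamed: 0 | Home_Team | Away_Team | Season | Round | Elo_home | Elo_away |
|---|---|---|---|---|---|---|
| 2540 | 448 | 365 | 2022 | 10 | 83 | 64 |
| 2730 | 50 | 205 | 2022 | 3 | 97 | 78 |
| 1103 | 165 | 172 | 2022 | 13 | 59 | 47 |
| 2603 | 548 | 33 | 2022 | 17 | 38 | 67 |
| 2665 | 161 | 69 | 2022 | 24 | 71 | 62 |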
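The source of `models.loadData` is not shown. Judging only from how it is used, it returns a training split, a test split and the feature names, with each split holding its features under `"x"`. A loader with that shape might look like the following sketch. The function `load_data`, its parameters, and the target column name `"Outcome"` are all assumptions. The `Unnamed: 0` label in the table is how pandas names a CSV column whose header cell is empty. Reading with `index_col=0` keeps such a row-number column out of the features.

```python
import os
import tempfile

import pandas as pd


def load_data(path, test_frac=0.2, has_y=True, target="Outcome", seed=0):
    """Hypothetical sketch of a loader shaped like models.loadData.

    Returns (train, test, feature_columns). Each split is a dict holding the
    features under "x" and, when has_y is True, the labels under "y".
    """
    df = pd.read_csv(path, index_col=0)  # first CSV column becomes the index
    y = df.pop(target) if has_y else None
    cols = list(df.columns)

    # Shuffle row labels reproducibly, then split off the test fraction.
    order = df.sample(frac=1.0, random_state=seed).index
    n_test = int(round(len(order) * test_frac))
    test_idx, train_idx = order[:n_test], order[n_test:]

    def part(idx):
        out = {"x": df.loc[idx]}
        if y is not None:
            out["y"] = y.loc[idx]
        return out

    return part(train_idx), part(test_idx), cols


# Usage on a tiny CSV whose first column is an unnamed row index.
path = os.path.join(tempfile.mkdtemp(), "toy.csv")
pd.DataFrame({"Home_Team": [448, 50, 165, 548, 161],
              "Away_Team": [365, 205, 172, 33, 69],
              "Outcome": [2, 3, 2, 2, 3]}).to_csv(path)

train, test, cols = load_data(path, test_frac=0.4)
print(cols)                             # ['Home_Team', 'Away_Team']
print(len(train["x"]), len(test["x"]))  # 3 2
```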


We also need to reduce the dataset to the features selected during training, which are stored in `selectedFeatures.npy`.

In [3]:
import numpy as np
feats = list(np.load("selectedFeatures.npy"))

trainData = models.subData(trainData, feats)
testData = models.subData(testData, feats)

print(f"Selected features: {', '.join(feats)}")
trainData["x"].head()

Selected features: Home_Team, Away_Team, Season, Round, Elo_home, Elo_away


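| Unnamed: 0 | Away_Team | Elo_away | Elo_home | Home_Team | Round | Season |
|---|---|---|---|---|---|---|
| 2540 | 365 | 64 | 83 | 448 | 10 | 2022 |
| 2730 | 205 | 78 | 97 | 50 | 3 | 2022 |
| 1103 | 172 | 47 | 59 | 165 | 13 | 2022 |
| 2603 | 33 | 67 | 38 | 548 | 17 | 2022 |
| 2665 | 69 | 62 | 71 | 161 | 24 | 2022 |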
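In the output, the reduced columns come back in alphabetical order rather than the order printed from `selectedFeatures.npy`. That is harmless as long as the training data and the new data go through the same function. scikit-learn estimators fitted on a DataFrame record its column names, and recent versions reject data whose columns arrive in a different order. The following is a sketch of a selector that always returns columns in the order of the feature list. `sub_data` and `save_features` are hypothetical stand-ins for the notebook's helpers.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_features(feats, path):
    """Store the selected feature names as a NumPy string array (hypothetical helper)."""
    np.save(path, np.array(feats))


def sub_data(data, feats):
    """Hypothetical stand-in for models.subData: keep only `feats`, in that order."""
    out = dict(data)                   # shallow copy, so "y" etc. are kept as-is
    out["x"] = data["x"][list(feats)]  # column order follows `feats`
    return out


path = os.path.join(tempfile.mkdtemp(), "selectedFeatures.npy")
save_features(["Elo_home", "Elo_away", "Round"], path)
# A unicode array loads without allow_pickle; str() turns np.str_ into plain str.
feats = [str(f) for f in np.load(path)]

data = {"x": pd.DataFrame({"Round": [10, 3], "Elo_away": [64, 78],
                           "Home_Team": [448, 50], "Elo_home": [83, 97]})}
reduced = sub_data(data, feats)
print(list(reduced["x"].columns))  # ['Elo_home', 'Elo_away', 'Round']
```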


We can then retrain the model on the newer training split and evaluate it on the test split.

In [4]:
models.trainAndScore(mod, trainData, testData)
models.performace(mod, trainData, testData)

Training model: LinearDiscriminantAnalysis
Performance summary for LinearDiscriminantAnalysis
Score:
- Training:  0.4703
- Testing:   0.4512
- Difference:-0.0191
Performance summary for LinearDiscriminantAnalysis
Score:
- Training:  0.4703
- Testing:   0.4512
- Difference:-0.0191


(0.4512338425381904, 0.4703115814226925)
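The returned tuple lists the test score first and the training score second, and the reported difference is test minus training (0.4512 − 0.4703 = −0.0191). The following is a sketch of a helper with the same behaviour, assuming the model is a scikit-learn classifier, whose `score` method returns mean accuracy. `train_and_score` is hypothetical and uses toy data, not the notebook's datasets.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def train_and_score(model, train, test):
    """Hypothetical sketch of models.trainAndScore / models.performace.

    Fits `model` on the training split and returns (test_score, train_score).
    """
    print(f"Training model: {type(model).__name__}")
    model.fit(train["x"], train["y"])
    train_score = model.score(train["x"], train["y"])  # mean accuracy
    test_score = model.score(test["x"], test["y"])
    print(f"- Training:   {train_score:.4f}")
    print(f"- Testing:    {test_score:.4f}")
    print(f"- Difference: {test_score - train_score:.4f}")  # test minus train
    return test_score, train_score


# Toy two-class data split 150/50 into training and test sets.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
train = {"x": X[:150], "y": y[:150]}
test = {"x": X[150:], "y": y[150:]}

test_score, train_score = train_and_score(LinearDiscriminantAnalysis(), train, test)
```

A negative difference means the model does slightly worse on unseen data than on the data it was fitted to, which is the usual direction.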

# Predictions

Now that the model has been trained on the newest data, we can start generating predictions.

First, we load the new data, which has no outcomes yet, and reduce it to the same selected features.

In [5]:
newData, _, newCols = models.loadData("to_predict.csv", 0, hasY=False)
newData = models.subData(newData, feats)

newData["x"].head()

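| Unnamed: 0 | Away_Team | Elo_away | Elo_home | Home_Team | Round | Season |
|---|---|---|---|---|---|---|
| 120 | 432 | 74 | 73 | 98 | 33.0 | 2022.0 |
| 35 | 417 | 46 | 60 | 503 | 35.0 | 2022.0 |
| 4 | 326 | 61 | 75 | 533 | 30.0 | 2022.0 |
| 25 | 80 | 52 | 77 | 460 | 42.0 | 2022.0 |
| 1 | 184 | 54 | 48 | 431 | 30.0 | 2022.0 |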
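Here `Round` and `Season` are read as floats, which pandas does when an integer column contains missing values, so the new file may have gaps. scikit-learn's LDA raises an error on NaN input, so a quick check before predicting can help. The following is a sketch. `check_features` is a hypothetical helper, not part of `models`, and the demo frames are toy data.

```python
import numpy as np
import pandas as pd


def check_features(x, feats):
    """Hypothetical pre-prediction check: same columns, same order, no NaN."""
    # Column names and order must match what the model was trained on.
    if list(x.columns) != list(feats):
        raise ValueError(f"expected columns {list(feats)}, got {list(x.columns)}")
    # Most scikit-learn estimators, LDA included, reject missing values.
    bad = [col for col in x.columns if x[col].isna().any()]
    if bad:
        raise ValueError(f"missing values in columns: {bad}")
    return x


feats = ["Elo_home", "Round", "Season"]
ok = pd.DataFrame({"Elo_home": [73, 60], "Round": [33.0, 35.0],
                   "Season": [2022.0, 2022.0]})
check_features(ok, feats)  # passes silently

gap = ok.assign(Round=[33.0, np.nan])
try:
    check_features(gap, feats)
except ValueError as err:
    print(err)  # missing values in columns: ['Round']
```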


Then we make the predictions.

In [6]:
prediction = mod.predict(newData["x"])
prediction

array([2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 3, 2, 3, 2, 2, 2, 3, 3, 2, 3, 3,
       3, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2,
       2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2,
       2, 2, 3, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 3, 2, 2, 3,
       2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 3,
       2, 2, 2, 2, 3, 3])
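`predict` returns only the most likely class. Assuming `mod` is scikit-learn's `LinearDiscriminantAnalysis`, `predict_proba` also gives the probability of every class, with columns ordered as in `mod.classes_`. This can show how close a match came to being called a draw even though no draw is predicted. The following sketch uses a toy three-class model rather than the notebook's, and checks that the most probable class agrees with `predict`.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data standing in for the match features, with three outcome classes.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = np.digitize(X[:, 0] + 0.3 * rng.normal(size=300), [-0.5, 0.5])  # 0, 1, 2

model = LinearDiscriminantAnalysis().fit(X, y)
proba = model.predict_proba(X[:5])  # one row per sample, one column per class

# Columns follow model.classes_; the argmax is the class predict() returns.
best = model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(best, model.predict(X[:5])))  # True
print(proba.sum(axis=1))  # rows sum to 1 (up to floating-point rounding)
```

For the notebook, the equivalent call would be `mod.predict_proba(newData["x"])`.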

The predictions are numeric class labels, which we can translate into readable outcomes.

In [7]:
predicted = newData["x"].copy()
predicted["Outcome"] = prediction

# Convert the numeric class labels to readable outcomes:
# 0 = scoreless (0-0) draw, 1 = score draw, 2 = home win, 3 = away win.
# Assigning the mapped column back avoids chained inplace replace(),
# which newer pandas versions warn about or silently ignore.
outcomeNames = {0: "0-0", 1: "Tie", 2: "Home Win", 3: "Away Win"}
predicted["Outcome"] = predicted["Outcome"].map(outcomeNames)

predicted

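| Unnamed: 0 | Away_Team | Elo_away | Elo_home | Home_Team | Round | Season | Outcome |
|---|---|---|---|---|---|---|---|
| 120 | 432 | 74 | 73 | 98 | 33.0 | 2022.0 | Home Win |
| 35 | 417 | 46 | 60 | 503 | 35.0 | 2022.0 | Home Win |
| 4 | 326 | 61 | 75 | 533 | 30.0 | 2022.0 | Home Win |
| 25 | 80 | 52 | 77 | 460 | 42.0 | 2022.0 | Home Win |
| 1 | 184 | 54 | 48 | 431 | 30.0 | 2022.0 | Home Win |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 116 | 518 | 39 | 63 | 191 | 30.0 | 2022.0 | Home Win |
| 31 | 154 | 52 | 52 | 147 | 35.0 | 2022.0 | Home Win |
| 81 | 350 | 72 | 75 | 524 | 30.0 | 2022.0 | Home Win |
| 48 | 211 | 60 | 45 | 198 | 30.0 | 2022.0 | Away Win |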


Finally, we summarize the predicted outcomes.

In [8]:
h = int(np.sum(prediction == 2))  # home wins
a = int(np.sum(prediction == 3))  # away wins
t = len(prediction) - h - a       # ties: 0-0 draws (0) plus score draws (1)

f"The model predicts there will be {t} Ties, {h} Home wins, and {a} Away wins."

'The model predicts there will be 0 Ties, 108 Home wins, and 30 Away wins.'
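The summary above merges classes 0 and 1 into a single tie count. The following sketch counts each of the four outcome codes separately with `np.bincount`. Setting `minlength` makes classes that are never predicted appear as zero. `outcome_counts` is a hypothetical helper. On this notebook's 138 predictions it should report 0, 0, 108 and 30, since the tie total is 0.

```python
import numpy as np

# Outcome codes used in this notebook, indexed by class label.
LABELS = ["0-0", "Tie", "Home Win", "Away Win"]


def outcome_counts(prediction):
    """Count predictions per outcome code, including codes that never occur."""
    counts = np.bincount(prediction, minlength=len(LABELS))
    return dict(zip(LABELS, counts.tolist()))


print(outcome_counts(np.array([2, 2, 3, 2, 3])))
# {'0-0': 0, 'Tie': 0, 'Home Win': 3, 'Away Win': 2}
```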