Euro 2020 (2021) predictions

Computational notebooks for this analysis can be found here and here

The challenge

Inspired by my performance in an office World Cup predictor, I decided to take that model and, hopefully, improve on it for Euro 2020.

I wasn't part of a similar workplace competition this time, so I decided to enter UEFA's online prediction competition instead.

The data

I wanted to make this a supervised learning model. To this end I looked at past (and present) competitions and metrics for the countries involved.

I gathered fixtures and results from FBRef, stadium info from Wikipedia, Elo data from eloratings.net, and population and GDP data from the Penn World Table (now maintained by the University of Groningen). The past tournaments and Elo ratings went back to 2000. Penn World Table figures were taken as at the end of the previous calendar year (e.g. 1999 figures for matches played in 2000).
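The one-year lag on the Penn World Table figures amounts to looking up `year - 1` for every match. A minimal sketch with a hypothetical in-memory lookup; the dict layout, the function name `lagged_penn` and the toy country and numbers are illustrative only, not real Penn data or the notebook code:

```python
from typing import Dict, Tuple

# Hypothetical layout: (country, calendar year) -> (population, GDP per capita)
PennLookup = Dict[Tuple[str, int], Tuple[float, float]]


def lagged_penn(penn: PennLookup, country: str, match_year: int) -> Tuple[float, float]:
    """Return the country's figures for the calendar year before the match."""
    try:
        return penn[(country, match_year - 1)]
    except KeyError:
        raise KeyError(f"No Penn figures for {country} in {match_year - 1}") from None


# Toy values for illustration only (not real data)
toy = {("Freedonia", 1999): (10.0, 25000.0), ("Freedonia", 2000): (10.1, 26000.0)}
print(lagged_penn(toy, "Freedonia", 2000))  # (10.0, 25000.0)
```

Failing loudly on a missing year, rather than silently falling back to the same-year figure, keeps the lag consistent across all matches.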


The method

I focussed on a handful of indicators that previous work suggested were key: Elo ratings and home advantage (as used in my World Cup 2018 predictions), and experience, population and GDP per capita (as used in Soccernomics by Kuper & Szymanski).
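One way these indicators could be turned into a single feature row per match is sketched below. The Elo part uses the win-expectancy formula published by eloratings.net, We = 1 / (10^(-dr/400) + 1), where dr is the rating difference plus 100 points for the home side. Everything else is an assumption for illustration: the `TeamSnapshot` fields, the definition of experience, and the choice of differences and log ratios are hypothetical, not the exact features used in the notebooks.

```python
import math
from dataclasses import dataclass


@dataclass
class TeamSnapshot:
    # Hypothetical per-team inputs as of kick-off
    elo: float
    experience: float      # assumed: previous tournament matches played
    population: float
    gdp_per_capita: float


def elo_expectancy(elo_diff: float, home: int = 0) -> float:
    """Win expectancy for team A as defined by eloratings.net.

    home: +1 if team A is at home, -1 if team B is, 0 at a neutral venue;
    the home side gets a 100-point rating bonus.
    """
    dr = elo_diff + 100 * home
    return 1 / (10 ** (-dr / 400) + 1)


def match_features(a: TeamSnapshot, b: TeamSnapshot, home: int = 0) -> dict:
    """Hypothetical feature row for team A vs team B (A minus B, or log ratios)."""
    return {
        "elo_expectancy": elo_expectancy(a.elo - b.elo, home),
        "home": float(home),
        "experience_diff": a.experience - b.experience,
        "log_population_ratio": math.log(a.population / b.population),
        "log_gdp_pc_ratio": math.log(a.gdp_per_capita / b.gdp_per_capita),
    }
```

Expressing each feature from team A's side means swapping A and B flips the sign of every difference, so the model sees each fixture consistently regardless of which team is listed first.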

A random 20% of the past tournament matches was held out for testing, giving 140 training samples and 35 test samples.
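A seeded shuffle reproduces those split sizes. The 175-match total is 4 x 31 + 51, from the 2000 to 2016 rows of the results table. This is a stdlib sketch; the function name and seed are hypothetical, and the notebooks may well have used scikit-learn's `train_test_split` instead.

```python
import random


def split_matches(matches, test_fraction=0.2, seed=42):
    """Shuffle a copy of the matches and hold out the first test_fraction for testing."""
    shuffled = list(matches)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


train, test = split_matches(range(175))
print(len(train), len(test))  # 140 35
```

Fixing the seed keeps the held-out 35 matches identical between runs, so model comparisons are made on the same test set.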

I opted for two target variables: Goal difference and Goal total. Goal difference is widely used for predicting results, but also capturing the Goal total lets us (in theory) convert the pair directly into a predicted score for each team.
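If D is team A's goals minus team B's and T is their sum, the two scores follow as A = (T + D) / 2 and B = (T - D) / 2. A minimal sketch of that conversion, assuming half-up rounding to whole goals and clipping negatives at zero; the rounding rule and the name `to_scoreline` are my assumptions, not necessarily what the notebooks do:

```python
import math


def to_scoreline(goal_diff: float, goal_total: float) -> tuple:
    """Convert predicted goal difference and total into whole-goal scores (A, B)."""
    a = (goal_total + goal_diff) / 2
    b = (goal_total - goal_diff) / 2
    # Round half up, and never predict a negative score
    return max(0, math.floor(a + 0.5)), max(0, math.floor(b + 0.5))


print(to_scoreline(0.8, 2.6))   # (2, 1)
print(to_scoreline(-0.3, 2.4))  # (1, 1)
```

Rounding each side independently can shift the implied goal difference slightly, which is one reason the conversion only works "in theory".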

Both targets were then fitted using a selection of 10 regression algorithms.

  • Dummy (mean) - always predicts the mean of the training set
  • Dummy (median) - always predicts the median of the training set
  • Linear Regression
  • Lasso
  • Ridge
  • Random Forest
  • Gradient Boosting
  • Support Vector Machine (linear kernel)
  • Support Vector Machine (rbf kernel)
  • Custom Elo Regressor - approximates my World Cup 2018 model

(All but the Elo regressor had standard scaling applied, to avoid any effects from differently scaled features.)
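The off-the-shelf candidates and the scaling step can be sketched with scikit-learn pipelines. This is a minimal sketch rather than the notebook code: default hyperparameters, mean absolute error on the held-out set as the comparison metric, and the synthetic data at the bottom are all assumptions. The custom Elo regressor is left out because its internals aren't described here. The dummy models sit behind the scaler too for uniformity; scaling the features doesn't change their predictions.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def candidate_models(random_state=0):
    """Nine off-the-shelf regressors, each behind a StandardScaler."""
    models = {
        "Dummy (mean)": DummyRegressor(strategy="mean"),
        "Dummy (median)": DummyRegressor(strategy="median"),
        "Linear Regression": LinearRegression(),
        "Lasso": Lasso(),
        "Ridge": Ridge(),
        "Random Forest": RandomForestRegressor(random_state=random_state),
        "Gradient Boosting": GradientBoostingRegressor(random_state=random_state),
        "SVM (linear)": SVR(kernel="linear"),
        "SVM (rbf)": SVR(kernel="rbf"),
    }
    return {name: make_pipeline(StandardScaler(), m) for name, m in models.items()}


def compare_on_holdout(X_train, y_train, X_test, y_test):
    """Fit every candidate; return test-set MAE per model, best first (MAE is assumed)."""
    scores = {}
    for name, pipe in candidate_models().items():
        pipe.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, pipe.predict(X_test))
    return dict(sorted(scores.items(), key=lambda kv: kv[1]))


# Synthetic stand-in data with the same 140/35 shape (not the real features)
rng = np.random.default_rng(0)
X = rng.normal(size=(175, 5))
y = 0.8 * X[:, 0] + rng.normal(scale=0.5, size=175)
results = compare_on_holdout(X[:140], y[:140], X[140:], y[140:])
for name, mae in results.items():
    print(f"{name:18s} {mae:.3f}")
```

The same comparison would be run once per target (Goal difference and Goal total), which is how different models can end up being chosen for each.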


From this, I selected the Elo model for Goal diff and Lasso for Goal total.


The results

The full dataset was assigned predictions, which could then be compared with actual results as they came in.

As part of this, "prediction points" were calculated using the same criteria as the World Cup competition. To recap, the original scoring system awarded, per game, 3 points for the correct score, 2 points for the correct goal difference and 1 point for the correct result.
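Since a correct score implies a correct goal difference, which in turn implies a correct result, the scheme can be read as tiered: each game earns the highest tier it reaches. A minimal sketch under that reading (the function names are hypothetical):

```python
def sign(x: int) -> int:
    """-1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def prediction_points(pred: tuple, actual: tuple) -> int:
    """Points for one game: 3 exact score, 2 goal difference, 1 result, else 0."""
    (ph, pa), (ah, aa) = pred, actual
    if (ph, pa) == (ah, aa):
        return 3
    if ph - pa == ah - aa:
        return 2
    if sign(ph - pa) == sign(ah - aa):
        return 1
    return 0


print(prediction_points((2, 1), (3, 2)))  # 2: right goal difference, wrong score
print(prediction_points((1, 1), (0, 0)))  # 2: a drawn scoreline still matches the difference
```

Under this reading a 1-1 prediction for a 0-0 draw earns the goal-difference points, since both have a difference of zero.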

The predictions for Euro 2020 group matches were then entered into UEFA's Tournament and Match predictors.

Group A: Italy, Switzerland, Turkey, Wales
Group B: Belgium, Denmark, Russia, Finland
Group C: Netherlands, Ukraine, Austria, North Macedonia
Group D: England, Croatia, Czech Republic, Scotland
Group E: Spain, Poland, Sweden, Slovakia
Group F: Germany, France, Portugal, Hungary
Third-placed teams: Portugal, Austria, Sweden, Russia, Turkey, Czech Republic
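Group standings like these follow from ranking each group on its predicted scorelines. A simplified sketch: it ranks on points, then goal difference, then goals scored, then name. UEFA's actual Euro 2020 tie-breakers put head-to-head results between the tied teams first, so this is only an approximation, and the team letters and scores in the usage are toy values.

```python
from collections import defaultdict


def group_table(results):
    """Rank teams from (home, away, home_goals, away_goals) results (simplified tie-breaks)."""
    pts, gf, ga = defaultdict(int), defaultdict(int), defaultdict(int)
    for home, away, hg, ag in results:
        gf[home] += hg
        ga[home] += ag
        gf[away] += ag
        ga[away] += hg
        if hg > ag:
            pts[home] += 3
        elif hg < ag:
            pts[away] += 3
        else:
            pts[home] += 1
            pts[away] += 1
    # Points, then goal difference, then goals scored, then name
    return sorted(gf, key=lambda t: (-pts[t], -(gf[t] - ga[t]), -gf[t], t))


toy_group = [
    ("A", "B", 2, 0), ("C", "D", 1, 1),
    ("A", "C", 1, 0), ("B", "D", 2, 0),
    ("A", "D", 0, 0), ("B", "C", 1, 1),
]
print(group_table(toy_group))  # ['A', 'B', 'C', 'D']
```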

From this I could extrapolate the knockout results as follows...

[Figure: predicted knockout bracket]

After the Group stage the knockout predictions were updated to the following...

[Figure: knockout predictions updated after the group stage]

Finally, here is a summary of all the models' predictions vs actual results...

| Tournament / split | Matches played | Points per game | % correct result | % correct goal diff | % correct score | Goals per game (predicted) | Goals per game (actual) | % games won (predicted) | % games won (actual) |
|---|---|---|---|---|---|---|---|---|---|
| 2000 | 31 | 0.71 | 35% | 19% | 16% | 2.48 | 2.84 | 52% | 87% |
| 2004 | 31 | 0.74 | 42% | 23% | 10% | 2.61 | 2.74 | 74% | 74% |
| 2008 | 31 | 0.74 | 35% | 26% | 13% | 2.45 | 2.61 | 45% | 87% |
| 2012 | 31 | 0.84 | 42% | 26% | 16% | 2.45 | 2.58 | 45% | 84% |
| 2016 | 51 | 0.86 | 45% | 27% | 14% | 2.69 | 2.31 | 75% | 78% |
| 2021 | 51 | 1.04 | 55% | 31% | 18% | 2.57 | 2.78 | 67% | 76% |
| Training | 140 | 0.81 | 42% | 24% | 14% | 2.55 | 2.64 | 61% | 81% |
| Testing | 35 | 0.71 | 34% | 26% | 11% | 2.57 | 2.37 | 56% | 86% |
| Live | 51 | 1.04 | 55% | 31% | 18% | 2.57 | 2.78 | 67% | 76% |
| Overall | 226 | 0.85 | 44% | 26% | 15% | 2.56 | 2.63 | 62% | 81% |
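Because the scoring tiers are nested, a game's points equal the number of tiers it hits, so points per game is just % correct result + % correct goal diff + % correct score (e.g. the 2021 row: 0.55 + 0.31 + 0.18 = 1.04). A sketch of how the summary columns could be computed from (predicted, actual) scorelines; reading "% games won" as the share of matches with a winner rather than a draw is my assumption, as are the function and key names:

```python
def summarise(pairs):
    """Summary metrics for a list of ((pred_a, pred_b), (act_a, act_b)) scorelines."""
    n = len(pairs)

    def sign(x):
        return (x > 0) - (x < 0)

    result = sum(sign(p[0] - p[1]) == sign(a[0] - a[1]) for p, a in pairs)
    goal_diff = sum(p[0] - p[1] == a[0] - a[1] for p, a in pairs)
    score = sum(tuple(p) == tuple(a) for p, a in pairs)
    return {
        "matches": n,
        # Nested tiers: each game scores 1 per tier hit (result, goal diff, score)
        "points_per_game": (result + goal_diff + score) / n,
        "pct_correct_result": result / n,
        "pct_correct_goal_diff": goal_diff / n,
        "pct_correct_score": score / n,
        "goals_per_game_predicted": sum(sum(p) for p, _ in pairs) / n,
        "goals_per_game_actual": sum(sum(a) for _, a in pairs) / n,
        # Assumed meaning of "% games won": matches that are not draws
        "pct_won_predicted": sum(p[0] != p[1] for p, _ in pairs) / n,
        "pct_won_actual": sum(a[0] != a[1] for _, a in pairs) / n,
    }


toy = [((2, 1), (2, 1)), ((1, 1), (0, 0)), ((1, 0), (3, 0)), ((1, 0), (0, 1))]
print(summarise(toy)["points_per_game"])  # 1.5
```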

I was really pleased with how the model performed. In the UEFA Match Predictor I placed in the top 20% of all competitors with 145 pts (vs 255 for the winner). As I didn't make any first score predictions or use the 2x boosters available, I felt this was pretty reasonable. In the UEFA Tournament Predictor I placed in the top 32% with 49 pts, and in the top 9% with 24 pts for the knockout predictions alone. As with my World Cup model, it under-predicted the number of goals and wins (2.56 vs 2.63 goals per game and 62% vs 81% of games won overall). But within a much more robust and testable framework, there's greater scope to refine this before the next tournament!