Euro 2020 (2021) predictions

Computational notebooks for this analysis can be found here and here

The challenge

Inspired by my performance in an office World Cup predictor, I decided to take that model and, hopefully, improve on it for Euro 2020.

I wasn't part of a similar workplace competition this time, so I decided to enter UEFA's online prediction competition.

The data

I wanted to make this a supervised learning model. To this end I looked at past (and present) competitions and metrics for the countries involved.

I gathered fixtures/results from FBRef, stadium info from Wikipedia, Elo data from eloratings.net, and population and GDP data from the Penn tables (as now maintained by University of Groningen). The past tournaments and Elo ratings went back to 2000. Penn tables data was taken as per the end of the previous calendar year (e.g. 1999 figures for matches played in 2000).
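To illustrate the "end of the previous calendar year" rule, here is a minimal sketch of a lagged lookup. The table layout, the `lagged_feature` helper and the figures are all hypothetical and are not taken from the notebooks.

```python
# Hypothetical GDP-per-capita figures keyed by (country, year); values are made up.
gdp_per_capita = {
    ("Italy", 1999): 30000.0,
    ("Italy", 2000): 31000.0,
    ("Turkey", 1999): 9000.0,
}


def lagged_feature(table, country, match_year, lag=1):
    """Return the value recorded `lag` years before the match, or None if missing."""
    return table.get((country, match_year - lag))


# A match played in 2000 picks up the 1999 figure, not the 2000 one.
italy_gdp = lagged_feature(gdp_per_capita, "Italy", 2000)
```

Lagging the economic data this way means a match never uses figures that were only published after it was played.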

The method

I focussed on what I felt were a handful of key indicators from previous work: Elo ratings and Home advantage (as used in WC 2018 predictions), and Experience, Population and GDP per capita (as used in Soccernomics by Kuper & Szymanski).

A random 20% of the past tournament matches were held out for testing. This gave us 140 training samples and 35 test samples.
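A random hold-out like this can be sketched with the standard library alone. The `holdout_split` helper and the seed are assumptions for illustration; the notebooks may well use a library routine instead.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold back a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


matches = list(range(175))  # stand-ins for the 175 past tournament matches
train, test = holdout_split(matches)
print(len(train), len(test))  # 140 35
```

Fixing the seed keeps the same 35 matches in the test set on every run, so different models are compared on identical data.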

I opted for 2 target variables: Goal difference and Goal total. Goal difference is widely used for predicting results, and also capturing Goal total allows us (in theory) to convert the pair into predicted match scores for both teams.
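The conversion follows from two equations: home minus away equals the goal difference, and home plus away equals the goal total. Below is a minimal sketch, assuming the goal difference is measured as home minus away and that each side's goals are rounded independently and floored at zero. The rounding rule is an assumption of this example, not necessarily what the notebooks do.

```python
def predicted_score(goal_diff, goal_total):
    """Turn a predicted (home - away) difference and total into a scoreline."""
    home = (goal_total + goal_diff) / 2
    away = (goal_total - goal_diff) / 2
    # Regressors return fractional goals, so round and forbid negative scores.
    return max(0, round(home)), max(0, round(away))


home_goals, away_goals = predicted_score(goal_diff=0.7, goal_total=2.6)
```

Rounding each side separately can shift the implied goal difference by one, which is part of why the conversion only works "in theory".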

Both targets were then fitted using a selection of 10 regression algorithms:

Dummy (mean) - always predicts the mean of the training set
Dummy (median) - always predicts the median of the training set
Linear Regression
Lasso
Ridge
Random Forest
Gradient Boost
Support Vector Machine (linear kernel)
Support Vector Machine (rbf kernel)
Custom Elo Regressor - approximates my World Cup 2018 model

(All but the EloRegressor had a standardised scaling applied to avoid any effects of differently scaled features)
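As a rough illustration of the scaling step and the two dummy baselines, here is a standard-library sketch. The feature values are made up and the helper names are hypothetical. It uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses.

```python
from statistics import mean, median, pstdev


def fit_scaler(column):
    """Learn the mean and standard deviation of one feature from training data."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # guard against a constant feature
    return mu, sigma


def transform(column, mu, sigma):
    """Rescale values to zero mean and unit variance using the fitted statistics."""
    return [(x - mu) / sigma for x in column]


# Made-up training values for two features on very different scales.
elo_diff_train = [-150.0, 0.0, 150.0]
population_train = [5e6, 45e6, 85e6]

elo_scaled = transform(elo_diff_train, *fit_scaler(elo_diff_train))
pop_scaled = transform(population_train, *fit_scaler(population_train))

# Dummy baselines ignore the features and repeat one training-set statistic.
goal_diff_train = [0, 1, 1, 2, -1]
dummy_mean_prediction = mean(goal_diff_train)
dummy_median_prediction = median(goal_diff_train)
```

After scaling, both features land on the same range, so penalised models such as Lasso and Ridge do not favour a feature simply because of its units. The scaler statistics should be fitted on the training set only and then reused for the test set.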

From this, I selected the Elo model for Goal diff and Lasso for Goal total.
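For context on what an Elo-based model has to work with, here is a sketch of the win expectancy formula published by eloratings.net, which (as I understand it) adds 100 rating points for a team playing at home. This is not the custom EloRegressor itself. How that model maps ratings onto a goal difference is not described here, and the function name is hypothetical.

```python
def elo_win_expectancy(rating, opponent_rating, home_advantage=0.0):
    """Expected result (between 0 and 1) for a team under the Elo formula."""
    rating_diff = rating - opponent_rating + home_advantage
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Evenly rated teams on neutral ground, then the same fixture with a home side.
neutral = elo_win_expectancy(1800, 1800)
at_home = elo_win_expectancy(1800, 1800, home_advantage=100.0)
```

The expectancies of the two teams always sum to 1, so a single rating difference (plus any home bonus) summarises the whole match-up.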

The results

The full dataset was assigned predictions, which could then be compared with actual results as they came in.

As part of this, "prediction points" were calculated based on the same criteria used in the World Cup comp. To recap, the original scoring system was, per game, 3 points for a correct score, 2 points for a correct goal difference, and 1 point for a correct result.
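The scoring rule can be sketched as a small function. It assumes only the highest matching tier is awarded per game, which appears consistent with the summary table (for the 2021 row, 3 × 18% + 2 × (31% − 18%) + 1 × (55% − 31%) = 1.04 points per game). The function names are hypothetical.

```python
def sign(x):
    """Return 1 for a home win, 0 for a draw, -1 for an away win."""
    return (x > 0) - (x < 0)


def prediction_points(predicted, actual):
    """Score one match: 3 for exact score, 2 for goal difference, 1 for result."""
    pred_diff = predicted[0] - predicted[1]
    actual_diff = actual[0] - actual[1]
    if predicted == actual:
        return 3
    if pred_diff == actual_diff:
        return 2
    if sign(pred_diff) == sign(actual_diff):
        return 1
    return 0


# Scores are (home goals, away goals) tuples.
points = [
    prediction_points((2, 1), (2, 1)),  # exact score
    prediction_points((2, 1), (1, 0)),  # right margin, wrong score
    prediction_points((2, 1), (3, 0)),  # right winner only
    prediction_points((2, 1), (0, 0)),  # wrong result
]
```

Under this reading, a correctly predicted draw with the wrong score (say 1-1 predicted, 2-2 actual) earns 2 points, because the goal difference of zero matches.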

The predictions for Euro 2020 group matches were then entered into UEFA's Tournament and Match predictors.

Group A
Italy
Switzerland
Turkey
Wales

Group B
Belgium
Denmark
Russia
Finland

Group C
Netherlands
Ukraine
Austria
North Macedonia

Group D
England
Croatia
Czech Republic
Scotland

Group E
Spain
Poland
Sweden
Slovakia

Group F
Germany
France
Portugal
Hungary

Third-placed teams
Portugal
Austria
Sweden
Russia
Turkey
Czech Republic

From this I could extrapolate the knockout results as follows...

After the Group stage the knockout predictions were updated to the following...

Finally, here is a summary of all the models' predictions vs actual results...

	Matches played	Points per game	% correct result	% correct goal diff	% correct score	Goals per game (predicted)	Goals per game (actual)	% games won (predicted)	% games won (actual)
2000	31	0.71	35%	19%	16%	2.48	2.84	52%	87%
2004	31	0.74	42%	23%	10%	2.61	2.74	74%	74%
2008	31	0.74	35%	26%	13%	2.45	2.61	45%	87%
2012	31	0.84	42%	26%	16%	2.45	2.58	45%	84%
2016	51	0.86	45%	27%	14%	2.69	2.31	75%	78%
2021	51	1.04	55%	31%	18%	2.57	2.78	67%	76%
Training	140	0.81	42%	24%	14%	2.55	2.64	61%	81%
Testing	35	0.71	34%	26%	11%	2.57	2.37	56%	86%
Live	51	1.04	55%	31%	18%	2.57	2.78	67%	76%
Overall	226	0.85	44%	26%	15%	2.56	2.63	62%	81%

I was really pleased with how the model performed. In the UEFA match predictor I placed in the top 20% of all competitors with 145 pts (vs 255 for the winner). As I didn't make any first score predictions or use the 2x boosters available, I felt this was pretty reasonable. In the UEFA tournament predictor, I placed in the top 32% with 49 pts, and in the top 9% with 24 pts just for the knockout predictions. As with my World Cup model, it under-predicted the number of goals and wins. But within a much more robust and testable framework there's greater scope to refine this before the next tournament!
