# Euro 2020 (2021) Predictions

<!-- Written report for this analysis can be found [here](../reports/boro_01_market_value.md) -->

## 1. Business Understanding

* Determine Business Objectives
* Situation Assessment
* Determine Data Mining Goal
* Produce Project Plan

```
# 1. Predict results of every match at Euro 2020
# 2. Make predictions before each round of the competition
# 3. Ideally, at each round, use the predictions to simulate the remainder of the competition
# 4. Check against other predictions and actual results
# 5. Write up process (report/blog)
```
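Goal 3 above (simulating the rest of the competition) could start from something like this minimal Monte Carlo sketch of one four-team group. It assumes each side's goals are independent Poisson draws, which is one of the models linked under Modelling. The expected-goals figures are hypothetical placeholders, not model output, and the tie-break is simplified.

```python
import math
import random
from collections import Counter
from itertools import combinations


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_group(teams, exp_goals, n_sims=10_000, seed=1):
    """Estimate each team's chance of finishing in the top two of a round-robin group.

    exp_goals[(a, b)] holds hypothetical expected goals (for a, for b) in the a-v-b match.
    Ties are broken on goal difference, then goals scored, then at random. This is a
    simplification of UEFA's rules, which look at head-to-head results first.
    """
    rng = random.Random(seed)
    top_two = Counter()
    for _ in range(n_sims):
        table = {t: [0, 0, 0] for t in teams}  # points, goal difference, goals for
        for a, b in combinations(teams, 2):
            lam_a, lam_b = exp_goals[(a, b)]
            ga, gb = poisson_sample(lam_a, rng), poisson_sample(lam_b, rng)
            table[a][1] += ga - gb
            table[b][1] += gb - ga
            table[a][2] += ga
            table[b][2] += gb
            if ga > gb:
                table[a][0] += 3
            elif gb > ga:
                table[b][0] += 3
            else:
                table[a][0] += 1
                table[b][0] += 1
        # A random final key settles any remaining ties
        ranked = sorted(teams, key=lambda t: (*table[t], rng.random()), reverse=True)
        top_two.update(ranked[:2])
    return {t: top_two[t] / n_sims for t in teams}


teams = ["A", "B", "C", "D"]
# Placeholder strengths: team A is much stronger. Every other match is 1.2 v 1.0.
exp_goals = {(a, b): (2.5, 0.5) if a == "A" else (1.2, 1.0)
             for a, b in combinations(teams, 2)}
probs = simulate_group(teams, exp_goals)
print(probs)
```

Because exactly two teams qualify in every run, the probabilities always sum to 2. Running this before each round, with the real results fixed, would give the "simulate the remainder" step.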

## 2. Data Understanding

* Collect Initial Data
* Describe Data
* Explore Data
* Verify Data Quality

### EURO 2020 fixtures/results
* https://en.wikipedia.org/wiki/UEFA_Euro_2020
* https://www.whoscored.com/Regions/247/Tournaments/124/Seasons/7329/Stages/16297/Show/International-European-Championship-2020
* https://www.uefa.com/uefaeuro-2020/fixtures-results/#/md/33673
* https://fbref.com/en/comps/676/schedule/UEFA-Euro-Scores-and-Fixtures

### Historic results
* https://www.staff.city.ac.uk/r.j.gerrard/football/aifrform.html (1871-2001)
* https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017/data (1872-)
* https://fbref.com/en/comps/676/history/European-Championship-Seasons (2000-)
* https://en.wikipedia.org/wiki/UEFA_Euro_2020_qualifying (qualifying)
* https://fbref.com/en/comps/678/Euro-Qualifying-Stats (qualifying)

### ELO ratings
* https://en.m.wikipedia.org/wiki/World_Football_Elo_Ratings
* https://www.eloratings.net/2021_European_Championship / https://www.eloratings.net/about
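The World Football Elo Ratings linked above change after every match. Here is a minimal sketch of that update, based on our reading of the eloratings.net "about" page:

* The expected result is W_e = 1 / (10^(-dr/400) + 1), where dr is the rating difference plus 100 for a side playing at home.
* The K factor is scaled up for wider winning margins.

Two things in the sketch are assumptions. K = 50 is taken as the value for continental championship finals, and the function names are hypothetical. Check the constants against the page before relying on them.

```python
def expected_result(rating_a, rating_b, a_at_home=False, home_adv=100):
    """Expected score for team A (win = 1, draw = 0.5, loss = 0)."""
    dr = rating_a - rating_b + (home_adv if a_at_home else 0)
    return 1 / (10 ** (-dr / 400) + 1)


def goal_diff_multiplier(goal_diff):
    """Scale K up for larger winning margins (1, 1.5, 1.75, then +1/8 per extra goal)."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    if n == 3:
        return 1.75
    return 1.75 + (n - 3) / 8


def elo_update(rating_a, rating_b, goals_a, goals_b, k=50, a_at_home=False):
    """Return the new (rating_a, rating_b) after one match; the update is zero-sum."""
    result_a = 1.0 if goals_a > goals_b else 0.5 if goals_a == goals_b else 0.0
    we = expected_result(rating_a, rating_b, a_at_home)
    change = k * goal_diff_multiplier(goals_a - goals_b) * (result_a - we)
    return rating_a + change, rating_b - change


# Example: the home side, rated 2000, beats a 1900-rated side 1-0
print(elo_update(2000, 1900, 1, 0, a_at_home=True))
```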

### Historic trends
* https://blog.annabet.com/soccer-goal-probabilities-poisson-vs-actual-distribution/
* https://en.wikipedia.org/wiki/Poisson_distribution
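The Poisson links above model each team's goals in a match as Poisson-distributed. Below is a minimal sketch of the implied per-team goal distribution for an assumed mean. The 1.3 goals is a placeholder, not an estimate from this data. The "actual distribution" side of the comparison would come from the historic results.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def goal_distribution(lam, max_goals=6):
    """Probabilities of 0 .. max_goals-1 goals, with "max_goals or more" as the last entry."""
    probs = [poisson_pmf(k, lam) for k in range(max_goals)]
    probs.append(1 - sum(probs))
    return probs


dist = goal_distribution(1.3)  # placeholder mean goals per team per match
for goals, p in enumerate(dist):
    label = f"{goals}+" if goals == len(dist) - 1 else str(goals)
    print(f"{label:>2} goals: {p:.3f}")
```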

### GDP
* https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)
* https://en.wikipedia.org/wiki/List_of_countries_by_past_and_projected_GDP_(nominal)
* https://www.rug.nl/ggdc/productivity/pwt/

```python
import pandas as pd
import os
```

```python
# Load every Euro finals CSV exported from fbref and stack them into one frame
DATA_DIR = "../data/raw/fbr/competition/"

comp_list = []
for file in os.listdir(DATA_DIR):
    if not (file.startswith("Euro") and file.endswith(".csv")):
        continue
    df = pd.read_csv(os.path.join(DATA_DIR, file))
    df["Filename"] = file
    comp_list.append(df)

comp = pd.concat(comp_list)
# Drop rows without a Round value, then renumber
comp.dropna(subset=["Round"], inplace=True)
comp.reset_index(drop=True, inplace=True)
comp.columns = ["Round", "Wk", "Day", "Date", "Time", "Team_1", "Score", "Team_2",
                "Attendance", "Venue", "Referee", "Match Report", "Notes", "Filename"]
comp["Year"] = comp.Date.str[:4]

# Split the two-letter country code off each team name (after Team_1, before Team_2)
comp["Team_abbrev_1"] = comp["Team_1"].str[-2:]
comp["Team_1"] = comp["Team_1"].str[:-3]
comp["Team_abbrev_2"] = comp["Team_2"].str[:2]
comp["Team_2"] = comp["Team_2"].str[3:]

# Scores use an en dash ("2–1"); unplayed fixtures have no score and become NaN
comp["Goals_1"] = comp.Score.str.extract(pat="([0-9]{1,2})–[0-9]{1,2}", expand=False)
comp["Goals_2"] = comp.Score.str.extract(pat="[0-9]{1,2}–([0-9]{1,2})", expand=False)
for i in range(1, 3):
    comp[f"Goals_{i}"] = pd.to_numeric(comp[f"Goals_{i}"], errors="coerce")
comp["Goal_diff"] = comp.Goals_1 - comp.Goals_2
comp.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 211 entries, 0 to 210
Data columns (total 20 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Round          211 non-null    object 
 1   Wk             168 non-null    float64
 2   Day            211 non-null    object 
 3   Date           211 non-null    object 
 4   Time           211 non-null    object 
 5   Team_1         211 non-null    object 
 6   Score          175 non-null    object 
 7   Team_2         211 non-null    object 
 8   Attendance     175 non-null    float64
 9   Venue          211 non-null    object 
 10  Referee        175 non-null    object 
 11  Match Report   211 non-null    object 
 12  Notes          16 non-null     object 
 13  Filename       211 non-null    object 
 14  Year           211 non-null    object 
 15  Team_abbrev_1  211 non-null    object 
 16  Team_abbrev_2  211 non-null    object 
 17  Goals_1        175 non-null    float64
 18  Goals_2
```
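The two score regexes and the fixed-width slicing above work for this export. A more compact sketch takes a different approach:

* One two-group regex pulls out both goal columns.
* The team cells are split on their last (Team_1) or first (Team_2) space, so the code length does not matter.

This assumes every raw team cell has the shape the slicing implies: name then code for Team_1, code then name for Team_2. The rows below are hypothetical.

```python
import pandas as pd

# Hypothetical raw rows shaped like the fbref columns the notebook slices
raw = pd.DataFrame({
    "Team_1": ["France fr", "Czech Republic cz"],
    "Score": ["2–1", None],  # None = fixture not yet played
    "Team_2": ["it Italy", "sk Slovakia"],
})

# One regex with two capture groups returns both goal counts at once
goals = raw["Score"].str.extract(r"([0-9]{1,2})–([0-9]{1,2})")
raw["Goals_1"] = pd.to_numeric(goals[0], errors="coerce")
raw["Goals_2"] = pd.to_numeric(goals[1], errors="coerce")
raw["Goal_diff"] = raw["Goals_1"] - raw["Goals_2"]

# Splitting on one space only keeps multi-word names such as "Czech Republic" intact
team_1 = raw["Team_1"].str.rsplit(" ", n=1, expand=True)
team_2 = raw["Team_2"].str.split(" ", n=1, expand=True)
raw["Team_1"], raw["Team_abbrev_1"] = team_1[0], team_1[1]
raw["Team_abbrev_2"], raw["Team_2"] = team_2[0], team_2[1]
print(raw)
```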

```python
comp.loc[:, ["Date", "Year", "Team_1", "Team_2", "Goals_1", "Goals_2", "Goal_diff"]].sample(10, random_state=42)
```

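|     | Date       | Year | Team_1      | Team_2           | Goals_1 | Goals_2 | Goal_diff |
|-----|------------|------|-------------|------------------|---------|---------|-----------|
| 30  | 2000-07-02 | 2000 | France      | Italy            | 2.0     | 1.0     | 1.0       |
| 173 | 2016-07-07 | 2016 | Germany     | France           | 0.0     | 2.0     | -2.0      |
| 140 | 2016-06-16 | 2016 | Ukraine     | Northern Ireland | 0.0     | 2.0     | -2.0      |
| 75  | 2008-06-13 | 2008 | Netherlands | France           | 4.0     | 1.0     | 3.0       |
| 60  | 2004-07-01 | 2004 | Greece      | Czech Republic   | 1.0     | 0.0     | 1.0       |
| 208 | 2021-06-23 | 2021 | Slovakia    | Spain            | NaN     | NaN     | NaN       |
| 45  | 2004-06-19 | 2004 | Latvia      | Germany          | 0.0     | 0.0     | 0.0       |
| 183 | 2021-06-14 | 2021 | Poland      | Slovakia         | NaN     | NaN     | NaN       |
| 9   | 2000-06-15 | 2000 | Sweden      | Turkey           | 0.0     | 0.0     | 0.0       |
| 100 | 2012-06-11 | 2012 | Ukraine     | Sweden           | 2.0     | 1.0     | 1.0       |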
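The "historic trends" check in Evaluation needs a W/D/L label per match and average goals. The sketch below derives both from the ten sampled rows above. The result is taken from Team_1's point of view, and fixtures that had not been played keep a missing result rather than being counted as draws. In the notebook itself the same two lines would run on `comp`.

```python
import numpy as np
import pandas as pd

# The ten sampled rows shown above (NaN = match not yet played when the data was pulled)
sample = pd.DataFrame({
    "Team_1": ["France", "Germany", "Ukraine", "Netherlands", "Greece",
               "Slovakia", "Latvia", "Poland", "Sweden", "Ukraine"],
    "Team_2": ["Italy", "France", "Northern Ireland", "France", "Czech Republic",
               "Spain", "Germany", "Slovakia", "Turkey", "Sweden"],
    "Goals_1": [2, 0, 0, 4, 1, np.nan, 0, np.nan, 0, 2],
    "Goals_2": [1, 2, 2, 1, 0, np.nan, 0, np.nan, 0, 1],
})
sample["Goal_diff"] = sample["Goals_1"] - sample["Goals_2"]

# np.sign gives 1 / 0 / -1 (and keeps NaN), which maps straight onto W / D / L
sample["Result"] = np.sign(sample["Goal_diff"]).map({1: "W", 0: "D", -1: "L"})

played = sample.dropna(subset=["Goal_diff"])
print(played["Result"].value_counts())
print("Mean goals per match:", (played["Goals_1"] + played["Goals_2"]).mean())
```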


## 3. Data Preparation

* Select Data
* Clean Data
* Construct Data
* Integrate Data
* Format Data
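For the Construct/Integrate steps, time-varying inputs such as Elo ratings should be joined to each fixture as they stood before kick-off, so that no post-match information leaks into the features. One way to do this is pandas `merge_asof`. The sketch makes three assumptions:

* The ratings sit in a long table with `Team`, `Date` and `Elo` columns.
* The ratings shown are made-up placeholders, not real Elo values.
* `attach_rating` is a hypothetical helper.

```python
import pandas as pd

matches = pd.DataFrame({
    "Date": pd.to_datetime(["2021-06-14", "2021-06-23"]),
    "Team_1": ["Poland", "Slovakia"],
    "Team_2": ["Slovakia", "Spain"],
})
# Placeholder ratings: one row per team per date the rating took effect
ratings = pd.DataFrame({
    "Date": pd.to_datetime(["2021-06-01", "2021-06-15", "2021-06-01", "2021-06-01"]),
    "Team": ["Slovakia", "Slovakia", "Poland", "Spain"],
    "Elo": [1700, 1720, 1800, 2000],
})


def attach_rating(matches, ratings, side):
    """Add Elo_<side>: the latest rating for Team_<side> dated strictly before the match."""
    right = ratings.rename(columns={"Team": f"Team_{side}", "Elo": f"Elo_{side}"})
    return pd.merge_asof(
        matches.sort_values("Date"),
        right.sort_values("Date"),
        on="Date",
        by=f"Team_{side}",
        allow_exact_matches=False,  # a rating dated on match day is not used
    )


features = attach_rating(attach_rating(matches, ratings, 1), ratings, 2)
features["Elo_diff"] = features["Elo_1"] - features["Elo_2"]
print(features)
```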

## 4. Modelling

* Select Modelling Technique
* Generate Test Design
* Build Model
* Assess Model

### Updated WC model
* https://github.com/deacona/the-ball-is-round/blob/master/reports/intl_01_world_cup_2018.md
* https://github.com/deacona/the-ball-is-round/blob/master/notebooks/intl_01_world_cup_2018.ipynb

### "Soccernomics"
* goal diff = (0.6666 * home adv) + (0.5 * relative experience) + (0.1 * relative population) + (0.1 * relative gdp/head) + ...
* e.g. England vs Germany at Euro 96
    * Home = England = 1
    * Exp = 84k v 84k = 0
    * Pop = 57 v 81 = -0.4
    * GDP/h = 1627492 / 57 v 2633828 / 81 = -0.1
    * GD = (0.6666 * 1) + (0.5 * 0) + (0.1 * -0.4) + (0.1 * -0.1) = 0.6166 ≈ 0.6
* http://www.soccernomics-agency.com/wordpress/wp-content/uploads/2017/10/soccer-convergence-1.pdf
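The notes do not say how the "relative" terms are computed. The sketch below assumes a natural-log ratio ln(team 1 / team 2), which reproduces the worked England v Germany figures closely:

* The population term comes out at about -0.35 (the notes round it to -0.4).
* The GDP-per-head term comes out at about -0.13.
* The total is about 0.62, consistent with the ≈ 0.6 above.

The coefficients are the ones listed. The "..." terms are left out, and the function names are hypothetical.

```python
import math

# Coefficients as listed in the notes; the trailing "..." terms are omitted
WEIGHTS = {"home": 0.6666, "experience": 0.5, "population": 0.1, "gdp_per_head": 0.1}


def relative(a, b):
    """ASSUMPTION: 'relative' is taken as the natural-log ratio ln(a / b)."""
    return math.log(a / b)


def soccernomics_goal_diff(home, exp_1, exp_2, pop_1, pop_2, gdp_1, gdp_2):
    """Predicted goal difference (team 1 minus team 2) from the listed terms only."""
    terms = {
        "home": home,
        "experience": relative(exp_1, exp_2),
        "population": relative(pop_1, pop_2),
        "gdp_per_head": relative(gdp_1 / pop_1, gdp_2 / pop_2),
    }
    return sum(WEIGHTS[name] * value for name, value in terms.items())


# England v Germany at Euro 96, using the figures from the worked example
gd = soccernomics_goal_diff(1, 84, 84, 57, 81, 1627492, 2633828)
print(round(gd, 2))
```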

### Dixon-Coles (and other probability models)
* https://dashee87.github.io/football/python/predicting-football-results-with-statistical-modelling-dixon-coles-and-time-weighting/
* http://www.statsandsnakeoil.com/2018/06/05/modelling-the-world-cup-with-regista/
* http://opisthokonta.net/?cat=48
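The Dixon-Coles model in the first link keeps independent Poisson goals for each side but corrects the four low scores (0-0, 1-0, 0-1, 1-1) with a factor tau that depends on a parameter rho. This sketch covers only the score-probability step. Fitting λ (home expected goals), μ (away expected goals) and ρ, and the time-weighting in the first link, are not shown. The parameter values in the example are illustrative only.

```python
import math


def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles low-score correction; 1 for every score other than 0-0, 0-1, 1-0, 1-1."""
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def score_matrix(lam, mu, rho, max_goals=10):
    """matrix[x][y] = P(home scores x, away scores y) for x, y in 0..max_goals."""
    return [[tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
             for y in range(max_goals + 1)] for x in range(max_goals + 1)]


def outcome_probs(matrix):
    """(home win, draw, away win) probabilities from a score matrix."""
    home = sum(p for x, row in enumerate(matrix) for y, p in enumerate(row) if x > y)
    draw = sum(row[x] for x, row in enumerate(matrix))
    away = sum(p for x, row in enumerate(matrix) for y, p in enumerate(row) if x < y)
    return home, draw, away


home, draw, away = outcome_probs(score_matrix(1.4, 1.1, rho=-0.1))
print(f"home {home:.3f}  draw {draw:.3f}  away {away:.3f}")
```

The corrections cancel out in total, so the matrix still sums to (almost exactly) 1. A negative ρ shifts probability towards 0-0 and 1-1, which is how the model captures the extra low-scoring draws seen in real results.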

## 5. Evaluation

* Evaluate Results
* Review Process
* Determine Next Steps

```
# % correct score, goal diff, result, points
# vs historic trends (goals, W/D/L)
```
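The metrics listed above could be computed along these lines. Predictions and results are `(goals_1, goals_2)` tuples. The notebook does not define a points rule, so the one here is hypothetical: 3 for an exact score, otherwise 1 for the right result. The example matches are hypothetical too.

```python
def sign(n):
    return (n > 0) - (n < 0)


def evaluate(predictions, actuals, exact_pts=3, result_pts=1):
    """Percent of matches with the correct score, goal difference and result, plus points.

    Points rule (hypothetical): exact_pts for an exact score, otherwise result_pts
    for the right W/D/L, otherwise nothing.
    """
    n = len(actuals)
    score = goal_diff = result = points = 0
    for (p1, p2), (a1, a2) in zip(predictions, actuals):
        exact = (p1, p2) == (a1, a2)
        same_result = sign(p1 - p2) == sign(a1 - a2)
        score += exact
        goal_diff += (p1 - p2) == (a1 - a2)
        result += same_result
        points += exact_pts if exact else result_pts if same_result else 0
    return {
        "pct_correct_score": 100 * score / n,
        "pct_correct_goal_diff": 100 * goal_diff / n,
        "pct_correct_result": 100 * result / n,
        "points": points,
    }


predicted = [(2, 1), (1, 1), (0, 2), (1, 0)]
actual = [(2, 1), (0, 0), (1, 2), (0, 1)]
metrics = evaluate(predicted, actual)
print(metrics)
```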

## 6. Deployment

* Plan Deployment
* Plan Monitoring and Maintenance
* Produce Final Report
* Review Project