This notebook is a graphical analysis of FiveThirtyEight's 2018 FIFA World Cup predictions. It is loosely 'Bayesian' in spirit: we start from trends we would reasonably expect in the tournament data (for example, that score differentials shrink as games become more competitive later in the tournament) and check how well the predictions match them.

First, we import the CSV dataset into a pandas dataframe.

In [None]:
import pandas as pd

df = pd.read_csv("wc_matches.csv")

with pd.option_context('display.max_rows', 10, 'display.max_columns', None):
    display(df)

For our first analysis, we compare the projected and actual scores of each match by plotting the absolute goal differentials, that is, the size of the margin regardless of which team is ahead.

In [43]:
scores = df[['team1', 'team2', 'proj_score1', 'proj_score2', 'score1', 'score2']].copy()

scores['proj_diff'] = abs(scores['proj_score1'] - scores['proj_score2'])
scores['diff'] = abs(scores['score1'] - scores['score2'])

with pd.option_context('display.max_rows', 10, 'display.max_columns', None):
    display(scores)

Unnamed: 0,team1,team2,proj_score1,proj_score2,score1,score2,proj_diff,diff
0,Russia,Saudi Arabia,2.03,0.73,5,0,1.30,5
1,Egypt,Uruguay,0.82,1.61,0,1,0.79,1
2,Morocco,Iran,1.13,0.86,0,1,0.27,1
3,Portugal,Spain,1.07,1.60,3,3,0.53,0
4,France,Australia,2.61,0.69,2,1,1.92,1
...,...,...,...,...,...,...,...,...
59,Russia,Croatia,1.10,1.52,2,2,0.42,0
60,France,Belgium,1.62,1.52,1,0,0.10,1
61,Croatia,England,1.10,1.34,2,1,0.24,1
62,Belgium,England,1.42,1.39,2,0,0.03,2


Next, we prepare the data for visualization: the projected and actual differentials as lists, plus a label for each match.

In [None]:
proj_data = scores['proj_diff'].tolist()
real_data = scores['diff'].tolist()
matches = [team1 + '-' + team2 for team1, team2 in zip(scores['team1'], scores['team2'])]
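
As a rough numeric companion to the plots, the gap between projected and actual differentials can be summarized as a mean absolute error. The sketch below is not part of the original analysis: it uses only the first five rows shown in the table above, and `mean_abs_error` is a hypothetical helper introduced for illustration.

```python
# First five rows of the table above: projected and actual absolute differentials.
proj_sample = [1.30, 0.79, 0.27, 0.53, 1.92]
real_sample = [5, 1, 1, 0, 1]


def mean_abs_error(projected, actual):
    """Average absolute gap between projected and actual values (hypothetical helper)."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)


sample_mae = mean_abs_error(proj_sample, real_sample)
print(f"MAE over the five sample matches: {sample_mae:.3f}")
```

With the full lists from the cell above, the same call would be `mean_abs_error(proj_data, real_data)`. Five matches are far too few to judge the predictions, so the sample figure is only a check that the calculation behaves as intended.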

In [None]:
import plotly
import plotly.plotly as py
import plotly.graph_objs as go


Now we format the bar chart, with one bar each for the projected and the actual differential of every match.

In [19]:
# plotly, py and go were already imported in the previous cell.
# py.iplot uploads the figure to a Plotly account, so credentials must be set first.
plotly.tools.set_credentials_file(username='alimsta', api_key='r3rCBmfUAdlXGChHeolY')

proj_bar = go.Bar(
    x=matches,
    y=proj_data,
    text=proj_data,
    textposition = 'auto',
    marker=dict(
        color='rgb(204,204,204)'
        ),
    name = 'Projected',
    opacity=0.7
)

real_bar = go.Bar(
    x=matches,
    y=real_data,
    text=real_data,
    textposition = 'auto',
    marker=dict(
        color='rgb(49,130,189)'
        ),
    name = 'Actual',
    opacity=0.7
)

plots = [proj_bar,real_bar]


py.iplot(plots, filename='games-graph')

One might expect the goal differential to decrease as games become more competitive, particularly from the knockout stage onward, where only the top two teams from each group remain. We can check whether this trend exists by fitting a linear regression line to the projected differentials and another to the actual differentials, with the match number (in dataset order) on the x-axis.

In [42]:
from scipy import stats
import numpy as np

# Match numbers 1..N in dataset order (N = 64 for the full tournament)
xi = np.arange(1, len(proj_data) + 1)
y1 = proj_data
y2 = real_data

slope1, intercept1, r_value1, p_value1, std_err1 = stats.linregress(xi,y1)
line1 = slope1*xi+intercept1

slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(xi,y2)
line2 = slope2*xi+intercept2

# Fitted regression lines
trace1 = go.Scatter(
    x=xi,
    y=line1,
    mode='lines',
    line=dict(color='blue'),
    name='Projected Fit'
)

trace2 = go.Scatter(
    x=xi,
    y=line2,
    mode='lines',
    line=dict(color='green'),
    name='Real Fit'
)

# Per-match differentials, labelled with the match name on hover
scatter1 = go.Scatter(
    x=xi,
    y=proj_data,
    text=matches,
    mode='markers',
    marker=dict(color='blue'),
    name='Projected',
    opacity=0.5
)

scatter2 = go.Scatter(
    x=xi,
    y=real_data,
    text=matches,
    mode='markers',
    marker=dict(color='green'),
    name='Real',
    opacity=0.5
)


plots = [scatter1,scatter2,trace1,trace2]


py.iplot(plots, filename='scatter-graph')

We can see a gradual decrease in the actual goal differential over the course of the tournament, as we would expect. FiveThirtyEight's projected scores show the same trend, albeit with a steeper slope. This could suggest a few things about their predictions. Their model may assume that the teams advancing to the knockout stage vary little in performance, so those games should be extremely close in score. It may also have historical reasons to expect closer scores, perhaps drawing on previous head-to-head matches.
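
For reference, the slopes compared above are ordinary least-squares estimates, which is what `stats.linregress` fits. The sketch below shows the underlying formula in plain Python. It is an illustration under stated assumptions: `least_squares_line` is a hypothetical helper, and `toy_x` and `toy_y` are made-up numbers, not tournament data.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up example: a differential that shrinks as the match number grows
toy_x = [1, 2, 3, 4]
toy_y = [3, 2, 2, 1]
toy_slope, toy_intercept = least_squares_line(toy_x, toy_y)
print(f"slope={toy_slope:.2f}, intercept={toy_intercept:.2f}")
```

A negative slope corresponds to differentials shrinking as the tournament progresses. Whether such a slope is distinguishable from noise is a separate question, which the `p_value` and `std_err` already returned by `stats.linregress` in the cell above could help answer.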