The English Premier League season is starting to take shape after 12 games, and patterns are emerging about where things might end up come May. While the league table is what it is, I have believed for some time that tracking expected goals (xG) and expected goals against (xGA) is the best way to identify the best-performing teams in the league. My reasoning is that a lot of randomness is involved in scoring a goal, and much less in creating the chance. To be sure, there is randomness throughout the game, including in chance creation, but I believe it affects the finish more than the actions that lead up to it. A team that consistently creates goal-scoring opportunities is likely to eventually put those chances away, whereas a team that scores goals without consistently creating good opportunities is likely to regress to its actual chance-creation level over the course of a season. The same concept applies to expected goals against: a team that consistently gives up good scoring opportunities will concede a lot of goals over a season. Therefore, I believe the team that achieves the greatest expected goal difference (xGD, that is, xG minus xGA) throughout the season has the best chance of winning the league come May. Conversely, the teams that accumulate the lowest xGD by May have the greatest chance of being relegated.
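
To illustrate the regression argument, here is a minimal simulation sketch. It assumes every shot is an independent event with a fixed scoring probability; the shot counts, the 0.1 probability, and the function names are all hypothetical and not taken from the dataset used in this notebook.

```python
import random

def simulate_goals(n_shots, xg_per_shot, rng):
    """Count goals when each shot scores independently with probability xg_per_shot."""
    return sum(rng.random() < xg_per_shot for _ in range(n_shots))

def expected_goals(n_shots, xg_per_shot):
    """Expected goals is the sum of the per-shot scoring probabilities."""
    return n_shots * xg_per_shot

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical team: 12 shots per match, each worth 0.1 xG
SHOTS_PER_MATCH, XG_PER_SHOT = 12, 0.1

for matches in (1, 12, 38, 380):
    shots = matches * SHOTS_PER_MATCH
    goals = simulate_goals(shots, XG_PER_SHOT, rng)
    xg = expected_goals(shots, XG_PER_SHOT)
    print(f"{matches:>3} matches: goals={goals:>3}, xG={xg:6.1f}, goals/xG={goals / xg:.2f}")
```

In a sketch like this, the goals-to-xG ratio tends to swing widely over a handful of matches and settle closer to 1 as the sample grows, which is the intuition behind trusting chance creation over short-run scoring.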

With this in mind, I am going to track the top 5 and bottom 5 teams by xGD during the season. In this notebook I build two charts to show the results.

The first thing I did was import the necessary libraries. Some had to be installed first, but I removed the installation cells to keep the notebook presentable. I may also have imported some libraries that ended up unused, because I went through a few iterations and discarded several charts in favor of simplicity and efficiency.

In [10]:
from matplotlib import pyplot as plt
import matplotlib as mpl
import seaborn.objects as so
from dotenv import load_dotenv
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from wordcloud import WordCloud
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd 
import altair as alt
import boto3
import io
import os

load_dotenv()

True

In [3]:
# Read the bucket name and AWS credentials from environment variables
bucket = os.getenv('AWS_BUCKET_NAME')
access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
secret_key = os.getenv('AWS_PRIVATE_KEY')

# Create an S3 client with those credentials
s3_client = boto3.client(
    's3',
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_key,
)

# Download the CSV object and load it into a DataFrame
read_response = s3_client.get_object(Bucket=bucket, Key='EPL 2024-2025 xG.csv')
xG_df = pd.read_csv(io.BytesIO(read_response['Body'].read()))


In [5]:
xG_df['xGD cumulative sum'] = xG_df['xGD_sum_over_gameweek']
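
The CSV already carries a running total in `xGD_sum_over_gameweek`. For readers starting from per-match data, the following is a minimal sketch of how such a column could be derived; the `xG` and `xGA` column names and the toy values are assumptions for illustration, not the actual schema of the file.

```python
import pandas as pd

# Toy per-match data; the column names 'xG' and 'xGA' are hypothetical
matches = pd.DataFrame({
    'team':     ['A', 'A', 'A', 'B', 'B', 'B'],
    'gameweek': [1, 2, 3, 1, 2, 3],
    'xG':       [2.0, 1.5, 1.0, 0.5, 1.0, 0.5],
    'xGA':      [1.0, 0.5, 1.5, 1.5, 2.0, 1.0],
})

def add_cumulative_xgd(df):
    """Return a copy with per-match xGD and its running total per team."""
    out = df.sort_values(['team', 'gameweek']).copy()
    out['xGD'] = out['xG'] - out['xGA']
    out['xGD cumulative sum'] = out.groupby('team')['xGD'].cumsum()
    return out

cumulative = add_cumulative_xgd(matches)
print(cumulative[['team', 'gameweek', 'xGD', 'xGD cumulative sum']])
```

Sorting by team and gameweek before the cumulative sum keeps the running total in match order even if the input rows arrive shuffled.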

In [6]:
top5 = xG_df[xG_df['gameweek']==xG_df['gameweek'].max()].sort_values(by='xGD cumulative sum', ascending=False).head(5)

top5_xG_teams = top5['team'].unique()
top_5_xGD = xG_df[xG_df['team'].isin(top5_xG_teams)]


In [12]:
bottom5 = xG_df[xG_df['gameweek']==xG_df['gameweek'].max()].sort_values(by='xGD cumulative sum', ascending=True).head(5)

bottom5_xG_teams = bottom5['team'].unique()
bottom_5_xGD = xG_df[xG_df['team'].isin(bottom5_xG_teams)]
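
The top-5 and bottom-5 cells repeat the same steps, so they could be folded into one helper. This is only a sketch, assuming the frame has `team`, `gameweek`, and `xGD cumulative sum` columns as above; the function name `teams_by_xgd` and the toy data are hypothetical.

```python
import pandas as pd

def teams_by_xgd(df, n=5, best=True, col='xGD cumulative sum'):
    """Return the full history for the n best (or worst) teams at the latest gameweek."""
    latest = df[df['gameweek'] == df['gameweek'].max()]
    ranked = latest.nlargest(n, col) if best else latest.nsmallest(n, col)
    return df[df['team'].isin(ranked['team'])]

# Toy data: three teams over two gameweeks
toy = pd.DataFrame({
    'team': ['A', 'B', 'C', 'A', 'B', 'C'],
    'gameweek': [1, 1, 1, 2, 2, 2],
    'xGD cumulative sum': [0.5, -0.2, 1.0, 1.8, -0.9, 0.7],
})

top = teams_by_xgd(toy, n=2, best=True)
bottom = teams_by_xgd(toy, n=1, best=False)
print(sorted(top['team'].unique().tolist()), sorted(bottom['team'].unique().tolist()))
```

With the real data, `teams_by_xgd(xG_df, best=True)` and `teams_by_xgd(xG_df, best=False)` would be expected to play the roles of `top_5_xGD` and `bottom_5_xGD`.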

In [7]:
so.Plot.config.theme.update(mpl.rcParams)


In [8]:
plt.figure(figsize=(12, 8))

<Figure size 1200x800 with 0 Axes>


In [31]:
fig = px.line(top_5_xGD, x='gameweek', y='xGD cumulative sum', color='team', width=800, height=400)
fig.update_layout(plot_bgcolor='white', title=dict(text='Top 5 Teams in cumulative xGD', font=dict(size=18), yref='paper'))
fig.update_layout(title_xanchor='left')
fig.update_layout(title_font_lineposition='under')


fig.update_xaxes(
    mirror=False,
    ticks='outside',
    showline=True,
    linecolor='black',
    gridcolor='lightgrey'
)
fig.update_yaxes(
    mirror=False,
    ticks='outside',
    showline=True,
    linecolor='black',
    gridcolor='lightgrey'
)

fig.update_traces(cliponaxis=False)
fig.update_layout(
    margin=dict(r=120),  # Add space for labels
    width=900,  # Wider figure
    height=500,
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False)
)




Looking at this chart, it is clear that Liverpool have separated themselves from the pack over the last 3 gameweeks. Whether this trend continues remains to be seen, but as things stand my theory about xGD is holding up, as Liverpool are currently 8 points clear in the league table. The rest of the top 5 are teams that should feel they are in the title race and have a good chance if a few results go their way. The next iteration of this project will include the league table, but for now I am focusing on xGD and will add features incrementally, following an agile approach.

The next chart is going to look at the lower end of the xGD table.

In [33]:
fig2 = px.line(bottom_5_xGD, x='gameweek', y='xGD cumulative sum', color='team', width=800, height=400)
fig2.update_layout(plot_bgcolor='white', title=dict(text='Bottom 5 Teams in cumulative xGD', font=dict(size=18), yref='paper'))
fig2.update_layout(title_xanchor='left')
fig2.update_layout(title_font_lineposition='under')


for category in bottom_5_xGD['team'].unique():
    category_data = bottom_5_xGD[bottom_5_xGD['team'] == category]
    fig2.add_trace(go.Scatter(
        x=[category_data['gameweek'].iloc[-1]],  # Last x value
        y=[category_data['xGD cumulative sum'].iloc[-1]],  # Last y value
        text=[category],  # Label text
        mode='text',
        textposition='middle right',
        showlegend=False  # Hide from legend
    ))
fig2.update_traces(cliponaxis=False)
fig2.update_layout(
    margin=dict(r=120),  # Add space for labels
    width=900,  # Wider figure
    height=500,
    xaxis=dict(showgrid=False),
    yaxis=dict(showgrid=False)
)
fig2.update_xaxes(
    mirror=False,
    ticks='outside',
    showline=True,
    linecolor='black',
    gridcolor='lightgrey'
)
fig2.update_yaxes(
    mirror=False,
    ticks='outside',
    showline=True,
    linecolor='black',
    gridcolor='lightgrey'
)
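
The labelling loop in this cell takes the last row for each team, which relies on the rows already being ordered by gameweek. Below is a sketch of computing those label positions in one step without that assumption; `line_end_points` is a hypothetical helper name and the toy data is made up.

```python
import pandas as pd

def line_end_points(df, col='xGD cumulative sum'):
    """Return one row per team: its value at that team's latest gameweek."""
    ordered = df.sort_values('gameweek')
    return ordered.groupby('team').tail(1)[['team', 'gameweek', col]]

# Toy data, deliberately out of gameweek order
toy_lines = pd.DataFrame({
    'team': ['X', 'Y', 'X', 'Y'],
    'gameweek': [2, 1, 1, 2],
    'xGD cumulative sum': [-3.0, -1.0, -1.5, -2.5],
})

ends = line_end_points(toy_lines).set_index('team')
print(ends)
```

The resulting frame could then feed a single text-mode `go.Scatter` trace instead of one trace per team.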


The bottom 4 teams in xGD are in a dogfight, while Palace do not look as bad in comparison despite being in the bottom 5. If this trend continues, all 5 teams are very likely to find themselves in the relegation battle in the latter stages of the season.