# Aston Villa v. Brighton - 2024-12-30 match analysis

*This notebook walks through the theory and calculations my model currently uses to estimate the probability of each 1 x 2 result and a table of over / under goal projections. I will break the calculations out into utility functions stored in `../src/utils` for ease of use in future explorations; this notebook documents them.*

### Instructions:

1. Ensure all cells in all notebooks in the `/notebooks` directory have been run.
2. Define `home_side` and `away_side` in the first cell. If comparing with odds from `03_retrieve_odds.ipynb`, also enter `matchday`.
3. Click 'Run All'.
4. Probabilities for 1 x 2 and over / under can be viewed in the last cell.
5. *Suggestion:* copy and paste the notebook to `../output/< matchday_directory >/` and rename the file. IMPORTANT: if you make a copy, update the file paths in lines 14 and 15 of the first code cell as needed.



### Import League Data & Define Sides

Define `home_side` and `away_side`, then confirm that the cell prints the expected fixture.



In [25]:
import pandas as pd
from scipy.stats import poisson
import numpy as np

"""
BUG: if you copy `notebooks/03_analysis_template_notebook.ipynb` to a directory in output
for storage, the file path for reading each csv must be updated from '../data/processed/home_table.csv'
to '../../data/processed/home_table.csv' before the cells are run again.

TLDR
If you are in ./notebooks/03_analysis_template_notebook.ipynb you may run this cell as is. If you are not, check that the following
read_csv commands read from the correct directory.
"""
home_dataframe = pd.read_csv('../data/processed/home_table.csv')
away_dataframe = pd.read_csv('../data/processed/away_table.csv')

"""
!!! INPUT TEAMS & MATCHDAY HERE !!!

wrapped in quotes, eg.

home_side = "Bournemouth"
away_side = "West Ham"
matchday = "2024-12-16" # (optional)

If there is a ValueError,
(1) check spelling against the error message. Must be exact.
(2) check capitalization
(3) check that it is wrapped in quotes

!!! INPUT TEAMS & MATCHDAY HERE !!!
"""
home_side = "Aston Villa"
away_side = "Brighton"
matchday = "2024-12-30" # (optional)

# validate input
clubs = set(home_dataframe['Team'].values)
if home_side not in clubs:
    raise ValueError(f"Invalid entry for 'home_side'. Expected one of {clubs}, but received '{home_side}'")
elif away_side not in clubs:
    raise ValueError(f"Invalid entry for 'away_side'. Expected one of {clubs}, but received '{away_side}'")
else:
    print(f"Ready to calculate probability for {home_side} v. {away_side}")
    print("\nNOTEBOOK TITLE (copy and paste): \n")
    print(f"{home_side} v. {away_side} - {matchday} match analysis")

Ready to calculate probability for Aston Villa v. Brighton

NOTEBOOK TITLE (copy and paste): 

Aston Villa v. Brighton - 2024-12-30 match analysis
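
The path caveat in the cell above could be sidestepped by locating `data/processed` relative to wherever the notebook happens to live. A minimal sketch, assuming the project keeps a `data/processed` directory somewhere above the notebook; `find_processed_dir` is a hypothetical helper, not something that already exists in `../src/utils`:

```python
from pathlib import Path


def find_processed_dir(start=None, marker=("data", "processed")):
    """Walk upward from `start` until a `data/processed` directory is found.

    Hypothetical helper: the directory layout is assumed from the
    read_csv paths used in this notebook.
    """
    here = Path(start) if start is not None else Path.cwd()
    here = here.resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (here, *here.parents):
        target = candidate.joinpath(*marker)
        if target.is_dir():
            return target
    raise FileNotFoundError(f"No {'/'.join(marker)} directory found above {here}")
```

With something like this, `pd.read_csv(find_processed_dir() / "home_table.csv")` should resolve to the same file whether the notebook runs from `notebooks/` or from a copy under `output/`.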


### Calculate Projected Goals

In [26]:
# 2. grab their gpm_scored and gpm_conceded
home_side_gpm_scored = home_dataframe.loc[
    home_dataframe["Team"] == home_side,
    "gpm_scored"
].values[0]

home_side_gpm_conceded = home_dataframe.loc[
    home_dataframe["Team"] == home_side,
    "gpm_conceded"
].values[0]

print(home_side_gpm_scored, home_side_gpm_conceded)

away_side_gpm_scored = away_dataframe.loc[
    away_dataframe["Team"] == away_side,
    "gpm_scored"
].values[0]

away_side_gpm_conceded = away_dataframe.loc[
    away_dataframe["Team"] == away_side,
    "gpm_conceded"
].values[0]

print(away_side_gpm_scored, away_side_gpm_conceded)

# 3. league-average gpm scored and conceded by home sides
epl_home_avg_gpm_scored = round(home_dataframe["gpm_scored"].mean(), 2)
epl_home_avg_gpm_conceded = round(home_dataframe["gpm_conceded"].mean(), 2)
print(epl_home_avg_gpm_scored, epl_home_avg_gpm_conceded)

# 4. create a ratio (attack and defense rating) of their gpm to the league average gpm
home_attack_rating = home_side_gpm_scored / epl_home_avg_gpm_scored
away_defense_rating = away_side_gpm_conceded / epl_home_avg_gpm_scored
away_attack_rating = away_side_gpm_scored / epl_home_avg_gpm_conceded
home_defense_rating = home_side_gpm_conceded / epl_home_avg_gpm_conceded

print(home_attack_rating, away_defense_rating, away_attack_rating, home_defense_rating)

# home projected goals
home_projected_goals = home_attack_rating * away_defense_rating * epl_home_avg_gpm_scored
# away projected goals
away_projected_goals = away_attack_rating * home_defense_rating * epl_home_avg_gpm_conceded
print("\n ================== projected_goals ==================== \n")
print(f"HOME {home_side} projected goals:", home_projected_goals)
print(f"AWAY {away_side} projected goals: ", away_projected_goals)

1.6666666666666667 1.1111111111111112
1.5555555555555556 1.5555555555555556
1.53 1.43
1.0893246187363834 1.016702977487291 1.087801087801088 0.7770007770007771


HOME Aston Villa projected goals: 1.6945049624788187
AWAY Brighton projected goals:  1.2086678753345423
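
The rating arithmetic in the cell above can be collected into one function. This is a sketch of the same calculation, not the final utility; `projected_goals` and its parameter names are hypothetical, and the inputs are the values printed above:

```python
def projected_goals(home_scored, home_conceded, away_scored, away_conceded,
                    league_home_scored, league_home_conceded):
    """Return (home_goals, away_goals) from goals-per-match rates.

    League averages are for home sides; the average conceded by home
    sides doubles as the average scored by away sides.
    """
    home_attack = home_scored / league_home_scored
    away_defense = away_conceded / league_home_scored
    away_attack = away_scored / league_home_conceded
    home_defense = home_conceded / league_home_conceded

    home_goals = home_attack * away_defense * league_home_scored
    away_goals = away_attack * home_defense * league_home_conceded
    return home_goals, away_goals


home_goals, away_goals = projected_goals(
    home_scored=1.6666666666666667, home_conceded=1.1111111111111112,
    away_scored=1.5555555555555556, away_conceded=1.5555555555555556,
    league_home_scored=1.53, league_home_conceded=1.43,
)
print(round(home_goals, 4), round(away_goals, 4))  # 1.6945 1.2087
```

Note that one league average cancels in each product, so the home projection reduces to `home_scored * away_conceded / league_home_scored`.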


### Poisson Distribution of Goal Probabilities

In [27]:
# Poisson probability of each side scoring 0 - 8 goals
goals = np.arange(9)
home_score_prob = poisson.pmf(goals, home_projected_goals)
away_score_prob = poisson.pmf(goals, away_projected_goals)
# print(home_score_prob)
# print(away_score_prob)

# rows = away goals, columns = home goals
exact_score_prob = np.outer(away_score_prob, home_score_prob)
print(f"{home_side} v. {away_side}", exact_score_prob)

Aston Villa v. Brighton [[5.48489170e-02 9.29417620e-02 7.87451385e-02 4.44780093e-02
  1.88420519e-02 6.38559008e-03 1.80340235e-03 4.36553461e-04
  9.24677507e-05]
 [6.62941239e-02 1.12335722e-01 9.51767192e-02 5.37591410e-02
  2.27737828e-02 7.71805759e-03 2.17971448e-03 5.27648144e-04
  1.11762800e-04]
 [4.00637890e-02 6.78882892e-02 5.75185215e-02 3.24884734e-02
  1.37629698e-02 4.66428414e-03 1.31727544e-03 3.18875680e-04
  6.75420529e-05]
 [1.61412716e-02 2.73514648e-02 2.31735964e-02 1.30892580e-02
  5.54495317e-03 1.87919013e-03 5.30716167e-04 1.28471597e-04
  2.72119698e-05]
 [4.87735910e-03 8.26470920e-03 7.00229538e-03 3.95514142e-03
  1.67550169e-03 5.67829186e-04 1.60364896e-04 3.88198731e-05
  8.22255844e-06]
 [1.17902145e-03 1.99785770e-03 1.69268989e-03 9.56090476e-04
  4.05025014e-04 1.37263379e-04 3.87655795e-05 9.38406670e-06
  1.98766845e-06]
 [2.37507559e-04 4.02457737e-04 3.40983316e-04 1.92599307e-04
  8.15901205e-05 2.76509728e-05 7.80911844e-06 1.89036999e-06
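
Because the grid stops at 8 goals per side, a small amount of probability falls outside it. A quick sketch to check how much, assuming the two sides score independently (as the outer product already does); `score_matrix` is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import poisson


def score_matrix(home_rate, away_rate, max_goals=8):
    """Exact-score probabilities; rows = away goals, columns = home goals."""
    goals = np.arange(max_goals + 1)
    home_pmf = poisson.pmf(goals, home_rate)
    away_pmf = poisson.pmf(goals, away_rate)
    return np.outer(away_pmf, home_pmf)


matrix = score_matrix(1.6945049624788187, 1.2086678753345423)
print(matrix.shape)        # (9, 9)
print(matrix[0, 0])        # probability of a 0-0 draw
print(1 - matrix.sum())    # probability mass lost to the 8-goal cutoff
```

For these projected goals the lost mass is under 0.01%, which is why the over and under probabilities further down do not sum to exactly 1.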


### Under Market

In [28]:
thresholds = [0.5, 1.5, 2.5, 3.5]
sums = []

for t in thresholds:
    mask = np.add.outer(
        np.arange(
            exact_score_prob.shape[0]
        ),
        np.arange(
            exact_score_prob.shape[1]
        )
    ) <= t
    # print(mask)
    sums.append(exact_score_prob[mask].sum())

# print(sums)
under_market_df = pd.DataFrame({
    "Goals": thresholds,
    "Prob": sums
})
under_market_df["Implied Odds"] = 1 / under_market_df["Prob"]
print("~  ~ under market ~  ~\n", f"{home_side} v. {away_side}\n", under_market_df)

~  ~ under market ~  ~
 Aston Villa v. Brighton
    Goals      Prob  Implied Odds
0    0.5  0.054849     18.231901
1    1.5  0.214085      4.671046
2    2.5  0.445229      2.246033
3    3.5  0.668914      1.494961


### Over Market

In [29]:
thresholds = [0.5, 1.5, 2.5, 3.5]
sums = []

for t in thresholds:
    mask = np.add.outer(
        np.arange(
            exact_score_prob.shape[0]
        ),
        np.arange(
            exact_score_prob.shape[1]
        )
    ) >= t
    sums.append(exact_score_prob[mask].sum())

# print(sums)
over_market_df = pd.DataFrame({
    "Goals": thresholds,
    "Prob": sums
})

over_market_df["Implied Odds"] = 1 / over_market_df["Prob"]
print("~  ~ over market ~  ~\n", f"{home_side} v. {away_side}\n", over_market_df)

~  ~ over market ~  ~
 Aston Villa v. Brighton
    Goals      Prob  Implied Odds
0    0.5  0.945076      1.058116
1    1.5  0.785840      1.272524
2    2.5  0.554695      1.802791
3    3.5  0.331011      3.021046
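
As a cross-check on the two tables, the total number of goals is itself Poisson distributed when both sides are independent Poisson variables, with rate equal to the sum of the two projections. The sketch below uses that fact instead of the masked grid; it is an alternative calculation for comparison, not the method used above:

```python
import numpy as np
from scipy.stats import poisson

home_rate, away_rate = 1.6945049624788187, 1.2086678753345423
total_rate = home_rate + away_rate
thresholds = np.array([0.5, 1.5, 2.5, 3.5])

# "Under 2.5" means total goals <= 2, so floor each threshold.
max_under = np.floor(thresholds)
under = poisson.cdf(max_under, total_rate)   # P(total <= floor(t))
over = poisson.sf(max_under, total_rate)     # P(total > floor(t)) = 1 - cdf

for t, u, o in zip(thresholds, under, over):
    print(f"{t}: under={u:.6f} over={o:.6f}")
```

The under column should match the grid version, since every score with 3 or fewer total goals sits inside the 9 x 9 grid. The over column comes out slightly higher here because it includes scores beyond 8 goals per side.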


### 1 x 2 Probability

In [30]:
# Poisson probability of each side scoring 0 - 8 goals
goals = np.arange(9)
home_score_prob = poisson.pmf(goals, home_projected_goals)
away_score_prob = poisson.pmf(goals, away_projected_goals)
# print(home_score_prob)
# print(away_score_prob)

# rows = away goals, columns = home goals
exact_score_prob = np.outer(away_score_prob, home_score_prob)

# print(exact_score_prob)
# above the diagonal: home goals > away goals; below it: away goals > home goals
home_win_prob = np.sum(np.triu(exact_score_prob, k=1))
away_win_prob = np.sum(np.tril(exact_score_prob, k=-1))
draw_prob = np.trace(exact_score_prob)

# prepare data in readable dataframe
data = {
    "winner": [f"{home_side} (1)", "Draw (x)", f"{away_side} (2)"],
    "prob": [home_win_prob, draw_prob, away_win_prob],
}
result_df = pd.DataFrame(data)
result_df['implied_odds'] = 1 / result_df['prob']
print("      ~  ~ moneyline market ~  ~\n", result_df)

      ~  ~ moneyline market ~  ~
             winner      prob  implied_odds
0  Aston Villa (1)  0.487529      2.051158
1         Draw (x)  0.239613      4.173391
2     Brighton (2)  0.272782      3.665930
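
The triangle sums depend on how the score matrix is oriented, so here is the same step as a small standalone sketch. `one_x_two` is a hypothetical helper name, and the matrix is rebuilt from the projected goals printed earlier:

```python
import numpy as np
from scipy.stats import poisson


def one_x_two(score_prob):
    """score_prob[a, h] = P(away scores a, home scores h)."""
    home_win = np.triu(score_prob, k=1).sum()    # column index > row index
    draw = np.trace(score_prob)                  # equal goals on the diagonal
    away_win = np.tril(score_prob, k=-1).sum()   # row index > column index
    return home_win, draw, away_win


goals = np.arange(9)
score_prob = np.outer(
    poisson.pmf(goals, 1.2086678753345423),  # away goals along the rows
    poisson.pmf(goals, 1.6945049624788187),  # home goals along the columns
)
home_win, draw, away_win = one_x_two(score_prob)
print(round(home_win, 6), round(draw, 6), round(away_win, 6))
```

If the outer product were built the other way round (home first), the upper and lower triangles would swap meaning, so the orientation comment is worth keeping next to the calculation.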


# Full Results Summary:

In [31]:
# helper function to translate discrepancy between Understat and Odds-Api naming
def translate(string):

    translations = {
        "Brighton": "Brighton and Hove Albion",
        "Ipswich": "Ipswich Town",
        "Leicester": "Leicester City",
        "Tottenham": "Tottenham Hotspur",
        "West Ham": "West Ham United"
    }

    # fall back to the original name when no translation is needed
    return translations.get(string, string)

home_side, away_side = translate(home_side), translate(away_side)

match_string = f"{home_side} v. {away_side}"

print(f"CALCULATED PROBABILITIES FOR {home_side} v. {away_side}")
print("\n\n        ~ ~ moneyline market ~ ~\n\n", result_df)
print("\n\n          ~ ~ over market ~ ~ \n\n", over_market_df)
print("\n\n          ~ ~ under market ~ ~ \n\n", under_market_df)

try:
    odds_df = pd.read_csv(f"../data/processed/odds/odds_matchday_{matchday}.csv")
    filtered_odds = odds_df.loc[odds_df["match"] == match_string]
    implied = {
        "1": 1 / filtered_odds["1"],
        "x": 1 / filtered_odds["x"],
        "2": 1 / filtered_odds["2"],
        "point": filtered_odds["ovr_und_point"],
        "over": 1 / filtered_odds["over"],
        "under": 1 / filtered_odds["under"]
    }
    implied_odds_df = pd.DataFrame(implied)

    print(f"\n ~~ MyBookie Odds for {match_string} ~~\n",filtered_odds.drop(columns=["match"]))
    print(f"\n ~~ MyBookie Odds IMPLIED for {match_string} ~~\n", implied_odds_df)
except FileNotFoundError:
    print(f"No data for sportsbook odds on matchday {matchday}. To view against real odds, check that `notebooks/03_retrieve_odds.ipynb` ran correctly and ensure matchday is correct in the first cell of this notebook")


CALCULATED PROBABILITIES FOR Aston Villa v. Brighton and Hove Albion


        ~ ~ moneyline market ~ ~

             winner      prob  implied_odds
0  Aston Villa (1)  0.487529      2.051158
1         Draw (x)  0.239613      4.173391
2     Brighton (2)  0.272782      3.665930


          ~ ~ over market ~ ~ 

    Goals      Prob  Implied Odds
0    0.5  0.945076      1.058116
1    1.5  0.785840      1.272524
2    2.5  0.554695      1.802791
3    3.5  0.331011      3.021046


          ~ ~ under market ~ ~ 

    Goals      Prob  Implied Odds
0    0.5  0.054849     18.231901
1    1.5  0.214085      4.671046
2    2.5  0.445229      2.246033
3    3.5  0.668914      1.494961

 ~~ MyBookie Odds for Aston Villa v. Brighton and Hove Albion ~~
       1     x    2  ovr_und_point  over  under
0  2.01  3.65  3.6            3.0  1.97   1.85

 ~~ MyBookie Odds IMPLIED for Aston Villa v. Brighton and Hove Albion ~~
           1         x         2  point      over     under
0  0.497512  0.273973  0.2

# Conclusion and Pick
Brighton are in a bad run of form and haven't won in their last 6. (I'm looking to update the model to base probabilities on the last 6 games.) Villa are in better shape, scoring 7 in their last 5. The model puts a Villa win at 48.8%, close to the 49.8% implied by MyBookie's odds. A Villa win looks the most likely result, with reasonable value. I wish I had a system for player props; time to start scoping out development on that.
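
One way to put a number on "value" is to strip the bookmaker's margin from the quoted prices and compare the result with the model. The sketch below uses proportional normalization, which is only one of several ways to remove the margin and is my assumption here; both function names are hypothetical:

```python
def implied_probabilities(decimal_odds):
    """Return (raw, overround, fair) for a list of decimal odds.

    raw       : 1 / odds for each outcome
    overround : sum of raw probabilities (above 1 when a margin is built in)
    fair      : raw probabilities rescaled to sum to 1 (proportional method)
    """
    raw = [1 / o for o in decimal_odds]
    overround = sum(raw)
    fair = [p / overround for p in raw]
    return raw, overround, fair


def expected_value(model_prob, decimal_odds, stake=1.0):
    """Expected profit on `stake` if the model probability is taken as true."""
    return stake * (model_prob * decimal_odds - 1)


# MyBookie 1 x 2 prices and the model's home-win probability from above
raw, overround, fair = implied_probabilities([2.01, 3.65, 3.6])
print(round(overround, 4))
print([round(p, 4) for p in fair])
print(round(expected_value(0.487529, 2.01), 4))
```

Comparing the model probability with the margin-free figure, instead of with the raw `1 / odds`, gives a fairer read on whether a price is generous.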

### My Pick:
**Villa @ 2.01** -- *1u* 
result: WIN
