# 1 x 2 Probability for Given English Premier League Match

This is a reusable notebook for calculating the probability of a home win (1), a draw (x), or an away win (2), abbreviated 1 x 2, for any Premier League match. It uses the home record of the home club, the away record of the away club, and the average home and away performance of all Premier League clubs.

# TABLE OF CONTENTS
1. [Retrieve home and away tables](#retrieve-home-and-away-tables)
2. [Clean DataFrames](#clean-dataframes)
3. [Generate projected goals for match](#generate-projected-goals-for-match)
4. [Calculate 1 x 2 probability with Poisson distribution](#calculate-1-x-2-probability-with-poisson-distribution)


# Retrieve home and away tables
Use `epl_table_scraper` to retrieve the home and away tables from Understat.

The scraper defaults to the EPL table, but it can retrieve other leagues when given one of the following URLs:

1. Bundesliga - "https://understat.com/league/Bundesliga"
2. La Liga - "https://understat.com/league/La_liga"
3. Serie A - "https://understat.com/league/Serie_A"
4. Ligue 1 - "https://understat.com/league/Ligue_1"

e.g.
```python
scraper = WebScraper(url="https://understat.com/league/La_liga")
```


In [1]:
import sys
sys.path.append('../scraper')

from epl_table_scraper import WebScraper


# Retrieve 'Home' table
scraper = WebScraper()
scraper.load_page()
scraper.click_element("home-away2")
home_df = scraper.scrape_table()

scraper.close()

print("\n======================== home_df ==========================\n")
print(home_df)

# Retrieve 'Away' table
scraper = WebScraper()
scraper.load_page()
scraper.click_element("home-away3")
away_df = scraper.scrape_table()

scraper.close()

print("\n======================== away_df ==========================\n")
print(away_df)

scraper is locating element... check
element located:  <selenium.webdriver.remote.webelement.WebElement (session="a3baff70c0ed3af33ca4ea5d5dbbd7d9", element="f.842106F5AFBFE2AC3C2B8986FA113151.d.0EAF939982C7641C31E387943E88B21F.e.32")>
element html:  <input id="home-away2" type="radio" name="home-away" value="h">
clicked the button
checking table update...
confirmed table updated successfully
found table
closing driver...
driver closed


     №                     Team  M  W  D  L   G  GA PTS          xG  \
0    1                Brentford  8  7  1  0  26  14  22  19.38-6.62   
1    2                Liverpool  7  6  0  1  13   3  18  16.74+3.74   
2    3                  Arsenal  7  5  2  0  17   6  17  19.07+2.07   
3    4          Manchester City  7  5  1  1  15  10  16  17.28+2.28   
4    5              Aston Villa  8  4  3  1  13   9  15  18.13+5.13   
5    6                   Fulham  8  4  2  2  14  13  14  13.66-0.34   
6    7                Tottenham  8  4  1  3  20  11  13  16.8

# Clean DataFrames

Now that the data has been scraped from Understat, we prepare to aggregate some statistics for the whole league. We need to:
- (1) remove unneeded columns
- (2) calculate new columns for:
    - (a) goals per match scored
    - (b) goals per match conceded

In [2]:
import pandas as pd

# drop unneeded columns
home_df_cln = home_df.drop(
    columns = [
        'W',
        'D',
        'L',
        'PTS',
        'xG',
        'xGA',
        'xPTS',
    ]
)

away_df_cln = away_df.drop(
    columns = [
        'W',
        'D',
        'L',
        'PTS',
        'xG',
        'xGA',
        'xPTS',
    ]
)

# convert to numeric values
columns_to_clean = ['M', 'G', 'GA']
home_df_cln[columns_to_clean] = home_df_cln[columns_to_clean].apply(
    pd.to_numeric,
    errors='coerce'
)

away_df_cln[columns_to_clean] = away_df_cln[columns_to_clean].apply(
    pd.to_numeric,
    errors='coerce'
)


# calculate gpm_scored and gpm_conceded
home_df_cln["gpm_scored"] = home_df_cln["G"] / home_df_cln["M"]
home_df_cln["gpm_conceded"] = home_df_cln["GA"] / home_df_cln["M"]
print("\n======================   ~ home_df cleaned ~   =====================\n\n", home_df_cln)

away_df_cln["gpm_scored"] = away_df_cln["G"] / away_df_cln["M"]
away_df_cln["gpm_conceded"] = away_df_cln["GA"] / away_df_cln["M"]
print("\n\n======================   ~ away_df cleaned ~   =====================\n\n", away_df_cln)




      №                     Team  M   G  GA  gpm_scored  gpm_conceded
0    1                Brentford  8  26  14    3.250000      1.750000
1    2                Liverpool  7  13   3    1.857143      0.428571
2    3                  Arsenal  7  17   6    2.428571      0.857143
3    4          Manchester City  7  15  10    2.142857      1.428571
4    5              Aston Villa  8  13   9    1.625000      1.125000
5    6                   Fulham  8  14  13    1.750000      1.625000
6    7                Tottenham  8  20  11    2.500000      1.375000
7    8              Bournemouth  7  10   6    1.428571      0.857143
8    9                 Brighton  7  12   9    1.714286      1.285714
9   10        Manchester United  8  13  11    1.625000      1.375000
10  11                  Chelsea  7  12   8    1.714286      1.142857
11  12        Nottingham Forest  7   8   6    1.142857      0.857143
12  13         Newcastle United  7   8   8    1.142857      1.142857
13  14                  Everton

# Generate projected goals for match
This cell calculates the projected goals for the home and away teams based on their attacking and defensive strengths relative to league averages. The steps are as follows:

## 1. Define Home and Away Sides
We first define the home and away teams for the match under consideration.
## 2. Extract Goals Per Match (GPM) Data
Using the cleaned data (`home_df_cln` for the home side and `away_df_cln` for the away side), we extract each team’s:
- Goals per match scored (`gpm_scored`): Average goals the team scores per match.
- Goals per match conceded (`gpm_conceded`): Average goals the team allows per match.

## 3. Calculate League Averages
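We calculate the league average goals per match (GPM) from the home table:
- `epl_home_avg_gpm_scored`: Average goals scored per match by all teams at home.
- `epl_home_avg_gpm_conceded`: Average goals conceded per match by all teams at home.

Because every goal a home side scores is conceded by an away side, these two figures also stand in for the league-average goals conceded and scored away.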

This gives a baseline for assessing team performance relative to the league.
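
As a minimal sketch of this step, the averages can be taken over however many clubs the table contains. The records below are hypothetical and exist only to illustrate the arithmetic; they are not Understat data.

```python
from statistics import mean

# Hypothetical home records: (team, matches, goals scored, goals conceded)
home_records = [
    ("Team A", 8, 16, 8),
    ("Team B", 7, 7, 14),
    ("Team C", 8, 12, 12),
]

# Per-club goals per match, scored and conceded
gpm_scored = [goals / matches for _, matches, goals, _ in home_records]
gpm_conceded = [against / matches for _, matches, _, against in home_records]

# League averages: the mean of the per-club figures
league_home_avg_gpm_scored = mean(gpm_scored)
league_home_avg_gpm_conceded = mean(gpm_conceded)

print(league_home_avg_gpm_scored, league_home_avg_gpm_conceded)
```

Averaging over the rows actually present keeps the calculation valid for leagues with a different number of clubs, such as the 18-club Bundesliga.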

## 4. Calculate Team Strength Ratios
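For both teams, we calculate attack and defense ratings relative to the league averages. A rating above 1 means more goals than average, so a defense rating above 1 indicates a weaker-than-average defense.
- Attack Rating: Team's goals scored per match compared to the league average.
- Defense Rating: Team's goals conceded per match compared to the league average.
### Home Team Ratings:
- `home_attack_rating`: Ratio of the home side's goals scored per match at home to `epl_home_avg_gpm_scored`.
- `home_defense_rating`: Ratio of the home side's goals conceded per match at home to `epl_home_avg_gpm_conceded`.
### Away Team Ratings:
- `away_attack_rating`: Ratio of the away side's goals scored per match away to `epl_home_avg_gpm_conceded` (what home sides concede, and so what away sides score, on average).
- `away_defense_rating`: Ratio of the away side's goals conceded per match away to `epl_home_avg_gpm_scored` (what home sides score, and so what away sides concede, on average).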

## 5. Calculate Projected Goals
Using the ratings and league averages, we project goals for both teams:

- Home Projected Goals: `home_attack_rating * away_defense_rating * epl_home_avg_gpm_scored`
- Away Projected Goals: `away_attack_rating * home_defense_rating * epl_home_avg_gpm_conceded`
  
These projections estimate the expected goals for the home and away sides based on their relative strengths.
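
The two formulas above share one shape, so they can be sketched as a single helper. The function name `projected_goals` is hypothetical and not part of the notebook; the inputs are the goals-per-match and league-average figures the notebook prints for its West Ham v Wolverhampton Wanderers run.

```python
def projected_goals(attack_gpm, opponent_conceded_gpm, league_avg_gpm):
    """Project goals for one side from its attack and the opponent's defense.

    Both ratings are measured against the same league average, so the
    product simplifies to attack_gpm * opponent_conceded_gpm / league_avg_gpm.
    """
    attack_rating = attack_gpm / league_avg_gpm
    defense_rating = opponent_conceded_gpm / league_avg_gpm
    return attack_rating * defense_rating * league_avg_gpm


# Home side: home gpm scored, away side's gpm conceded, league home avg scored
home_proj = projected_goals(1.4285714285714286, 2.5714285714285716, 1.62)

# Away side: away gpm scored, home side's gpm conceded, league home avg conceded
away_proj = projected_goals(1.5714285714285714, 2.142857142857143, 1.38)

print(round(home_proj, 2), round(away_proj, 2))
```

Writing the step as a function makes it easier to see that each projection depends on only three numbers.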

In [None]:
# 1. define home and away sides
home_side, away_side = "West Ham", "Wolverhampton Wanderers"

# 2. grab their gpm_scored and gpm_conceded
home_side_gpm_scored = home_df_cln.loc[
    home_df_cln["Team"] == home_side,
    "gpm_scored"
].values[0]

home_side_gpm_conceded = home_df_cln.loc[
    home_df_cln["Team"] == home_side,
    "gpm_conceded"
].values[0]

print(home_side_gpm_scored, home_side_gpm_conceded)

away_side_gpm_scored = away_df_cln.loc[
    away_df_cln["Team"] == away_side,
    "gpm_scored"
].values[0]

away_side_gpm_conceded = away_df_cln.loc[
    away_df_cln["Team"] == away_side,
    "gpm_conceded"
].values[0]

print(away_side_gpm_scored, away_side_gpm_conceded)

# 3. league-average gpm scored and conceded at home (mean over every club in the table)
epl_home_avg_gpm_scored = round(home_df_cln["gpm_scored"].mean(), 2)
epl_home_avg_gpm_conceded = round(home_df_cln["gpm_conceded"].mean(), 2)
print(epl_home_avg_gpm_scored, epl_home_avg_gpm_conceded)

# 4. create a ratio (attack and defense rating) of their gpm to the league average gpm
home_attack_rating = home_side_gpm_scored / epl_home_avg_gpm_scored
away_defense_rating = away_side_gpm_conceded / epl_home_avg_gpm_scored
away_attack_rating = away_side_gpm_scored / epl_home_avg_gpm_conceded
home_defense_rating = home_side_gpm_conceded / epl_home_avg_gpm_conceded

print(home_attack_rating, away_defense_rating, away_attack_rating, home_defense_rating)

# home projected goals
home_projected_goals = home_attack_rating * away_defense_rating * epl_home_avg_gpm_scored
# away projected goals
away_projected_goals = away_attack_rating * home_defense_rating * epl_home_avg_gpm_conceded
print("\n ================== projected_goals ==================== \n")
print("HOME projected goals:", home_projected_goals)
print("AWAY projected goals: ", away_projected_goals)


1.4285714285714286 2.142857142857143
1.5714285714285714 2.5714285714285716
1.62 1.38
0.8818342151675485 1.5873015873015872 1.1387163561076605 1.5527950310559007


HOME projected goals: 2.2675736961451247
AWAY projected goals:  2.440106477373558


# Calculate 1 x 2 probability with Poisson distribution
The Poisson distribution models the probability of a given number of events (k) occurring in a fixed interval, given the expected number of events (μ):

$$
P(k; \mu) = \frac{\mu^k \cdot e^{-\mu}}{k!}
$$

For our calculation, the interval is one 90-minute match, and:
- μ = expected number of goals (projected goals)
- k = number of goals
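
To make the formula concrete, here is a small sketch that evaluates it directly with the standard library. The helper name `poisson_pmf` and the projection of 1.5 goals are assumptions for illustration; the notebook itself relies on `scipy.stats.poisson`.

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k events when mu events are expected."""
    return mu ** k * math.exp(-mu) / math.factorial(k)


# Hypothetical projection of 1.5 goals for one side
mu = 1.5
for k in range(4):
    print(k, "goals:", round(poisson_pmf(k, mu), 4))
```

Summing the probabilities over all possible values of k gives 1, which is why cutting the range off at 8 goals loses only a small amount of probability for typical projections.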

## 1. Generate Probabilities for Home and Away Scores
- For each potential score (0–8), calculate the Poisson probability using the respective team’s projected goals.
- Store the results in arrays `home_score_prob` and `away_score_prob`.

## 2. Exact Score Matrix
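Using `np.outer`, create a matrix (`exact_score_prob`) that holds the joint probability of every possible scoreline, with rows indexed by away goals and columns by home goals. Multiplying the two sets of probabilities treats the home and away goal counts as independent.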

## 3. Calculate Outcome Probabilities
- Home Win Probability: Sum the probabilities above the diagonal (home goals > away goals).
- Away Win Probability: Sum the probabilities below the diagonal (away goals > home goals).
- Draw Probability: Sum the probabilities along the diagonal (home goals = away goals).
- Implied Odds: For each outcome, the cell also prints `1 / probability`, the decimal odds implied by that probability.
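
The same three sums can be sketched without matrices by looping over every scoreline. This is an illustrative standard-library version, not the notebook's implementation; `outcome_probs`, `poisson_pmf`, and the projections 1.6 and 1.1 are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k goals when mu goals are expected."""
    return mu ** k * math.exp(-mu) / math.factorial(k)


def outcome_probs(home_mu, away_mu, max_goals=8):
    """Return (home win, draw, away win) over scorelines 0..max_goals each."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Independence assumption: joint probability is the product
            p = poisson_pmf(h, home_mu) * poisson_pmf(a, away_mu)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical projected goals for the home and away sides
home_p, draw_p, away_p = outcome_probs(1.6, 1.1)
print("1:", home_p, "x:", draw_p, "2:", away_p)
print("total:", home_p + draw_p + away_p)
```

Because scorelines above 8 goals are ignored, the three probabilities add up to slightly less than 1.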

In [4]:
"""
Poisson distribution: with the expectation of mu events (goals) in a given interval (match),
the probability of k events occurring in that interval is:
( mu^k * e^(-mu) ) / k!
This equation will be handled with scipy in the code.
"""
from scipy.stats import poisson
import numpy as np
# Poisson probabilities of each side scoring 0 - 8 goals
goals = np.arange(9)
home_score_prob = poisson.pmf(goals, home_projected_goals)
away_score_prob = poisson.pmf(goals, away_projected_goals)
# print(home_score_prob)
# print(away_score_prob)

# joint probability of each scoreline: rows = away goals, columns = home goals
exact_score_prob = np.outer(away_score_prob, home_score_prob)

# print(exact_score_prob)
home_win_prob = np.sum(np.triu(exact_score_prob, k=1))
away_win_prob = np.sum(np.tril(exact_score_prob, k=-1))
draw_prob = np.trace(exact_score_prob)
print(f"home_win_prob ({home_side}): ", home_win_prob, "implied: ", 1 / home_win_prob, f"\naway_win_prob ({away_side}): ", away_win_prob, "implied: ", 1 / away_win_prob, "\ndraw_prob", draw_prob, "implied: ", 1 / draw_prob,)

home_win_prob (West Ham):  0.37409582279068676 implied:  2.673111911649219 
away_win_prob (Wolverhampton Wanderers):  0.43534110264376524 implied:  2.2970493572215918 
draw_prob 0.1890166504064349 implied:  5.290539208317046
