# 1 x 2 Probability for a Given English Premier League Match

This reusable notebook calculates the probability of a home win (1), a draw (x), or an away win (2), abbreviated 1 x 2, for any Premier League match. It uses the home side's home record, the away side's away record, and the average home and away performance of all Premier League clubs.

# TABLE OF CONTENTS
1. [Retrieve home and away tables](#retrieve-home-and-away-tables)
2. [Clean DataFrames](#clean-dataframes)
3. [Generate projected goals for match](#generate-projected-goals-for-match)


# Retrieve home and away tables
Use `epl_table_scraper` to retrieve the home and away tables from Understat.

The scraper defaults to the EPL table, but it can retrieve other leagues when given one of the following URLs:

1. Bundesliga - "https://understat.com/league/Bundesliga"
2. La Liga - "https://understat.com/league/La_liga"
3. Serie A - "https://understat.com/league/Serie_A"
4. Ligue 1 - "https://understat.com/league/Ligue_1"

For example:
```python
scraper = WebScraper(url="https://understat.com/league/La_liga")
```
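
When switching leagues often, it can help to keep the URLs listed above in one place. The sketch below is only a convenience wrapper: `LEAGUE_URLS` and `league_url` are hypothetical names, not part of `epl_table_scraper`, and the only URLs it contains are the four from the list above.

```python
# Hypothetical lookup table; the URLs are the ones listed in this notebook.
LEAGUE_URLS = {
    "Bundesliga": "https://understat.com/league/Bundesliga",
    "La Liga": "https://understat.com/league/La_liga",
    "Serie A": "https://understat.com/league/Serie_A",
    "Ligue 1": "https://understat.com/league/Ligue_1",
}


def league_url(name):
    """Return the Understat URL for a league name, or raise a clear error."""
    try:
        return LEAGUE_URLS[name]
    except KeyError:
        known = ", ".join(sorted(LEAGUE_URLS))
        raise ValueError(f"unknown league {name!r}; choose from: {known}") from None


print(league_url("Serie A"))
```

The returned string would then be passed as the `url` argument shown in the example above, e.g. `WebScraper(url=league_url("Serie A"))`.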


In [4]:
import sys
sys.path.append('../scraper')

from epl_table_scraper import WebScraper


# Retrieve 'Home' table
scraper = WebScraper()
scraper.load_page()
scraper.click_element("home-away2")
home_df = scraper.scrape_table()

scraper.close()

print("\n======================== home_df ==========================\n")
print(home_df)

# Retrieve 'Away' table
scraper = WebScraper()
scraper.load_page()
scraper.click_element("home-away3")
away_df = scraper.scrape_table()

scraper.close()

print("\n======================== away_df ==========================\n")
print(away_df)

scraper is locating element... check
element located:  <selenium.webdriver.remote.webelement.WebElement (session="fbb35787a1823d6e3871815e20a3e6a7", element="f.BA855CC9B10E3A6C02CF651ABC648A38.d.6AFFD2AC8A60DA4E2A6A6406CFA48285.e.28")>
element html:  <input id="home-away2" type="radio" name="home-away" value="h">
clicked the button
checking table update...
confirmed table updated successfully
found table
closing driver...
driver closed


     №                     Team  M  W  D  L   G  GA PTS          xG  \
0    1                Brentford  6  5  1  0  18  11  16  15.41-2.59   
1    2                Liverpool  6  5  0  1  11   3  15  13.04+2.04   
2    3          Manchester City  5  4  1  0  12   6  13  11.92-0.08   
3    4                Tottenham  6  4  0  2  16   6  12  12.59-3.41   
4    5                 Brighton  6  3  3  0  11   8  12  11.82+0.82   
5    6                  Arsenal  5  3  2  0  12   6  11  14.44+2.44   
6    7              Bournemouth  5  3  1  1   8   4  10  10.1

# Clean DataFrames

Now that the data is scraped from Understat, we want to prepare it so we can aggregate some statistics for the whole league. We need to:
- (1) remove unneeded columns
- (2) calculate new columns for:
    - (a) goals per match scored
    - (b) goals per match conceded

In [3]:
import pandas as pd

# drop unneeded columns
home_df_cln = home_df.drop(
    columns = [
        'W',
        'D',
        'L',
        'PTS',
        'xG',
        'xGA',
        'xPTS',
    ]
)

away_df_cln = away_df.drop(
    columns = [
        'W',
        'D',
        'L',
        'PTS',
        'xG',
        'xGA',
        'xPTS',
    ]
)

# convert to numeric values
columns_to_clean = ['M', 'G', 'GA']
home_df_cln[columns_to_clean] = home_df_cln[columns_to_clean].apply(
    pd.to_numeric,
    errors='coerce'
)

away_df_cln[columns_to_clean] = away_df_cln[columns_to_clean].apply(
    pd.to_numeric,
    errors='coerce'
)


# calculate gpm_scored and gpm_conceded
home_df_cln["gpm_scored"] = home_df_cln["G"] / home_df_cln["M"]
home_df_cln["gpm_conceded"] = home_df_cln["GA"] / home_df_cln["M"]
print(home_df_cln)

away_df_cln["gpm_scored"] = away_df_cln["G"] / away_df_cln["M"]
away_df_cln["gpm_conceded"] = away_df_cln["GA"] / away_df_cln["M"]
print(away_df_cln)


     №                     Team  M   G  GA  gpm_scored  gpm_conceded
0    1                Brentford  6  18  11    3.000000      1.833333
1    2                Liverpool  6  11   3    1.833333      0.500000
2    3          Manchester City  5  12   6    2.400000      1.200000
3    4                Tottenham  6  16   6    2.666667      1.000000
4    5                 Brighton  6  11   8    1.833333      1.333333
5    6                  Arsenal  5  12   6    2.400000      1.200000
6    7              Bournemouth  5   8   4    1.600000      0.800000
7    8                   Fulham  5   9   7    1.800000      1.400000
8    9         Newcastle United  5   5   3    1.000000      0.600000
9   10        Manchester United  6   7   8    1.166667      1.333333
10  11                  Chelsea  6   9   8    1.500000      1.333333
11  12              Aston Villa  5   7   6    1.400000      1.200000
12  13        Nottingham Forest  6   7   6    1.166667      1.000000
13  14                 West Ham  6

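# Generate projected goals for match

1. define home and away sides
2. grab their gpm_scored and gpm_conceded
3. calculate league average gpm_scored and gpm_conceded
4. create a ratio (attack and defense rating) of their gpm to the league average gpm
5. multiply the attack rating, the opposing defense rating, and the league average to get each side's projected goals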
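
Written as formulas, the ratings combine as follows. The symbols are introduced here for illustration only: $s_H$ and $c_H$ are the home side's goals per match scored and conceded at home, $s_A$ and $c_A$ are the away side's goals per match scored and conceded away, and $\bar{s}$ and $\bar{c}$ are the league average goals per match scored and conceded in the home table.

$$
\text{home projected goals} = \frac{s_H}{\bar{s}} \cdot \frac{c_A}{\bar{s}} \cdot \bar{s} = \frac{s_H \, c_A}{\bar{s}}
$$

$$
\text{away projected goals} = \frac{s_A}{\bar{c}} \cdot \frac{c_H}{\bar{c}} \cdot \bar{c} = \frac{s_A \, c_H}{\bar{c}}
$$

For Bournemouth at home to Brighton, with $s_H = 1.6$, $c_A = 1.4$, $\bar{s} = 1.54$, $s_A = 1.6$, $c_H = 0.8$ and $\bar{c} = 1.31$:

$$
\frac{1.6 \times 1.4}{1.54} = \frac{2.24}{1.54} \approx 1.4545, \qquad \frac{1.6 \times 0.8}{1.31} = \frac{1.28}{1.31} \approx 0.9771
$$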

In [27]:
# 1. define home and away sides
home_side, away_side = "Bournemouth", "Brighton"

# 2. grab their gpm_scored and gpm_conceded
home_side_gpm_scored = home_df_cln.loc[
    home_df_cln["Team"] == home_side, 
    "gpm_scored"
].values[0]

home_side_gpm_conceded = home_df_cln.loc[
    home_df_cln["Team"] == home_side, 
    "gpm_conceded"
].values[0]

print(home_side_gpm_scored, home_side_gpm_conceded)

away_side_gpm_scored = away_df_cln.loc[
    away_df_cln["Team"] == away_side,
    "gpm_scored"
].values[0]

away_side_gpm_conceded = away_df_cln.loc[
    away_df_cln["Team"] == away_side,
    "gpm_conceded"
].values[0]

print(away_side_gpm_scored, away_side_gpm_conceded)

# 3. calculate league average gpm_scored and gpm_conceded from the home table
epl_home_avg_gpm_scored = round(home_df_cln["gpm_scored"].mean(), 2)
epl_home_avg_gpm_conceded = round(home_df_cln["gpm_conceded"].mean(), 2)
print(epl_home_avg_gpm_scored, epl_home_avg_gpm_conceded)

# 4. create a ratio (attack and defense rating) of their gpm to the league average gpm
# Goals scored at home are goals conceded away (and vice versa), so the
# home-table averages also serve as the baselines for the away side's ratings.
home_attack_rating = home_side_gpm_scored / epl_home_avg_gpm_scored
away_defense_rating = away_side_gpm_conceded / epl_home_avg_gpm_scored
away_attack_rating = away_side_gpm_scored / epl_home_avg_gpm_conceded
home_defense_rating = home_side_gpm_conceded / epl_home_avg_gpm_conceded

print(home_attack_rating, away_defense_rating, away_attack_rating, home_defense_rating)

# home projected goals
home_projected_goals = home_attack_rating * away_defense_rating * epl_home_avg_gpm_scored
# away projected goals
away_projected_goals = away_attack_rating * home_defense_rating * epl_home_avg_gpm_conceded
print("\n ================== projected_goals ==================== \n")
print("HOME projected goals:", home_projected_goals)
print("AWAY projected goals: ", away_projected_goals)


1.6 0.8
1.6 1.4
1.54 1.31
1.038961038961039 0.9090909090909091 1.2213740458015268 0.6106870229007634


HOME projected goals: 1.4545454545454546
AWAY projected goals:  0.9770992366412216
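
The projected goals are not yet 1 x 2 probabilities. One common way to bridge that gap is to treat each side's goal count as an independent Poisson variable whose mean is its projected goals, then sum the probabilities of every scoreline. This is an assumption made here for illustration, not a method this notebook states; `poisson_pmf`, `one_x_two_probabilities`, and the `max_goals` cutoff are hypothetical names and choices.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return exp(-lam) * lam**k / factorial(k)


def one_x_two_probabilities(home_goals, away_goals, max_goals=10):
    """Return (home win, draw, away win) under independent Poisson scoring.

    Scorelines above max_goals per side are ignored, and the three totals
    are renormalised so that they sum to 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_goals) * poisson_pmf(a, away_goals)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Projected goals printed above for Bournemouth (home) vs Brighton (away).
p1, px, p2 = one_x_two_probabilities(1.4545454545454546, 0.9770992366412216)
print(f"1: {p1:.3f}  x: {px:.3f}  2: {p2:.3f}")
```

The independence assumption is a simplification: it ignores any correlation between the two sides' scores, so the draw probability in particular should be read as a rough estimate.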
