## Fixture Difficulty Rating
- The aim of this section is to give the user a category describing how difficult a player's next game, or next run of games, is going to be.
- This is useful because fixtures are a key factor when choosing players in fantasy football.
- I was originally going to use a machine learning model for this, but I decided against it because I believe a formula that takes in each team's form and league position will be more effective, as it gives a definitive result.


In [1]:
import requests

url = 'https://fbref.com/en/comps/9/Premier-League-Stats'

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

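response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Stop here if the request failed, rather than saving an error page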

with open("table-data.html", "w", encoding="utf-8") as file:
    file.write(response.text)

# This is extracting the raw html from the page into a local HTML file

In [4]:
from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

# Path to the saved HTML file
html_file = "table-data.html"

# Load the HTML file with BeautifulSoup, using the lxml parser
with open(html_file, "r", encoding="utf-8") as file:
    soup = BeautifulSoup(file, "lxml")

# Locate the league table by its id
table = soup.find("table", {"id": "results2024-202591_overall"})

if table:
    print("Found <table id='results2024-202591_overall'>.")
    # Print the first 500 characters of the table for inspection
    print("Preview of <table> content:\n", table.prettify()[:500])

    # Parse the table using pandas
    try:
        # Wrap the HTML in StringIO: passing a literal HTML string to read_html is deprecated
        df = pd.read_html(StringIO(str(table)))[0]
        # Save the DataFrame to a CSV file
        df.to_csv("table.csv", index=False)
        print("Player stats saved to 'table.csv'")
    except ValueError as e:
        print(f"Error parsing the table: {e}")
else:
    print("No <table> with id 'results2024-202591_overall' found.")

# This code was obtained from ChatGPT and it is used to extract tables from HTML files using BeautifulSoup and pandas.


Found <table id='results2024-202591_overall'>.
Preview of <table> content:
 <table class="stats_table sortable min_width force_mobilize" data-cols-to-freeze=",2" id="results2024-202591_overall">
 <caption>
  Premier League Table
 </caption>
 <colgroup>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
  <col/>
 </colgroup>
 <thead>
  <tr>
   <th aria-label="Rank" class="poptip sort_default_asc center" data-stat="rank" data-tip="&lt;strong&gt;Rank&lt;/
Player stats saved to 'table.csv'



I am following the same approach that I used for my previous models and obtaining the data through a web scraper.

I found the data I needed on the fbref website, which I use for all of my data. I located the league table, saved the HTML for that page, and then found the table's ID in the raw HTML. With that ID I was able to extract the table into a CSV, which I can use as the dataset for this model.
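One way to find a table's ID without reading through the raw HTML by hand is to list every table ID in the saved page. The snippet below is a minimal sketch of that idea. It uses a small inline HTML string in place of `table-data.html`, and the helper name `list_table_ids` and the second table ID are made up for illustration.

```python
from bs4 import BeautifulSoup


def list_table_ids(html):
    """Return the id attribute of every <table> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [t["id"] for t in soup.find_all("table") if t.get("id")]


# Stand-in for the contents of table-data.html (illustrative only)
sample_html = """
<table id="results2024-202591_overall"><tr><td>Liverpool</td></tr></table>
<table id="example_other_table"><tr><td>placeholder</td></tr></table>
<table><tr><td>no id</td></tr></table>
"""

table_ids = list_table_ids(sample_html)
print(table_ids)
```

On the real page, the same helper could be called with the text read from `table-data.html`.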

In [13]:
import pandas as pd
import numpy as np
import sklearn

df = pd.read_csv('table.csv')

print(df.head())

   Rk              Squad  MP   W   D  L  GF  GA  GD  Pts  Pts/MP    xG   xGA  \
0   1          Liverpool  29  21   7  1  69  27  42   70    2.41  65.0  25.1   
1   2            Arsenal  29  16  10  3  53  24  29   58    2.00  44.5  24.7   
2   3  Nottingham Forest  29  16   6  7  49  35  14   54    1.86  36.5  34.0   
3   4            Chelsea  29  14   7  8  53  37  16   49    1.69  55.9  39.5   
4   5    Manchester City  29  14   6  9  55  40  15   48    1.66  51.4  39.4   

    xGD  xGD/90     Last 5  Attendance      Top Team Scorer      Goalkeeper  \
0  39.8    1.37  W D W W W       60300   Mohamed Salah - 27         Alisson   
1  19.8    0.68  W L D D W       60277      Kai Havertz - 9      David Raya   
2   2.5    0.09  L L D W W       30080      Chris Wood - 18       Matz Sels   
3  16.4    0.57  L L W W L       39610     Cole Palmer - 14  Robert Sánchez   
4  12.0    0.41  W L W L D       52897  Erling Haaland - 21         Ederson   

   Notes  
0    NaN  
1    NaN  
2    NaN  


Here I'm just reading in the new csv that I extracted earlier on in the notebook.

In [14]:
df.columns = df.columns.str.strip() # ChatGPT code to remove whitespace from column names
df = df.drop(['Attendance', 'Notes', 'Top Team Scorer', 'Goalkeeper', 'Pts/MP', 'xG', 'xGA', 'xGD', 'xGD/90'], axis=1)

df.to_csv('filtered_table.csv', index=False)

Here I'm just removing whitespace from the column headers and then dropping the unnecessary columns and saving the results to a csv file.

My plan is to build this formula from each team's current league position and recent form.

To take recent form into account, I will total the points from each team's last 5 results, using the same points system as real life:
- 3 points for a win
- 1 point for a draw
- 0 points for a loss

Once I have this as a column for each team, I can add each team's form points to its league position score. The difference between the two teams' totals will then indicate how easy or hard the fixture is.
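As a rough worked example of this arithmetic, the sketch below computes the totals for Liverpool and Arsenal using the positions and "Last 5" strings from the table printed earlier. It assumes the league position is inverted so that first place scores 20, which is the convention the notebook adopts in the next cell. The helper names `form_points` and `team_total` are illustrative.

```python
# Points awarded per result, matching the real-life league system
POINTS = {"W": 3, "D": 1, "L": 0}


def form_points(last_five):
    """Sum the points for a space-separated string of results."""
    return sum(POINTS[result] for result in last_five.split())


def team_total(league_position, last_five, n_teams=20):
    """Inverted league position (1st -> n_teams) plus form points."""
    return (n_teams + 1 - league_position) + form_points(last_five)


# Values taken from the league table printed earlier
liverpool = team_total(1, "W D W W W")  # 20 + 13
arsenal = team_total(2, "W L D D W")    # 19 + 8

print(liverpool, arsenal, liverpool - arsenal)
```

Liverpool's total is 33 and Arsenal's is 27, so the gap between them is 6.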

In [15]:
df = pd.read_csv('filtered_table.csv')

df["NewRank"] = 21 - df["Rk"]

print(df[["Rk", "NewRank", "Squad"]])



    Rk  NewRank              Squad
0    1       20          Liverpool
1    2       19            Arsenal
2    3       18  Nottingham Forest
3    4       17            Chelsea
4    5       16    Manchester City
5    6       15   Newcastle United
6    7       14           Brighton
7    8       13             Fulham
8    9       12        Aston Villa
9   10       11        Bournemouth
10  11       10          Brentford
11  12        9     Crystal Palace
12  13        8  Manchester United
13  14        7          Tottenham
14  15        6            Everton
15  16        5           West Ham
16  17        4             Wolves
17  18        3       Ipswich Town
18  19        2     Leicester City
19  20        1        Southampton


I am doing this because I want the difficulty calculation to work so that a higher score means a tougher team. With this change, 20 is the top of the league and 1 is the bottom.
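The hard-coded 21 assumes a 20-team league. A small sketch of the same inversion that derives the constant from the number of rows instead is shown here. The three-row table is illustrative, not the real data.

```python
import pandas as pd

# Illustrative three-team table; the real table has 20 rows
table = pd.DataFrame({"Rk": [1, 2, 3], "Squad": ["A", "B", "C"]})

# With n teams, rank 1 maps to n and rank n maps to 1
n_teams = len(table)
table["NewRank"] = n_teams + 1 - table["Rk"]

print(table)
```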

Now the next step is to create the column for the recent form.

In [16]:
points_map = {
    "W": 3,
    "D": 1,
    "L": 0
}

df["Form_Rating"] = df["Last 5"].apply(
    lambda x: sum(points_map.get(result, 0) for result in x.split())
)

I got ChatGPT to help me with this part as I was having trouble iterating through all of the results correctly and efficiently. This column now holds an integer value for each team's recent form.
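For comparison, the same form total can also be computed without a lambda by counting the letters in each string. This is an alternative sketch, not the approach used above, and it assumes the column only ever contains the letters W, D and L separated by spaces.

```python
import pandas as pd

# Sample "Last 5" strings in the same format as the league table
form = pd.DataFrame({"Last 5": ["W D W W W", "W L D D W", "L L D W W"]})

# 3 points per win plus 1 per draw; losses contribute nothing
form["Form_Rating"] = 3 * form["Last 5"].str.count("W") + form["Last 5"].str.count("D")

print(form)
```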

I am going to have 3 categories for this. In the code below they are labelled "Even", "Medium" and "Hard", depending on the size of the gap between the two teams' totals.

In [29]:
team_a_name = "Southampton"
team_b_name = "Leicester City"


def head_to_head(team_a_name, team_b_name):
    # Look up each team's row by squad name
    team_a = df[df["Squad"] == team_a_name]
    team_b = df[df["Squad"] == team_b_name]

    # Total = inverted league rank + form points from the last 5 games
    total_a = team_a["NewRank"].values[0] + team_a["Form_Rating"].values[0]
    total_b = team_b["NewRank"].values[0] + team_b["Form_Rating"].values[0]

    # Absolute gap between the totals; this ignores which team is stronger
    difference = abs(total_a - total_b)

    if difference < 10:
        difficultycategory = "Even"
    elif difference < 20:
        difficultycategory = "Medium"
    else:
        difficultycategory = "Hard"

    return total_a, total_b, difficultycategory


total_a, total_b, difficultycategory = head_to_head(team_a_name, team_b_name)
print(f"{team_a_name} total: {total_a}")
print(f"{team_b_name} total: {total_b}") 
print(f"Difficulty Category: {difficultycategory}")

Southampton total: 1
Leicester City total: 2
Difficulty Category: Even


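This function now takes into account a team's recent form and league position to give the user an idea of how difficult their next fixture is going to be. Because it uses the absolute difference between the two totals, the category describes how large the gap between the teams is, not which team is favoured.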
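If the rating should depend on which side the user's player is on, one option is to keep the sign of the difference instead of taking the absolute value. The sketch below is an assumption about how that could look, not part of the notebook. The function name `fixture_difficulty`, the "Easy"/"Even"/"Tough" labels and the reuse of 10 as the threshold are all illustrative choices.

```python
def fixture_difficulty(team_total, opponent_total, threshold=10):
    """Rate a fixture from one team's point of view.

    Totals are inverted league rank plus form points, as in head_to_head.
    The threshold of 10 is an assumed cut-off, not a tuned value.
    """
    gap = opponent_total - team_total  # positive means the opponent is stronger
    if gap >= threshold:
        return "Tough"
    if gap <= -threshold:
        return "Easy"
    return "Even"


# Totals from the notebook: Southampton 1, Leicester City 2; Liverpool is 20 + 13 = 33
print(fixture_difficulty(1, 2))   # Southampton vs Leicester City
print(fixture_difficulty(1, 33))  # Southampton vs Liverpool
print(fixture_difficulty(33, 1))  # Liverpool vs Southampton
```

With this version the same pairing gives different answers depending on whose perspective is taken: Southampton against Liverpool is rated "Tough", while Liverpool against Southampton is rated "Easy".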