In this notebook, I will demo fetching data from FBref with the Python package BeautifulSoup, via the FBref class. The aim is to fetch data that can be used to analyse how teams are doing in Europe's top five leagues. To do this, we will use the FBref class to grab data for the following:
* current league table
* fixtures and results for given season
* Match stats such as xG, shots and goals
* Player stats for Europe's top five leagues

The class returns the data cleaned and in a format ready for analysis.

# Imports

In [1]:
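import os

# Move up to the project root so that the src package is importable
os.chdir("../")

# Disable Jupyter's autosave for this notebook
%autosave 0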

Autosave disabled


In [2]:
from src.fbref.fbref_class import FBref

# Grabbing data using BeautifulSoup

Website we will scrape: https://fbref.com/en/

### Instantiate the FBref() class

In [3]:
fb = FBref()

### Define a BeautifulSoup object for the HTML page of interest

The goal is to grab the table named Big 5 European Leagues, as well as any hyperlinks. We will start by grabbing the HTML of the entire page.

In [4]:
input_url = 'https://fbref.com/en/comps/9/2021-2022/2021-2022-Premier-League-Stats'

beautifulsoup_object = fb.instantiate_beautiful_soup_object(
    input_url
)
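The FBref class handles this parsing step, and its internals are not shown here. The sketch below is a rough, self-contained illustration of the idea. It uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup, so it runs without network access or extra packages. It pulls the cell text and absolute hyperlinks out of a table. The `TableLinkParser` name and the HTML snippet are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class TableLinkParser(HTMLParser):
    """Collect cell text row by row, plus every hyperlink, from <table> elements."""

    def __init__(self, base_url: str = "https://fbref.com"):
        super().__init__()
        self.base_url = base_url
        self.rows: list[list[str]] = []
        self.links: list[str] = []
        self._in_table = False
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._in_table = True
            return
        if not self._in_table:
            return  # ignore links and cells outside tables
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links such as /en/comps/9/ into absolute URLs
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "table":
            self._in_table = False
        elif tag in ("td", "th") and self._cell is not None:
            if self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical snippet standing in for a downloaded page
html = """
<table>
  <tr><th>Competition Name</th><th>Country</th></tr>
  <tr><td><a href="/en/comps/9/">Premier League</a></td><td>ENG</td></tr>
</table>
<a href="/en/about/">not inside a table</a>
"""
parser = TableLinkParser()
parser.feed(html)
print(parser.rows)   # [['Competition Name', 'Country'], ['Premier League', 'ENG']]
print(parser.links)  # ['https://fbref.com/en/comps/9/']
```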

### Grab competition dictionary

In [5]:
competition_dict = fb.get_competition_dict()

In [6]:
competition_dict

{'9': 'Premier-League',
 '12': 'La-Liga',
 '11': 'Serie-A',
 '13': 'Ligue-1',
 '20': 'Bundesliga'}
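The ids in this dictionary match the URL used earlier (`/comps/9/2021-2022/2021-2022-Premier-League-Stats`). The hypothetical helper below uses the dictionary to build a season stats URL for any of the leagues. It assumes every competition follows the same pattern as the Premier League example. Only that one URL is confirmed in this notebook.

```python
COMPETITION_DICT = {
    "9": "Premier-League",
    "12": "La-Liga",
    "11": "Serie-A",
    "13": "Ligue-1",
    "20": "Bundesliga",
}


def build_season_url(competition_id: str, season_name: str, competitions: dict = COMPETITION_DICT) -> str:
    """Build a season stats URL (assumed pattern, copied from the Premier League example)."""
    name = competitions[competition_id]  # raises KeyError for ids not in the dictionary
    return f"https://fbref.com/en/comps/{competition_id}/{season_name}/{season_name}-{name}-Stats"


print(build_season_url("9", "2021-2022"))
# https://fbref.com/en/comps/9/2021-2022/2021-2022-Premier-League-Stats
```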

### Get big five league table

In [7]:
big5_df = fb.get_big_5_leagues()

In [8]:
big5_df

| | Competition Name | Gender | Country | First Season | Last Season | Tier | competition_link | competition_id |
|---|---|---|---|---|---|---|---|---|
| 0 | Premier League | M | eng ENG | 1888-1889 | 2022-2023 | 1st | https://fbref.com/en/comps/9/history/Premier-L... | 9 |
| 1 | La Liga | M | es ESP | 1988-1989 | 2022-2023 | 1st | https://fbref.com/en/comps/12/history/La-Liga-... | 12 |
| 2 | Ligue 1 | M | fr FRA | 1995-1996 | 2022-2023 | 1st | https://fbref.com/en/comps/13/history/Ligue-1-... | 13 |
| 3 | Fußball-Bundesliga | M | de GER | 1988-1989 | 2022-2023 | 1st | https://fbref.com/en/comps/20/history/Bundesli... | 20 |
| 4 | Serie A | M | it ITA | 1988-1989 | 2022-2023 | 1st | https://fbref.com/en/comps/11/history/Serie-A-... | 11 |
| 5 | Big 5 European Leagues Combined | M | | 1995-1996 | 2022-2023 | 1st | https://fbref.com/en/comps/Big5/history/Big-5-... | Big5 |

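The `competition_id` column appears to be the path segment after `/comps/` in `competition_link`, for example `9` in `https://fbref.com/en/comps/9/history/...`. The sketch below derives it. It then drops the "Big 5 European Leagues Combined" row for when only the five individual leagues are wanted. The function names and the shortened example links are hypothetical.

```python
import pandas as pd


def add_competition_id(df: pd.DataFrame, link_col: str = "competition_link") -> pd.DataFrame:
    out = df.copy()
    # Take the path segment after /comps/, e.g. ".../comps/12/history/..." -> "12"
    out["competition_id"] = out[link_col].str.extract(r"/comps/([^/]+)/", expand=False)
    return out


def individual_leagues(df: pd.DataFrame) -> pd.DataFrame:
    # "Big5" marks the combined competition rather than a single league
    return df[df["competition_id"] != "Big5"].reset_index(drop=True)


# Hypothetical, shortened links in the same shape as the table above
links = pd.DataFrame({
    "Competition Name": ["Premier League", "Big 5 European Leagues Combined"],
    "competition_link": [
        "https://fbref.com/en/comps/9/history/",
        "https://fbref.com/en/comps/Big5/history/",
    ],
})
leagues = individual_leagues(add_competition_id(links))
print(leagues[["Competition Name", "competition_id"]])
```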

### Get league table

Define the parameters competition_id and season_name:

In [4]:
competition_id = '9'
season_name = '2022-2023'

Get the league table for an example season, Premier League 2022-2023:

In [10]:
league_table_df = fb.get_season_stats_table('league_table', competition_id, season_name)
league_table_df.head(3)

| | position | team | matches_played | wins | draws | losses | goals_for | goals_against | goal_difference | points | ... | xg | xg_against | expected_goal_difference | expected_goal_difference_per_90 | goals_per_game | goals_conceded_per_game | xg_per_game | xg_against_per_game | competition_id | season_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Arsenal | 28 | 22 | 3 | 3 | 66 | 26 | 40 | 69 | ... | 53.8 | 27.0 | 26.7 | 0.95 | 2.36 | 0.93 | 1.92 | 0.96 | 9 | 2022_2023 |
| 1 | 2 | Manchester City | 27 | 19 | 4 | 4 | 67 | 25 | 42 | 61 | ... | 56.1 | 21.5 | 34.6 | 1.28 | 2.48 | 0.93 | 2.08 | 0.8 | 9 | 2022_2023 |
| 2 | 3 | Manchester Utd | 26 | 15 | 5 | 6 | 41 | 35 | 6 | 50 | ... | 40.8 | 32.6 | 8.2 | 0.31 | 1.58 | 1.35 | 1.57 | 1.25 | 9 | 2022_2023 |

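The per-game columns are consistent with dividing the season totals by `matches_played`. For example, Arsenal's 66 goals in 28 matches gives 66 / 28 ≈ 2.36, which matches `goals_per_game`. The sketch below recomputes these columns, for example after aggregating totals yourself. It assumes the column names shown above, and the function name is hypothetical.

```python
import pandas as pd

# New column -> season total it is derived from
PER_GAME_COLUMNS = {
    "goals_per_game": "goals_for",
    "goals_conceded_per_game": "goals_against",
    "xg_per_game": "xg",
    "xg_against_per_game": "xg_against",
}


def add_per_game_columns(table: pd.DataFrame) -> pd.DataFrame:
    out = table.copy()
    for new_col, total_col in PER_GAME_COLUMNS.items():
        # Divide each season total by matches played, rounded to 2 d.p. like the output above
        out[new_col] = (out[total_col] / out["matches_played"]).round(2)
    return out


# Totals for the three teams shown above
top3 = pd.DataFrame({
    "team": ["Arsenal", "Manchester City", "Manchester Utd"],
    "matches_played": [28, 27, 26],
    "goals_for": [66, 67, 41],
    "goals_against": [26, 25, 35],
    "xg": [53.8, 56.1, 40.8],
    "xg_against": [27.0, 21.5, 32.6],
})
print(add_per_game_columns(top3)[["team", "goals_per_game", "xg_per_game"]])
```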

### Get home and away league table

In [11]:
league_table_home_away_df = fb.get_season_stats_table('home_away_league_table', competition_id, season_name)
league_table_home_away_df.head(3)

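| | rk | squad | home_mp | home_w | home_d | home_l | home_gf | home_ga | home_gd | home_pts | ... | away_ga | away_gd | away_pts | away_pts_per_mp | away_xg | away_xga | away_xgd | away_xgd_per_90 | competition_id | season_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Arsenal | 14 | 11 | 2 | 1 | 38 | 17 | 21 | 35 | ... | 9 | 19 | 34 | 2.43 | 22.7 | 12.8 | 9.9 | 0.71 | 9 | 2022_2023 |
| 1 | 2 | Manchester City | 13 | 11 | 1 | 1 | 43 | 13 | 30 | 34 | ... | 12 | 12 | 27 | 1.93 | 26.4 | 13.7 | 12.7 | 0.91 | 9 | 2022_2023 |
| 2 | 3 | Manchester Utd | 13 | 9 | 3 | 1 | 24 | 8 | 16 | 30 | ... | 27 | -10 | 20 | 1.54 | 16.9 | 19.5 | -2.7 | -0.21 | 9 | 2022_2023 |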

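Home and away records should add up to the overall record, so the two tables give a handy consistency check. In the rows shown, Arsenal's 35 home points plus 34 away points equal the 69 in the league table (likewise 34 + 27 = 61 and 30 + 20 = 50). The sketch below runs that check. It assumes the column names above: `team`/`points` in the overall table and `squad`/`home_pts`/`away_pts` in the home/away one. `check_home_away_points` is a hypothetical name.

```python
import pandas as pd


def check_home_away_points(league: pd.DataFrame, home_away: pd.DataFrame) -> pd.DataFrame:
    """Return the teams whose home + away points differ from their overall points."""
    merged = league[["team", "points"]].merge(
        home_away[["squad", "home_pts", "away_pts"]],
        left_on="team",
        right_on="squad",
        how="inner",
        validate="one_to_one",  # each team should appear once in each table
    )
    mismatch = merged["home_pts"] + merged["away_pts"] != merged["points"]
    return merged.loc[mismatch, ["team", "points", "home_pts", "away_pts"]]


league = pd.DataFrame({"team": ["Arsenal", "Manchester City", "Manchester Utd"], "points": [69, 61, 50]})
home_away = pd.DataFrame({
    "squad": ["Arsenal", "Manchester City", "Manchester Utd"],
    "home_pts": [35, 34, 30],
    "away_pts": [34, 27, 20],
})
print(check_home_away_points(league, home_away))  # empty DataFrame: the tables agree
```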

### Get fixtures and results for example league and season

In [5]:
fixtures_df = fb.get_fixtures_and_results(competition_id, season_name)

In [6]:
fixtures_df.head(5)

| | wk | day | date | time | home | home_xg | score | away_xg | away | attendance | ... | match_report | notes | home_team_id | away_team_id | fixture_link | kickoff | home_score | away_score | season_name | competition_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Fri | 2022-08-05 | 20:00 | Crystal Palace | 1.2 | 0–2 | 1.0 | Arsenal | 25286 | ... | Match Report | | 47c64c55 | 18bb7c10 | https://fbref.com/en/matches/e62f6e78/Crystal-... | 2022-08-05 20:00:00 | 0.0 | 2.0 | 2022_2023 | 9 |
| 1 | 1 | Sat | 2022-08-06 | 12:30 | Fulham | 1.2 | 2–2 | 1.2 | Liverpool | 22207 | ... | Match Report | | fd962109 | 822bd0ba | https://fbref.com/en/matches/6713c1dc/Fulham-L... | 2022-08-06 12:30:00 | 2.0 | 2.0 | 2022_2023 | 9 |
| 2 | 1 | Sat | 2022-08-06 | 15:00 | Tottenham | 1.5 | 4–1 | 0.5 | Southampton | 61732 | ... | Match Report | | 361ca564 | 33c895d4 | https://fbref.com/en/matches/09d8a999/Tottenha... | 2022-08-06 15:00:00 | 4.0 | 1.0 | 2022_2023 | 9 |
| 3 | 1 | Sat | 2022-08-06 | 15:00 | Newcastle Utd | 1.7 | 2–0 | 0.3 | Nott'ham Forest | 52245 | ... | Match Report | | b2b47a98 | e4a775cb | https://fbref.com/en/matches/1ac96eb4/Newcastl... | 2022-08-06 15:00:00 | 2.0 | 0.0 | 2022_2023 | 9 |
| 4 | 1 | Sat | 2022-08-06 | 15:00 | Leeds United | 0.8 | 2–1 | 1.3 | Wolves | 36347 | ... | Match Report | | 5bfb9659 | 8cec06e1 | https://fbref.com/en/matches/82702941/Leeds-Un... | 2022-08-06 15:00:00 | 2.0 | 1.0 | 2022_2023 | 9 |

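The `home_score`, `away_score` and `kickoff` columns look like they are derived from `score`, `date` and `time`. The sketch below shows one way to derive them. It assumes the score string uses an en dash as in `0–2`, and that unplayed fixtures have no score. That would also explain why the score columns are floats, since a float column can hold `NaN`. The function name is hypothetical.

```python
import pandas as pd


def add_score_and_kickoff(fixtures: pd.DataFrame) -> pd.DataFrame:
    out = fixtures.copy()
    # Scores look like "0–2" (en dash); anything without two numbers becomes NaN
    scores = out["score"].astype(str).str.extract(r"(\d+)\s*[\u2013-]\s*(\d+)")
    out["home_score"] = pd.to_numeric(scores[0], errors="coerce")
    out["away_score"] = pd.to_numeric(scores[1], errors="coerce")
    # Combine the date and time columns into a single timestamp (NaT if either is missing)
    out["kickoff"] = pd.to_datetime(
        out["date"].astype(str) + " " + out["time"].astype(str),
        format="%Y-%m-%d %H:%M",
        errors="coerce",
    )
    return out


sample = pd.DataFrame({
    "date": ["2022-08-05", "2022-08-06"],
    "time": ["20:00", "12:30"],
    "score": ["0–2", "2–2"],
})
print(add_score_and_kickoff(sample)[["home_score", "away_score", "kickoff"]])
```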

### Get stats for a given match

Get the match URL and the IDs of both teams (here for the first fixture in the table):

Example match: Crystal Palace vs Arsenal (home side: Palace)

In [7]:
match_url = fixtures_df.fixture_link.iloc[0]
home_team_id = fixtures_df.home_team_id.iloc[0]
away_team_id = fixtures_df.away_team_id.iloc[0]
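The fixture link also embeds FBref's match id, `e62f6e78` in `https://fbref.com/en/matches/e62f6e78/Crystal-...`. That id makes a convenient key if you later store stats for many matches. The small sketch below extracts it. It assumes ids are 8 lowercase hex characters, as every match and team id shown in this notebook is. `match_id_from_link` is a hypothetical helper.

```python
from __future__ import annotations

import re

MATCH_ID_PATTERN = re.compile(r"/matches/([0-9a-f]{8})(?:/|$)")


def match_id_from_link(fixture_link: str) -> str | None:
    """Return the 8-character hex match id from an FBref match URL, or None."""
    found = MATCH_ID_PATTERN.search(fixture_link)
    return found.group(1) if found else None


print(match_id_from_link("https://fbref.com/en/matches/e62f6e78/"))  # e62f6e78
```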

Get a dictionary of stats for the given match:

In [8]:
fixture_stat_dict = fb.get_fixture_stats(
    fixture_url=match_url,
    home_id=home_team_id,
    away_id=away_team_id,
)

Stats can be fetched for each of the tables in the list below:
* summary
* passing
* passing_types
* defense
* possession
* misc
* keeper
* shots
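The match tables have flattened two-level headers such as `Performance Gls` and `Passes Cmp%`, while ungrouped columns like `Player` keep a plain name. The notebook does not show whether the FBref class flattens them this way. One common approach joins the two header levels and skips the placeholders. It assumes the table was read with a two-level header where ungrouped columns get an `Unnamed: ...` top level, as `pandas.read_html` typically does. This would also explain the doubled `Carries Carries` name. `flatten_columns` is a hypothetical helper.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Join two-level headers into one string, dropping 'Unnamed: ...' placeholders."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = [
            " ".join(str(level) for level in col if not str(level).startswith("Unnamed")).strip()
            for col in out.columns
        ]
    return out


# Hypothetical two-level header like one read from a match report table
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Player"),
    ("Performance", "Gls"),
    ("Passes", "Cmp%"),
    ("Carries", "Carries"),
])
raw = pd.DataFrame([["Wilfried Zaha", 0, 76.9, 44]], columns=columns)
print(list(flatten_columns(raw).columns))
# ['Player', 'Performance Gls', 'Passes Cmp%', 'Carries Carries']
```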

Fetching the summary stats for the match:

In [11]:
fixture_stat_dict['summary'].head(5)

| | Player | # | Nation | Pos | Age | Min | Performance Gls | Performance Ast | Performance PK | Performance PKatt | ... | Passes Cmp | Passes Att | Passes Cmp% | Passes PrgP | Carries Carries | Carries PrgC | Take-Ons Att | Take-Ons Succ | player_id | player_link |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Odsonne Édouard | 22 | fr FRA | FW | 24-201 | 57 | 0 | 0 | 0 | 0 | ... | 3 | 6 | 50.0 | 0 | 12 | 2 | 2 | 1 | 0562b7f1 | https://fbref.com/en/players/0562b7f1/Odsonne-... |
| 1 | Jean-Philippe Mateta | 14 | fr FRA | FW | 25-038 | 33 | 0 | 0 | 0 | 0 | ... | 8 | 10 | 80.0 | 0 | 8 | 1 | 0 | 0 | 50e6dc35 | https://fbref.com/en/players/50e6dc35/Jean-Phi... |
| 2 | Wilfried Zaha | 11 | ci CIV | LW | 29-268 | 90 | 0 | 0 | 0 | 0 | ... | 30 | 39 | 76.9 | 2 | 44 | 7 | 5 | 3 | b2bc3b1f | https://fbref.com/en/players/b2bc3b1f/Wilfried... |
| 3 | Jordan Ayew | 9 | gh GHA | RW,AM | 30-328 | 90 | 0 | 0 | 0 | 0 | ... | 27 | 34 | 79.4 | 0 | 34 | 5 | 9 | 6 | da052c14 | https://fbref.com/en/players/da052c14/Jordan-Ayew |
| 4 | Eberechi Eze | 10 | eng ENG | AM | 24-037 | 85 | 0 | 0 | 0 | 0 | ... | 32 | 41 | 78.0 | 0 | 34 | 1 | 4 | 1 | ae4fc6a4 | https://fbref.com/en/players/ae4fc6a4/Eberechi... |

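`Passes Cmp%` is completed passes over attempted passes, as a percentage: Zaha's 30 of 39 gives 100 × 30 / 39 ≈ 76.9. The sketch below recomputes it. This is useful after summing passes across players or matches, where averaging the individual percentages would give each row equal weight however many passes it attempted. It assumes the flattened column names shown above and returns NaN for players with no attempts. The function name is hypothetical.

```python
import pandas as pd


def pass_completion_pct(df: pd.DataFrame, completed: str = "Passes Cmp", attempted: str = "Passes Att") -> pd.Series:
    """Completed / attempted passes as a percentage; NaN where nothing was attempted."""
    attempts = df[attempted].where(df[attempted] > 0)
    return (100 * df[completed] / attempts).round(1)


# Passing numbers for the five Crystal Palace players shown above
palace = pd.DataFrame({
    "Player": ["Odsonne Édouard", "Jean-Philippe Mateta", "Wilfried Zaha", "Jordan Ayew", "Eberechi Eze"],
    "Passes Cmp": [3, 8, 30, 27, 32],
    "Passes Att": [6, 10, 39, 34, 41],
})
print(pass_completion_pct(palace).tolist())  # [50.0, 80.0, 76.9, 79.4, 78.0]

# Sum first, then take the ratio: 100 of 130 passes completed by these five players
totals = palace[["Passes Cmp", "Passes Att"]].sum().to_frame().T
print(pass_completion_pct(totals).iloc[0])  # 76.9
```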

### Player stats from top five leagues

Get the standard stats table:

In [12]:
big5_players_standard_df = fb.get_big5_player_stats('standard', season_name)

In [13]:
big5_players_standard_df.head(5)

| | rank | player_name | country | position | team | competition | age | born | matches_played | starts | ... | expected_assists_per_90 | xg_plus_expected_assists_per_90 | non_penalty_xg_per_90 | non_penalty_xg_plus_expected_assists_per_90 | matches | goalkeeper | defender | midfielder | attacker | season_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Brenden Aaronson | USA | MF,FW | Leeds United | Premier League | 22.0 | 2000.0 | 26.0 | 23.0 | ... | 0.17 | 0.31 | 0.14 | 0.31 | Matches | False | False | True | True | 2022-2023 |
| 1 | 2 | Paxten Aaronson | USA | MF | Eint Frankfurt | Bundesliga | 19.0 | 2003.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | Matches | False | False | True | False | 2022-2023 |
| 2 | 3 | Yunis Abdelhamid | MAR | DF | Reims | Ligue 1 | 35.0 | 1987.0 | 28.0 | 28.0 | ... | 0.02 | 0.08 | 0.06 | 0.08 | Matches | False | True | False | False | 2022-2023 |
| 3 | 4 | Himad Abdelli | FRA | MF,FW | Angers | Ligue 1 | 23.0 | 1999.0 | 20.0 | 14.0 | ... | 0.14 | 0.2 | 0.06 | 0.2 | Matches | False | False | True | True | 2022-2023 |
| 4 | 5 | Salis Abdul Samed | GHA | MF | Lens | Ligue 1 | 23.0 | 2000.0 | 26.0 | 26.0 | ... | 0.06 | 0.09 | 0.03 | 0.09 | Matches | False | False | True | False | 2022-2023 |

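The `goalkeeper`, `defender`, `midfielder` and `attacker` flags follow from the comma-separated `position` codes. `MF,FW` sets both `midfielder` and `attacker`, and `DF` sets only `defender`. The sketch below shows that mapping. It assumes the four codes are `GK`, `DF`, `MF` and `FW`, and the names are hypothetical.

```python
import pandas as pd

POSITION_FLAGS = {"goalkeeper": "GK", "defender": "DF", "midfielder": "MF", "attacker": "FW"}


def add_position_flags(players: pd.DataFrame) -> pd.DataFrame:
    out = players.copy()
    # "MF,FW" -> ["MF", "FW"], so each code is matched exactly rather than as a substring
    codes = out["position"].fillna("").str.split(",")
    for flag, code in POSITION_FLAGS.items():
        out[flag] = codes.apply(lambda items, c=code: c in items)
    return out


# Positions of the five players shown above
sample = pd.DataFrame({"position": ["MF,FW", "MF", "DF", "MF,FW", "MF"]})
print(add_position_flags(sample))
```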

Get the defense table:

In [14]:
big5_players_defense_df = fb.get_big5_player_stats('defense', season_name)

In [15]:
big5_players_defense_df.head(5)

| | Rk | player_name | country | position | team | competition | age | born | no_of_nineties | tackles | ... | attempted_tackles_on_dribblers_per_90 | unsuccessful_tackles_on_dribblers_per_90 | blocks_per_90 | shots_blocked_per_90 | passes_blocked_per_90 | interceptions_per_90 | tackles_plus_interceptions_per_90 | clearances_per_90 | errors_per_90 | season_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Brenden Aaronson | USA | MF,FW | Leeds United | Premier League | 22 | 2000.0 | 21.6 | 34.0 | ... | 1.62 | 1.06 | 1.62 | 0.14 | 1.48 | 0.09 | 1.67 | 0.28 | 0.05 | 2022-2023 |
| 1 | 2.0 | Paxten Aaronson | USA | MF | Eint Frankfurt | Bundesliga | 19 | 2003.0 | 0.1 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2022-2023 |
| 2 | 3.0 | Yunis Abdelhamid | MAR | DF | Reims | Ligue 1 | 35 | 1987.0 | 28.0 | 66.0 | ... | 1.43 | 0.29 | 1.96 | 0.71 | 1.25 | 1.82 | 4.18 | 3.54 | 0.04 | 2022-2023 |
| 3 | 4.0 | Himad Abdelli | FRA | MF,FW | Angers | Ligue 1 | 23 | 1999.0 | 14.0 | 40.0 | ... | 2.36 | 1.0 | 1.29 | 0.07 | 1.21 | 1.07 | 3.93 | 0.64 | 0.0 | 2022-2023 |
| 4 | 5.0 | Salis Abdul Samed | GHA | MF | Lens | Ligue 1 | 23 | 2000.0 | 26.0 | 39.0 | ... | 1.5 | 0.65 | 1.15 | 0.27 | 0.88 | 1.0 | 2.5 | 0.65 | 0.0 | 2022-2023 |

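Per-90 rates divide a raw count by `no_of_nineties`, the minutes played divided by 90. For example, Yunis Abdelhamid's 66 tackles over 28.0 nineties is 66 / 28.0 ≈ 2.36 tackles per 90. The sketch below recomputes rates after you aggregate raw counts yourself, e.g. across seasons. The function name is hypothetical. Players with zero nineties get NaN instead of a division by zero.

```python
import pandas as pd


def add_per_90(players: pd.DataFrame, count_cols: list, nineties_col: str = "no_of_nineties") -> pd.DataFrame:
    out = players.copy()
    # Zero nineties played -> NaN, so the rate is missing rather than infinite
    nineties = out[nineties_col].where(out[nineties_col] > 0)
    for col in count_cols:
        out[f"{col}_per_90"] = (out[col] / nineties).round(2)
    return out


sample = pd.DataFrame({
    "player_name": ["Brenden Aaronson", "Yunis Abdelhamid", "Hypothetical unused sub"],
    "no_of_nineties": [21.6, 28.0, 0.0],
    "tackles": [34.0, 66.0, 0.0],
})
print(add_per_90(sample, ["tackles"])[["player_name", "tackles_per_90"]])
```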

# Conclusion

This notebook successfully fetches and cleans data from FBref for league tables, fixtures, match stats and player stats. This data can now be used for various analyses and modelling of teams and players across Europe's top five leagues.

Possible next steps:
* Team analysis for the current season
* Categorise teams into play style
* Use data from fbref to predict football outcomes
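For the last of these steps, a natural first prediction target is each fixture's full-time result. The minimal sketch below labels played fixtures as a home win (`H`), draw (`D`) or away win (`A`) from the `home_score` and `away_score` columns of the fixtures table. Unplayed fixtures are left empty. The `result` column and the `add_result` name are hypothetical.

```python
import numpy as np
import pandas as pd


def add_result(fixtures: pd.DataFrame) -> pd.DataFrame:
    out = fixtures.copy()
    home, away = out["home_score"], out["away_score"]
    labels = np.select([home > away, home < away], ["H", "A"], default="D")
    played = home.notna() & away.notna()
    # Unplayed fixtures (missing scores) get NaN instead of a misleading "D"
    out["result"] = pd.Series(labels, index=out.index).where(played)
    return out


fixtures = pd.DataFrame({
    "home": ["Crystal Palace", "Fulham", "Tottenham", "Hypothetical Home"],
    "away": ["Arsenal", "Liverpool", "Southampton", "Hypothetical Away"],
    "home_score": [0.0, 2.0, 4.0, float("nan")],
    "away_score": [2.0, 2.0, 1.0, float("nan")],
})
print(add_result(fixtures)[["home", "away", "result"]])
```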