In this notebook, I demo fetching data from FBref using the Python package BeautifulSoup. The aim is to fetch the data needed to analyse how teams in Europe's top five leagues are performing. To do this we will use the FBref class to grab the following:
* Current league table
* Fixtures and results for a given season
* Match stats such as xG, shots, goals
* Player stats for Europe's top five leagues

NB: The FBref class also cleans the fetched data.

# Imports

In [1]:
import os
os.chdir("../")
%autosave 0

Autosave disabled


In [2]:
from src.fbref.fbref_class import FBref

# Grabbing data using BeautifulSoup

Example website we will look to scrape: https://fbref.com/en/

### Instantiate FBref() class

In [3]:
fb = FBref()

### Define BeautifulSoup object for the HTML page of interest

The goal is to grab the table named Big 5 European Leagues, along with any hyperlinks it contains. We start by grabbing the HTML of the entire page.

In [4]:
input_url = 'https://fbref.com/en/comps/9/2021-2022/2021-2022-Premier-League-Stats'

beautifulsoup_object = fb.instantiate_beautiful_soup_object(
    input_url
)
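To make the idea of pulling hyperlinks out of a table concrete, here is a minimal sketch. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without third-party packages, and it is not how the FBref class is implemented. The class name `TableLinkParser` and the HTML fragment are made up for illustration.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect (text, href) pairs for every <a> found inside a <table>."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0      # > 0 while the parser is inside a <table>
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []    # text fragments of that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None


# Made-up fragment standing in for a downloaded page (not real FBref markup).
sample_html = """
<a href="/en/">Home</a>
<table id="comps">
  <tr><td><a href="/en/comps/9/history/x">Premier League</a></td></tr>
  <tr><td><a href="/en/comps/12/history/y">La Liga</a></td></tr>
</table>
"""

parser = TableLinkParser()
parser.feed(sample_html)
table_links = parser.links
```

Only the links inside the table are kept; the "Home" link outside it is ignored.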

### Grab competition dictionary

In [5]:
competition_dict = fb.get_competition_dict()

In [6]:
competition_dict

{'9': 'Premier-League',
 '12': 'La-Liga',
 '11': 'Serie-A',
 '13': 'Ligue-1',
 '20': 'Bundesliga'}
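The keys and values of this dictionary line up with the variable parts of the season URL used earlier (`input_url`). As a rough sketch, a season URL could be assembled as follows; `build_season_url` is a hypothetical helper, and whether the FBref class builds its URLs this way is an assumption.

```python
# Same mapping as the competition dictionary shown above.
competition_dict = {
    "9": "Premier-League",
    "12": "La-Liga",
    "11": "Serie-A",
    "13": "Ligue-1",
    "20": "Bundesliga",
}


def build_season_url(competition_id, season_name, competition_dict):
    """Assemble a season stats URL following the pattern of input_url."""
    competition_slug = competition_dict[competition_id]
    return (
        f"https://fbref.com/en/comps/{competition_id}/{season_name}/"
        f"{season_name}-{competition_slug}-Stats"
    )


premier_league_url = build_season_url("9", "2021-2022", competition_dict)
```

For the Premier League in 2021-2022 this reproduces the `input_url` defined above.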

### Get big five league table

In [7]:
big5_df = fb.get_big_5_leagues()

In [9]:
big5_df

Unnamed: 0,Competition Name,Gender,Country,First Season,Last Season,Tier,competition_link,competition_id
0,Premier League,M,eng ENG,1888-1889,2022-2023,1st,https://fbref.com/en/comps/9/history/Premier-L...,9
1,La Liga,M,es ESP,1988-1989,2022-2023,1st,https://fbref.com/en/comps/12/history/La-Liga-...,12
2,Ligue 1,M,fr FRA,1995-1996,2022-2023,1st,https://fbref.com/en/comps/13/history/Ligue-1-...,13
3,Fußball-Bundesliga,M,de GER,1988-1989,2022-2023,1st,https://fbref.com/en/comps/20/history/Bundesli...,20
4,Serie A,M,it ITA,1988-1989,2022-2023,1st,https://fbref.com/en/comps/11/history/Serie-A-...,11
5,Big 5 European Leagues Combined,M,,1995-1996,2022-2023,1st,https://fbref.com/en/comps/Big5/history/Big-5-...,Big5


### Get league table

Define parameters for competition_id and season_name

In [10]:
competition_id = '9'
season_name = '2022-2023'

Get league table for example season, Premier League 2022-2023

In [13]:
league_table_df = fb.get_season_stats_table('league_table', competition_id, season_name)
league_table_df.head(3)

Unnamed: 0,position,team,matches_played,wins,draws,losses,goals_for,goals_against,goal_difference,points,...,xg,xg_against,expected_goal_difference,expected_goal_difference_per_90,goals_per_game,goals_conceded_per_game,xg_per_game,xg_against_per_game,competition_id,season_name
0,1,Arsenal,14,12,1,1,33,11,22,37,...,26.2,11.8,14.3,1.02,2.36,0.79,1.87,0.84,9,2022_2023
1,2,Manchester City,14,10,2,2,40,14,26,32,...,27.6,11.2,16.4,1.17,2.86,1.0,1.97,0.8,9,2022_2023
2,3,Newcastle Utd,15,8,6,1,29,11,18,30,...,24.3,14.3,9.9,0.66,1.93,0.73,1.62,0.95,9,2022_2023
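The per-game columns in this output are consistent with dividing a season total by `matches_played` and rounding to two decimals. A minimal sketch of that arithmetic, using a hypothetical helper `per_game` (the class may compute these columns differently):

```python
def per_game(total, matches_played):
    """Return a season total per match, rounded to two decimals."""
    if matches_played == 0:
        return None  # no matches played yet, so the rate is undefined
    return round(total / matches_played, 2)


# Arsenal's row from the table above: 14 matches, 33 goals for,
# 11 goals against, 26.2 xG.
arsenal_goals_per_game = per_game(33, 14)
arsenal_conceded_per_game = per_game(11, 14)
arsenal_xg_per_game = per_game(26.2, 14)
```

These values match the `goals_per_game`, `goals_conceded_per_game` and `xg_per_game` entries shown for Arsenal.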


### Get home and away league table

In [14]:
league_table_home_away_df = fb.get_season_stats_table('home_away_league_table', competition_id, season_name)
league_table_home_away_df.head(3)

Unnamed: 0,rk,squad,home_mp,home_w,home_d,home_l,home_gf,home_ga,home_gd,home_pts,...,away_ga,away_gd,away_pts,away_pts_per_mp,away_xg,away_xga,away_xgd,away_xgd_per_90,competition_id,season_name
0,1,Arsenal,6,6,0,0,19,7,12,18,...,4,10,19,2.38,10.8,7.0,3.9,0.48,9,2022_2023
1,2,Manchester City,8,7,0,1,30,9,21,21,...,5,5,11,1.83,9.5,5.8,3.7,0.61,9,2022_2023
2,3,Newcastle Utd,8,5,3,0,17,5,12,18,...,6,6,12,1.71,8.8,8.3,0.5,0.07,9,2022_2023


### Get fixtures and results for example league and season

In [15]:
fixtures_df = fb.get_fixtures_and_results(competition_id, season_name)

In [16]:
fixtures_df.head(5)

Unnamed: 0,wk,day,date,time,home,home_xg,score,away_xg,away,attendance,venue,referee,match_report,notes,home_id,away_id,fixture_link,kickoff,home_score,away_score
0,1,Fri,2022-08-05,20:00,Crystal Palace,1.2,0–2,1.0,Arsenal,25286,Selhurst Park,Anthony Taylor,Match Report,,47c64c55,18bb7c10,https://fbref.com/en/matches/e62f6e78/Crystal-...,2022-08-05 20:00:00,0.0,2.0
1,1,Sat,2022-08-06,12:30,Fulham,1.2,2–2,1.2,Liverpool,22207,Craven Cottage,Andy Madley,Match Report,,fd962109,822bd0ba,https://fbref.com/en/matches/6713c1dc/Fulham-L...,2022-08-06 12:30:00,2.0,2.0
2,1,Sat,2022-08-06,15:00,Tottenham,1.5,4–1,0.5,Southampton,61732,Tottenham Hotspur Stadium,Andre Marriner,Match Report,,361ca564,33c895d4,https://fbref.com/en/matches/09d8a999/Tottenha...,2022-08-06 15:00:00,4.0,1.0
3,1,Sat,2022-08-06,15:00,Newcastle Utd,1.7,2–0,0.3,Nott'ham Forest,52245,St. James' Park,Simon Hooper,Match Report,,b2b47a98,e4a775cb,https://fbref.com/en/matches/1ac96eb4/Newcastl...,2022-08-06 15:00:00,2.0,0.0
4,1,Sat,2022-08-06,15:00,Leeds United,0.8,2–1,1.3,Wolves,36347,Elland Road,Robert Jones,Match Report,,5bfb9659,8cec06e1,https://fbref.com/en/matches/82702941/Leeds-Un...,2022-08-06 15:00:00,2.0,1.0
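The `kickoff`, `home_score` and `away_score` columns look derived from `date`, `time` and `score`. Below is a minimal sketch of how such columns could be computed, assuming the score uses an en dash as in the output above and that unplayed fixtures have no score string. The helpers `split_score` and `combine_kickoff` are hypothetical and not part of the FBref class.

```python
from datetime import datetime

EN_DASH = "\u2013"  # the separator used in score strings in the output above


def split_score(score):
    """Split a 'home<dash>away' score string into two floats.

    Returns (None, None) for missing values such as unplayed fixtures.
    Scores with extra notation (e.g. penalty shoot-outs) are not handled.
    """
    if not isinstance(score, str) or not score:
        return None, None
    home, away = score.replace(EN_DASH, "-").split("-")
    return float(home), float(away)


def combine_kickoff(date, time):
    """Combine 'YYYY-MM-DD' and 'HH:MM' strings into one datetime."""
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M")


home_score, away_score = split_score(f"0{EN_DASH}2")
kickoff = combine_kickoff("2022-08-05", "20:00")
```

Floats are used for the scores because the output above shows them as `0.0` and `2.0`, which is also what pandas produces when a numeric column contains missing values.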


### Get stats for a given match

Get the match URL and the ID of one of the teams (the home team here)

Example match: Crystal Palace vs Arsenal (home side: Palace)

In [17]:
match_url = fixtures_df.fixture_link.iloc[0]
home_team_id = fixtures_df.home_id.iloc[0]

Build a dictionary holding one DataFrame per stat type for the given match

In [18]:
stat_type_list = ['summary', 'passing', 'passing_types', 'defense', 'possession', 'misc', 'keeper', 'shots']

match_stats_dict = {}
for stat_type in stat_type_list:
    stat_df = fb.get_fixture_stats(
        fixture_url=match_url, team_id=home_team_id, stat_type=stat_type
    )
    match_stats_dict[stat_type] = stat_df

Get summary stats for Crystal Palace, which include goals, assists, xG, etc.

In [19]:
match_stats_dict['summary'].head(3)

Unnamed: 0,Player,#,Nation,Pos,Age,Min,Performance Gls,Performance Ast,Performance PK,Performance PKatt,...,Expected npxG,Expected xAG,SCA SCA,SCA GCA,Passes Cmp,Passes Att,Passes Cmp%,Passes Prog,Dribbles Succ,Dribbles Att
0,Odsonne Édouard,22,fr FRA,FW,24-201,57,0,0,0,0,...,0.2,0.0,1,0,3,6,50.0,0,1,2
1,Jean-Philippe Mateta,14,fr FRA,FW,25-038,33,0,0,0,0,...,0.1,0.2,0,0,8,10,80.0,0,0,0
2,Wilfried Zaha,11,ci CIV,LW,29-268,90,0,0,0,0,...,0.1,0.6,3,0,30,39,76.9,2,3,5
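Column names such as `Performance Gls` and `Expected npxG` look like a two-level table header joined with a space. A minimal sketch of that flattening, assuming ungrouped columns such as `Player` have a blank top level or a pandas-style `Unnamed: ...` placeholder; `flatten_header` is a hypothetical helper, not the class's actual code.

```python
def flatten_header(top, bottom):
    """Join a two-level column header into a single name."""
    top = top.strip()
    if not top or top.startswith("Unnamed"):
        return bottom  # ungrouped column: keep only the lower-level name
    return f"{top} {bottom}"


# Illustrative (top, bottom) header pairs.
header_pairs = [
    ("Unnamed: 0_level_0", "Player"),
    ("Performance", "Gls"),
    ("Expected", "npxG"),
    ("SCA", "SCA"),
]
flat_columns = [flatten_header(top, bottom) for top, bottom in header_pairs]
```

This also explains repeated names like `SCA SCA`, where the group and the column share a label.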


Get passing stats for Crystal Palace

In [20]:
match_stats_dict['passing'].head(3)

Unnamed: 0,Player,#,Nation,Pos,Age,Min,Total Cmp,Total Att,Total Cmp%,Total TotDist,...,Long Att,Long Cmp%,Ast,xAG,xA,KP,1/3,PPA,CrsPA,Prog
0,Odsonne Édouard,22,fr FRA,FW,24-201,57,3,6,50.0,34,...,0,,0,0.0,0.0,0,0,0,0,0
1,Jean-Philippe Mateta,14,fr FRA,FW,25-038,33,8,10,80.0,78,...,0,,0,0.2,0.0,1,0,0,0,0
2,Wilfried Zaha,11,ci CIV,LW,29-268,90,30,39,76.9,387,...,2,0.0,0,0.6,0.8,2,0,3,1,2


Get passing types stats for Crystal Palace

In [21]:
match_stats_dict['passing_types'].head(3)

Unnamed: 0,Player,#,Nation,Pos,Age,Min,Att,Pass Types Live,Pass Types Dead,Pass Types FK,...,Pass Types Sw,Pass Types Crs,Pass Types TI,Pass Types CK,Corner Kicks In,Corner Kicks Out,Corner Kicks Str,Outcomes Cmp,Outcomes Off,Outcomes Blocks
0,Odsonne Édouard,22,fr FRA,FW,24-201,57,6,6,0,0,...,0,0,0,0,0,0,0,3,0,0
1,Jean-Philippe Mateta,14,fr FRA,FW,25-038,33,10,10,0,0,...,0,0,0,0,0,0,0,8,0,1
2,Wilfried Zaha,11,ci CIV,LW,29-268,90,39,39,0,0,...,0,4,0,0,0,0,0,30,0,2


Get defense stats for Crystal Palace

In [22]:
match_stats_dict['defense'].head(3)

Unnamed: 0,Player,#,Nation,Pos,Age,Min,Tackles Tkl,Tackles TklW,Tackles Def 3rd,Tackles Mid 3rd,...,Vs Dribbles Att,Vs Dribbles Tkl%,Vs Dribbles Past,Blocks Blocks,Blocks Sh,Blocks Pass,Int,Tkl+Int,Clr,Err
0,Odsonne Édouard,22,fr FRA,FW,24-201,57,0,0,0,0,...,0,,0,0,0,0,0,0,1,0
1,Jean-Philippe Mateta,14,fr FRA,FW,25-038,33,0,0,0,0,...,0,,0,0,0,0,0,0,0,0
2,Wilfried Zaha,11,ci CIV,LW,29-268,90,3,2,0,1,...,2,50.0,1,0,0,0,0,3,0,0


Get possession stats for Crystal Palace

In [23]:
match_stats_dict['possession'].head(3)

Unnamed: 0,Player,#,Nation,Pos,Age,Min,Touches Touches,Touches Def Pen,Touches Def 3rd,Touches Mid 3rd,Touches Att 3rd,Touches Att Pen,Touches Live,Dribbles Succ,Dribbles Att,Dribbles Succ%,Dribbles Mis,Dribbles Dis,Receiving Rec,Receiving Prog
0,Odsonne Édouard,22,fr FRA,FW,24-201,57,20,1,3,9,9,4,20,1,2,50.0,2,3,10,0
1,Jean-Philippe Mateta,14,fr FRA,FW,25-038,33,13,0,0,5,8,4,13,0,0,,1,0,10,5
2,Wilfried Zaha,11,ci CIV,LW,29-268,90,51,0,7,17,28,6,51,3,5,60.0,5,4,41,6


Get misc stats for Crystal Palace

In [24]:
match_stats_dict['misc'].head(3)

Unnamed: 0,Player,#,Nation,Pos,Age,Min,Performance CrdY,Performance CrdR,Performance 2CrdY,Performance Fls,...,Performance Crs,Performance Int,Performance TklW,Performance PKwon,Performance PKcon,Performance OG,Performance Recov,Aerial Duels Won,Aerial Duels Lost,Aerial Duels Won%
0,Odsonne Édouard,22,fr FRA,FW,24-201,57,0,0,0,2,...,0,0,0,0,0,0,2,0,1,0.0
1,Jean-Philippe Mateta,14,fr FRA,FW,25-038,33,0,0,0,1,...,0,0,0,0,0,0,0,0,1,0.0
2,Wilfried Zaha,11,ci CIV,LW,29-268,90,0,0,0,2,...,4,0,2,0,0,0,9,0,0,


Get keeper stats for Crystal Palace

In [25]:
match_stats_dict['keeper'].head(3)

Unnamed: 0,Player,Nation,Age,Min,Shot Stopping SoTA,Shot Stopping GA,Shot Stopping Saves,Shot Stopping Save%,Shot Stopping PSxG,Launched Cmp,...,Passes Launch%,Passes AvgLen,Goal Kicks Att,Goal Kicks Launch%,Goal Kicks AvgLen,Crosses Opp,Crosses Stp,Crosses Stp%,Sweeper #OPA,Sweeper AvgDist
0,Vicente Guaita,es ESP,35-207,90,2,2,1,0.0,0.3,7,...,22.9,25.7,4,25.0,29.5,9,1,11.1,2,15.4


Get shots stats for Crystal Palace

In [26]:
match_stats_dict['shots'].head(3)

Unnamed: 0,Minute,Player,Squad,xG,PSxG,Outcome,Distance,Body Part,Notes,SCA 1 Player,SCA 1 Event,SCA 2 Player,SCA 2 Event
0,42,Odsonne Édouard,Crystal Palace,0.1,0.13,Saved,10,Head,,Joachim Andersen,Pass (Live),Eberechi Eze,Pass (Dead)
1,45+1,Odsonne Édouard,Crystal Palace,0.04,,Blocked,17,Right Foot,,Jordan Ayew,Pass (Live),Cheick Doucouré,Pass (Live)
2,45+1,Odsonne Édouard,Crystal Palace,0.08,,Blocked,9,Right Foot,,Odsonne Édouard,Shot,,


### Player stats from top five leagues

Get standard data table

In [29]:
big5_players_standard_df = fb.get_big5_player_stats('standard', season_name)

In [30]:
big5_players_standard_df.head(5)

Unnamed: 0,rank,player_name,country,position,team,competition,age,born,matches_played,starts,...,expected_assists_per_90,xg_plus_expected_assists_per_90,non_penalty_xg_per_90,non_penalty_xg_plus_expected_assists_per_90,matches,goalkeeper,defender,midfielder,attacker,season_name
0,1,Brenden Aaronson,USA,"MF,FW",Leeds United,Premier League,22-058,2000.0,14,14,...,0.22,0.36,0.14,0.36,Matches,False,False,True,True,2022-2023
1,2,Yunis Abdelhamid,MAR,DF,Reims,Ligue 1,35-082,1987.0,15,15,...,0.02,0.1,0.09,0.1,Matches,False,True,False,False,2022-2023
2,3,Himad Abdelli,FRA,"MF,FW",Angers,Ligue 1,23-032,1999.0,7,2,...,0.21,0.36,0.15,0.36,Matches,False,False,True,True,2022-2023
3,4,Salis Abdul Samed,GHA,MF,Lens,Ligue 1,22-268,2000.0,15,15,...,0.02,0.05,0.02,0.05,Matches,False,False,True,False,2022-2023
4,5,Laurent Abergel,FRA,MF,Lorient,Ligue 1,29-321,1993.0,10,10,...,0.06,0.08,0.02,0.08,Matches,False,False,True,False,2022-2023
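The boolean `goalkeeper`, `defender`, `midfielder` and `attacker` columns appear to follow from the comma-separated `position` column. A minimal sketch of that mapping is shown here; the code-to-flag dictionary and the helper `position_flags` are assumptions inferred from the rows above, not the class's actual implementation.

```python
# Assumed mapping from FBref position codes to the flag columns above.
POSITION_FLAGS = {
    "GK": "goalkeeper",
    "DF": "defender",
    "MF": "midfielder",
    "FW": "attacker",
}


def position_flags(position):
    """Turn a position string such as 'MF,FW' into one boolean per role."""
    codes = {code.strip() for code in position.split(",")}
    return {flag: code in codes for code, flag in POSITION_FLAGS.items()}


aaronson_flags = position_flags("MF,FW")   # Brenden Aaronson's row
abdelhamid_flags = position_flags("DF")    # Yunis Abdelhamid's row
```

Flags like these make it easy to filter players by role even when they are listed in more than one position.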


Get defense table

In [35]:
big5_players_defense_df = fb.get_big5_player_stats('defense', season_name)

In [36]:
big5_players_defense_df.head(5)

Unnamed: 0,Rk,player_name,country,position,team,competition,age,born,no_of_nineties,tackles,...,attempted_tackles_on_dribblers_per_match,unsuccessful_tackles_on_dribblers_per_match,blocks_per_match,shots_blocked_per_match,passes_blocked_per_match,interceptions_per_match,tackles_plus_interceptions_per_match,clearances_per_match,errors_per_match,season_name
0,1.0,Brenden Aaronson,USA,"MF,FW",Leeds United,Premier League,22-058,2000.0,13.2,24.0,...,1.82,1.21,1.59,0.08,1.52,0.08,1.89,0.38,0.0,2022-2023
1,2.0,Yunis Abdelhamid,MAR,DF,Reims,Ligue 1,35-082,1987.0,15.0,33.0,...,1.27,0.27,2.0,0.8,1.2,2.27,4.47,2.87,0.0,2022-2023
2,3.0,Himad Abdelli,FRA,"MF,FW",Angers,Ligue 1,23-032,1999.0,2.6,5.0,...,1.15,0.0,0.0,0.0,0.0,1.15,3.08,0.77,0.0,2022-2023
3,4.0,Salis Abdul Samed,GHA,MF,Lens,Ligue 1,22-268,2000.0,15.0,22.0,...,1.53,0.6,1.13,0.27,0.87,1.2,2.67,0.87,0.0,2022-2023
4,5.0,Laurent Abergel,FRA,MF,Lorient,Ligue 1,29-321,1993.0,9.0,32.0,...,3.0,1.89,1.33,0.22,1.11,1.33,4.89,1.11,0.0,2022-2023


# Conclusion

This notebook successfully fetches and cleans data from FBref for league tables, fixtures, match stats and player stats. This data can now be used for various analysis and modelling of teams and players across Europe's top five leagues.

Possible next steps:
* Team analysis for the current season
* Categorise teams into play style
* Use data from FBref to predict football outcomes