### Fbref.com Importing and Cleaning Soccer Stat Data

Fbref.com scraping.
    
This is the code for pulling player and squad stats in fbref.com. By executing this code, we're pulling player and squad stats in the Big 5 European Leagues (Premier League (England), Ligue 1 (France), LaLiga (Spain), Serie A (Italy) and Bundesliga (Germany), as well as from five other notable European leagues (Primeira Liga (Portugal), Scottish Premiership (Scotland), EFL Championship (England - 2nd tier), Belgian First Division A (Belgium), and Dutch Eredivisie (Netherlands).

In Fbref, it can be difficult to web scrape data from tables. However, Fbref does have url links to stat tables if users want to embed tables into their websites. First, we acquire the links and assign names to them. In this example, there are about 40 urls. Instead of several pages, we can open one page to get the structure of the url. Once we have an url structure, we can conduct string manipulation to acquire all the required urls for our analysis using a nested for loop.

When you access a league page on fbref.com, the page will look something like this:
{Enter screenshot}

The page will show squad stats first, but to get player stats, you need to scroll down to the next table. We would select "Embed Table" under the "Share and Export" dropdown menu:
{Enter screenshot}

We, then, highlight the portion of the link that we need in the pop-up box:
{Enter screenshot}

The url structures are as follows:


In [None]:
#URL structure (players)

#Big 5 Leagues:
"https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_[stat_group]"

#Other Leagues:
"https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F[league_name]%2Fstats%2F[league_id]-Stats&div=div_stats_[stat_group]"


In [None]:
#URL structure (squads)

#Big 5 Leagues:
"https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_[stat_group]_for"

#Other Leagues:
"https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F[league_id]%2F[league_name]-Stats&div=div_stats_squads_[stat_group]_for"


As you can see above, we have two sets of url structures: one for the Big 5 European leagues and one for the other leagues which we'll call the "nonbig5". For the nonbig5 url structures, we're going to establish a dictionary that has  "league_name" as keys and "league_id" as values, and a list for "stat_group". For stat group, we'll make two lists of statistical categories (i.e., standard, shooting, etc.) for Big 5 and nonbig5 leagues. We create two lists because the Big 5 Leagues include every statistical category as those of the nonbig5, but also includes advanced statistics (i.e., advanced goalkeeping, defensive actions, possession) that are not available for the leagues outside of the Big 5 Leagues.

In [1]:
league_nonbig5 = {'Primeira-Liga':'32','Scottish-Premiership':'40','Championship':'10','Belgian-First-Division-A':'37',
                  'Dutch-Eredivisie':'23'}

stat_group_nonbig5 = ['standard','keepers','shooting','playing_time','misc']
stat_group_big5 = ['standard','keepers','keepersadv','shooting','passing','passing_types','gca','defense','possession','playing_time','misc']

Let's create two empty lists called player and squad pages which will house the urls we will acquire through the use of nested for loops.

In [2]:
player_pages = []
squad_pages = []

In [3]:
#For player stats (Non-Big5 Leagues):
for key,value in league_nonbig5.items():
    for k in stat_group_nonbig5:
        player_pages.append(str("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F"+str(key)+"%2Fstats%2F"+str(value)+"-Stats&div=div_stats_"+str(k))) 

In [4]:
#For player stats (Big 5 Leagues):
for i in stat_group_big5:
    player_pages.append(str("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_"+str(i)))

#"https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_[stat_group]"


In [5]:
#For squad stats (Non-Big 5 Leagues):
for key,value in league_nonbig5.items():
    for k in stat_group_nonbig5:
        squad_pages.append(str("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F"+str(key)+"%2F"+str(value)+"-Stats&div=div_stats_squads_"+str(k)+"_for")) 
        
#"https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F[league_id]%2F[league_name]-Stats&div=div_stats_squads_[stat_group]_for"


In [6]:
#For squad stats (Big 5 Leagues):
for i in stat_group_big5:
    squad_pages.append(str("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_"+str(i)+"_for"))

#"https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_[stat_group]_for"

In [None]:
player_pages

In [None]:
squad_pages

Assigning these urls names.

In [7]:
#Standard Stats
primeiraliga_player_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fstats%2FPrimeira-Liga-Stats&div=div_stats_standard"
scottishprem_player_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fstats%2FScottish-Premiership-Stats&div=div_stats_standard"
championship_player_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fstats%2FChampionship-Stats&div=div_stats_standard"
belfirstdiva_player_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fstats%2FBelgian-First-Division-A-Stats&div=div_stats_standard"
eredivisie_player_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fstats%2FDutch-Eredivisie-Stats&div=div_stats_standard"
big5_player_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_standard"

primeiraliga_squad_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2FPrimeira-Liga-Stats&div=div_stats_squads_standard_for"
scottishprem_squad_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fstats%2FScottish-Premiership-Stats&div=div_stats_squads_standard_for"
championship_squad_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fstats%2FChampionship-Stats&div=div_stats_squads_standard_for"
belfirstdiva_squad_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2FBelgian-First-Division-A-Stats&div=div_stats_squads_standard_for"
eredivisie_squad_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fstats%2FDutch-Eredivisie-Stats&div=div_stats_squads_standard_for"
big5_squad_stand_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fstats%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_standard_for"


In [8]:
#Goalkeeping 
primeiraliga_player_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fkeepers%2FPrimeira-Liga-Stats&div=div_stats_keeper"
scottishprem_player_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fkeepers%2FScottish-Premiership-Stats&div=div_stats_keeper"
championship_player_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fkeepers%2FChampionship-Stats&div=div_stats_keeper"
belfirstdiva_player_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fkeepers%2FBelgian-First-Division-A-Stats&div=div_stats_keeper"
eredivisie_player_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fkeepers%2FDutch-Eredivisie-Stats&div=div_stats_keeper"
big5_player_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fkeepers%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_keeper"

primeiraliga_squad_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fkeepers%2FPrimeira-Liga-Stats&div=div_stats_squads_keeper_for"
scottishprem_squad_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fkeepers%2FScottish-Premiership-Stats&div=div_stats_squads_keeper_for"
championship_squad_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fkeepers%2FChampionship-Stats&div=div_stats_squads_keeper_for"
belfirstdiva_squad_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fkeepers%2FBelgian-First-Division-A-Stats&div=div_stats_squads_keeper_for"
eredivisie_squad_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fkeepers%2FDutch-Eredivisie-Stats&div=div_stats_squads_keeper_for"
big5_squad_gk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fkeepers%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_keeper_for"


In [9]:
#Advanced Goalkeeping
big5_player_advgk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fkeepersadv%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_keeper_adv"

big5_squad_advgk_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fkeepersadv%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_keeper_adv_for"


In [10]:
#Shooting
primeiraliga_player_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fshooting%2FPrimeira-Liga-Stats&div=div_stats_shooting"
scottishprem_player_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fshooting%2FScottish-Premiership-Stats&div=div_stats_shooting"
championship_player_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fshooting%2FChampionship-Stats&div=div_stats_shooting"
belfirstdiva_player_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fshooting%2FBelgian-First-Division-A-Stats&div=div_stats_shooting"
eredivisie_player_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fshooting%2FDutch-Eredivisie-Stats&div=div_stats_shooting"
big5_player_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fshooting%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_shooting"

primeiraliga_squad_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fshooting%2FPrimeira-Liga-Stats&div=div_stats_squads_shooting_for"
scottishprem_squad_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fshooting%2FScottish-Premiership-Stats&div=div_stats_squads_shooting_for"
championship_squad_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fshooting%2FChampionship-Stats&div=div_stats_squads_shooting_for"
belfirstdiva_squad_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fshooting%2FBelgian-First-Division-A-Stats&div=div_stats_squads_shooting_for"
eredivisie_squad_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fshooting%2FDutch-Eredivisie-Stats&div=div_stats_squads_shooting_for"
big5_squad_shoot_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fshooting%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_shooting_for"



In [11]:
#Passing
big5_player_pass_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fpassing%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_passing"

big5_squad_pass_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fpassing%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_passing_for"


In [12]:
#Pass Types
big5_player_ptype_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fpassing_types%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_passing_types"

big5_squad_ptype_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fpassing_types%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_passing_types_for"



In [13]:
#Goal and Shot Creation
big5_player_gca_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fgca%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_gca"

big5_squad_gca_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fgca%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_gca_for"


In [14]:
#Defensive Actions
big5_player_def_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fdefense%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_defense"

big5_squad_def_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fdefense%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_defense_for"


In [15]:
#Possession
big5_player_poss_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fpossession%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_possession"

big5_squad_poss_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fpossession%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_possession_for"

In [16]:
#Playing Time
primeiraliga_player_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fplayingtime%2FPrimeira-Liga-Stats&div=div_stats_playing_time"
scottishprem_player_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fplayingtime%2FScottish-Premiership-Stats&div=div_stats_playing_time"
championship_player_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fplayingtime%2FChampionship-Stats&div=div_stats_playing_time"
belfirstdiva_player_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fplayingtime%2FBelgian-First-Division-A-Stats&div=div_stats_playing_time"
eredivisie_player_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fplayingtime%2FDutch-Eredivisie-Stats&div=div_stats_playing_time"
big5_player_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fplayingtime%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_playing_time"

primeiraliga_squad_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fplayingtime%2FPrimeira-Liga-Stats&div=div_stats_squads_playing_time_for"
scottishprem_squad_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fplayingtime%2FScottish-Premiership-Stats&div=div_stats_squads_playing_time_for"
championship_squad_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fplayingtime%2FChampionship-Stats&div=div_stats_squads_playing_time_for"
belfirstdiva_squad_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fplayingtime%2FBelgian-First-Division-A-Stats&div=div_stats_squads_playing_time_for"
eredivisie_squad_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fplayingtime%2FDutch-Eredivisie-Stats&div=div_stats_squads_playing_time_for"
big5_squad_pt_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fplayingtime%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_playing_time_for"


In [17]:
#Miscellaneous Stats
primeiraliga_player_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fmisc%2FPrimeira-Liga-Stats&div=div_stats_misc"
scottishprem_player_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fmisc%2FScottish-Premiership-Stats&div=div_stats_misc"
championship_player_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fmisc%2FChampionship-Stats&div=div_stats_misc"
belfirstdiva_player_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fmisc%2FBelgian-First-Division-A-Stats&div=div_stats_misc"
eredivisie_player_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fmisc%2FDutch-Eredivisie-Stats&div=div_stats_misc"
big5_player_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fmisc%2Fplayers%2FBig-5-European-Leagues-Stats&div=div_stats_misc"

primeiraliga_squad_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F32%2Fmisc%2FPrimeira-Liga-Stats&div=div_stats_squads_misc_for"
scottishprem_squad_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F40%2Fmisc%2FScottish-Premiership-Stats&div=div_stats_squads_misc_for"
championship_squad_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F10%2Fmisc%2FChampionship-Stats&div=div_stats_squads_misc_for"
belfirstdiva_squad_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F37%2Fmisc%2FBelgian-First-Division-A-Stats&div=div_stats_squads_misc_for"
eredivisie_squad_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F23%2Fmisc%2FDutch-Eredivisie-Stats&div=div_stats_squads_misc_for"
big5_squad_misc_url = "https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2FBig5%2Fmisc%2Fsquads%2FBig-5-European-Leagues-Stats&div=div_stats_squads_misc_for"


Importing pandas and numpy for the data wrangling tasks.

In [18]:
#Importing packages
import pandas as pd
import numpy as np

We're going to use pandas to read the data in these urls and convert them into several pandas dataframes using the "read_html" function. If someone knows how to automate this process, please share your secret (or not so secret) sauce. 

This may take a several minutes, so be patient!!!

In [19]:
#Standard Stats
primeiraliga_player_stand_df = pd.read_html(primeiraliga_player_stand_url, header=1)[0]
scottishprem_player_stand_df = pd.read_html(scottishprem_player_stand_url, header=1)[0]
championship_player_stand_df = pd.read_html(championship_player_stand_url, header=1)[0]
belfirstdiva_player_stand_df = pd.read_html(belfirstdiva_player_stand_url, header=1)[0]
eredivisie_player_stand_df = pd.read_html(eredivisie_player_stand_url, header=1)[0]
big5_player_stand_df = pd.read_html(big5_player_stand_url, header=1)[0]

primeiraliga_squad_stand_df = pd.read_html(primeiraliga_squad_stand_url, header=1)[0]
scottishprem_squad_stand_df = pd.read_html(scottishprem_squad_stand_url, header=1)[0]
championship_squad_stand_df = pd.read_html(championship_squad_stand_url, header=1)[0]
belfirstdiva_squad_stand_df = pd.read_html(belfirstdiva_squad_stand_url, header=1)[0]
eredivisie_squad_stand_df = pd.read_html(eredivisie_squad_stand_url, header=1)[0]
big5_squad_stand_df = pd.read_html(big5_squad_stand_url, header=1)[0]



In [20]:
primeiraliga_player_stand_df['Comp'] = "Primeira Liga"
scottishprem_player_stand_df['Comp'] = "Scottish Premiership"
championship_player_stand_df['Comp'] = "EFL Championship"
belfirstdiva_player_stand_df['Comp'] = "Belgian First Division A"
eredivisie_player_stand_df['Comp'] = "Eredivisie"

primeiraliga_squad_stand_df['Comp'] = "Primeira Liga"
scottishprem_squad_stand_df['Comp'] = "Scottish Premiership"
championship_squad_stand_df['Comp'] = "EFL Championship"
belfirstdiva_squad_stand_df['Comp'] = "Belgian First Division A"
eredivisie_squad_stand_df['Comp'] = "Eredivisie"

#primeiraliga_squad_gk_df['Country'] = "Portugal"
#scottishprem_squad_gk_df['Country'] = "Scotland"
#championship_squad_gk_df['Country'] = "England"
#belfirstdiva_squad_gk_df['Country'] = "Belgium"
#eredivisie_squad_gk_df['Country'] = "Netherlands"

Using the pandas concatenate function "concat" to aggregate the standard stats for non-Big 5 Leagues.

In [21]:
# Aggregating standard stats dataframe for non-big 5
player_stand_df = pd.concat([primeiraliga_player_stand_df,scottishprem_player_stand_df,championship_player_stand_df,
                             belfirstdiva_player_stand_df,eredivisie_player_stand_df,big5_player_stand_df])

player_stand_df.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,MP,Starts,Min,...,Comp,xG,npxG,xA,npxG+xA,xG.1,xA.1,xG+xA,npxG.1,npxG+xA.1
0,1,Rodrigo Abascal,uy URU,DF,Boavista,28-029,1994,15,15,1307,...,Primeira Liga,,,,,,,,,
1,2,Giorgi Aburjania,ge GEO,MF,Gil Vicente FC,27-041,1995,16,3,371,...,Primeira Liga,,,,,,,,,
2,3,Antonio AdÃ¡n,es ESP,GK,Sporting CP,34-275,1987,22,22,1980,...,Primeira Liga,,,,,,,,,
3,4,JoÃ£o Afonso,pt POR,DF,Santa Clara,31-260,1990,12,10,938,...,Primeira Liga,,,,,,,,,
4,5,Lucas Ãfrico,br BRA,DF,Estoril,27-007,1995,11,11,967,...,Primeira Liga,,,,,,,,,


In [None]:
player_stand_df.columns

In [22]:
player_stand_df = player_stand_df[['Player','Nation', 'Pos', 'Squad','Comp', 'Age', 'Born','MP','Starts',
                                   'Min', '90s', 'Gls', 'Ast', 'G-PK', 'PK', 'PKatt', 'CrdY', 'CrdR','Gls.1','Ast.1', 'G+A',
                                   'G-PK.1', 'G+A-PK','xG', 'npxG', 'xA', 'npxG+xA', 'xG.1', 'xA.1','xG+xA', 'npxG.1',
                                   'npxG+xA.1']]

player_stand_df.head()

Unnamed: 0,Player,Nation,Pos,Squad,Comp,Age,Born,MP,Starts,Min,...,G+A-PK,xG,npxG,xA,npxG+xA,xG.1,xA.1,xG+xA,npxG.1,npxG+xA.1
0,Rodrigo Abascal,uy URU,DF,Boavista,Primeira Liga,28-029,1994,15,15,1307,...,0.0,,,,,,,,,
1,Giorgi Aburjania,ge GEO,MF,Gil Vicente FC,Primeira Liga,27-041,1995,16,3,371,...,0.73,,,,,,,,,
2,Antonio AdÃ¡n,es ESP,GK,Sporting CP,Primeira Liga,34-275,1987,22,22,1980,...,0.0,,,,,,,,,
3,JoÃ£o Afonso,pt POR,DF,Santa Clara,Primeira Liga,31-260,1990,12,10,938,...,0.0,,,,,,,,,
4,Lucas Ãfrico,br BRA,DF,Estoril,Primeira Liga,27-007,1995,11,11,967,...,0.0,,,,,,,,,


In [23]:
squad_stand_df = pd.concat([primeiraliga_squad_stand_df,scottishprem_squad_stand_df,championship_squad_stand_df,
                            belfirstdiva_squad_stand_df,eredivisie_squad_stand_df,big5_squad_stand_df])

squad_stand_df.head()

Unnamed: 0,Squad,# Pl,Age,Poss,MP,Starts,Min,90s,Gls,Ast,...,Rk,xG,npxG,xA,npxG+xA,xG.1,xA.1,xG+xA,npxG.1,npxG+xA.1
0,Arouca,32,27.6,48.8,21,231,1890,21.0,19,10,...,,,,,,,,,,
1,Belenenses,34,25.5,42.8,21,229,1890,21.0,12,7,...,,,,,,,,,,
2,Benfica,30,27.8,64.2,21,231,1890,21.0,55,45,...,,,,,,,,,,
3,Boavista,26,26.7,43.0,21,228,1890,21.0,23,15,...,,,,,,,,,,
4,Braga,33,25.9,54.8,21,231,1890,21.0,37,24,...,,,,,,,,,,


In [24]:
player_stand_df = player_stand_df.rename(columns={'Gls.1':'Gls90','Ast.1':'Ast90','G+A':'G+A90', 'G-PK.1':'G-PK90',
                                                  'G+A-PK':'G+A-PK90','xG.1':'xG90','xA.1':'xA90','xG+xA':'xG+xA90',
                                                  'npxG.1':'npxG90','npxG+xA.1':'npxG+xA90'})
player_stand_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5497 entries, 0 to 2845
Data columns (total 32 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Player     5497 non-null   object
 1   Nation     5491 non-null   object
 2   Pos        5496 non-null   object
 3   Squad      5497 non-null   object
 4   Comp       5497 non-null   object
 5   Age        5491 non-null   object
 6   Born       5491 non-null   object
 7   MP         5497 non-null   object
 8   Starts     5497 non-null   object
 9   Min        5497 non-null   object
 10  90s        5497 non-null   object
 11  Gls        5497 non-null   object
 12  Ast        5497 non-null   object
 13  G-PK       5497 non-null   object
 14  PK         5497 non-null   object
 15  PKatt      5497 non-null   object
 16  CrdY       5497 non-null   object
 17  CrdR       5497 non-null   object
 18  Gls90      5497 non-null   object
 19  Ast90      5497 non-null   object
 20  G+A90      5497 non-null   obj

Description of variables:
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
Born: The year the player was born.
MP: Total matches played by player.
Starts: Total starts for player.
Min: Total minutes played by player.
90s: Total minutes played by player per 90 minutes (Min/90).
Gls: Total goals scored by player.
Ast: Total assists by player.
G-PK: Total non-penalty kick goals by player.
PK: Goals by penalty kick by player.
PKatt: Penalty kick attempts by player.
CrdY: Number of total yellow cards by player.
CrdR: Number of total red cards by player.
Gls90: Goals scored by player per 90 minutes.
Ast90: Assists by player per 90 minutes.
G+A90: Goals scored plus assists by player per 90 minutes.
G-PK90: Total non-penalty kick goals by player per 90 minutes.
G+A-PK90: Goals scored plus assists minus goals scored by penalty kick by player per 90 minutes.
Matches: Link to individual match data for player.
Competition: Name of the league that the player plays in.
Country: Name of the country that the league is based in.

In [25]:
squad_stand_df = squad_stand_df.rename(columns={'Gls.1':'Gls90','Ast.1':'Ast90','G+A':'G+A90', 'G-PK.1':'G-PK90',
                                                'G+A-PK':'G+A-PK90','xG.1':'xG90','xA.1':'xA90','xG+xA':'xG+xA90',
                                                'npxG.1':'npxG90','npxG+xA.1':'npxG+xA90'})
squad_stand_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 188 entries, 0 to 97
Data columns (total 31 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Squad      188 non-null    object 
 1   # Pl       188 non-null    int64  
 2   Age        188 non-null    float64
 3   Poss       188 non-null    float64
 4   MP         188 non-null    int64  
 5   Starts     188 non-null    int64  
 6   Min        188 non-null    int64  
 7   90s        188 non-null    float64
 8   Gls        188 non-null    int64  
 9   Ast        188 non-null    int64  
 10  G-PK       188 non-null    int64  
 11  PK         188 non-null    int64  
 12  PKatt      188 non-null    int64  
 13  CrdY       188 non-null    int64  
 14  CrdR       188 non-null    int64  
 15  Gls90      188 non-null    float64
 16  Ast90      188 non-null    float64
 17  G+A90      188 non-null    float64
 18  G-PK90     188 non-null    float64
 19  G+A-PK90   188 non-null    float64
 20  Comp       

In [26]:
squad_stand_df = squad_stand_df[['Squad','# Pl','Age','Comp','MP','Starts','Min','Poss','90s', 'Gls', 'Ast',
                                  'G-PK', 'PK', 'PKatt', 'CrdY', 'CrdR','Gls90','Ast90', 'G+A90','G-PK', 'G+A-PK90','xG', 'npxG',
                                  'xA', 'npxG+xA', 'xG90', 'xA90','xG+xA90', 'npxG90','npxG+xA90']]

squad_stand_df.head()

Unnamed: 0,Squad,# Pl,Age,Comp,MP,Starts,Min,Poss,90s,Gls,...,G+A-PK90,xG,npxG,xA,npxG+xA,xG90,xA90,xG+xA90,npxG90,npxG+xA90
0,Arouca,32,27.6,Primeira Liga,21,231,1890,48.8,21.0,19,...,1.29,,,,,,,,,
1,Belenenses,34,25.5,Primeira Liga,21,229,1890,42.8,21.0,12,...,0.86,,,,,,,,,
2,Benfica,30,27.8,Primeira Liga,21,231,1890,64.2,21.0,55,...,4.67,,,,,,,,,
3,Boavista,26,26.7,Primeira Liga,21,228,1890,43.0,21.0,23,...,1.76,,,,,,,,,
4,Braga,33,25.9,Primeira Liga,21,231,1890,54.8,21.0,37,...,2.71,,,,,,,,,


Description of variables:
Squad: Name of squad/team.
#Pl : Number of players on squad.
Age: Average age of players on the squad.
Poss: Percentage of the squad having possession of the ball in a game, on average.
MP: Total matches played by squad.
Starts: Total starts for squad (irrelevant)
Min: Total minutes played by squad.
90s: Total minutes played by squad per 90 minutes (Min/90).
Gls: Total goals scored by squad.
Ast: Total assists by squad.
G-PK: Total non-penalty kick goals by squad.
PK: Goals by penalty kick by squad.
PKatt: Penalty kick attempts by squad.
CrdY: Number of total yellow cards by squad.
CrdR: Number of total red cards by squad.
Gls/90: Goals scored by squad per 90 minutes.
Ast/90: Assists by squad per 90 minutes.
G+A/90: Goals scored plus assists by squad per 90 minutes.
G-PK/90: Total non-penalty kick goals by squad per 90 minutes.
G+A-PK/90: Goals scored plus assists minus goals scored by penalty kick by squad per 90 minutes.
Competition: Name of the league that the squad plays in.
Country: Name of the country that the league is based in.


In [27]:
#Goalkeeper Stats: Player stats
primeiraliga_player_gk_df = pd.read_html(primeiraliga_player_gk_url, header=1)[0]
scottishprem_player_gk_df = pd.read_html(scottishprem_player_gk_url, header=1)[0]
championship_player_gk_df = pd.read_html(championship_player_gk_url, header=1)[0]
belfirstdiva_player_gk_df = pd.read_html(belfirstdiva_player_gk_url, header=1)[0]
eredivisie_player_gk_df = pd.read_html(eredivisie_player_gk_url, header=1)[0]
big5_player_gk_df = pd.read_html(big5_player_gk_url, header=1)[0]
big5_player_advgk_df = pd.read_html(big5_player_advgk_url, header=1)[0]

In [28]:
primeiraliga_squad_gk_df = pd.read_html(primeiraliga_squad_gk_url, header=1)[0]
scottishprem_squad_gk_df = pd.read_html(scottishprem_squad_gk_url, header=1)[0]
championship_squad_gk_df = pd.read_html(championship_squad_gk_url, header=1)[0]
belfirstdiva_squad_gk_df = pd.read_html(belfirstdiva_squad_gk_url, header=1)[0]
eredivisie_squad_gk_df = pd.read_html(belfirstdiva_squad_gk_url, header=1)[0]
big5_squad_gk_df = pd.read_html(big5_squad_gk_url, header=1)[0]
big5_squad_advgk_df = pd.read_html(big5_squad_advgk_url, header=1)[0]


In [29]:
primeiraliga_player_gk_df['Comp'] = "Primeira Liga"
scottishprem_player_gk_df['Comp'] = "Scottish Premiership"
championship_player_gk_df['Comp'] = "EFL Championship"
belfirstdiva_player_gk_df['Comp'] = "Belgian First Division A"
eredivisie_player_gk_df['Comp'] = "Eredivisie"

primeiraliga_squad_gk_df['Comp'] = "Primeira Liga"
scottishprem_squad_gk_df['Comp'] = "Scottish Premiership"
championship_squad_gk_df['Comp'] = "EFL Championship"
belfirstdiva_squad_gk_df['Comp'] = "Belgian First Division A"
eredivisie_squad_gk_df['Comp'] = "Eredivisie"


In [30]:
# Aggregating gk stats dataframe for non-big 5
player_gk_df = pd.concat([primeiraliga_player_gk_df,scottishprem_player_gk_df,championship_player_gk_df,
                          belfirstdiva_player_gk_df,eredivisie_player_gk_df,big5_player_gk_df,big5_player_advgk_df])

player_gk_df.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,MP,Starts,Min,...,AvgLen,Att.2,Launch%.1,AvgLen.1,Opp,Stp,Stp%,#OPA,#OPA/90,AvgDist
0,1,Antonio AdÃ¡n,es ESP,GK,Sporting CP,34-275,1987,22,22,1980,...,,,,,,,,,,
1,2,Andrew,br BRA,GK,Gil Vicente FC,20-226,2001,2,2,180,...,,,,,,,,,,
2,3,Brian AraÃºjo,fr FRA,GK,Gil Vicente FC,21-289,2000,1,1,90,...,,,,,,,,,,
3,4,Fernando Augusto,br BRA,GK,Arouca,24-319,1997,7,6,619,...,,,,,,,,,,
4,5,Alireza Beiranvand,ir IRN,GK,Boavista,29-144,1992,8,8,675,...,,,,,,,,,,


In [None]:
player_gk_df.tail()

In [31]:
# Aggregating gk stats dataframe for non-big 5
squad_gk_df = pd.concat([primeiraliga_squad_gk_df,scottishprem_squad_gk_df,championship_squad_gk_df,belfirstdiva_squad_gk_df,
                         eredivisie_squad_gk_df,big5_squad_gk_df,big5_squad_advgk_df])

squad_gk_df.head()

Unnamed: 0,Squad,# Pl,MP,Starts,Min,90s,GA,GA90,SoTA,Saves,...,AvgLen,Att.2,Launch%.1,AvgLen.1,Opp,Stp,Stp%,#OPA,#OPA/90,AvgDist
0,Arouca,3,21.0,21.0,1886.0,21.0,38,1.81,90.0,56.0,...,,,,,,,,,,
1,Belenenses,2,21.0,21.0,1890.0,21.0,41,1.95,108.0,72.0,...,,,,,,,,,,
2,Benfica,2,21.0,21.0,1890.0,21.0,19,0.9,62.0,45.0,...,,,,,,,,,,
3,Boavista,2,21.0,21.0,1890.0,21.0,33,1.57,88.0,57.0,...,,,,,,,,,,
4,Braga,1,21.0,21.0,1890.0,21.0,22,1.05,65.0,44.0,...,,,,,,,,,,


In [32]:
player_gk_df = player_gk_df.rename(columns={'Save%.1':'Save%_PK','/90':'PSxG-GA90','Cmp%':'Cmp_pct', 'Launch%':'Launch_pct_pass',
                                            'Att.1':'Att_pass','Launch%.1':'GK_launch_pct','Att.2':'GKA',
                                            'Launch%.1':'GK_launch_pct', 'AvgLen.1':'GK_Avglen','Opp':'Cross_opp',
                                            'Stp':'Cross_opp_stop', 'Stp%':'Cross_opp_stop_pct','#OPA':'Def_OPA',
                                            '#OPA/90':'Def_OPA90','AvgDist':'Def_OPA_AvgDist','On-Off.1':'xGOn-Off90'})


In [33]:
player_gk_df = player_gk_df[['Player','Nation','Pos','Squad','Age','Born','Comp','MP','Starts','Min','90s',
                             'GA','GA90','SoTA','Saves','Save%','W','D','L','CS','CS%','PKA','PKsv','PKm','Save%_PK','FK', 
                             'CK', 'OG', 'PSxG', 'PSxG/SoT','PSxG+/-','PSxG-GA90','Cmp','Att','Cmp_pct','Att_pass','Thr',
                             'Launch_pct_pass', 'AvgLen', 'GKA', 'GK_launch_pct', 'GK_Avglen','Cross_opp','Cross_opp_stop',
                             'Cross_opp_stop_pct', 'Def_OPA','Def_OPA90','Def_OPA_AvgDist']]

player_gk_df.head()

Unnamed: 0,Player,Nation,Pos,Squad,Age,Born,Comp,MP,Starts,Min,...,AvgLen,GKA,GK_launch_pct,GK_Avglen,Cross_opp,Cross_opp_stop,Cross_opp_stop_pct,Def_OPA,Def_OPA90,Def_OPA_AvgDist
0,Antonio AdÃ¡n,es ESP,GK,Sporting CP,34-275,1987,Primeira Liga,22,22,1980,...,,,,,,,,,,
1,Andrew,br BRA,GK,Gil Vicente FC,20-226,2001,Primeira Liga,2,2,180,...,,,,,,,,,,
2,Brian AraÃºjo,fr FRA,GK,Gil Vicente FC,21-289,2000,Primeira Liga,1,1,90,...,,,,,,,,,,
3,Fernando Augusto,br BRA,GK,Arouca,24-319,1997,Primeira Liga,7,6,619,...,,,,,,,,,,
4,Alireza Beiranvand,ir IRN,GK,Boavista,29-144,1992,Primeira Liga,8,8,675,...,,,,,,,,,,


In [None]:
player_gk_df.columns

Description of variables:
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
Born: The year the player was born.
MP: Total matches played by player.
Starts: Total starts for player.
Min: Total minutes played by player.
90s: Total minutes played by player per 90 minutes (Min/90).
GA: Goals against or conceded by player.
GA90: Goals against or conceded by player per 90 minutes.
SoTA: Shots on target against the player.
Saves: Shots that were saved by player.
Save%: Percentage of all shots on target that were saved by player.
W: Wins by the squad when player was goalkeeping.
D: Draws by the squad when player was goalkeeping.
L: Losses by the squad when player was goalkeeping.
CS: Total number of "clean sheets" - not conceding a goal in a game.
CS%: Percentage of games in which the player had a clean sheet.
PKatt: Penalty kick attempts against player.
PKA: Penalty kick goals conceded against player.
PKsv: Penalty kick saves made by player.
PKm: Penalty kicks against player that were missed or off-target from goal.
Save%_PK: Percentage of penalty kicks that were saved by player.
Matches: Link to individual match data for player.
Competition: Name of the league that the player plays in.
Country: Name of the country that the league is based in.


In [34]:
#Shooting Stats:
primeiraliga_player_shoot_df = pd.read_html(primeiraliga_player_shoot_url, header=1)[0]
scottishprem_player_shoot_df = pd.read_html(scottishprem_player_shoot_url, header=1)[0]
championship_player_shoot_df = pd.read_html(championship_player_shoot_url, header=1)[0]
belfirstdiva_player_shoot_df = pd.read_html(belfirstdiva_player_shoot_url, header=1)[0]
eredivisie_player_shoot_df = pd.read_html(eredivisie_player_shoot_url, header=1)[0]
big5_player_shoot_df = pd.read_html(big5_player_shoot_url, header=1)[0]

primeiraliga_squad_shoot_df = pd.read_html(primeiraliga_squad_shoot_url, header=1)[0]
scottishprem_squad_shoot_df = pd.read_html(scottishprem_squad_shoot_url, header=1)[0]
championship_squad_shoot_df = pd.read_html(championship_squad_shoot_url, header=1)[0]
belfirstdiva_squad_shoot_df = pd.read_html(belfirstdiva_squad_shoot_url, header=1)[0]
eredivisie_squad_shoot_df = pd.read_html(eredivisie_squad_shoot_url, header=1)[0]
big5_squad_shoot_df = pd.read_html(big5_squad_shoot_url, header=1)[0]


In [35]:
primeiraliga_player_shoot_df['Comp'] = "Primeira Liga"
scottishprem_player_shoot_df['Comp'] = "Scottish Premiership"
championship_player_shoot_df['Comp'] = "EFL Championship"
belfirstdiva_player_shoot_df['Comp'] = "Belgian First Division A"
eredivisie_player_shoot_df['Comp'] = "Eredivisie"

primeiraliga_squad_shoot_df['Comp'] = "Primeira Liga"
scottishprem_squad_shoot_df['Comp'] = "Scottish Premiership"
championship_squad_shoot_df['Comp'] = "EFL Championship"
belfirstdiva_squad_shoot_df['Comp'] = "Belgian First Division A"
eredivisie_squad_shoot_df['Comp'] = "Eredivisie"


In [36]:
# Aggregating shooting stats dataframe for non-big 5
player_shoot_df = pd.concat([primeiraliga_player_shoot_df,scottishprem_player_shoot_df,championship_player_shoot_df,
                             belfirstdiva_player_shoot_df,eredivisie_player_shoot_df,big5_player_shoot_df])

player_shoot_df.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,Gls,Sh,...,PK,PKatt,Matches,Comp,FK,xG,npxG,npxG/Sh,G-xG,np:G-xG
0,1,Rodrigo Abascal,uy URU,DF,Boavista,28-029,1994,14.5,0,6,...,0,0,Matches,Primeira Liga,,,,,,
1,2,Giorgi Aburjania,ge GEO,MF,Gil Vicente FC,27-041,1995,4.1,2,7,...,0,0,Matches,Primeira Liga,,,,,,
2,3,Antonio AdÃ¡n,es ESP,GK,Sporting CP,34-275,1987,22.0,0,0,...,0,0,Matches,Primeira Liga,,,,,,
3,4,JoÃ£o Afonso,pt POR,DF,Santa Clara,31-260,1990,10.4,0,4,...,0,0,Matches,Primeira Liga,,,,,,
4,5,Lucas Ãfrico,br BRA,DF,Estoril,27-007,1995,10.7,0,5,...,0,0,Matches,Primeira Liga,,,,,,


Description of variables:
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Position: Position of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
Born: The year the player was born.
90s: Total minutes played by player per 90 minutes (Min/90).
Gls: 
Sh: Total number of shots taken by player.
SoT: Total number of shots taken by player that were on target.
SoT%: The percentage of total number of shots that were on target.
Sh/90: Shots taken per 90 minutes.
SoT/90: Shots that were on target per 90 minutes.
G/Sh: Goals scored per shot taken.
G/SoT: Goals scored per shot on target.
Dist: Average distance per shot.
PK: Goals scored by penalty kick.
PKatt: Total number of penalty kick attempts.
Matches: Link to individual match data for player.

In [37]:
# Aggregating shooting stats dataframe for non-big 5
squad_shoot_df = pd.concat([primeiraliga_squad_shoot_df,scottishprem_squad_shoot_df,championship_squad_shoot_df,
                            belfirstdiva_squad_shoot_df,eredivisie_squad_shoot_df,big5_squad_shoot_df])

squad_shoot_df.head()

Unnamed: 0,Squad,# Pl,90s,Gls,Sh,SoT,SoT%,Sh/90,SoT/90,G/Sh,...,PK,PKatt,Comp,Rk,FK,xG,npxG,npxG/Sh,G-xG,np:G-xG
0,Arouca,32,21.0,19,240,68,28.3,11.43,3.24,0.07,...,2,3,Primeira Liga,,,,,,,
1,Belenenses,34,21.0,12,188,60,31.9,8.95,2.86,0.06,...,1,1,Primeira Liga,,,,,,,
2,Benfica,30,21.0,55,316,133,42.1,15.05,6.33,0.17,...,2,2,Primeira Liga,,,,,,,
3,Boavista,26,21.0,23,232,83,35.8,11.05,3.95,0.09,...,1,1,Primeira Liga,,,,,,,
4,Braga,33,21.0,37,285,108,37.9,13.57,5.14,0.12,...,4,5,Primeira Liga,,,,,,,


Description of variables:
Squad: Name of squad/team.
#Pl : Number of players on squad.
90s: Total minutes played by squad per 90 minutes (Min/90).
Gls: Goals scored.
Sh: Total shots taken by squad.
SoT: Shots that were on target by squad.
SoT%: Percentage of total shots that were on target.
Sh/90: Shots by squad per 90 minutes.
SoT/90: Shots that were on target by squad per 90 minutes.
G/Sh: Goals scored by squad per shot.
G/SoT: Goals scored by squad per shot on target.
PK: Goals scored by penalty kick.
PKatt: Total number of penalty kick attempts.

In [38]:
#Playing Time Stats:
primeiraliga_player_pt_df = pd.read_html(primeiraliga_player_pt_url, header=1)[0]
scottishprem_player_pt_df = pd.read_html(scottishprem_player_pt_url, header=1)[0]
championship_player_pt_df = pd.read_html(championship_player_pt_url, header=1)[0]
belfirstdiva_player_pt_df = pd.read_html(belfirstdiva_player_pt_url, header=1)[0]
eredivisie_player_pt_df = pd.read_html(eredivisie_player_pt_url, header=1)[0]
big5_player_pt_df = pd.read_html(big5_player_pt_url, header=1)[0]

primeiraliga_squad_pt_df = pd.read_html(primeiraliga_squad_pt_url, header=1)[0]
scottishprem_squad_pt_df = pd.read_html(scottishprem_squad_pt_url, header=1)[0]
championship_squad_pt_df = pd.read_html(championship_squad_pt_url, header=1)[0]
belfirstdiva_squad_pt_df = pd.read_html(belfirstdiva_squad_pt_url, header=1)[0]
eredivisie_squad_pt_df = pd.read_html(eredivisie_squad_pt_url, header=1)[0]
big5_squad_pt_df = pd.read_html(big5_squad_pt_url, header=1)[0]


In [39]:
primeiraliga_player_pt_df['Comp'] = "Primeira Liga"
scottishprem_player_pt_df['Comp'] = "Scottish Premiership"
championship_player_pt_df['Comp'] = "EFL Championship"
belfirstdiva_player_pt_df['Comp'] = "Belgian First Division A"
eredivisie_player_pt_df['Comp'] = "Eredivisie"

primeiraliga_squad_pt_df['Comp'] = "Primeira Liga"
scottishprem_squad_pt_df['Comp'] = "Scottish Premiership"
championship_squad_pt_df['Comp'] = "EFL Championship"
belfirstdiva_squad_pt_df['Comp'] = "Belgian First Division A"
eredivisie_squad_pt_df['Comp'] = "Eredivisie"

In [40]:
# Aggregating Playing Time stats dataframe for non-big 5
player_pt_df = pd.concat([primeiraliga_player_pt_df,scottishprem_player_pt_df,championship_player_pt_df,
                          belfirstdiva_player_pt_df,eredivisie_player_pt_df,big5_player_pt_df])

player_pt_df.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,MP,Min,Mn/MP,...,+/-,+/-90,On-Off,Matches,Comp,onxG,onxGA,xG+/-,xG+/-90,On-Off.1
0,1,Rodrigo Abascal,uy URU,DF,Boavista,28-029,1994,15,1307.0,87.0,...,-8.0,-0.55,-0.4,Matches,Primeira Liga,,,,,
1,2,Giorgi Aburjania,ge GEO,MF,Gil Vicente FC,27-041,1995,16,371.0,23.0,...,7.0,1.7,1.64,Matches,Primeira Liga,,,,,
2,3,Mohamed Achouri,fr FRA,FW,Estoril,23-002,1999,0,,,...,,,,Matches,Primeira Liga,,,,,
3,4,Antonio AdÃ¡n,es ESP,GK,Sporting CP,34-275,1987,22,1980.0,90.0,...,28.0,1.27,,Matches,Primeira Liga,,,,,
4,5,Emmanuel Adeyemo,ng NGA,MF,FC Vizela,19-267,2002,0,,,...,,,,Matches,Primeira Liga,,,,,


In [None]:
player_pt_df.columns

In [41]:
player_pt_df = player_pt_df.rename(columns={'On-Off.1':'xGOn-Off90'})

Description of variables:
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Position: Position of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
Born: The year the player was born.
MP: Matches played.
Min: Minutes played.
Mn/MP: Minutes per matches played.
Min%: Percentage of overall minutes played.
90s: Total minutes played by player per 90 minutes (Min/90).
Starts: Total starts for player.
Mn/Start: Minutes per start.
Compl: Number of matches a player plays all minutes.
Subs: Total matches a player came on as a substitute.
Mn/Sub: Minutes per match as a substitute.
unSub: Number of matches not started but not coming on as a substitute.
PPM: Points per match. Average number of points earned by the team from matches in which the player appeared
Minimum 30 minutes played per squad game to qualify as a leader.
onG: Goals scored by squad when player was on pitch.
onGA: Goals conceded by squad when plyaer was on pitch.
+/-: Goals scored minus goals allowed by the team while the player was on the pitch.
+/-90: Goals scored minus goals allowed by the team while the player was on the pitch per 90 minutes played.
Minimum 30 minutes played per squad game to qualify as a leader.
On-Off: Net goals per 90 minutes by the team while the player was on the pitch minus net goals allowed per 90 minutes by the team while the player was off the pitch.
xGOn-Off90:
Matches: Link to individual match data for player.

In [42]:
squad_pt_df = pd.concat([primeiraliga_squad_pt_df,scottishprem_squad_pt_df,championship_squad_pt_df,
                         belfirstdiva_squad_pt_df,eredivisie_squad_pt_df,big5_squad_pt_df])

squad_pt_df.head()

Unnamed: 0,Squad,# Pl,Age,MP,Min,Mn/MP,Min%,90s,Starts,Mn/Start,...,onG,onGA,+/-,+/-90,Comp,Rk,onxG,onxGA,xG+/-,xG+/-90
0,Arouca,32,27.6,21,1890,90,100,21.0,231,80,...,19,38,-19,-0.9,Primeira Liga,,,,,
1,Belenenses,34,25.5,21,1890,90,100,21.0,229,80,...,13,41,-28,-1.33,Primeira Liga,,,,,
2,Benfica,30,27.8,21,1890,90,100,21.0,231,79,...,56,19,37,1.76,Primeira Liga,,,,,
3,Boavista,26,26.7,21,1890,90,100,21.0,228,83,...,24,33,-9,-0.43,Primeira Liga,,,,,
4,Braga,33,25.9,21,1890,90,100,21.0,231,79,...,37,22,15,0.71,Primeira Liga,,,,,


In [None]:
squad_pt_df.columns

In [43]:
#Miscellaneous Stats: squad stats
primeiraliga_player_misc_df = pd.read_html(primeiraliga_player_misc_url, header=1)[0]
scottishprem_player_misc_df = pd.read_html(scottishprem_player_misc_url, header=1)[0]
championship_player_misc_df = pd.read_html(championship_player_misc_url, header=1)[0]
belfirstdiva_player_misc_df = pd.read_html(belfirstdiva_player_misc_url, header=1)[0]
eredivisie_player_misc_df = pd.read_html(eredivisie_player_misc_url, header=1)[0]
big5_player_misc_df = pd.read_html(big5_player_misc_url, header=1)[0]

primeiraliga_squad_misc_df = pd.read_html(primeiraliga_squad_misc_url, header=1)[0]
scottishprem_squad_misc_df = pd.read_html(scottishprem_squad_misc_url, header=1)[0]
championship_squad_misc_df = pd.read_html(championship_squad_misc_url, header=1)[0]
belfirstdiva_squad_misc_df = pd.read_html(belfirstdiva_squad_misc_url, header=1)[0]
eredivisie_squad_misc_df = pd.read_html(eredivisie_squad_misc_url, header=1)[0]
big5_squad_misc_df = pd.read_html(big5_squad_misc_url, header=1)[0]

In [44]:
primeiraliga_player_misc_df['Comp'] = "Primeira Liga"
scottishprem_player_misc_df['Comp'] = "Scottish Premiership"
championship_player_misc_df['Comp'] = "EFL Championship"
belfirstdiva_player_misc_df['Comp'] = "Belgian First Division A"
eredivisie_player_misc_df['Comp'] = "Eredivisie"

primeiraliga_squad_misc_df['Comp'] = "Primeira Liga"
scottishprem_squad_misc_df['Comp'] = "Scottish Premiership"
championship_squad_misc_df['Comp'] = "EFL Championship"
belfirstdiva_squad_misc_df['Comp'] = "Belgian First Division A"
eredivisie_squad_misc_df['Comp'] = "Eredivisie"

In [45]:
# Aggregating Miscellaneous stats dataframe for non-big 5
player_misc_df = pd.concat([primeiraliga_player_misc_df,scottishprem_player_misc_df,championship_player_misc_df,
                            belfirstdiva_player_misc_df,eredivisie_player_misc_df,big5_player_misc_df])

player_misc_df.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Age,Born,90s,CrdY,CrdR,...,TklW,PKwon,PKcon,OG,Matches,Comp,Recov,Won,Lost,Won%
0,1,Rodrigo Abascal,uy URU,DF,Boavista,28-029,1994,14.5,5,1,...,20,,,0,Matches,Primeira Liga,,,,
1,2,Giorgi Aburjania,ge GEO,MF,Gil Vicente FC,27-041,1995,4.1,4,0,...,6,,,0,Matches,Primeira Liga,,,,
2,3,Antonio AdÃ¡n,es ESP,GK,Sporting CP,34-275,1987,22.0,2,0,...,0,,,0,Matches,Primeira Liga,,,,
3,4,JoÃ£o Afonso,pt POR,DF,Santa Clara,31-260,1990,10.4,0,0,...,6,,,0,Matches,Primeira Liga,,,,
4,5,Lucas Ãfrico,br BRA,DF,Estoril,27-007,1995,10.7,3,0,...,4,,,0,Matches,Primeira Liga,,,,


In [46]:
# Aggregating Miscellaneous stats dataframe for non-big 5
squad_misc_df = pd.concat([primeiraliga_squad_misc_df,scottishprem_squad_misc_df,championship_squad_misc_df,
                           belfirstdiva_squad_misc_df,eredivisie_squad_misc_df,big5_squad_misc_df])

squad_misc_df.head()

Unnamed: 0,Squad,# Pl,90s,CrdY,CrdR,2CrdY,Fls,Fld,Off,Crs,...,TklW,PKwon,PKcon,OG,Comp,Rk,Recov,Won,Lost,Won%
0,Arouca,32,21.0,64,5,3,344,327,31,384,...,203,,,2,Primeira Liga,,,,,
1,Belenenses,34,21.0,70,7,3,350,307,39,283,...,165,,,1,Primeira Liga,,,,,
2,Benfica,30,21.0,44,2,1,296,268,53,439,...,207,,,1,Primeira Liga,,,,,
3,Boavista,26,21.0,71,5,3,321,294,34,310,...,205,,,0,Primeira Liga,,,,,
4,Braga,33,21.0,54,2,1,299,273,30,321,...,189,,,0,Primeira Liga,,,,,


In [47]:
big5_player_pass_df = pd.read_html(big5_player_pass_url, header=1)[0]
big5_player_ptype_df = pd.read_html(big5_player_ptype_url, header=1)[0]
big5_player_gca_df = pd.read_html(big5_player_gca_url, header=1)[0]
big5_player_def_df = pd.read_html(big5_player_def_url, header=1)[0]
big5_player_poss_df = pd.read_html(big5_player_poss_url, header=1)[0]


In [48]:
big5_player_pass_df = big5_player_pass_df.rename(columns={'Cmp.1':'Cmp_short', 'Att.1':'Att_short','Cmp%.1':'Cmp%_short',
                                                          'Cmp.2':'Cmp_med', 'Att.2':'Att_med','Cmp%.2':'Cmp%_med',
                                                          'Cmp.3':'Cmp_long', 'Att.3':'Att_long','Cmp%.3':'Cmp%_long','Cmp%':'Cmp%_pass',
                                                          'Cmp':'Cmp_pass','Att':'Att_pass','TotDist':'TotDist_pass','PrgDist':'PrgDist_pass',
                                                          '1/3':'1/3_pass','Prog':'Prog_pass'})
big5_player_ptype_df = big5_player_ptype_df.rename(columns={'In':'CK_In','Out':'CK_Out','Str':'CK_Str','Live':'Liveball_pass',
                                                            'Dead':'Deadball_pass','FK':'FK_pass','Press':'UnderPress_pass','Sw':'40yds<_width_pass',
                                                            'Int':'Passes_int','Blocks':'Passes_blocked'})                                                            
big5_player_gca_df = big5_player_gca_df.rename(columns={'PassLive':'SCA_PassLive','PassDead':'SCA_PassDead',
                                                        'Drib':'SCA_Drib','Sh':'SCA_Sh','Fld':'SCA_Fld','Def':'SCA_Def',
                                                        'PassLive.1':'GCA_PassLive','PassDead.1':'GCA_PassDead',
                                                        'Drib.1':'GCA_Drib','Sh.1':'GCA_Sh','Fld.1':'GCA_Fld','Def.1':'GCA_Def'})
big5_player_poss_df = big5_player_poss_df.rename(columns={'Def Pen':'Def Pen_touch','Def 3rd_x':'Def 3rd_touch','Mid 3rd_x':'Mid 3rd_touch',
                                                          'Att 3rd_x':'Att 3rd_touch','Att Pen':'Att Pen_touch','Live_y':'Live_touch',
                                                          'Succ':'Succ_drib','Att':'Att_drib','Succ%':'Succ%_drib','Prog.1':'Prog_pass_rec',
                                                          'TotDist':'TotDist_carr','PrgDist':'PrgDist_carr','Prog':'Prog_carr','1/3':'1/3_carr'})
big5_player_def_df = big5_player_def_df.rename(columns={'Tkl.1':'Tkl_v_drib','Att':'Tkl_v_drib_Att','Blocks':'Blocks','%':'Succ%_press','Def 3rd':'Def 3rd_Tkl',
                                                        'Mid 3rd':'Mid 3rd_Tkl','Att 3rd':'Att 3rd_Tkl','Succ':'Succ_press','Def 3rd.1':'Def 3rd_press',
                                                        'Mid 3rd.1':'Mid 3rd_press','Att 3rd.1':'Att 3rd_press','Sh':'Blocked_shots','ShSv':'Blocked_SoT'})

Description of variables (Passing):
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Position: Position of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
Cmp: Number of passes completed.
Att: Number of passes attempted.
Cmp%: Percentage of passes completed.
TotDist: Total distance, in yards, that passes have traveled.
PrgDist: Total distance, in yards, that completed passes have traveled towards the opponent's goal. Note: Passes away from opponent's goal are counted as zero progressive yards.
Cmp_short: Passes completed between 5 and 15 yards.
Att_short: Passes attempted between 5 and 15 yards.
Cmp%_short: Percentage of passes completed between 5 and 15 yards.
Cmp_med: Passes completed between 15 and 30 yards.
Att_med: Passes attempted between 15 and 30 yards.
Cmp%_med: Percentage of passes completed between 15 and 30 yards.
Cmp_long: Passes completed longer than 30 yards.
Att_long: Passes attempted longer than 30 yards.
Cmp%_long: Percentage of passes completed longer than 30 yards.
Ast: Number of assists.
xA: Expected assists.
A-xA: Assists minus expected assists.
KP: Key passes that directly lead to a shot (assisted shots).
1/3: Completed passes that enter the 1/3 of the pitch closest to the goal, excluding set pieces.
PPA: Completed passes into the 18-yard box (penalty area), excluding set pieces.
Prog: Progressive passes. Completed passes that move the ball towards the opponent's goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area. Excludes passes from the defending 40% of the pitch.

Description of variables (Pass types):
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Position: Position of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
90s: Total minutes played by player per 90 minutes (Min/90).
Att: Number of passes attempted.
Liveball_pass: Live-ball passes.
Deadball_pass: Dead-ball passes, includes free kicks, corner kicks, kick offs, throw-ins and goal kicks.
FK_pass: Passes attempted from free kicks.
TB: Completed pass sent between back defenders into open space.
UnderPress_pass: Passes made while under pressure from opponent.
40yds<_width_pass: Passes that travel more than 40 yards of the width of the pitch.
Crs: Crosses into the penalty area.
CK: Corner kicks.
CK_In: Inswinging corner kicks.
CK_Out: Outswinging corner kicks.
CK_Str: Straight corner kicks.
Ground: Ground passes.
Low: Passes that leave the ground, but stay below shoulder-level.
High: Passes that are above shoulder-level at the peak height.
Left: Passes attempted using left foot.
Right: Passes attempted using right foot.
Head: Passes attempted using head.
TI: Throw-ins taken.
Other: Passes attempted using body parts other than the player's head or feet.
Cmp: Passes completed.
Off: Passes that resulted in an offsides.
Out: Passes that went out of bounds.
Int: Passes that were intercepted.
Blocks: Passes that were blocked by the opponent standing in the path.


Description of variables (Goal and Shot Creation Actions (GCA/SCA)):
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Position: Position of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
90s: Total minutes played by player per 90 minutes (Min/90).
SCA: Shot creation actions. The two offensive actions directly leading to a shot, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit.
SCA90: Shot-creating actions per 90 minutes.
SCA_PassLive: Completed live-ball passes that lead to a shot attempt
SCA_PassDead: Completed dead-ball passes that lead to a shot attempt.
Includes free kicks, corner kicks, kick offs, throw-ins and goal kicks.
SCA_Drib: Successful dribbles that lead to a shot attempt.
SCA_Sh: Shots that lead to another shot attempt.
SCA_Fld: Fouls drawn that lead to a shot attempt.
SCA_Def: Defensive actions that lead to a shot attempt.
GCA: Goal-creating actions. The two offensive actions directly leading to a goal, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit.
GCA90: Goal-creating actions per 90 minutes.
GCA_PassLive: Completed live-ball passes that lead to a goal.
GCA_PassDead: Completed dead-ball passes that lead to a goal. Includes free kicks, corner kicks, kick offs, throw-ins and goal kicks.
GCA_Drib: Successful dribbles that lead to a goal.
GCA_Sh: Shots that lead to another goal-scoring shot.
GCA_Fld: Fouls drawn that lead to a goal.
GCA_Def: Defensive actions that lead to a goal.

Description of variables (Possession):
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Position: Position of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
90s: Total minutes played by player per 90 minutes (Min/90).
Touches: Number of times a player touched the ball. Note: Receiving a pass, then dribbling, then sending a pass counts as one touch.
Def Pen_touch: Touches in defensive penalty area.
Def 3rd_touch: Touches in defensive third.
Mid 3rd_touch: Touches in middle third.
Att 3rd_touch: Touches in attacking third.
Att Pen_touch: Touches in attacking penalty area.
Live_touch: Live ball touches.
Succ_drib: Dribbles completed successfully.
Att_drib: Dribbles attempted.
Succ%_drib: Percentage of dribbles completed successfully.
'#Pl: Number of opposing players dribbled past.
Megs: Nutmegs. Number of times a player dribbled the ball through an opposing player's legs.
Carries: Number of times the player controlled the ball with their feet.
TotDist_carr: Total distance, in yards, a player moved the ball while controlling it with their feet, in any direction.
PrgDist_carr: Total distance, in yards, a player moved the ball while controlling it with their feet towards the opponent's goal.
Prog_carr: Carries that move the ball towards the opponent's goal at least 5 yards, or any carry into the penalty area. Excludes carries from the defending 40% of the pitch.
1/3_carr: Carries that enter the 1/3 of the pitch closest to the goal.
CPA: Carries into the 18-yard box (penalty area).
Mis: Number of times a player failed when attempting to gain control of a ball.
Dis: Number of times a player loses control of the ball after being tackled by an opposing player. Does not include attempted dribbles.
Targ: Number of times a player was the target of an attempted pass.
Rec: Number of times a player successfully received a pass.
Rec%: Passes received percentage. Percentage of time a player successfully received a pass.
Prog: Progressive passes received. Completed passes that move the ball towards the opponent's goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area. Excludes passes from the defending 40% of the pitch.


Description of variables (Possession):
Rank: Rank of players based on statistical category (This is a variable that is for the web interaction. Not really an essential variable).
Player: Name of player.
Nation: Nationality of player.
Position: Position of player.
Squad: The squad/team the player plays on.
Age: Age of the player in years and days. This is a string variable. To use this variable in analysis, suggest deleting the hyphen and the number of days. Once, those items are deleted, convert this variable into an integer. We do these steps further down in the code.
90s: Total minutes played by player per 90 minutes (Min/90).
Tkl: Number of players tackled.
TklW: Tackles in which the tackler's team won possession of the ball.
Def 3rd_Tkl: Tackles in defensive third.
Mid 3rd_Tkl: Tackles in middle third.
Att 3rd_Tkl: Tackles in attacking third.
Tkl_v_drib: Number of dribblers tackled.
Tkl_v_drib_Att: Number of times dribbled past plus number of tackles.
Tkl%: Percentage of dribblers tackled. Dribblers tackled divided by dribblers tackled plus times dribbled past.
Past: Number of times dribbled past by an opposing player.
Press: Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball.
Succ_press: Number of times the squad gained possession withing five seconds of applying pressure.
Succ%_press: Successful pressure percentage. Percentage of time the squad gained possession withing five seconds of applying pressure.
Def 3rd_press: Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the defensive third.
Mid 3rd_press: Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the middle third.
Blocks: Number of times blocking the ball by standing in its path.
Blocked_shots: Number of times blocking a shot by standing in its path.
Blocked_SoT: Number of times blocking a shot that was on target, by standing in its path.
Pass: Number of times blocking a pass by standing in its path.
Int: Interceptions.
Tkl+Int: Number of players tackled plus number of interceptions.
Clr: Clearances.
Err: Mistakes leading to an opponent's shot.


Changing data types from string to integers and floats. Before we do the data type transformation, we have to cleanse the data. First, we have to delete the rows that have "Player" as values.

In [49]:
player_stand_df = player_stand_df[player_stand_df['Player']!="Player"]
player_shoot_df = player_shoot_df[player_shoot_df['Player']!="Player"]
player_pt_df = player_pt_df[player_pt_df['Player']!="Player"]
player_misc_df = player_misc_df[player_misc_df['Player']!="Player"]
player_gk_df = player_gk_df[player_gk_df['Player']!="Player"]
big5_player_pass_df = big5_player_pass_df[big5_player_pass_df['Player']!="Player"]
big5_player_ptype_df = big5_player_ptype_df[big5_player_ptype_df['Player']!="Player"]
big5_player_gca_df = big5_player_gca_df[big5_player_gca_df['Player']!="Player"]
big5_player_def_df = big5_player_def_df[big5_player_def_df['Player']!="Player"]
big5_player_poss_df = big5_player_poss_df[big5_player_poss_df['Player']!="Player"]
big5_player_advgk_df = big5_player_advgk_df[big5_player_advgk_df['Player']!="Player"]


In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

The data in the "Age" column is in strings, which makes it useless while we're conducting player analysis. We're going to split the Age into two columns: "Age_yrs", "Age_days". We're going to use an lambda function to do this.

In [54]:
player_stand_df[['Age_yrs','Age_days']] = player_stand_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
player_shoot_df[['Age_yrs','Age_days']] = player_shoot_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
player_pt_df[['Age_yrs','Age_days']] = player_pt_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
player_misc_df[['Age_yrs','Age_days']] = player_misc_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
player_gk_df[['Age_yrs','Age_days']] = player_gk_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
big5_player_pass_df[['Age_yrs','Age_days']] = big5_player_pass_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
big5_player_ptype_df[['Age_yrs','Age_days']] = big5_player_ptype_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
big5_player_gca_df[['Age_yrs','Age_days']] = big5_player_gca_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
big5_player_def_df[['Age_yrs','Age_days']] = big5_player_def_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
big5_player_poss_df[['Age_yrs','Age_days']] = big5_player_poss_df.Age.apply(lambda x: pd.Series(str(x).split("-")))
big5_player_advgk_df[['Age_yrs','Age_days']] = big5_player_advgk_df.Age.apply(lambda x: pd.Series(str(x).split("-")))

Next, we fill the not-a-number (NaN) values as zeros.

In [59]:
player_stand_df = player_stand_df.fillna(0)
player_shoot_df = player_shoot_df.fillna(0)
player_pt_df = player_pt_df.fillna(0)
player_misc_df = player_misc_df.fillna(0)
player_gk_df = player_gk_df.fillna(0)
big5_player_pass_df = big5_player_pass_df.fillna(0)
big5_player_ptype_df = big5_player_ptype_df.fillna(0)
big5_player_gca_df = big5_player_gca_df.fillna(0)
big5_player_def_df = big5_player_def_df.fillna(0)
big5_player_poss_df = big5_player_poss_df.fillna(0)
big5_player_advgk_df = big5_player_advgk_df.fillna(0)

In [56]:
player_stand_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5288 entries, 0 to 2845
Data columns (total 34 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Player     5288 non-null   object
 1   Nation     5288 non-null   object
 2   Pos        5288 non-null   object
 3   Squad      5288 non-null   object
 4   Comp       5288 non-null   object
 5   Age        5288 non-null   object
 6   Born       5288 non-null   object
 7   MP         5288 non-null   object
 8   Starts     5288 non-null   object
 9   Min        5288 non-null   object
 10  90s        5288 non-null   object
 11  Gls        5288 non-null   object
 12  Ast        5288 non-null   object
 13  G-PK       5288 non-null   object
 14  PK         5288 non-null   object
 15  PKatt      5288 non-null   object
 16  CrdY       5288 non-null   object
 17  CrdR       5288 non-null   object
 18  Gls90      5288 non-null   object
 19  Ast90      5288 non-null   object
 20  G+A90      5288 non-null   obj

In [None]:
#player_stand_df = player_stand_df.drop(['Age'],axis=1)
#player_shoot_df = player_shoot_df.drop(['Age'],axis=1)
#player_pt_df = player_pt_df.drop(['Age'],axis=1)
#player_misc_df = player_misc_df.drop(['Age'],axis=1)
#player_gk_df = player_gk_df.drop(['Age'],axis=1)
#big5_player_pass_df = big5_player_pass_df.drop(['Age'],axis=1)
#big5_player_ptype_df = big5_player_ptype_df.drop(['Age'],axis=1)
#big5_player_gca_df = big5_player_gca_df.drop(['Age'],axis=1)
#big5_player_def_df = big5_player_def_df.drop(['Age'],axis=1)
#big5_player_poss_df = big5_player_poss_df.drop(['Age'],axis=1)
#big5_player_advgk_df = big5_player_advgk_df.drop(['Age'],axis=1)


In [None]:
player_stand_df.head() 

In [60]:
stand_cols_float = ['90s','Gls90','Ast90','G+A90','G-PK90','G+A90','G+A-PK90','xG',
                   'npxG','xA','npxG+xA','xG90','xA90','xG+xA90','npxG90','npxG+xA90']

stand_cols_int = ['Age_yrs','Age_days', 'MP', 'Starts', 'Min', 'Gls', 'Ast', 'G-PK', 'PK', 'PKatt','CrdY','CrdR']
#['Age-yrs','Age-days', 'MP', 'Starts', 'Min', 'Gls', 'Ast', 'G-PK', 'PK', 'PKatt','CrdY','CrdR']

player_stand_df[stand_cols_float] = player_stand_df[stand_cols_float].astype('float')
player_stand_df[stand_cols_int] = player_stand_df[stand_cols_int].astype('int')

player_stand_df.info()  


<class 'pandas.core.frame.DataFrame'>
Int64Index: 5288 entries, 0 to 2845
Data columns (total 34 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Player     5288 non-null   object 
 1   Nation     5288 non-null   object 
 2   Pos        5288 non-null   object 
 3   Squad      5288 non-null   object 
 4   Comp       5288 non-null   object 
 5   Age        5288 non-null   object 
 6   Born       5288 non-null   object 
 7   MP         5288 non-null   int64  
 8   Starts     5288 non-null   int64  
 9   Min        5288 non-null   int64  
 10  90s        5288 non-null   float64
 11  Gls        5288 non-null   int64  
 12  Ast        5288 non-null   int64  
 13  G-PK       5288 non-null   int64  
 14  PK         5288 non-null   int64  
 15  PKatt      5288 non-null   int64  
 16  CrdY       5288 non-null   int64  
 17  CrdR       5288 non-null   int64  
 18  Gls90      5288 non-null   float64
 19  Ast90      5288 non-null   float64
 20  G+A90   

In [63]:
player_gk_df.columns

Index(['Player', 'Nation', 'Pos', 'Squad', 'Age', 'Born', 'Comp', 'MP',
       'Starts', 'Min', '90s', 'GA', 'GA90', 'SoTA', 'Saves', 'Save%', 'W',
       'D', 'L', 'CS', 'CS%', 'PKA', 'PKsv', 'PKm', 'Save%_PK', 'FK', 'CK',
       'OG', 'PSxG', 'PSxG/SoT', 'PSxG+/-', 'PSxG-GA90', 'Cmp', 'Att',
       'Cmp_pct', 'Att_pass', 'Thr', 'Launch_pct_pass', 'AvgLen', 'GKA',
       'GK_launch_pct', 'GK_Avglen', 'Cross_opp', 'Cross_opp_stop',
       'Cross_opp_stop_pct', 'Def_OPA', 'Def_OPA90', 'Def_OPA_AvgDist',
       'Age_yrs', 'Age_days'],
      dtype='object')

In [64]:
gk_cols_float = ['90s', 'GA90', 'Save%', 'CS%', 'Save%_PK','Cmp_pct','Launch_pct_pass','AvgLen','GK_launch_pct',
                'PSxG','PSxG/SoT','PSxG+/-','PSxG-GA90','GK_Avglen','Cross_opp_stop_pct','Def_OPA90', 
                'Def_OPA_AvgDist']
gk_cols_int = ['Age_yrs','Age_days', 'MP', 'Starts', 'Min', 'Saves', 'GA', 'SoTA', 'W', 'L','D','CS','Att',
               'PKA','PKsv','PKm','Thr','Cmp','Att_pass','FK','CK','OG','GKA','Cross_opp', 'Cross_opp_stop',
               'Def_OPA']   
player_gk_df[gk_cols_float] = player_gk_df[gk_cols_float].astype('float')
player_gk_df[gk_cols_int] = player_gk_df[gk_cols_int].astype('int')

player_gk_df.info()  

<class 'pandas.core.frame.DataFrame'>
Int64Index: 558 entries, 0 to 194
Data columns (total 50 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Player              558 non-null    object 
 1   Nation              558 non-null    object 
 2   Pos                 558 non-null    object 
 3   Squad               558 non-null    object 
 4   Age                 558 non-null    object 
 5   Born                558 non-null    object 
 6   Comp                558 non-null    object 
 7   MP                  558 non-null    int64  
 8   Starts              558 non-null    int64  
 9   Min                 558 non-null    int64  
 10  90s                 558 non-null    float64
 11  GA                  558 non-null    int64  
 12  GA90                558 non-null    float64
 13  SoTA                558 non-null    int64  
 14  Saves               558 non-null    int64  
 15  Save%               558 non-null    float64
 16  W       

In [65]:
player_shoot_df.columns

Index(['Rk', 'Player', 'Nation', 'Pos', 'Squad', 'Age', 'Born', '90s', 'Gls',
       'Sh', 'SoT', 'SoT%', 'Sh/90', 'SoT/90', 'G/Sh', 'G/SoT', 'Dist', 'PK',
       'PKatt', 'Matches', 'Comp', 'FK', 'xG', 'npxG', 'npxG/Sh', 'G-xG',
       'np:G-xG', 'Age_yrs', 'Age_days'],
      dtype='object')

In [68]:
shoot_cols_float = ['SoT%','Sh/90','SoT/90','G/Sh','G/SoT','G-xG','np:G-xG','Dist']

shoot_cols_int = ['Age_yrs', 'Age_days', 'Sh', 'SoT', 'FK']

player_shoot_df[shoot_cols_float] = player_shoot_df[shoot_cols_float].astype('float')
player_shoot_df[shoot_cols_int] = player_shoot_df[shoot_cols_int].astype('int')

player_shoot_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5288 entries, 0 to 2845
Data columns (total 29 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Rk        5288 non-null   object 
 1   Player    5288 non-null   object 
 2   Nation    5288 non-null   object 
 3   Pos       5288 non-null   object 
 4   Squad     5288 non-null   object 
 5   Age       5288 non-null   object 
 6   Born      5288 non-null   object 
 7   90s       5288 non-null   object 
 8   Gls       5288 non-null   object 
 9   Sh        5288 non-null   int64  
 10  SoT       5288 non-null   int64  
 11  SoT%      5288 non-null   float64
 12  Sh/90     5288 non-null   float64
 13  SoT/90    5288 non-null   float64
 14  G/Sh      5288 non-null   float64
 15  G/SoT     5288 non-null   float64
 16  Dist      5288 non-null   float64
 17  PK        5288 non-null   object 
 18  PKatt     5288 non-null   object 
 19  Matches   5288 non-null   object 
 20  Comp      5288 non-null   obje

In [70]:
pt_cols_float = ['Min%','90s','PPM','+/-90','On-Off','onxG','onxGA','xG+/-','xG+/-90','xGOn-Off90']

pt_cols_int = ['Age_yrs','Age_days','MP','Min','Mn/MP','Starts','Mn/Start','Compl','Subs',
               'Mn/Sub','unSub','onG','onGA','+/-']

player_pt_df[pt_cols_float] = player_pt_df[pt_cols_float].astype('float')
player_pt_df[pt_cols_int] = player_pt_df[pt_cols_int].astype('int')

player_pt_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6378 entries, 0 to 3526
Data columns (total 33 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Rk          6378 non-null   object 
 1   Player      6378 non-null   object 
 2   Nation      6378 non-null   object 
 3   Pos         6378 non-null   object 
 4   Squad       6378 non-null   object 
 5   Age         6378 non-null   object 
 6   Born        6378 non-null   object 
 7   MP          6378 non-null   int64  
 8   Min         6378 non-null   int64  
 9   Mn/MP       6378 non-null   int64  
 10  Min%        6378 non-null   float64
 11  90s         6378 non-null   float64
 12  Starts      6378 non-null   int64  
 13  Mn/Start    6378 non-null   int64  
 14  Compl       6378 non-null   int64  
 15  Subs        6378 non-null   int64  
 16  Mn/Sub      6378 non-null   int64  
 17  unSub       6378 non-null   int64  
 18  PPM         6378 non-null   float64
 19  onG         6378 non-null  

In [71]:
player_misc_df.columns

Index(['Rk', 'Player', 'Nation', 'Pos', 'Squad', 'Age', 'Born', '90s', 'CrdY',
       'CrdR', '2CrdY', 'Fls', 'Fld', 'Off', 'Crs', 'Int', 'TklW', 'PKwon',
       'PKcon', 'OG', 'Matches', 'Comp', 'Recov', 'Won', 'Lost', 'Won%',
       'Age_yrs', 'Age_days'],
      dtype='object')

In [73]:
misc_cols_float = ['Won%']

misc_cols_int = ['Age_yrs','Age_days','2CrdY','Fls','Fld','Won','Lost','Off','Crs','Int','TklW',
                'PKwon','PKcon','OG','Recov']

player_misc_df[misc_cols_float] = player_misc_df[misc_cols_float].astype('float')
player_misc_df[misc_cols_int] = player_misc_df[misc_cols_int].astype('int')

player_misc_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5288 entries, 0 to 2845
Data columns (total 28 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Rk        5288 non-null   object 
 1   Player    5288 non-null   object 
 2   Nation    5288 non-null   object 
 3   Pos       5288 non-null   object 
 4   Squad     5288 non-null   object 
 5   Age       5288 non-null   object 
 6   Born      5288 non-null   object 
 7   90s       5288 non-null   object 
 8   CrdY      5288 non-null   object 
 9   CrdR      5288 non-null   object 
 10  2CrdY     5288 non-null   int64  
 11  Fls       5288 non-null   int64  
 12  Fld       5288 non-null   int64  
 13  Off       5288 non-null   int64  
 14  Crs       5288 non-null   int64  
 15  Int       5288 non-null   int64  
 16  TklW      5288 non-null   int64  
 17  PKwon     5288 non-null   int64  
 18  PKcon     5288 non-null   int64  
 19  OG        5288 non-null   int64  
 20  Matches   5288 non-null   obje

In [74]:
pass_cols_float = ['Cmp%_pass','Cmp%_short','Cmp%_med','Cmp%_long','A-xA']

pass_cols_int = ['Age_yrs','Age_days','Cmp_pass','Att_pass','Cmp_med','Att_med','Cmp_short','Att_short','Cmp_long',
                 'Att_long','KP','1/3_pass','PPA','CrsPA','Prog_pass']

big5_player_pass_df[pass_cols_float] = big5_player_pass_df[pass_cols_float].astype('float')
big5_player_pass_df[pass_cols_int] = big5_player_pass_df[pass_cols_int].astype('int')

big5_player_pass_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2737 entries, 0 to 2845
Data columns (total 34 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rk            2737 non-null   object 
 1   Player        2737 non-null   object 
 2   Nation        2737 non-null   object 
 3   Pos           2737 non-null   object 
 4   Squad         2737 non-null   object 
 5   Comp          2737 non-null   object 
 6   Age           2737 non-null   object 
 7   Born          2737 non-null   object 
 8   90s           2737 non-null   object 
 9   Cmp_pass      2737 non-null   int64  
 10  Att_pass      2737 non-null   int64  
 11  Cmp%_pass     2737 non-null   float64
 12  TotDist_pass  2737 non-null   object 
 13  PrgDist_pass  2737 non-null   object 
 14  Cmp_short     2737 non-null   int64  
 15  Att_short     2737 non-null   int64  
 16  Cmp%_short    2737 non-null   float64
 17  Cmp_med       2737 non-null   int64  
 18  Att_med       2737 non-null 

In [76]:
ptype_cols_float = ['90s']
ptype_cols_int = ['Age_yrs','Age_days','Liveball_pass','Deadball_pass','FK_pass','TB','UnderPress_pass','40yds<_width_pass',
                  'Crs','CK','CK_In','CK_Out','CK_Str','Ground','Low','High','Left','Right','Head','TI','Other','Off',
                  'Out.1','Passes_int','Passes_blocked','Cmp']

big5_player_ptype_df[ptype_cols_float] = big5_player_ptype_df[ptype_cols_float].astype('float')
big5_player_ptype_df[ptype_cols_int] = big5_player_ptype_df[ptype_cols_int].astype('int')

big5_player_ptype_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2737 entries, 0 to 2845
Data columns (total 37 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Rk                 2737 non-null   object 
 1   Player             2737 non-null   object 
 2   Nation             2737 non-null   object 
 3   Pos                2737 non-null   object 
 4   Squad              2737 non-null   object 
 5   Comp               2737 non-null   object 
 6   Age                2737 non-null   object 
 7   Born               2737 non-null   object 
 8   90s                2737 non-null   float64
 9   Att                2737 non-null   object 
 10  Liveball_pass      2737 non-null   int64  
 11  Deadball_pass      2737 non-null   int64  
 12  FK_pass            2737 non-null   int64  
 13  TB                 2737 non-null   int64  
 14  UnderPress_pass    2737 non-null   int64  
 15  40yds<_width_pass  2737 non-null   int64  
 16  Crs                2737 

In [77]:
poss_cols_float = ['Succ%_drib','Rec%']

poss_cols_int = ['Age_yrs','Age_days','Touches','Def Pen_touch','Def 3rd','Mid 3rd','Att 3rd','Att Pen_touch',
                 'Live','Succ_drib','Att_drib','#Pl','Megs','Carries','TotDist_carr','PrgDist_carr',
                 'Prog_carr','1/3_carr','CPA','Mis','Dis','Targ','Rec','Prog_pass_rec']

big5_player_poss_df[poss_cols_float] = big5_player_poss_df[poss_cols_float].astype('float')
big5_player_poss_df[poss_cols_int] = big5_player_poss_df[poss_cols_int].astype('int')

big5_player_poss_df.info() 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2737 entries, 0 to 2845
Data columns (total 36 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Rk             2737 non-null   object 
 1   Player         2737 non-null   object 
 2   Nation         2737 non-null   object 
 3   Pos            2737 non-null   object 
 4   Squad          2737 non-null   object 
 5   Comp           2737 non-null   object 
 6   Age            2737 non-null   object 
 7   Born           2737 non-null   object 
 8   90s            2737 non-null   object 
 9   Touches        2737 non-null   int64  
 10  Def Pen_touch  2737 non-null   int64  
 11  Def 3rd        2737 non-null   int64  
 12  Mid 3rd        2737 non-null   int64  
 13  Att 3rd        2737 non-null   int64  
 14  Att Pen_touch  2737 non-null   int64  
 15  Live           2737 non-null   int64  
 16  Succ_drib      2737 non-null   int64  
 17  Att_drib       2737 non-null   int64  
 18  Succ%_dr

In [78]:
gca_cols_float = ['90s','SCA90','GCA90']
gca_cols_int = ['Age_yrs','Age_days','SCA','SCA_PassLive','SCA_PassDead','SCA_Drib','SCA_Sh','SCA_Fld','SCA_Def',
                'GCA','GCA_PassLive','GCA_PassDead','GCA_Drib','GCA_Sh','GCA_Fld','GCA_Def']   
big5_player_gca_df[gca_cols_float] = big5_player_gca_df[gca_cols_float].astype('float')
big5_player_gca_df[gca_cols_int] = big5_player_gca_df[gca_cols_int].astype('int')

big5_player_gca_df.info() 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2737 entries, 0 to 2845
Data columns (total 28 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rk            2737 non-null   object 
 1   Player        2737 non-null   object 
 2   Nation        2737 non-null   object 
 3   Pos           2737 non-null   object 
 4   Squad         2737 non-null   object 
 5   Comp          2737 non-null   object 
 6   Age           2737 non-null   object 
 7   Born          2737 non-null   object 
 8   90s           2737 non-null   float64
 9   SCA           2737 non-null   int64  
 10  SCA90         2737 non-null   float64
 11  SCA_PassLive  2737 non-null   int64  
 12  SCA_PassDead  2737 non-null   int64  
 13  SCA_Drib      2737 non-null   int64  
 14  SCA_Sh        2737 non-null   int64  
 15  SCA_Fld       2737 non-null   int64  
 16  SCA_Def       2737 non-null   int64  
 17  GCA           2737 non-null   int64  
 18  GCA90         2737 non-null 

In [80]:
def_cols_float = ['Succ%_press','Tkl%']

def_cols_int = ['Age_yrs','Age_days','Tkl','TklW','Def 3rd_Tkl','Mid 3rd_Tkl','Att 3rd_Tkl','Tkl_v_drib','Tkl_v_drib_Att',
                'Def 3rd_press','Mid 3rd_press','Att 3rd_press','Past','Press','Succ_press','Blocks',
                'Blocked_shots','Blocked_SoT','Pass','Int','Clr','Err','Tkl+Int']
                       

big5_player_def_df[def_cols_float] = big5_player_def_df[def_cols_float].astype('float')
big5_player_def_df[def_cols_int] = big5_player_def_df[def_cols_int].astype('int')

big5_player_def_df.info() 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2737 entries, 0 to 2845
Data columns (total 35 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Rk              2737 non-null   object 
 1   Player          2737 non-null   object 
 2   Nation          2737 non-null   object 
 3   Pos             2737 non-null   object 
 4   Squad           2737 non-null   object 
 5   Comp            2737 non-null   object 
 6   Age             2737 non-null   object 
 7   Born            2737 non-null   object 
 8   90s             2737 non-null   object 
 9   Tkl             2737 non-null   int64  
 10  TklW            2737 non-null   int64  
 11  Def 3rd_Tkl     2737 non-null   int64  
 12  Mid 3rd_Tkl     2737 non-null   int64  
 13  Att 3rd_Tkl     2737 non-null   int64  
 14  Tkl_v_drib      2737 non-null   int64  
 15  Tkl_v_drib_Att  2737 non-null   int64  
 16  Tkl%            2737 non-null   float64
 17  Past            2737 non-null   i

In [None]:
### END OF CODE  ###

In [None]:
### Analysis ###

In [83]:
pd.set_option('display.max_columns', None)

In [99]:
df = player_stand_df[(player_stand_df['Comp'] == 'Primeira Liga') & (player_stand_df['MP'] >= 15)]
df = df.sort_values(by ='Ast90', ascending=False) 
df.head(10)

Unnamed: 0,Player,Nation,Pos,Squad,Comp,Age,Born,MP,Starts,Min,90s,Gls,Ast,G-PK,PK,PKatt,CrdY,CrdR,Gls90,Ast90,G+A90,G-PK90,G+A-PK90,xG,npxG,xA,npxG+xA,xG90,xA90,xG+xA90,npxG90,npxG+xA90,Age_yrs,Age_days
518,Fabio Vieira,pt POR,"MF,FW",Porto,Primeira Liga,21-258,2000,17,8,725,8.1,2,8,2,0,0,2,0,0.25,0.99,1.24,0.25,1.24,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,21,258
470,Rafa Silva,pt POR,"MF,FW",Benfica,Primeira Liga,28-271,1993,20,15,1397,15.5,7,13,7,0,0,2,0,0.45,0.84,1.29,0.45,1.29,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28,271
309,Iuri Medeiros,pt POR,"MF,FW",Braga,Primeira Liga,27-217,1994,16,12,914,10.2,5,6,4,1,1,1,0,0.49,0.59,1.08,0.39,0.98,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,27,217
478,Everton Soares,br BRA,"MF,FW",Benfica,Primeira Liga,25-327,1996,17,11,957,10.6,3,5,3,0,0,3,0,0.28,0.47,0.75,0.28,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,25,327
366,OtÃ¡vio,pt POR,MF,Porto,Primeira Liga,27-003,1995,20,20,1632,18.1,3,8,3,0,0,7,0,0.17,0.44,0.61,0.17,0.61,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,27,3
495,Mehdi Taremi,ir IRN,FW,Porto,Primeira Liga,29-209,1992,21,17,1592,17.7,12,7,10,2,2,4,1,0.68,0.4,1.07,0.57,0.96,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,29,209
326,Pedro Filipe Barbosa Moreira,pt POR,MF,Gil Vicente FC,Primeira Liga,29-054,1992,20,20,1749,19.4,1,7,1,0,0,5,0,0.05,0.36,0.41,0.05,0.41,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,29,54
451,Pablo Sarabia,es ESP,"MF,FW",Sporting CP,Primeira Liga,29-277,1992,18,17,1287,14.3,6,5,5,1,1,4,0,0.42,0.35,0.77,0.35,0.7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,29,277
448,Yan Santos,br BRA,"FW,MF",Moreirense,Primeira Liga,23-161,1998,20,14,1120,12.4,3,4,1,2,2,2,0,0.24,0.32,0.56,0.08,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23,161
215,Pedro GonÃ§alves,pt POR,"MF,FW",Sporting CP,Primeira Liga,23-229,1998,17,17,1399,15.5,6,5,6,0,1,4,0,0.39,0.32,0.71,0.39,0.71,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23,229


In [112]:
df2 = player_stand_df[player_stand_df['Squad'] == 'Tottenham']
df2 = df2.sort_values(by ='G+A90', ascending=False) 
df2

Unnamed: 0,Player,Nation,Pos,Squad,Comp,Age,Born,MP,Starts,Min,90s,Gls,Ast,G-PK,PK,PKatt,CrdY,CrdR,Gls90,Ast90,G+A90,G-PK90,G+A-PK90,xG,npxG,xA,npxG+xA,xG90,xA90,xG+xA90,npxG90,npxG+xA90,Age_yrs,Age_days
1151,Son Heung-min,kr KOR,"FW,MF",Tottenham,eng Premier League,29-219,1992,18,18,1573,17.5,9,3,9,0,0,0,0,0.51,0.17,0.69,0.51,0.69,7.4,7.4,3.5,10.9,0.42,0.2,0.63,0.42,0.63,29,219
286,Steven Bergwijn,nl NED,FW,Tottenham,eng Premier League,24-127,1997,11,4,422,4.7,2,1,2,0,0,1,0,0.43,0.21,0.64,0.43,0.64,1.9,1.9,0.5,2.4,0.41,0.11,0.52,0.41,0.52,24,127
730,Matt Doherty,ie IRL,"DF,MF",Tottenham,eng Premier League,30-027,1992,6,1,237,2.6,0,1,0,0,0,0,0,0.0,0.38,0.38,0.0,0.38,1.1,1.1,0.3,1.4,0.43,0.11,0.54,0.43,0.54,30,27
1818,Lucas Moura,br BRA,"FW,MF",Tottenham,eng Premier League,29-183,1992,20,16,1409,15.7,2,4,2,0,0,2,0,0.13,0.26,0.38,0.13,0.38,3.3,3.3,3.0,6.3,0.21,0.19,0.4,0.21,0.4,29,183
1312,Harry Kane,eng ENG,FW,Tottenham,eng Premier League,28-199,1993,20,19,1702,18.9,5,2,4,1,1,3,0,0.26,0.11,0.37,0.21,0.32,9.1,8.4,3.9,12.2,0.48,0.2,0.69,0.44,0.65,28,199
1866,Tanguy Ndombele,fr FRA,MF,Tottenham,eng Premier League,25-046,1996,9,6,484,5.4,1,1,1,0,0,1,0,0.19,0.19,0.37,0.19,0.37,0.5,0.5,0.4,0.9,0.09,0.07,0.16,0.09,0.16,25,46
2178,Sergio ReguilÃ³n,es ESP,DF,Tottenham,eng Premier League,25-058,1996,19,18,1444,16.0,1,3,1,0,0,3,0,0.06,0.19,0.25,0.06,0.25,2.0,2.0,3.6,5.5,0.12,0.22,0.35,0.12,0.35,25,58
2778,Harry Winks,eng ENG,MF,Tottenham,eng Premier League,26-010,1996,10,7,616,6.8,0,1,0,0,0,2,0,0.0,0.15,0.15,0.0,0.15,0.1,0.1,1.5,1.5,0.01,0.22,0.23,0.01,0.23,26,10
2323,Davinson SÃ¡nchez,co COL,DF,Tottenham,eng Premier League,25-245,1996,15,13,1201,13.3,2,0,2,0,0,4,0,0.15,0.0,0.15,0.15,0.15,1.6,1.6,0.1,1.7,0.12,0.01,0.13,0.12,0.13,25,245
1185,Pierre HÃ¸jbjerg,dk DEN,MF,Tottenham,eng Premier League,26-191,1995,20,20,1756,19.5,2,1,2,0,0,1,0,0.1,0.05,0.15,0.1,0.15,1.6,1.6,1.0,2.5,0.08,0.05,0.13,0.08,0.13,26,191


In [108]:
df1 = squad_stand_df[(squad_stand_df['Comp'] == 'eng Premier League')]
df1 = df1.sort_values(by ='G+A90', ascending=False) 
df1

Unnamed: 0,Squad,# Pl,Age,Comp,MP,Starts,Min,Poss,90s,Gls,Ast,G-PK,PK,PKatt,CrdY,CrdR,Gls90,Ast90,G+A90,G-PK.1,G+A-PK90,xG,npxG,xA,npxG+xA,xG90,xA90,xG+xA90,npxG90,npxG+xA90
50,Liverpool,27,28.3,eng Premier League,23,253,2070,62.5,23.0,60,45,57,3,4,30,1,2.61,1.96,4.57,57,4.43,57.0,54.2,38.9,93.1,2.48,1.69,4.17,2.36,4.05
56,Manchester City,23,27.5,eng Premier League,24,264,2160,67.9,24.0,55,35,50,5,5,30,1,2.29,1.46,3.75,50,3.54,53.5,49.7,36.3,86.0,2.23,1.51,3.74,2.07,3.58
22,Chelsea,25,27.9,eng Premier League,24,264,2160,58.8,24.0,47,32,40,7,7,39,1,1.96,1.33,3.29,40,3.0,40.8,35.8,29.0,64.8,1.7,1.21,2.91,1.49,2.7
95,West Ham,24,28.7,eng Premier League,24,264,2160,48.6,24.0,40,33,37,3,5,32,2,1.67,1.37,3.04,37,2.92,34.0,30.7,24.8,55.5,1.42,1.03,2.45,1.28,2.31
57,Manchester Utd,25,27.8,eng Premier League,23,253,2070,54.2,23.0,36,29,34,2,3,47,2,1.57,1.26,2.83,34,2.74,34.0,31.7,24.5,56.2,1.48,1.07,2.54,1.38,2.45
45,Leicester City,24,27.2,eng Premier League,21,231,1890,48.5,21.0,34,25,33,1,1,26,1,1.62,1.19,2.81,33,2.76,31.4,30.6,20.1,50.8,1.49,0.96,2.45,1.46,2.42
3,Arsenal,27,24.9,eng Premier League,22,242,1980,51.0,22.0,34,25,33,1,4,38,3,1.55,1.14,2.68,33,2.64,32.7,29.6,20.4,50.0,1.49,0.93,2.41,1.35,2.27
4,Aston Villa,29,26.5,eng Premier League,22,242,1980,45.4,22.0,29,24,27,2,2,48,2,1.32,1.09,2.41,27,2.32,23.9,22.6,18.0,40.6,1.09,0.82,1.91,1.03,1.84
24,Crystal Palace,24,27.9,eng Premier League,23,253,2070,51.7,23.0,32,21,29,3,4,44,1,1.39,0.91,2.3,29,2.17,28.7,25.7,20.0,45.7,1.25,0.87,2.11,1.12,1.99
87,Tottenham,24,27.1,eng Premier League,21,231,1890,51.3,21.0,25,19,23,2,2,38,1,1.19,0.9,2.1,23,2.0,33.1,31.5,24.4,56.0,1.57,1.16,2.74,1.5,2.67


In [None]:
#AM: Fabio Vieira, John Swift,  

In [102]:
big5_squad_stand_df.head(10)

Unnamed: 0,Rk,Squad,Comp,# Pl,Age,Poss,MP,Starts,Min,90s,Gls,Ast,G-PK,PK,PKatt,CrdY,CrdR,Gls.1,Ast.1,G+A,G-PK.1,G+A-PK,xG,npxG,xA,npxG+xA,xG.1,xA.1,xG+xA,npxG.1,npxG+xA.1
0,1,AlavÃ©s,es La Liga,30,28.2,41.8,23,253,2070,23.0,16,9,11,5,5,49,2,0.7,0.39,1.09,0.48,0.87,21.5,17.7,11.9,29.7,0.93,0.52,1.45,0.77,1.29
1,2,Angers,fr Ligue 1,27,28.2,49.1,23,249,2070,23.0,27,16,20,7,7,33,1,1.17,0.7,1.87,0.87,1.57,26.9,21.6,15.0,36.6,1.17,0.65,1.82,0.94,1.59
2,3,Arminia,de Bundesliga,25,26.6,39.0,21,231,1890,21.0,21,13,19,2,2,40,2,1.0,0.62,1.62,0.9,1.52,19.7,18.2,13.2,31.3,0.94,0.63,1.56,0.86,1.49
3,4,Arsenal,eng Premier League,27,24.9,51.0,22,242,1980,22.0,34,25,33,1,4,38,3,1.55,1.14,2.68,1.5,2.64,32.7,29.6,20.4,50.0,1.49,0.93,2.41,1.35,2.27
4,5,Aston Villa,eng Premier League,29,26.5,45.4,22,242,1980,22.0,29,24,27,2,2,48,2,1.32,1.09,2.41,1.23,2.32,23.9,22.6,18.0,40.6,1.09,0.82,1.91,1.03,1.84
5,6,Atalanta,it Serie A,31,28.2,54.7,23,253,2070,23.0,42,26,39,3,4,50,1,1.83,1.13,2.96,1.7,2.83,38.6,35.8,29.7,65.5,1.68,1.29,2.97,1.56,2.85
6,7,Athletic Club,es La Liga,25,27.5,48.3,23,253,2070,23.0,21,18,20,1,3,47,2,0.91,0.78,1.7,0.87,1.65,29.3,27.1,20.4,47.5,1.28,0.89,2.16,1.18,2.07
7,8,AtlÃ©tico Madrid,es La Liga,25,28.7,53.0,22,242,1980,22.0,36,27,35,1,1,61,3,1.64,1.23,2.86,1.59,2.82,30.8,30.0,22.2,52.2,1.4,1.01,2.41,1.36,2.37
8,9,Augsburg,de Bundesliga,26,27.0,40.0,21,231,1890,21.0,22,13,20,2,2,42,0,1.05,0.62,1.67,0.95,1.57,22.7,21.2,14.2,35.4,1.08,0.68,1.76,1.01,1.69
9,10,Barcelona,es La Liga,37,26.6,64.9,22,242,1980,22.0,36,26,32,4,5,51,4,1.64,1.18,2.82,1.45,2.64,34.7,30.9,24.1,54.9,1.58,1.09,2.67,1.4,2.5


In [120]:
big5_player_pass_df[(big5_player_pass_df['Player'] == 'Rodrigo Bentancur')]

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Comp,Age,Born,90s,Cmp_pass,Att_pass,Cmp%_pass,TotDist_pass,PrgDist_pass,Cmp_short,Att_short,Cmp%_short,Cmp_med,Att_med,Cmp%_med,Cmp_long,Att_long,Cmp%_long,Ast,xA,A-xA,KP,1/3_pass,PPA,CrsPA,Prog_pass,Matches,Age_yrs,Age_days
274,265,Rodrigo Bentancur,uy URU,MF,Tottenham,eng Premier League,24-252,1997,0.3,20,25,80.0,444,87,9,11,81.8,6,7,85.7,5,7,71.4,0,0.0,0.0,0,3,0,0,3,Matches,24,252
275,266,Rodrigo Bentancur,uy URU,MF,Juventus,it Serie A,24-252,1997,12.8,661,756,87.4,12392,2859,256,279,91.8,317,347,91.4,76,100,76.0,1,0.3,0.7,8,79,5,0,56,Matches,24,252


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
big5_def_df = big5_player_nongk_df.loc[big5_player_nongk_df['Pos'].isin(['DF'])]
big5_def_df.head()

In [None]:
AerialduelsWon_90 = big5_def_df['Aerialduels_Won']/big5_def_df['90s']
Clearances_90 = big5_def_df['Clr']/big5_def_df['90s']

In [None]:

plt.figure(figsize=(8,5))
sns.scatterplot(data=big5_def_df, x=AerialduelsWon_90,y=Clearances_90)
plt.text(AerialduelsWon_90[big5_def_df.Player=='Cristian Romero'],Clearances_90[big5_def_df.Player=='Cristian Romero'],"Cristian Romero", color='red')
plt.text(AerialduelsWon_90[big5_def_df.Player=='Eric Dier'],Clearances_90[big5_def_df.Player=='Eric Dier'],"Eric Dier", color='red')
#plt.text(AerialduelsWon_90[big5_def_df.Player=='Davinson SÃ¡nchez'],Clearances_90[big5_def_df.Player=='Davinson SÃ¡nchez'],"Davinson SÃ¡nchez", color='red')


plt.title('Big 5 Leagues: Aerial Duels and Clearances per 90 Minutes') #title
plt.xlabel('Aerial Duels') #x label
plt.ylabel('Clearances') #y label
plt.show()

In [None]:
big5_def_df.info()

In [None]:
big5_def_df.loc[big5_def_df['Squad'].isin(['Tottenham'])]

In [None]:
big5_def_df['Player'] = big5_def_df['Player'].replace(['Davinson SÃ¡nchez','Davinson Sánchez'])

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [None]:
#Columns to drop: Att_y,Cmp_y,Matches,CrdY_y,CrdR_y,2CrdY_y,'Off_y','Crs_y','TklW_y','Int'
big5_player_nongk_df.info()

In [None]:
#Player Radial Plots

In [None]:
#Striker 
#Params: xG per 90 (xG90), Shots per 90 (Sh/90), Touches in Box per 90 (construct: Att 3rd/90s),
        #Shot Touch% (construct: Shots per 90/Touches per 90), xA per 90, Pressure Regains per 90
        #(Succ(misc)/90s), Pressures (Press/90s), Aerial Wins per 90 (Won/90s), Turnovers per 90 ((Mis+Dis)/90s),
        #Successful Dribbles (Succ(poss)/90s), xG per 90/Shot per 90 (xG90/ Sh/90)

In [None]:
#Construct new variables:
Touch_in_Box_90 = big5_player_nongk_df3['Att 3rd']/big5_player_nongk_df3['90s']
Touch_90 = big5_player_nongk_df3['Touches']/big5_player_nongk_df3['90s']
Shot_Touch_90 = big5_player_nongk_df1['Sh/90']/Touch_90
Press_Regain_90 = big5_player_nongk_df4['Succ']/big5_player_nongk_df4['90s']
Press_90 = big5_player_nongk_df4['Press']/big5_player_nongk_df4['90s']
Aerial_Wins_90 = big5_player_nongk_df4['Won']/big5_player_nongk_df4['90s']
Turnovers_90 = (big5_player_nongk_df3['Mis']+big5_player_nongk_df3['Dis'])/big5_player_nongk_df3['90s']
Succ_Dribs_90 = big5_player_nongk_df3['Succ']/big5_player_nongk_df3['90s']
xG_Sh_90 = big5_player_nongk_df1['xG_90']/big5_player_nongk_df1['Sh/90']


In [None]:
xG_90 = big5_player_nongk_df1['xG_90']
Shots_90 = big5_player_nongk_df1['Sh/90']
xA_90 = big5_player_nongk_df1['xA_90']

In [None]:
Player = big5_player_nongk_df1['Player']

In [None]:
from soccerplots.radar_chart import Radar

In [None]:
df_striker = pd.DataFrame()
#df_striker['Player'] = big5_player_nongk_df1['Player']

In [None]:
df_striker['Player'] = Player
df_striker['xG'] = xG_90
df_striker['Shots'] = Shots_90
df_striker['Touches in Box'] = Touch_in_Box_90
df_striker['Shot Touch %'] = Shot_Touch_90
df_striker['xA'] = xA_90
df_striker['Pressure Regains'] = Press_Regain_90
df_striker['Pressures'] = Press_90
df_striker['Aerial Wins'] = Aerial_Wins_90
df_striker['Turnovers'] = Turnovers_90
df_striker['Successful Dribbles'] = Succ_Dribs_90
df_striker['xG/Shot'] = xG_Sh_90

df_striker.head()

In [None]:
df_striker.head()

In [None]:
df_striker.set_index('Player', inplace=True)
df_striker.head()

In [None]:
params_striker = df_striker.columns.tolist()


In [None]:
ranges = [(0.21,0.60),(1.7,3.9),(7,15),(10,2),(0.05,0.23),(0.8,3.0),(11,22),(0.5,4.1),(5.0,1.6),(0.5,2.3),(0.08,0.21)]

In [None]:
values = [0.60,3.92,18.0,9.6,0.22,2.89,11.4,2.6,3.86,1.35,0.13]

In [None]:
values

In [None]:
title = dict(
    title_name='Harry Kane',
    title_color='#000000',
    subtitle_name='Age 27, Tottenham Hotspur, 2020/2021',
    subtitle_color='#132257',
    title_fontsize=18,
    subtitle_fontsize=15,
    )

## endnote 
endnote = "Visualization made by: Aaron Woodward\nAll units are in per 90 minutes\nSource: Fbref.com"


In [None]:
#radar = Radar()

In [None]:
#df_striker['Player'] = 'Harry Kane'

In [None]:
## plotting the radar chart
fig, ax = radar.plot_radar(ranges=ranges, params=params_striker, values=values, radar_color=['#132257', '#FFF200'],
                           title=title,endnote=endnote)



In [None]:
fig.savefig('Harry_Kane_radplot.png')

In [None]:
cd '/Users/aaronwoodward/Desktop'

In [None]:
from soccerplots.radar_chart import Radar

## parameter names
params = ['xAssist', 'Key Passes', 'Crosses Into Box', 'Cross Completion %', 'Deep Completions',
          'Progressive Passes', 'Prog. Pass Accuracy', 'Dribbles', 'Progressive Runs',
          'PADJ Interceptions', 'Succ. Def. Actions', 'Def Duel Win %']

## range values
ranges = [(0.0, 0.15), (0.0, 0.67), (0.06, 6.3), (19.51, 50.0), (0.35, 1.61),
          (6.45, 11.94), (62.9, 79.4), (0.43, 4.08), (0.6, 2.33),
          (4.74, 7.2), (8.59, 12.48), (50.66, 66.67)]

## parameter value
values = [0.11, 0.53, 0.70, 27.66, 1.05, 6.84, 84.62, 4.56, 2.22, 5.93, 8.88, 64.29]

## title values 
title = dict(
    title_name='Sergiño Dest',
    title_color='#000000',
    subtitle_name='AFC Ajax',
    subtitle_color='#B6282F',
    title_fontsize=18,
    subtitle_fontsize=15,
)

## instantiate object
radar = Radar()

## plot radar -- title
fig, ax = radar.plot_radar(ranges=ranges, params=params, values=values, 
                           radar_color=['#B6282F', '#FFFFFF'], title=title)


In [None]:
#Centre-Back
#Params: Passing %, Pressures, Fouls, Tackle %, 
        #Shot Touch% (construct: Shots per 90/Touches per 90), xA per 90, Pressure Regains per 90
        #(Succ(misc)/90s), Pressures (Press/90s), Aerial Wins per 90 (Won/90s), Turnovers per 90 ((Mis+Dis)/90s),
        #Successful Dribbles (Succ(poss)/90s), xG per 90/Shot per 90 (xG90/ Sh/90)

In [None]:
nonbig5_player_nongk_merged2.loc[nonbig5_player_nongk_merged2['Player'].isin(['Michael Zetterer'])]

In [None]:
nonbig5_player_gk_df['Squad'] = nonbig5_player_gk_df['Squad'].replace(['PaÃ§os','MarÃ-timo'],['Paços de Ferreira','Marítimo'])

In [None]:
#Plots 

In [None]:
#Radial plot Example: Harry Kane vs. Mohammed Salah

In [None]:
#Dimensions:


#Strikers: xG90 (stand), Shots (shoot), Touches in Box (poss), Shot Touch% (shoot & poss), xG Assisted (stand), Pressure Regains (def), Pressures (def),
#Aerial Wins (misc), Turnovers (poss), Successful Dribbles (poss), xG/Shot (shoot)

In [None]:
big5_player_poss_df.info()

In [None]:
### Per 90 stats (Big 5 only) ###

In [None]:
player_stand_stats_master_df.loc[player_stand_stats_master_df['Player'].isin(['Bruno Fernandes','Marcel Sabitzer', 'Giovani Lo Celso', 'Tanguy Ndombele'])]