
**Top 5 Leagues: Clustering Teams by Play Style**

My goal for this project is to cluster teams in the top 5 European leagues by their passing, goal and shot creation, pass types, possession, and defensive stats in order to identify teams with similar playing styles. My initial plan is to combine these stats so that they represent each team’s overall style of play: how the ball is progressed from defense to attack, the tempo of play, and the most prominent type of shot creation. I plan to do this with K-means clustering in Python.

In the soccer world we often place teams on a scale running from controlling possession to direct counter-attack. I think this project is useful because it can show how these play styles can be quantified and how distinct the groups of tactical approaches are. If the clusters are clearly distinct, that would suggest certain general patterns of play are effective across all of Europe’s best leagues. If play styles instead fall along a sliding scale, we gain insight into how much individual managers’ philosophies matter.

Looking at contemporary literature, “K-means cluster analysis was also used by Gollan, Ferrar, and Norton [36] to recognize playing styles. Three game style clusters were identified: (1) moderately favoring established defense, (2) dominant in transition offense and transition defense, and (3) strong in established offense and set pieces. The disadvantage of this method is that it does neither recognize playing styles, nor is it capable of quantifying them; instead, it categorizes the teams based on the phases in which they excel” (Plakias et al., 2023). After reading this I still want to proceed with K-means clustering, but I will keep the known downsides in mind. A likely next step is factor analysis, working backwards from individual statistics to see which overarching play styles emerge.

References

Plakias S, Moustakidis S, Kokkotis C, Tsatalas T, Papalexi M, Plakias D, Giakas G, Tsaopoulos D. Identifying Soccer Teams’ Styles of Play: A Scoping and Critical Review. Journal of Functional Morphology and Kinesiology. 2023; 8(2):39. https://doi.org/10.3390/jfmk8020039


**Scraping Data**

In [1]:
# My first step is scraping data from FBref using code courtesy of this article by Paul Corcoran: https://levelup.gitconnected.com/quickly-and-easily-scrape-fbref-using-just-pandas-773b294f86a0

In [2]:
import pandas as pd

# read in standard data for top 5 leagues
top5_std = pd.read_html('https://fbref.com/en/comps/Big5/stats/squads/Big-5-European-Leagues-Stats')

# read in shooting data for top 5 leagues
top5_shooting = pd.read_html('https://fbref.com/en/comps/Big5/shooting/squads/Big-5-European-Leagues-Stats')

# read in goalkeeping data for top5 leagues
top5_goalkeeping = pd.read_html('https://fbref.com/en/comps/Big5/keepers/squads/Big-5-European-Leagues-Stats')

# read in advanced goalkeeping data for top 5 leagues
top5_advgoalkeeping = pd.read_html('https://fbref.com/en/comps/Big5/keepersadv/squads/Big-5-European-Leagues-Stats')

# read in passing data for top5 leagues
top5_passing = pd.read_html('https://fbref.com/en/comps/Big5/passing/squads/Big-5-European-Leagues-Stats')

# read in passing type data for top 5 leagues
top5_passtypes = pd.read_html('https://fbref.com/en/comps/Big5/passing_types/squads/Big-5-European-Leagues-Stats')

# read in goal creation data for top 5 leagues
top5_gca = pd.read_html('https://fbref.com/en/comps/Big5/gca/squads/Big-5-European-Leagues-Stats')

# read in defensive data for top 5 leagues
top5_def = pd.read_html('https://fbref.com/en/comps/Big5/defense/squads/Big-5-European-Leagues-Stats')

# read in possession data for top 5 leagues
top5_poss = pd.read_html('https://fbref.com/en/comps/Big5/possession/squads/Big-5-European-Leagues-Stats')

# read in misc data for top5 leagues
top5_misc = pd.read_html('https://fbref.com/en/comps/Big5/misc/squads/Big-5-European-Leagues-Stats')
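
The ten calls above differ only in one path segment, so the same scrape could be driven from a small lookup table. The following is a minimal sketch under stated assumptions: `BASE`, `PAGES`, `build_url`, and `fetch_all` are hypothetical names introduced here, and the pause between requests is an assumed courtesy delay, not a documented FBref limit.

```python
import time

import pandas as pd

# URL pattern taken from the calls above; only the page segment changes
BASE = 'https://fbref.com/en/comps/Big5/{page}/squads/Big-5-European-Leagues-Stats'

# keys are illustrative short names; values are the path segments used above
PAGES = {
    'std': 'stats',
    'shooting': 'shooting',
    'keepers': 'keepers',
    'advkeepers': 'keepersadv',
    'passing': 'passing',
    'passtypes': 'passing_types',
    'gca': 'gca',
    'def': 'defense',
    'poss': 'possession',
    'misc': 'misc',
}


def build_url(page):
    """Fill the page segment into the shared URL pattern."""
    return BASE.format(page=page)


def fetch_all(pages=PAGES, pause_seconds=5):
    """Return {short name: list of DataFrames found on that page}."""
    tables = {}
    for name, page in pages.items():
        tables[name] = pd.read_html(build_url(page))
        time.sleep(pause_seconds)  # assumed pause between requests
    return tables
```

Calling `fetch_all()` would return a dictionary whose `'std'` entry corresponds to `top5_std` above, and so on for the other pages.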



In [3]:
# print every table that pd.read_html found on each page so I can check which one to keep,
# then save the first table (index 0), which holds the squad stats, as that page's dataframe

def show_tables(tables):
    """Print each table in the list along with its position."""
    for idx, table in enumerate(tables):
        print('***************************')
        print(idx)
        print(table)

show_tables(top5_std)
t5_std = top5_std[0]

show_tables(top5_shooting)
t5_shooting = top5_shooting[0]

show_tables(top5_goalkeeping)
t5_keepers = top5_goalkeeping[0]

show_tables(top5_advgoalkeeping)
t5_advkeepers = top5_advgoalkeeping[0]

show_tables(top5_passing)
t5_passing = top5_passing[0]

show_tables(top5_passtypes)
t5_passtypes = top5_passtypes[0]

show_tables(top5_gca)
t5_gca = top5_gca[0]

show_tables(top5_def)
t5_def = top5_def[0]

show_tables(top5_poss)
t5_poss = top5_poss[0]

show_tables(top5_misc)
t5_misc = top5_misc[0]
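
Taking index 0 relies on the squad table being the first one on every page. As a hedged alternative, the table could be chosen by its contents. The sketch below uses a hypothetical helper, `pick_squad_table`, and toy frames rather than real FBref data.

```python
import pandas as pd


def pick_squad_table(tables):
    """Return the first table that has a 'Squad' column at its lowest header level."""
    for table in tables:
        cols = table.columns
        # read_html returns two header levels for these pages; use the bottom one
        labels = cols.get_level_values(-1) if isinstance(cols, pd.MultiIndex) else cols
        if 'Squad' in labels:
            return table
    raise ValueError("no table with a 'Squad' column found")


# toy stand-ins for the list that pd.read_html returns
other = pd.DataFrame({'x': [1]})
squad = pd.DataFrame(
    [[1, 'Team A']],
    columns=pd.MultiIndex.from_tuples([
        ('Unnamed: 0_level_0', 'Rk'),
        ('Unnamed: 1_level_0', 'Squad'),
    ]),
)

picked = pick_squad_table([other, squad])
```

With this helper, a line such as `t5_std = pick_squad_table(top5_std)` would not depend on the table's position in the list.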




***************************
0
   Unnamed: 0_level_0 Unnamed: 1_level_0  Unnamed: 2_level_0  \
                   Rk              Squad                Comp   
0                   1            Ajaccio          fr Ligue 1   
1                   2            Almería          es La Liga   
2                   3             Angers          fr Ligue 1   
3                   4            Arsenal  eng Premier League   
4                   5        Aston Villa  eng Premier League   
..                ...                ...                 ...   
93                 94         Villarreal          es La Liga   
94                 95      Werder Bremen       de Bundesliga   
95                 96           West Ham  eng Premier League   
96                 97          Wolfsburg       de Bundesliga   
97                 98             Wolves  eng Premier League   

   Unnamed: 3_level_0 Unnamed: 4_level_0 Unnamed: 5_level_0 Playing Time  \
                 # Pl                Age               Poss  

In [None]:
t5_std

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Expected,Expected,Expected,Expected,Expected
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,Gls,Sh,SoT,SoT%,Sh/90,...,G/SoT,Dist,FK,PK,PKatt,xG,npxG,npxG/Sh,G-xG,np:G-xG
0,1,Ajaccio,fr Ligue 1,36,38.0,22,311,81,26.0,8.18,...,0.20,18.1,12,6,9,36.1,29.4,0.10,-14.1,-13.4
1,2,Almería,es La Liga,29,38.0,49,439,155,35.3,11.55,...,0.30,18.1,21,2,3,45.5,43.2,0.10,3.5,3.8
2,3,Angers,fr Ligue 1,33,38.0,31,367,121,33.0,9.66,...,0.22,17.5,18,4,4,40.9,37.8,0.11,-9.9,-10.8
3,4,Arsenal,eng Premier League,26,38.0,84,589,194,32.9,15.50,...,0.42,16.0,16,3,4,71.9,69.1,0.12,12.1,11.9
4,5,Aston Villa,eng Premier League,26,38.0,49,427,145,34.0,11.24,...,0.32,18.0,18,3,4,50.2,47.2,0.11,-1.2,-1.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,38.0,57,499,188,37.7,13.13,...,0.28,18.0,22,4,7,61.7,56.1,0.12,-4.7,-3.1
94,95,Werder Bremen,de Bundesliga,25,34.0,50,363,136,37.5,10.68,...,0.33,17.9,18,5,6,41.1,36.4,0.10,8.9,8.6
95,96,West Ham,eng Premier League,25,38.0,41,466,133,28.5,12.26,...,0.26,17.3,13,6,8,49.2,43.7,0.10,-8.2,-8.7
96,97,Wolfsburg,de Bundesliga,28,34.0,55,401,138,34.4,11.79,...,0.37,16.9,11,4,7,51.3,45.8,0.12,3.7,5.2


In [None]:
t5_shooting

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Standard,Expected,Expected,Expected,Expected,Expected
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,Gls,Sh,SoT,SoT%,Sh/90,...,G/SoT,Dist,FK,PK,PKatt,xG,npxG,npxG/Sh,G-xG,np:G-xG
0,1,Ajaccio,fr Ligue 1,36,38.0,22,311,81,26.0,8.18,...,0.20,18.1,12,6,9,36.1,29.4,0.10,-14.1,-13.4
1,2,Almería,es La Liga,29,38.0,49,439,155,35.3,11.55,...,0.30,18.1,21,2,3,45.5,43.2,0.10,3.5,3.8
2,3,Angers,fr Ligue 1,33,38.0,31,367,121,33.0,9.66,...,0.22,17.5,18,4,4,40.9,37.8,0.11,-9.9,-10.8
3,4,Arsenal,eng Premier League,26,38.0,84,589,194,32.9,15.50,...,0.42,16.0,16,3,4,71.9,69.1,0.12,12.1,11.9
4,5,Aston Villa,eng Premier League,26,38.0,49,427,145,34.0,11.24,...,0.32,18.0,18,3,4,50.2,47.2,0.11,-1.2,-1.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,38.0,57,499,188,37.7,13.13,...,0.28,18.0,22,4,7,61.7,56.1,0.12,-4.7,-3.1
94,95,Werder Bremen,de Bundesliga,25,34.0,50,363,136,37.5,10.68,...,0.33,17.9,18,5,6,41.1,36.4,0.10,8.9,8.6
95,96,West Ham,eng Premier League,25,38.0,41,466,133,28.5,12.26,...,0.26,17.3,13,6,8,49.2,43.7,0.10,-8.2,-8.7
96,97,Wolfsburg,de Bundesliga,28,34.0,55,401,138,34.4,11.79,...,0.37,16.9,11,4,7,51.3,45.8,0.12,3.7,5.2


In [None]:
t5_keepers

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Playing Time,Playing Time,Playing Time,Unnamed: 7_level_0,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Penalty Kicks,Penalty Kicks,Penalty Kicks,Penalty Kicks,Penalty Kicks
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,MP,Starts,Min,90s,GA,GA90,...,W,D,L,CS,CS%,PKatt,PKA,PKsv,PKm,Save%
0,1,Ajaccio,fr Ligue 1,3,38,38,3420,38.0,74,1.95,...,7,5,26,6,15.8,10,9,0,1,0.0
1,2,Almería,es La Liga,3,38,38,3420,38.0,65,1.71,...,11,8,19,4,10.5,6,4,1,1,20.0
2,3,Angers,fr Ligue 1,2,38,38,3420,38.0,81,2.13,...,4,6,28,4,10.5,11,9,1,1,10.0
3,4,Arsenal,eng Premier League,1,38,38,3420,38.0,43,1.13,...,26,6,6,14,36.8,5,3,0,2,0.0
4,5,Aston Villa,eng Premier League,2,38,38,3420,38.0,46,1.21,...,18,7,13,12,31.6,6,5,1,0,16.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,3,38,38,3420,38.0,40,1.05,...,19,7,12,12,31.6,4,4,0,0,0.0
94,95,Werder Bremen,de Bundesliga,2,34,34,3060,34.0,64,1.88,...,10,6,18,4,11.8,3,2,1,0,33.3
95,96,West Ham,eng Premier League,2,38,38,3420,38.0,55,1.45,...,11,7,20,9,23.7,6,4,1,1,20.0
96,97,Wolfsburg,de Bundesliga,1,34,34,3060,34.0,48,1.41,...,13,10,11,12,35.3,7,4,1,2,20.0


In [None]:
t5_advkeepers

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Goals,Goals,Goals,Goals,Goals,...,Passes,Goal Kicks,Goal Kicks,Goal Kicks,Crosses,Crosses,Crosses,Sweeper,Sweeper,Sweeper
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,GA,PKA,FK,CK,OG,...,AvgLen,Att,Launch%,AvgLen,Opp,Stp,Stp%,#OPA,#OPA/90,AvgDist
0,1,Ajaccio,fr Ligue 1,3,38.0,74,9,1,11,2,...,39.0,260,82.7,51.8,507,26,5.1,60,1.58,15.9
1,2,Almería,es La Liga,3,38.0,65,4,1,5,1,...,37.0,350,52.0,42.9,642,40,6.2,28,0.74,12.4
2,3,Angers,fr Ligue 1,2,38.0,81,9,1,9,4,...,34.3,254,53.9,42.3,500,38,7.6,27,0.71,11.6
3,4,Arsenal,eng Premier League,1,38.0,43,3,0,6,1,...,33.3,153,59.5,49.2,381,22,5.8,43,1.13,16.1
4,5,Aston Villa,eng Premier League,2,38.0,46,5,1,9,4,...,33.4,240,51.7,41.4,528,65,12.3,76,2.00,16.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,3,38.0,40,4,0,4,1,...,28.6,221,34.4,35.7,548,37,6.8,45,1.18,14.9
94,95,Werder Bremen,de Bundesliga,2,34.0,64,2,0,6,2,...,36.5,291,50.5,43.2,496,25,5.0,42,1.24,15.0
95,96,West Ham,eng Premier League,2,38.0,55,4,0,4,1,...,38.9,279,58.1,43.5,559,31,5.5,26,0.68,12.9
96,97,Wolfsburg,de Bundesliga,1,34.0,48,4,0,5,3,...,33.7,258,57.0,47.5,489,29,5.9,51,1.50,16.5


In [None]:
t5_passing

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Total,Total,Total,Total,Total,...,Long,Unnamed: 19_level_0,Unnamed: 20_level_0,Unnamed: 21_level_0,Unnamed: 22_level_0,Unnamed: 23_level_0,Unnamed: 24_level_0,Unnamed: 25_level_0,Unnamed: 26_level_0,Unnamed: 27_level_0
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,Cmp,Att,Cmp%,TotDist,PrgDist,...,Cmp%,Ast,xAG,xA,A-xAG,KP,1/3,PPA,CrsPA,PrgP
0,1,Ajaccio,fr Ligue 1,36,38.0,11216,15286,73.4,206467,81586,...,50.1,12,22.7,23.4,-10.7,234,954,188,71,1169
1,2,Almería,es La Liga,29,38.0,12151,15855,76.6,225475,89276,...,51.9,33,32.5,28.1,0.5,316,846,218,81,1090
2,3,Angers,fr Ligue 1,33,38.0,14103,17463,80.8,252709,88748,...,56.6,18,28.0,27.6,-10.0,276,1124,214,65,1393
3,4,Arsenal,eng Premier League,26,38.0,18281,21969,83.2,310206,100635,...,57.1,64,53.8,46.8,10.2,443,1637,459,62,2049
4,5,Aston Villa,eng Premier League,26,38.0,13782,17396,79.2,244084,87581,...,52.2,35,38.8,31.6,-3.8,318,964,268,80,1242
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,38.0,16634,19862,83.7,295552,101196,...,63.7,33,43.1,38.3,-10.1,374,1304,349,52,1622
94,95,Werder Bremen,de Bundesliga,25,34.0,12443,16554,75.2,236998,93050,...,55.0,37,29.0,34.6,8.0,274,922,274,66,1140
95,96,West Ham,eng Premier League,25,38.0,12369,16358,75.6,219013,84296,...,50.2,25,32.7,26.8,-7.7,340,990,212,80,1274
96,97,Wolfsburg,de Bundesliga,28,34.0,12571,16129,77.9,245765,94282,...,57.0,44,39.2,40.4,4.8,303,1009,272,67,1306


In [None]:
t5_passtypes

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Pass Types,Pass Types,Pass Types,Pass Types,Pass Types,Pass Types,Pass Types,Pass Types,Corner Kicks,Corner Kicks,Corner Kicks,Outcomes,Outcomes,Outcomes
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,Att,Live,Dead,FK,TB,Sw,Crs,TI,CK,In,Out,Str,Cmp,Off,Blocks
0,1,Ajaccio,fr Ligue 1,36,38.0,15286,13368,1855,562,39,57,631,795,126,42,65,0,11216,63,349
1,2,Almería,es La Liga,29,38.0,15855,13865,1907,552,49,132,605,750,147,58,59,1,12151,83,308
2,3,Angers,fr Ligue 1,33,38.0,17463,15689,1727,483,40,92,584,705,164,98,42,0,14103,47,294
3,4,Arsenal,eng Premier League,26,38.0,21969,20293,1620,516,86,97,674,615,223,154,8,3,18281,56,379
4,5,Aston Villa,eng Premier League,26,38.0,17396,15544,1794,599,54,101,561,684,162,97,19,4,13782,58,325
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,38.0,19862,17980,1795,574,86,186,520,642,210,96,39,4,16634,87,347
94,95,Werder Bremen,de Bundesliga,25,34.0,16554,14903,1586,449,39,164,514,639,109,27,53,1,12443,65,324
95,96,West Ham,eng Premier League,25,38.0,16358,14625,1671,347,41,162,769,722,207,130,57,1,12369,62,390
96,97,Wolfsburg,de Bundesliga,28,34.0,16129,14311,1768,544,38,133,568,696,153,107,38,0,12571,50,315


In [None]:
t5_gca

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,SCA,SCA,SCA Types,SCA Types,SCA Types,SCA Types,SCA Types,SCA Types,GCA,GCA,GCA Types,GCA Types,GCA Types,GCA Types,GCA Types,GCA Types
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,SCA,SCA90,PassLive,PassDead,TO,...,Fld,Def,GCA,GCA90,PassLive,PassDead,TO,Sh,Fld,Def
0,1,Ajaccio,fr Ligue 1,36,38.0,539,14.18,350,72,22,...,51,10,38,1.00,23,0,3,5,5,2
1,2,Almería,es La Liga,29,38.0,758,19.95,533,73,51,...,33,17,76,2.00,51,7,8,7,3,0
2,3,Angers,fr Ligue 1,33,38.0,647,17.03,471,56,44,...,44,7,53,1.39,35,5,2,4,7,0
3,4,Arsenal,eng Premier League,26,38.0,1045,27.50,776,78,57,...,43,18,150,3.95,121,6,5,13,3,2
4,5,Aston Villa,eng Premier League,26,38.0,751,19.76,565,50,48,...,40,10,84,2.21,55,2,6,12,6,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,38.0,884,23.26,665,45,73,...,40,8,89,2.34,59,2,13,11,4,0
94,95,Werder Bremen,de Bundesliga,25,34.0,649,19.09,502,46,30,...,41,5,90,2.65,71,6,3,4,6,0
95,96,West Ham,eng Premier League,25,38.0,817,21.50,586,81,39,...,35,15,67,1.76,41,6,5,9,6,0
96,97,Wolfsburg,de Bundesliga,28,34.0,713,20.97,510,63,40,...,47,7,102,3.00,75,7,6,5,9,0


In [None]:
t5_def

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Tackles,Tackles,Tackles,Tackles,Tackles,...,Challenges,Challenges,Challenges,Blocks,Blocks,Blocks,Unnamed: 17_level_0,Unnamed: 18_level_0,Unnamed: 19_level_0,Unnamed: 20_level_0
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,Tkl,TklW,Def 3rd,Mid 3rd,Att 3rd,...,Att,Tkl%,Lost,Blocks,Sh,Pass,Int,Tkl+Int,Clr,Err
0,1,Ajaccio,fr Ligue 1,36,38.0,637,382,321,236,80,...,693,45.2,380,359,85,274,420,1057,670,10
1,2,Almería,es La Liga,29,38.0,557,322,292,202,63,...,623,44.9,343,435,122,313,322,879,855,13
2,3,Angers,fr Ligue 1,33,38.0,652,386,311,268,73,...,610,50.8,300,402,99,303,375,1027,605,13
3,4,Arsenal,eng Premier League,26,38.0,568,343,238,212,118,...,506,49.2,257,362,86,276,237,805,599,22
4,5,Aston Villa,eng Premier League,26,38.0,633,338,305,251,77,...,646,48.5,333,438,118,320,324,957,714,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,38.0,578,370,276,201,101,...,534,50.6,264,372,125,247,262,840,691,16
94,95,Werder Bremen,de Bundesliga,25,34.0,595,349,269,273,53,...,617,48.6,317,371,103,268,321,916,789,7
95,96,West Ham,eng Premier League,25,38.0,607,335,289,226,92,...,558,48.2,289,415,139,276,408,1015,810,10
96,97,Wolfsburg,de Bundesliga,28,34.0,543,324,255,206,82,...,521,50.9,256,443,116,327,292,835,701,9


In [None]:
t5_poss

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Touches,Touches,Touches,Touches,...,Carries,Carries,Carries,Carries,Carries,Carries,Carries,Carries,Receiving,Receiving
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,Poss,90s,Touches,Def Pen,Def 3rd,Mid 3rd,...,Carries,TotDist,PrgDist,PrgC,1/3,CPA,Mis,Dis,Rec,PrgR
0,1,Ajaccio,fr Ligue 1,36,43.2,38.0,19271,1979,6134,9380,...,12563,73525,32335,487,406,92,625,324,11114,1150
1,2,Almería,es La Liga,29,45.1,38.0,20070,2715,7746,8468,...,10790,56980,29134,569,432,131,580,281,12047,1080
2,3,Angers,fr Ligue 1,33,46.9,38.0,21515,2041,6588,10633,...,15521,89958,41416,632,596,149,593,349,13974,1374
3,4,Arsenal,eng Premier League,26,59.3,38.0,25909,1994,6632,11452,...,15923,84011,44201,824,583,281,526,378,18101,2024
4,5,Aston Villa,eng Premier League,26,49.3,38.0,21501,2783,7912,9077,...,12294,65583,33156,637,434,174,565,371,13587,1227
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,56.9,38.0,23815,2621,7659,10213,...,14391,81455,45472,846,570,239,552,343,16363,1617
94,95,Werder Bremen,de Bundesliga,25,49.3,34.0,20465,2390,7294,9382,...,10736,52169,25197,412,317,104,503,246,12329,1130
95,96,West Ham,eng Premier League,25,42.1,38.0,20412,2362,6899,8702,...,10539,57433,28201,614,413,165,460,355,12256,1266
96,97,Wolfsburg,de Bundesliga,28,50.6,34.0,20053,2304,7013,8781,...,11350,55720,28117,455,301,103,560,272,12444,1294


In [None]:
t5_misc

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Performance,Aerial Duels,Aerial Duels,Aerial Duels
Unnamed: 0_level_1,Rk,Squad,Comp,# Pl,90s,CrdY,CrdR,2CrdY,Fls,Fld,...,Crs,Int,TklW,PKwon,PKcon,OG,Recov,Won,Lost,Won%
0,1,Ajaccio,fr Ligue 1,36,38.0,86,10,2,541,495,...,631,420,382,8,10,2,2053,602,711,45.8
1,2,Almería,es La Liga,29,38.0,98,4,2,443,401,...,605,322,322,2,6,1,1890,482,530,47.6
2,3,Angers,fr Ligue 1,33,38.0,69,5,3,492,422,...,584,375,386,4,11,4,2025,436,441,49.7
3,4,Arsenal,eng Premier League,26,38.0,51,0,0,373,435,...,674,237,343,3,5,1,1984,486,559,46.5
4,5,Aston Villa,eng Premier League,26,38.0,80,1,0,417,498,...,561,324,338,4,6,4,1847,442,465,48.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,38.0,91,4,2,450,482,...,520,262,370,5,4,1,1819,357,375,48.8
94,95,Werder Bremen,de Bundesliga,25,34.0,76,1,0,426,369,...,514,321,349,3,3,2,1987,631,622,50.4
95,96,West Ham,eng Premier League,25,38.0,44,0,0,362,316,...,769,408,335,6,6,1,2023,620,587,51.4
96,97,Wolfsburg,de Bundesliga,28,34.0,63,0,0,390,451,...,568,292,324,6,7,3,1800,525,546,49.0


In [None]:
# now I have the individual tables of each group of team statistics. I will now merge these so that I have one big data set with each row containing
# all necessary statistics for a unique team

In [4]:
t5_std.columns

MultiIndex([('Unnamed: 0_level_0',       'Rk'),
            ('Unnamed: 1_level_0',    'Squad'),
            ('Unnamed: 2_level_0',     'Comp'),
            ('Unnamed: 3_level_0',     '# Pl'),
            ('Unnamed: 4_level_0',      'Age'),
            ('Unnamed: 5_level_0',     'Poss'),
            (      'Playing Time',       'MP'),
            (      'Playing Time',   'Starts'),
            (      'Playing Time',      'Min'),
            (      'Playing Time',      '90s'),
            (       'Performance',      'Gls'),
            (       'Performance',      'Ast'),
            (       'Performance',      'G+A'),
            (       'Performance',     'G-PK'),
            (       'Performance',       'PK'),
            (       'Performance',    'PKatt'),
            (       'Performance',     'CrdY'),
            (       'Performance',     'CrdR'),
            (          'Expected',       'xG'),
            (          'Expected',     'npxG'),
            (          'Expected',      

In [9]:
# remove overarching multi index level, just want to keep column specific names
t5_std = t5_std.droplevel(0, axis = 1)
t5_shooting = t5_shooting.droplevel(0, axis = 1)
t5_advkeepers = t5_advkeepers.droplevel(0, axis = 1)
t5_keepers = t5_keepers.droplevel(0, axis = 1)
t5_passing = t5_passing.droplevel(0, axis = 1)
t5_passtypes = t5_passtypes.droplevel(0, axis = 1)
t5_gca = t5_gca.droplevel(0, axis = 1)
t5_def = t5_def.droplevel(0, axis = 1)
t5_poss = t5_poss.droplevel(0, axis = 1)
t5_misc = t5_misc.droplevel(0, axis = 1)
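
Dropping the top level leaves repeated names wherever two groups share a stat label (the merged column list further down shows `Gls_x` more than once). One possible alternative, sketched here on toy data, is to keep the group as a prefix. `flatten_columns` is a hypothetical helper, and the `'Per 90 Minutes'` group label is an assumption for illustration.

```python
import pandas as pd


def flatten_columns(df):
    """Return a copy with one-level names, prefixing each stat with its group.

    Placeholder groups that pandas names 'Unnamed: ...' are left off.
    """
    flat = []
    for group, name in df.columns:
        if group.startswith('Unnamed:'):
            flat.append(name)
        else:
            flat.append(f'{group}_{name}')
    out = df.copy()
    out.columns = flat
    return out


# toy frame with the same two-level header shape as the scraped tables
demo = pd.DataFrame(
    [['Team A', 10, 0.5]],
    columns=pd.MultiIndex.from_tuples([
        ('Unnamed: 1_level_0', 'Squad'),
        ('Performance', 'Gls'),
        ('Per 90 Minutes', 'Gls'),  # assumed group label
    ]),
)

dropped = demo.droplevel(0, axis=1)  # same call as above: two columns named 'Gls'
flat = flatten_columns(demo)         # names stay unique
```

Here `dropped.columns.duplicated()` flags the second `Gls`, while `flat` keeps `Performance_Gls` and `Per 90 Minutes_Gls` apart.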

In [12]:
# merge all data frames on team name

top5_all = t5_std.merge(t5_shooting,on='Squad').merge(t5_advkeepers,on='Squad').merge(t5_keepers,on='Squad').merge(t5_def,on='Squad').merge(t5_gca,on='Squad').merge(t5_passing,on='Squad').merge(t5_passtypes,on='Squad').merge(t5_poss,on='Squad').merge(t5_misc,on='Squad')
top5_all


Unnamed: 0,Rk_x,Squad,Comp_x,# Pl_x,Age,Poss_x,MP_x,Starts_x,Min_x,90s_x,...,Crs_y,Int_y,TklW_y,PKwon,PKcon,OG_y,Recov,Won,Lost_y,Won%
0,1,Ajaccio,fr Ligue 1,36,29.1,43.2,38,418,3420,38.0,...,631,420,382,8,10,2,2053,602,711,45.8
1,2,Almería,es La Liga,29,26.4,45.1,38,418,3420,38.0,...,605,322,322,2,6,1,1890,482,530,47.6
2,3,Angers,fr Ligue 1,33,25.7,46.9,38,418,3420,38.0,...,584,375,386,4,11,4,2025,436,441,49.7
3,4,Arsenal,eng Premier League,26,24.7,59.3,38,418,3420,38.0,...,674,237,343,3,5,1,1984,486,559,46.5
4,5,Aston Villa,eng Premier League,26,27.0,49.3,38,418,3420,38.0,...,561,324,338,4,6,4,1847,442,465,48.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,94,Villarreal,es La Liga,32,28.4,56.9,38,418,3420,38.0,...,520,262,370,5,4,1,1819,357,375,48.8
94,95,Werder Bremen,de Bundesliga,25,27.4,49.3,34,374,3060,34.0,...,514,321,349,3,3,2,1987,631,622,50.4
95,96,West Ham,eng Premier League,25,28.2,42.1,38,418,3420,38.0,...,769,408,335,6,6,1,2023,620,587,51.4
96,97,Wolfsburg,de Bundesliga,28,24.8,50.6,34,374,3060,34.0,...,568,292,324,6,7,3,1800,525,546,49.0
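
Because the tables share column names such as `Rk`, `Comp`, and `90s`, the chained merge produces repeated `_x`/`_y` names like those in the output above. A minimal sketch of one way around this, assuming column names are already unique within each table: prefix every non-key column with its table's name, then merge each table exactly once. `merge_on_squad` is a hypothetical helper and the frames are toy data.

```python
from functools import reduce

import pandas as pd


def merge_on_squad(tables):
    """Merge a {name: DataFrame} dict on 'Squad', prefixing other columns with the name."""
    renamed = []
    for name, df in tables.items():
        new_names = {col: f'{name}_{col}' for col in df.columns if col != 'Squad'}
        renamed.append(df.rename(columns=new_names))
    # validate raises if a squad appears more than once in either table
    return reduce(
        lambda left, right: left.merge(right, on='Squad', validate='one_to_one'),
        renamed,
    )


# toy stand-ins for two of the stat tables
shooting = pd.DataFrame({'Squad': ['A', 'B'], '90s': [38.0, 34.0], 'Sh': [500, 400]})
gca = pd.DataFrame({'Squad': ['A', 'B'], '90s': [38.0, 34.0], 'SCA': [900, 700]})

merged = merge_on_squad({'shooting': shooting, 'gca': gca})
```

The result has one row per squad and columns `Squad`, `shooting_90s`, `shooting_Sh`, `gca_90s`, `gca_SCA`, so each statistic can be traced back to its source table.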


In [15]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [16]:
print(top5_all.columns.tolist())

['Rk_x', 'Squad', 'Comp_x', '# Pl_x', 'Age', 'Poss_x', 'MP_x', 'Starts_x', 'Min_x', '90s_x', 'Gls_x', 'Ast_x', 'G+A', 'G-PK', 'PK_x', 'PKatt_x', 'CrdY_x', 'CrdR_x', 'xG_x', 'npxG_x', 'xAG_x', 'npxG+xAG', 'PrgC_x', 'PrgP_x', 'Gls_x', 'Ast_x', 'G+A', 'G-PK', 'G+A-PK', 'xG_x', 'xAG_x', 'xG+xAG', 'npxG_x', 'npxG+xAG', 'Rk_y', 'Comp_y', '# Pl_y', '90s_y', 'Gls_y', 'Sh_x', 'SoT_x', 'SoT%_x', 'Sh/90_x', 'SoT/90_x', 'G/Sh_x', 'G/SoT_x', 'Dist_x', 'FK_x', 'PK_y', 'PKatt_y', 'xG_y', 'npxG_y', 'npxG/Sh_x', 'G-xG_x', 'np:G-xG_x', 'Rk_x', 'Comp_x', '# Pl_x', '90s_x', 'GA_x', 'PKA_x', 'FK_y', 'CK_x', 'OG_x', 'PSxG', 'PSxG/SoT', 'PSxG+/-', '/90', 'Cmp_x', 'Att_x', 'Cmp%_x', 'Att_x', 'Thr', 'Launch%', 'AvgLen', 'Att_x', 'Launch%', 'AvgLen', 'Opp', 'Stp', 'Stp%', '#OPA', '#OPA/90', 'AvgDist', 'Rk_y', 'Comp_y', '# Pl_y', 'MP_y', 'Starts_y', 'Min_y', '90s_y', 'GA_y', 'GA90', 'SoTA', 'Saves', 'Save%', 'W', 'D', 'L', 'CS', 'CS%', 'PKatt_x', 'PKA_y', 'PKsv', 'PKm', 'Save%', 'Rk_x', 'Comp_x', '# Pl_x', '90s_