# SoccerData
## Understat

Understat provides advanced statistics such as xG, xGBuildup, and xGChain, along with shot events and their associated xG values, for the top European leagues.

[Source URL](https://understat.com/) | 
[Example usage](https://soccerdata.readthedocs.io/en/latest/datasources/Understat.html) |
[API reference](https://soccerdata.readthedocs.io/en/latest/reference/understat.html) |
[Python source](https://github.com/probberechts/soccerdata/blob/master/soccerdata/understat.py)

In [2]:
import soccerdata as sd
import pandas as pd

# Display every column and row instead of truncating DataFrame output
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [3]:
# Create scraper class instance
understat = sd.Understat(leagues="ITA-Serie A", seasons="2023")
print(understat.__doc__)

Provides pd.DataFrames from data at https://understat.com.

    Data will be downloaded as necessary and cached locally in
    ``~/soccerdata/data/Understat``.

    Parameters
    ----------
    proxy : 'tor' or dict or list(dict) or callable, optional
        Use a proxy to hide your IP address. Valid options are:
            - "tor": Uses the Tor network. Tor should be running in
              the background on port 9050.
            - dict: A dictionary with the proxy to use. The dict should be
              a mapping of supported protocols to proxy addresses. For example::

                  {
                      'http': 'http://10.10.1.10:3128',
                      'https': 'http://10.10.1.10:1080',
                  }

            - list(dict): A list of proxies to choose from. A different proxy will
              be selected from this list after failed requests, allowing rotating
              proxies.
            - callable: A function that returns a valid proxy. This funct

### Leagues

In [3]:
leagues = understat.read_leagues()
leagues.head()

league,league_id,url
ITA-Serie A,2,https://understat.com/league/Serie_A


In [5]:
# Export the leagues table to JSON
leagues.to_json("./data/Understat/leagues.json")
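
`DataFrame.to_json` does not create missing directories, so the export fails with an `OSError` when `./data/Understat` is absent. Below is a minimal sketch of creating the folder first. The one-row `leagues` frame is a hand-built stand-in for the output above, and the temporary directory is an assumption that keeps the example self-contained; neither is part of soccerdata.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for understat.read_leagues(), copied from the output above.
leagues = pd.DataFrame(
    {"league_id": [2], "url": ["https://understat.com/league/Serie_A"]},
    index=pd.Index(["ITA-Serie A"], name="league"),
)

# Hypothetical output location: a temporary directory replaces ./data here.
out_dir = Path(tempfile.mkdtemp()) / "data" / "Understat"
out_dir.mkdir(parents=True, exist_ok=True)  # create missing parent folders

out_file = out_dir / "leagues.json"
leagues.to_json(out_file)

# Read the file back to confirm the league row survived the round trip.
restored = pd.read_json(out_file)
print(restored)
```

The same `mkdir` call would cover the other exports in this notebook, since they all write to the same folder.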

### Seasons

In [7]:
seasons = understat.read_seasons()
seasons.head()

league,season,league_id,season_id,url
ITA-Serie A,2324,2,2023,https://understat.com/league/Serie_A/2023


### Schedule

In [8]:
schedule = understat.read_schedule()
schedule.head()

league,season,game,league_id,season_id,game_id,date,home_team_id,away_team_id,home_team,away_team,away_team_code,home_team_code,home_goals,away_goals,home_xg,away_xg,is_result,has_data,url
ITA-Serie A,2324,2023-08-19 Empoli-Verona,2,2023,22305,2023-08-19 16:30:00,108,94,Empoli,Verona,VER,EMP,0,1,0.844727,0.690842,True,True,https://understat.com/match/22305
ITA-Serie A,2324,2023-08-19 Frosinone-Napoli,2,2023,22306,2023-08-19 16:30:00,112,105,Frosinone,Napoli,NAP,FRO,1,3,0.871157,1.31464,True,True,https://understat.com/match/22306
ITA-Serie A,2324,2023-08-19 Genoa-Fiorentina,2,2023,22308,2023-08-19 18:45:00,101,110,Genoa,Fiorentina,FIO,GEN,1,4,0.79732,1.1197,True,True,https://understat.com/match/22308
ITA-Serie A,2324,2023-08-19 Inter-Monza,2,2023,22307,2023-08-19 18:45:00,106,271,Inter,Monza,MON,INT,2,0,2.49549,0.581247,True,True,https://understat.com/match/22307
ITA-Serie A,2324,2023-08-20 Lecce-Lazio,2,2023,22312,2023-08-20 18:45:00,243,96,Lecce,Lazio,LAZ,LEC,2,1,1.75411,2.22215,True,True,https://understat.com/match/22312


In [10]:
schedule.to_json("./data/Understat/schedule.json")
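
Because the schedule holds both the score and each side's xG, the two can be compared directly. Below is a minimal sketch using three rows copied from the output above as a stand-in for `schedule`. It assumes `is_result` marks matches that have been played. The derived columns `home_goals_minus_xg` and `xg_winner` are illustrative names, not soccerdata fields.

```python
import pandas as pd

# Three fixtures copied from the schedule output above.
schedule = pd.DataFrame(
    {
        "home_team": ["Empoli", "Frosinone", "Inter"],
        "away_team": ["Verona", "Napoli", "Monza"],
        "home_goals": [0, 1, 2],
        "away_goals": [1, 3, 0],
        "home_xg": [0.844727, 0.871157, 2.49549],
        "away_xg": [0.690842, 1.31464, 0.581247],
        "is_result": [True, True, True],
    }
)

# Assumption: is_result is True only for matches that have been played.
played = schedule[schedule["is_result"]].copy()

# Positive values mean the home side scored more than its xG suggested.
played["home_goals_minus_xg"] = played["home_goals"] - played["home_xg"]

# Side with the higher xG, regardless of the actual score.
played["xg_winner"] = played["home_team"].where(
    played["home_xg"] > played["away_xg"], played["away_team"]
)

print(played[["home_team", "away_team", "home_goals_minus_xg", "xg_winner"]])
```

In this sample Empoli lost 0-1 despite generating the higher xG, which is the kind of mismatch the comparison is meant to surface.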

### Team match stats

In [9]:
team_match_stats = understat.read_team_match_stats()
team_match_stats.head()

league,season,game,league_id,season_id,game_id,date,home_team_id,away_team_id,home_team,away_team,away_team_code,home_team_code,away_points,away_expected_points,away_goals,away_xg,away_np_xg,away_np_xg_difference,away_ppda,away_deep_completions,home_points,home_expected_points,home_goals,home_xg,home_np_xg,home_np_xg_difference,home_ppda,home_deep_completions
ITA-Serie A,2324,2023-08-19 Empoli-Verona,2,2023,22305,2023-08-19 16:30:00,108,94,Empoli,Verona,VER,EMP,3,1.1689,1,0.690842,0.690842,-0.153885,6.695652,6,0,1.4686,0,0.844727,0.844727,0.153885,10.166667,3
ITA-Serie A,2324,2023-08-19 Frosinone-Napoli,2,2023,22306,2023-08-19 16:30:00,112,105,Frosinone,Napoli,NAP,FRO,3,1.6737,3,1.31464,1.31464,1.204782,11.1,10,0,0.9918,1,0.871157,0.109858,-1.204782,20.785714,4
ITA-Serie A,2324,2023-08-19 Genoa-Fiorentina,2,2023,22308,2023-08-19 18:45:00,101,110,Genoa,Fiorentina,FIO,GEN,3,1.6427,4,1.1197,1.1197,0.32238,10.1875,6,0,1.0316,1,0.79732,0.79732,-0.32238,16.08,5
ITA-Serie A,2324,2023-08-19 Inter-Monza,2,2023,22307,2023-08-19 18:45:00,106,271,Inter,Monza,MON,INT,0,0.19,0,0.581247,0.581247,-1.914243,14.111111,10,3,2.7046,2,2.49549,2.49549,1.914243,17.666667,9
ITA-Serie A,2324,2023-08-20 Lecce-Lazio,2,2023,22312,2023-08-20 18:45:00,243,96,Lecce,Lazio,LAZ,LEC,0,1.7201,1,2.22215,2.22215,0.46804,9.96,9,3,1.0589,2,1.75411,1.75411,-0.46804,15.714286,13


In [12]:
team_match_stats.tail()

league,season,game,league_id,season_id,game_id,date,home_team_id,away_team_id,home_team,away_team,away_team_code,home_team_code,away_points,away_expected_points,away_goals,away_xg,away_np_xg,away_np_xg_difference,away_ppda,away_deep_completions,home_points,home_expected_points,home_goals,home_xg,home_np_xg,home_np_xg_difference,home_ppda,home_deep_completions
ITA-Serie A,2324,2024-05-06 Salernitana-Atalanta,2,2023,22647,2024-05-06 16:00:00,264,107,Salernitana,Atalanta,ATA,SAL,3,2.6061,2,2.26499,2.26499,1.767923,7.157895,12,0,0.2637,1,0.497067,0.497067,-1.767923,15.954545,4
ITA-Serie A,2324,2024-05-06 Udinese-Napoli,2,2023,22654,2024-05-06 18:45:00,99,105,Udinese,Napoli,NAP,UDI,1,1.8862,1,1.1157,1.1157,0.593839,6.608696,7,1,0.796,1,0.521861,0.521861,-0.593839,21.875,4
ITA-Serie A,2324,2024-05-10 Frosinone-Inter,2,2023,22657,2024-05-10 18:45:00,112,106,Frosinone,Inter,INT,FRO,3,1.4947,5,2.05666,2.05666,0.16857,16.5,5,0,1.2292,0,1.88809,1.88809,-0.16857,44.111111,8
ITA-Serie A,2324,2024-05-11 AC Milan-Cagliari,2,2023,22663,2024-05-11 18:45:00,111,116,AC Milan,Cagliari,CAG,MIL,0,0.1948,1,1.33911,1.33911,-2.22394,25.352941,3,3,2.7178,5,3.56305,3.56305,2.22394,9.863636,8
ITA-Serie A,2324,2024-05-11 Napoli-Bologna,2,2023,22662,2024-05-11 16:00:00,105,97,Napoli,Bologna,BOL,NAP,3,1.5969,2,1.74055,1.74055,1.018688,16.55,3,0,1.0779,0,1.48316,0.721862,-1.018688,5.891892,3


In [14]:
team_match_stats.to_json("./data/Understat/team_match_stats.json")
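
The team match stats are stored wide, with one row per game and paired `home_*` / `away_*` columns. For per-team analysis it is often easier to have one row per team per game. Below is a minimal sketch of that reshape on two games copied from the output above. The helper `one_side` and the columns `venue` and `points_minus_expected` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Two games copied from the team match stats output above (subset of columns).
team_match_stats = pd.DataFrame(
    {
        "game_id": [22305, 22307],
        "home_team": ["Empoli", "Inter"],
        "away_team": ["Verona", "Monza"],
        "home_points": [0, 3],
        "away_points": [3, 0],
        "home_expected_points": [1.4686, 2.7046],
        "away_expected_points": [1.1689, 0.19],
    }
)


def one_side(df: pd.DataFrame, side: str) -> pd.DataFrame:
    """Return the columns for one side ('home' or 'away') with neutral names."""
    cols = {
        f"{side}_team": "team",
        f"{side}_points": "points",
        f"{side}_expected_points": "expected_points",
    }
    out = df[["game_id", *cols]].rename(columns=cols)
    out["venue"] = side
    return out


# Stack the home and away views: one row per team per game.
long_stats = pd.concat(
    [one_side(team_match_stats, "home"), one_side(team_match_stats, "away")],
    ignore_index=True,
)

# Positive values: the team took more points than its expected points.
long_stats["points_minus_expected"] = (
    long_stats["points"] - long_stats["expected_points"]
)

print(long_stats.sort_values("game_id"))
```

Once in this shape, a `groupby("team")` gives season totals of points against expected points without handling home and away columns separately.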

### Player season stats

In [4]:
player_season_stats = understat.read_player_season_stats()
player_season_stats.head()

league,season,team,player,league_id,season_id,team_id,player_id,position,matches,minutes,goals,xg,np_goals,np_xg,assists,xa,shots,key_passes,yellow_cards,red_cards,xg_chain,xg_buildup
ITA-Serie A,2324,AC Milan,Alejandro Jiménez,2,2023,111,12168,S,3,64,0,0.0,0,0.0,0,0.096809,0,1,1,0,0.177403,0.177403
ITA-Serie A,2324,AC Milan,Alessandro Florenzi,2,2023,111,1254,D S,29,1607,1,1.509824,1,1.509824,3,2.240234,31,36,5,0,8.196851,6.849538
ITA-Serie A,2324,AC Milan,Antonio Mirante,2,2023,111,1609,GK,1,90,0,0.0,0,0.0,0,0.0,0,0,0,0,0.0,0.0
ITA-Serie A,2324,AC Milan,Chaka Traorè,2,2023,111,9260,S,2,13,1,0.114,1,0.114,0,0.0,1,0,0,0,0.114,0.0
ITA-Serie A,2324,AC Milan,Christian Pulisic,2,2023,111,2662,F M S,34,2478,12,7.700121,12,7.700121,7,4.986092,58,41,2,0,18.770952,7.69354


In [5]:
player_season_stats.to_json("./data/Understat/player_season_stats.json")
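
Season totals favour players with more minutes, so a common next step is to scale them to per-90 rates. Below is a minimal sketch on three AC Milan rows copied from the output above. The 900-minute cutoff and the column names `xg_per90`, `xa_per90`, and `goals_minus_xg` are assumptions made for illustration.

```python
import pandas as pd

# Three players copied from the player season stats output above.
player_season_stats = pd.DataFrame(
    {
        "player": ["Alessandro Florenzi", "Antonio Mirante", "Christian Pulisic"],
        "minutes": [1607, 90, 2478],
        "goals": [1, 0, 12],
        "xg": [1.509824, 0.0, 7.700121],
        "xa": [2.240234, 0.0, 4.986092],
    }
).set_index("player")

# Hypothetical cutoff: drop small samples, whose per-90 rates are noisy.
MIN_MINUTES = 900
regulars = player_season_stats[
    player_season_stats["minutes"] >= MIN_MINUTES
].copy()

# Number of full 90-minute matches each player's minutes amount to.
nineties = regulars["minutes"] / 90

regulars["xg_per90"] = regulars["xg"] / nineties
regulars["xa_per90"] = regulars["xa"] / nineties

# Positive values: the player scored more goals than xG suggested.
regulars["goals_minus_xg"] = regulars["goals"] - regulars["xg"]

print(regulars[["minutes", "xg_per90", "xa_per90", "goals_minus_xg"]])
```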

### Player match stats

In [6]:
player_match_stats = understat.read_player_match_stats()
player_match_stats.head()

league,season,game,team,player,league_id,season_id,game_id,team_id,player_id,position,position_id,minutes,goals,own_goals,shots,xg,xa,xg_chain,xg_buildup
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Alberto Grassi,2,2023,22305,108,1504,DMC,7,89,0,0,0,0.0,0.021415,0.473683,0.452268
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Ardian Ismajli,2,2023,22305,108,9090,DC,3,90,0,0,0,0.0,0.0,0.05781,0.05781
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Elia Caprile,2,2023,22305,108,7087,GK,1,90,0,0,0,0.0,0.0,0.05781,0.05781
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Emmanuel Ekong,2,2023,22305,108,9856,Sub,17,1,0,0,0,0.0,0.016783,0.016783,0.0
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Emmanuel Gyasi,2,2023,22305,108,8913,AML,13,89,0,0,1,0.452268,0.0,0.519963,0.067695


In [8]:
player_match_stats.to_json("./data/Understat/player_match_stats.json")
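
Understat describes xGChain as the total xG of every possession a player is involved in, and xGBuildup as the same total excluding key passes and shots. The gap between the two columns therefore indicates how much of a player's involvement came from shooting or from the final pass. Below is a minimal sketch on four rows copied from the output above. The column names `chain_minus_buildup` and `xg_plus_xa` are illustrative. In this sample the gap happens to equal `xg + xa`, which is an observation about these rows and not a guaranteed identity.

```python
import pandas as pd

# Four Empoli players copied from the player match stats output above.
player_match_stats = pd.DataFrame(
    {
        "player": [
            "Alberto Grassi",
            "Ardian Ismajli",
            "Emmanuel Ekong",
            "Emmanuel Gyasi",
        ],
        "xg": [0.0, 0.0, 0.0, 0.452268],
        "xa": [0.021415, 0.0, 0.016783, 0.0],
        "xg_chain": [0.473683, 0.05781, 0.016783, 0.519963],
        "xg_buildup": [0.452268, 0.05781, 0.0, 0.067695],
    }
).set_index("player")

# Share of chain value that came from the player's own shots or key passes.
player_match_stats["chain_minus_buildup"] = (
    player_match_stats["xg_chain"] - player_match_stats["xg_buildup"]
)

# Direct contribution measured from the shot and assist columns.
player_match_stats["xg_plus_xa"] = (
    player_match_stats["xg"] + player_match_stats["xa"]
)

print(player_match_stats[["chain_minus_buildup", "xg_plus_xa"]])
```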

### Shot events

In [9]:
shot_events = understat.read_shot_events()
shot_events.head()

league,season,game,team,player,league_id,season_id,game_id,date,shot_id,team_id,player_id,assist_player_id,assist_player,xg,location_x,location_y,minute,body_part,situation,result
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Emmanuel Gyasi,2,2023,22305,2023-08-19 16:30:00,533365,108,8913,605284.0,Tommaso Baldanzi,0.452268,0.923,0.571,41,Right Foot,Open Play,Missed Shot
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Francesco Caputo,2,2023,22305,2023-08-19 16:30:00,533360,108,6980,,,0.067695,0.937,0.672,12,Right Foot,Open Play,Saved Shot
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Matteo Cancellieri,2,2023,22305,2023-08-19 16:30:00,533359,108,9071,605281.0,Alberto Grassi,0.021415,0.725,0.516,10,Left Foot,Open Play,Missed Shot
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Matteo Cancellieri,2,2023,22305,2023-08-19 16:30:00,533361,108,9071,,,0.064669,0.916,0.352,12,Right Foot,Open Play,Missed Shot
ITA-Serie A,2324,2023-08-19 Empoli-Verona,Empoli,Razvan Marin,2,2023,22305,2023-08-19 16:30:00,533358,108,8856,605280.0,Liberato Cacace,0.024054,0.79,0.677,7,Right Foot,Open Play,Saved Shot


In [11]:
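# Move the index levels (league, season, game, team, player) into regular
# columns so they are written as ordinary fields in the JSON export
shot_events.reset_index(inplace=True)
shot_events.to_json("./data/Understat/shot_events.json")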
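
The `location_x` and `location_y` values in the output above all lie between 0 and 1, which suggests they are fractions of the pitch and not metres. Below is a minimal sketch of converting them to an approximate shot distance, using three shots copied from the output. It rests on assumptions that the source does not confirm: that the attacked goal sits at `(1.0, 0.5)`, and that the pitch measures 105 m by 68 m. `distance_m` is an illustrative column name.

```python
import numpy as np
import pandas as pd

# Three shots copied from the shot events output above.
shot_events = pd.DataFrame(
    {
        "shot_id": [533365, 533360, 533359],
        "player": ["Emmanuel Gyasi", "Francesco Caputo", "Matteo Cancellieri"],
        "xg": [0.452268, 0.067695, 0.021415],
        "location_x": [0.923, 0.937, 0.725],
        "location_y": [0.571, 0.672, 0.516],
    }
).set_index("shot_id")

# ASSUMPTIONS (not confirmed by the source): coordinates are fractions of the
# pitch, the attacked goal is centred at (1.0, 0.5), and the pitch is
# 105 m long by 68 m wide.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0

# Offsets from the centre of the goal, in metres.
dx = (1.0 - shot_events["location_x"]) * PITCH_LENGTH_M
dy = (shot_events["location_y"] - 0.5) * PITCH_WIDTH_M

# Straight-line distance from the shot location to the centre of the goal.
shot_events["distance_m"] = np.hypot(dx, dy)

print(shot_events[["player", "xg", "distance_m"]].sort_values("distance_m"))
```

Under these assumptions the shot with the highest xG in the sample is also the closest to goal, which is a quick sanity check on the orientation of the coordinates.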