### Reading and understanding the datasets

Tracking data can be joined with event data on 'timeelapsed' and 'current_phase'.

#### The tracking data
The tracking data contains the following columns:

+ 'current_phase': the current period
+ 'timeelapsed': the time in seconds of the current period 
+ 'team_id_opta': Opta team id
+ 'player_id': Opta player id
+ 'jersey_no': jersey number of the player
+ 'pos_x': x-coordinate on the pitch; pitch coordinates in [-52.5, 52.5]
+ 'pos_y': y-coordinate on the pitch; pitch coordinates in [-34, 34]
+ 'frame_count': unique identifier for each frame
+ 'team_id': indicates home (=1) or away (=2); team_id 4 is the ball
+ 'speed': speed
+ 'acc': acceleration
+ 'speed_x': speed regarding x-axis
+ 'speed_y': speed regarding y-axis
+ 'ball_x': x location of the ball
+ 'ball_y': y location of the ball
+ 'ball_speed': ball speed
+ 'ball_acc': ball acceleration
+ 'dop': direction of play of the team ('L' --> 'Left-to-Right'; 'R' --> 'Right-to-Left')
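
As a small illustration of the 'team_id' convention listed above, the sketch below splits a tracking table into home, away and ball rows and checks positions against the documented pitch range. The toy values are made up for illustration; only the column names and the 1/2/4 coding come from the list above.

```python
import pandas as pd

# Toy tracking frame (values made up for illustration)
toy_tracking = pd.DataFrame({
    'frame_count': [1, 1, 1, 2, 2, 2],
    'team_id':     [1, 2, 4, 1, 2, 4],
    'pos_x':       [-10.0, 12.5, 0.0, -9.5, 12.0, 1.0],
    'pos_y':       [3.0, -4.0, 0.0, 3.2, -4.1, 0.5],
})

# team_id: 1 = home, 2 = away, 4 = ball
home = toy_tracking[toy_tracking['team_id'] == 1]
away = toy_tracking[toy_tracking['team_id'] == 2]
ball = toy_tracking[toy_tracking['team_id'] == 4]

# Sanity check against the documented coordinate ranges
in_bounds = (toy_tracking['pos_x'].between(-52.5, 52.5)
             & toy_tracking['pos_y'].between(-34, 34))
print(len(home), len(away), len(ball), in_bounds.all())
```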

#### The event data (I believe each events JSON file is one match)
This event data is the Opta event data and contains the following columns:
+ 'event_type_id': the Opta event type identifier; see 'event_description' for an explanation
+ 'contestantId': id of the team
+ 'playerId': id of the player
+ 'current_phase': the current period
+ 'timeelapsed': the time in seconds of the current period
+ 'period_minute': the minute in which the game is currently
+ 'period_second': the second of the minute in which the game is currently
+ 'outcome': outcome of the event, 1=successful, 0=otherwise
+ 'event_description': descriptions of 'event_type_id' (see below)
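
Since both tables share 'current_phase' and 'timeelapsed', a join could look roughly like the sketch below. It is only an illustration on made-up toy data: the helper name `merge_events_with_tracking` and the intermediate `time_ms` column are assumptions, introduced so that the join uses integer milliseconds instead of float seconds.

```python
import pandas as pd

def merge_events_with_tracking(df_events, df_tracking):
    """Attach tracking rows to each event via (current_phase, timeelapsed)."""
    events = df_events.copy()
    tracking = df_tracking.copy()
    # Join on integer milliseconds rather than float seconds to avoid
    # mismatches caused by floating-point representation.
    events['time_ms'] = (events['timeelapsed'] * 1000).round().astype('int64')
    tracking['time_ms'] = (tracking['timeelapsed'] * 1000).round().astype('int64')
    tracking = tracking.drop(columns=['timeelapsed'])
    return events.merge(tracking, how='left', on=['current_phase', 'time_ms'])

# Toy data (made up for illustration)
toy_events = pd.DataFrame({
    'current_phase': [1, 1],
    'timeelapsed': [0.04, 2.84],
    'event_description': ['Pass', 'Pass'],
})
toy_tracking = pd.DataFrame({
    'current_phase': [1, 1, 1],
    'timeelapsed': [0.04, 0.04, 2.84],
    'player_id': ['a', 'b', 'a'],
    'pos_x': [0.0, 10.0, 5.0],
})

merged = merge_events_with_tracking(toy_events, toy_tracking)
print(merged)
```

Each event is repeated once per tracking row at the same instant, so a real frame would yield one row per player plus the ball.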

In [1]:
import pandas as pd
import json
import os
import numpy as np

pd.options.display.max_columns = 999

Read the tracking dataset and the mapping from each event id to its name

In [2]:
# load tracking data
current_directory = os.getcwd()
path_tracking = os.path.join(os.path.dirname(current_directory), 'data', 'tracking_set_0')
print(path_tracking)
game_id = 1

df_tracking = pd.read_parquet(f'{path_tracking}/{game_id}_tracking.parquet')

#           ------------------------------------------------------------        

# load events names
path_event_csv = os.path.join(os.path.dirname(current_directory),'data')
df_event_names = pd.read_csv(os.path.join(path_event_csv,'event_names.csv'))
dict_event_names = df_event_names.set_index('event_type_id').to_dict()['event_description']


c:\Users\Gabriel\OneDrive\Escritorio\SportsAnalyticsCourse\OptaForum\OptaChallenge_Clustering_Player_Styles\data\tracking_set_0


Read the event dataset, map each event to its name using the dictionary, and add a 'timeelapsed' column, which is the key that links events to the tracking data

In [8]:
# load event data
def load_event_data(file_name, base_path):
    # read in event file
    with open(f'{base_path}/{file_name}') as f:
        data = json.load(f)  # the with-block closes the file automatically
    
    # transform data into pandas dataframe
    df_events = pd.json_normalize(data['liveData']['event'])
    
    # preprocess event data and keep relevant information only

    # add timeelapsed to each event
    df_events['timestamp'] = pd.to_datetime(df_events.timeStamp).apply(lambda x: x.timestamp())

    df_events = df_events.query('periodId in [1,2]')

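    def add_timeelapsed_to_events(df):
        # work on a copy so the grouped chunk is not modified in place
        df = df.copy()
        # typeId 32 is the 'Period start' event; use it as time zero for the period
        start_time = df.query('typeId==32')['timestamp'].iloc[0]
        # elapsed time since the period start, in whole milliseconds
        df['timestamp_new'] = ((df['timestamp'] - start_time) * 1000).astype('int64')

        # snap to the nearest 40 ms step and convert back to seconds
        df['timeelapsed'] = df['timestamp_new'].apply(lambda x: (40 * round(x/40))/1000)

        return df

    # group_keys=False keeps the original index (the previous pandas behavior)
    df_events = df_events.groupby('periodId', group_keys=False).apply(add_timeelapsed_to_events)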

    df_events = df_events.drop(columns=['timeStamp','timestamp','timestamp_new'])
    
    # rename some columns
    df_events = df_events.rename(columns=
        {
            'periodId':'current_phase',
            'typeId':'event_type_id',
            'timeMin':'period_minute',
            'timeSec':'period_second'
        }
    )
    
    return df_events

path_events = os.path.join(os.path.dirname(current_directory), 'data', 'first_10_events')
print(path_events)

event_file = f'{game_id}.json'

df_events = load_event_data(
    base_path=path_events,
    file_name=event_file
)

# add event descriptions
df_events['event_description'] = df_events['event_type_id'].map(dict_event_names)

# make a copy of it for later usage
events_all = df_events.copy()

c:\Users\Gabriel\OneDrive\Escritorio\SportsAnalyticsCourse\OptaForum\OptaChallenge_Clustering_Player_Styles\data\first_10_events
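
The rounding step inside `add_timeelapsed_to_events` can be looked at in isolation. The sketch below is a vectorized variant, assuming the 40 ms step used in the cell above; the function name `snap_to_frame_grid` is hypothetical. Like Python's `round`, `np.round` rounds exact halves to the nearest even value.

```python
import numpy as np

def snap_to_frame_grid(elapsed_ms, step_ms=40):
    """Round elapsed milliseconds to the nearest multiple of step_ms, in seconds."""
    elapsed_ms = np.asarray(elapsed_ms, dtype='float64')
    return step_ms * np.round(elapsed_ms / step_ms) / 1000

# 55 ms -> 0.04 s, 61 ms -> 0.08 s, 2850 ms -> 2.84 s
snapped = snap_to_frame_grid([55, 61, 2850])
print(snapped)
```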




The event dataset has a 'qualifier' column. Each value is a nested list of dictionaries; merging it with qualifier_names.csv gives more detailed information about each event.

To do:
- Pick an event and see what information its qualifiers provide. Try it with different events.
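
A minimal sketch of that idea on toy data: explode the list so each qualifier gets its own row, flatten the dictionaries, and merge in the names. The helper name `explode_qualifiers` and the toy values are assumptions; the column names ('id', 'qualifier', 'qualifierId', 'value') follow the event data described above.

```python
import pandas as pd

def explode_qualifiers(events, qualifier_names):
    """Return one row per (event, qualifier) with the qualifier name attached."""
    exploded = events[['id', 'qualifier']].explode('qualifier')
    # events without qualifiers produce NaN rows; drop them
    exploded = exploded[exploded['qualifier'].notna()].reset_index(drop=True)
    flat = pd.json_normalize(exploded['qualifier'].tolist())
    # remember which event each qualifier belongs to
    flat['event_id'] = exploded['id'].to_numpy()
    return flat.merge(qualifier_names, how='left', on='qualifierId')

# Toy data (made up for illustration)
toy_events = pd.DataFrame({
    'id': [1, 2, 3],
    'qualifier': [
        [{'id': 10, 'qualifierId': 212, 'value': '24.7'},
         {'id': 11, 'qualifierId': 178}],
        [{'id': 12, 'qualifierId': 212, 'value': '5.0'}],
        float('nan'),
    ],
})
toy_names = pd.DataFrame({
    'qualifierId': [212, 178],
    'qualifier': ['Length', 'Standing'],
})

toy_flat = explode_qualifiers(toy_events, toy_names)
print(toy_flat)
```

Qualifiers without a 'value' key (such as 'Standing' here) end up with NaN in the 'value' column.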

In [26]:
display(df_events.head())
print(df_events['event_description'].unique())

Unnamed: 0,id,eventId,event_type_id,current_phase,period_minute,period_second,contestantId,outcome,x,y,lastModified,qualifier,playerId,lineBreakingPass.linesBroken.value,passOption.player,passTarget.player,xThreat.applied,lineBreakingPass.lastLineBroken.value,pressure.pressureReceived.value,pressure.player,xThreat.removed,keyPass,assist,timeelapsed,event_description
2,2423549045,2,32,1,0,0,3c3jcs7vc1t6vz5lev162jyv7,1,0.0,0.0,2022-05-22T03:17:52Z,"[{'id': 3586084711, 'qualifierId': 127, 'value...",,,,,,,,,,,,0.0,Period start
3,2423549041,2,32,1,0,0,bx0cdmzr2gwr70ez72dorx82p,1,0.0,0.0,2022-05-21T18:59:34Z,"[{'id': 3586084701, 'qualifierId': 127, 'value...",,,,,,,,,,,,0.0,Period start
4,2423549063,3,1,1,0,0,bx0cdmzr2gwr70ez72dorx82p,1,49.9,50.0,2022-05-22T03:34:41Z,"[{'id': 3586084825, 'qualifierId': 56, 'value'...",6u2ob6fv950r1qve8uejkq2uh,,,,,,,,,,,0.04,Pass
5,2423549097,4,1,1,0,2,bx0cdmzr2gwr70ez72dorx82p,1,31.5,57.2,2022-05-22T06:37:07Z,"[{'id': 3586085043, 'qualifierId': 213, 'value...",azuc3tma44xyrbgf5y279o1xx,0.0,"[{'playerId': 'e3kdoxu1kwn2w3wwi1rqhvr9x', 'sh...","[{'playerId': '7sep6mx2s67mh5fr3raxu7aei', 'sh...",0.0029771626,,,,,,,2.84,Pass
6,2423549113,5,1,1,0,7,bx0cdmzr2gwr70ez72dorx82p,1,49.2,95.4,2022-05-22T06:37:06Z,"[{'id': 3586085129, 'qualifierId': 212, 'value...",7sep6mx2s67mh5fr3raxu7aei,1.0,"[{'playerId': '5qgc6zjc38a5xjl35gs7h3vu1', 'sh...","[{'playerId': 'e3kdoxu1kwn2w3wwi1rqhvr9x', 'sh...",0.0309752524,secondToLast,high,"[{'playerId': 'e6ok0deqkoe80184iu509gzu2', 'sh...",,,,7.88,Pass


['Period start' 'Pass' 'Take On' 'Challenge' 'Blocked Pass'
 'Ball recovery' 'Attempted Tackle' 'Out' 'Ball touch' '50/50'
 'Dispossessed' 'Tackle' 'Corner Awarded' 'Clearance' 'Offside Pass'
 'Offside provoked' 'Foul' 'Aerial' 'Keeper pick-up' 'Deleted event'
 'Interception' 'Error' 'Goal' 'Attempt Saved' 'Save' 'Miss' 'Claim'
 'Card' 'Start delay' 'End delay' 'Referee Drop Ball' nan 'End'
 'Player Off' 'Player on' 'Formation change' 'Keeper Sweeper'
 'Shield ball opp']


In [11]:

# read in qualifier list
path_data = os.path.join(os.path.dirname(current_directory),'data')
qualifier_names = pd.read_csv(os.path.join(path_data,"qualifier_names.csv"))

# explode converts each element of each list into a separate row
cols = ['id', 'qualifier']
qualifiers = events_all[cols].explode('qualifier')
display(qualifiers.head())

print("------------")

qualifiers = qualifiers[qualifiers.qualifier.notna()].reset_index(drop=True)
print(qualifiers.shape)
print("------------")
display(qualifiers.head())
print("------------")

# save corresponding event ids for each qualifier
event_ids = qualifiers.id.tolist()

qualifiers = pd.json_normalize(qualifiers[qualifiers.qualifier.notna()]['qualifier'])
print(qualifiers.shape)
print("------------")
display(qualifiers.head())
print("------------")

qualifiers['event_id'] = event_ids
display(qualifiers.head())
print("------------")
qualifiers = qualifiers.merge(qualifier_names, how='left', on='qualifierId')
display(qualifiers.head())

Unnamed: 0,id,qualifier
2,2423549045,"{'id': 3586084711, 'qualifierId': 127, 'value'..."
3,2423549041,"{'id': 3586084701, 'qualifierId': 127, 'value'..."
4,2423549063,"{'id': 3586084825, 'qualifierId': 56, 'value':..."
4,2423549063,"{'id': 3586084833, 'qualifierId': 213, 'value'..."
4,2423549063,"{'id': 3586084827, 'qualifierId': 140, 'value'..."


------------
(9430, 2)
------------


Unnamed: 0,id,qualifier
0,2423549045,"{'id': 3586084711, 'qualifierId': 127, 'value'..."
1,2423549041,"{'id': 3586084701, 'qualifierId': 127, 'value'..."
2,2423549063,"{'id': 3586084825, 'qualifierId': 56, 'value':..."
3,2423549063,"{'id': 3586084833, 'qualifierId': 213, 'value'..."
4,2423549063,"{'id': 3586084827, 'qualifierId': 140, 'value'..."


------------
(9430, 3)
------------


Unnamed: 0,id,qualifierId,value
0,3586084711,127,Right to Left
1,3586084701,127,Left to Right
2,3586084825,56,Back
3,3586084833,213,2.7
4,3586084827,140,28.5


------------


Unnamed: 0,id,qualifierId,value,event_id
0,3586084711,127,Right to Left,2423549045
1,3586084701,127,Left to Right,2423549041
2,3586084825,56,Back,2423549063
3,3586084833,213,2.7,2423549063
4,3586084827,140,28.5,2423549063


------------


Unnamed: 0,id,qualifierId,value,event_id,qualifier
0,3586084711,127,Right to Left,2423549045,Direction of Play
1,3586084701,127,Left to Right,2423549041,Direction of Play
2,3586084825,56,Back,2423549063,Zone
3,3586084833,213,2.7,2423549063,Angle
4,3586084827,140,28.5,2423549063,Pass End X


In [15]:
# Example data for a pass

display(df_events[df_events['id']==2423549063])

display(qualifiers[qualifiers['event_id']==2423549063])

Unnamed: 0,id,eventId,event_type_id,current_phase,period_minute,period_second,contestantId,outcome,x,y,lastModified,qualifier,playerId,lineBreakingPass.linesBroken.value,passOption.player,passTarget.player,xThreat.applied,lineBreakingPass.lastLineBroken.value,pressure.pressureReceived.value,pressure.player,xThreat.removed,keyPass,assist,timeelapsed,event_description
4,2423549063,3,1,1,0,0,bx0cdmzr2gwr70ez72dorx82p,1,49.9,50.0,2022-05-22T03:34:41Z,"[{'id': 3586084825, 'qualifierId': 56, 'value'...",6u2ob6fv950r1qve8uejkq2uh,,,,,,,,,,,0.04,Pass


Unnamed: 0,id,qualifierId,value,event_id,qualifier
2,3586084825,56,Back,2423549063,Zone
3,3586084833,213,2.7,2423549063,Angle
4,3586084827,140,28.5,2423549063,Pass End X
5,3586084835,279,S,2423549063,Kick Off
6,3586084831,212,24.7,2423549063,Length
7,3586084829,141,65.0,2423549063,Pass End Y
8,3587583957,178,,2423549063,Standing


In [17]:
# Example data for a goal

display(df_events[df_events['event_description']=='Goal'])

display(qualifiers[qualifiers['event_id']==2423557685])

Unnamed: 0,id,eventId,event_type_id,current_phase,period_minute,period_second,contestantId,outcome,x,y,lastModified,qualifier,playerId,lineBreakingPass.linesBroken.value,passOption.player,passTarget.player,xThreat.applied,lineBreakingPass.lastLineBroken.value,pressure.pressureReceived.value,pressure.player,xThreat.removed,keyPass,assist,timeelapsed,event_description
184,2423557685,90,16,1,8,1,3c3jcs7vc1t6vz5lev162jyv7,1,77.3,38.5,2022-05-22T06:35:22Z,"[{'id': 3586133347, 'qualifierId': 20}, {'id':...",3vx94h32ahujciraspdayj9t6,,,,,,medium,"[{'playerId': 'azuc3tma44xyrbgf5y279o1xx', 'sh...",,,,481.16,Goal
349,2423568337,185,16,1,16,51,3c3jcs7vc1t6vz5lev162jyv7,1,76.7,57.0,2022-05-22T06:33:34Z,"[{'id': 3586920981, 'qualifierId': 395, 'value...",8gkexxgf3pypshhqwg6ibp7o4,,,,,,high,"[{'playerId': '4u281v53ges3kimtgac0tidm2', 'sh...",,,,1011.52,Goal
1517,2423644617,774,16,2,74,4,bx0cdmzr2gwr70ez72dorx82p,1,88.5,50.0,2022-05-22T04:00:16Z,"[{'id': 3586609837, 'qualifierId': 56, 'value'...",e3kdoxu1kwn2w3wwi1rqhvr9x,,,,,,,,,,,1744.84,Goal
1552,2423647403,792,16,2,76,33,bx0cdmzr2gwr70ez72dorx82p,1,85.4,49.9,2022-05-22T06:20:45Z,"[{'id': 3586625857, 'qualifierId': 22}, {'id':...",e3kdoxu1kwn2w3wwi1rqhvr9x,,,,,,medium,"[{'playerId': '8gkexxgf3pypshhqwg6ibp7o4', 'sh...",,,,1893.96,Goal
1617,2423653225,832,16,2,81,51,bx0cdmzr2gwr70ez72dorx82p,1,96.1,58.1,2022-05-22T06:44:01Z,"[{'id': 3586664227, 'qualifierId': 214}, {'id'...",e3kdoxu1kwn2w3wwi1rqhvr9x,,,,,,high,"[{'playerId': 'afymbx9eo87zau8mo99pakbu', 'shi...",,,,2211.04,Goal


Unnamed: 0,id,qualifierId,value,event_id,qualifier
948,3586133347,20,,2423557685,Right footed
949,3586904481,396,52.8,2423557685,GK Y Coordinate time of goal
950,3586136429,231,48.6,2423557685,GK Y Coordinate
951,3586133351,374,2022-05-21 20:07:34.824,2423557685,Goal shot timestamp
952,3586136409,80,,2423557685,Low Right
953,3586136411,102,45.8,2423557685,Goal Mouth Y Coordinate
954,3586136421,458,,2423557685,Not assisted
955,3586136425,230,94.7,2423557685,GK X Coordinate
956,3586136415,215,,2423557685,Individual play
957,3586131605,18,,2423557685,Out of box-centre


In [24]:
display(df_events[df_events['event_description']=='Challenge'].head())

display(df_events[df_events['eventId']==28].head())

display(qualifiers[qualifiers['event_id']==2423551705].head())

display(qualifiers[qualifiers['id']==3586099381].head())

Unnamed: 0,id,eventId,event_type_id,current_phase,period_minute,period_second,contestantId,outcome,x,y,lastModified,qualifier,playerId,lineBreakingPass.linesBroken.value,passOption.player,passTarget.player,xThreat.applied,lineBreakingPass.lastLineBroken.value,pressure.pressureReceived.value,pressure.player,xThreat.removed,keyPass,assist,timeelapsed,event_description
9,2423551705,47,45,1,0,10,3c3jcs7vc1t6vz5lev162jyv7,0,35.2,10.7,2022-05-22T06:37:05Z,"[{'id': 3586099111, 'qualifierId': 285}, {'id'...",fvd7y3f6948713acbas7w3u2,,,,,,,,,,,10.44,Challenge
111,2423553563,60,45,1,4,53,3c3jcs7vc1t6vz5lev162jyv7,0,77.0,92.5,2022-05-22T06:35:59Z,"[{'id': 3586109463, 'qualifierId': 233, 'value...",3vx94h32ahujciraspdayj9t6,,,,,,,,,,,293.28,Challenge
231,2423561407,121,45,1,11,2,3c3jcs7vc1t6vz5lev162jyv7,0,23.6,11.2,2022-05-22T06:34:53Z,"[{'id': 3586152025, 'qualifierId': 56, 'value'...",fvd7y3f6948713acbas7w3u2,,,,,,,,,,,662.44,Challenge
235,2423563507,140,45,1,11,6,3c3jcs7vc1t6vz5lev162jyv7,0,16.9,12.5,2022-05-22T06:34:52Z,"[{'id': 3586163367, 'qualifierId': 56, 'value'...",fvd7y3f6948713acbas7w3u2,,,,,,,,,,,666.16,Challenge
294,2423564439,155,45,1,13,37,3c3jcs7vc1t6vz5lev162jyv7,0,45.9,84.8,2022-05-22T06:34:18Z,"[{'id': 3586168433, 'qualifierId': 285}, {'id'...",3vx94h32ahujciraspdayj9t6,,,,,,,,,,,817.12,Challenge


Unnamed: 0,id,eventId,event_type_id,current_phase,period_minute,period_second,contestantId,outcome,x,y,lastModified,qualifier,playerId,lineBreakingPass.linesBroken.value,passOption.player,passTarget.player,xThreat.applied,lineBreakingPass.lastLineBroken.value,pressure.pressureReceived.value,pressure.player,xThreat.removed,keyPass,assist,timeelapsed,event_description
8,2423550707,28,3,1,0,10,bx0cdmzr2gwr70ez72dorx82p,1,64.8,89.3,2022-05-22T06:37:05Z,"[{'id': 3586099381, 'qualifierId': 233, 'value...",6u2ob6fv950r1qve8uejkq2uh,,,,,,high,"[{'playerId': 'fvd7y3f6948713acbas7w3u2', 'shi...",,,,10.44,Take On
57,2423550459,28,1,1,1,50,3c3jcs7vc1t6vz5lev162jyv7,0,48.7,3.4,2022-05-22T06:36:32Z,"[{'id': 3586092309, 'qualifierId': 213, 'value...",afymbx9eo87zau8mo99pakbu,,"[{'playerId': '6ekdnbnk56xlxforb5owt3dn9', 'sh...","[{'playerId': 'e6ok0deqkoe80184iu509gzu2', 'sh...",,,medium,"[{'playerId': '5qgc6zjc38a5xjl35gs7h3vu1', 'sh...",0.011734426,,,110.76,Pass


Unnamed: 0,id,qualifierId,value,event_id,qualifier
32,3586099111,285,,2423551705,Defensive
33,3586099107,56,Back,2423551705,Zone
34,3586099393,233,28,2423551705,Opposite related event ID


Unnamed: 0,id,qualifierId,value,event_id,qualifier
28,3586099381,233,47,2423550707,Opposite related event ID
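
The outputs above suggest that qualifier 233 ('Opposite related event ID') holds the 'eventId' of the matching event, and that 'eventId' is only unique within a team: the Challenge (eventId 47) points to 28, and eventId 28 appears once for each 'contestantId'. Under that reading, which is an interpretation of this one example rather than documented behavior, the lookup could be sketched as follows. The helper name `find_opposite_event` is hypothetical.

```python
import pandas as pd

def find_opposite_event(events, qualifiers, event_id):
    """Return the opposing team's event referenced by qualifier 233 of `event_id`."""
    source = events.loc[events['id'] == event_id].iloc[0]
    link = qualifiers[(qualifiers['event_id'] == event_id)
                      & (qualifiers['qualifierId'] == 233)]
    if link.empty:
        return events.iloc[0:0]  # empty frame with the same columns
    related_event_no = int(link['value'].iloc[0])
    # assumption: the related event belongs to the other team
    return events[(events['eventId'] == related_event_no)
                  & (events['contestantId'] != source['contestantId'])]

# Rows copied from the outputs above (only the columns needed here)
toy_events = pd.DataFrame({
    'id': [2423551705, 2423550707, 2423550459],
    'eventId': [47, 28, 28],
    'contestantId': ['3c3jcs7vc1t6vz5lev162jyv7',
                     'bx0cdmzr2gwr70ez72dorx82p',
                     '3c3jcs7vc1t6vz5lev162jyv7'],
    'event_description': ['Challenge', 'Take On', 'Pass'],
})
toy_qualifiers = pd.DataFrame({
    'event_id': [2423551705, 2423550707],
    'qualifierId': [233, 233],
    'value': ['28', '47'],
})

opposite = find_opposite_event(toy_events, toy_qualifiers, 2423551705)
print(opposite)
```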


#### Number of completed passes per playerId

First, filter the events to keep only passes where the player is identified and the outcome is 1 (successful)

In [52]:
# keep rows with a known playerId, event description 'Pass' and outcome 1 (successful)

df_pass_events = df_events[(df_events['playerId'].notna()) & (df_events['event_description']=='Pass') &
                           (df_events['outcome']==1)]
display(df_pass_events.head())
print(df_pass_events.shape)

Unnamed: 0,id,eventId,event_type_id,current_phase,period_minute,period_second,contestantId,outcome,x,y,lastModified,qualifier,playerId,lineBreakingPass.linesBroken.value,passOption.player,passTarget.player,xThreat.applied,lineBreakingPass.lastLineBroken.value,pressure.pressureReceived.value,pressure.player,xThreat.removed,keyPass,assist,timeelapsed,event_description
4,2423549063,3,1,1,0,0,bx0cdmzr2gwr70ez72dorx82p,1,49.9,50.0,2022-05-22T03:34:41Z,"[{'id': 3586084825, 'qualifierId': 56, 'value'...",6u2ob6fv950r1qve8uejkq2uh,,,,,,,,,,,0.04,Pass
5,2423549097,4,1,1,0,2,bx0cdmzr2gwr70ez72dorx82p,1,31.5,57.2,2022-05-22T06:37:07Z,"[{'id': 3586085043, 'qualifierId': 213, 'value...",azuc3tma44xyrbgf5y279o1xx,0.0,"[{'playerId': 'e3kdoxu1kwn2w3wwi1rqhvr9x', 'sh...","[{'playerId': '7sep6mx2s67mh5fr3raxu7aei', 'sh...",0.0029771626,,,,,,,2.84,Pass
6,2423549113,5,1,1,0,7,bx0cdmzr2gwr70ez72dorx82p,1,49.2,95.4,2022-05-22T06:37:06Z,"[{'id': 3586085129, 'qualifierId': 212, 'value...",7sep6mx2s67mh5fr3raxu7aei,1.0,"[{'playerId': '5qgc6zjc38a5xjl35gs7h3vu1', 'sh...","[{'playerId': 'e3kdoxu1kwn2w3wwi1rqhvr9x', 'sh...",0.0309752524,secondToLast,high,"[{'playerId': 'e6ok0deqkoe80184iu509gzu2', 'sh...",,,,7.88,Pass
7,2423549127,6,1,1,0,9,bx0cdmzr2gwr70ez72dorx82p,1,72.1,88.0,2022-05-22T06:37:05Z,"[{'id': 3586085187, 'qualifierId': 56, 'value'...",e3kdoxu1kwn2w3wwi1rqhvr9x,,"[{'playerId': '7cp51c8zn7y08iyk0hc9ix5nt', 'sh...","[{'playerId': '6u2ob6fv950r1qve8uejkq2uh', 'sh...",0.0338825583,,high,"[{'playerId': '8qmm84tue6kuz8e5nhhdhmz8p', 'sh...",,,,9.16,Pass
10,2423549153,7,1,1,0,11,bx0cdmzr2gwr70ez72dorx82p,1,63.6,94.3,2022-05-22T06:37:04Z,"[{'id': 3586085373, 'qualifierId': 140, 'value...",6u2ob6fv950r1qve8uejkq2uh,,"[{'playerId': '5qgc6zjc38a5xjl35gs7h3vu1', 'sh...","[{'playerId': '6j0ogojh2b7poyceg7i3k09yi', 'sh...",0.0111802518,,high,"[{'playerId': 'fvd7y3f6948713acbas7w3u2', 'shi...",,,,11.56,Pass


(843, 25)


Then group by playerId to get the total number of completed passes per player

In [55]:
passes_per_player = df_pass_events.groupby('playerId').size().reset_index(name='total passes')
print(passes_per_player)

                     playerId  total passes
0   2lvit204llltk13iglsa2tjah             1
1   3sc349yey596xp2j6xlyt0frp            44
2   3vx94h32ahujciraspdayj9t6            17
3   4u281v53ges3kimtgac0tidm2            42
4   5ak9fwtqlr2pll0nsv5br7p7u            12
5   5qgc6zjc38a5xjl35gs7h3vu1            26
6   6ekdnbnk56xlxforb5owt3dn9            39
7   6j0ogojh2b7poyceg7i3k09yi            58
8   6u2ob6fv950r1qve8uejkq2uh            50
9   72d5uxwcmvhd6mzthxuvev1sl            36
10  7cp51c8zn7y08iyk0hc9ix5nt            56
11  7sep6mx2s67mh5fr3raxu7aei            33
12  8f3bhiy6r5eei1n25exhbwr8p            17
13  8gkexxgf3pypshhqwg6ibp7o4            32
14  8qmm84tue6kuz8e5nhhdhmz8p            32
15  96wcx761pzv5ub4sfwsynp51x            48
16  976riwm0dz0e74d4l28y3ttcl            45
17  a56woizbe4g6jpl3fg4tlgno5            20
18   afymbx9eo87zau8mo99pakbu            25
19  agwvouyocx93y39g7tmwaojx1             5
20  azuc3tma44xyrbgf5y279o1xx            38
21  bvbebtykj45j3luvemk8yc4ph   
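
The filter and the count can also be combined into one helper. The sketch below runs on made-up toy data; the function name `completed_passes_per_player` is hypothetical, and sorting by count is an addition not present in the cell above.

```python
import pandas as pd

def completed_passes_per_player(events):
    """Count successful passes per identified player, most passes first."""
    mask = (events['playerId'].notna()
            & (events['event_description'] == 'Pass')
            & (events['outcome'] == 1))
    return (events[mask]
            .groupby('playerId').size()
            .reset_index(name='total passes')
            .sort_values('total passes', ascending=False, ignore_index=True))

# Toy data (made up for illustration)
toy_events = pd.DataFrame({
    'playerId': ['p1', 'p1', 'p2', None, 'p2', 'p1'],
    'event_description': ['Pass', 'Pass', 'Pass', 'Pass', 'Tackle', 'Pass'],
    'outcome': [1, 0, 1, 1, 1, 1],
})

toy_counts = completed_passes_per_player(toy_events)
print(toy_counts)
```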

#### Average pass length per playerId

Join the qualifiers with qualifier_names to get a dataframe with every qualifier of every event

In [57]:
# explode converts each element of each list into a separate row
cols = ['id','playerId','qualifier']
qualifiers = df_pass_events[cols].explode('qualifier')
qualifiers = qualifiers[qualifiers.qualifier.notna()].reset_index(drop=True)
display(qualifiers.head())

print("------------")

# save corresponding event ids for each qualifier
event_ids = qualifiers.id.tolist()
player_ids = qualifiers.playerId.tolist()

qualifiers = pd.json_normalize(qualifiers[qualifiers.qualifier.notna()]['qualifier'])
print(qualifiers.shape)

print("------------")

qualifiers['event_id'] = event_ids
qualifiers['player_id'] = player_ids
qualifiers = qualifiers.merge(qualifier_names, how='left', on='qualifierId')
display(qualifiers.head())

Unnamed: 0,id,playerId,qualifier
0,2423549063,6u2ob6fv950r1qve8uejkq2uh,"{'id': 3586084825, 'qualifierId': 56, 'value':..."
1,2423549063,6u2ob6fv950r1qve8uejkq2uh,"{'id': 3586084833, 'qualifierId': 213, 'value'..."
2,2423549063,6u2ob6fv950r1qve8uejkq2uh,"{'id': 3586084827, 'qualifierId': 140, 'value'..."
3,2423549063,6u2ob6fv950r1qve8uejkq2uh,"{'id': 3586084835, 'qualifierId': 279, 'value'..."
4,2423549063,6u2ob6fv950r1qve8uejkq2uh,"{'id': 3586084831, 'qualifierId': 212, 'value'..."


------------
(5439, 3)
------------


Unnamed: 0,id,qualifierId,value,event_id,player_id,qualifier
0,3586084825,56,Back,2423549063,6u2ob6fv950r1qve8uejkq2uh,Zone
1,3586084833,213,2.7,2423549063,6u2ob6fv950r1qve8uejkq2uh,Angle
2,3586084827,140,28.5,2423549063,6u2ob6fv950r1qve8uejkq2uh,Pass End X
3,3586084835,279,S,2423549063,6u2ob6fv950r1qve8uejkq2uh,Kick Off
4,3586084831,212,24.7,2423549063,6u2ob6fv950r1qve8uejkq2uh,Length


Filter to keep only the rows where qualifier == 'Length', then group by player_id and take the mean of 'value'

In [70]:
# take an explicit copy so the assignment below does not trigger SettingWithCopyWarning
qualifiers_length = qualifiers[qualifiers['qualifier']=='Length'].copy()

qualifiers_length['value'] = qualifiers_length['value'].astype(float)

avg_length_per_player = qualifiers_length.groupby('player_id')['value'].mean().reset_index(name='avg_length').sort_values(by='avg_length',ascending=False)

display(avg_length_per_player)




Unnamed: 0,player_id,avg_length
22,ccu7hw3wrcspl1a18g2ldnsh5,29.6
0,2lvit204llltk13iglsa2tjah,25.6
17,a56woizbe4g6jpl3fg4tlgno5,23.705
6,6ekdnbnk56xlxforb5owt3dn9,23.405128
19,agwvouyocx93y39g7tmwaojx1,22.74
14,8qmm84tue6kuz8e5nhhdhmz8p,21.79375
16,976riwm0dz0e74d4l28y3ttcl,19.682222
10,7cp51c8zn7y08iyk0hc9ix5nt,19.505357
18,afymbx9eo87zau8mo99pakbu,19.276
9,72d5uxwcmvhd6mzthxuvev1sl,19.038889
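
An average over very few passes can be misleading: in the table of completed passes, player 2lvit204llltk13iglsa2tjah has a single pass yet ranks second here. One possible refinement, sketched on made-up toy data, is to report the number of passes next to the mean. The column name `n_passes` is an assumption.

```python
import pandas as pd

# Toy long-format qualifiers (values made up for illustration)
toy_qualifiers = pd.DataFrame({
    'player_id': ['p1', 'p1', 'p2', 'p1', 'p2'],
    'qualifier': ['Length', 'Length', 'Length', 'Angle', 'Zone'],
    'value': ['10.0', '20.0', '25.6', '2.7', 'Back'],
})

length_rows = toy_qualifiers[toy_qualifiers['qualifier'] == 'Length'].copy()
# values are stored as strings; non-numeric entries would become NaN
length_rows['value'] = pd.to_numeric(length_rows['value'], errors='coerce')

summary = (length_rows
           .groupby('player_id')['value']
           .agg(avg_length='mean', n_passes='count')
           .reset_index()
           .sort_values('avg_length', ascending=False))
print(summary)
```

A minimum-passes threshold on `n_passes` could then be applied before ranking players.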


How many events there are, how many qualifiers each event has, and whether they are the same across events

In [59]:
print(len(qualifiers['event_id'].unique()))

print(qualifiers.groupby('event_id').size().reset_index(name='total qualifiers'))

print(qualifiers['qualifier'].value_counts())

843
       event_id  total qualifiers
0    2423549063                 7
1    2423549097                 6
2    2423549113                 6
3    2423549127                 7
4    2423549153                 6
..          ...               ...
838  2423664983                 7
839  2423665291                 8
840  2423665377                 6
841  2423665895                 9
842  2423668971                 7

[843 rows x 2 columns]
Zone                  843
Pass End X            843
Angle                 843
Length                843
Pass End Y            843
Standing              768
Long ball              79
Chipped                55
Not visible            35
Throw In               34
Head pass              30
Jumping                27
Free kick taken        20
Direct                 19
Keeper Throw           15
Goal Kick              13
Assist                 13
Lay-off                12
Intentional Assist     11
Low                    11
Switch of play         11
Cross             
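
The counts above show that some qualifiers (Zone, Length, ...) appear on all 843 passes while others are optional. A minimal sketch of how that check could be automated, on made-up toy data, is to compute the share of events carrying each qualifier. The names `coverage` and `always_present` are assumptions.

```python
import pandas as pd

# Toy long-format qualifiers (values made up for illustration)
toy_q = pd.DataFrame({
    'event_id':  [1, 1, 1, 2, 2, 3, 3, 3],
    'qualifier': ['Zone', 'Length', 'Standing',
                  'Zone', 'Length',
                  'Zone', 'Length', 'Long ball'],
})

n_events = toy_q['event_id'].nunique()
# share of events in which each qualifier appears at least once
coverage = (toy_q.groupby('qualifier')['event_id'].nunique() / n_events)
coverage = coverage.sort_values(ascending=False)

always_present = sorted(coverage[coverage == 1.0].index)
print(coverage)
print(always_present)
```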