__Method__

A soccer match is played by 11 players on each side, so the most natural way to capture team patterns from optical tracking data is to use the players’ individual attributes. Given the data for every player, we could initialize the players in some canonical order and keep that order fixed throughout the match. For $N = 22$ players on the pitch, the data representation at a frame $t$ can then be written as $X_t = [x_1, y_1, x_2, y_2, x_3, y_3, \ldots, x_{22}, y_{22}]^T$.

Ordering players directly by jersey number seems like a good approach. However, it breaks down in several common situations: when a substitution occurs, when a player receives medical treatment off the pitch, when a player is sent off, or when different teams (i.e., different jersey numbers or identities) are analyzed. To handle these cases, as well as players’ positional interchanges, we conduct a role-based analysis: players are ordered by role, which makes a dynamic representation possible. A player can be assigned to different roles as they change position throughout the match, but holds only one role per frame. After grouping players into roles, we use the players’ $(x, y)$ coordinates and speed attributes to generate features, such as the average coordinates of a group and the attributes of its fastest player.

Soccer has well-established game strategies, such as the tendency of teams to keep their formation structure when the opposing team has the ball. To carry out the role assignment, we therefore calculate a cumulative average position for the players of a team using only the frames in which the rival team has the ball. The cumulative average is calculated over a window of the previous 15 minutes and updated every minute. For each frame, each player’s average $(x, y)$ position is scaled based on the dispersion of the players’ distribution.
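The 15-minute window that is refreshed every minute can be sketched as follows. This is only an illustration under stated assumptions: the class `RollingAveragePosition` and its method names are hypothetical and are not taken from `classes.DataModel`, whose internals are not shown here. The sketch keeps one accumulator per minute and drops the oldest minute once the window is full.

```python
from collections import deque


class RollingAveragePosition:
    """Hypothetical helper: average (x, y) of one player over the last N minutes,
    counting only samples taken while the rival team has the ball."""

    def __init__(self, window_minutes=15):
        # One [sum_x, sum_y, count] accumulator per minute; deque drops the oldest.
        self.buckets = deque(maxlen=window_minutes)
        self.buckets.append([0.0, 0.0, 0])

    def add(self, x, y, rival_has_ball):
        # Samples taken while the player's own team has the ball are ignored.
        if rival_has_ball:
            bucket = self.buckets[-1]
            bucket[0] += x
            bucket[1] += y
            bucket[2] += 1

    def next_minute(self):
        # Start a new minute; the window slides forward automatically.
        self.buckets.append([0.0, 0.0, 0])

    def average(self):
        count = sum(b[2] for b in self.buckets)
        if count == 0:
            return None  # no usable samples in the window
        return (sum(b[0] for b in self.buckets) / count,
                sum(b[1] for b in self.buckets) / count)


# Usage with a 2-minute window to keep the example short.
tracker = RollingAveragePosition(window_minutes=2)
tracker.add(10.0, 20.0, rival_has_ball=True)
tracker.add(50.0, 50.0, rival_has_ball=False)  # ignored
tracker.next_minute()
tracker.add(20.0, 40.0, rival_has_ball=True)
first_avg = tracker.average()
tracker.next_minute()                           # the first minute leaves the window
tracker.add(30.0, 0.0, rival_has_ball=True)
second_avg = tracker.average()
```

Storing per-minute sums rather than raw samples keeps the update cheap, since sliding the window only discards one accumulator.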

__How do we assign a role to each player?__
1. Determine which players are in the game.
2. Generate the players' average positions based on their positions in the last 15 minutes.
3. Update the average positions every minute.
4. Scale the average positions to the range $\left [\delta_{1} + \frac{1}{N} \sum_{i=1}^{N} \left ( p_i-\bar{p} \right )^{2}, \delta_{2} - \frac{1}{N} \sum_{i=1}^{N} \left ( p_i-\bar{p} \right )^{2} \right ]$ based on the dispersion of the average position data, using $\eqref{eq:pythagoras}$.
5. Separate the pitch into different role-based segments.
6. Assign a role to each player based on the given pitch segments.
7. Calculate the features for each player.

$$
\begin{equation}
p^{'}_{k} = \delta_{2}- \frac{\left( \left( \delta_{2}-\frac{1}{N} \sum_{i=1}^{N} \left ( p_i-\bar{p} \right )^{2} \right) - \left(\delta_{1}+\frac{1}{N} \sum_{i=1}^{N} \left ( p_i-\bar{p} \right )^{2}  \right)\right)\left(\alpha-p_{k}\right) }{\alpha  - \beta}
\label{eq:pythagoras}
\tag{1}
\end{equation}
$$

where $\delta_{1}$ and $\delta_{2}$ are, respectively, the start and end points of a segment for a given axis, $p_i$ is the average position of player $i$ on that axis, $\bar{p}$ is the mean of the $p_i$, $\alpha = \max_{1\leq i \leq N} p_i$, and $\beta = \min_{1\leq i \leq N} p_i$.
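The scaling step can be illustrated with a small sketch for a single axis. It is not the notebook's `get_scaled_values` implementation: the function name `scale_to_segment` is hypothetical, and the target interval is the one stated in step 4, with the dispersion term computed as the mean squared deviation. Eq. (1) as printed starts from $\delta_{2}$ itself, so the exact offsets used here should be treated as an assumption.

```python
def scale_to_segment(p, delta_1, delta_2):
    """Min-max scale the values in p into [delta_1 + d, delta_2 - d],
    where d = (1/N) * sum((p_i - mean)^2) is the dispersion term."""
    n = len(p)
    mean = sum(p) / n
    d = sum((v - mean) ** 2 for v in p) / n
    lo, hi = delta_1 + d, delta_2 - d
    alpha, beta = max(p), min(p)
    if alpha == beta:
        # Assumption: with no spread, place every value at the interval midpoint.
        return [(lo + hi) / 2.0] * n
    # The largest value maps to hi, the smallest to lo, the rest linearly between.
    return [hi - (hi - lo) * (alpha - v) / (alpha - beta) for v in p]


# Three average x positions scaled along a pitch axis running from 0 to 105.
avg_x = [10.0, 12.0, 14.0]
scaled = scale_to_segment(avg_x, delta_1=0.0, delta_2=105.0)
dispersion = 8.0 / 3.0  # mean squared deviation of avg_x
```

Because the mapping is linear, the ordering of the players along the axis is preserved; only their spread is stretched to fill the segment grid.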



<img src="../assets/scaling_exmp.png" width="400"/>

***Note:*** *The image above depicts the role assignment to players after scaling their average positions over the last 15 minutes.*

Detailed information on the features and their calculation can be found in classes.FeatureBuilder. The target data is derived from the player who has the ball. The raw data contains no information on the ball's location or state; we can only determine the ball's location when a specific player has the ball at that moment, in which case we take that player's location as the ball's location. Consequently, we have no ball information for a significant number of frames: those in which the ball travels from one player to another and those in which the game is stopped. We only use the frames in which some player has the ball.

**Raw data-dict structure for a single frame** (structure of per_sec_data_id.json):

{<br/>
 &emsp; **'distance'**: 3.1426183970482264, <br/>
 &emsp; **'half'**: 1, <br/>
 &emsp; **'hasballJerseyNumber'**: 6, <br/>
 &emsp; **'hasballTeamId'**: 101, <br/>
 &emsp; **'id'**: 280999, <br/>
 &emsp; **'jerseyNumber'**: 6, <br/>
 &emsp; **'matchId'**: 60568, <br/>
 &emsp; **'minute'**: 19, <br/>
 &emsp; **'second'**: 19, <br/>
 &emsp; **'speed'**: 3.139151535835957, <br/>
 &emsp; **'teamId'**: 101, <br/>
 &emsp; **'xpos'**: 34.366599999999686,<br/>
 &emsp; **'ypos'**: 23.508333333333262
 <br/>
 }
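Given a record of this shape, the ball location can be derived as in the following sketch. The helper name `ball_position_from_record` is hypothetical; the condition mirrors the check used in the feature-extraction loop further down, where a record yields a ball position only when it belongs to the player who currently has the ball.

```python
def ball_position_from_record(rec):
    """Return (x, y) of the ball if this record belongs to the ball holder, else None."""
    is_holder = (rec['hasballTeamId'] == rec['teamId']
                 and rec['hasballJerseyNumber'] == rec['jerseyNumber']
                 and rec['teamId'] != 0
                 and rec['jerseyNumber'] != 0)
    return (rec['xpos'], rec['ypos']) if is_holder else None


# The sample record shown above: player 6 of team 101 has the ball.
record = {'distance': 3.1426183970482264, 'half': 1, 'hasballJerseyNumber': 6,
          'hasballTeamId': 101, 'id': 280999, 'jerseyNumber': 6, 'matchId': 60568,
          'minute': 19, 'second': 19, 'speed': 3.139151535835957, 'teamId': 101,
          'xpos': 34.366599999999686, 'ypos': 23.508333333333262}
holder_pos = ball_position_from_record(record)

# The same frame seen from a teammate who does not have the ball.
teammate = dict(record, jerseyNumber=9)
teammate_pos = ball_position_from_record(teammate)
```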

In [1]:
import json
import pickle
import pandas as pd
import numpy as np
import csv
import math
from scipy.ndimage import shift  # the scipy.ndimage.interpolation namespace is deprecated
import operator
import scipy as sc

# our common models and utility functions
from classes.DataModel import *
from classes.PlayersRoleSet import *
from classes.FeatureBuilder import *
from classes.Utils import load_json_content

pd.options.display.max_columns = None
pd.set_option('compute.use_bottleneck', True)
pd.set_option('compute.use_numexpr', True)

### Using a different segment division and role assignment

In [16]:
# First, decide which segment division and role assignment to employ.
# Then, based on the role groups, all feature combinations are calculated in the next steps.

# BELOW IS AN EXAMPLE OF HOW TO OVERRIDE FUNCTIONS OF PLAYERSROLESET: YOU CAN TRY YOUR OWN STUFF

# new_role_segment_coords = {'LB':[-1,  40, -1, 19],    # Left Back 
#                       'CB':[-1,  35,  19, 49],   # Center Back
#                       'RB':[-1,  40,  49, 69],   # Right Back
#                       'LW':[ 40, 106, -1, 19],  # Left Winger
#                       'CM':[ 35,  75,  19, 49],  # Center Midfielder  
#                       'RW':[ 40, 106,  49, 69], # Right Winger
#                       'CF':[ 75, 106,  19, 49]}  # Center Forward/Attacking Middle

# we need to override our find_segments function in order to do the 2nd role assignment
# class PlayersRoleSet_New(PlayersRoleSet):
#     def find_segments(self, x, y):

#         for role, segment in self.role_segment_coords.items():
#             if (x>=segment[0] and x<=segment[1] and y>=segment[2] and y<=segment[3]):
#                 return role
            
#         return None

# playersRoleSet = PlayersRoleSet_New()
# playersRoleSet.set_role_segment_coords(new_role_segment_coords)
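The segment lookup in the commented example can be tried on its own, without the `PlayersRoleSet` class. The sketch below assumes each entry is `[x_min, x_max, y_min, y_max]`, which is how `find_segments` reads it, and uses a hypothetical standalone function `find_role`.

```python
# Segment coordinates copied from the commented example: [x_min, x_max, y_min, y_max].
role_segment_coords = {
    'LB': [-1, 40, -1, 19],   # Left Back
    'CB': [-1, 35, 19, 49],   # Center Back
    'RB': [-1, 40, 49, 69],   # Right Back
    'LW': [40, 106, -1, 19],  # Left Winger
    'CM': [35, 75, 19, 49],   # Center Midfielder
    'RW': [40, 106, 49, 69],  # Right Winger
    'CF': [75, 106, 19, 49],  # Center Forward / Attacking Midfielder
}


def find_role(x, y, segments=role_segment_coords):
    """Return the first role whose rectangle contains (x, y), or None."""
    for role, (x_min, x_max, y_min, y_max) in segments.items():
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return role
    return None


roles = [find_role(10, 10), find_role(50, 30), find_role(90, 30), find_role(200, 0)]
```

Because the rectangles share their borders, a point lying exactly on a boundary is assigned to whichever role comes first in the dictionary.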

In [2]:
# INIT PlayersRoleSet class
playersRoleSet = PlayersRoleSet()

In [17]:
# running this cell shows the list of all feature combinations we're going to use
featureBuilder = FeatureBuilder(list(playersRoleSet.get_role_segment_coords().keys()))
feature_list = featureBuilder.get_feature_labels()

feature_list[:10] # print the first 10 features

['home_LB_all_avrg_x',
 'home_LB_all_avrg_y',
 'home_LB_all_avrg_speed',
 'home_LB_all_avrg_direction_x',
 'home_LB_all_avrg_direction_y',
 'home_LB_all_inner_dis_to_avrg_pos',
 'home_LB_slow_avrg_x',
 'home_LB_slow_avrg_y',
 'home_LB_slow_avrg_speed',
 'home_LB_hir_avrg_x']

#### More detailed information on the feature list

Group features (home & away) | Separate team features (home & away) | Team features (all together) | Referee features
--- | --- | --- | ---
avrg x,y | dbscan$^{5*}$ avrg x,y | avrg x,y | x,y
avrg speed | dbscan avrg speed | avrg speed | speed
avrg speed direction on x,y | inner distance to dbscan | inner distance to avrg pos | direction on x,y
avrg pos | gk x,y | dbscan avrg x,y |
inner distance$^{1*}$ to avrg pos | gk speed | dbscan avrg speed |
slow$^{2*}$ players' avrg x,y | gk direction on x,y | inner distance to dbscan pos |
slow players' avrg speed | min x,y | |
hir$^{3*}$ players' avrg x,y | min (x,y)'s speed | |
hir players' avrg speed | min speed | |
sprint$^{4*}$ players' avrg x,y | min speed's x,y | |
sprint players' avrg speed | max x,y | |
 | max (x,y)'s speed | |
 | max speed | |
 | max speed's x,y | |
 
$^{1*}$Inner distance is the sum of the players' distances from the average position of the target cluster.
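As a rough illustration of the inner distance, the sketch below sums the Euclidean distances of a group's players from the group's average position. The function name `inner_distance` is hypothetical, and the use of the Euclidean metric is an assumption; the actual calculation lives in classes.FeatureBuilder.

```python
from math import hypot


def inner_distance(xs, ys):
    """Sum of each player's Euclidean distance to the group's average position."""
    if not xs:
        return 0.0  # assumption: an empty group contributes no distance
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    return sum(hypot(x - cx, y - cy) for x, y in zip(xs, ys))


# Two players at (0, 0) and (6, 8): the average position is (3, 4),
# and each player is 5 m away from it.
total = inner_distance([0.0, 6.0], [0.0, 8.0])
```

A small value indicates a compact group, while a large value indicates that the group's players are spread far from their common center.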

$^{2*, 3*, 4*}$Based on the strategic context of soccer, three speed groups were selected and their attributes were calculated. These groups are Slow, Hir (high-intensity run), and Sprint, where
$$
\left\{\begin{matrix}
\vec{V}_{slow}\leq1.5\frac{m}{s},\\
1.5\frac{m}{s}<\vec{V}_{hir}\leq 3\frac{m}{s},\\
3\frac{m}{s}<\vec{V}_{sprint}.
\end{matrix}\right.
$$

Each speed category within a group shows different characteristics in its distance to the ball's coordinates.
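The three speed groups can be expressed as a small classifier. This is a sketch with a hypothetical function name, `speed_category`; the thresholds and the handling of the boundary values (exactly 1.5 m/s and 3 m/s) follow the inequalities given above.

```python
def speed_category(speed):
    """Map a speed in m/s to 'slow', 'hir' (high-intensity run), or 'sprint'."""
    if speed <= 1.5:
        return 'slow'
    if speed <= 3.0:
        return 'hir'
    return 'sprint'


categories = [speed_category(v) for v in (1.0, 1.5, 2.0, 3.0, 3.5)]
```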


The average, maximum, and minimum $(x, y)$ coordinates of players were used to recognize the moments when players of the same team were approaching each other, and eventually approaching the ball as well, to gain more control of the ball. However, the average position of a role group did not give the desired values when some players in that group were tightly clustered and the remaining players were relatively remote.

For this reason, clusters were found by applying the density-based clustering method DBSCAN$^{5*}$ to all role groups. For both axes, we set the minimum number of samples in a neighborhood required to form a cluster to $k$ ($k = 7$ when calculating the teams-all-together features and $k = 4$ when calculating the separate team features) and the maximum distance between two samples in a neighborhood to 20 meters. The average $(x, y)$ coordinates of each role group were then calculated over the resulting clusters.
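To show why a cluster average can differ from the plain average, here is a minimal pure-Python DBSCAN sketch. It is not the implementation used by classes.FeatureBuilder, which may rely on a library. The assumptions are that distances are Euclidean on $(x, y)$ jointly, that a point counts itself among its neighbors, and that the average is taken over the largest cluster; the function names are hypothetical.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster index, or NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
    return labels


def largest_cluster_mean(points, labels):
    """Average (x, y) over the most populated cluster, or None if all noise."""
    clustered = [label for label in labels if label != NOISE]
    if not clustered:
        return None
    best = max(set(clustered), key=clustered.count)
    members = [p for p, label in zip(points, labels) if label == best]
    n = len(members)
    return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)


# Four players close together and one far away.
players = [(10, 10), (12, 11), (11, 13), (13, 12), (90, 60)]
labels = dbscan(players, eps=20, min_samples=4)
cluster_avg = largest_cluster_mean(players, labels)
plain_avg = (sum(p[0] for p in players) / 5, sum(p[1] for p in players) / 5)
```

In this toy example the remote player pulls the plain average toward the middle of the pitch, while the cluster average stays with the four players who are actually together.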

### Calculate features

In [3]:
# CONSTANTS
match_start_minute={1:0, 2:45} # indicate starting minutes of matches for each half
error_list = [60706, 60795, 60798, 60845, 116001326] # match IDs with data error  

# MATCH INFO LABELS
match_info =  {'match_id':None, 'half':None, 'minute':None, 'second':None, 'game_state':None, 'x':None, 'y':None}

In [4]:
# GET GENERAL INFO ABOUT MATCHES
matches = load_json_content('../data/general/matches_2017_2018_v1.json')

In [9]:
# FEATURE EXTRACTION

for match_data in matches:
    
    match_id = int(match_data['id'])
    
    if match_id not in error_list:
        

        match_info['match_id'] = match_id # set match id        
        
        # get roster data
        match_squad_info = load_json_content('../data/match_' + str(match_id) + '/roster_data_' + str(match_id) + '.json')
        # get player data
        match_player_info = load_json_content('../data/match_' + str(match_id) + '/players_data_' + str(match_id) + '.json')
        # get per-second data
        data_persec = load_json_content('../data/match_' + str(match_id) + '/per_sec_data_' + str(match_id) + '.json')


        # INIT CLASSES

        # INIT DataModel class
        dataModel = DataModel(minutes_step=15)

        # init players info
        dataModel.init_players_info(match_data, match_squad_info, match_player_info)

        # INIT FeatureBuilder class
        featureBuilder = FeatureBuilder(list(playersRoleSet.get_role_segment_coords().keys())) # get the role list and init the feature builder


        del match_squad_info, match_player_info
        
    
        # list to store features 
        features_set = []
        
        # reset ball position info: x-pos, y-pos, game state to define if game stopped or not
        ball_position = {'x':0, 'y':0, 'game_state':False}
    
        # variables tracking the current half and second
        match_start_half = 0
        prev_second = 0
        
        # ITERATE THROUGH EACH SECOND
        for data_t in data_persec:

            # AT THE BEGINNING OF EACH HALF, GET PLAYER INFO FOR THE SELECTED TIME WINDOW FOR ROLE ASSIGNMENT, THEN START ITERATION
            if data_t['half']!=match_start_half:

                dataModel.reset_positional_info()

                match_start_half=data_t['half']

                dataModel.calculate_fist_time_step_data(data_persec=data_persec, match_half=match_start_half, threshold_min=5)

                #----------  assign role  -------
                # get scaled values
                scale_values=playersRoleSet.get_scaled_values(dataModel=dataModel, posCounIndex=-2)

                # iterate through each player
                for team, _ in dataModel.player_data.items():
                    for ID, playerdata in dataModel.player_data[team].items():

                        # assign a role to the player if the sum of rolePosCount and the last minute's posCount is nonzero
                        dataModel.player_data[team][ID]['role'] = playersRoleSet.set_role(playerdata=playerdata, posCounIndex=-2, scale_values=scale_values, team=team)
                #----------  assign role  -------

                prev_minute = match_start_minute[match_start_half]



            # EACH MINUTE ASSIGN ROLE TO PLAYERS
            if prev_minute!=data_t['minute']:

                prev_minute=data_t['minute']

                #----------  assign role  -------
                # get scaled values
                scale_values=playersRoleSet.get_scaled_values(dataModel=dataModel, posCounIndex=-1)
            
            
                # iterate through each player
                for team, _ in dataModel.player_data.items():
                    for ID, playerdata in dataModel.player_data[team].items():

                        # assign a role to the player if the sum of rolePosCount and the last minute's posCount is nonzero
                        dataModel.player_data[team][ID]['role'] = playersRoleSet.set_role(playerdata=playerdata, posCounIndex=-1, scale_values=scale_values, team=team)
                #----------  assign role  -------

                # shift time-step data to the right after role assignment
                dataModel.shift_tms_data_right()



            # EACH SECOND CALCULATE FEATURES HERE
            # after collecting all players' values for a second, calculate the features at the end of that second
            
            if prev_second!=data_t['second']:

                prev_second=data_t['second']

                # make sure we have more than six players on the pitch for each team at the time
                if len(featureBuilder.role_data['home']['Team']['all_x'])>6 and len(featureBuilder.role_data['away']['Team']['all_x'])>6:

                        
                    # get match info and Y
                    match_info['half'] = data_t['half']
                    match_info['minute'] = data_t['minute']-1 if (data_t['second']==0) else data_t['minute']
                    match_info['second'] = 59 if (data_t['second']==0) else data_t['second']-1
                    match_info['game_state'] = 1 if ball_position['game_state']==True else 0
                    match_info['x'] = -1 if (ball_position['game_state']==False and ball_position['x']==0) else ball_position['x']
                    match_info['y'] = -1 if (ball_position['game_state']==False and ball_position['y']==0) else ball_position['y']

                    # get features and match info then add to main list
                    features_set.append(list(match_info.values()) + featureBuilder.calculate_features())


                # at the end, reset current_data dict-array
                featureBuilder.reset_referee_data()
                featureBuilder.reset_goalkeeper_data()
                featureBuilder.reset_role_data()
                
                # reset ball position info
                ball_position = {'x':0, 'y':0, 'game_state':False}

            if (data_t['xpos']>-1 and data_t['xpos']<106) and (data_t['ypos']>-1 and data_t['ypos']<69):
                
                # for role assignment get players data when the rival team has the ball
                dataModel.add_player_activity(data_t['teamId'], data_t['hasballTeamId'], data_t['jerseyNumber'], data_t['xpos'], data_t['ypos'])
                
                # round values to 2 decimal places
                xpos = round(data_t['xpos'], 2)
                ypos = round(data_t['ypos'], 2)
                speed = round(data_t['speed'], 2)
                    
            
                if (data_t['hasballTeamId']==data_t['teamId'] and data_t['hasballJerseyNumber']==data_t['jerseyNumber'] 
                    and data_t['teamId']!=0 and data_t['jerseyNumber']!=0):
                    ball_position['x'] = xpos
                    ball_position['y'] = ypos
                    ball_position['game_state'] = True
                

                if data_t['teamId']!=dataModel.referee_id:

                    team = dataModel.team_id_dict[data_t['teamId']]
                    role = dataModel.player_data[team][data_t['jerseyNumber']]['role']

                    if (role is not None and np.sum(dataModel.player_data[team][data_t['jerseyNumber']]['rolePosCount'])!=0):

                        if role!='GK':

                            # define speed category
                            speed_prefix='sprint' if speed>3 else 'slow' if speed<1.5 else 'hir'

                            for tmp_role in [role, 'Team']:
                                for tmp_speed in [speed_prefix, 'all']:

                                    # add player values
                                    featureBuilder.role_data[team][tmp_role][tmp_speed+'_x'].append(xpos)
                                    featureBuilder.role_data[team][tmp_role][tmp_speed+'_y'].append(ypos)
                                    featureBuilder.role_data[team][tmp_role][tmp_speed+'_speed'].append(speed)

                        else:

                            # add goalkeeper values
                            featureBuilder.goalkeeper_data[team]['x'].append(xpos)
                            featureBuilder.goalkeeper_data[team]['y'].append(ypos)
                            featureBuilder.goalkeeper_data[team]['speed'].append(speed)

                else:

                    # add referee values
                    featureBuilder.referee_data['x'].append(xpos)
                    featureBuilder.referee_data['y'].append(ypos)
                    featureBuilder.referee_data['speed'].append(speed)

        # SAVE FEATURES SET
        # newline='' prevents blank rows on some platforms; the with-block closes the file
        with open('../data/general/feature-set/match_{}.csv'.format(match_info['match_id']), 'w', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(list(match_info.keys())+featureBuilder.get_feature_labels())
            writer.writerows(features_set)
        
        del dataModel
        del featureBuilder
        del data_persec