# Data Exploration and first metrics computation
From Laurie Shaw:

The first step is simply to get the example script to run. It should produce a bunch of plots; obviously you'll have to adjust the file paths to wherever you saved the tracking data.

Once you've got it to run, try to understand what each line in the example script is doing. 

- The data is mostly stored in 'frames_tb', which is a list of individual frames, each an instance of the 'tracab_frame' class. Each frame contains the positions and velocities of the players and ball at a given instant in time. The data is sampled at 25 Hz, so there are 25 frames per second and about 140,000 frames for the whole match.
- The example code gives you some idea of how to extract positions and velocities over some range of frames. 
- The Tracab.py module describes how the data is organized: take a look at the 'tracab_frame' class to see the structure.
- Tracking_Visuals contains plotting routines, and Tracking_Velocities contains the code that calculates player and ball velocities from the positions (which could probably be done better).
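Tracking_Velocities itself isn't shown in this notebook. As one possible approach (not necessarily what that module does), velocities can be estimated from positions with central differences via `numpy.gradient`, after a light moving-average smoothing to damp tracking noise. A minimal sketch, assuming positions in metres sampled at 25 Hz; `moving_average` and `estimate_velocity` are hypothetical helper names:

```python
import numpy as np

FPS = 25  # tracking data is sampled at 25 Hz


def moving_average(a, window):
    """Centred moving average with edge padding (window must be odd)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(a, pad, mode="edge")  # repeat end values instead of zeros
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")  # same length as `a`


def estimate_velocity(x, y, fps=FPS, smooth_window=5):
    """Return (vx, vy, speed) from position series x, y sampled at `fps`.

    Positions are assumed to be in metres, so velocities come out in m/s.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if smooth_window > 1:
        x = moving_average(x, smooth_window)
        y = moving_average(y, smooth_window)
    dt = 1.0 / fps
    vx = np.gradient(x, dt)  # central differences in the interior
    vy = np.gradient(y, dt)
    return vx, vy, np.hypot(vx, vy)


# Usage: a player running at a constant 2 m/s along x for 2 seconds
t = np.arange(50) / FPS
vx, vy, speed = estimate_velocity(2.0 * t, np.zeros_like(t))
print(np.round(speed[5:-5].max(), 6))
```

Edge padding keeps the smoothed series the same length as the input without dragging the first and last positions towards zero; the first and last couple of samples are still slightly biased, so a more careful version might trim them.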

In [73]:
import os
import datetime
import importlib

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import Tracab as tracab
import Tracking_Visuals as vis

# reload Tracab so that edits to Tracab.py take effect without restarting the kernel
importlib.reload(tracab)

<module 'Tracab' from '/Users/jeffbalkanski/research_soccer/SoccerTrackingData3/Tracab.py'>

In [3]:
# config
parent_dir = os.path.dirname(os.getcwd())  # parent of the notebook's working directory
fpath = os.path.join(parent_dir, 'TrackingSample')  # path to directory of Tracab data
LEAGUE = 'DSL'

# Read Tracking data

We read the data:
* frames_tb is a list of the individual match snapshots (positions, velocities)
* match_tb contains some metadata (pitch dimensions, etc.)
* team1_players is a dictionary of the home team players, keyed by jersey number (containing arrays of their positions/velocities over the match)
* team0_players is a dictionary of the away team players, keyed by jersey number (containing arrays of their positions/velocities over the match)

In [52]:
# data
fname = '984628'

# read frames, match meta data, and data for individual players
frames_tb, match_tb, team1_players, team0_players = tracab.read_tracab_match_data(LEAGUE, fpath, fname, verbose=True)

Reading match metadata
Reading match tracking data
Timestamping frames
Measuring velocities
home goalkeeper(s):  [1]
away goalkeeper(s):  [73]
0 67615
67616 139808


In [53]:
print('there are {} frames'.format(len(frames_tb)))

there are 139810 frames
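As a quick consistency check, frame counts can be converted to match time using the 25 Hz sampling rate. A minimal sketch; the helper names are hypothetical, and it treats the period boundaries printed when loading (0 to 67615 and 67616 to 139808) as inclusive frame indices, which is an assumption:

```python
FPS = 25  # frames per second of the tracking feed


def frames_to_minutes(n_frames, fps=FPS):
    """Convert a frame count to minutes of match time."""
    return n_frames / fps / 60.0


def format_mmss(minutes):
    """Format fractional minutes as 'MM:SS' (seconds rounded)."""
    total_seconds = round(minutes * 60)
    return "{:d}:{:02d}".format(total_seconds // 60, total_seconds % 60)


print(format_mmss(frames_to_minutes(139810)))               # 93:12
print(format_mmss(frames_to_minutes(67616)))                # 45:05
print(format_mmss(frames_to_minutes(139808 - 67616 + 1)))   # 48:08
```

The two halves come out at 45:05 and 48:08, the same values as the First and Second game times in the physical summary.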


# Read physical summary

In [12]:
split_players = pd.read_csv(os.path.join(fpath, '984628_Physical_Summary_1_clean_players.csv'))
split_agg = pd.read_csv(os.path.join(fpath, '984628_Physical_Summary_1_clean_agg.csv'), index_col=0)

In [13]:
display(split_agg)

| | Total | First | Second |
|---|---|---|---|
| Game time | 93:13:00 | 45:05:00 | 48:08:00 |
| Ball in play | 57:51:00 | 27:12:00 | 30:39:00 |
| Home TIP | 27:23:00 | 12:36 | 14:47 |
| Away TIP | 29:37:00 | 13:57 | 15:40 |
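The times in this table are strings in two formats ('93:13:00' and '12:36'). A minimal sketch for turning them into seconds, assuming the first field is minutes, the second is seconds, and the trailing ':00' is a formatting artifact. Under that reading, the First and Second columns add up exactly to the Total column for all four rows, which supports it. The helper `mmss_to_seconds` and the frame `split_agg_sample` are hypothetical; the values are copied from the table above.

```python
import pandas as pd


def mmss_to_seconds(value):
    """Parse 'MM:SS' or 'MM:SS:00' strings into seconds (extra fields ignored)."""
    parts = str(value).split(":")
    return int(parts[0]) * 60 + int(parts[1])


# Values copied from the split_agg table above
split_agg_sample = pd.DataFrame(
    {"Total": ["93:13:00", "57:51:00", "27:23:00", "29:37:00"],
     "First": ["45:05:00", "27:12:00", "12:36", "13:57"],
     "Second": ["48:08:00", "30:39:00", "14:47", "15:40"]},
    index=["Game time", "Ball in play", "Home TIP", "Away TIP"],
)

# Convert every cell to seconds, column by column
seconds = split_agg_sample.apply(lambda col: col.map(mmss_to_seconds))

# Share of game time with the ball in play, per half and in total
in_play_share = seconds.loc["Ball in play"] / seconds.loc["Game time"]
print(in_play_share.round(3))
```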


In [14]:
split_players.head()

| | ID | team_id | Player | Minutes | Distance | Standing | Walking | Jogging | Running | High Speed Running | ... | Sprint Distance TIP | No. of High Intensity Runs TIP | Distance OTIP | HSR Distance OTIP | Sprint Distance OTIP | No. of High Intensity Runs OTIP | Distance BOP | HSR Distance BOP | Sprint Distance BOP | No. of High Intensity Runs BOP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 182413 | 1 | Jacob Rinne | 93:12:00 | 4528.57 | 20.3 | 3361.22 | 1029.15 | 96.67 | 21.23 | ... | 0.0 | 2 | 1641.41 | 13.59 | 0.0 | 2 | 1536.91 | 0.0 | 0.0 | 0 |
| 1 | 155453 | 1 | Kasper Pedersen | 93:12:00 | 9532.75 | 9.14 | 3549.7 | 4441.56 | 1081.65 | 322.31 | ... | 1.02 | 5 | 4216.26 | 271.79 | 127.37 | 34 | 2271.82 | 38.31 | 0.0 | 3 |
| 2 | 80502 | 1 | Jores Okore | 93:12:00 | 9691.12 | 7.04 | 3179.93 | 4849.27 | 1262.01 | 327.06 | ... | 0.0 | 8 | 4065.08 | 266.05 | 65.81 | 29 | 2413.78 | 6.52 | 0.0 | 4 |
| 3 | 180169 | 1 | Philipp Ochs | 93:12:00 | 10420.81 | 5.36 | 3619.81 | 4643.77 | 1455.0 | 563.26 | ... | 59.4 | 15 | 4308.79 | 392.9 | 74.21 | 36 | 2569.26 | 18.36 | 0.0 | 1 |
| 4 | 48601 | 1 | Patrick Kristensen | 93:12:00 | 10907.62 | 8.13 | 3216.4 | 4905.82 | 2093.52 | 569.47 | ... | 64.62 | 25 | 4381.58 | 376.43 | 49.66 | 33 | 2546.22 | 19.84 | 0.0 | 4 |


In [16]:
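# add info (assumes 'Minutes' strings are 'MM:SS:00', e.g. '93:12:00' = 93 min 12 s)
minutes = split_players['Minutes'].str.split(':').apply(lambda p: int(p[0]) + int(p[1]) / 60)
split_players['sub'] = minutes < minutes.max()  # played less than the full match
split_players['avg_speed'] = split_players['Distance'] / minutes  # metres per minute, assuming Distance is in metres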

## Plot physical summary


In [None]:
team1 = split_players[split_players['team_id'] == 1]
team2 = split_players[split_players['team_id'] == 2]
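The cell above only splits the table by team. As a hedged sketch of an actual plot for this section, the snippet below draws total distance per player as a sorted horizontal bar chart with matplotlib. To stay self-contained it rebuilds a small frame (`team1_sample`, hypothetical) from the five rows shown by `split_players.head()`; with the real data you would pass `team1` or `team2` instead. The function name `plot_distance` is hypothetical, and the axis label assumes Distance is in metres.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_distance(team_df, ax=None, title="Distance covered"):
    """Horizontal bar chart of total distance per player, shortest at the bottom."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 0.4 * len(team_df) + 1))
    ordered = team_df.sort_values("Distance")
    ax.barh(ordered["Player"], ordered["Distance"])
    ax.set_xlabel("Distance (m)")  # assumes the summary reports metres
    ax.set_title(title)
    return ax


# Rows copied from split_players.head() above (all team_id 1)
team1_sample = pd.DataFrame({
    "Player": ["Jacob Rinne", "Kasper Pedersen", "Jores Okore",
               "Philipp Ochs", "Patrick Kristensen"],
    "team_id": [1, 1, 1, 1, 1],
    "Distance": [4528.57, 9532.75, 9691.12, 10420.81, 10907.62],
})
ax = plot_distance(team1_sample, title="Team 1 distance covered")
```

Passing an existing `ax` makes it easy to draw both teams side by side with `plt.subplots(1, 2)`.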

# Read Type IDs

In [23]:
QualID_Descriptions = pd.read_csv('QualID_Descriptions.csv', index_col=0)
QualID_Descriptions.head()

| Qual_ID | Short | Long |
|---|---|---|
| 1 | Long ball | Long pass over 35 yards |
| 2 | Cross | A ball played in from wide areas into the box |
| 3 | Head pass | Pass made with a players head |
| 4 | Through ball | Ball played through for player making an attac... |
| 5 | Free kick taken | Any free kick; direct or indirect |


In [50]:
TypeID_Descriptions = pd.read_csv('TypeID_Descriptions.csv', index_col=0)
TypeID_Descriptions.head()

| | Event_id | Short | Long |
|---|---|---|---|
| 0 | 1 | Pass | Any pass attempted from one player to another ... |
| 1 | 2 | Offside Pass | Attempted pass made to a player who is in an o... |
| 2 | 3 | Take On | Attempted dribble past an opponent (excluding ... |
| 3 | 4 | Foul | This event is shown when a foul is committed r... |
| 4 | 5 | Out | Shown each time the ball goes out of play for ... |
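To turn numeric event codes into readable names, a description table can be converted into a lookup. A minimal sketch using the five TypeID rows shown above; it assumes `Event_id` is the code that appears in the event data, which this notebook does not show, and the `codes` series is hypothetical:

```python
import pandas as pd

# First rows of TypeID_Descriptions shown above (Long column omitted)
type_ids = pd.DataFrame({
    "Event_id": [1, 2, 3, 4, 5],
    "Short": ["Pass", "Offside Pass", "Take On", "Foul", "Out"],
})

# Map event type code -> short name
type_name = type_ids.set_index("Event_id")["Short"].to_dict()

# Hypothetical column of event codes, e.g. from an event feed
codes = pd.Series([1, 1, 3, 5, 4])
print(codes.map(type_name).tolist())  # ['Pass', 'Pass', 'Take On', 'Out', 'Foul']
```

Codes missing from the table map to NaN, so `.isna()` on the result quickly shows any event types the description file does not cover.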


# Understanding the code

In [67]:
# config
verbose = True

In [74]:
fmetadata, fdata = tracab.get_tracabdata_paths(fpath, fname, league=LEAGUE)
print(fmetadata, fdata)

/Users/jeffbalkanski/research_soccer/TrackingSample/984628_metadata.xml /Users/jeffbalkanski/research_soccer/TrackingSample/984628.dat


In [75]:
match = tracab.read_tracab_match(fmetadata)

In [76]:
match.match_attributes

{'iId': '984628',
 'dtDate': '2019-03-17 17:00:00',
 'iFrameRateFps': '25',
 'fPitchXSizeMeters': '105.00',
 'fPitchYSizeMeters': '68.00',
 'fTrackingAreaXSizeMeters': '111.00',
 'fTrackingAreaYSizeMeters': '88.00'}
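All attribute values come back as strings, so they need converting before any arithmetic. A minimal sketch (the helper `parse_match_attributes` is hypothetical) that reads the key prefixes as type tags, `i` for integers, `f` for floats and `dt` for timestamps; that convention is inferred from the keys above, not documented here:

```python
import datetime


def parse_match_attributes(attrs):
    """Convert the string-valued metadata dict into typed values.

    Assumes the key prefixes act as type tags: 'dt' -> datetime,
    'i' -> int, 'f' -> float. Other keys are kept as strings.
    """
    parsed = dict(attrs)
    for key, value in attrs.items():
        if key.startswith("dt"):
            parsed[key] = datetime.datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        elif key.startswith("i"):
            parsed[key] = int(value)
        elif key.startswith("f"):
            parsed[key] = float(value)
    return parsed


# Dictionary copied from match.match_attributes above
attrs = {
    'iId': '984628',
    'dtDate': '2019-03-17 17:00:00',
    'iFrameRateFps': '25',
    'fPitchXSizeMeters': '105.00',
    'fPitchYSizeMeters': '68.00',
    'fTrackingAreaXSizeMeters': '111.00',
    'fTrackingAreaYSizeMeters': '88.00',
}
meta = parse_match_attributes(attrs)

# How far the tracking area extends beyond the pitch on each side,
# assuming both rectangles share the same centre
margin_x = (meta['fTrackingAreaXSizeMeters'] - meta['fPitchXSizeMeters']) / 2
margin_y = (meta['fTrackingAreaYSizeMeters'] - meta['fPitchYSizeMeters']) / 2
print(meta['iFrameRateFps'], margin_x, margin_y)  # 25 3.0 10.0
```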

In [77]:
# read in tracking data
if verbose:
    print("Reading match tracking data")
frames = []
with open(fdata, "r") as fp:
    # each line is a single frame: 'frame_id:targets:ball:' followed by a newline
    for line in fp:
        # split on ':' and drop the last element (the trailing newline)
        chunks = line.split(':')[:-1]
        if len(chunks) > 3:
            raise ValueError('More than 3 chunks in line of data: {}'.format(chunks))

        frameid = int(chunks[0])
        frame = tracab.tracab_frame(frameid)

        # targets are ';'-separated, and the list ends with a trailing ';'
        targets = chunks[1].split(';')
        assert targets[-1] == ''
        for target in targets[:-1]:
            target = target.split(',')
            team = int(target[0])
            # keep team codes 1 (home), 0 (away) and 3 (assumed to be referees)
            if team in (1, 0, 3):
                frame.add_frame_target(target)

        # the ball, when present, is the third chunk (open question: is it ever missing?)
        if len(chunks) > 2:
            frame.add_frame_ball(chunks[2].split(';')[0].split(','))
        frames.append(frame)
            

Reading match tracking data


In [92]:
# inspect the chunks of the last line read (frame id, targets, ball)
print(chunks[0], '\n')
for x in chunks[1].split(';'):
    print(x)
print('\n', chunks[2])

1705810 

-1,1,-1,-3502,502,0.00
-1,2,-1,-3149,-116,0.00
-1,3,-1,-3200,412,0.00
-1,4,-1,-3025,-448,0.00
-1,5,-1,-3091,568,0.00
-1,6,-1,-3273,537,0.00
-1,7,-1,-3160,497,0.00
-1,8,-1,-3040,395,0.00
4,9,-1,5550,4400,0.00
-1,10,-1,-2642,-375,0.00
-1,11,-1,483,-215,0.00
-1,12,-1,-2868,-415,0.00
-1,13,-1,-3230,579,0.00
-1,14,-1,-3429,781,0.00
-1,15,-1,-2904,-475,0.00
-1,16,-1,509,-499,0.00
-1,17,-1,357,-3266,0.00
4,18,-1,5550,4400,0.00
-1,19,-1,-3197,259,0.00
4,20,-1,5550,4400,0.00
-1,21,-1,-3417,-33,0.00
-1,22,-1,-4245,-70,0.00
-1,23,-1,-250,-3419,0.00
-1,24,-1,-3092,164,0.00
-1,25,-1,-2596,-289,0.00
4,26,-1,5550,4400,0.00
4,27,-1,5550,4400,0.00
4,28,-1,5550,4400,0.00
4,29,-1,5550,4400,0.00


 5260,-3393,15,148.83,A,Dead;
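Based on the chunks printed above, each line looks like `frame_id:targets:ball:`, with targets separated by `;` and fields by `,`. The sketch below is a standalone parser for one such line. The field names (team, target id, jersey, x, y, speed for targets; x, y, z, speed, owning team, status for the ball) and the centimetre units are assumptions about the Tracab format, not taken from Tracab.py, so check them against the `tracab_frame` class. The centimetre assumption fits the parked targets at (5550, 4400), which sit exactly at half of the 111 m by 88 m tracking area. `parse_line`, `Target` and `Ball` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    team: int
    target_id: int
    jersey: int
    x: float      # metres (converted from assumed centimetres)
    y: float
    speed: float


class Ball(NamedTuple):
    x: float      # metres (converted from assumed centimetres)
    y: float
    z: float
    speed: float
    owner: str    # e.g. 'A' (assumed: team in possession)
    status: str   # e.g. 'Dead' (assumed: ball in or out of play)


def parse_line(line: str):
    """Parse one Tracab .dat line into (frame_id, targets, ball or None)."""
    chunks = line.rstrip("\n").split(":")[:-1]  # trailing ':' leaves an empty last piece
    frame_id = int(chunks[0])
    targets = []
    for raw in chunks[1].split(";"):
        if not raw:
            continue  # the target list ends with ';'
        team, tid, jersey, x, y, speed = raw.split(",")
        targets.append(Target(int(team), int(tid), int(jersey),
                              int(x) / 100, int(y) / 100, float(speed)))
    ball: Optional[Ball] = None
    if len(chunks) > 2 and chunks[2]:
        f = chunks[2].split(";")[0].split(",")
        ball = Ball(int(f[0]) / 100, int(f[1]) / 100, int(f[2]) / 100,
                    float(f[3]), f[4], f[5])
    return frame_id, targets, ball


# Reassembled from the chunks printed above, keeping two of the targets
sample_line = "1705810:-1,1,-1,-3502,502,0.00;4,9,-1,5550,4400,0.00;:5260,-3393,15,148.83,A,Dead;:\n"
frame_id, sample_targets, ball = parse_line(sample_line)
print(frame_id, sample_targets[1].x, sample_targets[1].y, ball.x, ball.status)  # 1705810 55.5 44.0 52.6 Dead
```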


# Reproducing split metrics 

In [93]:
import Tracab as tracab

# frames_tb, match_tb, team1_players and team0_players come from read_tracab_match_data above

In [94]:
match_tb, frames_tb[:2]

(<Tracab.tracab_match at 0x1c5096cef0>,
 [Frame id: 1536511, nplayers: 22, nrefs: 0, nballs: 1,
  Frame id: 1536512, nplayers: 22, nrefs: 0, nballs: 1])

In [95]:
tracab.get_goalkeeper_numbers(frames_tb)

home goalkeeper(s):  [1]
away goalkeeper(s):  [73]


([1], [73])

In [96]:
tracab.get_players(frames_tb[:100])

({32: <Tracab.tracab_player at 0x1c7d2989e8>,
  1: <Tracab.tracab_player at 0x1c7d298160>,
  2: <Tracab.tracab_player at 0x1c7d2981d0>,
  5: <Tracab.tracab_player at 0x1c7d298710>,
  7: <Tracab.tracab_player at 0x1c7d298f98>,
  8: <Tracab.tracab_player at 0x1c7d298240>,
  9: <Tracab.tracab_player at 0x1c7d298358>,
  10: <Tracab.tracab_player at 0x1c7d298a58>,
  11: <Tracab.tracab_player at 0x1c7d298cc0>,
  17: <Tracab.tracab_player at 0x1c7d2986a0>,
  25: <Tracab.tracab_player at 0x1c7d298be0>},
 {35: <Tracab.tracab_player at 0x1c7d298780>,
  5: <Tracab.tracab_player at 0x1c7d298518>,
  7: <Tracab.tracab_player at 0x1c7d2984a8>,
  8: <Tracab.tracab_player at 0x1c7d298208>,
  73: <Tracab.tracab_player at 0x1c7d298f28>,
  11: <Tracab.tracab_player at 0x1c7d298898>,
  13: <Tracab.tracab_player at 0x1c7d2989b0>,
  16: <Tracab.tracab_player at 0x1c7d2985c0>,
  18: <Tracab.tracab_player at 0x1c7d298ef0>,
  20: <Tracab.tracab_player at 0x1c7d2984e0>,
  22: <Tracab.tracab_player at 0x1c7d298fd

In [97]:
# an example frame
frame_ex = frames_tb[1]

## Minutes on the field

In [105]:
# find number of minutes on the field
player = team0_players[8]

start, end = player.frame_timestamps[0], player.frame_timestamps[-1]
print(end - start)

48.12866666666667


In [110]:
for player in team0_players.values():
    # find number of minutes on the field
    start, end = player.frame_timestamps[0], player.frame_timestamps[-1]
    print(end - start)

48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
48.12866666666667
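Every one of the 13 away players gets the same value, about 48.13 minutes, which matches the 48:08 second half in the physical summary rather than the full match, and which cannot distinguish substitutes from starters. This suggests (not checked against Tracab.py) that `frame_timestamps` restart at the start of each half, so last-minus-first only measures the second half. A hedged sketch of a period-aware alternative: split the timestamps wherever the clock jumps backwards and sum the span of each segment. The function name and the assumption that timestamps are in minutes are ours; for per-player minutes it would also need to be restricted to frames in which the player is actually tracked, which depends on how `tracab_player` stores absences.

```python
import numpy as np


def minutes_played(timestamps):
    """Total time spanned by a timestamp series that restarts each period.

    Assumes timestamps are in minutes and only decrease at a period
    boundary (the clock resetting at half time).
    """
    t = np.asarray(timestamps, dtype=float)
    if t.size == 0:
        return 0.0
    # indices where the clock jumps backwards mark the start of a new period
    breaks = np.where(np.diff(t) < 0)[0] + 1
    segments = np.split(t, breaks)
    return float(sum(seg[-1] - seg[0] for seg in segments))


# Synthetic 25 Hz example: a 45-minute first half, then a 48-minute
# second half whose clock restarts at 0
frames_per_minute = 25 * 60
first_half = np.arange(45 * frames_per_minute + 1) / frames_per_minute
second_half = np.arange(48 * frames_per_minute + 1) / frames_per_minute
t = np.concatenate([first_half, second_half])

print(t[-1] - t[0])                  # 48.0, the naive last-minus-first
print(round(minutes_played(t), 6))   # 93.0
```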


In [87]:
match_tb.fPitchXSizeMeters

105.0

In [88]:
team0_players[73].jersey_num

73