# Explore kickoff frames to inform replay segmentation strategy

(need to add some words here about why replay segmentation is useful/necessary)

In [11]:
# Imports
from impulse import ReplayDataset
from pathlib import Path
import matplotlib.pyplot as plt
import pandas as pd

In [5]:
project_root = Path('/Users/david/dev/impulse')
data_dir = project_root / 'replays' / 'parsed'
db_path = project_root / 'impulse.db'

replays = ReplayDataset(db_path=str(db_path), data_dir=str(data_dir))

First, let's look at the first 20 frames of every replay in our dataset. 

In [13]:
# Ball position and linear velocity columns to inspect
ball_cols = [
    'Ball - position x', 'Ball - position y', 'Ball - position z',
    'Ball - linear velocity x', 'Ball - linear velocity y', 'Ball - linear velocity z',
]
frames_list = [replay.frames.iloc[:20][ball_cols] for replay in replays]

In [14]:
all_frames = pd.concat(frames_list)
all_frames 

Unnamed: 0,Ball - position x,Ball - position y,Ball - position z,Ball - linear velocity x,Ball - linear velocity y,Ball - linear velocity z
0,-65.029999,-51.560001,104.410004,-1362.949951,-105.309998,113.160004
1,-201.100006,-62.099998,112.190002,-1358.750000,-104.949997,47.849998
2,-314.170013,-70.800003,113.690002,-1355.250000,-104.650002,-6.390000
3,-471.980011,-82.980003,108.230003,-1350.349976,-104.230003,-82.019997
4,-584.349976,-91.680000,98.930000,-1346.849976,-103.930000,-135.880005
...,...,...,...,...,...,...
15,3048.169922,1532.510010,836.900024,2602.610107,365.420013,283.100006
16,3308.000000,1568.989990,861.640015,2594.689941,364.220001,217.279999
17,3567.040039,1605.349976,879.809998,2586.770020,363.019989,151.669998
18,3782.300049,1635.550049,889.950012,2580.169922,362.019989,97.139999


It appears that either the parser (subtr-actor) or the sub-sampling of the raw frames cuts off the initial kickoff setup state at the start of most games. Only one game above shows the pattern that indicates a kickoff setup state within its first 20 frames, namely `0.0` in `Ball - position x`, `Ball - position y`, and the ball linear velocity columns; the rest have non-zero values in these columns throughout their first 20 rows.

This should be fine for segmenting replays into individual stretches of play, however, as long as kickoff resets are reliably detectable mid-game. If they are, we can segment from the start of the array to the first kickoff reset, then from each kickoff reset to the next, and finally from the last kickoff reset to the end of the array.
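As a rough sketch of that segmentation scheme, the helper below turns a list of kickoff-reset regions into slice boundaries. The function name `segment_bounds` and its input format (inclusive `(first, last)` row positions, sorted by frame) are assumptions made for illustration, not part of `impulse`.

```python
def segment_bounds(n_frames, kickoff_ranges):
    """Split [0, n_frames) into half-open (start, end) slices at each kickoff reset.

    n_frames: total number of rows in the frames array.
    kickoff_ranges: sorted list of inclusive (first, last) row positions,
        one per kickoff setup region (hypothetical input format).
    """
    # Each segment starts at the first row of a kickoff setup region
    cuts = [first for first, _ in kickoff_ranges]
    edges = [0] + cuts + [n_frames]
    # Pair consecutive edges; skip empty segments (e.g. a kickoff at row 0)
    return [(a, b) for a, b in zip(edges[:-1], edges[1:]) if b > a]


bounds = segment_bounds(3187, [(936, 957), (1169, 1189)])
print(bounds)
```

Each pair can then be used as `frames.iloc[start:end]`. Cutting at the first row of a kickoff region keeps the setup frames together with the play that follows them; cutting at the last row instead would be an equally reasonable choice.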

Let's check a handful of replays to verify that they follow this pattern.

To be careful, let's check games played on different maps, in case origin coordinates or kickoff configurations differ from map to map.

We start by creating a dictionary mapping each `replay_id` to the `map` name that the corresponding game was played on:

In [16]:
replay_id_to_map = {replay.replay_id: replay.map for replay in replays}
replay_id_to_map

{'0328fc07-13fb-4cb6-9a86-7d608196ddbd': 'NeoTokyo_Standard_P',
 '960a9f35-821d-4c43-8c94-2d52697ecdeb': 'Underwater_GRS_P',
 '3b823061-0ae0-486a-933a-dfe229c17c3d': 'UtopiaStadium_Dusk_P',
 '960b8433-fafd-4e0d-9370-571190e44dae': 'Stadium_P',
 '1e62a0d0-6c47-4b12-a784-b60d9b883632': 'CHN_Stadium_P',
 'eacc54f9-06cf-49c0-a9b1-238aed98eca3': 'EuroStadium_Night_P',
 'f597731e-4fd0-445d-a00c-03f867508bc3': 'cs_p',
 'd1b3c0c8-466e-4855-89b5-793b3a127ca5': 'NeoTokyo_Standard_P',
 '583e67ad-7b6f-4163-9cf1-81b0d21accf8': 'Underwater_GRS_P',
 '3037096d-17ff-41a5-83d9-ce0cdd271740': 'UtopiaStadium_Dusk_P',
 'b3c8af60-0e68-4ab8-af8e-0b72b60b4e68': 'Stadium_P',
 '4f08f62b-69cc-4452-b26f-2fe437f50ab7': 'CHN_Stadium_P',
 '14ddbdfb-2d50-4da7-bfa2-9e26004b51fd': 'EuroStadium_Night_P',
 '7582bdce-4d45-43c7-9370-da766b1bad3a': 'cs_p',
 '7653defe-924e-45be-a581-39f15f498301': 'NeoTokyo_Standard_P',
 '373599ae-9f3d-4432-adbf-7592a04edcd8': 'Underwater_GRS_P',
 'e23bd48e-260b-4d4c-975c-0f624ec19ac3': 'Uto

For our purposes, we want a dictionary that connects each unique `map` name to all the `replay_id`s played on that map, so that for each different map we can randomly select a replay played on that map. 

In [17]:
map_set = set(replay_id_to_map.values())
map_set

{'CHN_Stadium_P',
 'EuroStadium_Night_P',
 'NeoTokyo_Standard_P',
 'Stadium_P',
 'Underwater_GRS_P',
 'UtopiaStadium_Dusk_P',
 'cs_p'}

In [18]:
len(map_set)

7

Having seven distinct maps makes sense: our replays are RLCS games played in series of at most seven games (best-of-seven), and the order of maps in a BO7 is the same from series to series.

In [19]:
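# Invert the mapping: map name -> list of replay ids played on that map
# (the loop variable is named `played_on` to avoid shadowing the built-in `map`)
map_to_replay_ids = {
    map_name: [replay_id for replay_id, played_on in replay_id_to_map.items() if played_on == map_name]
    for map_name in map_set
}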
map_to_replay_ids

{'Underwater_GRS_P': ['960a9f35-821d-4c43-8c94-2d52697ecdeb',
  '583e67ad-7b6f-4163-9cf1-81b0d21accf8',
  '373599ae-9f3d-4432-adbf-7592a04edcd8',
  '7a74acad-5a28-40c3-9fcb-b01a3bd12e4a',
  'b911b75f-7cd3-4625-b545-687153786631',
  '32beb85e-0d3b-4b34-9010-72d2ff3321ba',
  'e1576578-ccb6-4848-ac41-a4e55313d836',
  'c3ddc3e1-838b-4da3-bf8c-6cc2f49f355e',
  'f912a587-1bb9-46a8-b0bc-23d248e4fe09'],
 'cs_p': ['f597731e-4fd0-445d-a00c-03f867508bc3',
  '7582bdce-4d45-43c7-9370-da766b1bad3a',
  '44a1d0bf-c067-4132-aebf-c9726ce2809e',
  'bfa8bcd8-5be8-4960-b8fb-6367cb38bbd9',
  'cb2c9df7-ca13-4e2c-979e-e61b9c6f4b96',
  '5628ffa1-d532-4c6c-aec0-d7a7c9ebe3d4',
  '67403856-e654-4bfd-8751-8a8ae9740d34',
  '6bd0c1f6-6984-4a57-88ea-ab7cb49b6af4',
  '76e610ac-47f4-428f-ab49-bc4ed8d13a70',
  '0733c3c6-08b0-47ba-8e5f-a27ab5517441'],
 'NeoTokyo_Standard_P': ['0328fc07-13fb-4cb6-9a86-7d608196ddbd',
  'd1b3c0c8-466e-4855-89b5-793b3a127ca5',
  '7653defe-924e-45be-a581-39f15f498301',
  'e287ba49-b1d9-44ac-9

Let's verify we covered all the replays in our dataset:

In [21]:
# Total number of replay ids across all maps
count = sum(len(replay_ids) for replay_ids in map_to_replay_ids.values())
count

192

In [22]:
len(replays)

192

These numbers match, so we're good. 
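For reference, the inversion and the coverage check can also be done in a single pass over the dictionary. The sketch below uses a small made-up dictionary in place of `replay_id_to_map`; the helper name `group_ids_by_map` is hypothetical.

```python
from collections import defaultdict


def group_ids_by_map(replay_id_to_map):
    """Group replay ids by map name in one pass over the input dict."""
    grouped = defaultdict(list)
    for replay_id, map_name in replay_id_to_map.items():
        grouped[map_name].append(replay_id)
    return dict(grouped)


# Toy stand-in for replay_id_to_map (ids shortened for illustration)
example = {'a': 'Stadium_P', 'b': 'cs_p', 'c': 'Stadium_P'}
grouped = group_ids_by_map(example)

# Every replay id should land in exactly one group
assert sum(len(ids) for ids in grouped.values()) == len(example)
print(grouped)
```

This avoids rescanning the whole dictionary once per map, which makes little difference for 192 replays but scales better as the dataset grows.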

Now, for each unique map name, we can randomly select a replay played on that map. 

In [27]:
import random

random.seed(123)

sample_replay_list = []
for map_name in map_to_replay_ids:
    sample_replay_list.append(random.choice(map_to_replay_ids[map_name]))

sample_replay_list

['960a9f35-821d-4c43-8c94-2d52697ecdeb',
 'cb2c9df7-ca13-4e2c-979e-e61b9c6f4b96',
 'd1b3c0c8-466e-4855-89b5-793b3a127ca5',
 '186752c5-2adf-4a89-a9c4-333e967d7df9',
 '73e0d71a-4325-4461-92a8-487445b75cc4',
 'd06ce16e-a69a-4410-bfb8-554aa2a9f6da',
 '0c5ee496-bbc0-44b0-9d6b-1fae67fa12fe']

Perfect, now we have a list of replays to inspect that covers every map in the dataset.

Let's look at the first replay in this list. 

In [28]:
replay0_id = sample_replay_list[0]
replay0 = replays.load_replay(replay0_id)
replay0.frames.shape

(3187, 161)

In [29]:
replay0.goals

[{'PlayerName': 'M0nkey M00n', 'PlayerTeam': 0, 'frame': 2983},
 {'PlayerName': 'ExoTiiK', 'PlayerTeam': 0, 'frame': 3889},
 {'PlayerName': 'ExoTiiK', 'PlayerTeam': 0, 'frame': 5332},
 {'PlayerName': 'Atomic', 'PlayerTeam': 1, 'frame': 5866},
 {'PlayerName': 'M0nkey M00n', 'PlayerTeam': 0, 'frame': 6748},
 {'PlayerName': 'dralii', 'PlayerTeam': 0, 'frame': 8176},
 {'PlayerName': 'BeastMode', 'PlayerTeam': 1, 'frame': 9110}]

This replay contains seven goals, and play normally resumes with a kickoff after a goal, so we should see a corresponding number of distinct kickoff-setup regions in the `replay0.frames` array (the exact count depends on whether there's overtime or not). Note that the frame numbers in the goal dictionaries above do not correspond to the row indices of `replay0.frames`: they are *original* frame indices from the *raw* replay, which we sub-sampled during parsing.

In [60]:
# all columns indicating whether ball is stationary at the origin
pos_cols = ['Ball - position x', 'Ball - position y']
vel_cols = ['Ball - linear velocity x', 'Ball - linear velocity y']
z_cols = ['Ball - position z', 'Ball - linear velocity z']
x_y_cols = pos_cols + vel_cols
relevant_cols = x_y_cols + z_cols

df_subset = replay0.frames[relevant_cols]
df_subset.shape

(3187, 6)

In [61]:
df_boolean = df_subset[x_y_cols] == 0.0
df_boolean.head()

Unnamed: 0,Ball - position x,Ball - position y,Ball - linear velocity x,Ball - linear velocity y
0,False,False,False,False
1,False,False,False,False
2,False,False,False,False
3,False,False,False,False
4,False,False,False,False


In [62]:
df_true = df_boolean.all(axis=1)
kickoff_setup_frames = df_subset[df_true]
kickoff_setup_frames

Unnamed: 0,Ball - position x,Ball - position y,Ball - linear velocity x,Ball - linear velocity y,Ball - position z,Ball - linear velocity z
936,0.0,0.0,0.0,0.0,99.529999,-32.459999
937,0.0,0.0,0.0,0.0,92.750000,-97.209999
938,0.0,0.0,0.0,0.0,92.750000,0.000000
939,0.0,0.0,0.0,0.0,92.750000,0.000000
940,0.0,0.0,0.0,0.0,92.750000,0.000000
...,...,...,...,...,...,...
2587,0.0,0.0,0.0,0.0,92.750000,0.000000
2588,0.0,0.0,0.0,0.0,92.750000,0.000000
2589,0.0,0.0,0.0,0.0,92.750000,0.000000
2590,0.0,0.0,0.0,0.0,92.750000,0.000000


At this point, `kickoff_setup_frames` is a `DataFrame` containing the rows where the ball's x- and y-coordinates are both 0.0 *and* its linear velocity in x and y is also 0.0. This physical condition is what we can use to determine whether the game is in a kickoff setup state or not, with the ball sitting motionless at the center of the field and the players sitting still in some configuration waiting for the kickoff. 

One might notice that `Ball - position z` is never `0.0` in these frames: it is typically `92.75` and occasionally a little higher. Moreover, in the rows where `Ball - position z > 92.75`, `Ball - linear velocity z` is nonzero. The z-coordinate of `92.75` is explained by the radius of the ball and the fact that its coordinates are those of its center. The nonzero z-velocities and the z-coordinates above the ball's radius are explained by the fact that when the game resets to the kickoff setup configuration, it places the ball (and each of the players' cars) ever so slightly above the ground, producing a subtle drop-in effect.

Thus, we can't require the z-components of the ball's position or linear velocity to be zero. This observation motivates defining kickoff-setup frames as those rows where the x- and y-components of both the ball's position *and* its linear velocity are zero.

The last thing to do is to count the number of contiguous frame ranges satisfying these conditions and compare this count with the number of goals scored and the presence of overtime in the replay.

In [67]:
# Group consecutive frame indices into inclusive (start, end) ranges
cts_frame_ranges = []
idx = kickoff_setup_frames.index
start = idx[0]
for prev, curr in zip(idx[:-1], idx[1:]):
    # A gap in the index marks the end of one kickoff setup region
    if curr != prev + 1:
        cts_frame_ranges.append((int(start), int(prev)))
        start = curr
# Close the final range
cts_frame_ranges.append((int(start), int(idx[-1])))
cts_frame_ranges

[(936, 957),
 (1169, 1189),
 (1583, 1604),
 (1695, 1715),
 (1921, 1941),
 (2329, 2349),
 (2571, 2591)]

In [68]:
len(cts_frame_ranges)

7

Finally, we need to repeat this for each of the replays in `sample_replay_list`. For simplicity, we should wrap the above checks in a reusable function that we can easily apply to any given replay. 
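One possible shape for such a function is sketched below. The name `find_kickoff_ranges` is hypothetical, and the sketch assumes the frames `DataFrame` has an integer index and the same ball column names used above. It is exercised here on a small synthetic frame rather than on real replay data.

```python
import numpy as np
import pandas as pd

# Columns that must all be exactly zero in a kickoff setup frame
XY_COLS = [
    'Ball - position x', 'Ball - position y',
    'Ball - linear velocity x', 'Ball - linear velocity y',
]


def find_kickoff_ranges(frames):
    """Return inclusive (start, end) index ranges where the ball is at rest at x = y = 0."""
    mask = (frames[XY_COLS] == 0.0).all(axis=1).to_numpy()
    idx = frames.index.to_numpy()[mask]
    if idx.size == 0:
        return []
    # A new run starts wherever consecutive matching indices are not adjacent
    breaks = np.flatnonzero(np.diff(idx) != 1) + 1
    runs = np.split(idx, breaks)
    return [(int(run[0]), int(run[-1])) for run in runs]


# Synthetic example: rows 2-4 and 6-8 satisfy the kickoff setup condition
toy = pd.DataFrame(0.0, index=range(10), columns=XY_COLS)
toy.loc[[0, 1, 5, 9], 'Ball - position x'] = 100.0
toy_ranges = find_kickoff_ranges(toy)
print(toy_ranges)
```

Applied to the sample, this would amount to calling `find_kickoff_ranges(replays.load_replay(replay_id).frames)` for each `replay_id` in `sample_replay_list` and comparing the number of ranges against `len(replay.goals)`.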