# 4.0.3 Sampling Around Event Boundaries

One way to handle the variability in video lengths is to sample fixed-length clips around event boundaries, so that each clip shows an event.

## Jupyter Extensions

Load [watermark](https://github.com/rasbt/watermark) to see the state of the machine and environment that's running the notebook. To make sense of the options, take a look at the [usage](https://github.com/rasbt/watermark#usage) section of the readme.

In [1]:
# Load `watermark` extension
%load_ext watermark
# Display the status of the machine and packages. Add more as necessary.
%watermark -v -n -m -g -b -t -p torch,torchvision,cv2,h5py,pandas,matplotlib,seaborn,jupyterlab,lab

Mon Feb 24 2020 11:18:49 

CPython 3.6.10
IPython 7.12.0

torch 1.2.0
torchvision 0.1.8
cv2 3.4.2
h5py 2.8.0
pandas 1.0.1
matplotlib 3.1.3
seaborn 0.10.0
jupyterlab 1.2.6
lab 0+untagged.21.g9d80dae.dirty

compiler   : GCC 7.3.0
system     : Linux
release    : 4.4.0-173-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 16
interpreter: 64bit
Git hash   : 9d80daec6156826fef6a7472507201013d8c7801
Git branch : master


Load [autoreload](https://ipython.org/ipython-doc/3/config/extensions/autoreload.html). With `%autoreload 1`, only modules imported with `%aimport` are reloaded before code is executed.

This behavior can be inverted by running `%autoreload 2`, which reloads every module *except* those explicitly excluded with `%aimport -<module>`.

In [2]:
# Load `autoreload` extension
%load_ext autoreload
# Set autoreload behavior
%autoreload 1

Load `matplotlib` in one of the more `jupyter`-friendly [rich-output modes](https://ipython.readthedocs.io/en/stable/interactive/plotting.html). Some options (that may or may not have worked) are `inline`, `notebook`, and `gtk`.

In [3]:
# Set the matplotlib mode
%matplotlib inline

## Set the GPU

Check current GPU usage with `nvidia-smi`, then expose only one idle device to this notebook so we aren't greedy.

In [4]:
!nvidia-smi

Mon Feb 24 11:19:20 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  TITAN X (Pascal)    Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   22C    P8     7W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:05:00.0 Off |                  N/A |
| 29%   50C    P2    58W / 250W |   2885MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:08:00.0 Off |                  N/A |
| 32%   

In [5]:
%env CUDA_VISIBLE_DEVICES=0

env: CUDA_VISIBLE_DEVICES=0
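
Outside a notebook the `%env` magic is not available. A minimal sketch of the same restriction in plain Python follows, assuming it runs before CUDA is initialized (in practice, before the first `import torch`); the `visible` variable is only there for illustration.

```python
import os

# Expose only GPU 0 to this process. This has to happen before CUDA is
# initialized, which usually means before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Illustrative check of what the process will see
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(visible)
```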


## Imports

In [6]:
from pathlib import Path

import cv2
import numpy as np
import pandas as pd
import seaborn as sns
import pickle
import matplotlib.pyplot as plt
from tqdm import tqdm

Local imports that may or may not be autoreloaded. This section contains things that will likely have to be re-imported multiple times, and have additions or subtractions made throughout the project.

In [38]:
# Constants to be used throughout the package
%aimport lab.index
from lab.index import DIR_DATA_INT, DIR_DATA_RAW
%aimport lab.breakfast.constants
from lab.breakfast.constants import SEED
# Import the data subdirectories
%aimport lab.breakfast.index
from lab.breakfast.index import (DIR_BREAKFAST, 
                                 DIR_I3D_FVS,
                                 DIR_BREAKFAST_DATA, 
                                 DIR_COARSE_SEG, 
                                 DIR_FINE_SEG,
                                )

## Initial Setup

Set [seaborn defaults](https://seaborn.pydata.org/generated/seaborn.set.html) for matplotlib.

In [8]:
sns.set()

## Getting Start Frames and Number of Segments

In [9]:
df = pd.read_csv(str(DIR_BREAKFAST / 'video_lengths.csv'),
                 index_col=0,
                 header=0)

In [10]:
df.head()

Unnamed: 0,Name,Length
0,P27_cam01_P27_scrambledegg,3561
1,P11_cam01_P11_scrambledegg,2649
2,P27_stereo01_P27_scrambledegg,3560
3,P21_cam01_P21_scrambledegg,5281
4,P46_cam02_P46_scrambledegg,3135


In [11]:
with open(str(DIR_COARSE_SEG / 'pancake/P53_webcam01_P53_pancake.txt'), 'r') as f:
    for line in f:
        # Each line already ends with a newline, so strip it before printing
        print(line.rstrip())

1-1 SIL
2-302 spoon_flour
303-501 crack_egg
502-727 pour_milk
728-1077 butter_pan
1078-1844 stir_dough
1845-1948 pour_dough2pan
1949-3649 fry_pancake
3650-3736 take_plate
3737-3826 put_pancake2plate
3827-3874 SIL



In [12]:
with open(str(DIR_COARSE_SEG / 'pancake/P53_webcam01_P53_pancake.txt'), 'r') as f:
    starts = [line.split('-')[0] for line in f][1:]
    print(starts)

['2', '303', '502', '728', '1078', '1845', '1949', '3650', '3737', '3827']
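
The cell above keeps only the start frame of each segment. If the end frame and label are also needed, the `start-end label` lines could be parsed into tuples. This is a sketch only: `parse_segments` and `SAMPLE` are hypothetical names, and the sample lines are a subset of the segmentation file printed earlier.

```python
from typing import List, Tuple

# A few lines taken from the coarse segmentation file shown above
SAMPLE = "1-1 SIL\n2-302 spoon_flour\n303-501 crack_egg\n3827-3874 SIL\n"


def parse_segments(text: str) -> List[Tuple[int, int, str]]:
    """Parse `start-end label` lines into (start, end, label) tuples."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        frames, label = line.split(maxsplit=1)
        start, end = frames.split("-")
        segments.append((int(start), int(end), label))
    return segments


segments = parse_segments(SAMPLE)
# Drop the leading SIL segment, mirroring the `[1:]` slice used above
starts = [start for start, _, _ in segments][1:]
print(starts)
```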


In [13]:
n_coarse_segments, coarse_segment_starts = [], []

for name in df.Name:
    action = name.split('_')[-1]
    path = DIR_COARSE_SEG / f'{action}/{name}.txt'
    
    with open(str(path), 'r') as file:
        coarse_segment_starts.append([int(line.split('-')[0]) for line in file][1:])
        n_coarse_segments.append(len(coarse_segment_starts[-1]))

In [14]:
df['n_coarse_segments'] = n_coarse_segments
df['coarse_segment_starts'] = coarse_segment_starts

In [15]:
df.head()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts
0,P27_cam01_P27_scrambledegg,3561,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3..."
1,P11_cam01_P11_scrambledegg,2649,6,"[39, 293, 897, 1928, 2410, 2620]"
2,P27_stereo01_P27_scrambledegg,3560,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3..."
3,P21_cam01_P21_scrambledegg,5281,9,"[234, 624, 1120, 1408, 1577, 1975, 4189, 4451,..."
4,P46_cam02_P46_scrambledegg,3135,7,"[2, 272, 754, 1226, 1421, 2617, 2682]"


In [16]:
n_fine_segments, fine_segment_starts = [], []

for name in df.Name:
    action = name.split('_')[-1]
    path = DIR_FINE_SEG / f'{action}/{name}.txt'
    
    try:
        with open(str(path), 'r') as file:
            fine_segment_starts.append([int(line.split('-')[0]) for line in file][1:-1])
            n_fine_segments.append(len(fine_segment_starts[-1]))
    except FileNotFoundError:
        fine_segment_starts.append(None)
        n_fine_segments.append(None)
        
df['n_fine_segments'] = n_fine_segments
df['fine_segment_starts'] = fine_segment_starts

df.tail()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts,n_fine_segments,fine_segment_starts
1024,P36_webcam02_P36_pancake,7049,8,"[2, 279, 548, 1044, 1958, 2243, 2844, 6903]",,
1025,P54_stereo01_P54_pancake,3340,11,"[2, 225, 506, 692, 1018, 1336, 1709, 1837, 206...",66.0,"[11, 27, 55, 77, 115, 132, 191, 223, 235, 251,..."
1026,P13_cam01_P13_pancake,4436,12,"[31, 611, 717, 907, 1036, 1316, 1443, 1493, 17...",111.0,"[32, 47, 93, 138, 212, 318, 333, 348, 377, 467..."
1027,P21_cam01_P21_pancake,5841,10,"[95, 415, 699, 960, 1634, 2125, 2437, 5367, 54...",,
1028,P12_webcam01_P12_pancake,2870,8,"[2, 136, 324, 446, 603, 744, 2680, 2805]",54.0,"[32, 47, 62, 76, 77, 122, 123, 152, 213, 287, ..."


In [17]:
len([n for n in n_fine_segments if n is None])

505

In [18]:
len(n_fine_segments)

1029
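
So 505 of the 1029 videos (about 49%) have no fine segmentation file. Once the counts are in the dataframe, the same tally can be read off with `isna()`. The sketch below uses a made-up toy dataframe (`toy`) rather than the real data.

```python
import pandas as pd

# Toy stand-in for the notebook's dataframe; names and counts are made up
toy = pd.DataFrame({
    "Name": ["video_a", "video_b", "video_c", "video_d"],
    "n_fine_segments": [71, None, 66, None],
})

missing = toy["n_fine_segments"].isna()
n_missing = int(missing.sum())        # number of videos without fine segments
frac_missing = float(missing.mean())  # fraction of videos without fine segments
print(n_missing, frac_missing)
```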

## Counting Actions

In [19]:
# The action is the last underscore-separated token of the video name
df['actions'] = df.Name.str.split('_').str[-1]

In [20]:
df.head()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts,n_fine_segments,fine_segment_starts,actions
0,P27_cam01_P27_scrambledegg,3561,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3...",71.0,"[38, 73, 83, 150, 183, 213, 259, 308, 314, 337...",scrambledegg
1,P11_cam01_P11_scrambledegg,2649,6,"[39, 293, 897, 1928, 2410, 2620]",74.0,"[23, 34, 56, 67, 87, 107, 116, 122, 128, 132, ...",scrambledegg
2,P27_stereo01_P27_scrambledegg,3560,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3...",71.0,"[38, 73, 83, 150, 183, 213, 259, 308, 314, 337...",scrambledegg
3,P21_cam01_P21_scrambledegg,5281,9,"[234, 624, 1120, 1408, 1577, 1975, 4189, 4451,...",,,scrambledegg
4,P46_cam02_P46_scrambledegg,3135,7,"[2, 272, 754, 1226, 1421, 2617, 2682]",,,scrambledegg


In [21]:
np.unique(df.actions, return_counts=True)

(array(['coffee', 'friedegg', 'juice', 'milk', 'pancake', 'sandwich',
        'scrambledegg'], dtype=object),
 array([167, 173, 162, 187, 157, 169,  14]))

In [22]:
df.tail()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts,n_fine_segments,fine_segment_starts,actions
1024,P36_webcam02_P36_pancake,7049,8,"[2, 279, 548, 1044, 1958, 2243, 2844, 6903]",,,pancake
1025,P54_stereo01_P54_pancake,3340,11,"[2, 225, 506, 692, 1018, 1336, 1709, 1837, 206...",66.0,"[11, 27, 55, 77, 115, 132, 191, 223, 235, 251,...",pancake
1026,P13_cam01_P13_pancake,4436,12,"[31, 611, 717, 907, 1036, 1316, 1443, 1493, 17...",111.0,"[32, 47, 93, 138, 212, 318, 333, 348, 377, 467...",pancake
1027,P21_cam01_P21_pancake,5841,10,"[95, 415, 699, 960, 1634, 2125, 2437, 5367, 54...",,,pancake
1028,P12_webcam01_P12_pancake,2870,8,"[2, 136, 324, 446, 603, 744, 2680, 2805]",54.0,"[32, 47, 62, 76, 77, 122, 123, 152, 213, 287, ...",pancake


## Selecting N-Frame Clips from Coarse Segment Starts

In [23]:
# Select this number of frames
n_frames = 64

In [24]:
np.unique(n_coarse_segments, return_counts=True)

(array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 15, 17]),
 array([ 41,  71, 225, 243, 151,  88,  48,  33,  50,  50,  17,   6,   2,
          4]))

In [25]:
starting_segments = [coarse_segment_starts[i][0]
                     for i, n in enumerate(n_coarse_segments)
                     if n < 3]

In [26]:
np.unique(starting_segments, return_counts=True)

(array([ 2, 25, 38, 47, 60, 72]), array([20,  5,  5,  3,  4,  4]))

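Among the videos with fewer than 3 coarse segments, 20 of 41 have their first segment starting at frame 2, far more than any other start frame. Segment starts equal to 2 should therefore be ignored.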

### Making the List of Indices

Each element of the list corresponds to a row in the dataframe.

In [27]:
np.random.seed(SEED)

all_clip_ranges = []
event_clip_idcs = []
start_clip_idcs = []

for n, starts, length in zip(n_coarse_segments, coarse_segment_starts, df.Length):
    # Ignore the last start, which is the start of the trailing SIL segment
    idcs_to_use = starts[:-1]
    # A start frame of 2 shows up disproportionately often. Ignore it
    # (starts are sorted, so a 2 can only be the first element)
    if 2 in idcs_to_use:
        idcs_to_use = idcs_to_use[1:]
    
    # Move on if there are no events left to use
    if not idcs_to_use:
        all_clip_ranges.append([])
        event_clip_idcs.append(np.array([]))
        start_clip_idcs.append([])
        continue 
        
    # Randomly select len(idcs_to_use) numbers between 0 and 63
    selected_clip_idx = np.random.choice(range(n_frames), len(idcs_to_use), replace=True)
    
    # Earliest starting frame can be 0
    clip_starts = np.maximum(np.array(idcs_to_use) - selected_clip_idx, 0)
    # Latest starting frame can be length - n_frames
    clip_starts = np.minimum(clip_starts, length-n_frames)
    
    # List of the full n_frames-long clip indices
    all_clip_ranges.append([np.arange(start, start+n_frames) for start in clip_starts])
    # Keep track of the sampled offset of the event within the clip
    # (not adjusted if the clip start was clamped above)
    event_clip_idcs.append(selected_clip_idx)
    # And the starting idx of the clip
    start_clip_idcs.append(clip_starts)

print(len(all_clip_ranges))
print(len(event_clip_idcs))
print(len(start_clip_idcs))

1029
1029
1029
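
The sampling step can also be written as a small function. The sketch below is a variation on the cell above, not a copy of it: `sample_event_clips` is a hypothetical helper, it uses `np.random.default_rng` instead of the global seed, it drops event starts that lie beyond the video length, and it returns the event offset *after* clamping, which can differ from the sampled offset when a clip is pushed back inside the video.

```python
import numpy as np


def sample_event_clips(starts, length, n_frames=64, rng=None):
    """Return (clip_starts, event_offsets) for clips that each contain one event."""
    rng = np.random.default_rng() if rng is None else rng
    starts = np.asarray(starts)
    # Assumption: frame indices are 0-based, so valid events satisfy start < length
    starts = starts[starts < length]
    # Desired position of the event inside each clip, uniform over 0..n_frames-1
    offsets = rng.integers(0, n_frames, size=len(starts))
    # Clamp so that the whole clip stays inside the video
    clip_starts = np.clip(starts - offsets, 0, max(length - n_frames, 0))
    # Actual position of the event inside the clip after clamping
    event_offsets = starts - clip_starts
    return clip_starts, event_offsets


rng = np.random.default_rng(0)
clip_starts, event_offsets = sample_event_clips(
    [12, 185, 3550, 4000], length=3561, rng=rng
)
print(clip_starts, event_offsets)
```

Every returned clip lies inside the video and `clip_starts + event_offsets` reproduces the kept event frames, so the event is inside its clip whenever the video has at least `n_frames` frames.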


In [28]:
event_clip_idcs[:10]

[array([46, 48, 16, 17, 23, 19, 33, 37]),
 array([17, 11, 48, 34, 41]),
 array([27,  2, 26, 11, 38, 58,  2, 62]),
 array([53, 49, 49, 22, 55, 22, 58, 33]),
 array([14, 16, 31, 49, 45]),
 array([12,  7, 37, 19]),
 array([37, 61, 24,  5,  4, 41, 54, 20]),
 array([34, 31, 48, 14, 31,  8, 25, 11, 52]),
 array([ 2, 59, 53,  4, 45, 12]),
 array([45,  7, 46, 13])]

### Saving the Interim Data

In [29]:
path_interim_data = DIR_DATA_INT / 'breakfast/video_clips/event_clips'
path_interim_data.mkdir(parents=True, exist_ok=True)
    
with open(str(path_interim_data / f'event_clip_indices_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(event_clip_idcs, filehandle)

In [30]:
with open(str(path_interim_data / f'event_clip_ranges_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(all_clip_ranges, filehandle)

In [31]:
with open(str(path_interim_data / f'event_clip_starts_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(start_clip_idcs, filehandle)
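
Reading the pickled lists back is symmetric to writing them. A minimal round-trip sketch follows, using a temporary directory and a made-up file name and data instead of the real interim paths.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Made-up stand-in for `event_clip_idcs`: one array per video, possibly empty
event_offsets = [np.array([46, 48, 16]), np.array([])]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, following the naming pattern used above
    path = Path(tmp) / "event_clip_indices_seed_0.pkl"
    with open(path, "wb") as filehandle:
        pickle.dump(event_offsets, filehandle)
    with open(path, "rb") as filehandle:
        restored = pickle.load(filehandle)

print(len(restored))
```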

## Creating the 64 Dim FV Clips

In [63]:
path_clips = path_interim_data / '64dim_fv_clips'
path_clips.mkdir(parents=True, exist_ok=True)

In [33]:
for name, action, clip_ranges, starts, events, length in zip(df.Name, 
                                                             df.actions,
                                                             all_clip_ranges, 
                                                             start_clip_idcs, 
                                                             event_clip_idcs,
                                                             df.Length):
    path_file = DIR_BREAKFAST_DATA / f'{action}/{name}.txt'
    assert path_file.exists(), path_file
    
    file_array = pd.read_table(str(path_file), index_col=0, header=0).to_numpy()
    assert file_array.shape[0] == length
    
    for clip_range, start, event in zip(clip_ranges, starts, events):
        path_out = path_clips / f'{name}_start_{start}_event_{event}_seed_{SEED}.npy'
        np.save(str(path_out), file_array[clip_range, :])

## Creating I3D Clips

In [36]:
path_i3d_clips = path_interim_data / 'i3d_fv_clips'
path_i3d_clips.mkdir(parents=True, exist_ok=True)

In [54]:
for name, action, clip_ranges, starts, events, length in zip(df.Name, 
                                                             df.actions,
                                                             all_clip_ranges, 
                                                             start_clip_idcs, 
                                                             event_clip_idcs,
                                                             df.Length):
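    path_file = DIR_I3D_FVS / f'{name}.npy'
    # Skip videos without I3D features
    if not path_file.exists():
        continue
    
    file_array = np.load(str(path_file))
    # Skip videos whose feature count does not match the recorded length
    if file_array.shape[0] != length:
        continue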
    
    for clip_range, start, event in zip(clip_ranges, starts, events):
        path_out = path_i3d_clips / f'{name}_start_{start}_event_{event}_seed_{SEED}.npy'
        if not path_out.exists():
            np.save(str(path_out), file_array[clip_range, :])

## Creating One 64FV Dataset

In [64]:
all_clips_data = [[path, np.load(str(path))] for path in path_clips.iterdir()]

In [65]:
path_all_clips, all_data = zip(*all_clips_data)

In [66]:
path_all_clips[:5]

(PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/P54_webcam02_P54_milk_start_174_event_42_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/P12_cam01_P12_sandwich_start_674_event_41_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/P38_webcam02_P38_friedegg_start_532_event_57_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/P53_webcam02_P53_friedegg_start_960_event_54_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/P49_cam01_P49_pancake_start_5180_event_61_seed_117.npy'))
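
Since the clip metadata is only stored in the file names, it can be recovered from them later. The sketch below shows one way to do that with a regular expression; `parse_clip_path` and `PATTERN` are hypothetical names, and the pattern assumes the `{name}_start_{start}_event_{event}_seed_{SEED}.npy` scheme used when saving.

```python
import re
from pathlib import Path

# Matches `{name}_start_{start}_event_{event}_seed_{seed}` (file stem only)
PATTERN = re.compile(
    r"^(?P<name>.+)_start_(?P<start>\d+)_event_(?P<event>\d+)_seed_(?P<seed>\d+)$"
)


def parse_clip_path(path):
    """Recover video name, action, clip start, event offset and seed from a clip path."""
    match = PATTERN.match(Path(path).stem)
    if match is None:
        raise ValueError(f"Unexpected clip file name: {path}")
    name = match.group("name")
    return {
        "name": name,
        "action": name.split("_")[-1],  # last token of the video name
        "start": int(match.group("start")),
        "event": int(match.group("event")),
        "seed": int(match.group("seed")),
    }


info = parse_clip_path("P54_webcam02_P54_milk_start_174_event_42_seed_117.npy")
print(info)
```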

In [67]:
path_raw_64fv_data = DIR_DATA_RAW / f'breakfast/{n_frames}_frame_clips/64dim_fv'
path_raw_64fv_data.mkdir(parents=True, exist_ok=True)

In [68]:
with open(str(path_raw_64fv_data / f'path_event_data_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(path_all_clips, filehandle)

In [69]:
all_data_array = np.array(all_data)
all_data_array.shape

(4794, 64, 64)
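
The shape above is only this regular because every clip has the same `(64, 64)` shape. As a hedged alternative to `np.array`, `np.stack` makes that requirement explicit by raising a `ValueError` when one clip differs. The arrays below are made-up placeholders.

```python
import numpy as np

# Placeholder clips: 64 frames x 64 feature dimensions each
clips = [np.zeros((64, 64)), np.ones((64, 64))]
stacked = np.stack(clips)
print(stacked.shape)

# A clip with a different number of frames is rejected instead of silently
# producing an irregular array
try:
    np.stack(clips + [np.zeros((63, 64))])
    ragged_rejected = False
except ValueError:
    ragged_rejected = True
print(ragged_rejected)
```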

In [70]:
np.save(str(path_raw_64fv_data / f'event_clips_seed_{SEED}.npy'), all_data_array)

## Creating One I3D Dataset

In [71]:
idx = 259
df.Length.iloc[idx]

1186

In [72]:
df.coarse_segment_starts.iloc[idx]

[155, 340, 550, 2250, 2520, 2555]

In [74]:
start_clip_idcs[idx]

array([ 107,  279,  504, 1122, 1122])
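
The three outputs above suggest a problem for this video: some annotated segment starts (2250, 2520, 2555) lie beyond the recorded length of 1186 frames, so the clamp pins the corresponding clips to `1186 - 64 = 1122` and the event frame falls outside the clip. A minimal check using only the numbers printed above (variable names are illustrative):

```python
import numpy as np

n_frames = 64
length = 1186
segment_starts = np.array([155, 340, 550, 2250, 2520, 2555])
clip_starts = np.array([107, 279, 504, 1122, 1122])

# Annotated starts that do not fit inside the video
out_of_range = segment_starts[segment_starts >= length]

# The sampling loop drops the last start, so compare the remaining events
events = segment_starts[:-1]
covered = (events >= clip_starts) & (events < clip_starts + n_frames)

print(out_of_range)
print(covered)
```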

In [239]:
all_clips_data = [[path, np.load(str(path))] for path in path_i3d_clips.iterdir()]

In [252]:
path_all_clips, all_data = zip(*all_clips_data)

In [251]:
path_all_clips[:5]

(PosixPath('/media/data_cifs/apra/work/labwork/data/interim/breakfast/event_clips/64dim_fv_clips/P54_webcam02_P54_milk_start_174_event_42_seed_117.npy'),
 PosixPath('/media/data_cifs/apra/work/labwork/data/interim/breakfast/event_clips/64dim_fv_clips/P12_cam01_P12_sandwich_start_674_event_41_seed_117.npy'),
 PosixPath('/media/data_cifs/apra/work/labwork/data/interim/breakfast/event_clips/64dim_fv_clips/P38_webcam02_P38_friedegg_start_532_event_57_seed_117.npy'),
 PosixPath('/media/data_cifs/apra/work/labwork/data/interim/breakfast/event_clips/64dim_fv_clips/P53_webcam02_P53_friedegg_start_960_event_54_seed_117.npy'),
 PosixPath('/media/data_cifs/apra/work/labwork/data/interim/breakfast/event_clips/64dim_fv_clips/P49_cam01_P49_pancake_start_5180_event_61_seed_117.npy'))

In [257]:
path_raw_i3dfv_data = DIR_DATA_RAW / f'breakfast/{n_frames}_frame_clips/i3d_fv'
path_raw_i3dfv_data.mkdir(parents=True, exist_ok=True)

In [258]:
with open(str(path_raw_i3dfv_data / f'path_event_data_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(path_all_clips, filehandle)

In [248]:
all_data_array = np.array(all_data)
all_data_array.shape

(4794, 64, 64)

In [261]:
np.save(str(path_raw_i3dfv_data / f'event_clips_seed_{SEED}.npy'), all_data_array)