# 4.0.7 Sampling at Event Boundaries Fixed

A notebook that aims to redo 4.0.3 with the correct data assembled in 4.0.6, for both the 64 Dim FV and I3D features.

## Jupyter Extensions

Load [watermark](https://github.com/rasbt/watermark) to see the state of the machine and environment that's running the notebook. To make sense of the options, take a look at the [usage](https://github.com/rasbt/watermark#usage) section of the readme.

In [5]:
# Load `watermark` extension
%load_ext watermark
# Display the status of the machine and packages. Add more as necessary.
%watermark -v -n -m -g -b -t -p torch,torchvision,cv2,h5py,pandas,matplotlib,seaborn,jupyterlab,lab

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
Wed Feb 26 2020 11:06:32 

CPython 3.6.10
IPython 7.12.0

torch 1.2.0
torchvision 0.1.8
cv2 3.4.2
h5py 2.8.0
pandas 1.0.1
matplotlib 3.1.3
seaborn 0.10.0
jupyterlab 1.2.6
lab 0+untagged.27.g6e0399a.dirty

compiler   : GCC 7.3.0
system     : Linux
release    : 4.15.0-76-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 16
interpreter: 64bit
Git hash   : 6e0399a8926bae5f457e02710c59967fed2d20dc
Git branch : master


Load [autoreload](https://ipython.org/ipython-doc/3/config/extensions/autoreload.html). With `%autoreload 1`, only modules imported with `%aimport` are reloaded before code is executed.

This behavior can be inverted by running `%autoreload 2`, which auto-reloads every module *except* those excluded with `%aimport -module`.

In [6]:
# Load `autoreload` extension
%load_ext autoreload
# Set autoreload behavior
%autoreload 1

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Load `matplotlib` in one of the more `jupyter`-friendly [rich-output modes](https://ipython.readthedocs.io/en/stable/interactive/plotting.html). Some options (that may or may not have worked) are `inline`, `notebook`, and `gtk`.

In [7]:
# Set the matplotlib mode
%matplotlib inline

## Imports

In [8]:
from pathlib import Path

import cv2
import numpy as np
import pandas as pd
import seaborn as sns
import pickle
import matplotlib.pyplot as plt
from tqdm import tqdm

Local imports that may or may not be autoreloaded. This section contains things that will likely have to be re-imported multiple times, and have additions or subtractions made throughout the project.

In [12]:
# Constants to be used throughout the package
%aimport lab.index
from lab.index import DIR_DATA_INT, DIR_DATA_RAW
%aimport lab.breakfast.constants
from lab.breakfast.constants import SEED
# Import the data subdirectories
%aimport lab.breakfast.index
from lab.breakfast.index import (DIR_BREAKFAST, 
                                 DIR_RAW_BREAKFAST,
                                 DIR_BREAKFAST_DATA, 
                                 DIR_COARSE_SEG, 
                                 DIR_FINE_SEG,
                                )

## Initial Setup

Set [seaborn defaults](https://seaborn.pydata.org/generated/seaborn.set.html) for matplotlib.

In [10]:
sns.set()

In [13]:
master_df = pd.read_csv(str(DIR_RAW_BREAKFAST / 'master_meta.csv'))
master_df

Unnamed: 0,path_videos,length,patient,action,camera,channel,id,path_coarse,path_fine,path_64dim_fv,path_i3d
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...,...,...,...,...,...,...
1984,/media/data_cifs2/apra/work/labwork/data/exter...,1123,5,coffee,cam01,,5_coffee_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1985,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,0.0,5_juice_stereo_0,,,,
1986,/media/data_cifs2/apra/work/labwork/data/exter...,1084,5,milk,stereo,0.0,5_milk_stereo_0,,,,
1987,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,1.0,5_juice_stereo_1,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...


## Getting Start Frames and Number of Segments

In [15]:
with open(str(master_df.path_coarse.iloc[0]), 'r') as f:
    for l in f:
        print(l)

1-31 SIL  

32-1366 cut_fruit  

1367-1421 take_bowl  

1422-1551 put_fruit2bowl  

1552-1671 cut_fruit  

1672-2071 peel_fruit  

2072-2401 cut_fruit  

2402-2511 put_fruit2bowl  

2512-2669 SIL  



In [16]:
with open(str(master_df.path_coarse.iloc[0]), 'r') as f:
    starts = [line.split('-')[0] for line in f][1:]
    print(starts)

['32', '1367', '1422', '1552', '1672', '2072', '2402', '2512']
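The cell above keeps only the start frame of each line. As a minimal sketch, assuming every non-empty line follows the `start-end label` layout shown in the printed file, the full segments could be parsed like this. The helper name `parse_segments` is hypothetical and not part of the `lab` package.

```python
from typing import List, Tuple


def parse_segments(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Parse 'start-end label' lines into (start, end, label) tuples."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        frame_range, label = line.split(maxsplit=1)
        start, end = frame_range.split('-')
        segments.append((int(start), int(end), label))
    return segments


# The first three lines of the file printed above
example = ["1-31 SIL  \n", "32-1366 cut_fruit  \n", "1367-1421 take_bowl  \n"]
segments = parse_segments(example)

# Same convention as the notebook: drop the first segment's start frame
starts = [start for start, _, _ in segments][1:]
```

Keeping the end frame and label alongside the start makes it possible to check later which action a sampled boundary belongs to.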


In [26]:
coarse_sub_df = master_df[master_df.path_coarse.notnull()]
coarse_sub_df

Unnamed: 0,path_videos,length,patient,action,camera,channel,id,path_coarse,path_fine,path_64dim_fv,path_i3d
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...,...,...,...,...,...,...
1982,/media/data_cifs2/apra/work/labwork/data/exter...,1186,5,cereals,cam01,,5_cereals_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1983,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,cam01,,5_juice_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1984,/media/data_cifs2/apra/work/labwork/data/exter...,1123,5,coffee,cam01,,5_coffee_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1987,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,1.0,5_juice_stereo_1,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...


In [23]:
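ids, n_coarse_segments, coarse_segment_starts = [], [], []

# Collect the segment start frames for every video with a coarse segmentation
for video_id, path in coarse_sub_df[['id', 'path_coarse']].values:
    with open(str(path), 'r') as seg_file:
        # Drop the first segment's start, keeping only the boundaries between segments
        starts = [int(line.split('-')[0]) for line in seg_file][1:]
    ids.append(video_id)
    coarse_segment_starts.append(starts)
    n_coarse_segments.append(len(starts))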

In [24]:
len(ids)

1677

In [25]:
ids[:5]

['39_salat_cam01',
 '39_tea_cam01',
 '39_pancake_cam01',
 '39_cereals_cam01',
 '39_friedegg_cam01']

## Convenience DFs

In [66]:
coarse_df = pd.DataFrame({'id' : ids})
coarse_df['n_coarse_segments'] = n_coarse_segments
coarse_df['coarse_segment_starts'] = coarse_segment_starts
coarse_df = coarse_df.merge(master_df[['id', 'length']], how='left', on='id')
coarse_df

Unnamed: 0,id,n_coarse_segments,coarse_segment_starts,length
0,39_salat_cam01,8,"[32, 1367, 1422, 1552, 1672, 2072, 2402, 2512]",2669
1,39_tea_cam01,4,"[52, 132, 802, 1212]",1304
2,39_pancake_cam01,10,"[2, 626, 896, 1227, 3212, 3703, 4124, 8706, 88...",8990
3,39_cereals_cam01,4,"[14, 70, 472, 646]",699
4,39_friedegg_cam01,5,"[15, 500, 854, 2805, 2881]",2890
...,...,...,...,...
1672,5_cereals_cam01,5,"[14, 247, 625, 1125, 1159]",1186
1673,5_juice_cam01,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795
1674,5_coffee_cam01,5,"[2, 243, 379, 628, 1028]",1123
1675,5_juice_stereo_1,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795


In [67]:
coarse_64d_fv_df = coarse_df.merge(master_df[['id', 'path_64dim_fv']].dropna(), 
                                   how='inner', on='id')
coarse_64d_fv_df

Unnamed: 0,id,n_coarse_segments,coarse_segment_starts,length,path_64dim_fv
0,39_salat_cam01,8,"[32, 1367, 1422, 1552, 1672, 2072, 2402, 2512]",2669,/media/data_cifs2/apra/work/labwork/data/exter...
1,39_tea_cam01,4,"[52, 132, 802, 1212]",1304,/media/data_cifs2/apra/work/labwork/data/exter...
2,39_pancake_cam01,10,"[2, 626, 896, 1227, 3212, 3703, 4124, 8706, 88...",8990,/media/data_cifs2/apra/work/labwork/data/exter...
3,39_cereals_cam01,4,"[14, 70, 472, 646]",699,/media/data_cifs2/apra/work/labwork/data/exter...
4,39_friedegg_cam01,5,"[15, 500, 854, 2805, 2881]",2890,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...
1469,5_cereals_cam01,5,"[14, 247, 625, 1125, 1159]",1186,/media/data_cifs2/apra/work/labwork/data/exter...
1470,5_juice_cam01,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...
1471,5_coffee_cam01,5,"[2, 243, 379, 628, 1028]",1123,/media/data_cifs2/apra/work/labwork/data/exter...
1472,5_juice_stereo_1,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...


In [68]:
coarse_i3d_df = coarse_df.merge(master_df[['id', 'path_i3d']].dropna(), 
                                how='inner', on='id')
coarse_i3d_df

Unnamed: 0,id,n_coarse_segments,coarse_segment_starts,length,path_i3d
0,39_salat_cam01,8,"[32, 1367, 1422, 1552, 1672, 2072, 2402, 2512]",2669,/media/data_cifs2/apra/work/labwork/data/exter...
1,39_tea_cam01,4,"[52, 132, 802, 1212]",1304,/media/data_cifs2/apra/work/labwork/data/exter...
2,39_pancake_cam01,10,"[2, 626, 896, 1227, 3212, 3703, 4124, 8706, 88...",8990,/media/data_cifs2/apra/work/labwork/data/exter...
3,39_cereals_cam01,4,"[14, 70, 472, 646]",699,/media/data_cifs2/apra/work/labwork/data/exter...
4,39_friedegg_cam01,5,"[15, 500, 854, 2805, 2881]",2890,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...
1571,5_cereals_cam01,5,"[14, 247, 625, 1125, 1159]",1186,/media/data_cifs2/apra/work/labwork/data/exter...
1572,5_juice_cam01,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...
1573,5_coffee_cam01,5,"[2, 243, 379, 628, 1028]",1123,/media/data_cifs2/apra/work/labwork/data/exter...
1574,5_juice_stereo_1,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...


## Finding Good First Frames

In [69]:
# Select this number of frames
n_frames = 64

In [70]:
np.unique(n_coarse_segments, return_counts=True)

(array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 17, 18, 24]),
 array([ 40, 214, 397, 298, 191, 116, 110,  86, 105,  62,  27,  12,   1,
          2,  10,   4,   2]))

In [38]:
starting_segments = [coarse_segment_starts[i][0]
                     for i, n in enumerate(n_coarse_segments)
                     if n < 3]

In [39]:
np.unique(starting_segments, return_counts=True)

(array([ 2, 25, 38, 47, 60, 72]), array([19,  5,  5,  3,  4,  4]))

Among the 40 videos with only 2 segments, 19 have their first event at frame 2. Events that start at frame 2 should be ignored.
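To put a number on that skew, the share of each first start frame can be computed from the `np.unique` output above. This is only a sketch: the values and counts are copied by hand from that output rather than recomputed from the data.

```python
import numpy as np

# Values and counts copied from the np.unique output above
first_starts = np.array([2, 25, 38, 47, 60, 72])
counts = np.array([19, 5, 5, 3, 4, 4])

# Fraction of the two-segment videos that share each first start frame
fractions = counts / counts.sum()
share_of_2 = fractions[first_starts == 2][0]
print(f"{share_of_2:.3f}")  # 0.475
```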

## Making the List of Event Indices

### 64 FV Events

In [86]:
np.random.seed(SEED)

all_clip_ranges = []
event_clip_idcs = []
start_clip_idcs = []

for n, starts, length in coarse_64d_fv_df[
    ['n_coarse_segments', 'coarse_segment_starts', 'length']].values:
    # Ignore last segment which is start of SIL
    idcs_to_use = starts[:-1]
    # A start frame of 2 shows up disproportionately often. Ignore it
    if 2 in idcs_to_use:
        idcs_to_use = idcs_to_use[1:]
    
    # Move on if there are no events left to use
    if not idcs_to_use:
        all_clip_ranges.append([])
        event_clip_idcs.append(np.array([]))
        start_clip_idcs.append([])
        continue 
        
    # Randomly select len(idcs_to_use) numbers between 0 and 63
    selected_clip_idx = np.random.choice(range(n_frames), len(idcs_to_use), replace=True)
    
    # Earliest starting frame can be 0
    clip_starts = np.maximum(np.array(idcs_to_use) - selected_clip_idx, 0)
    # Latest starting frame is length - n_frames - 4, leaving a 4-frame margin at the end
    clip_starts = np.minimum(clip_starts, length-n_frames-4)
    
    # List of the full 64 frame clip indices
    all_clip_ranges.append([np.arange(start, start+n_frames) for start in clip_starts])
    # Keep track of the idx that had the event
    event_clip_idcs.append(selected_clip_idx)
    # And the starting idx of the clip
    start_clip_idcs.append(clip_starts)

print(len(all_clip_ranges))
print(len(event_clip_idcs))
print(len(start_clip_idcs))

1474
1474
1474
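One consequence of the clamping in the loop above is worth spelling out: when a clip start is pushed up to 0 or down to the upper bound, the event no longer sits at the sampled offset, while `event_clip_idcs` still stores the sampled value. The sketch below uses a hypothetical helper, `clip_window`, which takes the offset as an argument instead of drawing it randomly. It returns the position where the event actually lands.

```python
def clip_window(event_frame, offset, length, n_frames=64, margin=4):
    """Return (clip_start, actual_offset) for one event.

    `offset` is the desired position of the event inside the clip.
    `margin` mirrors the extra 4 frames subtracted in the notebook loop.
    """
    # Earliest starting frame is 0
    start = max(event_frame - offset, 0)
    # Latest starting frame is length - n_frames - margin
    start = min(start, length - n_frames - margin)
    # Where the event really ends up inside the clip after clamping
    actual_offset = event_frame - start
    return start, actual_offset


# First two events of 39_salat_cam01 (length 2669), offsets 46 and 48
print(clip_window(32, 46, 2669))    # (0, 32): clamped, event lands at 32, not 46
print(clip_window(1367, 48, 2669))  # (1319, 48): no clamping needed
```

If the exact event position matters downstream, storing `actual_offset` next to the sampled offset would avoid the mismatch. Note that the sketch, like the loop, does not guard against videos shorter than `n_frames + margin` frames.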


In [87]:
event_clip_idcs[:10]

[array([46, 48, 16, 17, 23, 19, 33]),
 array([37, 17, 11]),
 array([48, 34, 41, 27,  2, 26, 11, 38]),
 array([58,  2, 62]),
 array([53, 49, 49, 22]),
 array([55, 22, 58, 33]),
 array([14, 16, 31, 49, 45, 12]),
 array([ 7, 37, 19, 37]),
 array([61, 24,  5,  4, 41, 54, 20]),
 array([34, 31, 48, 14, 31,  8, 25, 11])]

#### Saving the Interim Data

In [88]:
path_interim_data = DIR_DATA_INT / 'breakfast/video_clips/event_clips/'
if not path_interim_data.exists():
    path_interim_data.mkdir(parents=True)
    
path_interim_64fv = path_interim_data / '64dim_fv_clips'
if not path_interim_64fv.exists():
    path_interim_64fv.mkdir(parents=True)
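The exists-then-mkdir pattern is repeated for every output directory in this notebook. A possible shortcut, sketched here with a hypothetical helper `ensure_dir` and a temporary directory standing in for `DIR_DATA_INT`, is the `exist_ok` flag of `Path.mkdir`.

```python
import tempfile
from pathlib import Path


def ensure_dir(path: Path) -> Path:
    """Create `path` (and any missing parents) if it does not exist yet."""
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Temporary stand-in for the real interim data directory
    target = Path(tmp) / 'breakfast' / 'video_clips' / 'event_clips' / '64dim_fv_clips'
    ensure_dir(target)
    ensure_dir(target)  # calling it again on an existing directory is a no-op
    created = target.is_dir()
```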

In [89]:
with open(str(path_interim_64fv / f'event_clip_indices_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(event_clip_idcs, filehandle)

In [90]:
with open(str(path_interim_64fv / f'event_clip_ranges_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(all_clip_ranges, filehandle)

In [91]:
with open(str(path_interim_64fv / f'event_clip_starts_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(start_clip_idcs, filehandle)
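A quick way to check that the pickles were written correctly is to load one back and compare it with the in-memory object. The sketch below uses a small hand-written list and a temporary directory. The helper `save_and_reload` is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_and_reload(obj, path):
    """Pickle `obj` to `path`, then read it back."""
    with open(str(path), 'wb') as filehandle:
        pickle.dump(obj, filehandle)
    with open(str(path), 'rb') as filehandle:
        return pickle.load(filehandle)


# Small stand-in for event_clip_idcs, including a video with no usable events
event_idcs = [[46, 48, 16], [37, 17, 11], []]

with tempfile.TemporaryDirectory() as tmp:
    reloaded = save_and_reload(event_idcs, Path(tmp) / 'event_clip_indices_seed_117.pkl')

print(reloaded == event_idcs)  # True
```

For the real lists, which hold NumPy arrays, the comparison would need `np.array_equal` per entry instead of `==`.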

#### Creating the 64 Dim FV Clips

In [92]:
coarse_64d_fv_df

Unnamed: 0,id,n_coarse_segments,coarse_segment_starts,length,path_64dim_fv
0,39_salat_cam01,8,"[32, 1367, 1422, 1552, 1672, 2072, 2402, 2512]",2669,/media/data_cifs2/apra/work/labwork/data/exter...
1,39_tea_cam01,4,"[52, 132, 802, 1212]",1304,/media/data_cifs2/apra/work/labwork/data/exter...
2,39_pancake_cam01,10,"[2, 626, 896, 1227, 3212, 3703, 4124, 8706, 88...",8990,/media/data_cifs2/apra/work/labwork/data/exter...
3,39_cereals_cam01,4,"[14, 70, 472, 646]",699,/media/data_cifs2/apra/work/labwork/data/exter...
4,39_friedegg_cam01,5,"[15, 500, 854, 2805, 2881]",2890,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...
1469,5_cereals_cam01,5,"[14, 247, 625, 1125, 1159]",1186,/media/data_cifs2/apra/work/labwork/data/exter...
1470,5_juice_cam01,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...
1471,5_coffee_cam01,5,"[2, 243, 379, 628, 1028]",1123,/media/data_cifs2/apra/work/labwork/data/exter...
1472,5_juice_stereo_1,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...


In [93]:
coarse_64d_fv_df['all_clip_ranges'] = all_clip_ranges
coarse_64d_fv_df['start_clip_idcs'] = start_clip_idcs
coarse_64d_fv_df['event_clip_idcs'] = event_clip_idcs
coarse_64d_fv_df

Unnamed: 0,id,n_coarse_segments,coarse_segment_starts,length,path_64dim_fv,all_clip_ranges,start_clip_idcs,event_clip_idcs
0,39_salat_cam01,8,"[32, 1367, 1422, 1552, 1672, 2072, 2402, 2512]",2669,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 1319, 1406, 1535, 1649, 2053, 2369]","[46, 48, 16, 17, 23, 19, 33]"
1,39_tea_cam01,4,"[52, 132, 802, 1212]",1304,/media/data_cifs2/apra/work/labwork/data/exter...,"[[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, ...","[15, 115, 791]","[37, 17, 11]"
2,39_pancake_cam01,10,"[2, 626, 896, 1227, 3212, 3703, 4124, 8706, 88...",8990,/media/data_cifs2/apra/work/labwork/data/exter...,"[[578, 579, 580, 581, 582, 583, 584, 585, 586,...","[578, 862, 1186, 3185, 3701, 4098, 8695, 8778]","[48, 34, 41, 27, 2, 26, 11, 38]"
3,39_cereals_cam01,4,"[14, 70, 472, 646]",699,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 68, 410]","[58, 2, 62]"
4,39_friedegg_cam01,5,"[15, 500, 854, 2805, 2881]",2890,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 451, 805, 2783]","[53, 49, 49, 22]"
...,...,...,...,...,...,...,...,...
1469,5_cereals_cam01,5,"[14, 247, 625, 1125, 1159]",1186,/media/data_cifs2/apra/work/labwork/data/exter...,"[[14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, ...","[14, 243, 597, 1065]","[0, 4, 28, 60]"
1470,5_juice_cam01,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 158, 310, 371, 1372, 1572, 2439]","[61, 5, 33, 2, 61, 1, 24]"
1471,5_coffee_cam01,5,"[2, 243, 379, 628, 1028]",1123,/media/data_cifs2/apra/work/labwork/data/exter...,"[[223, 224, 225, 226, 227, 228, 229, 230, 231,...","[223, 341, 613]","[20, 38, 15]"
1472,5_juice_stereo_1,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 160, 319, 313, 1432, 1572, 2459]","[38, 3, 24, 60, 1, 1, 4]"


In [99]:
path_clips = path_interim_64fv / 'clips'
if not path_clips.exists():
    path_clips.mkdir(parents=True)

In [102]:
for id, path, clip_ranges, starts, events, length in coarse_64d_fv_df[[
    'id',
    'path_64dim_fv',
    'all_clip_ranges', 
    'start_clip_idcs', 
    'event_clip_idcs',
    'length',
]].values:
    file_array = pd.read_table(str(path), index_col=0, header=0).to_numpy()
    for clip_range, start, event in zip(clip_ranges, starts, events):
        path_out = path_clips / f'{id}_start_{start}_event_{event}_seed_{SEED}.npy'
        np.save(str(path_out), file_array[clip_range, :])

### I3D Events

In [103]:
np.random.seed(SEED)

all_clip_ranges = []
event_clip_idcs = []
start_clip_idcs = []

for n, starts, length in coarse_i3d_df[
    ['n_coarse_segments', 'coarse_segment_starts', 'length']].values:
    # Ignore last segment which is start of SIL
    idcs_to_use = starts[:-1]
    # A start frame of 2 shows up disproportionately often. Ignore it
    if 2 in idcs_to_use:
        idcs_to_use = idcs_to_use[1:]
    
    # Move on if there are no events left to use
    if not idcs_to_use:
        all_clip_ranges.append([])
        event_clip_idcs.append(np.array([]))
        start_clip_idcs.append([])
        continue 
        
    # Randomly select len(idcs_to_use) numbers between 0 and 63
    selected_clip_idx = np.random.choice(range(n_frames), len(idcs_to_use), replace=True)
    
    # Earliest starting frame can be 0
    clip_starts = np.maximum(np.array(idcs_to_use) - selected_clip_idx, 0)
    # Latest starting frame is length - n_frames - 4, leaving a 4-frame margin at the end
    clip_starts = np.minimum(clip_starts, length-n_frames-4)
    
    # List of the full 64 frame clip indices
    all_clip_ranges.append([np.arange(start, start+n_frames) for start in clip_starts])
    # Keep track of the idx that had the event
    event_clip_idcs.append(selected_clip_idx)
    # And the starting idx of the clip
    start_clip_idcs.append(clip_starts)

print(len(all_clip_ranges))
print(len(event_clip_idcs))
print(len(start_clip_idcs))

1576
1576
1576


In [104]:
event_clip_idcs[:10]

[array([46, 48, 16, 17, 23, 19, 33]),
 array([37, 17, 11]),
 array([48, 34, 41, 27,  2, 26, 11, 38]),
 array([58,  2, 62]),
 array([53, 49, 49, 22]),
 array([55, 22, 58, 33]),
 array([14, 16, 31, 49, 45, 12]),
 array([ 7, 37, 19, 37]),
 array([61, 24,  5,  4, 41, 54, 20]),
 array([34, 31, 48, 14, 31,  8, 25, 11])]

#### Saving the Interim Data

In [105]:
path_interim_data = DIR_DATA_INT / 'breakfast/video_clips/event_clips/'
if not path_interim_data.exists():
    path_interim_data.mkdir(parents=True)
    
path_interim_i3d = path_interim_data / 'i3d_clips'
if not path_interim_i3d.exists():
    path_interim_i3d.mkdir(parents=True)

In [106]:
with open(str(path_interim_i3d / f'event_clip_indices_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(event_clip_idcs, filehandle)

In [107]:
with open(str(path_interim_i3d / f'event_clip_ranges_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(all_clip_ranges, filehandle)

In [108]:
with open(str(path_interim_i3d / f'event_clip_starts_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(start_clip_idcs, filehandle)

#### Creating the I3D Clips

In [109]:
coarse_i3d_df

Unnamed: 0,id,n_coarse_segments,coarse_segment_starts,length,path_i3d
0,39_salat_cam01,8,"[32, 1367, 1422, 1552, 1672, 2072, 2402, 2512]",2669,/media/data_cifs2/apra/work/labwork/data/exter...
1,39_tea_cam01,4,"[52, 132, 802, 1212]",1304,/media/data_cifs2/apra/work/labwork/data/exter...
2,39_pancake_cam01,10,"[2, 626, 896, 1227, 3212, 3703, 4124, 8706, 88...",8990,/media/data_cifs2/apra/work/labwork/data/exter...
3,39_cereals_cam01,4,"[14, 70, 472, 646]",699,/media/data_cifs2/apra/work/labwork/data/exter...
4,39_friedegg_cam01,5,"[15, 500, 854, 2805, 2881]",2890,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...
1571,5_cereals_cam01,5,"[14, 247, 625, 1125, 1159]",1186,/media/data_cifs2/apra/work/labwork/data/exter...
1572,5_juice_cam01,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...
1573,5_coffee_cam01,5,"[2, 243, 379, 628, 1028]",1123,/media/data_cifs2/apra/work/labwork/data/exter...
1574,5_juice_stereo_1,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...


In [110]:
coarse_i3d_df['all_clip_ranges'] = all_clip_ranges
coarse_i3d_df['start_clip_idcs'] = start_clip_idcs
coarse_i3d_df['event_clip_idcs'] = event_clip_idcs
coarse_i3d_df

Unnamed: 0,id,n_coarse_segments,coarse_segment_starts,length,path_i3d,all_clip_ranges,start_clip_idcs,event_clip_idcs
0,39_salat_cam01,8,"[32, 1367, 1422, 1552, 1672, 2072, 2402, 2512]",2669,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 1319, 1406, 1535, 1649, 2053, 2369]","[46, 48, 16, 17, 23, 19, 33]"
1,39_tea_cam01,4,"[52, 132, 802, 1212]",1304,/media/data_cifs2/apra/work/labwork/data/exter...,"[[15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, ...","[15, 115, 791]","[37, 17, 11]"
2,39_pancake_cam01,10,"[2, 626, 896, 1227, 3212, 3703, 4124, 8706, 88...",8990,/media/data_cifs2/apra/work/labwork/data/exter...,"[[578, 579, 580, 581, 582, 583, 584, 585, 586,...","[578, 862, 1186, 3185, 3701, 4098, 8695, 8778]","[48, 34, 41, 27, 2, 26, 11, 38]"
3,39_cereals_cam01,4,"[14, 70, 472, 646]",699,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 68, 410]","[58, 2, 62]"
4,39_friedegg_cam01,5,"[15, 500, 854, 2805, 2881]",2890,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 451, 805, 2783]","[53, 49, 49, 22]"
...,...,...,...,...,...,...,...,...
1571,5_cereals_cam01,5,"[14, 247, 625, 1125, 1159]",1186,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 218, 571, 1066]","[60, 29, 54, 59]"
1572,5_juice_cam01,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 153, 311, 357, 1430, 1518, 2438]","[29, 10, 32, 16, 3, 55, 25]"
1573,5_coffee_cam01,5,"[2, 243, 379, 628, 1028]",1123,/media/data_cifs2/apra/work/labwork/data/exter...,"[[198, 199, 200, 201, 202, 203, 204, 205, 206,...","[198, 362, 568]","[45, 17, 60]"
1574,5_juice_stereo_1,8,"[18, 163, 343, 373, 1433, 1573, 2463, 2733]",2795,/media/data_cifs2/apra/work/labwork/data/exter...,"[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...","[0, 131, 301, 338, 1424, 1528, 2426]","[41, 32, 42, 35, 9, 45, 37]"


In [114]:
path_clips = path_interim_i3d / 'clips'
if not path_clips.exists():
    path_clips.mkdir(parents=True)
path_clips

PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/i3d_clips/clips')

In [115]:
for id, path, clip_ranges, starts, events, length in coarse_i3d_df[[
    'id',
    'path_i3d',
    'all_clip_ranges', 
    'start_clip_idcs', 
    'event_clip_idcs',
    'length',
]].values:
    file_array = np.load(str(path))
    for clip_range, start, event in zip(clip_ranges, starts, events):
        path_out = path_clips / f'{id}_start_{start}_event_{event}_seed_{SEED}.npy'
        np.save(str(path_out), file_array[clip_range, :])

### Saving the Convenience DFs

In [116]:
coarse_64d_fv_df.to_csv(str(path_interim_64fv / 'coarse_summary.csv'), index=False)

In [117]:
coarse_i3d_df.to_csv(str(path_interim_i3d / 'coarse_summary.csv'), index=False)

## Creating One Dataset

In [118]:
raw_data_path = DIR_RAW_BREAKFAST / f'{n_frames}_frame_clips'
if not raw_data_path.exists():
    raw_data_path.mkdir(parents=True)

### 64 Dim FVs

In [119]:
path_64dim_clips = path_interim_64fv / 'clips'
all_clips_data = [[path, np.load(str(path))] for path in path_64dim_clips.iterdir()]

In [120]:
path_all_clips, all_data = zip(*all_clips_data)

In [121]:
path_all_clips[:5]

(PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/clips/38_pancake_stereo_1_start_1244_event_55_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/clips/44_juice_cam02_start_1099_event_39_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/clips/49_scrambledegg_cam02_start_2801_event_7_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/clips/42_friedegg_webcam01_start_2860_event_34_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/64dim_fv_clips/clips/35_tea_cam01_start_272_event_14_seed_117.npy'))
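Each clip's metadata lives only in its file name, which follows the `{id}_start_{start}_event_{event}_seed_{SEED}.npy` pattern used when the clips were saved. A minimal sketch for recovering those fields from a path is shown below. The regular expression and the helper `parse_clip_name` are assumptions written for illustration, not part of the `lab` package.

```python
import re
from pathlib import Path

# Mirrors the f-string used when saving: {id}_start_{start}_event_{event}_seed_{SEED}
CLIP_NAME = re.compile(
    r'^(?P<id>.+)_start_(?P<start>-?\d+)_event_(?P<event>\d+)_seed_(?P<seed>\d+)$'
)


def parse_clip_name(path):
    """Split a clip file name into its id, start, event and seed fields."""
    match = CLIP_NAME.match(Path(path).stem)
    if match is None:
        raise ValueError(f'Unexpected clip file name: {path}')
    return {
        'id': match['id'],
        'start': int(match['start']),
        'event': int(match['event']),
        'seed': int(match['seed']),
    }


info = parse_clip_name('35_tea_cam01_start_272_event_14_seed_117.npy')
print(info)  # {'id': '35_tea_cam01', 'start': 272, 'event': 14, 'seed': 117}
```

Because the id itself contains underscores, the pattern anchors on the literal `_start_`, `_event_` and `_seed_` markers rather than splitting on `_`.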

In [122]:
path_raw_64fv_data = raw_data_path / '64dim_fv'
if not path_raw_64fv_data.exists():
    path_raw_64fv_data.mkdir(parents=True)

In [123]:
with open(str(path_raw_64fv_data / f'path_event_data_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(path_all_clips, filehandle)

In [124]:
all_data_array = np.array(all_data)
all_data_array.shape

(7061, 64, 64)
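`Path.iterdir` yields entries in arbitrary order, so the row order of the stacked array can differ between machines or runs. That is why the list of paths is saved alongside it. As a sketch of an alternative, the files could be sorted before loading and combined with `np.stack`, which raises an error if any clip has a different shape. The helper `load_clips` and the dummy files are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_clips(clip_dir):
    """Load every .npy clip in `clip_dir` in sorted order and stack them."""
    paths = sorted(Path(clip_dir).glob('*.npy'))
    clips = [np.load(str(path)) for path in paths]
    # np.stack requires all clips to share one shape
    return paths, np.stack(clips)


with tempfile.TemporaryDirectory() as tmp:
    # Two dummy 64 x 64 clips, written in reverse alphabetical order
    for name in ('b_clip.npy', 'a_clip.npy'):
        np.save(str(Path(tmp) / name), np.zeros((64, 64)))
    paths, data = load_clips(tmp)
    names = [path.name for path in paths]
    shape = data.shape

print(names)  # ['a_clip.npy', 'b_clip.npy']
print(shape)  # (2, 64, 64)
```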

In [125]:
np.save(str(path_raw_64fv_data / f'event_clips_seed_{SEED}.npy'), all_data_array)

### I3D FVs

In [126]:
path_i3d_clips = path_interim_i3d / 'clips'
all_clips_data = [[path, np.load(str(path))] for path in path_i3d_clips.iterdir()]

In [127]:
path_all_clips, all_data = zip(*all_clips_data)

In [128]:
path_all_clips[:5]

(PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/i3d_clips/clips/51_juice_webcam01_start_201_event_6_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/i3d_clips/clips/47_pancake_webcam01_start_2212_event_16_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/i3d_clips/clips/41_tea_webcam01_start_43_event_30_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/i3d_clips/clips/27_pancake_cam01_start_6078_event_44_seed_117.npy'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/interim/breakfast/video_clips/event_clips/i3d_clips/clips/52_sandwich_webcam01_start_2464_event_56_seed_117.npy'))

In [129]:
path_raw_i3d_data = raw_data_path / 'i3d_fv'
if not path_raw_i3d_data.exists():
    path_raw_i3d_data.mkdir(parents=True)

In [130]:
with open(str(path_raw_i3d_data / f'path_event_data_seed_{SEED}.pkl'), 'wb') as filehandle:
    pickle.dump(path_all_clips, filehandle)

In [131]:
all_data_array = np.array(all_data)
all_data_array.shape

(7557, 64, 2048)

In [132]:
np.save(str(path_raw_i3d_data / f'event_clips_seed_{SEED}.npy'), all_data_array)