# 4.0.5 Assessing the Video Length Issue

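The coarse segmentations and the 64-dimensional feature vectors disagree on the length of certain videos. This notebook measures how far the data sources (raw videos, coarse and fine segmentations, 64-dim feature vectors, and I3D feature vectors) disagree with one another.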

## Jupyter Extensions

Load [watermark](https://github.com/rasbt/watermark) to see the state of the machine and environment that's running the notebook. To make sense of the options, take a look at the [usage](https://github.com/rasbt/watermark#usage) section of the readme.

In [1]:
# Load `watermark` extension
%load_ext watermark
# Display the status of the machine and packages. Add more as necessary.
%watermark -v -n -m -g -b -t -p torch,torchvision,cv2,h5py,pandas,matplotlib,seaborn,jupyterlab,lab

Mon Feb 24 2020 12:38:12 

CPython 3.6.10
IPython 7.12.0

torch 1.2.0
torchvision 0.1.8
cv2 3.4.2
h5py 2.8.0
pandas 1.0.1
matplotlib 3.1.3
seaborn 0.10.0
jupyterlab 1.2.6
lab 0+untagged.21.g9d80dae.dirty

compiler   : GCC 7.3.0
system     : Linux
release    : 4.4.0-173-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 16
interpreter: 64bit
Git hash   : 9d80daec6156826fef6a7472507201013d8c7801
Git branch : master


Load [autoreload](https://ipython.org/ipython-doc/3/config/extensions/autoreload.html) which will always reload modules marked with `%aimport`.

Running `%autoreload 2` inverts this behavior: every module is auto-reloaded *except* those explicitly excluded with `%aimport -module`.

In [5]:
# Load `autoreload` extension
%load_ext autoreload
# Set autoreload behavior
%autoreload 1

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Load `matplotlib` in one of the more `jupyter`-friendly [rich-output modes](https://ipython.readthedocs.io/en/stable/interactive/plotting.html). Options include `inline`, `notebook`, and `gtk`, though not all of them have worked reliably in this setup.

In [6]:
# Set the matplotlib mode
%matplotlib inline

## Set the GPU

Check current GPU usage first, then restrict this notebook to a single idle device so we aren't greedy.

In [7]:
!nvidia-smi

Mon Feb 24 12:38:57 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  TITAN X (Pascal)    Off  | 00000000:04:00.0 Off |                  N/A |
| 23%   22C    P8     7W / 250W |      0MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:05:00.0 Off |                  N/A |
| 29%   50C    P2    58W / 250W |   2885MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:08:00.0 Off |                  N/A |
| 32%   

In [8]:
%env CUDA_VISIBLE_DEVICES=0

env: CUDA_VISIBLE_DEVICES=0


## Imports

In [9]:
from pathlib import Path

import cv2
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm

Local imports that may or may not be autoreloaded. This section contains imports that will likely be re-run several times and will gain or lose entries as the project evolves.

In [276]:
# Constants to be used throughout the package
%aimport lab.index
from lab.index import DIR_DATA_INT, DIR_DATA_RAW
%aimport lab.breakfast.constants
from lab.breakfast.constants import SEED
# Import the data subdirectories
%aimport lab.breakfast.index
from lab.breakfast.index import (DIR_BREAKFAST, 
                                 DIR_BREAKFAST_VIDEOS,
                                 DIR_BREAKFAST_DATA, 
                                 DIR_COARSE_SEG, 
                                 DIR_FINE_SEG,
                                 DIR_I3D_FVS,
                                )

## Initial Setup

Set [seaborn defaults](https://seaborn.pydata.org/generated/seaborn.set.html) for matplotlib.

In [8]:
sns.set()

## Rerunning the Same Code Blocks

### Getting Start Frames and Number of Segments

In [11]:
df = pd.read_csv(str(DIR_BREAKFAST / 'video_lengths.csv'),
                 index_col=0,
                 header=0)

In [12]:
df.head()

Unnamed: 0,Name,Length
0,P27_cam01_P27_scrambledegg,3561
1,P11_cam01_P11_scrambledegg,2649
2,P27_stereo01_P27_scrambledegg,3560
3,P21_cam01_P21_scrambledegg,5281
4,P46_cam02_P46_scrambledegg,3135


In [15]:
with open(str(DIR_COARSE_SEG / 'pancake/P53_webcam01_P53_pancake.txt'), 'r') as f:
    for l in f:
        print(l)

1-1 SIL  

2-302 spoon_flour  

303-501 crack_egg  

502-727 pour_milk  

728-1077 butter_pan  

1078-1844 stir_dough  

1845-1948 pour_dough2pan  

1949-3649 fry_pancake  

3650-3736 take_plate  

3737-3826 put_pancake2plate  

3827-3874 SIL  



In [16]:
with open(str(DIR_COARSE_SEG / 'pancake/P53_webcam01_P53_pancake.txt'), 'r') as f:
    starts = [line.split('-')[0] for line in f][1:]
    print(starts)

['2', '303', '502', '728', '1078', '1845', '1949', '3650', '3737', '3827']
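The annotation lines above follow a `<start>-<end> <label>` pattern. As a minimal sketch, assuming every non-blank line has that shape, the lines can be parsed into structured records rather than split ad hoc. The names `Segment` and `parse_segments` are hypothetical and not part of the `lab` package.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: int  # first frame number as written in the file
    end: int    # last frame number as written in the file
    label: str  # action label, e.g. 'crack_egg' or 'SIL'


def parse_segments(lines: List[str]) -> List[Segment]:
    """Parse lines of the form '<start>-<end> <label>' into Segment tuples."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        frames, label = line.split(maxsplit=1)
        start, end = frames.split('-')
        segments.append(Segment(int(start), int(end), label))
    return segments


# The first three lines of the pancake example printed above
example = ['1-1 SIL  \n', '2-302 spoon_flour  \n', '303-501 crack_egg  \n']
segments = parse_segments(example)

# Drop the first segment, mirroring the [1:] slice used in this notebook
starts = [s.start for s in segments][1:]
print(starts)
```

Keeping the end frame and label alongside the start frame makes later checks, such as reading the last annotated frame, possible without reopening the file.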


In [17]:
n_coarse_segments, coarse_segment_starts = [], []

for name in df.Name:
    action = name.split('_')[-1]
    path = DIR_COARSE_SEG / f'{action}/{name}.txt'
    
    with open(str(path), 'r') as file:
        # Keep the start frame of every segment except the first one
        coarse_segment_starts.append([int(line.split('-')[0]) for line in file][1:])
        n_coarse_segments.append(len(coarse_segment_starts[-1]))

In [18]:
df['n_coarse_segments'] = n_coarse_segments
df['coarse_segment_starts'] = coarse_segment_starts

In [19]:
df.head()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts
0,P27_cam01_P27_scrambledegg,3561,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3..."
1,P11_cam01_P11_scrambledegg,2649,6,"[39, 293, 897, 1928, 2410, 2620]"
2,P27_stereo01_P27_scrambledegg,3560,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3..."
3,P21_cam01_P21_scrambledegg,5281,9,"[234, 624, 1120, 1408, 1577, 1975, 4189, 4451,..."
4,P46_cam02_P46_scrambledegg,3135,7,"[2, 272, 754, 1226, 1421, 2617, 2682]"


In [20]:
n_fine_segments, fine_segment_starts = [], []

for name in df.Name:
    action = name.split('_')[-1]
    path = DIR_FINE_SEG / f'{action}/{name}.txt'
    
    try:
        with open(str(path), 'r') as file:
            fine_segment_starts.append([int(line.split('-')[0]) for line in file][1:-1])
            n_fine_segments.append(len(fine_segment_starts[-1]))
    except FileNotFoundError:
        fine_segment_starts.append(None)
        n_fine_segments.append(None)
        
df['n_fine_segments'] = n_fine_segments
df['fine_segment_starts'] = fine_segment_starts

df.tail()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts,n_fine_segments,fine_segment_starts
1024,P36_webcam02_P36_pancake,7049,8,"[2, 279, 548, 1044, 1958, 2243, 2844, 6903]",,
1025,P54_stereo01_P54_pancake,3340,11,"[2, 225, 506, 692, 1018, 1336, 1709, 1837, 206...",66.0,"[11, 27, 55, 77, 115, 132, 191, 223, 235, 251,..."
1026,P13_cam01_P13_pancake,4436,12,"[31, 611, 717, 907, 1036, 1316, 1443, 1493, 17...",111.0,"[32, 47, 93, 138, 212, 318, 333, 348, 377, 467..."
1027,P21_cam01_P21_pancake,5841,10,"[95, 415, 699, 960, 1634, 2125, 2437, 5367, 54...",,
1028,P12_webcam01_P12_pancake,2870,8,"[2, 136, 324, 446, 603, 744, 2680, 2805]",54.0,"[32, 47, 62, 76, 77, 122, 123, 152, 213, 287, ..."


In [21]:
len([n for n in n_fine_segments if n is None])

505

In [22]:
len(n_fine_segments)

1029

### Counting Actions

In [23]:
actions = []
for name in df.Name:
    actions.append(name.split('_')[-1])
df['actions'] = actions

In [24]:
df.head()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts,n_fine_segments,fine_segment_starts,actions
0,P27_cam01_P27_scrambledegg,3561,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3...",71.0,"[38, 73, 83, 150, 183, 213, 259, 308, 314, 337...",scrambledegg
1,P11_cam01_P11_scrambledegg,2649,6,"[39, 293, 897, 1928, 2410, 2620]",74.0,"[23, 34, 56, 67, 87, 107, 116, 122, 128, 132, ...",scrambledegg
2,P27_stereo01_P27_scrambledegg,3560,9,"[12, 185, 444, 1031, 1310, 1923, 2880, 2987, 3...",71.0,"[38, 73, 83, 150, 183, 213, 259, 308, 314, 337...",scrambledegg
3,P21_cam01_P21_scrambledegg,5281,9,"[234, 624, 1120, 1408, 1577, 1975, 4189, 4451,...",,,scrambledegg
4,P46_cam02_P46_scrambledegg,3135,7,"[2, 272, 754, 1226, 1421, 2617, 2682]",,,scrambledegg


In [25]:
np.unique(df.actions, return_counts=True)

(array(['coffee', 'friedegg', 'juice', 'milk', 'pancake', 'sandwich',
        'scrambledegg'], dtype=object),
 array([167, 173, 162, 187, 157, 169,  14]))

In [26]:
df.tail()

Unnamed: 0,Name,Length,n_coarse_segments,coarse_segment_starts,n_fine_segments,fine_segment_starts,actions
1024,P36_webcam02_P36_pancake,7049,8,"[2, 279, 548, 1044, 1958, 2243, 2844, 6903]",,,pancake
1025,P54_stereo01_P54_pancake,3340,11,"[2, 225, 506, 692, 1018, 1336, 1709, 1837, 206...",66.0,"[11, 27, 55, 77, 115, 132, 191, 223, 235, 251,...",pancake
1026,P13_cam01_P13_pancake,4436,12,"[31, 611, 717, 907, 1036, 1316, 1443, 1493, 17...",111.0,"[32, 47, 93, 138, 212, 318, 333, 348, 377, 467...",pancake
1027,P21_cam01_P21_pancake,5841,10,"[95, 415, 699, 960, 1634, 2125, 2437, 5367, 54...",,,pancake
1028,P12_webcam01_P12_pancake,2870,8,"[2, 136, 324, 446, 603, 744, 2680, 2805]",54.0,"[32, 47, 62, 76, 77, 122, 123, 152, 213, 287, ...",pancake


### Finding Good First Frames

In [27]:
# Select this number of frames
n_frames = 64

In [28]:
np.unique(n_coarse_segments, return_counts=True)

(array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 15, 17]),
 array([ 41,  71, 225, 243, 151,  88,  48,  33,  50,  50,  17,   6,   2,
          4]))

In [29]:
starting_segments = [coarse_segment_starts[i][0]
                     for i, n in enumerate(n_coarse_segments)
                     if n < 3]

In [30]:
np.unique(starting_segments, return_counts=True)

(array([ 2, 25, 38, 47, 60, 72]), array([20,  5,  5,  3,  4,  4]))

## Illustration of the Problem

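For some videos, the last coarse segment starts beyond the video's recorded length. For the video at index 259 (`P52_stereo01_P52_sandwich`), the final coarse segment starts at frame 2555, but `video_lengths.csv` lists only 1186 frames.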

In [31]:
idx = 259

In [32]:
df.iloc[idx]

Name                                             P52_stereo01_P52_sandwich
Length                                                                1186
n_coarse_segments                                                        6
coarse_segment_starts                    [155, 340, 550, 2250, 2520, 2555]
n_fine_segments                                                         73
fine_segment_starts      [358, 507, 517, 542, 595, 619, 683, 701, 716, ...
actions                                                           sandwich
Name: 259, dtype: object

In [33]:
max(df.iloc[idx].coarse_segment_starts)

2555

In [34]:
df.iloc[idx].Length

1186
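The single-row check above can be extended to flag every affected video at once. This is a sketch on a toy frame that copies two rows shown earlier in this notebook; the column names `last_start` and `overrun` are hypothetical additions, and the real `df` would be used in place of `toy`.

```python
import pandas as pd

# Toy rows mirroring the columns used above. The first row comes from
# df.head(), the second is the P52 example from this section.
toy = pd.DataFrame({
    'Name': ['P11_cam01_P11_scrambledegg', 'P52_stereo01_P52_sandwich'],
    'Length': [2649, 1186],
    'coarse_segment_starts': [
        [39, 293, 897, 1928, 2410, 2620],
        [155, 340, 550, 2250, 2520, 2555],
    ],
})

# Latest segment start per video, compared against the recorded length
toy['last_start'] = toy['coarse_segment_starts'].apply(max)
toy['overrun'] = toy['last_start'] > toy['Length']

print(toy.loc[toy['overrun'], ['Name', 'Length', 'last_start']])
```

Only the P52 row is flagged here, since its last segment starts at frame 2555 while the recorded length is 1186.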

## Checking the Raw Video

In [39]:
all_avis = list(DIR_BREAKFAST_VIDEOS.rglob('*.avi'))

In [40]:
all_avis[0]

PosixPath('/media/data_cifs2/apra/work/labwork/data/external/breakfast/BreakfastII_15fps_qvga_sync/P39/cam01/P39_salat.avi')

In [41]:
df.Name.iloc[idx]

'P52_stereo01_P52_sandwich'

In [43]:
idx_path = DIR_BREAKFAST_VIDEOS / 'P52/stereo'
assert idx_path.exists()

In [48]:
idx_video_paths = list(idx_path.rglob('*sandwich_ch*.avi'))
idx_video_paths

[PosixPath('/media/data_cifs2/apra/work/labwork/data/external/breakfast/BreakfastII_15fps_qvga_sync/P52/stereo/P52_sandwich_ch1.avi'),
 PosixPath('/media/data_cifs2/apra/work/labwork/data/external/breakfast/BreakfastII_15fps_qvga_sync/P52/stereo/P52_sandwich_ch0.avi')]

In [49]:
for path in idx_video_paths:
    cam = cv2.VideoCapture(str(path))
    print(path.stem, cam.get(cv2.CAP_PROP_FRAME_COUNT))

P52_sandwich_ch1 2677.0
P52_sandwich_ch0 2677.0


## Checking All Videos

In [50]:
len(all_avis)

1989

In [52]:
all_lengths = [cv2.VideoCapture(str(path)).get(cv2.CAP_PROP_FRAME_COUNT)
               for path in tqdm(all_avis)]

100%|██████████| 1989/1989 [02:51<00:00, 11.63it/s]


In [63]:
all_lengths = [int(length) for length in all_lengths]

## Checking Number of Segments

### Coarse

In [57]:
coarse_paths = list(DIR_COARSE_SEG.rglob('*.txt'))
num_coarse = len(coarse_paths)
num_coarse

1712

### Fine

In [58]:
fine_paths = list(DIR_FINE_SEG.rglob('*.txt'))
num_fine = len(fine_paths)
num_fine

1285

## Making Metadata DFs

In [146]:
dir_meta = DIR_BREAKFAST / 'meta'
if not dir_meta.exists():
    dir_meta.mkdir(parents=True)

### Videos

In [64]:
video_df = pd.DataFrame(zip(all_avis, all_lengths), columns=['path', 'length'])

In [65]:
video_df.head()

Unnamed: 0,path,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990
3,/media/data_cifs2/apra/work/labwork/data/exter...,699
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890


In [85]:
name_splits = [str(path.stem).split('_') for path in video_df.path.values] 

In [86]:
name_splits[:5]

[['P39', 'salat'],
 ['P39', 'tea'],
 ['P39', 'pancake'],
 ['P39', 'cereals'],
 ['P39', 'friedegg']]

In [87]:
patient, action = zip(*name_splits)

In [90]:
patient_int = [int(p[1:]) for p in patient]

In [91]:
patient_int[:5]

[39, 39, 39, 39, 39]

In [92]:
video_df['patient'] = patient_int

In [93]:
video_df['action'] = action

In [94]:
video_df.head()

Unnamed: 0,path,length,patient,action
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg


In [97]:
cameras = [video.parent.stem for video in video_df.path.values]
cameras[:5]

['cam01', 'cam01', 'cam01', 'cam01', 'cam01']

In [98]:
video_df['camera'] = cameras

In [99]:
video_df.head()

Unnamed: 0,path,length,patient,action,camera
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01


In [136]:
channel = []
for video in video_df.path.values:
    if video.parent.stem != 'stereo':
        channel.append(None)
    else:
        val = int(video.stem[-1])
        channel.append(val)

In [134]:
channel[:5]

[None, None, None, None, None]

In [137]:
video_df['channel'] = channel

In [140]:
video_df.channel.dtype

dtype('float64')
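The column becomes `float64` because pandas stores the `None` entries as `NaN`, which forces the integer channels to floats. One possible alternative, shown here only as a sketch on a toy list, is the nullable `Int64` extension dtype, which keeps the channels as integers and represents missing entries as `<NA>`.

```python
import pandas as pd

# Toy channel list: two non-stereo videos followed by stereo channels 1 and 0
channel = [None, None, 1, 0]

as_float = pd.Series(channel)                    # None becomes NaN, values become floats
as_nullable = pd.Series(channel, dtype='Int64')  # None becomes <NA>, values stay integers

print(as_float.dtype, as_nullable.dtype)
print(as_nullable.tolist())
```

If this dtype were adopted, the id construction in this notebook would need `pd.isna(ch)` in place of `np.isnan(float(ch))`, because `<NA>` cannot be converted to a float.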

In [104]:
video_df.head()

Unnamed: 0,path,length,patient,action,camera,channel
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,


In [142]:
video_id = [f'{p}_{a}_{ca}'
            if np.isnan(float(ch))
            else f'{p}_{a}_{ca}_{int(ch)}'
            for p, a, ca, ch in zip(
                video_df.patient, 
                video_df.action, 
                video_df.camera, 
                video_df.channel)]

In [143]:
video_id[:5]

['39_salat_cam01',
 '39_tea_cam01',
 '39_pancake_cam01',
 '39_cereals_cam01',
 '39_friedegg_cam01']

In [144]:
video_df['id'] = video_id

In [145]:
video_df[video_df.camera == 'stereo'].head()

Unnamed: 0,path,length,patient,action,camera,channel,id
24,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,stereo,1.0,39_pancake_stereo_1
25,/media/data_cifs2/apra/work/labwork/data/exter...,1811,39,juice,stereo,1.0,39_juice_stereo_1
26,/media/data_cifs2/apra/work/labwork/data/exter...,1524,39,sandwich,stereo,1.0,39_sandwich_stereo_1
27,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,stereo,1.0,39_friedegg_stereo_1
28,/media/data_cifs2/apra/work/labwork/data/exter...,4407,39,scrambledegg,stereo,1.0,39_scrambledegg_stereo_1


In [148]:
video_df.to_csv(str(dir_meta / 'video_meta.csv'), index=False)

### Coarse Segmentations

In [211]:
coarse_df = pd.DataFrame(coarse_paths, columns=['path'])

In [212]:
coarse_df.head()

Unnamed: 0,path
0,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...


In [213]:
patient, action, camera, channel = [], [], [], []
for path in coarse_df.path.values:
    splits = path.stem.split('_')
    action.append(splits[-1])
    patient.append(int(splits[0][1:]))
    
    if 'stereo' in splits[1]:
        camera.append('stereo')
        channel.append(int(splits[1][-2:]))
    else:
        camera.append(splits[1])
        channel.append(None)
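The same filename parsing is repeated for the coarse segmentations, the fine segmentations, and both feature-vector sets. A minimal sketch of how it could be factored into one helper follows; `NameParts`, `parse_stem`, and `make_id` are hypothetical names, and the logic simply mirrors the loop above and the id comprehension used throughout this notebook.

```python
from typing import NamedTuple, Optional


class NameParts(NamedTuple):
    patient: int
    action: str
    camera: str
    channel: Optional[int]


def parse_stem(stem: str) -> NameParts:
    """Split a stem such as 'P27_stereo01_P27_scrambledegg' into its parts."""
    splits = stem.split('_')
    patient = int(splits[0][1:])  # 'P27' -> 27
    action = splits[-1]
    if 'stereo' in splits[1]:
        # 'stereo01' -> camera 'stereo', channel 1
        return NameParts(patient, action, 'stereo', int(splits[1][-2:]))
    return NameParts(patient, action, splits[1], None)


def make_id(parts: NameParts) -> str:
    """Build an id string in the same layout as the ids in this notebook."""
    base = f'{parts.patient}_{parts.action}_{parts.camera}'
    return base if parts.channel is None else f'{base}_{parts.channel}'


stereo_id = make_id(parse_stem('P27_stereo01_P27_scrambledegg'))
webcam_id = make_id(parse_stem('P18_webcam02_P18_cereals'))
print(stereo_id, webcam_id)
```

A single helper would keep the id format identical across the metadata tables, which matters because the ids are the natural key for comparing them.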

In [214]:
action[:5]

['cereals', 'cereals', 'cereals', 'cereals', 'cereals']

In [215]:
patient[:5]

[18, 54, 34, 13, 51]

In [216]:
data_dict = {'patient' : patient, 
             'action' : action,
             'camera' : camera, 
             'channel' : channel}


In [217]:
for key, val in data_dict.items():
    coarse_df[key] = val

In [218]:
coarse_df.head()

Unnamed: 0,path,patient,action,camera,channel
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,


In [219]:
coarse_id = [f'{p}_{a}_{ca}'
            if np.isnan(float(ch))
            else f'{p}_{a}_{ca}_{int(ch)}'
            for p, a, ca, ch in zip(
                coarse_df.patient, 
                coarse_df.action, 
                coarse_df.camera, 
                coarse_df.channel)]

In [220]:
coarse_df['id'] = coarse_id

In [221]:
coarse_df.head()

Unnamed: 0,path,patient,action,camera,channel,id
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02


In [226]:
last_frames = []
for path in coarse_df.path.values:
    with open(str(path), 'r') as f:
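        # Exhaust the file so that `l` holds its final line
        for l in f: pass
        # Use the end frame of the last segment as the annotated length
        last_frames.append(int(l.split('-')[1].split(' ')[0]))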

In [230]:
coarse_df['length'] = last_frames

In [231]:
coarse_df.head()

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,465
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01,747
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,765
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01,679
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,711


In [232]:
coarse_df.to_csv(str(dir_meta / 'coarse_segmentations_meta.csv'), index=False)

### Fine Segmentations

In [195]:
fine_paths = list(DIR_FINE_SEG.rglob('*.txt'))

In [197]:
fine_df = pd.DataFrame(fine_paths, columns=['path'])

In [198]:
fine_df.head()

Unnamed: 0,path
0,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...


In [199]:
patient, action, camera, channel = [], [], [], []
for path in fine_df.path.values:
    splits = path.stem.split('_')
    action.append(splits[-1])
    patient.append(int(splits[0][1:]))
    
    if 'stereo' in splits[1]:
        camera.append('stereo')
        channel.append(int(splits[1][-2:]))
    else:
        camera.append(splits[1])
        channel.append(None)

In [200]:
action[:5]

['cereals', 'cereals', 'cereals', 'cereals', 'cereals']

In [201]:
patient[:5]

[5, 18, 25, 8, 34]

In [202]:
data_dict = {'patient' : patient, 
             'action' : action,
             'camera' : camera, 
             'channel' : channel}


In [203]:
for key, val in data_dict.items():
    fine_df[key] = val

In [204]:
fine_df.head()

Unnamed: 0,path,patient,action,camera,channel
0,/media/data_cifs2/apra/work/labwork/data/exter...,5,cereals,stereo,1.0
1,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,
2,/media/data_cifs2/apra/work/labwork/data/exter...,25,cereals,stereo,1.0
3,/media/data_cifs2/apra/work/labwork/data/exter...,8,cereals,cam01,
4,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0


In [205]:
fine_id = [f'{p}_{a}_{ca}'
            if np.isnan(float(ch))
            else f'{p}_{a}_{ca}_{int(ch)}'
            for p, a, ca, ch in zip(
                fine_df.patient, 
                fine_df.action, 
                fine_df.camera, 
                fine_df.channel)]

In [206]:
fine_df['id'] = fine_id

In [207]:
fine_df.head()

Unnamed: 0,path,patient,action,camera,channel,id
0,/media/data_cifs2/apra/work/labwork/data/exter...,5,cereals,stereo,1.0,5_cereals_stereo_1
1,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02
2,/media/data_cifs2/apra/work/labwork/data/exter...,25,cereals,stereo,1.0,25_cereals_stereo_1
3,/media/data_cifs2/apra/work/labwork/data/exter...,8,cereals,cam01,,8_cereals_cam01
4,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1


In [233]:
last_frames = []
for path in fine_df.path.values:
    with open(str(path), 'r') as f:
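        # Exhaust the file so that `l` holds its final line
        for l in f: pass
        # Use the end frame of the last segment as the annotated length
        last_frames.append(int(l.split('-')[1].split(' ')[0]))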

In [234]:
fine_df['length'] = last_frames

In [235]:
fine_df.head()

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,5,cereals,stereo,1.0,5_cereals_stereo_1,1185
1,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,465
2,/media/data_cifs2/apra/work/labwork/data/exter...,25,cereals,stereo,1.0,25_cereals_stereo_1,795
3,/media/data_cifs2/apra/work/labwork/data/exter...,8,cereals,cam01,,8_cereals_cam01,1097
4,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,765


In [236]:
fine_df.to_csv(str(dir_meta / 'fine_segmentations_meta.csv'), index=False)

### 64 Dim FVs

In [237]:
all_64_dim_paths = list(DIR_BREAKFAST_DATA.rglob('*.txt'))

In [241]:
len(all_64_dim_paths)

1712

In [242]:
all_64_dim_paths[0]

PosixPath('/media/data_cifs2/apra/work/labwork/data/external/breakfast/Breakfast_data/s1/cereals/P18_webcam02_P18_cereals.txt')

In [258]:
fvs_df = pd.DataFrame(all_64_dim_paths, columns=['path'])
fvs_df.head()

Unnamed: 0,path
0,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...


In [259]:
patient, action, camera, channel = [], [], [], []
for path in fvs_df.path.values:
    splits = path.stem.split('_')
    action.append(splits[-1])
    patient.append(int(splits[0][1:]))
    
    if 'stereo' in splits[1]:
        camera.append('stereo')
        channel.append(int(splits[1][-2:]))
    else:
        camera.append(splits[1])
        channel.append(None)

In [260]:
action[:5]

['cereals', 'cereals', 'cereals', 'cereals', 'cereals']

In [261]:
patient[:5]

[18, 54, 34, 13, 51]

In [262]:
data_dict = {'patient' : patient, 
             'action' : action,
             'camera' : camera, 
             'channel' : channel}


In [263]:
for key, val in data_dict.items():
    fvs_df[key] = val

In [264]:
fvs_df.head()

Unnamed: 0,path,patient,action,camera,channel
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,


In [265]:
fvs_id = [f'{p}_{a}_{ca}'
          if np.isnan(float(ch))
          else f'{p}_{a}_{ca}_{int(ch)}'
          for p, a, ca, ch in zip(
              fvs_df.patient, 
              fvs_df.action, 
              fvs_df.camera, 
              fvs_df.channel)]

In [266]:
fvs_id[:5]

['18_cereals_webcam02',
 '54_cereals_cam01',
 '34_cereals_stereo_1',
 '13_cereals_cam01',
 '51_cereals_webcam02']

In [268]:
fvs_df['id'] = fvs_id

In [269]:
fvs_df.head()

Unnamed: 0,path,patient,action,camera,channel,id
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02


In [274]:
data_length = []
for path in fvs_df.path.values:
    data_length.append(len(pd.read_table(str(path), index_col=0, header=0)))

In [277]:
fvs_df['length'] = data_length

In [278]:
fvs_df.head()

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01,743
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,761
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01,675
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,707


In [279]:
fvs_df.to_csv(str(dir_meta / '64dim_fvs_meta.csv'), index=False)

### I3D FVs

In [283]:
all_i3d_paths = list(DIR_I3D_FVS.rglob('*.npy'))

In [284]:
len(all_i3d_paths)

1712

In [285]:
all_i3d_paths[0]

PosixPath('/media/data_cifs2/apra/work/labwork/data/external/breakfast/i3d_fvs/P23_webcam01_P23_milk.npy')

In [296]:
i3d_df = pd.DataFrame(all_i3d_paths, columns=['path'])
i3d_df.head()

Unnamed: 0,path
0,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...


In [297]:
patient, action, camera, channel = [], [], [], []
for path in i3d_df.path.values:
    splits = path.stem.split('_')
    action.append(splits[-1])
    patient.append(int(splits[0][1:]))
    
    if 'stereo' in splits[1]:
        camera.append('stereo')
        channel.append(int(splits[1][-2:]))
    else:
        camera.append(splits[1])
        channel.append(None)

In [298]:
action[:5]

['milk', 'tea', 'cereals', 'friedegg', 'tea']

In [299]:
patient[:5]

[23, 14, 18, 49, 17]

In [300]:
data_dict = {'patient' : patient, 
             'action' : action,
             'camera' : camera, 
             'channel' : channel}


In [301]:
for key, val in data_dict.items():
    i3d_df[key] = val

In [302]:
i3d_df.head()

Unnamed: 0,path,patient,action,camera,channel
0,/media/data_cifs2/apra/work/labwork/data/exter...,23,milk,webcam01,
1,/media/data_cifs2/apra/work/labwork/data/exter...,14,tea,webcam01,
2,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,
3,/media/data_cifs2/apra/work/labwork/data/exter...,49,friedegg,webcam01,
4,/media/data_cifs2/apra/work/labwork/data/exter...,17,tea,webcam02,


In [303]:
i3d_id = [f'{p}_{a}_{ca}'
          if np.isnan(float(ch))
          else f'{p}_{a}_{ca}_{int(ch)}'
          for p, a, ca, ch in zip(
              i3d_df.patient, 
              i3d_df.action, 
              i3d_df.camera, 
              i3d_df.channel)]

In [306]:
i3d_id[:5]

['23_milk_webcam01',
 '14_tea_webcam01',
 '18_cereals_webcam02',
 '49_friedegg_webcam01',
 '17_tea_webcam02']

In [307]:
i3d_df['id'] = i3d_id

In [308]:
i3d_df.head()

Unnamed: 0,path,patient,action,camera,channel,id
0,/media/data_cifs2/apra/work/labwork/data/exter...,23,milk,webcam01,,23_milk_webcam01
1,/media/data_cifs2/apra/work/labwork/data/exter...,14,tea,webcam01,,14_tea_webcam01
2,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02
3,/media/data_cifs2/apra/work/labwork/data/exter...,49,friedegg,webcam01,,49_friedegg_webcam01
4,/media/data_cifs2/apra/work/labwork/data/exter...,17,tea,webcam02,,17_tea_webcam02


In [318]:
data_length = []
for path in i3d_df.path.values:
    # See this post for mmap_mode
    # https://stackoverflow.com/questions/52889798/obtain-lengths-of-vectors-without-loading-multiple-npy-files
    data_length.append(len(np.load(str(path), mmap_mode='r')))
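To illustrate the `mmap_mode` approach in isolation, here is a small self-contained sketch. The array shape `(461, 8)` and the file name are placeholders chosen for the example and say nothing about the real I3D files.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'example.npy'
    # Placeholder array standing in for one feature-vector file
    np.save(str(path), np.zeros((461, 8), dtype=np.float32))

    # mmap_mode='r' maps the file read-only instead of loading it into memory
    arr = np.load(str(path), mmap_mode='r')
    n_rows = arr.shape[0]  # same value as len(arr)
    del arr  # release the map before the temporary directory is removed

print(n_rows)
```

Reading `shape[0]` from the memory-mapped array gives the number of rows without pulling the full array into RAM.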

In [319]:
i3d_df['length'] = data_length

In [320]:
i3d_df.head()

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,23,milk,webcam01,,23_milk_webcam01,1246
1,/media/data_cifs2/apra/work/labwork/data/exter...,14,tea,webcam01,,14_tea_webcam01,397
2,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
3,/media/data_cifs2/apra/work/labwork/data/exter...,49,friedegg,webcam01,,49_friedegg_webcam01,5174
4,/media/data_cifs2/apra/work/labwork/data/exter...,17,tea,webcam02,,17_tea_webcam02,703


In [321]:
i3d_df.to_csv(str(dir_meta / 'i3d_fvs_meta.csv'), index=False)
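With every source carrying an `id` and a `length` column, the disagreement can be measured by joining the tables on `id`. The sketch below uses toy frames whose values are copied from the `head()` outputs of `coarse_df` and `fvs_df` above; the `diff` column is a hypothetical addition, and the saved metadata CSVs would take the place of the toy frames.

```python
import pandas as pd

# Toy stand-ins for two metadata tables, copied from the heads shown above
coarse = pd.DataFrame({
    'id': ['18_cereals_webcam02', '54_cereals_cam01', '34_cereals_stereo_1'],
    'length': [465, 747, 765],
})
fvs = pd.DataFrame({
    'id': ['18_cereals_webcam02', '54_cereals_cam01', '34_cereals_stereo_1'],
    'length': [461, 743, 761],
})

# An outer join keeps ids that appear in only one source;
# the `_merge` column records where each row came from.
merged = coarse.merge(fvs, on='id', how='outer',
                      suffixes=('_coarse', '_fvs'), indicator=True)
merged['diff'] = merged['length_coarse'] - merged['length_fvs']

print(merged[['id', 'length_coarse', 'length_fvs', 'diff', '_merge']])
```

In these three rows the coarse annotations run 4 frames longer than the 64-dim feature vectors. Whether that offset holds across the full dataset would need to be checked on the complete tables.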