# 4.0.6 Checking Metadata

This notebook continues where the previous one left off, since that one was getting long.

## Summary of Processing

- Remove coarse segmentations that have no videos
- Remove coarse segmentations that have a length mismatch with videos
- Outer join videos with remaining coarse segmentations
- Remove fine segmentations that have no corresponding video
- Remove fine segmentations that have a length mismatch with videos
- Outer join between master and fine
- Remove 64 dim fvs with no matching video
- Remove 64 dim fvs whose length is not exactly 4 frames shorter than the video (none match the video length exactly; 204 differ by some other amount)
- Outer join between master and 64 dim fvs
- Remove I3D fvs with no matching video
- Remove I3D fvs whose length is not exactly 4 frames shorter than the video (none match the video length exactly; 104 differ by some other amount)
- Outer join between master and I3D fvs
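
The steps above follow the same three-part pattern for every data source: filter by ID, filter by length, then join. A minimal sketch of that pattern is given here, assuming each metadata table has `id`, `length` and `path` columns as in this notebook. The helper name `attach_paths`, its signature and the toy data are hypothetical and are not part of the `lab` package.

```python
import pandas as pd


def attach_paths(master, other, path_col, offset=0):
    """Attach `other`'s path column to `master` for IDs that pass both checks.

    Hypothetical helper: the name, signature and `offset` argument are
    illustrative only and do not come from the `lab` package.
    """
    # Step 1: keep only rows whose ID exists in the master table.
    other = other[other["id"].isin(master["id"])]
    # Step 2: keep only rows whose length equals the video length minus `offset`.
    lengths = master[["id", "length"]].merge(
        other[["id", "length"]], on="id", suffixes=("_video", "_other")
    )
    ok = lengths["length_video"] == lengths["length_other"] + offset
    good_ids = lengths.loc[ok, "id"]
    other = other[other["id"].isin(good_ids)]
    # Step 3: join the surviving paths onto the master table.
    renamed = other[["id", "path"]].rename(columns={"path": path_col})
    return master.merge(renamed, how="outer", on="id")


# Toy data, made up for illustration only.
videos = pd.DataFrame({"id": ["a", "b", "c"], "length": [100, 200, 300]})
coarse = pd.DataFrame(
    {"id": ["a", "b", "z"], "length": [100, 199, 50], "path": ["pa", "pb", "pz"]}
)
master = attach_paths(videos, coarse, "path_coarse")
print(master)
```

In this sketch, `offset=0` would correspond to the segmentation checks and `offset=4` to the feature-vector checks.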

## Jupyter Extensions

Load [watermark](https://github.com/rasbt/watermark) to see the state of the machine and environment that's running the notebook. To make sense of the options, take a look at the [usage](https://github.com/rasbt/watermark#usage) section of the readme.

In [1]:
# Load `watermark` extension
%load_ext watermark
# Display the status of the machine and packages. Add more as necessary.
%watermark -v -n -m -g -b -t -p torch,torchvision,cv2,h5py,pandas,matplotlib,seaborn,jupyterlab,lab

Tue Feb 25 2020 20:13:59 

CPython 3.6.10
IPython 7.12.0

torch 1.2.0
torchvision 0.1.8
cv2 3.4.2
h5py 2.8.0
pandas 1.0.1
matplotlib 3.1.3
seaborn 0.10.0
jupyterlab 1.2.6
lab 0+untagged.23.g7f125f8.dirty

compiler   : GCC 7.3.0
system     : Linux
release    : 4.4.0-173-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 16
interpreter: 64bit
Git hash   : 7f125f8034fb6eee72ac6d3e6a7a765d65007482
Git branch : master


Load [autoreload](https://ipython.org/ipython-doc/3/config/extensions/autoreload.html). With `%autoreload 1` (set below), modules marked with `%aimport` are reloaded before each cell is executed.

This behavior can be inverted by running `%autoreload 2`, which auto-reloads everything *except* modules explicitly excluded with `%aimport -<module>`.

In [2]:
# Load `autoreload` extension
%load_ext autoreload
# Set autoreload behavior
%autoreload 1

Load `matplotlib` in one of the more `jupyter`-friendly [rich-output modes](https://ipython.readthedocs.io/en/stable/interactive/plotting.html). Some options (that may or may not have worked) are `inline`, `notebook`, and `gtk`.

In [3]:
# Set the matplotlib mode
%matplotlib inline

## Set the GPU

Check current GPU usage and restrict the notebook to a single device so we aren't greedy.

In [4]:
!nvidia-smi

Tue Feb 25 20:16:25 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  TITAN X (Pascal)    Off  | 00000000:04:00.0 Off |                  N/A |
| 25%   45C    P2    55W / 250W |   1745MiB / 12196MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:05:00.0 Off |                  N/A |
| 30%   52C    P2    59W / 250W |   2885MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    Off  | 00000000:08:00.0 Off |                  N/A |
| 34%   

In [5]:
%env CUDA_VISIBLE_DEVICES=0

env: CUDA_VISIBLE_DEVICES=0
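
Outside a notebook, the same restriction can be applied from plain Python. This is a minimal sketch, assuming it runs before anything in the process initialises CUDA, since the variable is only read at that point.

```python
import os

# Restrict this process to GPU 0. Set this before the first CUDA
# initialisation in the process, otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

visible = os.environ["CUDA_VISIBLE_DEVICES"]
print(visible)
```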


## Imports

In [6]:
from pathlib import Path

import cv2
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm

Local imports that may or may not be autoreloaded. This section contains things that will likely have to be re-imported multiple times, and have additions or subtractions made throughout the project.

In [239]:
# Constants to be used throughout the package
%aimport lab.index
from lab.index import DIR_DATA_INT, DIR_DATA_RAW
%aimport lab.breakfast.constants
from lab.breakfast.constants import SEED
# Import the data subdirectories
%aimport lab.breakfast.index
from lab.breakfast.index import (DIR_BREAKFAST, 
                                 DIR_RAW_BREAKFAST,
                                 DIR_BREAKFAST_META,
                                 DIR_BREAKFAST_VIDEOS,
                                 DIR_BREAKFAST_DATA, 
                                 DIR_COARSE_SEG, 
                                 DIR_FINE_SEG,
                                 DIR_I3D_FVS,
                                )

## Initial Setup

Set [seaborn defaults](https://seaborn.pydata.org/generated/seaborn.set.html) for matplotlib.

In [8]:
sns.set()

## Loading the DFs

In [88]:
video_df = pd.read_csv(str(DIR_BREAKFAST_META / 'video_meta.csv'))
coarse_df = pd.read_csv(str(DIR_BREAKFAST_META / 'coarse_segmentations_meta.csv'))
fine_df = pd.read_csv(str(DIR_BREAKFAST_META / 'fine_segmentations_meta.csv'))
fvs_df = pd.read_csv(str(DIR_BREAKFAST_META / '64dim_fvs_meta.csv'))
i3d_df = pd.read_csv(str(DIR_BREAKFAST_META / 'i3d_fvs_meta.csv'))

In [13]:
all_dfs = {'videos' : video_df, 
           'coarse' : coarse_df, 
           'fine' : fine_df, 
           '64dim_fvs' : fvs_df, 
           'i3d_fvs' : i3d_df}

## Videos vs Coarse

In [14]:
for name, val in all_dfs.items():
    print(name, len(val))

videos 1989
coarse 1712
fine 1285
64dim_fvs 1712
i3d_fvs 1712


In [15]:
1989 - 1712

277

### Not Total Correspondence

In [16]:
# No `.dropna()` here: `channel` is NaN for non-stereo cameras, so it would silently discard any such rows
dropped_videos_2_coarse = video_df[~video_df.id.isin(coarse_df.id)]
len(dropped_videos_2_coarse)

309

In [17]:
dropped_videos_2_coarse.head()

Unnamed: 0,path,length,patient,action,camera,channel,id
29,/media/data_cifs2/apra/work/labwork/data/exter...,1811,39,juice,stereo,0.0,39_juice_stereo_0
30,/media/data_cifs2/apra/work/labwork/data/exter...,947,39,milk,stereo,0.0,39_milk_stereo_0
33,/media/data_cifs2/apra/work/labwork/data/exter...,644,39,coffee,stereo,0.0,39_coffee_stereo_0
34,/media/data_cifs2/apra/work/labwork/data/exter...,4407,39,scrambledegg,stereo,0.0,39_scrambledegg_stereo_0
35,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,stereo,0.0,39_friedegg_stereo_0


In [18]:
dropped_videos_2_coarse.camera.unique()

array(['stereo'], dtype=object)

In [19]:
len(np.unique(coarse_df.id))

1712

In [20]:
len(pd.merge(video_df, coarse_df, how='inner', on='id'))

1680

In [81]:
len(video_df)

1989

### 32 Segmentations Don't Have Videos

In [21]:
coarse_ids = coarse_df.id.to_list()
video_ids = video_df.id.to_list()

In [22]:
not_in_video_ids = [cid for cid in coarse_ids if cid not in video_ids]

In [23]:
len(not_in_video_ids)

32

In [24]:
not_in_video_ids

['22_cereals_stereo_1',
 '49_cereals_stereo_1',
 '17_cereals_stereo_1',
 '13_cereals_stereo_1',
 '14_cereals_stereo_1',
 '20_cereals_stereo_1',
 '15_salat_stereo_1',
 '17_salat_stereo_1',
 '34_salat_stereo_1',
 '13_scrambledegg_stereo_1',
 '14_scrambledegg_stereo_1',
 '16_scrambledegg_stereo_1',
 '15_scrambledegg_stereo_1',
 '15_milk_stereo_1',
 '13_milk_stereo_1',
 '23_milk_stereo_1',
 '15_sandwich_stereo_1',
 '21_sandwich_stereo_1',
 '7_sandwich_stereo_1',
 '14_juice_stereo_1',
 '15_juice_stereo_1',
 '7_juice_stereo_1',
 '16_coffee_stereo_1',
 '26_coffee_stereo_1',
 '14_coffee_stereo_1',
 '21_friedegg_stereo_1',
 '22_friedegg_stereo_1',
 '23_friedegg_stereo_1',
 '24_friedegg_stereo_1',
 '16_pancake_stereo_1',
 '15_pancake_stereo_1',
 '4_pancake_stereo_1']
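
The list comprehension above scans the whole `video_ids` list once per coarse ID. A minimal sketch of the same filter using a set for membership tests follows; the IDs are made up and the name `video_id_set` is an assumption introduced here.

```python
# Toy ID lists (made up); in the notebook these would be `coarse_ids` and `video_ids`.
coarse_ids = ["1_tea_cam01", "2_tea_cam01", "3_tea_stereo_1"]
video_ids = ["1_tea_cam01", "2_tea_cam01", "4_tea_cam01"]

# A set gives constant-time membership tests instead of a linear scan.
video_id_set = set(video_ids)

# Filtering with a comprehension preserves the original order of `coarse_ids`.
not_in_video_ids = [cid for cid in coarse_ids if cid not in video_id_set]
print(not_in_video_ids)
```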

### Remove Coarse Segmentations that have no videos

In [79]:
good_coarse_ids = [cid for cid in coarse_ids if cid not in not_in_video_ids]
len(good_coarse_ids)

1680

In [89]:
coarse_df_cleaner = coarse_df[coarse_df.id.isin(good_coarse_ids)]
len(coarse_df_cleaner)

1680

### 309 Videos Have No Coarse Segments

In [31]:
doesnt_have_coarse_segments = [vid for vid in video_ids if vid not in coarse_ids]

In [35]:
len(doesnt_have_coarse_segments)

309
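
The naive difference 1989 - 1712 = 277 undercounts the unmatched videos because 32 segmentations have no video, so only 1680 IDs are shared and 1989 - 1680 = 309. One way to see all three counts at once is an outer merge with pandas' `indicator=True` option. The sketch below uses made-up IDs, not the real tables.

```python
import pandas as pd

# Toy tables, made up for illustration only.
videos = pd.DataFrame({"id": ["a", "b", "c", "d"]})
coarse = pd.DataFrame({"id": ["c", "d", "e"]})

# `indicator=True` adds a `_merge` column with the values
# 'left_only', 'right_only' and 'both'.
both = videos.merge(coarse, how="outer", on="id", indicator=True)
counts = both["_merge"].value_counts()
print(counts)
```

Here `left_only` plays the role of the videos without segmentations and `right_only` the segmentations without videos.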

### Three Have Different Lengths

In [95]:
common_videos = video_df[video_df.id.isin(coarse_df_cleaner.id)]
merged_videos_coarse_common = common_videos[['id', 'length']].merge(
    coarse_df_cleaner[['id', 'length']],
    on='id')
merged_videos_coarse_common

Unnamed: 0,id,length_x,length_y
0,39_salat_cam01,2669,2669
1,39_tea_cam01,1304,1304
2,39_pancake_cam01,8990,8990
3,39_cereals_cam01,699,699
4,39_friedegg_cam01,2890,2890
...,...,...,...
1675,5_cereals_cam01,1186,1186
1676,5_juice_cam01,2795,2795
1677,5_coffee_cam01,1123,1123
1678,5_juice_stereo_1,2795,2795


In [96]:
video_lengths = merged_videos_coarse_common.length_x.values
coarse_lengths = merged_videos_coarse_common.length_y.values

In [97]:
video_lengths[:5]

array([2669, 1304, 8990,  699, 2890])

In [98]:
coarse_lengths[:5]

array([2669, 1304, 8990,  699, 2890])

In [99]:
sum(video_lengths != coarse_lengths)

3

Three of them have different lengths.

In [100]:
merged_videos_coarse_common.iloc[np.where(video_lengths != coarse_lengths)]

Unnamed: 0,id,length_x,length_y
173,8_coffee_webcam01,275,289
249,43_juice_stereo_1,1622,1635
264,43_juice_cam02,1635,1622


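They do indeed have different lengths. `8_coffee_webcam01` is off by 14 frames, and the two `43_juice` entries appear to carry each other's lengths (1622 vs. 1635).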

### Removing the Bad IDs

In [101]:
bad_ids = merged_videos_coarse_common.iloc[np.where(video_lengths != coarse_lengths)].id.to_list()
bad_ids

['8_coffee_webcam01', '43_juice_stereo_1', '43_juice_cam02']

In [102]:
merged_videos_coarse_common[merged_videos_coarse_common.id.isin(bad_ids)]

Unnamed: 0,id,length_x,length_y
173,8_coffee_webcam01,275,289
249,43_juice_stereo_1,1622,1635
264,43_juice_cam02,1635,1622
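
A signed difference column makes the size and direction of each mismatch explicit. The sketch below reuses the three mismatched rows shown above plus one matching row; the `suffixes` and the `diff` column name are choices made for this example rather than anything used elsewhere in the notebook.

```python
import pandas as pd

ids = ["8_coffee_webcam01", "43_juice_stereo_1", "43_juice_cam02", "39_tea_cam01"]
videos = pd.DataFrame({"id": ids, "length": [275, 1622, 1635, 1304]})
coarse = pd.DataFrame({"id": ids, "length": [289, 1635, 1622, 1304]})

# Explicit suffixes are easier to read than the default `_x` / `_y`.
merged = videos.merge(coarse, on="id", suffixes=("_video", "_coarse"))
merged["diff"] = merged["length_video"] - merged["length_coarse"]

# Keep only the rows where the two lengths disagree.
mismatched = merged[merged["diff"] != 0]
print(mismatched)
```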


In [104]:
len(coarse_df_cleaner)

1680

In [105]:
coarse_df_cleaner = coarse_df_cleaner[~coarse_df_cleaner.id.isin(bad_ids)]
coarse_df_cleaner

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,465
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01,747
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,765
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01,679
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,711
...,...,...,...,...,...,...,...
1707,/media/data_cifs2/apra/work/labwork/data/exter...,36,pancake,webcam02,,36_pancake_webcam02,7053
1708,/media/data_cifs2/apra/work/labwork/data/exter...,54,pancake,stereo,1.0,54_pancake_stereo_1,3344
1709,/media/data_cifs2/apra/work/labwork/data/exter...,13,pancake,cam01,,13_pancake_cam01,4440
1710,/media/data_cifs2/apra/work/labwork/data/exter...,21,pancake,cam01,,21_pancake_cam01,5845


### Making the Master DF

In [194]:
# Copy so that the in-place rename below does not also modify `video_df`
master_df = video_df.copy()
master_df.head()

Unnamed: 0,path_videos,length,patient,action,camera,channel,id
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01


In [195]:
master_df.rename(columns={'path' : 'path_videos'}, inplace=True)
master_df.head()

Unnamed: 0,path_videos,length,patient,action,camera,channel,id
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01


In [196]:
master_df = master_df.merge(coarse_df_cleaner[['id', 'path']], how='outer', on='id')
master_df.rename(columns={'path' : 'path_coarse'}, inplace=True)
master_df

Unnamed: 0,path_videos,length,patient,action,camera,channel,id,path_coarse
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...,...,...,...
1984,/media/data_cifs2/apra/work/labwork/data/exter...,1123,5,coffee,cam01,,5_coffee_cam01,/media/data_cifs2/apra/work/labwork/data/exter...
1985,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,0.0,5_juice_stereo_0,
1986,/media/data_cifs2/apra/work/labwork/data/exter...,1084,5,milk,stereo,0.0,5_milk_stereo_0,
1987,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,1.0,5_juice_stereo_1,/media/data_cifs2/apra/work/labwork/data/exter...
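
The outer joins in this notebook assume that `id` is unique on both sides; a duplicated ID would silently add rows to the master table. One possible guard is the `validate` argument of `DataFrame.merge`, sketched here on made-up data.

```python
import pandas as pd
from pandas.errors import MergeError

# Toy tables, made up for illustration; `dup` deliberately repeats an ID.
master = pd.DataFrame({"id": ["a", "b"], "length": [10, 20]})
dup = pd.DataFrame({"id": ["a", "a"], "path_coarse": ["p1", "p2"]})

try:
    # `validate="one_to_one"` checks that merge keys are unique on both sides.
    master.merge(dup, how="outer", on="id", validate="one_to_one")
    raised = False
except MergeError:
    raised = True
print(raised)
```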


## Master vs Fine Segmentations

### 499 IDs in Fine Aren't in Master

In [124]:
fine_ids = fine_df.id.to_list()
master_ids = master_df.id.to_list()

In [125]:
len(fine_ids)

1285

In [126]:
fine_ids[:5]

['5_cereals_stereo_1',
 '18_cereals_webcam02',
 '25_cereals_stereo_1',
 '8_cereals_cam01',
 '34_cereals_stereo_1']

In [127]:
not_in_master = [fid for fid in fine_ids if fid not in master_ids]
len(not_in_master)

499

In [128]:
fine_df_cleaner = fine_df[~fine_df.id.isin(not_in_master)]
fine_df_cleaner

Unnamed: 0,path,patient,action,camera,channel,id,length
1,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,465
4,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,765
6,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,711
7,/media/data_cifs2/apra/work/labwork/data/exter...,16,cereals,webcam01,,16_cereals_webcam01,548
8,/media/data_cifs2/apra/work/labwork/data/exter...,52,cereals,cam02,,52_cereals_cam02,805
...,...,...,...,...,...,...,...
1274,/media/data_cifs2/apra/work/labwork/data/exter...,14,pancake,webcam01,,14_pancake_webcam01,5930
1278,/media/data_cifs2/apra/work/labwork/data/exter...,5,pancake,cam01,,5_pancake_cam01,5896
1280,/media/data_cifs2/apra/work/labwork/data/exter...,54,pancake,stereo,1.0,54_pancake_stereo_1,3344
1281,/media/data_cifs2/apra/work/labwork/data/exter...,13,pancake,cam01,,13_pancake_cam01,4440


### None have Different Lengths

In [197]:
common_videos = master_df[master_df.id.isin(fine_df_cleaner.id)]
merged_videos_fine_common = common_videos[['id', 'length']].merge(
    fine_df_cleaner[['id', 'length']],
    on='id')
merged_videos_fine_common

Unnamed: 0,id,length_x,length_y
0,39_sandwich_cam01,1524,1524
1,39_sandwich_webcam01,1524,1524
2,39_sandwich_webcam02,1524,1524
3,39_sandwich_stereo_1,1524,1524
4,39_sandwich_cam02,1524,1524
...,...,...,...
616,15_cereals_stereo_1,383,383
617,5_scrambledegg_cam01,2700,2700
618,5_pancake_cam01,5896,5896
619,5_tea_cam01,1665,1665


In [198]:
video_lengths = merged_videos_fine_common.length_x.values
fine_lengths = merged_videos_fine_common.length_y.values

In [199]:
sum(video_lengths != fine_lengths)

0

### Adding to the Master DF

In [137]:
# Reassign instead of renaming in place: `fine_df_cleaner` is a filtered slice of `fine_df`
fine_df_cleaner = fine_df_cleaner.rename(columns={'path' : 'path_fine'})
fine_df_cleaner.head()


Unnamed: 0,path_fine,patient,action,camera,channel,id,length
1,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,465
4,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,765
6,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,711
7,/media/data_cifs2/apra/work/labwork/data/exter...,16,cereals,webcam01,,16_cereals_webcam01,548
8,/media/data_cifs2/apra/work/labwork/data/exter...,52,cereals,cam02,,52_cereals_cam02,805
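
Renaming a filtered DataFrame in place can trigger pandas' `SettingWithCopyWarning`, because the filtered frame may be treated as a slice of the original. A minimal sketch of a pattern that avoids the issue is to use the return value of `rename`; the toy frame and variable names below are made up.

```python
import pandas as pd

# Toy frame, made up for illustration only.
df = pd.DataFrame({"id": ["a", "b", "c"], "path": ["pa", "pb", "pc"]})
subset = df[df["id"].isin(["a", "c"])]

# Without `inplace=True`, rename() returns a new DataFrame and leaves
# both `df` and `subset` untouched.
renamed = subset.rename(columns={"path": "path_fine"})
print(renamed)
```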


In [200]:
master_df = master_df.merge(fine_df_cleaner[['id', 'path_fine']], how='outer', on='id')
master_df

Unnamed: 0,path_videos,length,patient,action,camera,channel,id,path_coarse,path_fine
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,
...,...,...,...,...,...,...,...,...,...
1984,/media/data_cifs2/apra/work/labwork/data/exter...,1123,5,coffee,cam01,,5_coffee_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,
1985,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,0.0,5_juice_stereo_0,,
1986,/media/data_cifs2/apra/work/labwork/data/exter...,1084,5,milk,stereo,0.0,5_milk_stereo_0,,
1987,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,1.0,5_juice_stereo_1,/media/data_cifs2/apra/work/labwork/data/exter...,


## Master vs 64 Dim FVs

### 32 Not in Master

In [144]:
fvs_ids = fvs_df.id.to_list()
len(fvs_ids)

1712

In [145]:
fvs_ids[:5]

['18_cereals_webcam02',
 '54_cereals_cam01',
 '34_cereals_stereo_1',
 '13_cereals_cam01',
 '51_cereals_webcam02']

In [146]:
not_in_master = [fvid for fvid in fvs_ids if fvid not in master_ids]
len(not_in_master)

32

In [171]:
fvs_df_cleaner = fvs_df[~fvs_df.id.isin(not_in_master)]
fvs_df_cleaner

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01,743
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,761
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01,675
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,707
...,...,...,...,...,...,...,...
1707,/media/data_cifs2/apra/work/labwork/data/exter...,36,pancake,webcam02,,36_pancake_webcam02,7049
1708,/media/data_cifs2/apra/work/labwork/data/exter...,54,pancake,stereo,1.0,54_pancake_stereo_1,3340
1709,/media/data_cifs2/apra/work/labwork/data/exter...,13,pancake,cam01,,13_pancake_cam01,4436
1710,/media/data_cifs2/apra/work/labwork/data/exter...,21,pancake,cam01,,21_pancake_cam01,5841


### Every FV has Different Lengths

In [201]:
common_videos = master_df[master_df.id.isin(fvs_df_cleaner.id)]
merged_videos_fvs_common = common_videos[['id', 'length']].merge(
    fvs_df_cleaner[['id', 'length']],
    on='id')
merged_videos_fvs_common

Unnamed: 0,id,length_x,length_y
0,39_salat_cam01,2669,2665
1,39_tea_cam01,1304,1300
2,39_pancake_cam01,8990,8986
3,39_cereals_cam01,699,695
4,39_friedegg_cam01,2890,2886
...,...,...,...
1675,5_cereals_cam01,1186,1182
1676,5_juice_cam01,2795,2791
1677,5_coffee_cam01,1123,1119
1678,5_juice_stereo_1,2795,2791


In [202]:
video_lengths = merged_videos_fvs_common.length_x.values
fv_lengths = merged_videos_fvs_common.length_y.values

In [203]:
sum(video_lengths != fv_lengths)

1680

### Majority Are Exactly 4 Frames Less

In [204]:
sum(video_lengths == fv_lengths + 4)

1476

### 204 Differ By an Amount Other than 4 Frames

In [205]:
sum(video_lengths != fv_lengths + 4)

204
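
The test `video_lengths != fv_lengths + 4` flags every offset other than exactly 4, including smaller ones. Tabulating the offsets shows this directly. The sketch below uses five length pairs taken from the tables in this section, not the full arrays.

```python
import numpy as np

# Five (video, feature-vector) length pairs copied from the tables in this section.
video_lengths = np.array([2669, 1304, 2669, 1304, 8990])
fv_lengths = np.array([2665, 1300, 2666, 1287, 8984])

# How many frames shorter each feature vector is than its video.
offsets = video_lengths - fv_lengths

# Count how often each offset occurs.
values, counts = np.unique(offsets, return_counts=True)
distribution = dict(zip(values.tolist(), counts.tolist()))
print(distribution)
```

On this sample the offsets are 3, 4, 6 and 17 frames, so the flagged rows are not all "more than 4" frames off.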

### Removing the Bad IDs

In [206]:
bad_ids = merged_videos_fvs_common.iloc[np.where(video_lengths != fv_lengths+4)].id.to_list()
bad_ids

['39_salat_webcam02',
 '39_tea_webcam02',
 '39_pancake_stereo_1',
 '39_juice_stereo_1',
 '39_tea_stereo_1',
 '39_tea_cam02',
 '41_coffee_cam01',
 '41_juice_cam01',
 '41_scrambledegg_webcam01',
 '41_coffee_webcam01',
 '41_cereals_webcam01',
 '41_pancake_stereo_1',
 '41_juice_stereo_1',
 '41_sandwich_stereo_1',
 '41_juice_cam02',
 '50_tea_webcam01',
 '50_cereals_webcam01',
 '50_tea_webcam02',
 '50_juice_stereo_1',
 '50_sandwich_cam02',
 '50_tea_cam02',
 '4_scrambledegg_cam01',
 '4_milk_cam01',
 '4_tea_webcam01',
 '4_milk_webcam01',
 '4_friedegg_webcam01',
 '4_cereals_webcam01',
 '4_milk_webcam02',
 '4_scrambledegg_stereo_1',
 '4_sandwich_stereo_1',
 '8_juice_cam01',
 '8_juice_webcam01',
 '8_cereals_webcam01',
 '53_friedegg_webcam01',
 '53_milk_webcam01',
 '53_milk_webcam02',
 '53_salat_stereo_1',
 '53_coffee_cam02',
 '53_scrambledegg_cam02',
 '53_pancake_cam02',
 '53_milk_cam02',
 '43_scrambledegg_cam01',
 '43_juice_webcam01',
 '43_milk_stereo_1',
 '43_juice_stereo_1',
 '43_pancake_stere

In [207]:
merged_videos_fvs_common[merged_videos_fvs_common.id.isin(bad_ids)]

Unnamed: 0,id,length_x,length_y
14,39_salat_webcam02,2669,2666
15,39_tea_webcam02,1304,1287
24,39_pancake_stereo_1,8990,8984
25,39_juice_stereo_1,1811,1808
31,39_tea_stereo_1,1304,1290
...,...,...,...
1635,23_cereals_stereo_1,1042,1039
1637,23_juice_cam02,1711,1708
1638,23_cereals_cam02,1042,1039
1640,23_sandwich_cam02,1487,1484


In [208]:
fvs_df_cleaner = fvs_df_cleaner[~fvs_df_cleaner.id.isin(bad_ids)]
fvs_df_cleaner

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01,743
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,761
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01,675
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,707
...,...,...,...,...,...,...,...
1707,/media/data_cifs2/apra/work/labwork/data/exter...,36,pancake,webcam02,,36_pancake_webcam02,7049
1708,/media/data_cifs2/apra/work/labwork/data/exter...,54,pancake,stereo,1.0,54_pancake_stereo_1,3340
1709,/media/data_cifs2/apra/work/labwork/data/exter...,13,pancake,cam01,,13_pancake_cam01,4436
1710,/media/data_cifs2/apra/work/labwork/data/exter...,21,pancake,cam01,,21_pancake_cam01,5841


### Adding to the Master DF

In [209]:
# Reassign instead of renaming in place: `fvs_df_cleaner` is a filtered slice of `fvs_df`
fvs_df_cleaner = fvs_df_cleaner.rename(columns={'path' : 'path_64dim_fv'})
fvs_df_cleaner.head()


Unnamed: 0,path_64dim_fv,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
1,/media/data_cifs2/apra/work/labwork/data/exter...,54,cereals,cam01,,54_cereals_cam01,743
2,/media/data_cifs2/apra/work/labwork/data/exter...,34,cereals,stereo,1.0,34_cereals_stereo_1,761
3,/media/data_cifs2/apra/work/labwork/data/exter...,13,cereals,cam01,,13_cereals_cam01,675
4,/media/data_cifs2/apra/work/labwork/data/exter...,51,cereals,webcam02,,51_cereals_webcam02,707


In [210]:
master_df = master_df.merge(fvs_df_cleaner[['id', 'path_64dim_fv']], how='outer', on='id')
master_df

Unnamed: 0,path_videos,length,patient,action,camera,channel,id,path_coarse,path_fine,path_64dim_fv
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...,...,...,...,...,...
1984,/media/data_cifs2/apra/work/labwork/data/exter...,1123,5,coffee,cam01,,5_coffee_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...
1985,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,0.0,5_juice_stereo_0,,,
1986,/media/data_cifs2/apra/work/labwork/data/exter...,1084,5,milk,stereo,0.0,5_milk_stereo_0,,,
1987,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,1.0,5_juice_stereo_1,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...


## Master vs I3D FVs

### 32 Not in Master

In [218]:
i3d_ids = i3d_df.id.to_list()
master_ids = master_df.id.to_list()
len(i3d_ids)

1712

In [219]:
i3d_ids[:5]

['23_milk_webcam01',
 '14_tea_webcam01',
 '18_cereals_webcam02',
 '49_friedegg_webcam01',
 '17_tea_webcam02']

In [220]:
not_in_master = [iid for iid in i3d_ids if iid not in master_ids]
len(not_in_master)

32

In [221]:
i3d_df_cleaner = i3d_df[~i3d_df.id.isin(not_in_master)]
i3d_df_cleaner

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,23,milk,webcam01,,23_milk_webcam01,1246
1,/media/data_cifs2/apra/work/labwork/data/exter...,14,tea,webcam01,,14_tea_webcam01,397
2,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
3,/media/data_cifs2/apra/work/labwork/data/exter...,49,friedegg,webcam01,,49_friedegg_webcam01,5174
4,/media/data_cifs2/apra/work/labwork/data/exter...,17,tea,webcam02,,17_tea_webcam02,703
...,...,...,...,...,...,...,...
1707,/media/data_cifs2/apra/work/labwork/data/exter...,30,milk,cam02,,30_milk_cam02,634
1708,/media/data_cifs2/apra/work/labwork/data/exter...,22,scrambledegg,webcam02,,22_scrambledegg_webcam02,2291
1709,/media/data_cifs2/apra/work/labwork/data/exter...,15,coffee,webcam01,,15_coffee_webcam01,693
1710,/media/data_cifs2/apra/work/labwork/data/exter...,25,pancake,cam02,,25_pancake_cam02,5220


In [224]:
common_videos = master_df[master_df.id.isin(i3d_df_cleaner.id)]
merged_videos_i3d_common = common_videos[['id', 'length']].merge(
    i3d_df_cleaner[['id', 'length']],
    on='id')
merged_videos_i3d_common

Unnamed: 0,id,length_x,length_y
0,39_salat_cam01,2669,2665
1,39_tea_cam01,1304,1300
2,39_pancake_cam01,8990,8986
3,39_cereals_cam01,699,695
4,39_friedegg_cam01,2890,2886
...,...,...,...
1675,5_cereals_cam01,1186,1182
1676,5_juice_cam01,2795,2791
1677,5_coffee_cam01,1123,1119
1678,5_juice_stereo_1,2795,2791


### Every FV has Different Lengths

In [225]:
video_lengths = merged_videos_i3d_common.length_x.values
i3d_lengths = merged_videos_i3d_common.length_y.values

In [231]:
sum(video_lengths != i3d_lengths)

1680

### Majority Are Exactly 4 Frames Less

In [232]:
sum(video_lengths == i3d_lengths + 4)

1576

### 104 Differ By an Amount Other than 4 Frames

In [233]:
sum(video_lengths != i3d_lengths + 4)

104

### Removing the Bad IDs

In [229]:
bad_ids = merged_videos_i3d_common.iloc[np.where(video_lengths != i3d_lengths+4)].id.to_list()
bad_ids

['39_salat_webcam02',
 '39_juice_stereo_1',
 '41_coffee_cam01',
 '41_juice_cam01',
 '41_scrambledegg_webcam01',
 '41_coffee_webcam01',
 '41_cereals_webcam01',
 '41_juice_stereo_1',
 '41_sandwich_stereo_1',
 '41_juice_cam02',
 '50_cereals_webcam01',
 '50_juice_stereo_1',
 '4_sandwich_stereo_1',
 '8_coffee_webcam01',
 '53_friedegg_webcam01',
 '53_salat_stereo_1',
 '53_coffee_cam02',
 '53_scrambledegg_cam02',
 '53_pancake_cam02',
 '43_scrambledegg_cam01',
 '43_juice_webcam01',
 '43_juice_stereo_1',
 '43_juice_cam02',
 '24_juice_cam01',
 '24_milk_cam02',
 '47_coffee_cam01',
 '47_tea_webcam01',
 '47_friedegg_webcam01',
 '47_cereals_webcam02',
 '47_friedegg_cam02',
 '26_cereals_cam01',
 '26_coffee_webcam02',
 '26_cereals_webcam02',
 '3_milk_cam01',
 '3_sandwich_webcam01',
 '3_cereals_webcam02',
 '19_salat_cam01',
 '19_cereals_webcam01',
 '25_cereals_cam01',
 '25_friedegg_cam01',
 '20_juice_webcam02',
 '20_juice_stereo_1',
 '13_cereals_webcam01',
 '13_sandwich_webcam01',
 '54_pancake_cam01',


In [230]:
merged_videos_i3d_common[merged_videos_i3d_common.id.isin(bad_ids)]

Unnamed: 0,id,length_x,length_y
14,39_salat_webcam02,2669,2666
25,39_juice_stereo_1,1811,1808
45,41_coffee_cam01,876,873
46,41_juice_cam01,1582,1579
54,41_scrambledegg_webcam01,3022,3019
...,...,...,...
1635,23_cereals_stereo_1,1042,1039
1637,23_juice_cam02,1711,1708
1638,23_cereals_cam02,1042,1039
1640,23_sandwich_cam02,1487,1484
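
It can be useful to compare which IDs are rejected by the 64 dim check and which by the I3D check. The sketch below does this with plain set operations on the patient 39 entries visible in the two `bad_ids` printouts; it is only an excerpt, and the variable names are introduced here for illustration.

```python
# Patient 39 entries from the 64 dim `bad_ids` printout.
fvs_bad = {
    "39_salat_webcam02",
    "39_tea_webcam02",
    "39_pancake_stereo_1",
    "39_juice_stereo_1",
    "39_tea_stereo_1",
    "39_tea_cam02",
}
# Patient 39 entries from the I3D `bad_ids` printout.
i3d_bad = {"39_salat_webcam02", "39_juice_stereo_1"}

bad_in_both = sorted(fvs_bad & i3d_bad)  # rejected by both checks
only_fvs = sorted(fvs_bad - i3d_bad)     # rejected only by the 64 dim check
only_i3d = sorted(i3d_bad - fvs_bad)     # rejected only by the I3D check
print(bad_in_both, only_fvs, only_i3d)
```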


In [234]:
i3d_df_cleaner = i3d_df_cleaner[~i3d_df_cleaner.id.isin(bad_ids)]
i3d_df_cleaner

Unnamed: 0,path,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,23,milk,webcam01,,23_milk_webcam01,1246
1,/media/data_cifs2/apra/work/labwork/data/exter...,14,tea,webcam01,,14_tea_webcam01,397
2,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
3,/media/data_cifs2/apra/work/labwork/data/exter...,49,friedegg,webcam01,,49_friedegg_webcam01,5174
4,/media/data_cifs2/apra/work/labwork/data/exter...,17,tea,webcam02,,17_tea_webcam02,703
...,...,...,...,...,...,...,...
1705,/media/data_cifs2/apra/work/labwork/data/exter...,8,juice,webcam01,,8_juice_webcam01,1248
1706,/media/data_cifs2/apra/work/labwork/data/exter...,36,salat,webcam01,,36_salat_webcam01,5292
1708,/media/data_cifs2/apra/work/labwork/data/exter...,22,scrambledegg,webcam02,,22_scrambledegg_webcam02,2291
1709,/media/data_cifs2/apra/work/labwork/data/exter...,15,coffee,webcam01,,15_coffee_webcam01,693


### Adding to the Master DF

In [235]:
# Reassign instead of renaming in place: `i3d_df_cleaner` is a filtered slice of `i3d_df`
i3d_df_cleaner = i3d_df_cleaner.rename(columns={'path' : 'path_i3d'})
i3d_df_cleaner.head()


Unnamed: 0,path_i3d,patient,action,camera,channel,id,length
0,/media/data_cifs2/apra/work/labwork/data/exter...,23,milk,webcam01,,23_milk_webcam01,1246
1,/media/data_cifs2/apra/work/labwork/data/exter...,14,tea,webcam01,,14_tea_webcam01,397
2,/media/data_cifs2/apra/work/labwork/data/exter...,18,cereals,webcam02,,18_cereals_webcam02,461
3,/media/data_cifs2/apra/work/labwork/data/exter...,49,friedegg,webcam01,,49_friedegg_webcam01,5174
4,/media/data_cifs2/apra/work/labwork/data/exter...,17,tea,webcam02,,17_tea_webcam02,703


In [236]:
master_df = master_df.merge(i3d_df_cleaner[['id', 'path_i3d']], how='outer', on='id')
master_df

Unnamed: 0,path_videos,length,patient,action,camera,channel,id,path_coarse,path_fine,path_64dim_fv,path_i3d
0,/media/data_cifs2/apra/work/labwork/data/exter...,2669,39,salat,cam01,,39_salat_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1,/media/data_cifs2/apra/work/labwork/data/exter...,1304,39,tea,cam01,,39_tea_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
2,/media/data_cifs2/apra/work/labwork/data/exter...,8990,39,pancake,cam01,,39_pancake_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
3,/media/data_cifs2/apra/work/labwork/data/exter...,699,39,cereals,cam01,,39_cereals_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
4,/media/data_cifs2/apra/work/labwork/data/exter...,2890,39,friedegg,cam01,,39_friedegg_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
...,...,...,...,...,...,...,...,...,...,...,...
1984,/media/data_cifs2/apra/work/labwork/data/exter...,1123,5,coffee,cam01,,5_coffee_cam01,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...
1985,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,0.0,5_juice_stereo_0,,,,
1986,/media/data_cifs2/apra/work/labwork/data/exter...,1084,5,milk,stereo,0.0,5_milk_stereo_0,,,,
1987,/media/data_cifs2/apra/work/labwork/data/exter...,2795,5,juice,stereo,1.0,5_juice_stereo_1,/media/data_cifs2/apra/work/labwork/data/exter...,,/media/data_cifs2/apra/work/labwork/data/exter...,/media/data_cifs2/apra/work/labwork/data/exter...


## Saving Master DF

In [240]:
master_df.to_csv(str(DIR_RAW_BREAKFAST / 'master_meta.csv'), index=False)
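
A quick sanity check after saving is to read the CSV back and count how many rows have a path for each data source. The sketch below does this on a made-up two-row table written to a temporary directory, so it does not touch `DIR_RAW_BREAKFAST`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy master table, made up for illustration only.
master = pd.DataFrame(
    {"id": ["a", "b"], "path_videos": ["va", "vb"], "path_coarse": ["ca", None]}
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "master_meta.csv"
    master.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)

# Number of non-missing entries in each `path_*` column.
coverage = reloaded.filter(like="path_").notna().sum()
print(coverage)
```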