Copyright (c) MONAI Consortium  
Licensed under the Apache License, Version 2.0 (the "License");  
you may not use this file except in compliance with the License.  
You may obtain a copy of the License at  
&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0  
Unless required by applicable law or agreed to in writing, software  
distributed under the License is distributed on an "AS IS" BASIS,  
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
See the License for the specific language governing permissions and  
limitations under the License.

## Scene detection and fold split

This tutorial shows how to detect different scenes and then split the folds so that each fold contains a similar number of videos from a similar number of scenes.

## Setup environment

In [1]:
!python -c "import imagehash" || pip -q install imagehash
!python -c "import iterstrat" || pip install -q iterative-stratification
!python -c "import monai" || pip install -q "monai-weekly[pillow]"
!python -c "import matplotlib" || pip install -q matplotlib

## Setup imports

In [None]:
from tqdm import tqdm
import pandas as pd
import numpy as np
import os
from glob import glob

%matplotlib inline

from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
from PIL import Image
import imagehash
import multiprocessing
from monai.config import print_config

print_config()

## Load data

In [2]:
# replace the paths below with your local directories

df = pd.read_csv("/raid/surg/_release/training_data/labels.csv")[["clip_name", "tools_present"]]
img_dir = "/raid/surg/image640_blur/"
cpu_ct = 32


def split_label(s):
    return [x.strip(" ") for x in s[1:-1].split(",")]


label_lst = df.tools_present.apply(split_label).values.tolist()
label_lst = [x for xs in label_lst for x in xs]

labels = pd.Series(label_lst).value_counts().index.values[1:]
for lb in labels:
    df[lb] = df.tools_present.str.count(lb)

labels = df.columns.values[2:]
# uncomment the following line to produce the train.csv file
# df.to_csv('../train.csv', index=False)
df.shape

(24695, 16)
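To make the label parsing above concrete, here is a minimal sketch on a toy frame. It assumes that `tools_present` is a bracketed, comma-separated string with `nan` in empty slots; that format, the clip names, and the rows below are made up for illustration and may not match the real `labels.csv`.

```python
import pandas as pd

# Hypothetical rows; the real labels.csv may differ in detail.
toy = pd.DataFrame(
    {
        "clip_name": ["clip_000001", "clip_000002", "clip_000003"],
        "tools_present": [
            "[needle driver, nan, cadiere forceps, nan]",
            "[needle driver, nan, nan, nan]",
            "[bipolar forceps, nan, needle driver, nan]",
        ],
    }
)


def split_label(s):
    # Drop the surrounding brackets, then split on commas.
    return [x.strip(" ") for x in s[1:-1].split(",")]


flat = [x for xs in toy.tools_present.apply(split_label) for x in xs]
counts = pd.Series(flat).value_counts()

# "nan" is the most frequent token in this toy data, so it sorts first
# and is skipped by dropping the first entry of the index.
tool_names = counts.index.values[1:]

# One column per tool, holding how often the name occurs in the string.
for name in tool_names:
    toy[name] = toy.tools_present.str.count(name)

print(toy[list(tool_names)])
```

Dropping the first index entry only works while the placeholder token is the most frequent one; filtering it out by name would be a more defensive choice.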

In [3]:
df_img = pd.DataFrame([os.path.basename(x) for x in sorted(glob(os.path.join(img_dir, "*.jpg")))], columns=["img_path"])
df_img["clip_name"] = df_img.img_path.apply(lambda x: x[:11])

df = df.merge(df_img, on="clip_name", how="left")
df = df[pd.notna(df.img_path)]

df["frame"] = df.img_path.apply(lambda x: int(x[:-4].split("_")[-1]))
df = df.sort_values(["clip_name", "frame"]).reset_index(drop=True)

df.shape

(765000, 18)

## Scene detection

In this dataset, a number of (sometimes up to dozens of) consecutive videos come from the same operation, or scene. It is therefore important to identify these scenes and keep all videos from one scene in the same fold when making fold splits, in order to prevent leakage in local validation.

We detect scenes by comparing the image hashes of the last frame of a video against those of the first frame of the next video. If the similarity is at or above a threshold, the two videos belong to the same scene. Otherwise, the next video is the start of a new scene.
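The comparison can be sketched with plain NumPy, using random boolean vectors in place of real image hashes. The data here is hypothetical; the only thing taken from the notebook is the signature length of 256 bits (four hash functions with 64 bits each).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a 256-bit signature (4 hash functions x 64 bits each).
last_frame = rng.integers(0, 2, size=256).astype(bool)

# A near-duplicate frame: flip 20 distinct bits of the signature.
next_same_scene = last_frame.copy()
flip = rng.choice(256, size=20, replace=False)
next_same_scene[flip] = ~next_same_scene[flip]

# An unrelated frame: independent random bits.
next_new_scene = rng.integers(0, 2, size=256).astype(bool)


def similarity(a, b):
    # Number of bit positions where both signatures agree (0 to 256).
    return int((a == b).sum())


print(similarity(last_frame, next_same_scene))  # 236, i.e. 256 - 20
print(similarity(last_frame, next_new_scene))   # roughly half the bits
```

Independent signatures agree on about half of their bits by chance, so a useful threshold has to sit well above 128.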

In [4]:
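# diff(-1) is positive where the next row has a smaller frame number, i.e. at the last frame of a clip
df["last"] = df.frame.diff(-1)
# keep only the first and last frame of each clip
dfl = df[(df.frame == 0) | (df["last"] > 0)].iloc[:-1]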
dfl = dfl.reset_index(drop=True)
dfl.shape

(49370, 19)

In [5]:
funcs = [
    imagehash.average_hash,
    imagehash.phash,
    imagehash.dhash,
    imagehash.whash,
]


def get_hash(img_path):
    image = Image.open(f"{img_dir}/{img_path}")
    return np.array([f(image).hash for f in funcs]).reshape(256)

In [6]:
with multiprocessing.Pool(cpu_ct) as pool:
    imap = pool.imap(get_hash, dfl.img_path.values)
    hashes = list(tqdm(imap, total=len(dfl)))

100%|██████████| 49370/49370 [01:28<00:00, 559.23it/s]


In [7]:
hashes = np.stack(hashes)[:-1, :]
hashes.shape

(49369, 256)

In [8]:
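# count matching bits between each clip's last frame and the next clip's first frame
# (this is a similarity, despite the variable name)
hash_diffs = (hashes[1::2, :] == hashes[2::2, :]).sum(1)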
hash_diffs.shape

(24684,)
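The slicing in the cell above can be illustrated on a tiny array. This sketch assumes, as the `dfl` filtering suggests, that rows alternate between the first and last frame of each clip; the 4-bit rows are hypothetical stand-ins for the 256-bit signatures.

```python
import numpy as np

# Rows alternate: first frame of clip 0, last frame of clip 0,
# first frame of clip 1, last frame of clip 1, ...
hashes = np.array(
    [
        [0, 0, 0, 0],  # clip 0, first frame
        [1, 1, 0, 0],  # clip 0, last frame
        [1, 1, 0, 1],  # clip 1, first frame (close to clip 0's last frame)
        [0, 1, 1, 1],  # clip 1, last frame
        [1, 0, 0, 0],  # clip 2, first frame (far from clip 1's last frame)
        [1, 0, 1, 0],  # clip 2, last frame
    ],
    dtype=bool,
)

# Drop the final row: the last clip has no successor to compare against.
hashes = hashes[:-1]

last_frames = hashes[1::2]  # rows 1 and 3
next_firsts = hashes[2::2]  # rows 2 and 4

# Per pair, count the bit positions on which the two signatures agree.
similarity = (last_frames == next_firsts).sum(1)
print(similarity)  # [3 0]
```

Starting one slice at index 1 and the other at index 2 is what lines up each clip's last frame with the following clip's first frame.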

In [9]:
dfl["hash_sim"] = 0
dfl.loc[1 : len(dfl) - 2 : 2, "hash_sim"] = hash_diffs

Based on visual inspection, **170** is a good cutoff for the hash similarity.

When the hash similarity is larger than or equal to 170, the two consecutive videos belong to the same scene.

When the hash similarity is smaller than 170, the second video starts a new scene.

In [10]:
tmp = dfl[(dfl.frame != 0) & (dfl.hash_sim < 170)][["clip_name", "img_path", "hash_sim"]].copy()
tmp["EOS"] = True
tmp.shape

(1068, 4)

In [11]:
df_scene = (
    dfl[dfl.frame == 0]
    .drop(columns=["frame", "last", "img_path", "hash_sim"])
    .merge(tmp[["clip_name", "EOS"]], on="clip_name", how="left")
)
df_scene["EOS"] = df_scene["EOS"].fillna(0).astype(int)
df_scene["SOS"] = df_scene["EOS"].shift(1).fillna(1).astype(int)
df_scene["scene"] = df_scene["SOS"].cumsum() - 1
df_scene.shape

(24685, 19)
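The step from end-of-scene flags to scene ids can be traced on a toy example. The clip names and similarity values below are hypothetical; the 170 cutoff and the `EOS`, `SOS`, and `scene` columns follow the cells above.

```python
import pandas as pd

# Hypothetical similarity between each clip's last frame and the next
# clip's first frame; the final clip has no successor and keeps 0.
toy = pd.DataFrame(
    {
        "clip_name": ["a", "b", "c", "d", "e"],
        "hash_sim": [200, 150, 180, 120, 0],
    }
)

# A clip ends a scene (EOS) when the similarity to the next clip is low.
toy["EOS"] = (toy["hash_sim"] < 170).astype(int)

# A clip starts a scene (SOS) when the previous clip ended one;
# the very first clip always starts a scene.
toy["SOS"] = toy["EOS"].shift(1).fillna(1).astype(int)

# A running count of scene starts gives a zero-based scene id.
toy["scene"] = toy["SOS"].cumsum() - 1

print(toy[["clip_name", "EOS", "SOS", "scene"]])
```

Here clips `a` and `b` form scene 0, `c` and `d` form scene 1, and `e` is scene 2.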

The number of scenes that each tool appears in:

In [12]:
for lb in labels:
    print(f"{lb:30}  {df_scene[df_scene[lb]>0].scene.nunique()}")

needle driver                   572
cadiere forceps                 665
bipolar forceps                 597
monopolar curved scissors       560
grasping retractor              223
prograsp forceps                240
force bipolar                   64
vessel sealer                   93
permanent cautery hook/spatula  56
clip applier                    168
tip-up fenestrated grasper      14
stapler                         21
bipolar dissector               1
suction irrigator               4


## Splitting folds based on scene number using iterative stratification

In [19]:
df = pd.read_csv("../train.csv")
labels = df.columns.values[2:]

df = df.merge(df_scene[["clip_name", "scene"]], how="left", on="clip_name")
df["scene"] = df["scene"].fillna(df_scene.scene.nunique()).astype(int)
df.shape

(24695, 17)

There are 1069 unique scenes:

In [20]:
df.scene.nunique()

1069

In [21]:
tmp = df.groupby("scene")[labels].max().clip(0, 1).reset_index()

X = tmp[labels].values
y = tmp[labels].values
tmp["fold"] = -1

mskf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=1)

for i, (_, test_index) in enumerate(mskf.split(X, y)):
    tmp.loc[test_index, "fold"] = i

The number of scenes per fold is evenly distributed for every tool except `bipolar dissector`, because all the videos containing `bipolar dissector` belong to the same scene:

In [None]:
tmp.groupby("fold")[labels].sum()

The number of videos per fold is also nearly evenly distributed, again except for `bipolar dissector` (`random_state` can be adjusted in the `MultilabelStratifiedKFold` call above to obtain a different distribution):

In [None]:
df = df.merge(tmp[["scene", "fold"]], on="scene", how="left")
df.groupby("fold")[labels].sum()

In [24]:
# df.to_csv('../train_fold_balanced.csv', index=False)
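Before saving the split, it may be worth confirming that no scene ends up in more than one fold. The check below is a minimal sketch on hypothetical data; the `scene` and `fold` column names match the cells above, while the frame contents are made up.

```python
import pandas as pd

# Hypothetical clip-level table with scene ids.
clips = pd.DataFrame(
    {
        "clip_name": ["a", "b", "c", "d", "e", "f"],
        "scene": [0, 0, 1, 1, 2, 2],
    }
)

# Hypothetical scene-level fold assignment.
scene_folds = pd.DataFrame({"scene": [0, 1, 2], "fold": [0, 1, 0]})

# Propagate the scene-level fold to every clip of that scene.
clips = clips.merge(scene_folds, on="scene", how="left")

# Every scene should map to exactly one fold, and no clip should be unassigned.
folds_per_scene = clips.groupby("scene")["fold"].nunique()
assert (folds_per_scene == 1).all(), "a scene is split across folds"
assert clips["fold"].notna().all(), "some clips have no fold"

print(clips.groupby("fold").size())
```

Because folds are assigned per scene and then merged onto clips, the check should pass by construction; it mainly guards against later edits that assign folds at the clip level.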