Use this notebook to split a video into individual frames and annotate each frame with a hand bounding box.

In [1]:
import vctoolkit as vc
import os
import predict_images_wo_annot as original_wrapper

In [2]:
# You need to set the following variables manually.
# input video
video_path = 'test/heart.mov'
# a folder that stores the video frames
frames_dir = 'test/frames'
# path to save the bounding box results
bbox_save_path = 'test/bbox.pkl'

Step 1: split the video into frames. Skip this step if you already have a folder of frames.

In [3]:
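os.makedirs(frames_dir, exist_ok=True)
video = vc.VideoReader(video_path)
print(f'{video.n_frames} frames to process...')
for frame_idx in vc.progress_bar(range(video.n_frames), 'video frames'):
    frame = video.next_frame()
    # the reported frame count can exceed the number of readable frames,
    # so stop as soon as the reader returns no frame
    if frame is None:
        break
    # zero-padded names make lexicographic order match frame order
    vc.save(os.path.join(frames_dir, f'{frame_idx:06d}.jpg'), frame)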

382 frames to process...
video frames: 100%|#########9| 381/382 [00:11<00:00, 33.48it/s]
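Frames are written with zero-padded names (`f'{frame_idx:06d}.jpg'`), so sorting the file names as plain strings, as Step 2 does, recovers frame order. A small illustration with hypothetical frame indices:

```python
# Compare lexicographic sorting of unpadded vs. zero-padded frame names.
indices = [0, 1, 2, 9, 10, 11, 100]

unpadded = sorted(f'{i}.jpg' for i in indices)
padded = sorted(f'{i:06d}.jpg' for i in indices)

print(unpadded)  # ['0.jpg', '1.jpg', '10.jpg', '100.jpg', '11.jpg', '2.jpg', '9.jpg']
print(padded)    # follows frame order

# Recover the frame index from each padded name.
order = [int(name.split('.')[0]) for name in padded]
print(order)  # [0, 1, 2, 9, 10, 11, 100]
```

Without padding, `'10.jpg'` sorts before `'2.jpg'`, which would scramble the frame sequence passed to the detector.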

Step 2: collect the list of frame files to process.

In [4]:
# keep only image files; sorting the zero-padded names restores frame order
file_list = []
for file_name in os.listdir(frames_dir):
    if os.path.splitext(file_name)[1][1:].lower() in ['jpg', 'png', 'jpeg']:
        file_list.append(os.path.join(frames_dir, file_name))
file_list = sorted(file_list)
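The cell above can also be written as a reusable function with `pathlib`. This is only a sketch: `list_frames` and `IMAGE_EXTS` are hypothetical names, and unlike the cell it also skips directories whose names happen to end in an image extension.

```python
import tempfile
from pathlib import Path

# Extensions accepted by the notebook cell (compared case-insensitively).
IMAGE_EXTS = {'.jpg', '.jpeg', '.png'}


def list_frames(frames_dir):
    """Return the sorted paths (as strings) of image files directly inside frames_dir."""
    return sorted(
        str(p) for p in Path(frames_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


# Usage: a throwaway folder with two frames and one non-image file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['000001.jpg', '000000.PNG', 'notes.txt']:
        Path(tmp, name).touch()
    demo = [Path(p).name for p in list_frames(tmp)]

print(demo)  # ['000000.PNG', '000001.jpg']
```

Note that the returned strings are the dictionary keys the detector results will be indexed by, so build the list the same way each time you look frames up.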

Step 3: detect the hand bounding box in each frame.

In [5]:
# run the hand detector on every frame and save the results (format below)
detections = original_wrapper.main_wrapper(file_list)
vc.save(bbox_save_path, detections)



The detection results (saved to `bbox_save_path`) have the following format:
```
detect_results = {
    'path_to_frame_1': {
        'image_shape': (height, width, 3),
        'type': 'hand',
        'instances': [{'bbox': (x1, y1, x2, y2, conf)}]
    },
    ...
}
```
Only the instance with the highest confidence is kept, i.e. `len(instances) == 1`.
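To use the saved results downstream, one option is to load them back and walk the frames in order. The sketch below assumes `bbox.pkl` is a standard pickle file (an assumption about how `vc.save` writes `.pkl` files); the helper names `load_detections`, `boxes_in_frame_order` and `box_size` are hypothetical, and the sample values are made up for illustration.

```python
import os
import pickle
import tempfile


def load_detections(path):
    """Load saved results, assuming the .pkl file is a standard pickle."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def boxes_in_frame_order(detect_results):
    """Return (frame_path, bbox) pairs sorted by frame path."""
    pairs = []
    for path in sorted(detect_results):
        instances = detect_results[path]['instances']
        # Only the most confident detection is kept, so expect exactly one.
        if len(instances) != 1:
            raise ValueError(f'{path}: expected 1 instance, got {len(instances)}')
        pairs.append((path, tuple(instances[0]['bbox'])))
    return pairs


def box_size(bbox):
    """Width and height of an (x1, y1, x2, y2, conf) box."""
    x1, y1, x2, y2, _conf = bbox
    return x2 - x1, y2 - y1


# Usage with made-up values in the documented format.
sample = {
    'test/frames/000001.jpg': {'image_shape': (720, 1280, 3), 'type': 'hand',
                               'instances': [{'bbox': (100, 120, 180, 220, 0.97)}]},
    'test/frames/000000.jpg': {'image_shape': (720, 1280, 3), 'type': 'hand',
                               'instances': [{'bbox': (98, 118, 178, 216, 0.95)}]},
}
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, 'bbox.pkl')
    with open(pkl_path, 'wb') as f:
        pickle.dump(sample, f)
    pairs = boxes_in_frame_order(load_detections(pkl_path))

for path, bbox in pairs:
    print(os.path.basename(path), box_size(bbox))
# 000000.jpg (80, 98)
# 000001.jpg (80, 100)
```

The source does not say how frames without a detected hand are represented, so the helper raises an error rather than guessing when a frame does not have exactly one instance.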