The following code is based on the tutorial provided in https://colab.research.google.com/github/microsoft/CameraTraps/blob/master/detection/megadetector_colab.ipynb.

## Set up the Colab instance to run on a GPU


Navigate to Edit→Notebook Settings and select "GPU" from the Hardware Accelerator drop-down 

## Copy the model, install dependencies, set PYTHONPATH

Note: from here on you'll see a mix of code. Most cells contain Linux shell commands rather than Python. Shell commands are prefixed with an exclamation mark `!`, which tells the notebook to execute them on the command line.

### Install TensorFlow v1

TensorFlow is already installed in Colab, but our scripts are not yet compatible with the newer version of TensorFlow. 

Please follow the next three steps in sequence and do not skip any steps :) If you were not able to follow these, you can reset the runtime by going to "Runtime" in the top menu and "Factory reset runtime".


1. Uninstall the existing version of TensorFlow (this doesn't affect your other Colabs, don't worry)


In [None]:
pip uninstall tensorflow

Uninstalling tensorflow-2.5.0:
  Would remove:
    /usr/local/bin/estimator_ckpt_converter
    /usr/local/bin/import_pb_to_tensorboard
    /usr/local/bin/saved_model_cli
    /usr/local/bin/tensorboard
    /usr/local/bin/tf_upgrade_v2
    /usr/local/bin/tflite_convert
    /usr/local/bin/toco
    /usr/local/bin/toco_from_protos
    /usr/local/lib/python3.7/dist-packages/tensorflow-2.5.0.dist-info/*
    /usr/local/lib/python3.7/dist-packages/tensorflow/*
Proceed (y/n)? y
  Successfully uninstalled tensorflow-2.5.0


2. Install the older TensorFlow version using `pip`, with GPU processing by specifying `-gpu` and version number `1.13.1`. We also install the other required Python packages that are not already in Colab - `humanfriendly` and `jsonpickle`.

In [None]:
pip install tensorflow-gpu==1.13.1 humanfriendly jsonpickle

Collecting tensorflow-gpu==1.13.1
  Downloading https://files.pythonhosted.org/packages/2c/65/8dc8fc4a263a24f7ad935b72ad35e72ba381cb9e175b6a5fe086c85f17a7/tensorflow_gpu-1.13.1-cp37-cp37m-manylinux1_x86_64.whl (345.0MB)
     |████████████████████████████████| 345.0MB 39kB/s 
Collecting humanfriendly
  Downloading https://files.pythonhosted.org/packages/93/66/363d01a81da2108a5cf446daf619779f06d49a0c4426dd02b40734f10e2f/humanfriendly-9.1-py2.py3-none-any.whl (86kB)
     |████████████████████████████████| 92kB 12.3MB/s 
Collecting jsonpickle
  Downloading https://files.pythonhosted.org/packages/bb/1a/f2db026d4d682303793559f1c2bb425ba3ec0d6fd7ac63397790443f2461/jsonpickle-2.0.0-py2.py3-none-any.whl
Collecting tensorboard<1.14.0,>=1.13.0
  Downloading https://files.pythonhosted.org/packages/0f/39/bdd75b08a6fba41f098b6cb091b9e8c7a80e1b4d679a581a0ccd17b10373/tensorboard-1.13.1-py3-none-any.whl (3.2MB)
     |████████████████████████████████| 3.2MB 12.1MB/

3. Importantly, you now need to **restart the runtime** of this Colab so that it uses the older version of TensorFlow that we just installed.

Click on the "Runtime" option on the top menu, then "Restart runtime". After that, you can proceed with the rest of this notebook.

Let's check that we have the right version of TensorFlow (1.13.1):

In [None]:
import tensorflow as tf
print(tf.__version__)

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
  np_resource = np.dtype([("resource", np.ubyte, 1)])


1.13.1


### Download the MegaDetector model file

Currently, v4.1 is available by direct download. The link can be found in the GitHub MegaDetector readme: MegaDetector v4.1, 2020.04.27 frozen model (.pb)

In [None]:
!wget -O /content/megadetector_v4_1_0.pb https://lilablobssc.blob.core.windows.net/models/camera_traps/megadetector/md_v4.1.0/md_v4.1.0.pb

--2021-05-30 05:59:04--  https://lilablobssc.blob.core.windows.net/models/camera_traps/megadetector/md_v4.1.0/md_v4.1.0.pb
Resolving lilablobssc.blob.core.windows.net (lilablobssc.blob.core.windows.net)... 52.239.159.84
Connecting to lilablobssc.blob.core.windows.net (lilablobssc.blob.core.windows.net)|52.239.159.84|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 245590501 (234M) [application/octet-stream]
Saving to: ‘/content/megadetector_v4_1_0.pb’


2021-05-30 05:59:21 (14.6 MB/s) - ‘/content/megadetector_v4_1_0.pb’ saved [245590501/245590501]
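The `wget` log above reports a length of 245590501 bytes. A quick size check, sketched below, can catch a truncated download before the model is loaded. The helper name `has_expected_size` is hypothetical, and the expected size is taken from that log rather than from any official checksum.

```python
import os

# Byte count reported by wget in the log above (not an official checksum).
EXPECTED_BYTES = 245590501

def has_expected_size(path, expected_bytes=EXPECTED_BYTES):
    """Return True if `path` is a file with exactly `expected_bytes` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes

model_path = "/content/megadetector_v4_1_0.pb"
print("model file looks complete:", has_expected_size(model_path))
```

If this prints `False`, re-running the download cell is likely the simplest fix.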



### Clone the two required Microsoft git repos
This will copy the latest version of the Microsoft AI for Earth "utilities" and "Camera Traps" repositories from GitHub. These make data handling and running the model easy. 

In [None]:
!git clone https://github.com/microsoft/CameraTraps
!git clone https://github.com/microsoft/ai4eutils

Cloning into 'CameraTraps'...
remote: Enumerating objects: 11639, done.
remote: Counting objects: 100% (615/615), done.
remote: Compressing objects: 100% (436/436), done.
remote: Total 11639 (delta 360), reused 329 (delta 177), pack-reused 11024
Receiving objects: 100% (11639/11639), 120.67 MiB | 23.46 MiB/s, done.
Resolving deltas: 100% (6848/6848), done.
Cloning into 'ai4eutils'...
remote: Enumerating objects: 659, done.
remote: Counting objects: 100% (240/240), done.
remote: Compressing objects: 100% (181/181), done.
remote: Total 659 (delta 143), reused 118 (delta 58), pack-reused 419
Receiving objects: 100% (659/659), 1.34 MiB | 20.47 MiB/s, done.
Resolving deltas: 100% (374/374), done.


We'll also copy the Python scripts that run the model and visualize the results to the working directory.

In [None]:
!cp /content/CameraTraps/detection/run_tf_detector_batch.py .
!cp /content/CameraTraps/visualization/visualize_detector_output.py .

### Set `PYTHONPATH` to include `CameraTraps` and `ai4eutils`

Add cloned git folders to the `PYTHONPATH` environment variable so that we can import their modules from any working directory.


In [None]:
import os
os.environ['PYTHONPATH'] += ":/content/ai4eutils"
os.environ['PYTHONPATH'] += ":/content/CameraTraps"

!echo "PYTHONPATH: $PYTHONPATH"

PYTHONPATH: /env/python:/content/ai4eutils:/content/CameraTraps
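`os.environ['PYTHONPATH'] += ...` raises a `KeyError` when the variable is not already set, which can happen outside Colab. A minimal sketch of a more defensive version follows; `append_to_pythonpath` is a hypothetical helper, not part of either repository.

```python
import os

def append_to_pythonpath(*paths, environ=os.environ):
    """Append `paths` to PYTHONPATH, creating the variable if it is unset."""
    current = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    for path in paths:
        if path not in current:  # avoid duplicates when the cell is re-run
            current.append(path)
    environ["PYTHONPATH"] = os.pathsep.join(current)
    return environ["PYTHONPATH"]

# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {"PYTHONPATH": "/env/python"}
print(append_to_pythonpath("/content/ai4eutils", "/content/CameraTraps",
                           environ=demo_env))
```

Calling it without the `environ` argument would modify the real environment, so later `!python` commands would see the new value.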


## Mount Google Drive in Colab


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## MegaDetector batch processing

This step executes the Python script `run_tf_detector_batch.py` that we copied from the CameraTraps repo. It has three mandatory arguments and one optional:

1.   path to the MegaDetector saved model file.
2.   a folder containing images. If your images were already on Google Drive, replace `[Image_Folder]` with your folder name from Google Drive. If you are using the sample images from Snapshot Serengeti, change `images_dir` to `'/content/snapshotserengeti'`.
3.   the output JSON file location and name - replace `[Output_Folder]` with your folder name and `[output_file_name.json]` with your file name.
4.   option `--recursive` goes through all subfolders to find and process all images within.

You will need to change the image folder path and output file path, depending on your situation.

In our experience the Colab system will take ~30 seconds to initialize and load the saved MegaDetector model. It will then iterate through all of the images in the folder specified. Processing initially takes a few seconds per image and usually settles to ~1 second per image. That is ~60 images per minute or ~3600 images per hour. Limit the number of images in your folder so that all of the processing can be completed before the Colab session ends.

If you see the error "AssertionError: output_file specified needs to end with .json", then you haven't properly updated the output folder and file name in the code below.
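The throughput figures above (~30 seconds of start-up, then ~1 second per image) give a rough way to size a batch before starting a run, and the `.json` requirement can be checked up front. The sketch below assumes those figures; `estimate_minutes` and `check_output_path` are hypothetical helpers, and real timings depend on the GPU that Colab assigns.

```python
def estimate_minutes(num_images, seconds_per_image=1.0, startup_seconds=30.0):
    """Rough run time in minutes, using the approximate figures quoted above."""
    return (startup_seconds + num_images * seconds_per_image) / 60.0

def check_output_path(output_file_path):
    """Mirror the script's requirement that the output file ends with .json."""
    if not output_file_path.endswith(".json"):
        raise ValueError(f"output file must end with .json: {output_file_path!r}")
    return output_file_path

print(f"{estimate_minutes(3600):.1f} minutes for 3600 images")
check_output_path("annotations_file/Right_CAM_A.json")
```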

In [None]:
# Build lists of all the training and validation image folders
folders_train = []
folders_valid = []
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
train_cam_side = ['Right_CAM', 'Front_CAM', 'Left_CAM']
for side in train_cam_side:
  for label in labels:
    folders_train.append(side + '/' + label)
val_cam_side = ['Below_CAM']
for side in val_cam_side:
  for label in labels:
    folders_valid.append(side + '/' + label)

In [None]:
#Creating box data for training files
for folder in folders_train:
  images_dir = 'drive/MyDrive/github/HandGestureRecognition/HGM_data/' + folder
  # choose a location for the output JSON file
  output_file_path = 'drive/MyDrive/github/HandGestureRecognition/annotations_file/' + folder.replace('/','_') + '.json'
  !python run_tf_detector_batch.py megadetector_v4_1_0.pb "$images_dir" "$output_file_path" --recursive

In the cell above, the Python variables `images_dir` and `output_file_path` are passed to the shell command using `$`. They are double-quoted so that paths containing spaces are handled correctly. Keeping the output path in a Python variable also lets us refer to it later in the notebook.
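For reference, a `!python ...` line like the one above is roughly equivalent to building an argument list and handing it to `subprocess.run`, which sidesteps shell quoting entirely. This is a sketch under that assumption; `build_detector_command` is a hypothetical helper, and only the arguments already used in this notebook appear.

```python
import subprocess
import sys

def build_detector_command(model_file, images_dir, output_file, recursive=True):
    """Build the argument list for run_tf_detector_batch.py as used above."""
    command = [sys.executable, "run_tf_detector_batch.py",
               model_file, images_dir, output_file]
    if recursive:
        command.append("--recursive")
    return command

command = build_detector_command(
    "megadetector_v4_1_0.pb",
    "drive/MyDrive/github/HandGestureRecognition/HGM_data/Right_CAM/A",
    "drive/MyDrive/github/HandGestureRecognition/annotations_file/Right_CAM_A.json",
)
print(command[1:])
# To actually run it (requires the script and the model file to be present):
# subprocess.run(command, check=True)
```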

## Generate cropped images using the MegaDetector box output


In [None]:
import glob 
import json
import pickle
import os
import cv2
import numpy as np 
import pandas as pd 
from PIL import Image, ImageFile, ImageFont, ImageDraw

In [None]:
# Minimum detection confidence; detections below this value are not cropped.
threshould = 0.7
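The cropping code in this notebook relies on only a few fields of the detector's JSON output: each entry of `images` has a `file` path and, when detection succeeded, a `detections` list whose items carry `conf` and `bbox`. A minimal sketch of filtering by confidence follows, using made-up sample values purely for illustration; `confident_detections` is a hypothetical helper.

```python
# Illustrative record shaped like the fields this notebook reads;
# the file name and numbers are invented for the example.
sample_annotation = {
    "file": "HGM_data/Right_CAM/A/P1_001.jpg",
    "detections": [
        {"conf": 0.92, "bbox": [0.10, 0.20, 0.30, 0.40]},
        {"conf": 0.35, "bbox": [0.50, 0.50, 0.10, 0.10]},
    ],
}

def confident_detections(annotation, min_conf=0.7):
    """Return detections with conf >= min_conf; empty if none were recorded."""
    return [d for d in annotation.get("detections", []) if d["conf"] >= min_conf]

kept = confident_detections(sample_annotation)
print(len(kept), "of", len(sample_annotation["detections"]), "detections kept")
```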

In [None]:
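def get_crop_area(bbox, image_size):
    # bbox is [x_min, y_min, width, height], as fractions of the image size;
    # image_size is PIL's (width, height) in pixels.
    x1, y1, w_box, h_box = bbox
    ymin, xmin, ymax, xmax = y1, x1, y1 + h_box, x1 + w_box
    # Scale to pixels in the (left, upper, right, lower) order used by Image.crop.
    area = (xmin * image_size[0], ymin * image_size[1],
            xmax * image_size[0], ymax * image_size[1])
    return area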
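As a worked example of the conversion above: `get_crop_area` treats `bbox` as `[x, y, width, height]` in fractions of the image size and returns `(left, upper, right, lower)` in pixels, the order `Image.crop` expects. The image size and box below are invented for illustration, and the function is restated so the snippet runs on its own.

```python
def get_crop_area(bbox, image_size):
    """Convert a normalized [x, y, w, h] box to a pixel (left, upper, right, lower) box."""
    x1, y1, w_box, h_box = bbox
    xmin, ymin, xmax, ymax = x1, y1, x1 + w_box, y1 + h_box
    width, height = image_size  # PIL reports size as (width, height)
    return (xmin * width, ymin * height, xmax * width, ymax * height)

# A box starting a quarter of the way across and halfway down a 400x200 image,
# covering half the width and a quarter of the height.
area = get_crop_area([0.25, 0.5, 0.5, 0.25], (400, 200))
print(area)  # (100.0, 100.0, 300.0, 150.0)
```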

In [None]:
def save_image(img, crop_folder, img_id):
  img.save(crop_folder + img_id, format="jpeg")

## Method 1: Split train and valid dataset based on camera angle

In [None]:
def converet_images(annotation, crop_folder, images_dir):
      
    size = (256,256)
    img_id = annotation["file"].replace(images_dir, '')
    
    # Entries without a "detections" key (for example, images the detector
    # could not process) are skipped.
    try:
        detections = annotation["detections"]
    except KeyError:
        print(f"Skipped {img_id}: no detection data.")
        return
    
    file_path = annotation["file"]
    
    if not os.path.exists(file_path):
        print(f"Skipped {img_id}: image file not found.")
        return
  
    try:
        img = Image.open(file_path)
    except OSError:
        # PIL raises OSError (or a subclass) for unreadable image files.
        print(f"Skipped {img_id}: failed to open {file_path}.")
        return
    
    for i, detection in enumerate(detections, 1):
        
        if detection["conf"] < threshould:
            continue
            
        crop_area = get_crop_area(detection["bbox"], img.size)
        img_cropped = img.crop(crop_area).resize(size)
        save_image(img_cropped, crop_folder, img_id)
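Note that when an image has several detections above the threshold, each crop is saved under the same `img_id`, so only the last one survives. One possible way around this, sketched here with a hypothetical `crop_filename` helper, is to add the detection index to the file name:

```python
import os

def crop_filename(img_id, index):
    """Insert a detection index before the extension: 'a/b.jpg' -> 'a/b_1.jpg'."""
    stem, ext = os.path.splitext(img_id)
    return f"{stem}_{index}{ext}"

for i in (1, 2):
    print(crop_filename("/P5_008.jpg", i))
```

The loop's existing counter `i` could be passed as `index`. Whether several crops per image are wanted depends on the dataset; for single-hand images, keeping only the highest-confidence detection may be the better choice.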

In [None]:
CROPPED_TRAIN_PATH = "drive/MyDrive/github/HandGestureRecognition/cropped_train/"

for folder in folders_train:
  images_dir = 'drive/MyDrive/github/HandGestureRecognition/HGM_data/' + folder
  # location for the output JSON file
  output_file_path = 'drive/MyDrive/github/HandGestureRecognition/annotations_file/' + folder.replace('/','_') + '.json'

  crop_folder = CROPPED_TRAIN_PATH + folder
  if not os.path.exists(crop_folder):
    os.makedirs(crop_folder)

  with open(output_file_path, encoding='utf-8') as json_file:
    megadetector_results = json.load(json_file)

  annotations = megadetector_results["images"]

  for annotation in annotations:
    converet_images(annotation, crop_folder, images_dir)

In [None]:
# Creating box data for validation files
for folder in folders_valid:
  images_dir = 'drive/MyDrive/github/HandGestureRecognition/HGM_data/' + folder
  # choose a location for the output JSON file
  output_file_path = 'drive/MyDrive/github/HandGestureRecognition/annotations_file/' + folder.replace('/','_') + '.json'
  !python run_tf_detector_batch.py megadetector_v4_1_0.pb "$images_dir" "$output_file_path" --recursive

In [None]:
CROPPED_VALID_PATH = "drive/MyDrive/github/HandGestureRecognition/cropped_valid/"

for folder in folders_valid:
  images_dir = 'drive/MyDrive/github/HandGestureRecognition/HGM_data/' + folder
  # location for the output JSON file
  output_file_path = 'drive/MyDrive/github/HandGestureRecognition/annotations_file/' + folder.replace('/','_') + '.json'

  crop_folder = CROPPED_VALID_PATH + folder
  
  if not os.path.exists(crop_folder):
    os.makedirs(crop_folder)

  with open(output_file_path, encoding='utf-8') as json_file:
    megadetector_results = json.load(json_file)

  annotations = megadetector_results["images"]

  for annotation in annotations:
    converet_images(annotation, crop_folder, images_dir)

In [None]:
# Creating labels file for training data
folder_path = 'drive/MyDrive/github/HandGestureRecognition/'
train_folder = 'cropped_train/'

files = []

for image_folder in folders_train:
    for file in os.listdir(folder_path+train_folder+image_folder):
        label = image_folder.split('/')[1]
        
        files.append([train_folder + image_folder + '/' + file, label])
print(files)

pd.DataFrame(files, columns=['files', 'target']).to_csv(folder_path+train_folder+'labels.csv')

[['cropped_train/Right_CAM/A/P5_008.jpg', 'A'], ['cropped_train/Right_CAM/A/P5_007.jpg', 'A'], ['cropped_train/Right_CAM/A/P5_006.jpg', 'A'], ['cropped_train/Right_CAM/A/P5_005.jpg', 'A'], ['cropped_train/Right_CAM/A/P5_004.jpg', 'A'], ['cropped_train/Right_CAM/A/P5_003.jpg', 'A'], ['cropped_train/Right_CAM/A/P5_002.jpg', 'A'], ['cropped_train/Right_CAM/A/P5_001.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_008.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_007.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_006.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_005.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_004.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_003.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_002.jpg', 'A'], ['cropped_train/Right_CAM/A/P4_001.jpg', 'A'], ['cropped_train/Right_CAM/A/P3_008.jpg', 'A'], ['cropped_train/Right_CAM/A/P3_007.jpg', 'A'], ['cropped_train/Right_CAM/A/P3_006.jpg', 'A'], ['cropped_train/Right_CAM/A/P3_005.jpg', 'A'], ['cropped_train/Right_CAM/A/P3_004.jpg', 'A'], ['cropped_tr

In [None]:
# Creating labels file for validation data
folder_path = 'drive/MyDrive/github/HandGestureRecognition/'
valid_folder = 'cropped_valid/'

files = []

for image_folder in folders_valid:
    for file in os.listdir(folder_path+valid_folder+image_folder):
        label = image_folder.split('/')[1]
        
        files.append([valid_folder + image_folder + '/' + file, label])
print(files[:3])

pd.DataFrame(files, columns=['files', 'target']).to_csv(folder_path+valid_folder+'labels.csv')

[['cropped_valid/Below_CAM/A/P1_001.jpg', 'A'], ['cropped_valid/Below_CAM/A/P1_002.jpg', 'A'], ['cropped_valid/Below_CAM/A/P5_001.jpg', 'A']]


## Method 2: Split train and valid dataset based on the person the image belongs to

In [None]:
folders = []
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
cam_side = ['Right_CAM', 'Front_CAM', 'Left_CAM', 'Below_CAM']
for side in cam_side:
  for label in labels:
    folders.append(side + '/' + label)

In [None]:
def save_image(img, crop_folder, img_id):
  img.save(crop_folder + img_id, format="jpeg")

def convert_images(annotation, CROPPED_VALID_PATH, CROPPED_TRAIN_PATH, images_dir):
      
  size = (256,256)
  img_id = annotation["file"].replace(images_dir, '').replace('/','_')
  #print(img_id)

  # Person 5 ("P5") is held out for validation; everyone else is used for training.
  if 'P5' in img_id:
    crop_folder = CROPPED_VALID_PATH
  else:
    crop_folder = CROPPED_TRAIN_PATH

  # Entries without a "detections" key (for example, images the detector
  # could not process) are skipped.
  try:
      detections = annotation["detections"]
  except KeyError:
      print(f"Skipped {img_id}: no detection data.")
      return

  file_path = annotation["file"]

  if not os.path.exists(file_path):
      print(f"Skipped {img_id}: image file not found.")
      return

  try:
      img = Image.open(file_path)
  except OSError:
      # PIL raises OSError (or a subclass) for unreadable image files.
      print(f"Skipped {img_id}: failed to open {file_path}.")
      return

  for i, detection in enumerate(detections, 1):
      
      if detection["conf"] < threshould:
          continue
          
      crop_area = get_crop_area(detection["bbox"], img.size)
      img_cropped = img.crop(crop_area).resize(size)
      save_image(img_cropped, crop_folder, img_id)
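The `'P5' in img_id` test sends every image of person 5 to the validation set and everyone else to training. A slightly stricter sketch, assuming file names follow the `<side>_CAM_<letter>_P<person>_<frame>.jpg` pattern visible in this notebook's outputs, extracts the person number with a regular expression; `person_id` and `split_for` are hypothetical names.

```python
import re

# Matches the person token, e.g. '_P5_' in 'Left_CAM_Q_P5_008.jpg'.
PERSON_PATTERN = re.compile(r"_P(\d+)_")

def person_id(img_id):
    """Return the person number encoded in the file name, or None."""
    match = PERSON_PATTERN.search(img_id)
    return int(match.group(1)) if match else None

def split_for(img_id, valid_persons=(5,)):
    """Return 'valid' for held-out persons and 'train' otherwise."""
    return "valid" if person_id(img_id) in valid_persons else "train"

print(split_for("Left_CAM_Q_P5_008.jpg"), split_for("Front_CAM_J_P1_003.jpg"))
```

Splitting by person rather than by substring also makes it easy to hold out more than one person by changing `valid_persons`.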

In [None]:
CROPPED_TRAIN_PATH = "drive/MyDrive/github/HandGestureRecognition/cropped_train/"
CROPPED_VALID_PATH = "drive/MyDrive/github/HandGestureRecognition/cropped_valid/"

images_dir = 'drive/MyDrive/Colab_Notebooks/HandGestureRecognition/HGM_data/'

for folder in folders:
  print(folder)
  # location for the output JSON file
  output_file_path = 'drive/MyDrive/github/HandGestureRecognition/annotations_file/' + folder.replace('/','_') + '.json'

  with open(output_file_path, encoding='utf-8') as json_file:
    megadetector_results = json.load(json_file)

  annotations = megadetector_results["images"]

  for annotation in annotations:
    convert_images(annotation, CROPPED_VALID_PATH, CROPPED_TRAIN_PATH, images_dir)

Right_CAM/A
Right_CAM/B
Right_CAM/C
Right_CAM/D
Right_CAM/E
Right_CAM/F
Right_CAM/G
Right_CAM/H
Right_CAM/I
Right_CAM/J
Right_CAM/K
Right_CAM/L
Right_CAM/M
Right_CAM/N
Right_CAM/O
Right_CAM/P
Right_CAM/Q
Right_CAM/R
Right_CAM/S
Right_CAM/T
Right_CAM/U
Right_CAM/V
Right_CAM/W
Right_CAM/X
Right_CAM/Y
Right_CAM/Z
Front_CAM/A
Front_CAM/B
Front_CAM/C
Front_CAM/D
Front_CAM/E
Front_CAM/F
Front_CAM/G
Front_CAM/H
Front_CAM/I
Front_CAM/J
Front_CAM/K
Front_CAM/L
Front_CAM/M
Front_CAM/N
Front_CAM/O
Front_CAM/P
Front_CAM/Q
Front_CAM/R
Front_CAM/S
Front_CAM/T
Front_CAM/U
Front_CAM/V
Front_CAM/W
Front_CAM/X
Front_CAM/Y
Front_CAM/Z
Left_CAM/A
Left_CAM/B
Left_CAM/C
Left_CAM/D
Left_CAM/E
Left_CAM/F
Left_CAM/G
Left_CAM/H
Left_CAM/I
Left_CAM/J
Left_CAM/K
Left_CAM/L
Left_CAM/M
Left_CAM/N
Left_CAM/O
Left_CAM/P
Left_CAM/Q
Left_CAM/R
Left_CAM/S
Left_CAM/T
Left_CAM/U
Left_CAM/V
Left_CAM/W
Left_CAM/X
Left_CAM/Y
Left_CAM/Z
Below_CAM/A
Below_CAM/B
Below_CAM/C
Below_CAM/D
Below_CAM/E
Below_CAM/F
Below_CAM/G
Below_

In [None]:
folder_path = 'drive/MyDrive/github/HandGestureRecognition/'
image_folder = 'cropped_train/'
files = []

for file in os.listdir(folder_path+image_folder):
  if file != 'labels.csv':
    label = file.split('_')[2]
    files.append([file, label])

print(files[:5])

pd.DataFrame(files, columns=['files', 'target']).to_csv(folder_path+image_folder+'labels.csv')

[['Front_CAM_J_P1_003.jpg', 'J'], ['Front_CAM_J_P1_002.jpg', 'J'], ['Front_CAM_J_P1_001.jpg', 'J'], ['Front_CAM_K_P4_008.jpg', 'K'], ['Front_CAM_K_P4_007.jpg', 'K']]
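The label lookup `file.split('_')[2]` works because the flattened file names have the form `<side>_CAM_<letter>_<person>_<frame>.jpg`, as in the printed examples. A small sketch that names each field (the `parse_crop_name` helper is hypothetical):

```python
def parse_crop_name(file_name):
    """Split e.g. 'Front_CAM_J_P1_003.jpg' into camera, label, person, frame."""
    # Drop the extension, then unpack the five underscore-separated fields.
    side, cam, label, person, frame = file_name.rsplit(".", 1)[0].split("_")
    return {"camera": f"{side}_{cam}", "label": label,
            "person": person, "frame": frame}

print(parse_crop_name("Front_CAM_J_P1_003.jpg"))
```

A file name that does not have exactly five fields raises a `ValueError` here, which makes stray files in the folder easy to spot.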


In [None]:
folder_path = 'drive/MyDrive/github/HandGestureRecognition/'
image_folder = 'cropped_valid/'
files = []

for file in os.listdir(folder_path+image_folder):
  if file != 'labels.csv':
    label = file.split('_')[2]
    files.append([file, label])

print(files[:5])

pd.DataFrame(files, columns=['files', 'target']).to_csv(folder_path+image_folder+'labels.csv')

[['Left_CAM_Q_P5_008.jpg', 'Q'], ['Left_CAM_R_P5_008.jpg', 'R'], ['Left_CAM_S_P5_008.jpg', 'S'], ['Left_CAM_T_P5_008.jpg', 'T'], ['Left_CAM_U_P5_008.jpg', 'U']]
