<a href="https://colab.research.google.com/github/aubricot/computer_vision_with_eol_images/blob/master/object_detection_for_image_tagging/scat_footprint/scat_footprint_train_yolo_darkflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Train YOLOv2 in Darkflow to detect scat and footprints from EOL images
---
*Last Updated 3 June 2021*  
-Runs in Python 2 with Tensorflow 1.x-   
*Update as of 2 June 2021: Darkflow builds are no longer being updated and only support Tensorflow 1.x builds. As a result, this notebook is left in its state from 3 June 2021. Functions may become deprecated or lose functionality. For updated inference of scat and footprints with YOLOv4 in its native state, refer to [scat_footprint_train_yolov4.ipynb](https://github.com/aubricot/computer_vision_with_eol_images/blob/master/object_detection_for_image_tagging/scat_footprint/scat_footprint_train_yolov4.ipynb).*     

Use images with annotations to train YOLOv2 implemented in Tensorflow (via [thtrieu's darkflow](https://github.com/thtrieu/darkflow)) to detect scat and footprints from EOL images. Detected scat and footprints will be used to add tags to images of birds (Aves), amphibians (Amphibia), reptiles (Reptilia), and mammals (Mammalia).

Datasets were downloaded to Google Drive in [scat_footprint_preprocessing.ipynb](https://github.com/aubricot/computer_vision_with_eol_images/blob/master/object_detection_for_image_tagging/scat_footprint/scat_footprint_preprocessing.ipynb). 

**YOLOv2 was trained for 4,000 epochs on 5 images to overfit, then for 1,000 epochs at lr=0.001 to reach a stable loss value (3), and finally for 1,000 epochs at a slower rate (lr=0.0001) to refine learning.** Despite adjusting augmentation and model hyperparameters over many training sessions, the scat/footprint object detection models never learned. Custom anchor boxes were used to optimize coverage for the dataset, and image augmentation was used to increase dataset size from 500 images per class to 1,000 images, but loss never decreased below 3 and final mAP was <10%. If successful approaches are found at a later date, steps for adding tags to images will be included. 

Notes:   
* Before you start: change the runtime to "GPU" with "High RAM"
* Change filepaths/taxon names where you see 'TO DO'     

References:   
* [Official Darkflow training instructions](https://github.com/thtrieu/darkflow)   
* [Medium Blog on training using YOLO via Darkflow in Colab](https://medium.com/coinmonks/detecting-custom-objects-in-images-video-using-yolo-with-darkflow-1ff119fa002f)

## Installs & Imports
---

In [None]:
# Mount google drive to import/export files
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
# Install libraries
# Make sure you are using Python 3.6
!python --version
!pip install tensorflow-gpu==1.15.0rc2
!pip install cython
!pip install opencv-python

# For importing/exporting files, working with arrays, etc
from google.colab import files
import os
import pathlib
import shutil
import imageio
import time
import csv
import urllib
import numpy as np
import pandas as pd

# For drawing onto and plotting the images
import matplotlib.pyplot as plt
import cv2
%config InlineBackend.figure_format = 'svg'
%matplotlib inline

## Model preparation (only run once)
---
These blocks download and set up the files needed for training object detectors. After running them once, you can train and re-train as many times as you'd like.

For detailed instructions on training YOLO using a custom dataset, see the [Darkflow GitHub Repository](https://github.com/thtrieu/darkflow).

In [None]:
# Download train and test dataset annotation files and install darkflow

# TO DO: Type in the path to your working directory in form field to right
basewd = "/content/drive/MyDrive/train" #@param {type:"string"}
%cd $basewd

# Download darkflow (the tensorflow implementation of YOLO)
if not os.path.exists("darkflow-master"):
    !git clone --depth 1 https://github.com/thtrieu/darkflow.git
    # Compile darkflow
    %cd darkflow
    !python setup.py build_ext --inplace
    # Rename darkflow to darkflow-master to distinguish between nested folder names
    %cd ../
    shutil.move('darkflow', 'darkflow-master')

# Move into the darkflow directory (already present on re-runs)
wd = 'darkflow-master'
%cd $wd
!pwd

In [None]:
# Test installation, you should see an output with different parameters for flow
!python flow --h

In [None]:
# Download other needed files for training

# Download yolo.weights, the pre-trained weights file (for YOLO v2), from Google Drive
weights_file = 'bin/yolo.weights'
if not os.path.exists(weights_file):
    !gdown --id 0B1tW_VtY7oniTnBYYWdqSHNGSUU
    !mkdir -p bin
    !mv yolo.weights bin
    print('Successfully downloaded ', weights_file)

# Make new label file/overwrite existing labels.txt downloaded with darkflow
!echo 'scat' >labels.txt
!echo 'footprint' >>labels.txt

# Download model config file edited for training darkflow to identify 2 classes (yolo-2c = 2 classes)
mod_config_file = 'cfg/yolo-2c-slowlr-anch.cfg'
if not os.path.exists(mod_config_file):
    %cd cfg
    !gdown --id 1wgKwWsnmJDOWzrimp3GTPtpKLoBGoyMg
    %cd ../
    print('Successfully downloaded ', mod_config_file)
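
The contents of the downloaded config file are not shown in this notebook. As a rough sketch of what "edited for 2 classes" usually involves for YOLOv2 in Darkflow: the `classes` value in the `[region]` section is set to the number of classes, and the `filters` value of the last `[convolutional]` layer is set to `num * (classes + 5)`. The function name below is hypothetical, and the default of 5 anchors is an assumption based on the standard YOLOv2 config.

```python
def yolo_v2_output_filters(num_classes, num_anchors=5):
    """Filters needed in the last convolutional layer of a YOLOv2 cfg.

    Each anchor box predicts 4 box coordinates, 1 objectness score,
    and one score per class, hence num_anchors * (num_classes + 5).
    num_anchors=5 is an assumption (the standard YOLOv2 setting).
    """
    return num_anchors * (num_classes + 5)


# Two classes (scat, footprint) with the assumed 5 anchors
print(yolo_v2_output_filters(2))
```

If the cfg uses a different number of anchors (the `num` value in `[region]`), pass it as `num_anchors`.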

## Train the model
---

#### Build darkflow

In [None]:
# Build darkflow

%tensorflow_version 1.x

# TO DO: Type in the path to your working directory in form field to right
wd = "/content/drive/MyDrive/train/darkflow-master" #@param {type:"string"}
%cd $wd

# For the actual object detection
!python setup.py build_ext --inplace
from darkflow.net.build import TFNet

# List different parameters for flow
!python flow --h

#### Step 1) Pre-train by overfitting model on 3 images per class for n-epochs with a high learning rate (until loss gets as low as possible and accuracy gets as high as possible)


In [None]:
import lxml.etree as ET

# Make a mini dataset for pre-training
%cd ../

# Set up directory for mini dataset
# TO DO: Type in the folder you would like to contain mini dataset
folder = "pretrain" #@param {type:"string"}
# Create img/ and ann/ subfolders (no error if they already exist)
os.makedirs(os.path.join(folder, "img"), exist_ok=True)
os.makedirs(os.path.join(folder, "ann"), exist_ok=True)

# Move 3 images and annotations per class to pretrain/
# TO DO: Enter path to training image dataset
train_img_path = "tf2/images/" #@param {type:"string"}
train_ann_path = "tf2/annotations/" #@param
pretrain_ann_dir = 'pretrain/ann/' #@param
pretrain_img_dir = 'pretrain/img/' #@param

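# Randomly pick a pool of up to 20 annotation files (without replacement, so no file is picked twice)
num_files = 20 
ann_files = os.listdir(train_ann_path)
filenames = np.random.choice(ann_files, min(num_files, len(ann_files)), replace=False)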

# Find 3 annotations for each image class
# TO DO: Enter list of image classes
image_classes = ['scat', 'footprint'] #@param 
class0_xmls = []
class1_xmls = []
for filename in filenames:
    fpath = os.path.join(train_ann_path, filename)
    tree = ET.parse(fpath)
    root = tree.getroot()
    for item in root.iter('name'):
        # Keep up to 3 files per class; stop at the first match so each file is listed only once
        if (item.text == image_classes[0]) and (len(class0_xmls) <= 2):
            class0_xmls.append(fpath)
            break
        elif (item.text == image_classes[1]) and (len(class1_xmls) <= 2):
            class1_xmls.append(fpath)
            break

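# Move annotation files to pretrain/
class_xmls = class0_xmls + class1_xmls
for xml in class_xmls:
    try:
        shutil.move(xml, pretrain_ann_dir)
    except (shutil.Error, OSError) as e:
        # Report files that were already moved or cannot be found instead of failing silently
        print("Could not move {}: {}".format(xml, e))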
print("Found {} annotations for {} & moved to {}".format(len(class0_xmls), image_classes[0], pretrain_ann_dir))
print("Found {} annotations for {} & moved to {}".format(len(class1_xmls), image_classes[1], pretrain_ann_dir))

# Get images matching the randomly selected xmls for each class
def check_train_anns(train_dir, ann_dir):
    train_imgs = os.listdir(train_dir)
    corresp_imgs = []
    # Loop through train images and keep those with a matching xml in ann_dir
    for train_img in train_imgs:
        base = os.path.splitext(os.path.basename(train_img))[0]
        train_xml = os.path.join(ann_dir, base + '.xml')
        if os.path.exists(train_xml):
            corresp_imgs.append(train_img)

    return corresp_imgs

# Find images
corresp_imgs = check_train_anns(train_img_path, pretrain_ann_dir)

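# Move images to pretrain/
for img in corresp_imgs:
    try:
        fpath = os.path.join(train_img_path, img)
        shutil.move(fpath, pretrain_img_dir)
    except (shutil.Error, OSError) as e:
        # Report images that could not be moved instead of failing silently
        print("Could not move {}: {}".format(img, e))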

print("\nFound {} images matching xmls for {} and {} & moved to {}".format(len(corresp_imgs), image_classes[0], image_classes[1], pretrain_img_dir))
print("\nSuccessfully created mini dataset for pretraining models!")
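
Before pre-training, it can help to confirm that the mini dataset really contains both classes. The sketch below counts how many annotation files mention each class, assuming Pascal VOC style xmls with `<object><name>...</name></object>` entries. The function name and the demo annotations are made up for illustration and are not part of the original workflow.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_annotated_classes(ann_dir):
    """Count how many xml files in ann_dir contain each class name at least once."""
    counts = Counter()
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(ann_dir, fname)).getroot()
        # Collect unique class names so a file with several boxes counts once per class
        names = {obj.findtext("name") for obj in root.iter("object")}
        counts.update(name for name in names if name)
    return counts


# Tiny self-contained demo with made-up annotations
VOC_TEMPLATE = (
    "<annotation><filename>{0}.jpg</filename>"
    "<object><name>{1}</name></object></annotation>"
)
with tempfile.TemporaryDirectory() as demo_dir:
    for stem, label in [("a", "scat"), ("b", "footprint"), ("c", "scat")]:
        with open(os.path.join(demo_dir, stem + ".xml"), "w") as f:
            f.write(VOC_TEMPLATE.format(stem, label))
    print(count_annotated_classes(demo_dir))
```

In this notebook, the function could be pointed at `pretrain_ann_dir` after the mini dataset is created.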

In [None]:
# Define training parameters
# Note: adjust learning rate by double clicking mod_config_file in Colab file explorer
# Note contd: Colab text editor will open and you can adjust values and save before continuing
%cd $wd

# TO DO: Set up training parameters
mod_config_file = "cfg/yolo-2c-slowlr-anch.cfg" #@param {type:"string"}
weights = "bin/yolo.weights" #@param {type:"string"}
ann_path = "/content/drive/MyDrive/train/pretrain/ann" #@param {type:"string"}
img_path = "/content/drive/MyDrive/train/pretrain/img" #@param {type:"string"}
trainer = "Adam" #@param {type:"string"}
epochs = 4000 #@param {type:"integer"}
gpu = 0.8 #@param {type:"slider", min:0, max:0.8, step:0.1}

In [None]:
# Start training

# Train model (yolo-2c-slowlr-anch.cfg) using pre-trained weights from basal layers of yolo.weights; the top layer will be trained from scratch to detect scat and footprints
# Change the dataset and annotation directories to your paths in Google Drive
!python flow --model {mod_config_file} --train --trainer {trainer} --load {weights} --gpu {gpu} --epoch {epochs} --dataset {img_path} --annotation {ann_path} --savepb

In [None]:
# Resume training 
%cd $wd

# TO DO: Choose how many more epochs to train for
more_epochs = 1000 #@param {type:"integer"}

# Resume training from last checkpoint 
# useful if Google Drive timeout occurs or to train for a few more epochs
!python flow --load -1 --model {mod_config_file} --train --savepb --trainer {trainer} --gpu {gpu} --epoch {more_epochs} --dataset {img_path} --annotation {ann_path}
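
The training commands above interpolate paths directly into a shell line, so a path containing a space (for example a Drive mount at `My Drive`) would be split into two arguments. A minimal sketch of building the same command with quoting is shown below. The helper name is hypothetical; the flags are the ones already used in this notebook.

```python
import shlex


def build_flow_command(model, load, dataset, annotation,
                       trainer="Adam", gpu=0.8, epochs=100, savepb=True):
    """Assemble a darkflow training command as a single shell-safe string."""
    args = [
        "python", "flow",
        "--model", model,
        "--load", str(load),
        "--train",
        "--trainer", trainer,
        "--gpu", str(gpu),
        "--epoch", str(epochs),
        "--dataset", dataset,
        "--annotation", annotation,
    ]
    if savepb:
        args.append("--savepb")
    # shlex.quote wraps any argument containing spaces or shell metacharacters
    return " ".join(shlex.quote(a) for a in args)


cmd = build_flow_command(
    "cfg/yolo-2c-slowlr-anch.cfg", -1,
    "/content/drive/My Drive/train/pretrain/img",
    "/content/drive/My Drive/train/pretrain/ann",
    epochs=1000,
)
print(cmd)
```

In a Colab cell the resulting string could then be run with `!{cmd}`.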

#### Step 2) Train on full dataset with intermediate learning rate until loss starts to stabilize (usually at a value between 1 and 5)

In [None]:
# Define training parameters
# Note: adjust learning rate by double clicking mod_config_file in Colab file explorer
%cd $wd

# TO DO: Set up training parameters
mod_config_file = "cfg/yolo-2c-slowlr-anch.cfg" #@param {type:"string"}
weights = "bin/yolo.weights" #@param {type:"string"}
ann_path = "/content/drive/MyDrive/train/tf2/annotations" #@param {type:"string"}
img_path = "/content/drive/MyDrive/train/tf2/images" #@param {type:"string"}
trainer = "Adam" #@param {type:"string"}
epochs = 100 #@param {type:"integer"}
gpu = 0.8 #@param {type:"slider", min:0, max:0.8, step:0.1}

In [None]:
# Train model (yolo-2c-slowlr-anch.cfg) on the full dataset
# Note: --load {weights} starts from bin/yolo.weights as set above; to continue from the Step 1 checkpoint instead, set weights to -1
# Change the dataset and annotation directories to your paths in Google Drive
%cd $wd
!python flow --model {mod_config_file} --train --trainer {trainer} --load {weights} --gpu {gpu} --epoch {epochs} --dataset {img_path} --annotation {ann_path} --savepb

#### Step 3) Train on full dataset with low learning rate (10x lower than step 2) to get best loss/accuracy values (loss <1, accuracy as close to 100% as possible)

In [None]:
# Resume training
# Note: adjust learning rate by double clicking mod_config_file in Colab file explorer
%cd $wd

# TO DO: Choose how many more epochs to train for
more_epochs = 100 #@param {type:"integer"}

!python flow --load -1 --model {mod_config_file} --train --savepb --trainer {trainer} --gpu {gpu} --epoch {more_epochs} --dataset {img_path} --annotation {ann_path}
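
The notes in the cells above adjust the learning rate by hand in Colab's text editor. A minimal sketch of making the same edit in code follows, assuming the cfg uses the Darknet `key=value` layout with a `learning_rate` line. The function name and the sample cfg text are hypothetical, and the real cfg file is not shown in this notebook.

```python
import re


def set_cfg_learning_rate(cfg_text, new_lr):
    """Return cfg_text with the value on the first learning_rate line replaced."""
    pattern = re.compile(r"^([ \t]*learning_rate[ \t]*=[ \t]*)\S+", flags=re.MULTILINE)
    new_text, n_subs = pattern.subn(lambda m: m.group(1) + str(new_lr), cfg_text, count=1)
    if n_subs == 0:
        raise ValueError("No learning_rate line found in cfg text")
    return new_text


# Made-up cfg fragment for illustration only
sample_cfg = "[net]\nbatch=64\nlearning_rate=0.001\nmomentum=0.9\n"
print(set_cfg_learning_rate(sample_cfg, 0.0001))
```

To apply it to a file, read the cfg into a string, pass it through the function, and write the result back before resuming training.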

#### Step 4) Save trained model to protobuf file (.pb)

In [None]:
# Save the last checkpoint to protobuf file
!python flow --model {mod_config_file} --load -1 --savepb

In [None]:
# To keep training after export, resume from the protobuf file using the commands below

# TO DO: Enter path to saved model protobuf file
pb_file = "built_graph/yolo-2c_slowlr_anch.pb" #@param {type:"string"}
meta_file = "built_graph/yolo-2c_slowlr_anch.meta" #@param {type:"string"}
epochs = 100 #@param {type:"integer"}

!python flow --load -1 --pbLoad {pb_file} --metaLoad {meta_file} --train --savepb --trainer {trainer} --gpu {gpu} --epoch {epochs} --dataset {img_path} --annotation {ann_path}

## Evaluate model accuracy
---

### Step 1) Export detection results as JSON

In [None]:
# Export detection results for test images as json files to calculate mAP (mean average precision, a performance measure to compare models) using calculate_error_mAP.ipynb
%cd $wd

# TO DO: Enter test images directory
test_img_dir = "/content/drive/MyDrive/train/tf2/test_images" #@param {type:"string"}

!python flow --pbLoad {pb_file} --gpu {gpu} --metaLoad {meta_file} --imgdir {test_img_dir} --json
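
With `--json`, Darkflow writes one JSON file per image into an `out/` folder inside the image directory. Each file holds a list of detections with `label`, `confidence`, `topleft`, and `bottomright` fields. The sketch below parses that layout and filters by confidence. The function name and the sample values are made up for illustration.

```python
import json


def load_detections(json_text, min_confidence=0.0):
    """Parse darkflow-style detection JSON into (label, confidence, xmin, ymin, xmax, ymax) tuples."""
    detections = []
    for det in json.loads(json_text):
        if det["confidence"] < min_confidence:
            continue
        detections.append((
            det["label"],
            det["confidence"],
            det["topleft"]["x"],
            det["topleft"]["y"],
            det["bottomright"]["x"],
            det["bottomright"]["y"],
        ))
    return detections


# Made-up example of one image's detection file
sample_json = """[
  {"label": "scat", "confidence": 0.72,
   "topleft": {"x": 10, "y": 20}, "bottomright": {"x": 110, "y": 140}},
  {"label": "footprint", "confidence": 0.15,
   "topleft": {"x": 50, "y": 60}, "bottomright": {"x": 90, "y": 100}}
]"""
for detection in load_detections(sample_json, min_confidence=0.5):
    print(detection)
```

A quick look at a few parsed files can show whether the model produces any boxes at all before running the full mAP evaluation.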

### Step 2) Use Cartucho's mAP library to evaluate model accuracy

In [None]:
# Install the mAP repository to calculate error from detection results
%cd $wd
%cd ../
if not os.path.exists("eval"):
  !mkdir eval
  %cd eval
  !git clone https://github.com/Cartucho/mAP
  %cd ../

# Move yolo detection results (jsons exported above) to detection-results/
eval_results = test_img_dir + '/out' + '/*'
!mv $eval_results eval/mAP/input/detection-results/
eval_results = eval_results.replace('/*', '')
!rm -rf $eval_results

# Copy image annotations (xmls formatted with ground truth bounding boxes) to ground-truth/
test_ann_dir = "/content/drive/MyDrive/train/tf2/test_ann/" #@param {type:"string"}
test_ann_dir = test_ann_dir + '*'
!cp $test_ann_dir eval/mAP/input/ground-truth/

# Convert jsons to format needed for mAP calc
%cd eval/mAP/scripts/extra
!python convert_dr_darkflow_json.py
# Convert xmls to format needed for mAP calc
!python convert_gt_xml.py

# Remove sample images in input/images-optional
# cd to mAP
%cd $wd
%cd ../
%cd eval/mAP
!rm -rf input/images-optional/*

# Calculate mAP for detection results
# Output will be in mAP/results
!python main.py
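
mAP scores rest on matching each detection to a ground-truth box by intersection over union (IoU). A detection is usually counted as correct when its IoU with a same-class ground-truth box reaches a threshold, and 0.5 is a common choice that is assumed here. The sketch below computes IoU for boxes given as `(xmin, ymin, xmax, ymax)` in continuous coordinates. It illustrates the idea and is not the mAP library's own code, which may treat pixel edges slightly differently.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
detection = (5, 5, 15, 15)
score = iou(ground_truth, detection)
print(round(score, 3), "match" if score >= 0.5 else "no match")
```

Here the overlap is 25 square units out of a union of 175, so this detection would not count as a match at the assumed 0.5 threshold.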