<a href="https://colab.research.google.com/github/aubricot/computer_vision_with_eol_images/blob/master/object_detection_for_image_tagging/scat_footprint/scat_footprint_train_yolo_darkflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training YOLOv2 in Darkflow to detect scat and footprints from EOL images
---
*Last Updated 23 February 2021*   
Use annotated images to train YOLOv2, implemented in TensorFlow via darkflow, to detect scat and footprints in EOL images.

Datasets were downloaded to Google Drive in [scat_footprint_preprocessing.ipynb](https://github.com/aubricot/computer_vision_with_eol_images/blob/master/object_detection_for_image_tagging/scat_footprint/scat_footprint_preprocessing.ipynb). 

**YOLOv2 was first trained for 4,000 epochs on 5 images to deliberately overfit. It was then trained for 1,000 epochs at lr=0.001 until the loss stabilized (at about 3), and finally for 1,000 epochs at a slower rate (lr=0.0001) to refine learning.** Custom anchor boxes were used to better fit the object shapes in this dataset, and image augmentation was used to increase the dataset size from 500 to 1,000 images per class. Even so, the loss never decreased below 3 and the final mAP was below 10%.
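The notebook does not show how the custom anchors were computed. A common approach is the "dimension clusters" method from the YOLOv2 paper: k-means on box widths and heights, using 1 - IoU as the distance. The following is a minimal NumPy sketch of that technique, not necessarily what was done here. It assumes normalized (width, height) pairs and a 13x13 output grid (416x416 input), because YOLOv2 cfg anchors are expressed in grid-cell units. The function names and sample boxes are hypothetical.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, treating every box as if it shared the same center.

    boxes: array of shape (n, 2); centroids: array of shape (k, 2).
    Returns an (n, k) matrix of IoU values.
    """
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_centroids = centroids[:, 0] * centroids[:, 1]
    return inter / (area_boxes[:, None] + area_centroids[None, :] - inter)

def kmeans_anchors(boxes, k=5, seed=0, max_iter=300):
    """Cluster (w, h) pairs with k-means using d = 1 - IoU; returns k anchors sorted by area."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    assignment = None
    for _ in range(max_iter):
        # The smallest distance (1 - IoU) is the same as the largest IoU
        new_assignment = np.argmax(iou_wh(boxes, centroids), axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break  # converged: no box changed cluster
        assignment = new_assignment
        for j in range(k):
            members = boxes[assignment == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]

# Hypothetical normalized (width, height) pairs; in practice, read them from the
# XML annotations as (xmax - xmin) / image_width and (ymax - ymin) / image_height
boxes = np.array([[0.10, 0.15], [0.12, 0.10], [0.50, 0.40],
                  [0.55, 0.45], [0.30, 0.80], [0.25, 0.75]])
anchors = kmeans_anchors(boxes, k=3) * 13  # scale to grid units, assuming a 13x13 grid
print("anchors = " + ",  ".join(f"{w:.4f},{h:.4f}" for w, h in anchors))
```

With darkflow's default YOLOv2 cfg you would use k=5 so that the anchor count matches `num=5` in the `[region]` section.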

Notes:   
* Change filepaths/taxon names where you see 'TO DO' 
* Make sure to set the runtime to Python 3 with the GPU hardware accelerator.    

References:   
* [Official Darkflow training instructions](https://github.com/thtrieu/darkflow)   
* [Medium Blog on training using YOLO via Darkflow in Colab](https://medium.com/coinmonks/detecting-custom-objects-in-images-video-using-yolo-with-darkflow-1ff119fa002f)

## Installs
---

In [1]:
# Mount google drive to import/export files
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
# Change to your working directory
%cd /content/drive/My Drive/train

# Install libraries
# Make sure you are using Python 3.6
!python --version
!pip install tensorflow-gpu==1.15.0rc2
!pip install cython
!pip install opencv-python

import os
import pathlib
import shutil 

### Only run once: Model preparation uploads to Google Drive
For detailed instructions on training YOLO using a custom dataset, see the [Darkflow GitHub Repository](https://github.com/thtrieu/darkflow).

In [None]:
# Install darkflow (the TensorFlow implementation of YOLO)

# If darkflow was already downloaded and renamed, change into its folder
if os.path.exists("darkflow-master"):
  %cd darkflow-master
  !pwd

else:
  # Download darkflow
  !git clone --depth 1 https://github.com/thtrieu/darkflow.git
  # Rename darkflow to darkflow-master to distinguish the repository folder from the darkflow/ package inside it
  shutil.move('darkflow', 'darkflow-master')
  # Compile darkflow
  %cd darkflow-master
  !python setup.py build_ext --inplace

In [None]:
# Test the installation: you should see a list of the different parameters for flow
!python flow --h

In [None]:
# Download other needed files for training

# Check for yolo.weights, the pre-trained weights file for YOLOv2, uploaded from Google Drive
weights_file = 'bin/yolo.weights'
if not os.path.exists(weights_file):
  #!gdown --id 0B1tW_VtY7oniTnBYYWdqSHNGSUU
  #!mkdir bin
  #!mv yolo.weights bin
  print('Weights file not found at', weights_file, '- check whether it was already downloaded')

# Make a new label file (overwrites the labels.txt downloaded with darkflow)
!echo 'scat' >labels.txt
!echo 'footprint' >>labels.txt

# Check for the model config file edited for training darkflow to identify 2 classes (yolo-2c = 2 classes)
mod_config_file = 'cfg/yolo-2c_slowlr_anch.cfg'
if not os.path.exists(mod_config_file):
  #%cd cfg
  #!gdown --id 1wgKwWsnmJDOWzrimp3GTPtpKLoBGoyMg
  #%cd ../
  print('Config file not found at', mod_config_file, '- check whether it was already downloaded')
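The darkflow README says that a custom cfg must set `classes` in the final `[region]` section to the number of classes, and `filters` in the last `[convolutional]` layer to `num * (classes + 5)`. With 5 anchors and the 2 labels above, that gives 5 * (2 + 5) = 35. The sketch below is a hypothetical sanity check that compares a cfg against labels.txt before training. The parser is a simplified stand-in written for this example, not darkflow's own, and the cfg excerpt uses placeholder anchor values.

```python
def parse_cfg(text):
    """Split a darknet-style .cfg into a list of (section_name, {key: value}) in file order.

    configparser can't be used because section names like [convolutional] repeat.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections

def check_cfg(text, labels):
    """Return a list of problems found when comparing the cfg to the class labels."""
    sections = parse_cfg(text)
    if not sections or sections[-1][0] != "region":
        return ["last section is not [region]"]
    region = sections[-1][1]
    classes, num = int(region["classes"]), int(region["num"])
    anchors = [a for a in region.get("anchors", "").split(",") if a.strip()]
    convs = [opts for name, opts in sections if name == "convolutional"]
    problems = []
    if classes != len(labels):
        problems.append(f"classes={classes} but labels.txt lists {len(labels)} labels")
    if not convs or int(convs[-1].get("filters", -1)) != num * (classes + 5):
        problems.append(f"last [convolutional] filters should be num * (classes + 5) = {num * (classes + 5)}")
    if len(anchors) != 2 * num:
        problems.append(f"expected {2 * num} anchor values (num={num}), found {len(anchors)}")
    return problems

# Hypothetical excerpt from the end of a 2-class YOLOv2 cfg (anchor values are placeholders)
sample_cfg = """
[convolutional]
size=1
stride=1
pad=1
filters=35
activation=linear

[region]
anchors = 1.0,1.5,  2.0,2.5,  3.0,3.5,  4.0,4.5,  5.0,5.5
classes=2
num=5
"""
# In the notebook, the labels could come from: open('labels.txt').read().split()
labels = ["scat", "footprint"]
print(check_cfg(sample_cfg, labels) or "cfg matches labels.txt")
```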

## Imports   
---

In [None]:
%cd darkflow-master
%tensorflow_version 1.0

# For importing/exporting files, working with arrays, etc
from google.colab import files
import os
import pathlib
import imageio
import time
import csv
import urllib
import numpy as np
import pandas as pd

# For the actual object detection
!python setup.py build_ext --inplace
from darkflow.net.build import TFNet

# For drawing onto and plotting the images
import matplotlib.pyplot as plt
import cv2
%config InlineBackend.figure_format = 'svg'
%matplotlib inline

## Train the model
---

In [None]:
# List different parameters for flow
!python flow --h

#### Step 1) Pre-train by overfitting model on 3 images per class for 4000 epochs (or until loss gets as low as possible and accuracy gets as high as possible)


In [None]:
# Start training

# Train the model (yolo-2c_slowlr_anch.cfg) starting from the pre-trained weights of yolo.weights for the basal layers; the top layer is trained from scratch to detect scat and footprints
# Change the dataset and annotation directories to your paths in Google Drive
%cd darkflow-master
!python flow --model cfg/yolo-2c_slowlr_anch.cfg --train --trainer adam --load bin/yolo.weights --gpu 0.8 --epoch 4000 --dataset "/content/drive/My Drive/train/pretrain/img" --annotation "/content/drive/My Drive/train/pretrain/ann" --savepb

In [None]:
# Resume training from last checkpoint (useful if Drive timeout happens or if you want to train for a few more epochs)
!python flow --load -1 --model cfg/yolo-2c_slowlr_anch.cfg --train --savepb --trainer adam --gpu 0.8 --epoch 1000 --dataset "/content/drive/My Drive/train/pretrain/img" --annotation "/content/drive/My Drive/train/pretrain/ann"
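Step 1 trains on only a handful of images per class. You may need to build the `pretrain/img` and `pretrain/ann` folders from the full dataset. The sketch below shows one way to do this. It assumes Pascal VOC-style XML annotations, which is the format darkflow's `--annotation` folder expects, with the image file name stored in `<filename>`. The function names and demo files are hypothetical.

```python
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET

def pick_overfit_subset(ann_dir, per_class=3):
    """Return {class_name: [xml filenames]} with up to `per_class` files per class.

    Files are scanned in sorted order, and each file is assigned to at most one class.
    """
    chosen, used = {}, set()
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(ann_dir, fname)).getroot()
        for name in sorted({obj.findtext("name") for obj in root.iter("object")}):
            picks = chosen.setdefault(name, [])
            if fname not in used and len(picks) < per_class:
                picks.append(fname)
                used.add(fname)
    return chosen

def copy_subset(chosen, ann_dir, img_dir, out_ann, out_img):
    """Copy the chosen XML files and the images they reference (<filename>) to new folders."""
    os.makedirs(out_ann, exist_ok=True)
    os.makedirs(out_img, exist_ok=True)
    for files in chosen.values():
        for fname in files:
            xml_path = os.path.join(ann_dir, fname)
            shutil.copy(xml_path, out_ann)
            img_name = ET.parse(xml_path).getroot().findtext("filename")
            shutil.copy(os.path.join(img_dir, img_name), out_img)

# Demo with three hypothetical annotation files in a temporary folder
demo_dir = tempfile.mkdtemp()
for i, label in enumerate(["scat", "footprint", "scat"]):
    with open(os.path.join(demo_dir, f"img{i}.xml"), "w") as f:
        f.write(f"<annotation><filename>img{i}.jpg</filename>"
                f"<object><name>{label}</name></object></annotation>")
subset = pick_overfit_subset(demo_dir, per_class=1)
print(subset)
```

In the notebook, `copy_subset(subset, ...)` would then be pointed at your full image and annotation folders and at the `pretrain/img` and `pretrain/ann` folders used above.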

#### Step 2) Train on the full dataset with a high learning rate until the loss starts to stabilize (usually at a value between 1 and 5)

In [None]:
# Train the model (yolo-2c_slowlr_anch.cfg) on the full dataset, resuming from the last Step 1 checkpoint (--load -1) so that the weights pre-fit in Step 1 are reused
# Change the dataset and annotation directories to your paths in Google Drive
%cd darkflow-master
!python flow --model cfg/yolo-2c_slowlr_anch.cfg --train --trainer adam --load -1 --gpu 0.8 --epoch 100 --dataset "/content/drive/My Drive/train/images" --annotation "test/training/annotations" --savepb
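Deciding when the loss "starts to stabilize" is easier if you save the training output and check it numerically. The sketch below assumes that darkflow's console output contains lines such as `step 120 - loss 3.41 - moving ave loss 3.52`. It flags a plateau when the moving-average loss changes by less than a relative tolerance over the last N logged steps. The window and tolerance values, and the helper names, are hypothetical choices, not darkflow settings.

```python
import re

# Assumed format of darkflow's per-step training output
LOSS_LINE = re.compile(r"step (\d+) - loss ([\d.]+) - moving ave loss ([\d.]+)")

def parse_losses(log_text):
    """Return (step, loss, moving_avg_loss) tuples for every matching line in the log."""
    return [(int(step), float(loss), float(mavg))
            for step, loss, mavg in LOSS_LINE.findall(log_text)]

def has_plateaued(moving_avg, window=50, tol=0.05):
    """True if the moving-average loss changed by less than `tol` (relative change)
    over the last `window` logged steps."""
    if len(moving_avg) < window:
        return False
    start, end = moving_avg[-window], moving_avg[-1]
    return abs(start - end) / max(abs(start), 1e-8) < tol

# Synthetic log for illustration; with a real run, save the output first, e.g.
# !python flow ... 2>&1 | tee train_log.txt   and then read train_log.txt
fake_log = "\n".join(
    f"step {i} - loss {3 + 1 / i:.4f} - moving ave loss {3 + 1 / i:.4f}" for i in range(1, 201))
records = parse_losses(fake_log)
print(len(records), has_plateaued([m for _, _, m in records]))
```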

#### Step 3) Train on the full dataset with a low learning rate (10x lower than in Step 2) to get the best loss/accuracy values (loss <1, accuracy as close to 100% as possible)

In [None]:
# Resume training from the last checkpoint (e.g., 100 epochs at lr=0.0001, then 100 epochs at lr=0.00001)
!python flow --load -1 --model cfg/yolo-2c_slowlr_anch.cfg --train --savepb --trainer adam --gpu 0.8 --epoch 100 --dataset "/content/drive/My Drive/train/images" --annotation "test/training/annotations"

#### Step 4) Save trained model to protobuf file (.pb)

In [None]:
# Save the last checkpoint to protobuf file
!python flow --model cfg/yolo-2c_slowlr_anch.cfg --load -1 --savepb

In [None]:
# To keep training later, resume from the saved protobuf file using the command below
!python flow --load -1 --pbLoad built_graph/yolo-2c_slowlr_anch.pb --metaLoad built_graph/yolo-2c_slowlr_anch.meta --train --savepb --trainer adam --gpu 0.8 --epoch 3000 --dataset "/content/drive/My Drive/train/images" --annotation "test/training/annotations"

## Evaluate model accuracy
---

### Step 1) Export detection results as JSON

In [None]:
# Export detection results for test images as json files to calculate mAP (mean average precision, a performance measure to compare models) using calculate_error_mAP.ipynb
!python flow --pbLoad built_graph/yolo-2c_slowlr_anch.pb --gpu 0.8 --metaLoad built_graph/yolo-2c_slowlr_anch.meta --imgdir "/content/drive/My Drive/train/test_images" --json
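With `--json`, darkflow writes one JSON file per test image into an `out/` folder inside `--imgdir`. Each file is a list of detections with `label`, `confidence`, `topleft` and `bottomright` keys. The `convert_dr_darkflow_json.py` script used in Step 2 turns these into the detection-results text format of Cartucho's mAP tool, which has one line per box: `<class_name> <confidence> <left> <top> <right> <bottom>`. The sketch below only illustrates that mapping and is not the script itself. The function name and sample detection are hypothetical.

```python
import json

def darkflow_json_to_map_lines(json_text):
    """Convert one darkflow JSON detection file into lines of the form
    '<class_name> <confidence> <left> <top> <right> <bottom>'."""
    lines = []
    for det in json.loads(json_text):
        tl, br = det["topleft"], det["bottomright"]
        lines.append(f'{det["label"]} {det["confidence"]} {tl["x"]} {tl["y"]} {br["x"]} {br["y"]}')
    return lines

# Hypothetical detection file for one test image
sample = ('[{"label": "scat", "confidence": 0.56, '
          '"topleft": {"x": 184, "y": 101}, "bottomright": {"x": 274, "y": 382}}]')
print("\n".join(darkflow_json_to_map_lines(sample)))
```

Because fields are separated by spaces, class names must not contain spaces; "scat" and "footprint" are fine.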

### Step 2) Use Cartucho's mAP library to evaluate model accuracy

In [18]:
# Install the mAP repository to calculate error from detection results
import os
%cd /content/drive/My Drive/train
if not os.path.exists("eval"):
  !mkdir eval
  %cd eval
  #!git clone https://github.com/Cartucho/mAP
  print("Check the mAP installation or the working directory; mAP should already be installed")
  %cd ../

# Move yolo detection results (jsons exported above) to detection-results/
!mv test_images/out/* eval/mAP/input/detection-results/
!rm -rf test_images/out

# Copy image annotations (xmls formatted with ground truth bounding boxes) to ground-truth/
!cp test_ann/* eval/mAP/input/ground-truth/

# Convert jsons to format needed for mAP calc
%cd /content/drive/My Drive/train/eval/mAP/scripts/extra
!python convert_dr_darkflow_json.py

# Convert xmls to format needed for mAP calc
%cd /content/drive/My Drive/train/eval/mAP/scripts/extra
!python convert_gt_xml.py

# Remove sample images in input/images-optional
%cd /content/drive/My Drive/train/eval/mAP
!rm -rf input/images-optional/*

# Calculate mAP for detection results
# Output will be in mAP/results
!python main.py

/content/drive/My Drive/train
/content/drive/My Drive/train/eval/mAP/scripts/extra
Conversion completed!
/content/drive/My Drive/train/eval/mAP/scripts/extra
Conversion completed!
/content/drive/My Drive/train/eval/mAP
6.50% = footprint AP 
4.49% = scat AP 
mAP = 5.49%
<Figure size 640x480 with 1 Axes>
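Cartucho's `main.py` produces the per-class APs reported above. It matches detections to ground-truth boxes (by default at IoU >= 0.5) and integrates a precision-recall curve. mAP is the mean of the per-class APs, here footprint and scat. The sketch below is a simplified re-implementation for illustration only, not the repository's code. It assumes boxes given as (left, top, right, bottom) tuples and uses VOC-style all-point interpolation. Details such as pixel-coordinate conventions may differ from `main.py`, so its numbers may not match exactly.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (left, top, right, bottom)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def average_precision(detections, ground_truth, min_overlap=0.5):
    """AP for one class with all-point interpolated precision.

    detections: list of (image_id, confidence, box)
    ground_truth: {image_id: [box, ...]}
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp, fp = [], []
    # Match the highest-confidence detections first
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        overlaps = [iou(box, gt) for gt in ground_truth.get(img, [])]
        best_j = int(np.argmax(overlaps)) if overlaps else -1
        if best_j >= 0 and overlaps[best_j] >= min_overlap and not used[img][best_j]:
            used[img][best_j] = True  # each ground-truth box can be matched only once
            tp.append(1)
            fp.append(0)
        else:
            tp.append(0)
            fp.append(1)
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(n_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    # Pad the curve, then make precision non-increasing from right to left
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum the rectangle areas wherever recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
    return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))

# Hypothetical single-class example: two ground-truth boxes, three detections
gt = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [("img1", 0.9, (0, 0, 10, 10)),    # matches the first box
        ("img1", 0.8, (50, 50, 60, 60)),  # false positive
        ("img1", 0.7, (20, 20, 30, 30))]  # matches the second box
ap = average_precision(dets, gt)
print(round(ap, 4))  # 0.8333 (= 5/6)
# mAP would then be the mean of the per-class APs, e.g. np.mean([ap_footprint, ap_scat])
```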
