<a href="https://colab.research.google.com/github/asoane34/TF_object_detection/blob/master/wheat_head_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mounting Google Drive

Before beginning training, I am going to mount my Google Drive to this Colab notebook. I have already [explored the data](https://github.com/asoane34/TF_object_detection/blob/master/00EDA.ipynb), [prepared the TFRecord files](https://github.com/asoane34/TF_object_detection/blob/master/generate_tfrecords.py), and uploaded them to my Google Drive. 

In [0]:
from google.colab import drive

In [1]:
drive.mount("/gdrive")

In [3]:
%cd /gdrive/'My Drive'/object_detection

/gdrive/My Drive/object_detection


# Specifying correct version of Tensorflow
This is extremely important. I migrated this project from Kaggle's environment to Colab because, although it is possible to install TensorFlow 1.x (necessary for the object detection API) on Kaggle, the base image its notebooks are built on is not compatible with GPU training under TensorFlow 1.x. The Google Colab folks recommend NOT using `!pip install` to specify an earlier version, but rather the `%tensorflow_version` [magic command](https://colab.research.google.com/notebooks/tensorflow_version.ipynb).

In [4]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [5]:
import tensorflow as tf

tf.__version__

'1.15.2'

# Inspecting TFRecord files

One issue I have run into is corrupted TFRecord files. I'm not exactly sure how this happens, but judging from the number of open issues on TF's GitHub, it seems to be a frequent problem without a clear solution. So, before beginning the training process (and having it crash 4000 steps in, as it did the first time), I am going to quickly check the integrity of my files.

In [0]:
def validate_dataset(filenames, reader_opts=None):
    """Read every record in each TFRecord file and report any read error."""

    for fname in filenames:
        
        print('validating ', fname)

        # Count per file so the reported index is relative to the failing file.
        i = 0

        record_iterator = tf.io.tf_record_iterator(path=fname, options=reader_opts)
        
        try:
            
            for _ in record_iterator:
                
                i += 1
                
        except Exception as e:
            
            print('error in {} at record {}'.format(fname, i))
            
            print(e)

In [7]:
validate_dataset(["./global-wheat-detection/validation.tfrecord", "./global-wheat-detection/train.tfrecord",
                  "./global-wheat-detection/test_images.tfrecord"])

validating  ./global-wheat-detection/validation.tfrecord
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
validating  ./global-wheat-detection/train.tfrecord
validating  ./global-wheat-detection/test_images.tfrecord


Well, besides using a deprecated method, there do not appear to be any corrupt records. Hopefully this will hold true in training.
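Since the check above leans on a deprecated reader, here is a rough, framework-free alternative that walks the TFRecord framing directly. It is only a sketch: `count_tfrecord_frames` is a hypothetical helper, not part of TensorFlow, and it deliberately skips the CRC checks, so it catches truncated files but not flipped bits inside a record.

```python
import os
import struct

def count_tfrecord_frames(path):
    """Count records in a TFRecord file by walking its framing.

    Each record is stored as: uint64 length, uint32 CRC of the length,
    `length` payload bytes, uint32 CRC of the payload (little-endian).
    The CRCs are NOT verified here; only truncation is detected.
    """
    size = os.path.getsize(path)
    offset = 0
    count = 0
    with open(path, "rb") as f:
        while offset < size:
            f.seek(offset)
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if len(header) < 12:
                raise ValueError(f"truncated header after record {count}")
            (length,) = struct.unpack("<Q", header[:8])
            end = offset + 12 + length + 4  # payload + 4-byte payload CRC
            if end > size:
                raise ValueError(f"record {count} runs past end of file")
            offset = end
            count += 1
    return count
```

Calling it on each of the three `.tfrecord` paths would give a per-file record count to compare against the number of images expected in each split.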

# Installation

Now, it is time to install the dependencies for the object detection API, clone in the object detection repo, and compile the [protobufs](https://developers.google.com/protocol-buffers). There's excellent documentation of the steps [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md), or you can just follow along. The dependencies (besides obviously TF 1.x) are:
* Cython
* contextlib2
* pillow
* lxml
* jupyter
* matplotlib
* pycocotools: This one is from the COCO API, and can be accessed by cloning in the COCO API repo and copying the file from there.

I'm sure most of these are already installed but I don't feel like guessing at which. 
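For anyone who would rather check than guess, a minimal sketch like the one below reports which of the dependencies cannot be imported. The `REQUIRED` mapping and `missing_packages` helper are my own illustrative names; the only subtlety is that `pillow` is imported as `PIL`.

```python
import importlib.util

# pip name -> import name (they differ only for pillow, which imports as PIL).
REQUIRED = {
    "Cython": "Cython",
    "contextlib2": "contextlib2",
    "pillow": "PIL",
    "lxml": "lxml",
    "matplotlib": "matplotlib",
    "pycocotools": "pycocotools",
}

def missing_packages(required):
    """Return the pip names whose import name cannot be found."""
    return [pip_name for pip_name, import_name in required.items()
            if importlib.util.find_spec(import_name) is None]

print(missing_packages(REQUIRED))
```

An empty list means everything is already present and the `pip install` lines are a no-op.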

In [0]:
!pip install --user Cython
!pip install --user contextlib2
!pip install --user pillow
!pip install --user lxml
!pip install --user matplotlib
!pip install --user pycocotools



Well, didn't need to install any of these. Makes sense, they wrote it. Next, it is time to clone in the TF models repo.

In [0]:
!git clone https://github.com/tensorflow/models.git

Cloning into 'models'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 34705 (delta 1), reused 5 (delta 0), pack-reused 34694
Receiving objects: 100% (34705/34705), 512.65 MiB | 14.43 MiB/s, done.
Resolving deltas: 100% (22449/22449), done.
Checking out files: 100% (2495/2495), done.


Now, to compile the protobufs. 

In [8]:
%cd models/research 

!protoc object_detection/protos/*.proto --python_out=.

/gdrive/My Drive/object_detection/models/research


The final step before running the installation test script is to add the "models/research" and "models/research/slim" directories to the PYTHONPATH environment variable.

In [9]:
import os

os.environ['PYTHONPATH'] = os.environ['PYTHONPATH']+':/gdrive/My Drive/object_detection/models/research/slim:/gdrive/My Drive/object_detection/models/research'

os.environ['PYTHONPATH']

'/tensorflow-1.15.2/python3.6:/env/python:/gdrive/My Drive/object_detection/models/research/slim:/gdrive/My Drive/object_detection/models/research'
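The cell above assumes `PYTHONPATH` is already set (it is on Colab) and appends the same directories again every time it is re-run. A slightly more defensive version might look like the sketch below; `append_to_pythonpath` is a hypothetical helper, and the `env` parameter exists only so it can be tried on a plain dict.

```python
import os

def append_to_pythonpath(*new_dirs, env=os.environ):
    """Append directories to PYTHONPATH, skipping ones already present."""
    # Tolerate an unset variable and drop empty entries.
    current = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    for d in new_dirs:
        if d not in current:
            current.append(d)
    env["PYTHONPATH"] = os.pathsep.join(current)
    return env["PYTHONPATH"]

# Demonstration on a throwaway dict rather than the real environment.
demo_env = {}
append_to_pythonpath("/tmp/research/slim", "/tmp/research", env=demo_env)
append_to_pythonpath("/tmp/research", env=demo_env)  # no duplicate added
print(demo_env["PYTHONPATH"])
```

Note that changing `os.environ` affects the subprocesses started with `!python ...`, not the `sys.path` of the notebook kernel itself, which is what matters here because the API scripts are run as separate processes.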

Now, we are ready to run the __model_builder_test__ script. If this works, we have correctly installed the object detection API. 

In [10]:
!python object_detection/builders/model_builder_test.py

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Running tests under Python 3.6.9: /usr/bin/python3
[ RUN      ] ModelBuilderTest.test_create_experimental_model
[       OK ] ModelBuilderTest.test_create_experimental_model
[ RUN      ] ModelBuilderTest.test_create_faster_rcnn_model_from_config_with_example_miner
[       OK ] ModelBuilderTest.test_create_faster_rcnn_model_from_config_with_example_miner
[ RUN      ] ModelBuilderTest.test_create_faster_rcnn_models_from_config_faster_rcnn_with_matmul
[       OK ] ModelBuilderTest.test_create_faster_rcnn_models_from_config_faster_rcnn_with_matmul
[ RUN      ] ModelBuilderTest.test_create_faster_rcnn_models_from_config_faster_rcnn_wi

Sweet, we are ready to roll. 

# Configuring the Model for Training

Rather than building and training a model from scratch, I have elected to apply transfer learning in this project and use a pretrained model from the [TF model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md). The model zoo features research models trained on the COCO dataset, the KITTI dataset, the Open Images dataset, the AVA v2.1 dataset, and the iNaturalist Species Detection Dataset. In this case, I have elected to use a Faster RCNN ResNet101 model trained on the iNaturalist Species Detection Dataset. Obviously we are not detecting species, but the images are similar, so hopefully it will be a good choice.

In [0]:
%cd object_detection

!wget -O faster_rcnn_resnet101_fgvc_2018_07_19.tar.gz http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19.tar.gz -q
    
!tar xvzf faster_rcnn_resnet101_fgvc_2018_07_19.tar.gz

!rm faster_rcnn_resnet101_fgvc_2018_07_19.tar.gz

/gdrive/My Drive/object_detection/models/research/object_detection
faster_rcnn_resnet101_fgvc_2018_07_19/
faster_rcnn_resnet101_fgvc_2018_07_19/saved_model/saved_model.pb
faster_rcnn_resnet101_fgvc_2018_07_19/model.ckpt.meta
faster_rcnn_resnet101_fgvc_2018_07_19/pipeline.config
faster_rcnn_resnet101_fgvc_2018_07_19/saved_model/
faster_rcnn_resnet101_fgvc_2018_07_19/model.ckpt.index
faster_rcnn_resnet101_fgvc_2018_07_19/saved_model/variables/
faster_rcnn_resnet101_fgvc_2018_07_19/model.ckpt.data-00000-of-00001
faster_rcnn_resnet101_fgvc_2018_07_19/checkpoint
faster_rcnn_resnet101_fgvc_2018_07_19/frozen_inference_graph.pb


In [0]:
%cd faster_rcnn_resnet101_fgvc_2018_07_19

!mkdir export

%cd export

!mkdir Servo

%cd ../../..

/gdrive/My Drive/object_detection/models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19
/gdrive/My Drive/object_detection/models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/export
/gdrive/My Drive/object_detection/models/research


With the model downloaded and the directory tree set up correctly, it is time to write the pipeline config file. This file specifies the parameters of the model, the paths to the training and validation data, any data augmentations, and the evaluation metrics. The models in the TF zoo have [sample config files](https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs), which are an excellent jumping-off point. I will be using the [base config file](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_resnet101_fgvc.config) for the model I have selected and tuning it for this particular project. First, I will store a couple of important directories in environment variables.

In [0]:
os.environ['DATA_PATH'] = '/gdrive/My Drive/object_detection/global-wheat-detection'

os.environ['MODEL_PATH'] = 'object_detection/faster_rcnn_resnet101_fgvc_2018_07_19'

In [12]:
%cd object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/

/gdrive/My Drive/object_detection/models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19


## NOTE:  I have done several iterations of training, and thus in the .config file below, the "fine tune checkpoint" file has changed several times.
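Because that one line changes between training rounds, it may be less error-prone to patch it programmatically than to edit the whole file by hand. The sketch below shows one way to do that on the config text; `set_fine_tune_checkpoint` is a hypothetical helper, not something the object detection API provides.

```python
import re

def set_fine_tune_checkpoint(config_text, checkpoint_prefix):
    """Return config_text with its fine_tune_checkpoint value replaced."""
    pattern = r'(fine_tune_checkpoint:\s*)"[^"]*"'
    # A function replacement avoids backslash surprises in the new path.
    new_text, n = re.subn(
        pattern,
        lambda m: m.group(1) + '"' + checkpoint_prefix + '"',
        config_text,
    )
    if n != 1:
        raise ValueError("expected exactly one fine_tune_checkpoint, found %d" % n)
    return new_text

sample = 'train_config: {\n  fine_tune_checkpoint: "old/model.ckpt"\n}\n'
print(set_fine_tune_checkpoint(sample, "new/model.ckpt-60000"))
```

The same function could be applied to the contents of `global_wheat_detection.config` after reading it with `open(...).read()`, then written back before the next round of training.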

In [13]:
%%writefile global_wheat_detection.config
model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_batch_size: 32
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 116
        max_total_detections: 116
      }
      score_converter: SIGMOID
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  num_steps: 4000000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 61000
            learning_rate: .00003
          }
          schedule {
            step: 100000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "/gdrive/My Drive/object_detection/models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/model.ckpt-60000"
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
    
train_input_reader: {
  label_map_path: "/gdrive/My Drive/object_detection/global-wheat-detection/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "/gdrive/My Drive/object_detection/global-wheat-detection/train.tfrecord"
  }
}
    
eval_config: {
  metrics_set: "pascal_voc_detection_metrics"
  use_moving_averages: false
  num_examples: 675
}
    
eval_input_reader: {
  label_map_path: "/gdrive/My Drive/object_detection/global-wheat-detection/label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "/gdrive/My Drive/object_detection/global-wheat-detection/validation.tfrecord"
  }
}

Overwriting global_wheat_detection.config
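To make the `manual_step_learning_rate` block in the config above easier to read, here is a small sketch of the schedule it describes. I am assuming the rate switches as soon as the global step reaches each `step` value; `learning_rate_at` is just an illustrative function, not the API's implementation.

```python
import bisect

# Values copied from the manual_step_learning_rate block in the config.
BOUNDARIES = [61000, 100000]
RATES = [0.0003, 0.00003, 0.000003]

def learning_rate_at(step, boundaries=BOUNDARIES, rates=RATES):
    """Piecewise-constant learning rate for a given global step."""
    # bisect_right counts how many boundaries are <= step.
    return rates[bisect.bisect_right(boundaries, step)]

for step in (0, 60999, 61000, 100000, 119999):
    print(step, learning_rate_at(step))
```

With the training command used later in this notebook stopping at 120000 steps, the run spends its last 20000 steps at the smallest rate.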


In [14]:
%cd ../..

/gdrive/My Drive/object_detection/models/research


Quickly test to make sure the GPU is connected; training will take a very long time without it.

In [15]:
tf.test.is_gpu_available()

True

# Model Training

Now with the modeling pipeline configured and everything installed, it is time to train the model. Colab allows us 12 hours of GPU usage and that will be more than enough for the first round of training. 

Because my Google Drive is synced locally, I am going to connect to TensorBoard locally, but if your Google Drive is not synced locally, this is also possible using ngrok.

In [0]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -o ngrok-stable-linux-amd64.zip

--2020-05-12 22:57:22--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 52.2.129.46, 54.85.41.146, 52.6.123.150, ...
Connecting to bin.equinox.io (bin.equinox.io)|52.2.129.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13773305 (13M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip.1’


2020-05-12 22:57:25 (5.97 MB/s) - ‘ngrok-stable-linux-amd64.zip.1’ saved [13773305/13773305]

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   


In [0]:
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'.format("object_detection/faster_rcnn_resnet101_fgvc_2018_07_19")
)
get_ipython().system_raw('./ngrok http 6006 &')
# Print the public link to TensorBoard (it only shows data once training has started).
!curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

https://e7a1eaa2.ngrok.io
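The inline `python3 -c` one-liner raises an `IndexError` if the tunnel is not up yet. A slightly more forgiving way to read the same JSON is sketched below; `first_public_url` is a hypothetical helper, and the sample payload is made up to mirror only the two keys the one-liner relies on.

```python
import json

def first_public_url(payload):
    """Return the public_url of the first tunnel, or None if there is none."""
    tunnels = json.loads(payload).get("tunnels", [])
    return tunnels[0]["public_url"] if tunnels else None

# Illustrative payloads only; the real one comes from
# http://localhost:4040/api/tunnels once ngrok is running.
print(first_public_url('{"tunnels": [{"public_url": "https://example.ngrok.io"}]}'))
print(first_public_url('{"tunnels": []}'))
```

A `None` result is a hint to wait a second or two and query the local ngrok endpoint again.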


In [0]:
!pwd

/gdrive/My Drive/object_detection/models/research


In [0]:
!rm object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/checkpoint

In [0]:
!python object_detection/model_main.py \
    --pipeline_config_path=object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/global_wheat_detection.config \
    --model_dir=object_detection/faster_rcnn_resnet101_fgvc_2018_07_19 \
    --num_train_steps=120000 \
    --sample_1_of_n_eval_examples=1 \
    --alsologtostderr=False

# Exporting Trained Model for Inference

With the model tuned to the new dataset, it is time to export the last trained checkpoint for inference. This can be done by identifying the last model checkpoint and then running the __export_inference_graph__ script.

In [0]:
import numpy as np
import re

ckpts = [f for f in os.listdir('object_detection/faster_rcnn_resnet101_fgvc_2018_07_19') \
       if 'model.ckpt-' in f and '.meta' in f]

ckpt_steps = np.array([int(re.findall(r'\d+', f)[0]) for f in ckpts])

last_model = ckpts[ckpt_steps.argmax()].replace('.meta', '')

last_model_path = os.path.join('object_detection/faster_rcnn_resnet101_fgvc_2018_07_19', last_model)

In [18]:
last_model_path

'object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/model.ckpt-120000'
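The same lookup can be written without NumPy and with a stricter filename match, which may be easier to reuse outside the notebook. This is only a sketch; `latest_checkpoint_prefix` is a name I made up, and it works on a list of filenames such as the result of `os.listdir`.

```python
import re

# Matches e.g. "model.ckpt-120000.meta" but not the pretrained "model.ckpt.meta".
_CKPT_META = re.compile(r"^model\.ckpt-(\d+)\.meta$")

def latest_checkpoint_prefix(filenames):
    """Return the 'model.ckpt-<step>' prefix with the highest step."""
    steps = [int(m.group(1)) for m in map(_CKPT_META.match, filenames) if m]
    if not steps:
        raise ValueError("no model.ckpt-<step>.meta files found")
    return "model.ckpt-%d" % max(steps)

files = ["model.ckpt.meta", "model.ckpt-60000.meta",
         "model.ckpt-120000.meta", "model.ckpt-120000.index", "checkpoint"]
print(latest_checkpoint_prefix(files))
```

Joining the result onto the model directory with `os.path.join` gives the same kind of path as `last_model_path` above.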

First, I will create a destination to write the frozen inference graph to.

In [0]:
output_dir = "object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model120000/"

# Create the directory if needed; do nothing if it already exists.
os.makedirs(output_dir, exist_ok=True)

In [0]:
!python object_detection/export_inference_graph.py \
   --input_type=image_tensor \
   --pipeline_config_path object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/global_wheat_detection.config \
   --output_directory object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model120000/ \
   --trained_checkpoint_prefix object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/model.ckpt-120000
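The export cell above hardcodes the step number even though the latest checkpoint path was already computed. One possible way to keep the two in sync is to build the command string from variables, as in this sketch. `export_command` is a hypothetical helper; the flags are the same ones used in the cell above.

```python
import shlex

def export_command(model_dir, config_name, checkpoint_prefix, output_dir):
    """Build the export_inference_graph.py command as a shell-safe string."""
    args = [
        "python", "object_detection/export_inference_graph.py",
        "--input_type=image_tensor",
        "--pipeline_config_path", f"{model_dir}/{config_name}",
        "--output_directory", output_dir,
        "--trained_checkpoint_prefix", checkpoint_prefix,
    ]
    # Quote each argument so paths containing spaces survive the shell.
    return " ".join(shlex.quote(a) for a in args)

model_dir = "object_detection/faster_rcnn_resnet101_fgvc_2018_07_19"
cmd = export_command(
    model_dir,
    "global_wheat_detection.config",
    f"{model_dir}/model.ckpt-120000",
    f"{model_dir}/trained_model120000/",
)
print(cmd)
```

In a notebook the resulting string can be run with `!{cmd}`, and the checkpoint argument can come straight from the previously computed path instead of a literal.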

Next, there is a script to run inference with the trained model on test images [here](https://github.com/tensorflow/models/blob/master/research/object_detection/inference/infer_detections.py). The command below runs that script and writes a TFRecord containing the proposed detections.

In [0]:
!python object_detection/inference/infer_detections.py \
  --input_tfrecord_paths=/gdrive/'My Drive'/object_detection/global-wheat-detection/test_images.tfrecord \
  --output_tfrecord_path=/gdrive/'My Drive'/object_detection/global-wheat-detection/inferences120000.tfrecord \
  --inference_graph=object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model120000/frozen_inference_graph.pb \
  --discard_image_pixels

I am also going to download the model to my local machine to use for inference in my Kaggle submission.

In [0]:
%cd ../..

/gdrive/My Drive/object_detection


In [0]:
!tar -cvzf global-wheat-detection/trained_model.tar.gz models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model

models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/model.ckpt.data-00000-of-00001
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/model.ckpt.index
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/checkpoint
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/model.ckpt.meta
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/frozen_inference_graph.pb
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/saved_model/
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/saved_model/variables/
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19/trained_model/saved_model/saved_model.pb
models/research/object_detection/faster_rcnn_resnet101_fgvc_2018_07_19

In [0]:
from google.colab import files

files.download('global-wheat-detection/trained_model.tar.gz')