<a href="https://colab.research.google.com/github/aubricot/computer_vision_with_eol_images/blob/master/object_detection_for_image_cropping/aves/aves_generate_crops_tf2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Faster-RCNN and SSD in Tensorflow to automatically crop images of birds
---   
*Last Updated 17 January 2025*  
-Runs in Python 3 with Tensorflow 2.0-   
This notebook uses [Faster-RCNN](https://tfhub.dev/tensorflow/faster_rcnn/resnet50_v1_640x640/1) and [SSD](https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2) models pretrained on [MS COCO 2017](https://cocodataset.org/#explore) for customized, large-scale image processing with Tensorflow. Using the location and dimensions of the detected birds, images are cropped to squares that are centered on and padded around the object(s) of interest (i.e., birds). The pre-trained models are used for "out of the box" inference on images of birds of varying dimensions and resolutions.
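To make the cropping step concrete, here is a minimal sketch of turning one pixel-space bounding box into a square, padded crop that stays inside the image. The function name `square_crop` and the exact padding rule (a fraction of the larger box dimension added on each side) are assumptions for illustration; the notebook's own logic lives in the `make_square_crops` helper and may differ.

```python
def square_crop(xmin, ymin, xmax, ymax, im_w, im_h, pad=0.02):
    """Hypothetical helper: return (crop_x, crop_y, side) for a square crop.

    Coordinates are in pixels with the origin at the top-left of the image.
    """
    box_w, box_h = xmax - xmin, ymax - ymin
    # Side length: larger box dimension plus padding on both sides,
    # capped so the square can never exceed the image itself
    side = max(box_w, box_h) * (1 + 2 * pad)
    side = min(side, im_w, im_h)
    # Center the square on the center of the detection box
    center_x, center_y = (xmin + xmax) / 2, (ymin + ymax) / 2
    crop_x = center_x - side / 2
    crop_y = center_y - side / 2
    # Shift the square back inside the image bounds if it spills over an edge
    crop_x = min(max(crop_x, 0), im_w - side)
    crop_y = min(max(crop_y, 0), im_h - side)
    return crop_x, crop_y, side


# A wide box in a 640x423 image, no padding
crop = square_crop(100, 50, 300, 150, im_w=640, im_h=423, pad=0)
print(crop)
```

Capping the side length before centering keeps the shift step simple: once the square is no larger than the image, clamping its top-left corner is enough to keep it in bounds.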

Code is modified from [here](https://medium.com/@nickbortolotti/tensorflow-object-detection-api-in-5-clicks-from-colaboratory-843b19a1edf1). The [Tensorflow Object Detection API Tutorial](https://github.com/tensorflow/models/tree/master/research/object_detection) was also used as a reference. The [Tensorflow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#tensorflow-models-installation) is used for building custom models for object detection.

Notes:
* Run code blocks by pressing the play button in brackets on the left
* Change parameters using the form fields on the right (find details at the corresponding lines of code by searching for '#@param')

## Installs & Imports
---

In [None]:
#@title Choose where to save results
# Use dropdown menu on right
save = "in Colab runtime (files deleted after each session)" #@param ["in my Google Drive", "in Colab runtime (files deleted after each session)"]

# Mount google drive to export image cropping coordinate file(s)
if 'Google Drive' in save:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)

# Note: You can modify "filter" to choose detection results for any class of interest the model is trained on
filter = "bird" #@param ["bird"] {allow-input: true}

# Type in the path to your project wd in form field on right
basewd = "/content/drive/MyDrive/train" #@param ["/content/drive/MyDrive/train"] {allow-input: true}
# Type in the folder that you want to contain TF2 files
folder = "tf2" #@param ["tf2"] {allow-input: true}
# Define current working directory using form field inputs
cwd = basewd + '/' + folder

# Install dependencies
!pip3 install --upgrade gdown
!gdown 1fIEf387CNrWk0ziPY-ltvwN9VrRXrRkY # Download helper_funcs folder
!tar -xzvf helper_funcs.tar.gz -C .
!pip install -r requirements.txt

In [None]:
#@title Choose saved model parameters
import sys
sys.path.append('/content')
from setup import *

# Set up directory structure
setup_dirs(cwd)
%cd $cwd

# Load Pre-trained model from Tensorflow Hub (both trained on MS COCO 2017)
model = "SSD MobileNet v2" #@param ["SSD MobileNet v2", "Faster RCNN Resnet 50"] {allow-input: true}
detector, module_handle, mod_abbv = load_tfhub_detector(model)

# Load corresponding label map for MS COCO 2017
!gdown 1mWmTvaBWKZ2GBbRllDoxPkecgJl5jnFK # Download labelmap.json
label_map = convert_labelmap('labelmap.json')

In [None]:
#@title Import libraries

# For running inference on the TF-Hub module
import tensorflow as tf
import tensorflow_hub as hub

# For downloading and displaying images
import matplotlib.pyplot as plt
import tempfile
from urllib.request import urlopen  # Python 3 standard library (no six needed)
from io import BytesIO

# For drawing onto images
from PIL import Image
from PIL import ImageColor
from PIL import ImageDraw
from PIL import ImageFont
from PIL import ImageOps

# For measuring inference time
import time

# For working with data
import numpy as np
import pandas as pd
pd.set_option('display.max_colwidth', 1000)
pd.options.display.max_columns = None
import os
import csv
import urllib
import sys
import json

# Define EOL CV custom functions
from wrangle_data import *

# Print Tensorflow version
print('Tensorflow Version: %s' % tf.__version__)

# Check available GPU devices
print('The following GPU devices are available: %s' % tf.test.gpu_device_name())

## Generate crops: Run inference on EOL images & save resulting coordinates for cropping - Run 4X for batches A-D
---
Use the 20K EOL image bundle to generate bounding boxes around each detected object with pre-trained object detection models. Results are saved to [crops_file].tsv. Run this section 4 times (once per batch A-D, 5K images each) so that results are saved incrementally in case of Colab timeouts.
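As a rough illustration of what "keeping" a detection means here, the sketch below filters a detector output by class name, confidence score and a maximum number of boxes. It assumes the usual TF2 Object Detection output layout (a batch dimension of 1, boxes as normalized `[ymin, xmin, ymax, xmax]`, detections sorted by descending score); `filter_detections` and the toy `result` dictionary are hypothetical and are not part of the notebook's helper functions.

```python
import numpy as np


def filter_detections(result, label_map, keep_class="bird", max_boxes=10, min_score=0.6):
    """Hypothetical helper: keep detections of one class above a score threshold."""
    boxes = np.asarray(result["detection_boxes"])[0]
    classes = np.asarray(result["detection_classes"])[0].astype(int)
    scores = np.asarray(result["detection_scores"])[0]

    kept = []
    # Only the first max_boxes detections are considered (assumed sorted by score)
    for box, cls, score in zip(boxes[:max_boxes], classes[:max_boxes], scores[:max_boxes]):
        name = label_map.get(int(cls), "unknown")
        if score >= min_score and keep_class in name:
            # Boxes arrive as [ymin, xmin, ymax, xmax], normalized to 0-1
            ymin, xmin, ymax, xmax = (float(v) for v in box)
            kept.append({"class_name": name, "score": float(score),
                         "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax})
    return kept


# Toy detector output: a confident bird, a person, and a low-confidence bird
result = {
    "detection_boxes": np.array([[[0.1, 0.2, 0.5, 0.6],
                                  [0.0, 0.0, 1.0, 1.0],
                                  [0.3, 0.3, 0.4, 0.4]]]),
    "detection_classes": np.array([[16.0, 1.0, 16.0]]),
    "detection_scores": np.array([[0.9, 0.8, 0.3]]),
}
label_map = {1: "person", 16: "bird"}  # subset of the MS COCO label map

detections = filter_detections(result, label_map)
print(detections)
```

Only the first detection survives: the person is dropped by the class filter and the second bird by the score threshold.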

In [None]:
#@title Define functions

# Set the maximum number of detections to keep per image
max_boxes = 10 #@param {type:"slider", min:0, max:100, step:10}

# Set the minimum confidence score for detections to keep per image
min_score = 0.6 #@param {type:"slider", min:0, max:0.9, step:0.1}

# Set filename for saving object detection results
def set_outpath(crops_file, cwd):
    outpath = cwd + '/' + 'results/' + crops_file.rsplit('_',1)[0] + mod_abbv + '_' + crops_file.rsplit('_',1)[1] + '.tsv'
    print("\nSaving results to: \n", outpath)

    return outpath

# Export object detection results
def export_results(image_url, result, outfpath, im_h, im_w, filter=filter):
    with open(outfpath, 'a') as out_file:
        tsv_writer = csv.writer(out_file, delimiter='\t')
        img_id = os.path.splitext((os.path.basename(image_url)))[0]
        # Write one row per detected object with bounding box coordinates
        num_detections = min(int(result["num_detections"][0]), max_boxes)
        for i in range(0, num_detections):
            class_name = str(label_map[result["detection_classes"][0][i]])
            if filter in class_name: # Only writes rows for filtered class
                ymin = result["detection_boxes"][0][i][0]
                xmin = result["detection_boxes"][0][i][1]
                ymax = result["detection_boxes"][0][i][2]
                xmax = result["detection_boxes"][0][i][3]
                tsv_writer.writerow([img_id, class_name,
                          xmin, ymin, xmax, ymax, im_h, im_w, image_url])
        print("\nObject detection results for Image {} saved to: {}".format(image_url, outfpath))

    return img_id

# Format cropping dimensions to EOL standards
def format_crops_for_eol(df):
# {"height":"423","width":"640","crop_x":123.712,"crop_y":53.4249,"crop_width":352,"crop_height":0}
    template = ('{{"height":"{}","width":"{}","crop_x":{},"crop_y":{},'
                '"crop_width":{},"crop_height":{}}}')
    # Build all strings in one pass; pre-filling the column with np.nan (float)
    # and then assigning strings row by row can trigger dtype warnings in recent pandas
    df['crop_dimensions'] = [
        template.format(r.im_height, r.im_width, r.xmin, r.ymin, r.crop_width, r.crop_height)
        for r in df.itertuples(index=False)]

    # Add other dataframe elements from cols: identifier, dataobjectversionid, eolmediaurl, im_class, crop_dimensions
    eol_crops = pd.DataFrame(df.iloc[:,np.r_[-5,-4,-6,0,-1]])
    print("\n EOL formatted cropping dimensions: \n", eol_crops.head())

    return eol_crops

print('Model loaded and functions defined! \nGo to next steps to run inference on images.')

### Generate crops: Run inference on EOL images & save results for cropping - Run 4X for batches A-D
Use the 20K EOL Aves image bundle to get bounding boxes of detected birds. Results are saved to [crops_file].tsv. Run this section 4 times (once per batch A-D, 5K images each) so that results are saved incrementally in case of Colab timeouts.
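The batching scheme can be pictured as splitting the bundle's row indices into four consecutive ranges. The sketch below is only an illustration under that assumption; `batch_bounds` is a hypothetical function, and the notebook's own `set_start_stop` helper may choose its ranges differently.

```python
def batch_bounds(n_images, n_batches=4):
    """Hypothetical helper: map batch letters to (start, stop) row ranges."""
    size = -(-n_images // n_batches)  # ceiling division so no image is left out
    bounds = {}
    for k in range(n_batches):
        start = k * size
        stop = min(start + size, n_images)
        bounds[chr(ord("a") + k)] = (start, stop)
    return bounds


bounds = batch_bounds(20000)
for letter, (start, stop) in bounds.items():
    # Each batch would be written to its own file, e.g. aves_cropcoords_tf2_a
    print("batch {}: rows {} to {}".format(letter, start, stop - 1))
```

Writing each range to its own file means a Colab timeout costs at most one batch of work rather than the whole 20K run.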

In [None]:
#@title Enter EOL image bundle and choose inference settings (change **crops_file** for each batch A-D)

# Load in EOL image bundle
bundle = "https://editors.eol.org/other_files/bundle_images/files/images_for_Aves_20K_breakdown_download_000001.txt" #@param ["https://editors.eol.org/other_files/bundle_images/files/images_for_Aves_20K_breakdown_download_000001.txt"] {allow-input: true}
df = read_datafile(bundle, sep='\t', header=None, disp_head=False)
df.columns = ['url']
print('\n EOL image bundle head:\n{}'.format(df.head()))

# Test pipeline with a smaller subset than 5k images?
run = "test with tiny subset" #@param ["test with tiny subset", "for all images"]

# Display detection results on images?
if 'tiny subset' in run:
    display_results = True
else:
    display_results = False

# Take 5k subset of bundle for running inference
# Change filename for each batch
crops_file = "aves_cropcoords_tf2_c" #@param ["aves_cropcoords_tf2_a", "aves_cropcoords_tf2_b", "aves_cropcoords_tf2_c", "aves_cropcoords_tf2_d"] {allow-input: true}
outfpath = set_outpath(crops_file, cwd)

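# Write header row of output file
# Note: export_results() writes image dimensions in the order (im_h, im_w); confirm this
# matches the "im_width", "im_height" header order before relying on these two columns
if not os.path.isfile(outfpath):
    with open(outfpath, 'a') as out_file:
        tsv_writer = csv.writer(out_file, delimiter='\t')
        tsv_writer.writerow(["img_id", "class_name", "xmin",
                             "ymin", "xmax", "ymax", "im_width",
                             "im_height", "url"])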

In [None]:
#@title Choose settings to run inference on image batches A-D

# Run EOL bundle images through trained model and save results
print("Running inference on images")
all_predictions = []
start, stop, cutoff = set_start_stop(run, df)
for i, row in enumerate(df.iloc[start:stop].iterrows()):
    try:
        # Run image through object detector and export result
        image_url = df['url'][row[0]]
        image_wboxes, result, im_h, im_w = run_detector_tf(detector, image_url, outfpath, filter, label_map, max_boxes, min_score)
        img_id = export_results(image_url, result, outfpath, im_h, im_w)

        # Optional: Display detections on images
        if (i+1<=50) and display_results:
            display_image(image_wboxes)

        # Display progress message after each image
        all_predictions.append(img_id)
        print('\033[92m {}) Inference complete for image {} of {} \033[0m \n'.format(i+1, i+1, cutoff))
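        if len(all_predictions)>=cutoff:
            break

    # Report the failing URL using the dataframe index (row[0]), not the loop counter
    except Exception as e:
        print('Check if URL from {} is valid ({})\n'.format(df['url'][row[0]], e))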

print("\n\n~~~\n\033[92m Inference complete!\033[0m \033[93m Run these steps for remaining batches A-D before proceeding.\033[0m\n~~~")

## Post-process detection results
---
Combine output files for batches A-D. Then, convert detection boxes into square, centered thumbnail cropping coordinates.
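The sketch below illustrates two of the post-processing steps on a toy table: converting normalized box coordinates to pixels, then merging all boxes for an image into one enclosing "superbox". The column names mirror the output file header, but the data are made up and the pandas `groupby` approach is an assumption; the notebook itself relies on the `denormalize_coords` and `make_superboxes` helpers, whose internals are not shown here.

```python
import pandas as pd

# Toy detection results: two boxes for img1, one box for img2 (normalized 0-1)
crops = pd.DataFrame({
    "img_id": ["img1", "img1", "img2"],
    "xmin": [0.1, 0.4, 0.2],
    "ymin": [0.2, 0.1, 0.2],
    "xmax": [0.5, 0.9, 0.6],
    "ymax": [0.6, 0.5, 0.8],
    "im_width": [640, 640, 100],
    "im_height": [400, 400, 200],
})

# De-normalize: x coordinates scale with image width, y coordinates with height
for col, dim in [("xmin", "im_width"), ("xmax", "im_width"),
                 ("ymin", "im_height"), ("ymax", "im_height")]:
    crops[col] = crops[col] * crops[dim]

# One superbox per image: smallest xmin/ymin and largest xmax/ymax over all boxes
superboxes = crops.groupby("img_id", as_index=False).agg(
    xmin=("xmin", "min"), ymin=("ymin", "min"),
    xmax=("xmax", "max"), ymax=("ymax", "max"),
    im_width=("im_width", "first"), im_height=("im_height", "first"),
)
print(superboxes)
```

For `img1`, the two boxes merge into a single box spanning both birds, which is then the input for the square-crop step.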

In [None]:
#@title Merge 5k image batch output files A-D

# Enter path to any inference result batch file A-D

# If you just ran "Generate crops" above, you do not need to enter anything
# If you ran "Generate crops" during a previous session, enter the path for ONE output file
if 'outfpath' not in globals(): # "not in locals() or globals()" was always True
    crops_file = "aves_cropcoords_tf2_a" #@param ["aves_cropcoords_tf2_a", "aves_cropcoords_tf2_b", "aves_cropcoords_tf2_c", "aves_cropcoords_tf2_d"] {allow-input: true}
    outfpath = set_outpath(crops_file, cwd)

# Combine 4 batches of detection box coordinates to one dataframe
# (batch_prefix is used instead of reusing basewd, which holds the project directory)
batch_prefix = os.path.splitext(outfpath)[0].rsplit('_',1)[0] + '_'
exts = ['a.tsv', 'b.tsv', 'c.tsv', 'd.tsv']
all_filenames = [batch_prefix + e for e in exts]
df = pd.concat([pd.read_csv(f, sep='\t', header=0, na_filter = False) for f in all_filenames], ignore_index=True)

# Write results to tsv
concat_outfpath = batch_prefix + 'concat.tsv'
df.to_csv(concat_outfpath, sep='\t', index=False)
print("New concatenated dataframe with all 4 batches saved to: {} \n{}".format(concat_outfpath, df.head()))

In [None]:
#@title Combine individual detection boxes into one "superbox" per image

# For images with >1 detection, make a 'super box' that contains all boxes

# Read in crop file exported from "Merge 5k image batch output files A-D" block above
crops = read_datafile(concat_outfpath, sep='\t', header=0, disp_head=False)

# De-normalize cropping coordinates to pixel values
crops = denormalize_coords(crops)

# Make 1 superbox per image [coordinates: bottom left (smallest xmin, ymin) and top right (largest xmax, ymax)]
superboxes = make_superboxes(crops)

# Read in EOL image "breakdown" bundle dataframe from "breakdown_download" bundle used for cropping
if 'bundle' not in globals(): # "not in locals() or globals()" was always True
    bundle = "https://editors.eol.org/other_files/bundle_images/files/images_for_Aves_20K_breakdown_download_000001.txt" #@param {type:"string"}
breakdown = bundle.replace("download_", "") # Get EOL breakdown bundle url from "breakdown_download" address
bundle_info = read_datafile(breakdown, sep='\t', header=0, disp_head=False)

# Add EOL img identifying info from breakdown file to cropping data
crops_w_identifiers = add_identifiers(superboxes, bundle_info)

In [None]:
#@title Make superbox square and within image bounds (Optional: add padding)

# Pad by xx% of the larger crop dimension
pad = 2 #@param {type:"slider", min:0, max:10, step:2}
pad = pad/100 # Convert percentage to a fraction

# Make crops square and within bounds
df = make_square_crops(crops_w_identifiers, pad)

# Export crop coordinates to display_test.tsv to visualize results in next code block and confirm crop transformations
display_test_fpath = os.path.splitext(concat_outfpath)[0] + '_displaytest' + '.tsv'
print("\n File for displaying square crops on images will be saved to: \n", display_test_fpath)
df.to_csv(display_test_fpath, sep='\t', index=False)

# Format image and cropping dimensions for EOL standards
eol_crops = format_crops_for_eol(df)

# Write results to tsv
eol_crops_fpath = os.path.splitext(display_test_fpath)[0].rsplit('_',2)[0] + '_20k_final' + '.tsv'
eol_crops.to_csv(eol_crops_fpath, columns = eol_crops.columns[:-1], sep='\t', index=False)
print("EOL formatted crops dataset saved to: {} \n{}".format(eol_crops_fpath, eol_crops.head()))

## Display cropping results on images
---
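As a small, self-contained illustration of the display step, the sketch below draws a crop box on a blank image with Pillow. The blank image stands in for a downloaded EOL image, and the coordinates are made up; the notebook itself uses the `url_to_image` and `draw_box_on_image` helpers, which may work differently.

```python
from PIL import Image, ImageDraw

# Blank stand-in for an EOL image (width 640, height 423)
img = Image.new("RGB", (640, 423), color=(255, 255, 255))

# Hypothetical square crop box in pixel coordinates (origin at top-left)
xmin, ymin, xmax, ymax = 100, 50, 300, 250

# Draw the outline of the crop box directly onto the image
draw = ImageDraw.Draw(img)
draw.rectangle([xmin, ymin, xmax, ymax], outline=(255, 0, 0), width=3)

print("Box drawn from ({}, {}) to ({}, {})".format(xmin, ymin, xmax, ymax))
```

The resulting `img` can be passed to `plt.imshow` in the same way as the images plotted in the cells below.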

In [None]:
#@title Read in cropping file and display results on images
from wrangle_data import *
import cv2

# If you just ran "Post-process results" above, you do not need to enter anything
# If you ran "Generate crops" during a previous session, enter the path for desired cropping file
if 'display_test_fpath' not in globals(): # "not in locals() or globals()" was always True
    crops_file = "aves_cropcoords_tf2_a" #@param ["aves_cropcoords_tf2_a", "aves_cropcoords_tf2_b", "aves_cropcoords_tf2_c", "aves_cropcoords_tf2_d"] {allow-input: true}
    outfpath = set_outpath(crops_file, cwd)
    display_test_fpath =  os.path.splitext(outfpath)[0].rsplit('_',1)[0] + '_concat_displaytest' + '.tsv'
    print(display_test_fpath)
df = pd.read_csv(display_test_fpath, sep="\t", header=0)
print(df.head())

In [None]:
#@title Choose starting index for crops to display

# Adjust the slider on the right to see up to 50 images displayed at a time
start = 0 #@param {type:"slider", min:0, max:5000, step:50}
stop = start+50

# Loop through images
for i, row in df.iloc[start:stop].iterrows():
    # Read in image
    url = df['eolMediaURL'][i]
    img = url_to_image(url)

    # Draw bounding box on image
    image_wbox, boxcoords = draw_box_on_image(df, i, img)

    # Plot cropping box on image
    _, ax = plt.subplots(figsize=(10, 10))
    ax.imshow(image_wbox)

    # Display image URL and coordinates above image
    # Helps with fine-tuning data transforms in post-processing steps above
    plt.title('{} \n xmin: {}, ymin: {}, xmax: {}, ymax: {}'.format(url, boxcoords[0], boxcoords[1], boxcoords[2], boxcoords[3]))