# Object Detection

Image classification assigns a class label to an image, whereas object localization draws a bounding box around one or more objects in an image. Object detection is more challenging because it combines the two tasks: it draws a bounding box around each object of interest in the image and assigns each box a class label.

For an excellent overview, peruse:

https://www.fritz.ai/object-detection/

Technical resources:

https://machinelearningmastery.com/object-recognition-with-deep-learning/

https://www.tensorflow.org/hub/tutorials/object_detection

https://www.tensorflow.org/hub/tutorials/tf2_object_detection

**Image classification** predicts the type (or class) of an object in an image.
* Input: an image with a single object, such as a photograph.
* Output: a class label (e.g. one or more integers that are mapped to class labels).

**Object localization** involves locating the presence of objects in an image and indicating their location with a bounding box.
* Input: an image with one or more objects, such as a photograph.
* Output: one or more bounding boxes (e.g. defined by a point, width, and height).

**Object detection** involves locating objects in an image with bounding boxes and predicting the type (or class) of each located object.
* Input: an image with one or more objects, such as a photograph.
* Output: one or more bounding boxes (e.g. defined by a point, width, and height), and a class label for each bounding box.
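
To make the three output types concrete, here is a minimal sketch in plain Python. The `BoundingBox` and `Detection` classes and the sample values are hypothetical illustrations, not part of TensorFlow or of the model used later in this notebook.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """A box defined by its top-left point plus a width and a height."""
    x: float
    y: float
    width: float
    height: float

    def corners(self):
        """Return the box as (xmin, ymin, xmax, ymax)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


@dataclass
class Detection:
    """A bounding box paired with a class label."""
    box: BoundingBox
    label: str


# Image classification: a single label for the whole image.
classification_output = 'dog'

# Object localization: boxes only, no labels.
localization_output = [BoundingBox(10, 20, 100, 50)]

# Object detection: one label per box.
detection_output = [
    Detection(BoundingBox(10, 20, 100, 50), 'dog'),
    Detection(BoundingBox(150, 30, 80, 120), 'cat'),
]

for detection in detection_output:
    print(detection.label, detection.box.corners())
```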

# Import **tensorflow** library

Import library and alias it:

In [None]:
import tensorflow as tf

# GPU Hardware Accelerator

To vastly speed up processing, we can use the GPU available from the Google Colab cloud service. Colab offers free access to a GPU (at the time of writing, a Tesla K80 with about 12 GB of memory). It’s very easy to enable the GPU in a Colab notebook:

1. click **Runtime** in the top left menu
2. click **Change runtime type** from the drop-down menu
3. choose **GPU** from the Hardware accelerator drop-down menu
4. click **SAVE**

Verify that the GPU is available (an empty device name means no GPU was found):

In [None]:
tf.__version__, tf.test.gpu_device_name()
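
As an alternative check, the sketch below lists the GPUs that TensorFlow can see and reports the result as a sentence. The helper name `gpu_status` is our own (hypothetical); it relies on `tf.config.list_physical_devices`, which is available in TensorFlow 2.1 and later.

```python
def gpu_status():
    """Return a short, human-readable description of GPU availability."""
    try:
        import tensorflow as tf
    except ImportError:
        return 'TensorFlow is not installed.'
    # Each entry is a PhysicalDevice with a name such as
    # '/physical_device:GPU:0'.
    gpus = tf.config.list_physical_devices('GPU')
    if not gpus:
        return 'No GPU found; running on CPU.'
    names = ', '.join(gpu.name for gpu in gpus)
    return 'Found %d GPU(s): %s' % (len(gpus), names)


print(gpu_status())
```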

# Import Requisite Libraries

Enable access to the TF-hub module:

In [None]:
import tensorflow_hub as hub

For processing an image:

In [None]:
import matplotlib.pyplot as plt
import tempfile
from urllib.request import urlopen  # standard library in Python 3
from io import BytesIO

For drawing onto an image:

In [None]:
from PIL import Image
from PIL import ImageColor
from PIL import ImageDraw
from PIL import ImageFont
from PIL import ImageOps

General library:

In [None]:
import numpy as np

# Create Functions

Display an image:

In [None]:
def display_image(image):
  fig = plt.figure(figsize=(20, 15))
  plt.grid(False)
  plt.imshow(image)
  plt.axis('off')

Draw bounding box on image:

In [None]:
def draw_bounding_box_on_image(
    image, ymin, xmin, ymax, xmax,
    color, font, thickness=4, display_str_list=()):
  """Adds a bounding box to an image."""
  draw = ImageDraw.Draw(image)
  im_width, im_height = image.size
  (left, right, top, bottom) = (
      xmin * im_width, xmax * im_width,
      ymin * im_height, ymax * im_height)
  draw.line([(left, top), (left, bottom),
             (right, bottom), (right, top),
             (left, top)],
             width=thickness, fill=color)
  # If the total height of the display strings added to the top of the bounding
  # box exceeds the top of the image, stack the strings below the bounding box
  # instead of above.
  # font.getsize() was removed in Pillow 10; font.getbbox() returns
  # (left, top, right, bottom) for the rendered text.
  display_str_heights = [font.getbbox(ds)[3]
                         for ds in display_str_list]
  # Each display_str has a top and bottom margin of 0.05x.
  total_display_str_height = (
      1 + 2 * 0.05) * sum(display_str_heights)
  if top > total_display_str_height:
    text_bottom = top
  else:
    text_bottom = top + total_display_str_height
  # Reverse list and print from bottom to top.
  for display_str in display_str_list[::-1]:
    bbox = font.getbbox(display_str)
    text_width, text_height = bbox[2], bbox[3]
    margin = np.ceil(0.05 * text_height)
    draw.rectangle(
        [(left, text_bottom - text_height - 2 * margin),
         (left + text_width, text_bottom)], fill=color)
    draw.text(
        (left + margin, text_bottom - text_height - margin),
        display_str, fill='black', font=font)
    # Move up past this label (text height plus both margins).
    text_bottom -= text_height + 2 * margin
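
The function above receives box coordinates normalized to the range 0 to 1 and scales them by the image size before drawing. The sketch below isolates that arithmetic, along with the rule for placing the label above or below the top edge of the box. The helper names `to_pixel_box` and `label_bottom` are hypothetical and exist only for this illustration.

```python
def to_pixel_box(ymin, xmin, ymax, xmax, im_width, im_height):
    """Scale a normalized box to pixels as (left, right, top, bottom)."""
    return (xmin * im_width, xmax * im_width,
            ymin * im_height, ymax * im_height)


def label_bottom(top, total_label_height):
    """Return the y coordinate of the bottom edge of the label stack.

    If the labels fit above the box, they end at the top edge of the box.
    Otherwise they are pushed down so that they start at the top edge.
    """
    if top > total_label_height:
        return top
    return top + total_label_height


left, right, top, bottom = to_pixel_box(0.25, 0.5, 0.75, 1.0, 640, 480)
print(left, right, top, bottom)  # 320.0 640.0 120.0 360.0
print(label_bottom(top, 30.0))   # 120.0, the label fits above the box
print(label_bottom(10.0, 30.0))  # 40.0, the label is drawn inside the box
```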

Draw boxes:

In [None]:
def draw_boxes(
    image, boxes, class_names, scores,
    max_boxes=10, min_score=0.1):
  # Overlay labeled boxes on an image with formatted scores and label names.
  colors = list(ImageColor.colormap.values())
  one = '/usr/share/fonts/truetype/liberation/'
  two =  'LiberationSansNarrow-Regular.ttf'
  font_url = one + two
  try:
    font = ImageFont.truetype(font_url, 25)
  except IOError:
    print('Font not found, using default font.')
    font = ImageFont.load_default()
  for i in range(min(boxes.shape[0], max_boxes)):
    if scores[i] >= min_score:
      ymin, xmin, ymax, xmax = tuple(boxes[i])
      display_str = '{}: {}%'.format(
          class_names[i].decode('ascii'),
          int(100 * scores[i]))
      color = colors[hash(class_names[i]) % len(colors)]
      image_pil = Image.fromarray(
          np.uint8(image)).convert('RGB')
      draw_bounding_box_on_image(
          image_pil, ymin, xmin, ymax, xmax,
          color, font, display_str_list=[display_str])
      np.copyto(image, np.array(image_pil))
  return image
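
The selection logic in `draw_boxes` can be tried without an image or a model. The sketch below uses plain Python lists in place of the NumPy arrays returned by the detector; the helper name `select_detections` and the sample data are hypothetical.

```python
def select_detections(class_names, scores, max_boxes=10, min_score=0.1):
    """Return the label strings that draw_boxes would render."""
    selected = []
    # Only the first max_boxes entries are examined at all.
    for i in range(min(len(scores), max_boxes)):
        if scores[i] >= min_score:
            # Class names arrive as bytes, so decode them for display.
            label = class_names[i].decode('ascii')
            selected.append('{}: {}%'.format(label, int(100 * scores[i])))
    return selected


names = [b'Cat', b'Dog', b'Tree', b'Person']
scores = [0.75, 0.0625, 0.5, 0.25]
print(select_detections(names, scores, max_boxes=3))
# ['Cat: 75%', 'Tree: 50%']
```

Note that `max_boxes` limits how many entries are examined, not how many boxes are drawn: `Person` scores above `min_score` but is skipped because it is the fourth entry.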

# Load a Module

Load a pretrained object detection module from TF-Hub:

In [None]:
p1 = 'https://tfhub.dev/google/faster_rcnn/'
p2 = 'openimages_v4/inception_resnet_v2/1'
URL = p1 + p2
module_handle = URL
obj_detect = hub.load(module_handle).signatures['default']

# Load an Image from Google Drive

Mount Google Drive to Colab:

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Be sure that the image is in the *appropriate directory* in **your** Google Drive!

Access and display the image:

In [None]:
img_path = 'gdrive/My Drive/Colab Notebooks/images/cats_dogs.jpg'
pil_image = Image.open(img_path)
display_image(pil_image)

The code opens the JPEG file as a PIL image and displays it. The Python Imaging Library (PIL) supports opening, manipulating, and saving many different image file formats. Today it is distributed as Pillow, the actively maintained fork of PIL.

Check image size:

In [None]:
pil_image.size

# Prepare the Image

Generate a temporary path for the image file:

In [None]:
_, filename = tempfile.mkstemp(suffix='.jpg')
filename
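
`tempfile.mkstemp` returns both an open file descriptor and the path. The cell above discards the descriptor, which leaves it open until the session ends. A slightly more careful sketch closes it right away; the helper name `make_temp_jpeg_path` is hypothetical.

```python
import os
import tempfile


def make_temp_jpeg_path():
    """Create an empty temporary .jpg file and return its path."""
    fd, path = tempfile.mkstemp(suffix='.jpg')
    # We only need the path, so release the open descriptor.
    os.close(fd)
    return path


filename = make_temp_jpeg_path()
print(filename)
```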

Prepare the image for processing and save it to the temporary file path:

In [None]:
pil_image_rgb = pil_image.convert('RGB')
pil_image_rgb.save(filename, format='JPEG', quality=90)
print('Image saved to %s.' % filename)
display_image(pil_image)

# Run Object Detection on the Image

Create a function to load the image:

In [None]:
def load_img(path):
  img = tf.io.read_file(path)
  img = tf.image.decode_jpeg(img, channels=3)
  return img

The function reads the image file and decodes it into a three-channel (RGB) tensor that the pretrained model can process.

Create a function to run object detection:

In [None]:
def run_detector(detector, path):
  img = load_img(path)
  converted_img  = tf.image.convert_image_dtype(
      img, tf.float32)[tf.newaxis, ...]
  result = detector(converted_img)
  result = {key:value.numpy()
            for key,value in result.items()}
  print("Found %d objects." %\
        len(result["detection_scores"]))
  image_with_boxes = draw_boxes(
      img.numpy(), result["detection_boxes"],
      result["detection_class_entities"],
      result["detection_scores"])
  display_image(image_with_boxes)
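
The "Found %d objects." message counts every raw detection the model returns, regardless of score. To see how many detections clear a score threshold, the result dictionary can be summarized per class. The sketch below assumes the same dictionary keys used above and substitutes plain lists for the NumPy arrays; the helper name `summarize_detections` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_detections(result, min_score=0.1):
    """Count detections per class label, keeping scores >= min_score."""
    counts = Counter()
    pairs = zip(result['detection_class_entities'],
                result['detection_scores'])
    for name, score in pairs:
        if score >= min_score:
            counts[name.decode('ascii')] += 1
    return counts


# Stand-in for the dictionary built inside run_detector.
fake_result = {
    'detection_class_entities': [b'Cat', b'Dog', b'Cat', b'Tree'],
    'detection_scores': [0.75, 0.5, 0.25, 0.03125],
}
print(dict(summarize_detections(fake_result)))
# {'Cat': 2, 'Dog': 1}
```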

Invoke the detector:

In [None]:
run_detector(obj_detect, filename)

The detector did really well with this image!

Let's try another one:

In [None]:
img_path = 'gdrive/My Drive/Colab Notebooks/images/butterfly.jpg'
pil_image = Image.open(img_path)
display_image(pil_image)

Process:

In [None]:
_, filename = tempfile.mkstemp(suffix='.jpg')
pil_image_rgb = pil_image.convert('RGB')
pil_image_rgb.save(filename, format='JPEG', quality=90)
print('Image saved to %s.' % filename)

Run detector:

In [None]:
run_detector(obj_detect, filename)

# Download Images from Wikimedia Commons

We have **already located images** from Wikimedia Commons!

## Get Your Own Images

However, you can locate your own images from Wikimedia Commons by following a few simple steps:

1. go to the following URL: https://commons.wikimedia.org/wiki/Main_Page
2. click **Images**
3. click on an image
4. right-click the image
5. select 'Copy link address' from the drop-down menu
6. paste the link address into a code cell
7. surround the link address with single or double quotes
8. assign to a variable 

## Create a Function to Download an Image

Create a function to download, process, and save an image to a temporary file path:

In [None]:
def download_and_resize_image(
    url, new_width=256, new_height=256,
    display=False):
  _, filename = tempfile.mkstemp(suffix='.jpg')
  response = urlopen(url)
  image_data = response.read()
  image_data = BytesIO(image_data)
  pil_image = Image.open(image_data)
  pil_image = ImageOps.fit(
      pil_image, (new_width, new_height),
      Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
  pil_image_rgb = pil_image.convert('RGB')
  pil_image_rgb.save(
      filename, format='JPEG', quality=90)
  print('Image downloaded to %s.' % filename)
  if display:
    display_image(pil_image)
  return filename

The function generates a temporary path for the image file, reads the image data from the supplied URL, and opens it as a PIL image. The PIL image is then resized to the requested dimensions (`ImageOps.fit` crops the image as needed to match the target aspect ratio), converted to RGB, and saved to the temporary file path as a JPEG.
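
Because the requested size may have a different aspect ratio than the source, part of the image is cropped away before scaling. The sketch below approximates which region survives a centered crop. It illustrates the idea only and is not Pillow's actual implementation; the helper name `center_crop_box` is hypothetical.

```python
def center_crop_box(src_width, src_height, new_width, new_height):
    """Return (left, top, right, bottom) of the centered region whose
    aspect ratio matches new_width / new_height."""
    src_ratio = src_width / src_height
    new_ratio = new_width / new_height
    if src_ratio > new_ratio:
        # Source is too wide: keep the full height and trim the sides.
        crop_width = src_height * new_ratio
        left = (src_width - crop_width) / 2
        return (left, 0, left + crop_width, src_height)
    # Source is too tall (or already matches): keep the full width.
    crop_height = src_width / new_ratio
    top = (src_height - crop_height) / 2
    return (0, top, src_width, top + crop_height)


print(center_crop_box(1000, 500, 256, 256))  # (250.0, 0, 750.0, 500)
print(center_crop_box(500, 1000, 256, 256))  # (0, 250.0, 500, 750.0)
```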

## Load an Image from a URL

Load an image from a Wikimedia Commons URL:

In [None]:
p1 = 'https://upload.wikimedia.org/wikipedia/commons/7/79/'
p2 = 'At_taverna_under_the_church%2C_Ano_Potamia%2C_Naxos%'
p3 = '2C_190574.jpg'
URL = p1 + p2 + p3

downloaded_image_path = download_and_resize_image(
    URL, 1280, 856, True)

The source for the image is located at:

https://commons.wikimedia.org/wiki/File:At_taverna_under_the_church,_Ano_Potamia,_Naxos,_190574.jpg

# Run Object Detection

Run object detection with the function we created earlier in this notebook:

In [None]:
run_detector(obj_detect, downloaded_image_path)

Pretty good, but not perfect.

Let's try some more images. Piece together some paths:

In [None]:
p1 = 'https://upload.wikimedia.org/wikipedia/commons/4/45/'
p2 = 'Green_Dragon_Tavern_%2836196%29.jpg'
tavern = p1 + p2

p1 = 'https://upload.wikimedia.org/wikipedia/commons/3/31/'
p2 = 'Circus_Circus_Hotel-Casino_sign.jpg'
casino = p1 + p2

p1 = 'https://upload.wikimedia.org/wikipedia/commons/9/91/'
p2 = 'Leon_hot_air_balloon_festival_2010.jpg'
balloon = p1 + p2

p1 = 'https://upload.wikimedia.org/wikipedia/commons/d/d8/'
p2 = '2012_Festival_of_Sail_-_7943922284.jpg'
sail = p1 + p2

p1 = 'https://upload.wikimedia.org/wikipedia/commons/a/ab/'
p2 = '17_mai_2018.jpg'
flag = p1 + p2

p1 = 'https://upload.wikimedia.org/wikipedia/commons/4/43/'
p2 = 'Fruit_baskets.jpg'
basket = p1 + p2

p1 = 'https://upload.wikimedia.org/wikipedia/commons/c/c7/'
p2 = 'Fruit_stands%2C_Rue_de_Seine%2C_Paris_22_May_2014.jpg'
stand = p1 + p2

p1 = 'https://upload.wikimedia.org/wikipedia/commons/9/95/'
p2 = 'Wine_tasting_%40_brown_brothers.jpg'
wine = p1 + p2

Create a function that downloads an image from a URL and runs object detection on it:

In [None]:
def detect_img(image_url):
  image_path = download_and_resize_image(image_url, 640, 480)
  run_detector(obj_detect, image_path)

Run object detection on one of the images:

In [None]:
detect_img(wine)

Try another one:

In [None]:
detect_img(sail)

Try some of the other scenes.

# Find the Source

We can translate the JPEG link into the URL of its Wikimedia Commons source page:

1. substitute **commons** for *upload*
2. change *wikipedia* to **wiki**
3. substitute **File:** for *commons/(character)/(two characters)/*
4. translate each *%(code)* to the character it encodes (URL percent-decoding)

Look up the encoded characters in this reference:

https://krypted.com/utilities/html-encoding-reference/
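
The same translation can be done in code. The sketch below uses only the Python standard library and assumes the JPEG link follows the `upload.wikimedia.org` pattern used in this notebook; the helper name `commons_source_page` is hypothetical.

```python
from urllib.parse import unquote, urlparse


def commons_source_page(upload_url):
    """Translate an upload.wikimedia.org JPEG link to its source page."""
    # Keep only the file name, dropping the host and the
    # /wikipedia/commons/x/xx/ directory prefix.
    path = urlparse(upload_url).path
    file_name = path.rsplit('/', 1)[-1]
    # unquote() turns percent codes such as %28 and %29 into ( and ).
    return 'https://commons.wikimedia.org/wiki/File:' + unquote(file_name)


tavern_url = ('https://upload.wikimedia.org/wikipedia/commons/4/45/'
              'Green_Dragon_Tavern_%2836196%29.jpg')
print(commons_source_page(tavern_url))
# https://commons.wikimedia.org/wiki/File:Green_Dragon_Tavern_(36196).jpg
```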

## The First One

Let's try the tavern image:

https://upload.wikimedia.org/wikipedia/commons/4/45/Green_Dragon_Tavern_%2836196%29.jpg

Substitute **commons**:

https://commons.wikimedia.org/wikipedia/commons/4/45/Green_Dragon_Tavern_%2836196%29.jpg

Change to **wiki**:

https://commons.wikimedia.org/wiki/commons/4/45/Green_Dragon_Tavern_%2836196%29.jpg

Change to **File:**

https://commons.wikimedia.org/wiki/File:Green_Dragon_Tavern_%2836196%29.jpg

Translate:

https://commons.wikimedia.org/wiki/File:Green_Dragon_Tavern_(36196).jpg

From the encoding reference, **%28** decodes to **(** and **%29** decodes to **)**.

Here is the resource for the image:

In [None]:
# https://commons.wikimedia.org/wiki/File:Green_Dragon_Tavern_(36196).jpg

Just copy the URL (sans the hash symbol) and paste it into your favorite browser to find the resource for the image! Sometimes the URL doesn't render correctly in a text cell, so we placed it in a code cell and commented it out.

## The Second One

Let's try the next one.

Result:

In [None]:
# https://commons.wikimedia.org/wiki/File:Circus_Circus_Hotel-Casino_sign.jpg

This one was easy because we didn't need to translate.

## The Rest

Results:

In [None]:
'''
https://commons.wikimedia.org/wiki/File:Leon_hot_air_balloon_festival_2010.jpg
https://commons.wikimedia.org/wiki/File:2012_Festival_of_Sail_-_7943922284.jpg
https://commons.wikimedia.org/wiki/File:17_mai_2018.jpg
https://commons.wikimedia.org/wiki/File:Fruit_baskets.jpg
https://commons.wikimedia.org/wiki/File:Fruit_stands,_Rue_de_Seine,_Paris_22_May_2014.jpg
https://commons.wikimedia.org/wiki/File:Wine_tasting_@_brown_brothers.jpg
'''

Just copy and paste to a browser.