# Object Detection
1. A basic understanding of Python programming and deep learning.

Check out this [tutorial](https://makmlclub.github.io/colab.html) to get familiar with Colab and deep learning basics.

---
Notes on object detection: <br>
[Object detection](https://en.wikipedia.org/wiki/Object_detection) is a computer vision technique that locates the objects present in an image with bounding boxes and assigns a type or class to each located object.
*   Input: An image with one or more objects, such as a photograph.
*   Output: One or more bounding boxes (e.g. defined by a point, width, and height), and a class label for each bounding box.

In this notebook, we start off with data visualization for object detection.<br>
**Highlights**
* Plotting dataset images with the corresponding bounding boxes.
* Converting bounding boxes among the different labelling formats.


## Notes on Labeling Formats
Listed below are some of the popular labelling formats and how they are used.
* The `coco` format [x_min, y_min, width, height], e.g. [97, 12, 150, 200].
* The `pascal_voc` format [x_min, y_min, x_max, y_max], e.g. [97, 12, 247, 212].
* The `albumentations` format is like `pascal_voc`, but normalized: [x_min, y_min, x_max, y_max], e.g. [0.2, 0.3, 0.4, 0.5].
* The `yolo` format [x, y, width, height], e.g. [0.1, 0.2, 0.3, 0.4], where x, y are the normalized bbox center and width, height are the normalized bbox width and height.
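
The relationships between these formats can be sketched in a few lines of Python. This is a minimal illustration, not part of the notebook's utilities: the function names are hypothetical, and the image size used in the example (640 by 480) is an assumption chosen only to show the normalization.

```python
def coco_to_pascal_voc(bbox):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = bbox
    return [x_min, y_min, x_min + width, y_min + height]


def coco_to_albumentations(bbox, image_width, image_height):
    """pascal_voc corners divided by the image size (values in [0, 1])."""
    x_min, y_min, x_max, y_max = coco_to_pascal_voc(bbox)
    return [x_min / image_width, y_min / image_height,
            x_max / image_width, y_max / image_height]


def coco_to_yolo(bbox, image_width, image_height):
    """Normalized [x_center, y_center, width, height]."""
    x_min, y_min, width, height = bbox
    return [(x_min + width / 2) / image_width,
            (y_min + height / 2) / image_height,
            width / image_width,
            height / image_height]


box = [97, 12, 150, 200]            # the coco example from the list above
print(coco_to_pascal_voc(box))      # [97, 12, 247, 212]
print(coco_to_yolo(box, 640, 480))  # image size is assumed for illustration
```

The normalized formats need the image width and height, which is why those two functions take extra arguments.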

Further Reading:
1. [https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch](https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch)

# Data Loading
Along with this notebook, a labeled dataset is provided. The labels are included in a [`JSON`](https://www.tutorialspoint.com/json/index.htm) file exported from [Labelbox](https://labelbox.com). In this section, we shall load the `dataset.zip` file.

The dataset directory tree is as below:
```
dataset
├─ labels.json
└─ images
```

---
**TODO:**
1. Upload the attached `dataset.zip` file into the current working directory.
2. Unzip the file.

---

**TODO Walk Through:**
2. Notebooks allow the use of terminal commands; however, such commands must be prefixed with an exclamation mark (`!`), e.g. `!mkdir directory`.
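
If a terminal command is not available, the same step can be done from Python with the standard-library `zipfile` module. The sketch below is one possible alternative, not the approach the exercise asks for; the helper name `unzip_dataset` is hypothetical, and it assumes `dataset.zip` has already been uploaded.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, destination="."):
    """Extract every member of zip_path into destination; return member names."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(destination)
        return archive.namelist()


# Usage, assuming dataset.zip sits in the current working directory:
# unzip_dataset("dataset.zip")
```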


In [None]:
# TODO
# 1. Upload the attached dataset zipped file.

In [None]:
# TODO
# 2. Unzip the uploaded file, using the unzip command.

# Data Parsing
The unzipped dataset directory contains a `labels.json` file. This file was created using Labelbox, which uses a format similar to the `coco` format but with different keys, e.g. `xmin` and `ymin` are represented by `left` and `top` respectively. For this notebook, only the `External ID`, `value`, and `bbox` keys will be used.

Notes on JSON: <br>
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.
```
Sample JSON format.
{"key": "value"}
{
  "name": "Doe",
  "age": 23
}
```
In the above format, `name` and `age` are the keys, while `Doe` and `23` are the values. The `labels.json` file takes the same format, with a lot more key-value pairs.
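
As a small illustration of how such a document maps to Python, the standard-library `json` module turns JSON text into a dictionary and back. Note that JSON text itself requires double-quoted strings.

```python
import json

text = '{"name": "Doe", "age": 23}'

record = json.loads(text)   # JSON text -> Python dict
print(record["name"])       # Doe
print(record["age"] + 1)    # 24 (the value is an int, not a string)

print(json.dumps(record))   # {"name": "Doe", "age": 23}
```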


Generally, a bounding box is defined as follows:
```
bbox = ['xmin', 'ymin', 'width', 'height']
```
**NOTE: `xmin` and `ymin` are the coordinates of the top-left corner of the bounding box, measured from the top-left corner of the image.**

---
TODO:
1. Open the `labels.json` file on your local computer using a text editor, and familiarize yourself with the key-value pairs.
2. Find the `objects` key, under which you will find the bounding boxes listed for each image, one at a time.


## Load and Parse the `.json` File
From `labels.json`, we extract a couple of key-value pairs and store them in a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) in our working environment.

---
**TODO:**
1. Write a function to extract the following keys and values for each image object from the `.json` file: `External ID`, `value`, and `bbox`. The extracted values should be stored in a `pandas.DataFrame`.

---
**TODO Walk Through:**
1. This function takes the json object loaded from the `.json` file as an argument, extracts the key-value pairs, and returns a `pandas.DataFrame`.

* Start by initializing an empty pandas dataframe with the following columns:
```
['file_name', 'class', 'xmin', 'ymin', 'width', 'height']
```
External ID == file_name, class == value, bbox contains the bounding box information, `xmin` etc.
* Using a loop, extract the image data. Please refer to https://labelbox.com/docs/exporting-data/export-format-detail for the particular keys to use.
* Append each extraction to the dataframe.
* Return the dataframe.
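
The loop described above might look roughly like the sketch below. It is written under assumptions that should be checked against your own `labels.json`: that the export is a list of records, that each record nests its `objects` list under a `Label` key, and that each `bbox` holds `top`, `left`, `height`, and `width`. The helper name `extract_rows` and the sample record are hypothetical, and the sketch collects plain dictionaries rather than building the dataframe itself.

```python
def extract_rows(json_object):
    """Collect one row per bounding box from an assumed Labelbox-style export."""
    rows = []
    for record in json_object:
        file_name = record["External ID"]
        # Assumption: bounding boxes live under record["Label"]["objects"].
        for obj in record["Label"]["objects"]:
            bbox = obj["bbox"]
            rows.append({
                "file_name": file_name,
                "class": obj["value"],
                "xmin": bbox["left"],   # Labelbox `left` plays the role of xmin
                "ymin": bbox["top"],    # Labelbox `top` plays the role of ymin
                "width": bbox["width"],
                "height": bbox["height"],
            })
    return rows


# Hypothetical record, for illustration only.
sample = [{
    "External ID": "img_1.jpg",
    "Label": {"objects": [
        {"value": "brownspot",
         "bbox": {"top": 12, "left": 97, "height": 200, "width": 150}},
    ]},
}]
rows = extract_rows(sample)
print(rows[0]["file_name"], rows[0]["xmin"], rows[0]["ymin"])  # img_1.jpg 97 12
# pandas.DataFrame(rows) would turn these rows into a table with the columns above.
```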
---
Further Reading:
1. https://docs.python.org/3/library/json.html
2. https://labelbox.com/docs/exporting-data/export-format-detail


In [None]:
# Utility imports
import os
import json
import pandas as pd

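def load_json(path_to_json):
  """
  Load a .json file and return its contents as a Python object.
  Returns an empty dict if the file does not exist.
  """
  data = dict()
  if os.path.isfile(path_to_json):
    # The with statement closes the file even if parsing fails.
    with open(path_to_json, 'r') as f:
      data = json.load(f)
  return data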

In [None]:
# TODO:
# 1. Write a function to extract the following keys and values for each 
# image object from the `.json` file. `External ID`, `value` and `bbox`. 
# The extracted values should be stored in `pandas.DataFrame`.

json_object = load_json('path_to_json')

def extract_key_value_pairs(json_object):
  """
  This function takes a json object and returns a dataframe with the required
  keys as columns and the values as rows.
  """
  raise NotImplementedError

# Remove the raise line after implementing the function so that it works.

In [None]:
# Run to view the dataframe.
image_data = extract_key_value_pairs(json_object)
image_data.head()

## Create Different Label Formats DataFrames
We shall create a `pandas.DataFrame` for both the `coco` format and the `pascal_voc` format. Please refer to the _Notes on Labeling Formats_ for details on labeling formats.

---
**TODO:**
1. Write a function to convert the bounding boxes from the `coco` format to the `pascal_voc` format and return the corresponding `pandas.DataFrame`.

---
**TODO Walk Through:** <br>
1. The dataframe returned from the `extract_key_value_pairs` function is already in the `coco` format.
We can convert from the `coco` format to the `pascal_voc` format using:
```
xmax = xmin + width
ymax = ymin + height
```
Refer to _Notes on Labeling Formats_ for details.

In [None]:
# The dataframe image_data, contains bounding boxes already in the coco 
# labelling format.

# TODO:
# 1. Write a function to convert the bounding boxes from the coco format 
#    to pascal_voc format.
def create_pascal_voc_dataframe(image_data):
  """
  This function converts the bounding box labels from coco format
  to pascal_voc format.
  """
  raise NotImplementedError

In [None]:
# Run to view the output.
pascal_voc_dataframe = create_pascal_voc_dataframe(image_data)
pascal_voc_dataframe.head()

# Plotting Images with the Bounding Boxes
In this section, we shall load the images and plot them with the corresponding bounding boxes.

---

**TODO:**
1. Write a function that takes as arguments `full_filepaths`, `coco_labels_dataframe`, and `n_images`, the number of images you wish to plot.

---

**TODO Walk Through:**
1. This is a fairly long function, so we shall break it down into three: `process_images`, `display_image`, and `display_images`.
2. `display_image` displays a single image with its bounding boxes. This has been implemented for you.
3. `display_images` displays `n_images`. This function has also been implemented for you.
4. `process_images` extracts the filepaths and bounding boxes for each image and returns a list of tuples, each in the format `(full_filepath, [list_of_all_bboxes_for_image])`.
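
The grouping step behind `process_images` can be sketched without pandas. This is an illustration under assumptions, not the required implementation: the helper name `group_bboxes_by_file` is hypothetical, and the labels are assumed to be a list of dictionaries with the keys `file_name`, `xmin`, `ymin`, `width`, and `height`.

```python
import os
from collections import defaultdict


def group_bboxes_by_file(filepaths, label_rows, n_images):
    """Return [(full_filepath, [[xmin, ymin, width, height], ...]), ...]."""
    boxes = defaultdict(list)
    for row in label_rows:
        boxes[row["file_name"]].append(
            [row["xmin"], row["ymin"], row["width"], row["height"]])

    files = []
    for path in filepaths[:n_images]:
        # Match each full path to its labels by file name; no labels -> [].
        files.append((path, boxes.get(os.path.basename(path), [])))
    return files


# With a dataframe, image_data.to_dict("records") yields rows of this shape.
label_rows = [
    {"file_name": "a.jpg", "xmin": 97, "ymin": 12, "width": 150, "height": 200},
    {"file_name": "a.jpg", "xmin": 5, "ymin": 6, "width": 7, "height": 8},
]
print(group_bboxes_by_file(["images/a.jpg", "images/b.jpg"], label_rows, 2))
```

Each bounding box is kept in `[xmin, ymin, width, height]` order because `display_image` passes `bbox[2]` and `bbox[3]` to `patches.Rectangle` as the width and height.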

In [None]:
# Utility imports
import math  # needed by display_images for math.sqrt
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

In [None]:
# Display utilities
def process_images(filepaths, labels, n_images):
  """
  This function extracts the filepaths and bounding boxes for each image
  and returns a list of tuples, each in the format
  (full_filepath, [list_of_all_bboxes_for_image]).
  """
  raise NotImplementedError

# display_image and display_images have been implemented for you.
def display_image(image, bboxes, subplot):
    """Display a single image."""
    ax = plt.subplot(*subplot)
    plt.axis('off')
    
    for bbox in bboxes:
        rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2], bbox[3],
                                 linewidth=1, edgecolor='w', facecolor='none')
        ax.add_patch(rect)
    ax.imshow(image)
    return (subplot[0], subplot[1], subplot[2] + 1)

def display_images(files):
    """Displays a batch of images."""
    if not isinstance(files, (tuple, list)):
      raise TypeError("files should be a tuple or list.")
    
    rows = int(math.sqrt(len(files)))  
    cols = len(files) // rows
    
    FIGSIZE = 13.0
    SPACING = 0.1
    
    subplot = (rows, cols, 1)
    if rows < cols:
        plt.figure(figsize=(FIGSIZE, FIGSIZE / cols * rows))
    else:
        plt.figure(figsize=(FIGSIZE / rows * cols, FIGSIZE))
    
    for file, bboxes in files[:rows * cols]:
        image = Image.open(file)
        subplot = display_image(image, bboxes, subplot)
        
    plt.tight_layout()
    plt.show()

In [None]:
# Contains all the images in the images directory
filenames = os.listdir('path_to_images')

# TODO:
# a. Create a list of full file paths using the filenames from os.listdir()
#      and name it filepaths.
filepaths = []

In [None]:
# Display the images with the bounding boxes.
files = process_images(filepaths, image_data, 9)
display_images(files)


# Save Label Data to File
Finally, for this notebook we shall save the label data to three files.
1. `coco_label.csv`, a csv file containing `coco` format labels for each image.
2. `pascal_voc_labels.csv`, a csv file containing `pascal_voc` format labels for each image.
3. `instances_train.json`, this is a json file containing the label data in the official coco format. Refer to [https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch](https://www.immersivelimit.com/tutorials/create-coco-annotations-from-scratch) for details.

---

**TODO:**
1. Create the `coco_label.csv` and `pascal_voc_labels.csv`.
2. Write a function that returns a dictionary object representing the label data in the official coco format.
3. Save the coco dictionary object to file.

---
**TODO Walk Through:**
1. Creating the `.csv` files is fairly straightforward; refer to the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) for details.
2. Read up thoroughly on the official COCO format before writing this function. Refer to the link given for `instances_train.json` above.
3. The function to write to file is implemented for you.
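
To give a feel for the target structure, here is a minimal sketch of a COCO-style dictionary with its three main lists: `images`, `annotations`, and `categories`. It is deliberately incomplete. The official format also records each image's `width` and `height` plus optional sections such as `info` and `licenses`, which are left out because the label rows assumed here do not carry them. The helper name `build_coco_dict` and the row layout (dictionaries with `file_name`, `class`, `xmin`, `ymin`, `width`, `height`) are assumptions for illustration.

```python
def build_coco_dict(label_rows, categories, filenames):
    """Build a minimal COCO-style dict; ids are one-indexed."""
    category_ids = {name: i for i, name in enumerate(categories, start=1)}
    image_ids = {name: i for i, name in enumerate(filenames, start=1)}

    coco = {
        "images": [{"id": image_ids[n], "file_name": n} for n in filenames],
        "categories": [{"id": i, "name": n} for n, i in category_ids.items()],
        "annotations": [],
    }
    for annotation_id, row in enumerate(label_rows, start=1):
        coco["annotations"].append({
            "id": annotation_id,
            "image_id": image_ids[row["file_name"]],
            "category_id": category_ids[row["class"]],
            # COCO stores boxes as [x_min, y_min, width, height].
            "bbox": [row["xmin"], row["ymin"], row["width"], row["height"]],
            "area": row["width"] * row["height"],
            "iscrowd": 0,
        })
    return coco


example_rows = [{"file_name": "a.jpg", "class": "brownspot",
                 "xmin": 97, "ymin": 12, "width": 150, "height": 200}]
coco = build_coco_dict(example_rows, ["brownspot"], ["a.jpg", "b.jpg"])
print(coco["annotations"][0]["bbox"])  # [97, 12, 150, 200]
```

Annotations point at images and categories through integer ids rather than names, which is why the two lookup dictionaries are built first.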

In [None]:
# 1. Create and save the coco_label.csv and the pascal_voc_labels.csv

In [None]:
# Notes: The official COCO format saves each bounding box with the id of
#        its corresponding category. For this set we only have one category.
#        The categories list is one-indexed, meaning the first category has
#        id 1, the second has id 2, etc.

categories = ['brownspot']

# 2. Create coco file
def make_coco_file(labels, categories, filenames):
    """Creates a COCO format data structure."""
    raise NotImplementedError
    
# 3. The last function is implemented for you.
def create_file(coco_data, output_file):
    """Create a JSON file from coco_data."""
    if not isinstance(coco_data, dict):
        raise TypeError("coco_data should be of type dictionary.")
    # The with statement closes the file even if writing fails.
    with open(output_file, 'w') as f:
        json.dump(coco_data, f, indent=4)

In [None]:
# Path of the output file (a file, not a directory), e.g. instances_train.json
output_file = 'path/to/folder/instances_train.json'

coco_data = make_coco_file(image_data, categories, filenames)
create_file(coco_data, output_file)