
# Cloud APIs for Computer Vision: Up and Running in 15 Minutes

This code is part of [Chapter 8: Cloud APIs for Computer Vision: Up and Running in 15 Minutes](https://learning.oreilly.com/library/view/practical-deep-learning/9781492034858/ch08.html).

This notebook compiles the intermediate files needed for the benchmark: a JSON mapping from COCO category ID to category name, and a CSV of image ID and category ID pairs.

## Setup

Please download:

- The pretrained GoogleNews word2vec vectors, which we will load with Gensim to compare the similarity between ground-truth and predicted class names. Unzip the download and place `GoogleNews-vectors-negative300.bin` within `data_path`. Download at: https://github.com/mmihaltz/word2vec-GoogleNews-vectors
- The validation 2017 split from the MSCOCO website: http://cocodataset.org/#download.
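
To make "word similarity" concrete, here is a minimal sketch of cosine similarity, the measure that Gensim's `KeyedVectors.similarity` returns for two words. The three-dimensional vectors below are invented for illustration only; with the real file, the 300-dimensional vectors would come from `KeyedVectors.load_word2vec_format(path, binary=True)`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings", invented purely for illustration.
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
}

dog_vs_puppy = cosine_similarity(toy_vectors["dog"], toy_vectors["puppy"])
dog_vs_pizza = cosine_similarity(toy_vectors["dog"], toy_vectors["pizza"])
print(f"dog vs puppy: {dog_vs_puppy:.3f}")
print(f"dog vs pizza: {dog_vs_pizza:.3f}")
```

A predicted tag that is a near-synonym of the ground-truth class scores close to 1, while an unrelated tag scores close to 0.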

In [1]:
!wget -nc -q -O tmp.zip http://images.cocodataset.org/annotations/annotations_trainval2017.zip && unzip -n tmp.zip && rm tmp.zip
!mkdir data-may-2020
!mv annotations data-may-2020

Archive:  tmp.zip
mkdir: cannot create directory ‘data-may-2020’: File exists
mv: cannot move 'annotations' to 'data-may-2020/annotations': Directory not empty


The directory structure should look as follows: 

```
1-setup.ipynb
2-compile-ground-truth-tags.ipynb
3-upload-validation-images-to-cloud-providers.ipynb
4-compile-results-tags.ipynb
data-may-2020/
|___________ GoogleNews-vectors-negative300.bin
|___________ annotations/
|___________ val2017/

```
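
Before running the remaining cells, it may help to confirm that the data directory matches the layout above. The following is a small sketch, not part of the original notebook; the helper name `missing_entries` is hypothetical, and the directory name `data-may-2020` is taken from the tree shown above.

```python
from pathlib import Path

# Entries expected under the data directory, taken from the tree above.
EXPECTED_ENTRIES = [
    "GoogleNews-vectors-negative300.bin",
    "annotations",
    "val2017",
]


def missing_entries(data_dir):
    """Return the expected entries that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


missing = missing_entries("data-may-2020")
if missing:
    print("Missing from data directory:", ", ".join(missing))
else:
    print("Data directory layout looks complete.")
```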

In [2]:
data_path = "<PATH_TO_IMAGES>"  # replace with the path to your data directory, e.g. "data-may-2020"
annotation_filename = data_path + '/annotations/instances_val2017.json'

In [3]:
import json
from pprint import pprint

with open(annotation_filename) as data_file:
    annotations = json.load(data_file)

# Dictionary mapping from category ID to category name
class_data = annotations["categories"]
dict_class_to_name = {category["id"]: category["name"] for category in class_data}

In [4]:
pprint(dict_class_to_name)

{1: 'person',
 2: 'bicycle',
 3: 'car',
 4: 'motorcycle',
 5: 'airplane',
 6: 'bus',
 7: 'train',
 8: 'truck',
 9: 'boat',
 10: 'traffic light',
 11: 'fire hydrant',
 13: 'stop sign',
 14: 'parking meter',
 15: 'bench',
 16: 'bird',
 17: 'cat',
 18: 'dog',
 19: 'horse',
 20: 'sheep',
 21: 'cow',
 22: 'elephant',
 23: 'bear',
 24: 'zebra',
 25: 'giraffe',
 27: 'backpack',
 28: 'umbrella',
 31: 'handbag',
 32: 'tie',
 33: 'suitcase',
 34: 'frisbee',
 35: 'skis',
 36: 'snowboard',
 37: 'sports ball',
 38: 'kite',
 39: 'baseball bat',
 40: 'baseball glove',
 41: 'skateboard',
 42: 'surfboard',
 43: 'tennis racket',
 44: 'bottle',
 46: 'wine glass',
 47: 'cup',
 48: 'fork',
 49: 'knife',
 50: 'spoon',
 51: 'bowl',
 52: 'banana',
 53: 'apple',
 54: 'sandwich',
 55: 'orange',
 56: 'broccoli',
 57: 'carrot',
 58: 'hot dog',
 59: 'pizza',
 60: 'donut',
 61: 'cake',
 62: 'chair',
 63: 'couch',
 64: 'potted plant',
 65: 'bed',
 67: 'dining table',
 70: 'toilet',
 72: 'tv',
 73: 'laptop',
 74: 'mo

In [5]:
with open(data_path + "/class-id-to-name.json", "w") as outfile:
    json.dump(dict_class_to_name, outfile)
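
One detail worth keeping in mind when this file is read back later: JSON object keys are always strings, so the integer category IDs come back as `"1"`, `"2"`, and so on. A minimal sketch of the round trip, using a few entries from the mapping printed above (the names `sample_mapping`, `reloaded`, and `restored` are illustrative only):

```python
import json

# A few entries from the mapping printed above.
sample_mapping = {1: "person", 2: "bicycle", 3: "car"}

# Round-trip through JSON text, as json.dump / json.load would via a file.
reloaded = json.loads(json.dumps(sample_mapping))
print(list(reloaded))  # the keys are now strings

# Restore the integer keys so lookups by COCO category ID work again.
restored = {int(class_id): name for class_id, name in reloaded.items()}
print(restored[1])
```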

## COCO image ID to category ID

In [6]:
# List of [image ID, category ID] pairs, one per annotation
image_id_to_category_id = []
for annotation in annotations["annotations"]:
    image_id_to_category_id.append([annotation["image_id"], annotation["category_id"]])

In [7]:
print(len(image_id_to_category_id))

36781


In [8]:
import csv

output_filename = data_path + "/coco-image-id-to-category-id.csv"

# newline="" is the setting the csv module documentation recommends for files passed to csv.writer
with open(output_filename, "w", newline="") as f:
    wr = csv.writer(f)
    wr.writerows(image_id_to_category_id)
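
Because an image can carry several annotations, the CSV contains repeated image IDs. A later step will likely want one set of category IDs per image. The following is a sketch of that grouping under stated assumptions: the helper name `group_categories_by_image` is hypothetical, and the sample rows are made-up values in the same two-column layout rather than real COCO data.

```python
import csv
import io
from collections import defaultdict


def group_categories_by_image(rows):
    """Map each image ID to the set of category IDs annotated on it."""
    grouped = defaultdict(set)
    for image_id, category_id in rows:
        grouped[int(image_id)].add(int(category_id))
    return dict(grouped)


# Hypothetical rows in the same two-column layout as the CSV written above.
sample_csv = "139,64\n139,72\n139,64\n285,23\n"
rows = csv.reader(io.StringIO(sample_csv))
image_to_categories = group_categories_by_image(rows)
print(image_to_categories)
```

Using a set drops duplicate categories for the same image, which suits tag-level comparison where only the presence of a class matters.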