# Cloud APIs for Computer Vision: Up and Running in 15 Minutes

This code accompanies [Chapter 8 - Cloud APIs for Computer Vision: Up and Running in 15 Minutes](https://learning.oreilly.com/library/view/practical-deep-learning/9781492034858/ch08.html).

## Compile Results for Image Tagging

In this notebook we compile the results for all the test images by comparing the collected predictions against the ground truth. Before running it, edit the following:

1. Set `PATH_TO_IMAGES` to the path of the test images used in the experiments.
2. If you saved the predictions under different filenames or locations, update them accordingly. Note that the cells below read from both `./data/` and `../data_files/`.
3. Install Gensim, which we use to measure the word similarity between the ground-truth and predicted classes, and download the pretrained `GoogleNews-vectors-negative300.bin` vectors. Update the path to this file in the cell that loads the model. You can also try ConceptNet, which is [currently unavailable in Gensim](https://github.com/RaRe-Technologies/gensim/issues/1296).

```python
import json
```

Load the ground truth JSON file. 

```python
with open("./data/final_gt_tags.json") as json_file:
    ground_truth = json.load(json_file)
```
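The rest of the notebook treats `ground_truth` as a mapping from an image id (a string key) to a list of COCO class ids, which `calculate_score` later converts to names and deduplicates with `set`. The sketch below illustrates that assumed shape on a made-up two-image sample; the ids and class lists are hypothetical, not taken from `final_gt_tags.json`.

```python
import json

# Hypothetical two-image sample in the shape assumed for final_gt_tags.json:
# {"<image id>": [<COCO class id>, ...]}
sample_text = '{"397133": [1, 44, 51, 51], "252219": [62, 67]}'
ground_truth_sample = json.loads(sample_text)

for img_id, class_ids in ground_truth_sample.items():
    # Repeated class ids (several objects of one class) collapse in a set
    print(img_id, sorted(set(class_ids)))
# prints:
# 397133 [1, 44, 51]
# 252219 [62, 67]
```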

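```python
# Helper functions to get the image name from the image id and vice versa.
# Edit PATH_TO_IMAGES; the generated names must match the keys in the prediction files.
PATH_TO_IMAGES = "PATH_TO_IMAGES"


def get_id_from_name(name):
    # ".../000000397133.jpg" -> 397133
    return int(name.split("/")[-1].split(".jpg")[0])


def get_name_from_id(imgId):
    return PATH_TO_IMAGES + "000000" + str(imgId) + ".jpg"
```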
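COCO 2017 image files are named after their id, zero-padded to 12 digits (for example `000000397133.jpg`). Prepending the fixed string `"000000"`, as `get_name_from_id` does, reproduces that name only for 6-digit ids. The sketch below compares the two; `coco_file_name` is a hypothetical helper introduced only for this comparison, and whichever scheme you use must match the keys stored in your prediction files.

```python
def get_id_from_name(name):
    # ".../000000000139.jpg" -> 139 (int() drops the leading zeros)
    return int(name.split("/")[-1].split(".jpg")[0])


def coco_file_name(img_id):
    # Hypothetical helper: zero-pad the id to 12 digits as in COCO 2017 file names
    return f"{int(img_id):012d}.jpg"


for img_id in ["397133", "139"]:
    fixed_prefix = "000000" + img_id + ".jpg"  # what get_name_from_id builds
    padded = coco_file_name(img_id)
    print(fixed_prefix, padded, fixed_prefix == padded)
# prints:
# 000000397133.jpg 000000397133.jpg True
# 000000139.jpg 000000000139.jpg False
```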

```python
# Class ids to their string equivalent
with open('./data/clsid_name.json') as f:
    clsid_name = json.load(f)
```

```python
# Class names to their class ids
with open('./data/name_clsid.json') as f:
    name_clsid = json.load(f)
```

```python
print(clsid_name)
```

```
{'57': 'carrot', '7': 'train', '24': 'zebra', '53': 'apple', '20': 'sheep', '59': 'pizza', '90': 'toothbrush', '56': 'broccoli', '48': 'fork', '5': 'airplane', '61': 'cake', '34': 'frisbee', '70': 'toilet', '44': 'bottle', '8': 'truck', '84': 'book', '13': 'stop sign', '22': 'elephant', '79': 'oven', '55': 'orange', '88': 'teddy bear', '64': 'potted plant', '77': 'cell phone', '6': 'bus', '85': 'clock', '78': 'microwave', '32': 'tie', '31': 'handbag', '72': 'tv', '35': 'skis', '3': 'car', '50': 'spoon', '62': 'chair', '65': 'bed', '16': 'bird', '75': 'remote', '15': 'bench', '4': 'motorcycle', '67': 'dining table', '80': 'toaster', '10': 'traffic light', '18': 'dog', '21': 'cow', '36': 'snowboard', '81': 'sink', '51': 'bowl', '76': 'keyboard', '47': 'cup', '89': 'hair drier', '14': 'parking meter', '49': 'knife', '39': 'baseball bat', '17': 'cat', '19': 'horse', '40': 'baseball glove', '86': 'vase', '37': 'sports ball', '58': 'hot dog', '1': 'person', '73': 'laptop', '74': 'mouse', '23
```
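As the printed dictionary shows, the class ids are string keys: JSON object keys are always strings, which is why `convert_clsid_to_str` looks up `clsid_name[str(clsid)]`. The sketch below uses three entries copied from the output above; building the inverse mapping in code is only an illustration of what `name_clsid.json` is assumed to contain, and converting the ids back to `int` is a choice made for this sketch.

```python
import json

# Three entries copied from the printed clsid_name above
clsid_name = json.loads('{"1": "person", "44": "bottle", "62": "chair"}')

# JSON object keys are always strings, so integer lookups miss
print(1 in clsid_name, "1" in clsid_name)  # False True
print(clsid_name[str(44)])                 # bottle

# Inverse mapping (assumed to match name_clsid.json); ids converted to int here
name_to_id = {name: int(clsid) for clsid, name in clsid_name.items()}
print(name_to_id)  # {'person': 1, 'bottle': 44, 'chair': 62}
```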

## Helper functions

```python
def convert_clsid_to_str(l):
    # l is of the format [clsid, clsid, ...]; JSON keys are strings, hence str()
    result = []
    for clsid in l:
        result.append(clsid_name[str(clsid)])
    return result
```

```python
def parse(l):
    # Lowercase each word, dropping words shorter than two characters
    l1 = []
    for each in l:
        if len(each) >= 2:
            l1.append(each.lower())
    return l1
```

```python
def get_cls_from_pred(l):
    # l = [[cls, 33], [cls, 88], ...]; keep only the class names
    return [item[0] for item in l]
```
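The three helpers above are thin list transformations. The sketch below restates them as comprehensions and runs them on made-up inputs to show the expected shapes; the two-entry `clsid_name` excerpt and the sample words are hypothetical, and the `[label, score]` pair format follows the comment in `get_cls_from_pred`.

```python
# Small excerpt of the real mapping, for illustration only
clsid_name = {"1": "person", "44": "bottle"}


def convert_clsid_to_str(ids):
    # [1, 44] -> ["person", "bottle"]
    return [clsid_name[str(clsid)] for clsid in ids]


def parse(words):
    # Lowercase and drop words shorter than two characters
    return [w.lower() for w in words if len(w) >= 2]


def get_cls_from_pred(pairs):
    # [[label, score], ...] -> [label, ...]
    return [item[0] for item in pairs]


print(convert_clsid_to_str([1, 44]))                     # ['person', 'bottle']
print(parse(["Dog", "a", "TV"]))                         # ['dog', 'tv']
print(get_cls_from_pred([["cat", 0.9], ["pet", 0.8]]))   # ['cat', 'pet']
```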

Load the pretrained Google News Word2Vec vectors with Gensim. We use them to measure the similarity between the ground-truth and predicted class names.

```python
from gensim.models import KeyedVectors

# Load the pretrained Google News Word2Vec vectors (edit the path)
model = KeyedVectors.load_word2vec_format(
    'PATH TO GoogleNews-vectors-negative300.bin', binary=True)
```

```python
def check_gensim(word, pred):
    # Return the highest Word2Vec similarity between `word` and any of the
    # predicted words (0 if none of them is in the model's vocabulary)
    similarity = 0
    for each_pred in pred:
        # Skip predictions that do not exist in the Word2Vec model
        if each_pred not in model:
            continue
        current_similarity = model.similarity(word, each_pred)
        if current_similarity > similarity:
            similarity = current_similarity
    return similarity
```
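Gensim's `model.similarity(w1, w2)` returns the cosine similarity between the two word vectors, and `check_gensim` keeps the maximum over all in-vocabulary predictions. The stdlib-only sketch below reproduces that logic with hypothetical 3-dimensional vectors standing in for the 300-dimensional Google News embeddings; like `check_gensim`, it assumes the ground-truth word itself is in the vocabulary.

```python
import math

# Hypothetical toy "embeddings"; the real model has 300 dimensions
toy_vectors = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "car":   [0.0, 0.2, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def best_similarity(word, preds, vectors):
    # Mirrors check_gensim: skip out-of-vocabulary predictions, keep the
    # maximum, and return 0 when no prediction can be compared
    best = 0
    for p in preds:
        if p not in vectors:
            continue
        best = max(best, cosine(vectors[word], vectors[p]))
    return best


# "stop sign" is skipped; "puppy" gives the highest similarity (about 0.98)
print(best_similarity("dog", ["car", "puppy", "stop sign"], toy_vectors))
```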



### Parsing

Each cloud provider returns its results in a slightly different format, so we write a separate parsing function for each provider.

#### Microsoft Specific Parsing


```python
def msft_name(imgId):
    # Microsoft's predictions are keyed by the file name only (no PATH_TO_IMAGES prefix)
    return "000000" + str(imgId) + ".jpg"


def parse_msft_inner(word):
    # "outdoor_street" -> ["outdoor", "street"]
    return word.replace("_", " ").lower().strip().split()


def parse_msft(l):
    # Collect lowercase words from the category names, tag names and tag hints
    result = []
    for each in l["categories"]:
        result.extend(parse_msft_inner(each["name"]))
    for each in l["tags"]:
        result.extend(parse_msft_inner(each["name"]))
        if "hint" in each:
            result.extend(parse_msft_inner(each["hint"]))
    return list(set(result))
```
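To see what `parse_msft` produces, the sketch below runs the same logic on a hypothetical response that contains only the fields the parser reads (`categories[].name`, `tags[].name` and the optional `tags[].hint`); it is not a real API response.

```python
def parse_msft_inner(word):
    # "outdoor_street" -> ["outdoor", "street"]
    return word.replace("_", " ").lower().strip().split()


def parse_msft(response):
    result = []
    for category in response["categories"]:
        result.extend(parse_msft_inner(category["name"]))
    for tag in response["tags"]:
        result.extend(parse_msft_inner(tag["name"]))
        if "hint" in tag:
            result.extend(parse_msft_inner(tag["hint"]))
    return list(set(result))


# Hypothetical response with only the fields parse_msft reads
sample = {
    "categories": [{"name": "outdoor_street"}],
    "tags": [{"name": "Car"}, {"name": "traffic light"},
             {"name": "truck", "hint": "vehicle"}],
}
print(sorted(parse_msft(sample)))
# ['car', 'light', 'outdoor', 'street', 'traffic', 'truck', 'vehicle']
```

Because every name is split into single words, a multi-word ground-truth label such as `traffic light` cannot match exactly, and, assuming the Google News vocabulary stores phrases with underscores rather than spaces, it is also skipped by the similarity check.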

#### Amazon Specific Parsing

```python
def parse_ama(l):
    # l is a list of label strings; lowercase and deduplicate them
    result = []
    for each in l:
        result.append(each.lower())
    return list(set(result))
```

#### Google Specific Parsing

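```python
def parse_goog(l):
    # Each entry's first element is the label. Keep the full lowercase label and,
    # for multi-word labels, the individual words (which are NOT lowercased).
    l1 = []
    for each in l:
        l1.append(each[0].lower())
        if len(each[0].split()) > 1:
            l1.extend(each[0].split())
    return l1
```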
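Unlike the Microsoft and Amazon parsers, `parse_goog` does not lowercase the words it splits out of multi-word labels, so a capitalized word such as `Stop` never equals a lowercase ground-truth name. The sketch below shows this on a hypothetical `[label, score]` input; `parse_goog_lower` is a hypothetical variant added only for comparison, and the Google score reported later was computed with the original behavior.

```python
def parse_goog(l):
    l1 = []
    for each in l:
        l1.append(each[0].lower())
        if len(each[0].split()) > 1:
            l1.extend(each[0].split())
    return l1


def parse_goog_lower(l):
    # Hypothetical variant: lowercase the label before splitting it
    l1 = []
    for each in l:
        label = each[0].lower()
        l1.append(label)
        if len(label.split()) > 1:
            l1.extend(label.split())
    return l1


sample = [["Stop sign", 0.97], ["Street", 0.91]]  # hypothetical response
print(parse_goog(sample))        # ['stop sign', 'Stop', 'sign', 'street']
print(parse_goog_lower(sample))  # ['stop sign', 'stop', 'sign', 'street']
```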

The `threshold` defines how similar two words (the ground-truth and predicted category names) must be, according to Word2Vec, for the prediction to count as correct. Feel free to experiment with different values of `threshold`.
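A ground-truth tag that has no exact match among the predictions counts as correct only when its best similarity reaches `threshold`, so raising the threshold can only lower the number of such matches. The sketch below illustrates this with hypothetical best-similarity values for one image; exact matches are counted regardless of the threshold and are left out here.

```python
# Hypothetical best similarities (as check_gensim would return them)
# for four ground-truth tags of one image without exact matches
best_sim = {"dog": 0.72, "frisbee": 0.41, "bench": 0.28, "handbag": 0.05}


def count_matches(best_sim, threshold):
    # A tag counts as correct when its best similarity reaches the threshold
    return sum(1 for s in best_sim.values() if s >= threshold)


for threshold in (0.2, 0.3, 0.5):
    print(threshold, count_matches(best_sim, threshold))
# prints:
# 0.2 3
# 0.3 2
# 0.5 1
```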

```python
threshold = .3

# Per-provider settings: how to build the prediction-file key from an image id,
# which parser to apply, and the name used in the printed stats.
PROVIDERS = {
    "goog": (get_name_from_id, parse_goog, "Google"),
    "msft": (msft_name, parse_msft, "Microsoft"),
    "ama": (get_name_from_id, parse_ama, "Amazon"),
}


def calculate_score(ground_truth, predictions, arg):
    if arg not in PROVIDERS:
        raise ValueError("arg must be 'goog', 'msft' or 'ama', got %r" % arg)
    get_key, parse_pred, provider = PROVIDERS[arg]
    total = 0          # ground-truth tags over all images
    correct = 0        # ground-truth tags matched by a prediction
    num_pred_tags = 0  # parsed tags returned by the provider
    for each in ground_truth.keys():
        gt = list(set(convert_clsid_to_str(ground_truth[each])))
        if len(gt) < 1:
            continue
        total += len(gt)
        raw_pred = predictions[get_key(each)]
        # Images without predictions still count towards `total`
        if raw_pred is None or len(raw_pred) <= 0:
            continue
        pred = parse_pred(raw_pred)
        num_pred_tags += len(pred)
        for each_word in gt:
            # Correct if the ground truth appears "as is" among the predictions,
            # or if it is in the Word2Vec model and similar enough to a prediction
            if each_word in pred:
                correct += 1
            elif each_word in model and check_gensim(each_word, pred) >= threshold:
                correct += 1
    print(provider + "'s Stats\nTotal number of tags returned = ", num_pred_tags,
          "\nAverage number of tags returned per image = ",
          num_pred_tags * 1.0 / len(ground_truth.keys()))
    print("\nGround Truth Stats\nTotal number of Ground Truth tags = ", total,
          "\nTotal number of correct tags predicted = ", correct)
    print("\nScore = ", float(correct) / float(total))
```
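Written out, the score printed by `calculate_score` is a recall-style ratio over all ground-truth tags. Let $G_i$ be the set of ground-truth class names for image $i$, $P_i$ its parsed predictions, $V$ the Word2Vec vocabulary, $\operatorname{sim}$ the cosine similarity returned by `model.similarity`, $\tau$ the `threshold`, and $N$ the number of images in `ground_truth`. Images whose prediction list is empty contribute only to the denominator.

$$
s(g, P) = \max\bigl(\{0\} \cup \{\operatorname{sim}(g, p) : p \in P \cap V\}\bigr)
$$

$$
\operatorname{match}(g, P) =
\begin{cases}
1 & \text{if } g \in P \\
1 & \text{if } g \notin P,\ g \in V \text{ and } s(g, P) \ge \tau \\
0 & \text{otherwise}
\end{cases}
$$

$$
\text{Score} = \frac{\sum_i \sum_{g \in G_i} \operatorname{match}(g, P_i)}{\sum_i |G_i|},
\qquad
\text{tags per image} = \frac{\sum_i |P_i|}{N}
$$

For example, Google's run reports $5754$ correct tags out of $12081$, giving $5754 / 12081 \approx 0.4763$.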

Now we are ready to load the predictions we collected from each cloud API.

```python
# Google
with open('./data/google_tags.json') as f:
    google = json.load(f)
```

```python
# Get Google score
calculate_score(ground_truth, google, "goog")
```

```
Google's Stats
Total number of tags returned =  59959 
Average number of tags returned per image =  14.602776424744278

Ground Truth Stats
Total number of Ground Truth tags =  12081 
Total number of correct tags predicted =  5754

Score =  0.47628507573876333
```


**Note**: Microsoft's API for object classification has two versions, and they return different results.

If you want to try Microsoft's outdated (v1) API, use the `microsoft_tags.json` file instead. For our experiments we use the latest version (i.e., `microsoft_tags_DESCRIPTION.json`).

```python
# Microsoft
with open('../data_files/microsoft_tags_DESCRIPTION.json') as f:
    microsoft = json.load(f)
```

```python
# Get Microsoft score
calculate_score(ground_truth, microsoft, "msft")
```

```
Microsoft's Stats
Total number of tags returned =  34398 
Average number of tags returned per image =  8.377496346809547

Ground Truth Stats
Total number of Ground Truth tags =  12081 
Total number of correct tags predicted =  6033

Score =  0.4993791904643655
```


```python
# Amazon
with open('../data_files/amazon_tags.json') as f:
    amazon = json.load(f)
```

```python
# Get Amazon score
calculate_score(ground_truth, amazon, "ama")
```

```
Amazon's Stats
Total number of tags returned =  58512 
Average number of tags returned per image =  14.250365319045299

Ground Truth Stats
Total number of Ground Truth tags =  12081 
Total number of correct tags predicted =  7859

Score =  0.6505256187401706
```
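To compare the three providers side by side, the sketch below recomputes each score and the average number of tags per image from the totals printed above. The image count of 4,106 is not printed by the notebook; it is inferred here by dividing each provider's total number of tags by its printed average.

```python
# Totals copied from the three outputs above
results = {
    "Google":    {"tags_returned": 59959, "correct": 5754},
    "Microsoft": {"tags_returned": 34398, "correct": 6033},
    "Amazon":    {"tags_returned": 58512, "correct": 7859},
}
TOTAL_GT_TAGS = 12081
NUM_IMAGES = 4106  # inferred, e.g. 59959 / 14.602776424744278 = 4106

# Rank providers by score (correct tags / ground-truth tags)
ranked = sorted(results.items(), key=lambda kv: kv[1]["correct"], reverse=True)
for name, r in ranked:
    score = r["correct"] / TOTAL_GT_TAGS
    per_image = r["tags_returned"] / NUM_IMAGES
    print(f"{name:<10} score={score:.4f}  tags/image={per_image:.2f}")
# prints:
# Amazon     score=0.6505  tags/image=14.25
# Microsoft  score=0.4994  tags/image=8.38
# Google     score=0.4763  tags/image=14.60
```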
