In [1]:
import json

with open("instances_train2014.json", "r") as read_file:
    data = json.load(read_file)
    
'''
COCO dataset is split into two parts. The annotations and the images section. The annotations section contains a bounding
box of the objects in absolute pixel coordinates. The images section contains the image width and height as pixel values. 

In the following blocks of code, I create the annotations dictionary and image dictionary in order to normalize the 
annotation bounding boxes from absolute pixel coordinates to relative pixel coordinates (x=100, y= 100) -> (x=.5,y=.5) 
Relative pixel coordinates means the relative position of the point regards to the image. 
If x=.5 and y=.5 it is 50% of the width, 50% of the height at the midpoint of the image.

The reason is that we cannot ensure that the image will always be the same size as the annotations. It might be
corrupted or slightly smaller. In those situations the bounding boxes will be completely wrong.
'''

# Turn to dictionary, because look up time is 1
annotations_dict = {}
image_dict = {}

In [2]:
data['images'][0]

{'license': 5,
 'file_name': 'COCO_train2014_000000057870.jpg',
 'coco_url': 'http://images.cocodataset.org/train2014/COCO_train2014_000000057870.jpg',
 'height': 480,
 'width': 640,
 'date_captured': '2013-11-14 16:28:13',
 'flickr_url': 'http://farm4.staticflickr.com/3153/2970773875_164f0c0b83_z.jpg',
 'id': 57870}

In [3]:
'''
Creates the image dataset with information of the images. I have commented the 2014 and 2017 jsons. 
These json files follow different formats. Choose either depending on which version of the instance.json you are using.

image_dict(Key = ImageID) = {Values, FileName, Image Height, Image Width, CocoURL}


CocoURL is used for downloading the image form the internet to use in ImageVerify.
'''

for i in range(len(data['images'])):
    # For 2014 Json
    file_name = data['images'][i]['file_name'][15:]
    
    # For 2017 Json
    #file_name = data['images'][i]['file_name']
    
    image_id_name = int(''.join([ i.lstrip('0') for i in file_name ]).split('.')[0])
    im_height = data['images'][i]['height']
    im_width = data['images'][i]['width']
    url = data['images'][i]['coco_url']   
    image_dict[image_id_name] = {"filename": file_name, "height":im_height, "width":im_width, "coco_url":url}
    

In [4]:
len(data['images'])

82783

In [5]:
#len(data['images'])
len(image_dict)

68997

In [27]:
len(data['annotations'])

604907

In [28]:
''' 
Creates the annotation dataset with information of the images. 

I check if the image id is inside the image set.
If it is, I normalize the annotation bounding box with the image data set's width and height of the image.

I need to check if the image id is in the image set in case there is some mistake in the annotations or missing data.

Annotations_dict[Key = ImageID] = [Value, An array of dictionaries containing the {Category ID and Bounding Box information}]
'''
annotLength = len(data['annotations'])
print("Annotatioan")
print(data['annotations'][4])
for i in range(annotLength):
    image_id = data['annotations'][i]['image_id']
    bbox = data['annotations'][i]['bbox']
    category_id = data['annotations'][i]['category_id']
    if (image_id in image_dict):
        im_width = image_dict[image_id]['width']
        im_height = image_dict[image_id]['height']
        bbox = [bbox[0]/im_width, bbox[1]/im_height, bbox[2]/im_width, bbox[3]/im_height]
        
#         print(image_id)
        
        if annotations_dict.get(image_id) == None:
            annotations_dict[image_id] = [{"category_id":category_id, "bbox": bbox}]
        else:
            annotations_dict[image_id].append({"category_id":category_id, "bbox": bbox})
    
    

Annotatioan
{'segmentation': [[294.94, 115.52, 255.61, 135.18, 228.58, 163.45, 224.89, 181.88, 210.14, 188.02, 213.83, 191.71, 211.37, 210.14, 197.86, 262.99, 195.4, 292.48, 192.94, 303.54, 195.4, 303.54, 190.48, 336.72, 185.57, 398.17, 194.17, 414.14, 199.08, 439.95, 226.12, 486.65, 259.3, 508.77, 285.11, 507.54, 315.83, 513.69, 344.1, 511.23, 357.61, 505.08, 385.88, 459.61, 405.54, 419.06, 403.08, 417.83, 404.31, 304.77, 404.31, 285.11, 395.71, 281.42, 393.25, 231.04, 396.94, 232.27, 394.48, 223.66, 388.34, 174.51, 385.88, 165.9, 380.96, 138.87, 361.3, 119.2, 361.3, 119.2, 346.55, 105.69, 329.35, 102.0, 328.12, 108.14, 317.06, 93.4, 288.8, 104.46, 294.94, 109.37]], 'area': 72576.18295, 'iscrowd': 0, 'image_id': 15307, 'bbox': [185.57, 93.4, 219.97, 420.29], 'category_id': 58, 'id': 116}


In [8]:
print(len(annotations_dict))

49459


In [9]:
print(len(image_dict))

68997


In [10]:
# My personal check to see what the annotation_dict values look like
count = 0
for value in annotations_dict:
    print(value)
    print(annotations_dict[value])
    if count > 5:
        break
    count = count + 1

142589
[{'category_id': 58, 'bbox': [0.5768, 0.04818666666666667, 0.42319999999999997, 0.8835466666666666]}, {'category_id': 51, 'bbox': [0.0, 0.07746666666666667, 0.52588, 0.8973333333333333]}]
328812
[{'category_id': 58, 'bbox': [0.31325333333333333, 0.39156, 0.3426933333333333, 0.30924]}, {'category_id': 1, 'bbox': [0.003013333333333333, 0.00396, 0.9969866666666667, 0.9841799999999999]}, {'category_id': 1, 'bbox': [0.0, 0.25918, 0.32162666666666667, 0.46236]}, {'category_id': 54, 'bbox': [0.0, 0.8397, 0.20453333333333334, 0.14572]}]
46298
[{'category_id': 58, 'bbox': [0.4613333333333333, 0.21346875, 0.24266666666666667, 0.6519999999999999]}, {'category_id': 58, 'bbox': [0.3776875, 0.234609375, 0.15164583333333334, 0.6408125]}, {'category_id': 58, 'bbox': [0.19785416666666666, 0.167734375, 0.20931249999999998, 0.7311875]}, {'category_id': 58, 'bbox': [0.5448125, 0.12903125, 0.34981249999999997, 0.666671875]}, {'category_id': 58, 'bbox': [0.16816666666666666, 0.255625, 0.0480416666666

In [11]:
514546 in image_dict

True

In [12]:
'''
The following code loads the results.txt from our darknet running. I add these values into a dictionary 
and parse them accordingly to match the json information. This code goes through the results.txt line by line.
If the image name is in the annotations dictionary, it sets the flag to true. The following lines are added as 
values to the image name key. 

If the flag is false, then following lines don't get added.
'''

"\nThe following code loads the results.txt from our darknet running. I add these values into a dictionary \nand parse them accordingly to match the json information. This code goes through the results.txt line by line.\nIf the image name is in the annotations dictionary, it sets the flag to true. The following lines are added as \nvalues to the image name key. \n\nIf the flag is false, then following lines don't get added.\n"

In [13]:
result_yolov3 = open("result.txt", "r")
result_yolov3_read = result_yolov3.readlines()

In [14]:
count_of_images = 0
num_in_annot = 0

result_dict = {}
found = False

id_found = ""

for line in (result_yolov3_read):
    if "image_name:" in line:
        split_line = line.split(" ")
        split_line_id = int(split_line[1].lstrip('0'))
        #print(split_line_id)
        
        if (split_line_id in annotations_dict):
            num_in_annot = num_in_annot + 1
            
            id_found = split_line_id
            found = True
        else:
            found = False
        count_of_images = count_of_images + 1
    else:
        if (found):
            # Parsing
            line_class_bbox = line.split(",")
            classvalue = line_class_bbox[0].split(" ")[1]
            bboxes = line_class_bbox[1].split(" ")
            x = bboxes[1].split(":")[1]
            y = bboxes[2].split(":")[1]
            w = bboxes[3].split(":")[1]
            h = bboxes[4].split(":")[1][0:-2] #Get rid of the \n
            parsed_line = [classvalue, x, y, w, h]
            if (result_dict.get(id_found) == None):
                result_dict[int(id_found)] = [parsed_line]
            else:
                result_dict[id_found].append(parsed_line)          
    

In [15]:
count_of_images

15989

In [16]:
num_in_annot

6735

In [17]:
len(result_dict)

6652

In [18]:
result_dict

{573815: [['68', '0.246449', '0.903357', '0.158839', '0.12227'],
  ['65', '0.598297', '0.615164', '0.132560', '0.07309'],
  ['64', '0.332449', '0.558978', '0.573376', '0.68089'],
  ['63', '0.564415', '0.291399', '0.355272', '0.45124'],
  ['61', '0.390397', '0.618344', '0.679133', '0.72098'],
  ['42', '0.672371', '0.492103', '0.086674', '0.18850']],
 357511: [['59', '0.757061', '0.046083', '0.155071', '0.09766'],
  ['58', '0.768611', '0.458580', '0.269338', '0.38052'],
  ['57', '0.337773', '0.645175', '0.258740', '0.14374']],
 86192: [['66', '0.437539', '0.455702', '0.260235', '0.33273'],
  ['58', '0.501911', '0.555986', '0.834900', '0.79531'],
  ['40', '0.496816', '0.226324', '0.151100', '0.34123']],
 217186: [['74', '0.190174', '0.588641', '0.052317', '0.02336'],
  ['66', '0.372037', '0.237782', '0.055579', '0.09239'],
  ['58', '0.396611', '0.737496', '0.346709', '0.38761'],
  ['58', '0.817049', '0.828492', '0.378342', '0.32990'],
  ['1', '0.491548', '0.603418', '0.297856', '0.80957']

In [19]:
'''
SINGLE IMAGE TEST - 

I chose image id: 374458. There is a single bounding box found here. Easy to compare and check.
See ImageVerify from the CVProject_ImageValidation folder to visualize the boxes on the image.

In order to check, download the image from the coco image url, and then change the image name in the ImageVerify
code to try a different image. Copy paste the BBox from the ground truth into the ground truth section of the Image Verify.
When copying over the Darknet box, only copy array values 1-4. The 0th element is the category ID.
'''

# Teddy bear
print("Ground Truth - BBOX[Left, Top, W, H]")
print(annotations_dict[374458])
print("---------------")
print("Our Dark Net Results - BBox[Class ID, Middle X, Middle Y, W, H]")
print(result_dict[374458])



Ground Truth - BBOX[Left, Top, W, H]
[{'category_id': 88, 'bbox': [0.2278933333333333, 0.33012, 0.42048, 0.41424]}]
---------------
Our Dark Net Results - BBox[Class ID, Middle X, Middle Y, W, H]
[['78', '0.429364', '0.531142', '0.424895', '0.39797']]


In [20]:
image_dict[374458]

{'filename': '000000374458.jpg',
 'height': 500,
 'width': 375,
 'coco_url': 'http://images.cocodataset.org/train2014/COCO_train2014_000000374458.jpg'}

In [21]:
def groundTruthParse(bbox):
    x_gt=float(bbox_gt[0])
    y_gt=float(bbox_gt[1])
    w_gt=float(bbox_gt[2])
    h_gt=float(bbox_gt[3])
    
    left_gt = float(x_gt)
    right_gt = float((x_gt + w_gt))
    top_gt = float(y_gt)
    bot_gt = float((y_gt + h_gt))
    print(left_gt, top_gt, right_gt, bot_gt)
    box_groundtruth = [left_gt, top_gt, right_gt, bot_gt]
    return box_groundtruth

def yoloBoxParse(bbox):
    x=float(bbox[1])
    y=float(bbox[2])
    w=float(bbox[3])
    h=float(bbox[4])

    left = float((x - w/2))
    right = float((x + w/2))
    top = float((y - h/2))
    bot = float((y + h/2))

    box_yolo = [left, top, right, bot]
    return box_yolo
    

In [22]:
result_dict[374458][0]

['78', '0.429364', '0.531142', '0.424895', '0.39797']

In [23]:
'''
SINGLE IMAGE IOU CALCULATIONS

Calculating the IOU Score from this example. The code is generalizable. The coordinates need to follow the conversions
below for the left_gt,top_gt,right_gt,bot_gt as well as the conversions for left,top,right,bot.
IOU score matches from the calculations in ImageVerify.

#######

Ground truth format from the annotations: top left corner x, top left corner y, width, height
Our Darknet format from the our result_dict: midpoint x, midpoint y, width, height

Functions "groundTruthParse" and "yoloBoxParse" take this into account and get them into the proper format for the
IOU function for calculations. 

#######
'''

# Ground Truth Values
bbox_gt = annotations_dict[374458][0]['bbox']

'''
# Manual Calculations and normalizing

x_gt=bbox_gt[0]
y_gt=bbox_gt[1]
w_gt=bbox_gt[2]
h_gt=bbox_gt[3]
left_gt = float(x_gt)
right_gt = float((x_gt + w_gt))
top_gt = float(y_gt)
bot_gt = float((y_gt + h_gt))
box_groundtruth = [left_gt, top_gt, right_gt, bot_gt]
'''

box_groundtruth = groundTruthParse(bbox_gt)


# Yolo Calculated

'''
# Manual Calculations and normalizing

x = float(result_dict[374458][0][1])
y = float(result_dict[374458][0][2])
w = float(result_dict[374458][0][3])
h = float(result_dict[374458][0][4])
left = float((x - w/2))
right = float((x + w/2))
top = float((y - h/2))
bot = float((y + h/2))
box_yolo = [left, top, right, bot]
'''

box_yolo = yoloBoxParse(result_dict[374458][0])

0.2278933333333333 0.33012 0.6483733333333334 0.74436


In [24]:
'''
IOU CALCULATION FUNCTION
https://gist.github.com/meyerjo/dd3533edc97c81258898f60d8978eddc

Takes in two bounding boxes, each with format [left, top, right, bot]
Returns the IOU score.
''' 

def bb_intersection_over_union(boxA, boxB):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])

    # compute the area of intersection rectangle
    interArea = abs(max((xB - xA, 0)) * max((yB - yA), 0))
    if interArea == 0:
        return 0
    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = abs((boxA[2] - boxA[0]) * (boxA[3] - boxA[1]))
    boxBArea = abs((boxB[2] - boxB[0]) * (boxB[3] - boxB[1]))

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the interesection area
    iou = interArea / float(boxAArea + boxBArea - interArea)

    # return the intersection over union value
    return iou

In [25]:
iou = bb_intersection_over_union(box_groundtruth,box_yolo)
print(iou)

'''
The value here is slightly different from the ImageVerify because the ImageVerify multiplies by the 
image width and height then converts to an int. The int cuts a bit of the values off so it is slightly smaller.
'''

0.9225918736128936


'\nThe value here is slightly different from the ImageVerify because the ImageVerify multiplies by the \nimage width and height then converts to an int. The int cuts a bit of the values off so it is slightly smaller.\n'

In [26]:
'''
NEXT STEPS:

1) BIG PROBLEM - NEED FIXING: 
The issue is the category ID's from the coco's json files do not match our assigned category ID's from our darknet.

CoCo's Json Files Category ID's: https://github.com/nightrome/cocostuff/blob/master/labels.md
Our Category ID's: https://gist.github.com/AruniRC/7b3dadd004da04c80198557db5da4bda

We need to figure out a way for them to match for us to move forward. 2014 and 2017 json files follow the SAME category 
annotation names. Our darknet is different. We need to either change the names in the darknet or come up with something else.


2) Right now, I only get the IOU for a single image with a single bounding box. We need to get the IOU for all the images.

Pseudo code: 
    For imageIDKey in Results_dict
        groundtruth_boundingbox = annotations_dict[imageIDKey][0][bbox]
        yolo_boundingbox = result_dict[imageIDKey][0]
        iou = bb_intersection_over_union(groundtruth_boundingbox, yolo_boundingbox)
        
NOTE: That there may return multiple values from a single imageIDKey because there are multiple object detections 
in a single image. In this case, you will need to match the category ID's with each other and calculate the IOU.

NOTE: Now there may be multiple objects with the same category ID. You will need to do a N^2 comparison 
of the values with each other. We can assume that the highest IOU between a pair of bounding boxes should
be matched with each other.

NOTE: There may also be cases where there are extra bounding boxes which are not matched. We need to consider this.
Perhaps ignore them, perhaps take some sort of note. 

3) Once these IOU's are done, take the values and turn them into the pandas dataframe Venky mentioned.
4) Take the pandas into a pickle.

5) Run our code with yolov3, yolov2, and yolo-tiny. Getting the different results.txt


'''

"\nNEXT STEPS:\n\n1) BIG PROBLEM - NEED FIXING: \nThe issue is the category ID's from the coco's json files do not match our assigned category ID's from our darknet.\n\nCoCo's Json Files Category ID's: https://github.com/nightrome/cocostuff/blob/master/labels.md\nOur Category ID's: https://gist.github.com/AruniRC/7b3dadd004da04c80198557db5da4bda\n\nWe need to figure out a way for them to match for us to move forward. 2014 and 2017 json files follow the SAME category \nannotation names. Our darknet is different. We need to either change the names in the darknet or come up with something else.\n\n\n2) Right now, I only get the IOU for a single image with a single bounding box. We need to get the IOU for all the images.\n\nPseudo code: \n    For imageIDKey in Results_dict\n        groundtruth_boundingbox = annotations_dict[imageIDKey][0][bbox]\n        yolo_boundingbox = result_dict[imageIDKey][0]\n        iou = bb_intersection_over_union(groundtruth_boundingbox, yolo_boundingbox)\n      