This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

[OD] 6.0 inference regression between CPU and GPU. #2955

Merged
merged 6 commits into apple:master from od_eval_regression on Jan 28, 2020

Conversation

jakesabathia2
Collaborator

@jakesabathia2 jakesabathia2 commented Jan 24, 2020

close #2865

In 6.0, CPU and GPU inference produce different predictions.
For the same image:
GPU:

#predictions
{'confidence': 0.6841502785682678, 'type': 'rectangle', 'coordinates': {'y': 167.98015236854553, 'x': 289.0607535839081, 'height': 238.21115112304688, 'width': 301.1848449707031}, 'label': 'Croissant'}
{'confidence': 0.5636424422264099, 'type': 'rectangle', 'coordinates': {'y': 168.73490810394287, 'x': 351.99418142437935, 'height': 183.61944580078125, 'width': 161.6607208251953}, 'label': 'Croissant'}
{'confidence': 0.5182375311851501, 'type': 'rectangle', 'coordinates': {'y': 101.34550631046295, 'x': 123.91261160373688, 'height': 176.6996612548828, 'width': 241.56607055664062}, 'label': 'Croissant'}
{'confidence': 0.4028134346008301, 'type': 'rectangle', 'coordinates': {'y': 210.60317158699036, 'x': 199.7049316763878, 'height': 175.12136840820312, 'width': 237.15415954589844}, 'label': 'Croissant'}
{'confidence': 0.1619817614555359, 'type': 'rectangle', 'coordinates': {'y': 106.30784332752228, 'x': 233.0048382282257, 'height': 219.98683166503906, 'width': 288.35015869140625}, 'label': 'Croissant'}
{'confidence': 0.10118640214204788, 'type': 'rectangle', 'coordinates': {'y': 213.20754289627075, 'x': 361.471713334322, 'height': 101.99461364746094, 'width': 127.2068862915039}, 'label': 'Croissant'}
{'confidence': 0.027622569352388382, 'type': 'rectangle', 'coordinates': {'y': 65.32773971557617, 'x': 268.4330843389034, 'height': 119.54254913330078, 'width': 164.8693084716797}, 'label': 'Croissant'}
{'confidence': 0.010357137769460678, 'type': 'rectangle', 'coordinates': {'y': 84.39711928367615, 'x': 118.78975331783295, 'height': 107.74749755859375, 'width': 172.99261474609375}, 'label': 'Croissant'}
{'confidence': 0.00467892037704587, 'type': 'rectangle', 'coordinates': {'y': 90.58223068714142, 'x': 315.67183434963226, 'height': 175.4208984375, 'width': 256.6126708984375}, 'label': 'Croissant'}

#evaluation
{'average_precision': {'Coffee': 0.0, 'Croissant': 0.23888888955116272, 'Waffle': 0.0, 'Bagel': 0.0, 'Egg': 0.0, 'Banana': 0.0}}

CPU:

{'confidence': 0.6783925890922546, 'type': 'rectangle', 'coordinates': {'y': 167.71613359451294, 'x': 288.49523663520813, 'height': 239.19659423828125, 'width': 301.3323974609375}, 'label': 'Croissant'}
{'confidence': 0.5581294298171997, 'type': 'rectangle', 'coordinates': {'y': 101.20467245578766, 'x': 123.50158989429474, 'height': 175.7798309326172, 'width': 240.85716247558594}, 'label': 'Croissant'}
{'confidence': 0.5499178171157837, 'type': 'rectangle', 'coordinates': {'y': 168.42872500419617, 'x': 351.87367647886276, 'height': 183.6824951171875, 'width': 160.7376708984375}, 'label': 'Croissant'}
{'confidence': 0.4112585783004761, 'type': 'rectangle', 'coordinates': {'y': 210.60165166854858, 'x': 200.05664974451065, 'height': 175.11065673828125, 'width': 237.96539306640625}, 'label': 'Croissant'}
{'confidence': 0.18294312059879303, 'type': 'rectangle', 'coordinates': {'y': 106.45350515842438, 'x': 233.51837396621704, 'height': 220.2215576171875, 'width': 289.7857971191406}, 'label': 'Croissant'}
{'confidence': 0.11228813976049423, 'type': 'rectangle', 'coordinates': {'y': 213.52950632572174, 'x': 361.25523895025253, 'height': 100.96805572509766, 'width': 126.85247039794922}, 'label': 'Croissant'}
{'confidence': 0.06214667111635208, 'type': 'rectangle', 'coordinates': {'y': 88.37236762046814, 'x': 264.0897363424301, 'height': 159.9544677734375, 'width': 179.37753295898438}, 'label': 'Croissant'}
{'confidence': 0.010909981094300747, 'type': 'rectangle', 'coordinates': {'y': 84.82984900474548, 'x': 119.08041089773178, 'height': 108.2442855834961, 'width': 171.14401245117188}, 'label': 'Croissant'}
{'confidence': 0.004227335564792156, 'type': 'rectangle', 'coordinates': {'y': 109.82267260551453, 'x': 322.89858385920525, 'height': 192.29393005371094, 'width': 212.18304443359375}, 'label': 'Croissant'}

#evaluation
{'average_precision': {'Coffee': 0.0, 'Croissant': 0.28333333134651184, 'Waffle': 0.0, 'Bagel': 0.0, 'Egg': 0.0, 'Banana': 0.0}}

The differences in both predict() and evaluate() are not negligible.
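To put a number on the gap between two prediction lists like the ones above, one option is to pair boxes by overlap and compare the confidences of each pair. The following is only a minimal sketch, not code from this PR: `to_corners`, `iou`, `match_predictions` and `make_pred` are hypothetical helpers, the 0.5 IoU threshold is arbitrary, and it assumes `x`/`y` are box centers. The sample values are the top three GPU and CPU predictions, rounded.

```python
def to_corners(c):
    """Convert a center-based box dict to (x0, y0, x1, y1). Center-based is an assumption."""
    half_w, half_h = c['width'] / 2, c['height'] / 2
    return (c['x'] - half_w, c['y'] - half_h, c['x'] + half_w, c['y'] + half_h)


def iou(a, b):
    """Intersection over union of two center-based boxes."""
    ax0, ay0, ax1, ay1 = to_corners(a)
    bx0, by0, bx1, by1 = to_corners(b)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a['width'] * a['height'] + b['width'] * b['height'] - inter
    return inter / union if union > 0 else 0.0


def match_predictions(ref, other, min_iou=0.5):
    """Greedily pair same-label boxes; return (ref_conf, other_conf, iou) tuples."""
    unused = list(other)
    pairs = []
    for r in ref:
        candidates = [o for o in unused if o['label'] == r['label']]
        if not candidates:
            continue
        best = max(candidates, key=lambda o: iou(r['coordinates'], o['coordinates']))
        overlap = iou(r['coordinates'], best['coordinates'])
        if overlap >= min_iou:
            pairs.append((r['confidence'], best['confidence'], overlap))
            unused.remove(best)
    return pairs


def make_pred(conf, x, y, w, h, label='Croissant'):
    """Build a prediction dict in the same shape as the output above."""
    return {'confidence': conf, 'label': label,
            'coordinates': {'x': x, 'y': y, 'width': w, 'height': h}}


# Top three predictions from each device, rounded from the output above.
gpu = [make_pred(0.6842, 289.06, 167.98, 301.18, 238.21),
       make_pred(0.5636, 351.99, 168.73, 161.66, 183.62),
       make_pred(0.5182, 123.91, 101.35, 241.57, 176.70)]
cpu = [make_pred(0.6784, 288.50, 167.72, 301.33, 239.20),
       make_pred(0.5581, 123.50, 101.20, 240.86, 175.78),
       make_pred(0.5499, 351.87, 168.43, 160.74, 183.68)]

pairs = match_predictions(gpu, cpu)
max_gap = max(abs(a - b) for a, b, _ in pairs)
print(len(pairs), max_gap)
```

Pairing by overlap rather than by rank matters here because the second and third predictions swap order between the two devices.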

[First Step] Compare 5.8's CPU and GPU with 6.0's CPU and GPU
Only 6.0's CPU path shows the inference regression, which points to an issue on the TensorFlow side.

[Second Step] Compare TensorFlow with MXNet
I loaded the same weights into the TF and MXNet models and compared the outputs layer by layer.
The max error in the output feature tensor is on the order of 5 * 1e-5, which is good.
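A layer-by-layer check like this can be sketched as below. This is not the script used for the PR: `max_abs_error_per_layer` is a hypothetical helper, the layer name `conv0` is made up, and the activations are synthetic stand-ins with a perturbation of at most 5e-5. With real models, the outputs would also need to be brought into a common layout first (MXNet typically uses NCHW, TensorFlow defaults to NHWC).

```python
import numpy as np


def max_abs_error_per_layer(outputs_a, outputs_b):
    """Return {layer_name: max |a - b|} for layers present in both dicts."""
    errors = {}
    for name, a in outputs_a.items():
        if name not in outputs_b:
            continue
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(outputs_b[name], dtype=np.float64)
        if a.shape != b.shape:
            raise ValueError("shape mismatch in %s: %s vs %s" % (name, a.shape, b.shape))
        errors[name] = float(np.max(np.abs(a - b)))
    return errors


# Synthetic stand-ins for real activations (illustration only).
rng = np.random.default_rng(0)
tf_out = {'conv0': rng.standard_normal((1, 8, 8, 4))}
noise = 5e-5 * rng.uniform(-1.0, 1.0, tf_out['conv0'].shape)
mx_out = {'conv0': tf_out['conv0'] + noise}

errors = max_abs_error_per_layer(tf_out, mx_out)
print(errors)
```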

[Third Step] Compare raw output from TensorFlow and MXNet through tc's API
I compared the raw output tensors before NMS for TensorFlow and MXNet through tc's API.
Surprisingly, the output tensor itself has an error of up to 0.7.

[Fourth Step] Compare raw input for the TensorFlow and MXNet models
I compared the input tensors for TensorFlow and MXNet and found that they differ by up to 0.17.

[Fifth Step] Mock out the augmenter
I resized all images to 412 * 412 beforehand to take the TF image augmenter's resize out of the picture, and observed perfectly aligned results across TF and MXNet.

[Sixth Step] TF's bilinear resize is the culprit
I found existing issue reports that TensorFlow's bilinear resize does not match other open source implementations such as cv2 and MXNet.
So I replaced the TF resize with cv2.
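The PR does not say which aspect of TensorFlow's bilinear resize differs. One commonly reported cause is the sampling-coordinate convention: TF 1.x's `tf.image.resize_bilinear` with default arguments maps output index `i` to `i * scale`, while OpenCV aligns pixel centers with `(i + 0.5) * scale - 0.5`. The 1-D sketch below only illustrates how far these two conventions can diverge; treating this as the discrepancy at play here is an assumption, and `resize_linear_1d` is a hypothetical helper, not a library function.

```python
def resize_linear_1d(row, out_len, half_pixel_centers):
    """Linearly resize a 1-D list of samples to out_len samples."""
    in_len = len(row)
    scale = in_len / out_len
    out = []
    for i in range(out_len):
        if half_pixel_centers:
            src = (i + 0.5) * scale - 0.5  # align pixel centers
        else:
            src = i * scale                # align top-left corners
        src = min(max(src, 0.0), in_len - 1)  # clamp to the valid index range
        lo = int(src)
        hi = min(lo + 1, in_len - 1)
        frac = src - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


row = [0.0, 1.0, 2.0, 3.0]
centered = resize_linear_1d(row, 2, half_pixel_centers=True)
cornered = resize_linear_1d(row, 2, half_pixel_centers=False)
print(centered, cornered)
```

On this four-sample ramp, downscaling by two gives values that differ by half a sample step between the two conventions, which is the kind of input-level shift that can then propagate through the network.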

Now the predictions finally match closely and the mAP is identical!
GPU:

{'confidence': 0.6841502785682678, 'type': 'rectangle', 'coordinates': {'y': 167.98015236854553, 'x': 289.0607535839081, 'height': 238.21115112304688, 'width': 301.1848449707031}, 'label': 'Croissant'}
{'confidence': 0.5636424422264099, 'type': 'rectangle', 'coordinates': {'y': 168.73490810394287, 'x': 351.99418142437935, 'height': 183.61944580078125, 'width': 161.6607208251953}, 'label': 'Croissant'}
{'confidence': 0.5182375311851501, 'type': 'rectangle', 'coordinates': {'y': 101.34550631046295, 'x': 123.91261160373688, 'height': 176.6996612548828, 'width': 241.56607055664062}, 'label': 'Croissant'}
{'confidence': 0.4028134346008301, 'type': 'rectangle', 'coordinates': {'y': 210.60317158699036, 'x': 199.7049316763878, 'height': 175.12136840820312, 'width': 237.15415954589844}, 'label': 'Croissant'}
{'confidence': 0.1619817614555359, 'type': 'rectangle', 'coordinates': {'y': 106.30784332752228, 'x': 233.0048382282257, 'height': 219.98683166503906, 'width': 288.35015869140625}, 'label': 'Croissant'}
{'confidence': 0.10118640214204788, 'type': 'rectangle', 'coordinates': {'y': 213.20754289627075, 'x': 361.471713334322, 'height': 101.99461364746094, 'width': 127.2068862915039}, 'label': 'Croissant'}
{'confidence': 0.027622569352388382, 'type': 'rectangle', 'coordinates': {'y': 65.32773971557617, 'x': 268.4330843389034, 'height': 119.54254913330078, 'width': 164.8693084716797}, 'label': 'Croissant'}
{'confidence': 0.010357137769460678, 'type': 'rectangle', 'coordinates': {'y': 84.39711928367615, 'x': 118.78975331783295, 'height': 107.74749755859375, 'width': 172.99261474609375}, 'label': 'Croissant'}
{'confidence': 0.00467892037704587, 'type': 'rectangle', 'coordinates': {'y': 90.58223068714142, 'x': 315.67183434963226, 'height': 175.4208984375, 'width': 256.6126708984375}, 'label': 'Croissant'}
{'average_precision': {'Coffee': 0.0, 'Croissant': 0.23888888955116272, 'Waffle': 0.0, 'Bagel': 0.0, 'Egg': 0.0, 'Banana': 0.0}}

CPU:

{'confidence': 0.6868035197257996, 'type': 'rectangle', 'coordinates': {'y': 167.95871257781982, 'x': 289.0740305185318, 'height': 238.32237243652344, 'width': 301.1378479003906}, 'label': 'Croissant'}
{'confidence': 0.5671409368515015, 'type': 'rectangle', 'coordinates': {'y': 168.71926188468933, 'x': 351.95813924074173, 'height': 183.58172607421875, 'width': 161.74087524414062}, 'label': 'Croissant'}
{'confidence': 0.5187163352966309, 'type': 'rectangle', 'coordinates': {'y': 101.33549273014069, 'x': 123.91094863414764, 'height': 176.7119598388672, 'width': 241.39181518554688}, 'label': 'Croissant'}
{'confidence': 0.4052538275718689, 'type': 'rectangle', 'coordinates': {'y': 210.60553193092346, 'x': 199.70719814300537, 'height': 175.0858612060547, 'width': 237.0017547607422}, 'label': 'Croissant'}
{'confidence': 0.1624731868505478, 'type': 'rectangle', 'coordinates': {'y': 106.30392730236053, 'x': 233.04037749767303, 'height': 220.01197814941406, 'width': 288.4682312011719}, 'label': 'Croissant'}
{'confidence': 0.10168814659118652, 'type': 'rectangle', 'coordinates': {'y': 213.19506168365479, 'x': 361.4436239004135, 'height': 101.99732971191406, 'width': 127.12554931640625}, 'label': 'Croissant'}
{'confidence': 0.027944037690758705, 'type': 'rectangle', 'coordinates': {'y': 65.33297896385193, 'x': 268.4473067522049, 'height': 119.56062316894531, 'width': 164.90599060058594}, 'label': 'Croissant'}
{'confidence': 0.010488401167094707, 'type': 'rectangle', 'coordinates': {'y': 84.40530896186829, 'x': 118.79546642303467, 'height': 107.73643493652344, 'width': 173.0646209716797}, 'label': 'Croissant'}
{'confidence': 0.004655329044908285, 'type': 'rectangle', 'coordinates': {'y': 90.5926376581192, 'x': 315.64265191555023, 'height': 175.38531494140625, 'width': 256.51568603515625}, 'label': 'Croissant'}
{'average_precision': {'Coffee': 0.0, 'Croissant': 0.23888888955116272, 'Waffle': 0.0, 'Bagel': 0.0, 'Egg': 0.0, 'Banana': 0.0}}

@jakesabathia2 jakesabathia2 changed the title from "change tf resize augmenter." to "[OD] 6.0 inference regression between CPU and GPU." on Jan 24, 2020
@guihao-liang
Collaborator

OMG! I'm impressed by your effort to lock this down!

One fundamental question, if the input resizing produces different results, the output should be different between 5.8 and 6.0. Did we do any functional comparison between 5.8 and 6.0?

@@ -8,6 +8,7 @@
from __future__ import absolute_import as _

import numpy as np
import cv2
Collaborator

I may be wrong, but I'm not sure we have taken a dependency on OpenCV. Is there a compelling reason for us to take a dependency on OpenCV here?

Collaborator Author

Yes.
TensorFlow's resize function is not aligned with MPS's and MXNet's,
which causes the inference regression shown in the summary.
NumPy itself doesn't have an image resize method,
which is why I'm using cv2's resize method for now;
it is consistent with MPS and MXNet.

Collaborator

We would need to get legal approval to depend on cv2, and it's a huge dependency to pull in if it's needed only for image resizing. We already resize images in many other places (see, e.g., the image_deep_feature_extraction code path).

If there is a way to use PIL or one of our C++ image resizing utilities instead, we should do that.

Collaborator Author

@hoytak Thanks for this suggestion!

@jakesabathia2
Collaborator Author

OMG! I'm impressed by your effort to lock this down!

One fundamental question, if the input resizing produces different results, the output should be different between 5.8 and 6.0. Did we do any functional comparison between 5.8 and 6.0?

I guess this should be covered by the benchmark's backward-compatibility check.
But we only look at the evaluation result (mAP) over the whole dataset, so we might miss this regression, since averaging over the dataset can hide per-image differences.
So yes, I think it is a great suggestion to compare predictions between 5.8 and 6.0 for each image in our benchmark pipeline @TobyRoseman.
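The point about averaging can be made concrete: per-image shifts in opposite directions cancel in a dataset-level mean. A minimal sketch with made-up per-image scores (not benchmark data); `flag_regressions`, the image ids, and the 0.01 tolerance are all hypothetical.

```python
def mean(scores):
    """Average of a dict's values."""
    return sum(scores.values()) / len(scores)


def flag_regressions(old_scores, new_scores, tol=0.01):
    """Return image ids whose score moved by more than tol between versions."""
    return sorted(
        img for img in old_scores
        if img in new_scores and abs(old_scores[img] - new_scores[img]) > tol
    )


# Made-up per-image scores for two releases, for illustration only.
v58 = {'img_a': 0.24, 'img_b': 0.60, 'img_c': 0.90}
v60 = {'img_a': 0.28, 'img_b': 0.56, 'img_c': 0.90}

dataset_gap = abs(mean(v58) - mean(v60))
flagged = flag_regressions(v58, v60)
print(dataset_gap, flagged)
```

Here the dataset-level averages agree, yet two of the three images are flagged, which is the situation a per-image comparison in the benchmark pipeline would catch.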

@jakesabathia2
Collaborator Author

jakesabathia2 commented Jan 27, 2020

mAP comparison:

#gpu
{'average_precision_50': {'Coffee': 0.3550470471382141, 'Croissant': 0.560469388961792, 'Waffle': 0.6576665043830872, 'Bagel': 0.6644784808158875, 'Egg': 0.6086002588272095, 'Banana': 0.9863040447235107}}

#Opencv
{'average_precision_50': {'Coffee': 0.3550470471382141, 'Croissant': 0.5609853863716125, 'Waffle': 0.6552671790122986, 'Bagel': 0.665098249912262, 'Egg': 0.6083458662033081, 'Banana': 0.9863040447235107}}

#Turicreate built-in resize
{'average_precision_50': {'Coffee': 0.3550470471382141, 'Croissant': 0.5899662971496582, 'Waffle': 0.6519489288330078, 'Bagel': 0.6636723279953003, 'Egg': 0.6076997518539429, 'Banana': 0.9895591139793396}}

#Tensorflow
{'average_precision_50': {'Coffee': 0.3550470471382141, 'Croissant': 0.5936917066574097, 'Waffle': 0.6564058661460876, 'Bagel': 0.6663496494293213, 'Egg': 0.5886093974113464, 'Banana': 0.9898990392684937}}

#PIL
{'average_precision_50': {'Coffee': 0.3774999976158142, 'Croissant': 0.5606743693351746, 'Waffle': 0.6497138142585754, 'Bagel': 0.6666355729103088, 'Egg': 0.5981951355934143, 'Banana': 0.98731929063797}}

@nickjong @srikris It seems that, if we are not using OpenCV, the built-in image_resize function in turicreate produces the closest result.
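One way to check "closest" against the numbers above is to rank each resize backend by its mean absolute per-class difference from the GPU result. This is only a sketch: the metric is my choice, not something stated in the thread, and `mean_abs_diff` is a hypothetical helper. The values are copied from the comparison above, in the order Coffee, Croissant, Waffle, Bagel, Egg, Banana.

```python
# Per-class average_precision_50 values copied from the comparison above.
gpu = [0.3550470471382141, 0.560469388961792, 0.6576665043830872,
       0.6644784808158875, 0.6086002588272095, 0.9863040447235107]

candidates = {
    'opencv': [0.3550470471382141, 0.5609853863716125, 0.6552671790122986,
               0.665098249912262, 0.6083458662033081, 0.9863040447235107],
    'turicreate': [0.3550470471382141, 0.5899662971496582, 0.6519489288330078,
                   0.6636723279953003, 0.6076997518539429, 0.9895591139793396],
    'tensorflow': [0.3550470471382141, 0.5936917066574097, 0.6564058661460876,
                   0.6663496494293213, 0.5886093974113464, 0.9898990392684937],
    'pil': [0.3774999976158142, 0.5606743693351746, 0.6497138142585754,
            0.6666355729103088, 0.5981951355934143, 0.98731929063797],
}


def mean_abs_diff(a, b):
    """Mean absolute per-class difference between two AP lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Backends sorted from closest to farthest from the GPU result.
ranking = sorted(candidates, key=lambda name: mean_abs_diff(candidates[name], gpu))
for name in ranking:
    print(name, mean_abs_diff(candidates[name], gpu))
```

By this metric OpenCV comes first and the Turi Create built-in resize second, consistent with the comment above. A different metric, such as the largest per-class difference, need not give the same order.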

np_img /= 255.
return np_img

def resize_turicretae_image(image, output_shape):
Collaborator

*turicreate

Even then, I don't love the name of this function. But in the long run, if this approach proves stable and accurate, we should move this resizing to the C++ side anyway. There's no point in calling the C++ resizing code from Python from C++ once we converge on the right algorithm.

@jakesabathia2
Collaborator Author

Passes on GitLab.

@@ -8,6 +8,8 @@
from __future__ import absolute_import as _

import numpy as np
from PIL import Image
import PIL
Collaborator

Why do we need to import PIL? Would the line above this one suffice?

Collaborator Author

good!

@jakesabathia2 jakesabathia2 merged commit a3b38c6 into apple:master Jan 28, 2020
@jakesabathia2 jakesabathia2 deleted the od_eval_regression branch January 28, 2020 22:06
Development

Successfully merging this pull request may close these issues.

Numerical differences in OD evaluation (in 5.8 and 6.0) on CPU and GPU
5 participants