This repository has been archived by the owner on Nov 28, 2022. It is now read-only.

Divergence in classification between digits and GRE #27

Closed
rperdon opened this issue Dec 20, 2017 · 26 comments

Comments

rperdon commented Dec 20, 2017

When running a DIGITS-exported model in the GRE, I am seeing some divergence in the classification values compared to DIGITS. In some cases, a binary classifier flips completely to the opposite class. When using the DIGITS REST API (not the GRE), I get classifications identical to DIGITS, so I am wondering where the divergence lies.

I altered the input definition at the top of the deploy.prototxt of my exported model from
input: "data"
input_shape {
dim: 1
dim: 3
dim: 227
dim: 227
}

To

name: "AlexNet"
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}

I mimicked the deploy.prototxt in the GRE model folder and altered the shape to match my model.

I saw in a previous issue that there was a problem with varying confidence and am wondering if it is related. #3

flx42 (Member) commented Dec 20, 2017

Are you using the caffe version of GRE?

rperdon (Author) commented Dec 20, 2017

I am using the caffe version of GRE.

flx42 (Member) commented Dec 21, 2017

The issue you mentioned was about varying confidence results for the same image. Is that the case here?

rperdon (Author) commented Dec 21, 2017

It was actually the second post of that thread, about a difference in confidence results, that piqued my interest.

flx42 (Member) commented Dec 21, 2017

Ah yes. It might be the preprocessing that's the culprit. Unfortunately the current code is not very generic in this regard.

rperdon (Author) commented Dec 21, 2017

I'm hoping we can identify the cause of the divergence as the GRE would work perfectly for what I need it to do.

flx42 (Member) commented Dec 21, 2017

Unfortunately, it's challenging to debug since I don't have access to your model. Do you know what DIGITS is using in the pre-processing steps? Crops? Resizes? Augmentation?

rperdon (Author) commented Dec 21, 2017

For the model: subtract mean, no crop option. For the database, the defaults were squash to 256x256 with color image encoding PNG. In the deploy.prototxt, the dimensions are as shown above for the exported model. Let me know if you require more details.

I just finished running a comparison of the false positives of the GRE vs. the DIGITS REST API for the same model.
Sample set of random non-trained images: 6323

DIGITS REST false positives: 101
GRE false positives: 366

If I can get the GRE to match the DIGITS REST API output, it would greatly improve our classification times on our material. Due to limitations of our current image carving tool and the DIGITS REST API, large volumes of calls overwhelm the REST API; the GRE solves this problem for us.
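
Roughly, the preprocessing I have in mind looks something like the sketch below. This is only my own illustration, not code from DIGITS or the GRE, assuming a plain squash resize straight to the 227x227 network input followed by mean subtraction; the file path and function name are hypothetical.

```python
# Hypothetical sketch of the preprocessing being compared; not taken from
# DIGITS or gpu-rest-engine. Assumes BGR uint8 input, a 227x227 AlexNet
# input blob, and mean subtraction as configured in the model above.
import cv2
import numpy as np

def preprocess(image_path, mean, size=(227, 227)):
    img = cv2.imread(image_path)                 # BGR, uint8, H x W x C
    # "Squash": resize without preserving aspect ratio (no crop).
    img = cv2.resize(img, size, interpolation=cv2.INTER_LINEAR)
    img = img.astype(np.float32)
    img -= mean                                  # mean image (227x227x3) or mean pixel (1x1x3)
    return img.transpose(2, 0, 1)                # HWC -> CHW for the Caffe "data" blob
```

The interpolation flag and the mean format (mean image vs. mean pixel) are the knobs I would expect to differ between the two stacks.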

rperdon (Author) commented Dec 28, 2017

Any thoughts on this? I'm hoping Griffin and I can work out a solution with you.

flx42 (Member) commented Dec 28, 2017

If you pre-resize the images to the right size (227x227?) before sending them to GRE, does it change anything?

rperdon (Author) commented Dec 28, 2017

Just toss them into Photoshop and do a bilinear resize to 227x227 for all the images prior to the GRE? In your classification code, do you use OpenCV for that part? I wonder if I can do it there before the image is classified.

flx42 (Member) commented Dec 28, 2017

Yes, I use OpenCV for that.
For now I'm just trying to find out whether the resize is the culprit, so take one of those pre-resized 227x227 images and send it to both the GRE and DIGITS.

rperdon (Author) commented Dec 28, 2017

I can test it out.

rperdon (Author) commented Dec 28, 2017

Image bilinear-resized to 227x227:

DIGITS classify.py:
0: 21.5443%
1: 78.4557%

GRE:
0: 0.5819
1: 0.4181

DIGITS, unaltered image:
0: 21.8740%
1: 78.1260%

GRE, unaltered image:
0: 0.553
1: 0.447

I find it interesting that a pre-resize of the image alters the values in Digits as well.

rperdon (Author) commented Dec 28, 2017

I will try a different approach: since I've worked a lot with the classify.py file from DIGITS, I can try to modify it to see if I can reproduce the same values that the GRE produces.

flx42 (Member) commented Dec 28, 2017

Ok. And it should be pretty straightforward to compare the pre-processing steps from DIGITS and gpu-rest-engine; it's just a few lines of Python/C++ code.
I think the resize is the only piece that might differ (since there are multiple interpolation algorithms). The other steps should produce the same results, but since we still see a divergence, the two stacks are probably not performing the same operations.
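
For example, something like the sketch below (an illustration only, not code from either project; the image name is a placeholder) shows how much the interpolation flag alone can shift pixel values in OpenCV:

```python
# Illustrative only: compares cv2.resize outputs under different
# interpolation flags for the same source image.
import cv2
import numpy as np

img = cv2.imread("sample.png")  # placeholder test image
flags = {
    "nearest": cv2.INTER_NEAREST,
    "bilinear": cv2.INTER_LINEAR,
    "bicubic": cv2.INTER_CUBIC,
    "area": cv2.INTER_AREA,
}
resized = {name: cv2.resize(img, (227, 227), interpolation=flag)
           for name, flag in flags.items()}
ref = resized["bilinear"].astype(np.float32)
for name, out in resized.items():
    diff = np.abs(out.astype(np.float32) - ref).mean()
    print(f"{name:8s} mean abs diff vs bilinear: {diff:.3f}")
```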

rperdon (Author) commented Dec 28, 2017

Griffin mentioned something about mapping values or weights being handled differently with regard to DeepDetect. I'm wondering if something similar is happening here. I'll have to check back with him on what exactly was happening.

laceyg commented Dec 28, 2017

Originally I thought there might be an issue with something like dimension ordering (e.g. NKHW), but that seems unlikely to be the culprit here. I agree with @flx42 that something like resizing may be the problem, e.g. a different interpolation being used.

Let me see if I can reproduce this divergence with a simple example.

flx42 (Member) commented Dec 28, 2017

I think we ruled out the resize now, since @rperdon tried with images that are already at 227x227. So the culprit is probably something else: mean subtraction, cropping, etc.

rperdon (Author) commented Dec 29, 2017

I have experimented further with some other images whose classifications were flipping.

I resized in Photoshop, bilinear to 227x227, saved with baseline compression (the previous test was PNG, no compression).

DIGITS:
0: 95.79%
1: 4.21%

GRE:
0: 0.501
1: 0.499

Unaltered image (1024x1024):
DIGITS:
0: 92.2%
1: 7.8%

GRE:
0: 0.042
1: 0.958

I made a mistake in my DIGITS classification example.py vs. DIGITS REST comparisons: the defaults in example.py are squash and subtract pixel, whereas in DIGITS I was using subtract image. The model that gives the desired results is running from the DIGITS REST API, not example.py, so I will have to modify example.py to match what DIGITS is doing before I can properly compare resizing code changes.

rperdon (Author) commented Dec 29, 2017

NVIDIA/DIGITS#169

I think this may provide key insight into adapting the GRE to work like DIGITS. I'm still wrapping my head around the implications, and hopefully we can sort this out.

Edit: I retrained the model using mean pixel instead of mean image and then reloaded it into the GRE.
The GRE was still consistent in which images it flipped, so mean image vs. mean pixel isn't the missing link.

rperdon (Author) commented Jan 2, 2018

I tested the differences between the resize function of PIL (which the DIGITS classification example uses via SciPy) and OpenCV's resize function:

(min, mean, max):

OpenCV: (15, 160.10562983950786, 254)
SciPy (which wraps PIL's resize): (15, 160.21342674351661, 254)

Noting the discrepancy in the mean value of the resized image, I think this is one part of the divergence problem (see the sketch below for how I compared them). I suspect there are more causes, but it feels like each implementation has its own order of operations for image load, mean subtract (pixel/image), crop, and resize, with no defined standard to follow.
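
The comparison can be reproduced with something like this sketch (my own, with a placeholder image path; PIL's bilinear resize stands in for the deprecated scipy.misc.imresize, which wraps PIL internally). The exact numbers will of course depend on the image.

```python
# Rough reproduction of the min/mean/max comparison above.
# Channel order differs (OpenCV is BGR, PIL is RGB) but that does not
# affect these per-image statistics.
import cv2
import numpy as np
from PIL import Image

path = "sample.png"  # placeholder test image

cv_img = cv2.resize(cv2.imread(path), (227, 227),
                    interpolation=cv2.INTER_LINEAR)
pil_img = np.array(Image.open(path).convert("RGB")
                   .resize((227, 227), Image.BILINEAR))

for name, arr in (("OpenCV", cv_img), ("PIL/SciPy", pil_img)):
    print(name, arr.min(), arr.mean(), arr.max())
```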

I thought this was an interesting read about the hidden operations performed by various matrix resize implementations, in this case OpenCV vs. MATLAB:

https://stackoverflow.com/questions/21997094/why-opencv-cv2-resize-gives-different-answer-than-matlab-imresize

flx42 (Member) commented Jan 2, 2018

I would expect a difference in resize to explain a divergence on the order of 10^-2 or 10^-3, not more.

rperdon (Author) commented Jan 3, 2018

I did some further testing with SciPy, OpenCV, and skimage and found that each produced different results, leading me to believe something like anti-aliasing could be part of the problem.

The DIGITS classification example uses SciPy, whose imresize function is now deprecated:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.imresize.html

I want to read more into OpenCV's and skimage's resize implementations to see if I can get them to generate the same results.

rperdon (Author) commented Jan 4, 2018

Some more reading on the differences.

Short version: SciPy's imresize converts to uint8 and is now deprecated.
Accuracy of 90% with the SciPy resize drops to 60% with the skimage resize and to 53% with the OpenCV resize.
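
A quick sketch of the differing return conventions (illustrative only, placeholder image path; exact behaviour depends on library versions): OpenCV keeps uint8 in 0..255, skimage returns float64 rescaled to 0..1 with optional anti-aliasing, and the deprecated scipy.misc.imresize converted its output back to uint8.

```python
# Illustrates the differing return conventions; not from either codebase.
import cv2
from skimage.transform import resize as sk_resize

img = cv2.imread("sample.png")          # placeholder test image, uint8, 0..255

cv_out = cv2.resize(img, (227, 227), interpolation=cv2.INTER_LINEAR)
print(cv_out.dtype, cv_out.min(), cv_out.max())   # uint8, values stay in 0..255

sk_out = sk_resize(img, (227, 227), anti_aliasing=True)
print(sk_out.dtype, sk_out.min(), sk_out.max())   # float64, values rescaled to 0..1
```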

flx42 (Member) commented Jul 20, 2018

There has been further back and forth over email about this issue, so I will close it.

flx42 closed this as completed on Jul 20, 2018