
incorrect tiny_yolov2 predictions #6730

Closed
sjaiswal25 opened this issue Nov 17, 2018 · 16 comments
Labels
Bug (Bugs and problems) · DL4J (General DeepLearning4j issues)

Comments

@sjaiswal25

sjaiswal25 commented Nov 17, 2018

I have trained a custom pedestrian-detection tiny_yolov2 model following the instructions at https://github.com/AlexeyAB/darknet/tree/47c7af1cea5bbdedf1184963355e6418cb8b1b4f#how-to-train-pascal-voc-data
The model configuration file can be found here: https://gist.github.com/sjaiswal25/df6c1b6b0a87f3cab8e663d22c8fb771
The weights were then converted to Keras weights using https://github.com/allanzelener/YAD2K.
I used the Keras weights to create a DL4J .zip model with the code here: https://gist.github.com/sjaiswal25/fb3c0eb1fcb425856dfa4c5b479b4c28
When I try to use the converted model with the parallel inference framework, as in the code here: https://gist.github.com/sjaiswal25/06ab93868542212d0c07e8a9e8e264ed,
the output of the getPredictedObjects method is wrong: DetectedObject(exampleNumber=0, centerX=0.8525803685188293, centerY=4.001579834730364, width=0.6479467153549194, height=89.76138305664062, ...).
Since the input to the model is a 224x224 image, the output predictions have dimensions 7x7x30 (a 7x7 grid, with 5 boxes × (4 box coordinates + 1 objectness score + 1 class) = 30 channels for a single class). The box coordinates are predicted in grid units, so the height cannot be greater than 7.
What could be the cause of this behaviour? I have tried the same model weights with Python and there are no issues there, so I guess I am missing something.
Note also that if I use the model with the ComputationGraph.outputSingle method followed by YoloUtils.getPredictedObjects on a single image, the problem does not occur and the results are absolutely fine.
The link to my trained weights: https://drive.google.com/file/d/1qkOwZccVFeZ73jvYVVBPcbnadlX77vSg/view?usp=sharing
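
For reference, a minimal sketch of the failing path, assuming the standard DL4J 1.0.0-beta3 APIs (the real code is in the gists above; the priors, thresholds, the frame variable, and the prepareImage helper are placeholders, not the exact values used):

    import java.io.File;
    import java.util.List;
    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.deeplearning4j.nn.layers.objdetect.DetectedObject;
    import org.deeplearning4j.nn.layers.objdetect.YoloUtils;
    import org.deeplearning4j.parallelism.ParallelInference;
    import org.deeplearning4j.parallelism.inference.InferenceMode;
    import org.deeplearning4j.util.ModelSerializer;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // Load the imported model and wrap it for multi-threaded inference
    ComputationGraph model = ModelSerializer.restoreComputationGraph(new File("tiny_224.zip"));
    ParallelInference parallelInference = new ParallelInference.Builder(model)
            .inferenceMode(InferenceMode.BATCHED)
            .workers(2)
            .build();

    INDArray input = prepareImage(frame, 224, 224);     // [1, 3, 224, 224], normalized
    INDArray output = parallelInference.output(input);  // [1, 30, 7, 7]

    // Decode the 7x7 grid; centerX/centerY/width/height are returned in grid
    // units, so each should lie within [0, 7] for this network
    INDArray priors = Nd4j.create(new double[][]{{2, 2}, {4, 4}, {6, 6}, {8, 8}, {10, 10}});
    List<DetectedObject> detections = YoloUtils.getPredictedObjects(priors, output, 0.5, 0.4);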

@agibsonccc
Contributor

cc @maxpumperla @saudet

@maxpumperla
Contributor

@sjaiswal25

Note also that if I use the model with the ComputationGraph.outputSingle method followed by YoloUtils.getPredictedObjects on a single image, the problem does not occur and the results are absolutely fine.

Do I understand correctly that you are using your imported model here, too? Just want to narrow down the source of the error.

@sjaiswal25
Author

Yes

@maxpumperla
Contributor

@sjaiswal25 getPredictedObjects expects input of shape [minibatch, boxes * (5 + classes), height, width]. Could you tell us what comes out of the ParallelInference call in your gist:

                INDArray result = parallelInference.output(prepareImage(v[0],224,224));

How does that compare to your single example? If you build a mini-batch of examples from your working example, does it suddenly fail?
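
One quick way to compare the two paths side by side, as a sketch (parallelInference, model, prepareImage, and v are the user's own objects from the gists above):

    import java.util.Arrays;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.ops.transforms.Transforms;

    INDArray in = prepareImage(v[0], 224, 224);
    INDArray piOut = parallelInference.output(in);
    INDArray cgOut = model.outputSingle(in);
    System.out.println("PI shape: " + Arrays.toString(piOut.shape()));
    System.out.println("CG shape: " + Arrays.toString(cgOut.shape()));
    // If the raw activations match, the decoding step is the suspect
    System.out.println("max abs diff: " + Transforms.abs(piOut.sub(cgOut)).maxNumber());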

@sjaiswal25
Author

sjaiswal25 commented Nov 17, 2018

@maxpumperla I have not checked that. I will check and get back to you.

@sjaiswal25
Author

sjaiswal25 commented Nov 19, 2018

@maxpumperla Following are the outputs you asked for:
Parallel inference output: Order: c, Shape: [1,30,7,7], stride: [1470,49,7,1]
Single example output: Order: c, Shape: [1,30,7,7], stride: [1470,49,7,1]

Also, if I give getPredictedObjects a minibatch of examples, it does not fail.

@maxpumperla
Contributor

@sjaiswal25 thanks, I'll need to look into this more closely. @saudet have you encountered this, or do you have a clue what might be going on? You've worked more closely on the actual YOLO implementation.

@AlexDBlack
Contributor

Also, if I give getPredictedObjects a minibatch of examples, it does not fail.

This stands out as an important detail.
Maybe there's some sort of ND4J indexing edge case with leading 1s in the shape? If so, it probably isn't related to Keras import.
I'll try to take a look at this today.

@AlexDBlack AlexDBlack added DL4J General DeepLearning4j issues Bug Bugs and problems labels Nov 20, 2018
@AlexDBlack
Contributor

So I started to look at this, and I can't work out how to run your code using the provided trained weights. I assume these are the original darknet weights?
If so, can you provide the DL4J (.zip) model? That would make this much quicker/easier for me to run (I'm assuming it isn't an issue specific to import).

@sjaiswal25
Author

@AlexDBlack I've attached the model links below. Please have a look.
tiny_224.zip
tiny_416.zip

@AlexDBlack
Contributor

@sjaiswal25 thanks. I'll take a look and will post here once I've worked out what's happening.

@AlexDBlack AlexDBlack self-assigned this Nov 23, 2018
@AlexDBlack
Contributor

@sjaiswal25 so I still can't make any progress debugging this.
I can't get either of the provided networks to predict anything at all on the VOC dataset (or at least on the first few thousand images, using the threshold you have in your code),
though the output activations are the same for the standard ComputationGraph and ParallelInference in all cases.
Here's what I've attempted to use: https://gist.github.com/AlexDBlack/a98a50ce1e2db323e0dcc4d8a3a9ea30

Can you please provide a complete example I can run (perhaps adapted from my gist above if that helps) along with at least one test image/file to reproduce this?
Trying to piece something together to reproduce the problem like this is... inefficient, to put it mildly.

Also post your DL4J/ND4J version, backend used, and GPU (if applicable).

@sjaiswal25
Author

sjaiswal25 commented Nov 23, 2018

@AlexDBlack The network is trained for pedestrian detection only. I will also share a link to one of the videos we have tested it on. Please find the link below:
https://drive.google.com/file/d/1wRsnQ2jucqpNcZRfNHzINYJ0qfvQ-8Dv/view?usp=sharing (the code in the above link is adapted and modified from a GitHub project: https://github.com/PacktPublishing/Java-Deep-Learning-Projects/tree/master/Chapter06).

The same model was tested on a single image with the code below:
https://drive.google.com/file/d/19Qcbxx3DE1Hud2ZRDA6QyvTrBhR-NrBW/view?usp=sharing
(the code in the above link is adapted and modified from a GitHub project: https://github.com/jesuino/java-ml-projects/tree/master/utilities/yolo-dl4j)

The GPU being used is an NVIDIA GTX 1050 Ti.

@AlexDBlack
Contributor

@sjaiswal25 OK, I was finally able to reproduce this problem on 1.0.0-beta3 using a modified version of your code/models/data.
Of note, it's limited to CUDA + ParallelInference only, and is rare/intermittent/stochastic.
I have a pretty good idea why (CUDA multithreading issues); I'm testing a solution now.

@AlexDBlack
Contributor

@sjaiswal25 So, it wasn't exactly what I thought, but it is fixed here: #6774

A workaround that should work for 1.0.0-beta3 is to add an Nd4j.getExecutioner().commit() call before you pass data to the ParallelInference instance (for example, add that as the final line of your prepareImage method).

As for what was happening here: in simple terms, it was a race condition between the asynchronous CUDA execution of the normalization (CUDA op execution is tied to specific threads) and passing the input array between threads. That's why it occurred with ParallelInference + CUDA only.
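
A minimal sketch of that workaround, assuming a prepareImage along the lines of the gists (the NativeImageLoader/ImagePreProcessingScaler setup is an assumption about the user's helper, not its exact contents):

    import java.io.File;
    import org.datavec.image.loader.NativeImageLoader;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler;
    import org.nd4j.linalg.factory.Nd4j;

    static INDArray prepareImage(File file, int width, int height) throws Exception {
        NativeImageLoader loader = new NativeImageLoader(height, width, 3);
        INDArray image = loader.asMatrix(file);              // NCHW: [1, 3, height, width]
        new ImagePreProcessingScaler(0, 1).transform(image); // scale pixels to [0, 1]
        // Force pending asynchronous CUDA ops (the normalization above) to
        // complete on this thread before the array is handed to a
        // ParallelInference worker thread
        Nd4j.getExecutioner().commit();
        return image;
    }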

@lock

lock bot commented Dec 28, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Dec 28, 2018