
incorrect tiny_yolov2 predictions #6730

Closed
sjaiswal25 opened this issue Nov 17, 2018 · 16 comments
Labels
Bug (Bugs and problems) · DL4J (General DeepLearning4j issues)

Comments

@sjaiswal25

sjaiswal25 commented Nov 17, 2018

I have trained a custom pedestrian-detection tiny_yolov2 model following the instructions at https://github.com/AlexeyAB/darknet/tree/47c7af1cea5bbdedf1184963355e6418cb8b1b4f#how-to-train-pascal-voc-data
The model configuration file can be found here: https://gist.github.com/sjaiswal25/df6c1b6b0a87f3cab8e663d22c8fb771
The weights were then converted to Keras weights using https://github.com/allanzelener/YAD2K.
I used the Keras weights to create a DL4J .zip model with the code here: https://gist.github.com/sjaiswal25/fb3c0eb1fcb425856dfa4c5b479b4c28
When I try to use the converted model with the parallel inference framework, as in the code here: https://gist.github.com/sjaiswal25/06ab93868542212d0c07e8a9e8e264ed,
the output of the getPredictedObjects method is wrong: DetectedObject(exampleNumber=0, centerX=0.8525803685188293, centerY=4.001579834730364, width=0.6479467153549194, height=89.76138305664062, ...).
Since the input to the model is a 224x224 image, the output predictions have dimensions 7x7x30 (a 7x7 grid, with 5 boxes × (4 box coordinates + 1 objectness score + 1 class) = 30 channels for a single class). The box coordinates are predicted in grid units, so the height cannot be greater than 7.
What could be the cause of this behaviour? I have tried the same model weights with Python and there are no issues there, so I guess I am missing something.
Note also that if I use the model with the ComputationGraph.outputSingle method followed by YoloUtils.getPredictedObjects on a single image, the problem does not occur and the results are absolutely fine.
The link to my trained weights: https://drive.google.com/file/d/1qkOwZccVFeZ73jvYVVBPcbnadlX77vSg/view?usp=sharing
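
For reference, a minimal sketch of the failing path, assuming the standard DL4J 1.0.0-beta3 APIs (the real code is in the gists above; the priors, thresholds, the frame variable, and the prepareImage helper are placeholders, not the exact values used):

    import java.io.File;
    import java.util.List;
    import org.deeplearning4j.nn.graph.ComputationGraph;
    import org.deeplearning4j.nn.layers.objdetect.DetectedObject;
    import org.deeplearning4j.nn.layers.objdetect.YoloUtils;
    import org.deeplearning4j.parallelism.ParallelInference;
    import org.deeplearning4j.parallelism.inference.InferenceMode;
    import org.deeplearning4j.util.ModelSerializer;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // Load the imported model and wrap it for multi-threaded inference
    ComputationGraph model = ModelSerializer.restoreComputationGraph(new File("tiny_224.zip"));
    ParallelInference parallelInference = new ParallelInference.Builder(model)
            .inferenceMode(InferenceMode.BATCHED)
            .workers(2)
            .build();

    INDArray input = prepareImage(frame, 224, 224);     // [1, 3, 224, 224], normalized
    INDArray output = parallelInference.output(input);  // [1, 30, 7, 7]

    // Decode the 7x7 grid; centerX/centerY/width/height are returned in grid
    // units, so each should lie within [0, 7] for this network
    INDArray priors = Nd4j.create(new double[][]{{2, 2}, {4, 4}, {6, 6}, {8, 8}, {10, 10}});
    List<DetectedObject> detections = YoloUtils.getPredictedObjects(priors, output, 0.5, 0.4);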

@agibsonccc
Contributor

cc @maxpumperla @saudet

@maxpumperla
Contributor

@sjaiswal25

Note also that if I use the model with the ComputationGraph.outputSingle method followed by YoloUtils.getPredictedObjects on a single image, the problem does not occur and the results are absolutely fine.

Do I understand correctly that you are using your imported model here, too? Just want to narrow down the source of the error.

@sjaiswal25
Author

Yes

@maxpumperla
Contributor

@sjaiswal25 getPredictedObjects expects input of shape [minibatch, boxes * (5 + classes), height, width]. Could you tell us what comes out of the ParallelInference call in your gist:

                INDArray result = parallelInference.output(prepareImage(v[0],224,224));

How does that compare to your single example? If you build a mini-batch of examples from your working example, does it suddenly fail?
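
One quick way to compare the two paths side by side, as a sketch (parallelInference, model, prepareImage, and v are the user's own objects from the gists above):

    import java.util.Arrays;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.ops.transforms.Transforms;

    INDArray in = prepareImage(v[0], 224, 224);
    INDArray piOut = parallelInference.output(in);
    INDArray cgOut = model.outputSingle(in);
    System.out.println("PI shape: " + Arrays.toString(piOut.shape()));
    System.out.println("CG shape: " + Arrays.toString(cgOut.shape()));
    // If the raw activations match, the decoding step is the suspect
    System.out.println("max abs diff: " + Transforms.abs(piOut.sub(cgOut)).maxNumber());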

@sjaiswal25
Author

sjaiswal25 commented Nov 17, 2018

@maxpumperla I have not checked that. I will check and get back to you.

@sjaiswal25
Author

sjaiswal25 commented Nov 19, 2018

@maxpumperla Following are the outputs you asked for:
Parallel inference output: Order: c, Shape: [1,30,7,7], stride: [1470,49,7,1]
Single example output: Order: c, Shape: [1,30,7,7], stride: [1470,49,7,1]

Also, if I give getPredictedObjects a minibatch of examples, it does not fail.

@maxpumperla
Contributor

@sjaiswal25 thanks, I'll need to look into this more closely. @saudet have you encountered this, or do you have a clue what might be going on? You've worked more closely on the actual YOLO implementation.

@AlexDBlack
Contributor

Also, if I give getPredictedObjects a minibatch of examples, it does not fail.

This stands out as an important detail.
Maybe there's some sort of ND4J indexing edge case with leading 1s in the shape? If so, it probably isn't related to Keras import.
I'll try to take a look at this today.

@AlexDBlack AlexDBlack added DL4J General DeepLearning4j issues Bug Bugs and problems labels Nov 20, 2018
@AlexDBlack
Contributor

So I started to look at this, and I can't work out how to run your code using the provided trained weights. I assume these are the original darknet weights?
If so, can you provide the DL4J (.zip) model? That would make this much quicker/easier for me to run (I'm assuming it isn't an issue specific to import).

@sjaiswal25
Author

@AlexDBlack I've attached the model links below. Please have a look.
tiny_224.zip
tiny_416.zip

@AlexDBlack
Contributor

@sjaiswal25 thanks. I'll take a look and will post here once I've worked out what's happening.

@AlexDBlack AlexDBlack self-assigned this Nov 23, 2018
@AlexDBlack
Contributor

@sjaiswal25 so I still can't make any progress debugging this.
I can't get either of the provided networks to predict anything at all on the VOC dataset (or at least on the first few thousand images, using the threshold you have in your code),
though the output activations are the same for the standard ComputationGraph and ParallelInference in all cases.
Here's what I've attempted to use: https://gist.github.com/AlexDBlack/a98a50ce1e2db323e0dcc4d8a3a9ea30

Can you please provide a complete example I can run (perhaps adapted from my gist above if that helps) along with at least one test image/file to reproduce this?
Trying to piece something together to reproduce the problem like this is... inefficient, to put it mildly.

Also post your DL4J/ND4J version, backend used, and GPU (if applicable).

@sjaiswal25
Author

sjaiswal25 commented Nov 23, 2018

@AlexDBlack The network is trained for pedestrian detection only. I will also share a link to one of the videos we have tested it on. Please find the link below:
https://drive.google.com/file/d/1wRsnQ2jucqpNcZRfNHzINYJ0qfvQ-8Dv/view?usp=sharing (the code in the above link is adapted and modified from a GitHub project: https://github.com/PacktPublishing/Java-Deep-Learning-Projects/tree/master/Chapter06).

The same model was tested on a single image with the code below:
https://drive.google.com/file/d/19Qcbxx3DE1Hud2ZRDA6QyvTrBhR-NrBW/view?usp=sharing
(the code in the above link is adapted and modified from a GitHub project: https://github.com/jesuino/java-ml-projects/tree/master/utilities/yolo-dl4j)

The GPU being used is an NVIDIA GTX 1050 Ti.

@AlexDBlack
Contributor

@sjaiswal25 OK, I was finally able to reproduce this problem on 1.0.0-beta3 using a modified version of your code/models/data.
Of note, it's limited to CUDA + ParallelInference only, and is rare/intermittent/stochastic.
I have a pretty good idea why (CUDA multithreading issues); I'm testing a solution now.

@AlexDBlack
Contributor

@sjaiswal25 So, it wasn't exactly what I thought, but it is fixed here: #6774

A workaround that should work for 1.0.0-beta3 is to add an Nd4j.getExecutioner().commit() call before you pass data to the ParallelInference instance (for example, add that as the final line of your prepareImage method).

As for what was happening here: in simple terms, it was a race condition between the asynchronous CUDA execution of the normalization (CUDA op execution is tied to specific threads) and passing the input array between threads. That's why it occurred with ParallelInference + CUDA only.
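
A minimal sketch of that workaround, assuming a prepareImage along the lines of the gists (the NativeImageLoader/ImagePreProcessingScaler setup is an assumption about the user's helper, not its exact contents):

    import java.io.File;
    import org.datavec.image.loader.NativeImageLoader;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler;
    import org.nd4j.linalg.factory.Nd4j;

    static INDArray prepareImage(File file, int width, int height) throws Exception {
        NativeImageLoader loader = new NativeImageLoader(height, width, 3);
        INDArray image = loader.asMatrix(file);              // NCHW: [1, 3, height, width]
        new ImagePreProcessingScaler(0, 1).transform(image); // scale pixels to [0, 1]
        // Force pending asynchronous CUDA ops (the normalization above) to
        // complete on this thread before the array is handed to a
        // ParallelInference worker thread
        Nd4j.getExecutioner().commit();
        return image;
    }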

@lock

lock bot commented Dec 28, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Dec 28, 2018