Cannot load MXNet-trained model #189
Comments
Hi, can you share the model or the code used to obtain it? I can try to reproduce the issue.
The script the model was trained with: train_yolo3.py
The trained model: model-0000.params
The JSON file of the model: model-symbol.json
@thhart Do you know which MXNet version (pip package) you used to train the model? Is it MXNet 1.7 or MXNet 1.5 (or lower)?
```
Name: mxnet-cu102
Name: gluoncv
```
After trying to load this model in Python, I got the following issue:

It seems some of the layers' weights were not saved. There is something you can try; follow these steps:

- To save your model in a Symbol-compatible way
- To reproduce the above Python issue

If the problem persists, I guess we have to dig through the block to see which part may not be hybridized.
Cool, thanks for checking. I was under the assumption that the net was already hybridized, but in fact it wasn't. So I converted it now and it is loading; sorry for not checking carefully. However, now I run into another problem: the layers are not fed correctly. I have changed the criteria like this:

```java
Criteria<Image, DetectedObjects> criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class) // defines input and output data type
        .optDevice(Device.cpu())
        .optTranslator(new YoloTranslator(
                YoloTranslator.builder()
                        .optSynsetArtifactName("synset.txt")
                        .setPipeline(new Pipeline())))
        .optModelUrls("file:///tmp") // search for models in the specified path
        .optModelName("yolo3_darknet53")
        .build();
```

This is the error I receive:

```
Exception in thread "main" ai.djl.engine.EngineException: MXNet engine call failed: MXNetError: Error in operator darknetv30_conv0_fwd: Shape inconsistent, Provided = [32,3,3,3], inferred shape=(32,608,3,3)
	at ai.djl.mxnet.jna.JnaUtils.checkCall(JnaUtils.java:1808)
	at ai.djl.mxnet.jna.JnaUtils.cachedOpInvoke(JnaUtils.java:1785)
	at ai.djl.mxnet.engine.CachedOp.forward(CachedOp.java:135)
	at ai.djl.mxnet.engine.MxSymbolBlock.forward(MxSymbolBlock.java:178)
	at ai.djl.nn.Block.forward(Block.java:117)
	at ai.djl.inference.Predictor.predict(Predictor.java:117)
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:157)
	at ai.djl.inference.Predictor.predict(Predictor.java:112)
	at com.itth.okra.axle.AxleDetectorMxnetDjl.<init>(AxleDetectorMxnetDjl.java:38)
	at com.itth.okra.axle.AxleDetectorMxnetDjl.main(AxleDetectorMxnetDjl.java:48)
```

This is how the image (608x608) is loaded:

```java
final ZooModel<Image, DetectedObjects> model = ModelZoo.loadModel(criteria);
Predictor<Image, DetectedObjects> predictor = model.newPredictor();
final File input = new File("/tmp/sample.jpg");
BufferedImageFactory factory = new BufferedImageFactory();
DetectedObjects detection = predictor.predict(factory.fromFile(input.toPath()));
for (Classification item : detection.items()) {
    System.err.println(item.getClassName() + ": " + item.getProbability());
}
```
I can take a look too; it seems the input shape is not what the YOLO network is looking for. What is the shape of your input? Usually during training we normalize and resize the image. 608 seems to be the upper limit for the model; maybe you can add a Resize() to the pipeline to resize the input to some value between 320 and 608. Can you send me the model files again? I can help investigate in the meantime.
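A minimal sketch of the suggested Resize step (assuming the DJL 0.8 API used elsewhere in this thread, and assuming 608 matches the size the model was exported with):

```java
import ai.djl.modality.cv.transform.Resize;
import ai.djl.translate.Pipeline;

public class ResizePipeline {
    // Builds a preprocessing pipeline that scales the input image to
    // size x size pixels; pass the export-time size (608 is an assumption).
    static Pipeline build(int size) {
        return new Pipeline().add(new Resize(size, size));
    }
}
```

This pipeline would then be passed via `.setPipeline(ResizePipeline.build(608))` on the translator builder instead of the empty `new Pipeline()`.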
BTW, we do have some pretrained GluonCV YOLO models in DJL; you may want to take a look. One trick we use for hybridization: before the export, we do a forward pass with a dummy image (`nd.ones((1, 3, size, size))`) and use that size as the standard image size to feed. I assume you are doing something similar.
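The hybridize-then-export flow described above can be sketched as follows. This is a minimal sketch, not the exact export script; the model name, class list, and checkpoint filename are assumptions, and `mxnet`/`gluoncv` are imported lazily so the function can be loaded without them installed:

```python
def export_hybridized(model_name="yolo3_darknet53_custom",
                      classes=("object",), size=608, epoch=0):
    """Hybridize a GluonCV YOLOv3 net, run a dummy forward pass, and
    export model-symbol.json / model-0000.params for loading from DJL."""
    # Lazy imports: the sketch can be inspected without mxnet installed.
    import mxnet as mx
    from gluoncv import model_zoo

    net = model_zoo.get_model(model_name, classes=list(classes),
                              pretrained_base=False)
    net.load_parameters("model-0000.params")  # assumed checkpoint name
    net.hybridize()                           # switch to the symbolic graph
    net(mx.nd.ones((1, 3, size, size)))       # dummy forward builds the graph
    net.export(model_name, epoch=epoch)       # writes -symbol.json / -xxxx.params
```

The `size` used for the dummy forward is the size inputs should later be resized to at inference time.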
You can also find the script that was used for the model zoo yolo models at https://github.com/awslabs/djl/blob/master/mxnet/mxnet-model-zoo/src/main/scripts/exportYolo.py |
Hi Lanking, sorry for the late answer, but in the meantime I checked an alternative approach with YOLOv5. With this I achieved inference over an ONNX bridge (1.5.1).
@thhart Hi Thomas, we do have ONNX Runtime support: http://docs.djl.ai/onnxruntime/onnxruntime-engine/index.html Could you please try it out? It should work with the majority of ONNX models designed for deep learning. Thanks!
Using the following code produces the error below; maybe a simple input encoding problem? Any hint? The YoloTranslator looks the same as in my solution, by the way, but I did not get that far yet...
```
Exception in thread "main" ai.djl.engine.EngineException: ai.onnxruntime.OrtException: Error code - ORT_INVALID_ARGUMENT - message: Unexpected input data type. Actual: (N11onnxruntime17PrimitiveDataTypeIaEE) , expected: (N11onnxruntime17PrimitiveDataTypeIfEE)
	at ai.djl.onnxruntime.engine.OrtSymbolBlock.forward(OrtSymbolBlock.java:102)
	at ai.djl.nn.Block.forward(Block.java:117)
	at ai.djl.inference.Predictor.predict(Predictor.java:117)
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:157)
	at ai.djl.inference.Predictor.predict(Predictor.java:112)
	at com.itth.okra.axle.AxleDetectorOnnxDjl.<init>(AxleDetectorOnnxDjl.java:34)
	at com.itth.okra.axle.AxleDetectorOnnxDjl.main(AxleDetectorOnnxDjl.java:44)
Caused by: ai.onnxruntime.OrtException: Error code - ORT_INVALID_ARGUMENT - message: Unexpected input data type. Actual: (N11onnxruntime17PrimitiveDataTypeIaEE) , expected: (N11onnxruntime17PrimitiveDataTypeIfEE)
	at ai.onnxruntime.OrtSession.run(Native Method)
	at ai.onnxruntime.OrtSession.run(OrtSession.java:288)
	at ai.onnxruntime.OrtSession.run(OrtSession.java:231)
	at ai.onnxruntime.OrtSession.run(OrtSession.java:200)
	at ai.djl.onnxruntime.engine.OrtSymbolBlock.forward(OrtSymbolBlock.java:99)
	... 6 more
```
I should add that I feed the model with normalized floats.
I haven't seen this before
I will take a look. [Update] The data type might be the pain point. Try converting the data to float32.
Sure, but how do I feed the data or influence the conversion when using your framework's image input chain as above?
Looks like BaseImageTranslator is feeding INT by default; maybe it is worth checking whether that can be overridden...
@thhart You can use the pipeline to add a ToTensor transform, like we tried here: #238 (comment)
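A minimal sketch of that fix (assuming the DJL 0.8 API shown elsewhere in this thread): ToTensor converts the uint8 HWC image into a float32 CHW tensor scaled to [0, 1], which matches the float input the ONNX model expects and should resolve the `ORT_INVALID_ARGUMENT` data type error above.

```java
import ai.djl.modality.cv.transform.ToTensor;
import ai.djl.translate.Pipeline;

public class ToTensorPipeline {
    // ToTensor converts HWC uint8 image data to CHW float32 in [0, 1],
    // the input type the ONNX runtime complained about.
    static Pipeline build() {
        return new Pipeline().add(new ToTensor());
    }
}
```

It would be wired in via `.setPipeline(ToTensorPipeline.build())` on the translator builder.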
Got it working. I still need some NMS (non-maximum suppression); is there something already available in DJL for DetectedObjects?
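For reference, greedy IoU-based NMS itself is straightforward to do by hand; below is a minimal, framework-free sketch (boxes as `[x1, y1, x2, y2]`; the method name and the choice of a plain-array representation are illustrative, not a DJL API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

public class Nms {
    /** Greedy non-maximum suppression; returns indices of kept boxes. */
    static List<Integer> nms(double[][] boxes, double[] scores, double iouThreshold) {
        // Visit candidate indices in order of descending score.
        List<Integer> order = new ArrayList<>();
        IntStream.range(0, scores.length).forEach(order::add);
        order.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());

        List<Integer> keep = new ArrayList<>();
        boolean[] suppressed = new boolean[scores.length];
        for (int i : order) {
            if (suppressed[i]) continue;
            keep.add(i);
            for (int j : order) {
                if (j != i && !suppressed[j] && iou(boxes[i], boxes[j]) > iouThreshold) {
                    suppressed[j] = true; // overlaps too much with a better box
                }
            }
        }
        return keep;
    }

    /** Intersection-over-union of two [x1, y1, x2, y2] boxes. */
    static double iou(double[] a, double[] b) {
        double ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
        double iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
        double inter = ix * iy;
        double areaA = (a[2] - a[0]) * (a[3] - a[1]);
        double areaB = (b[2] - b[0]) * (b[3] - b[1]);
        return inter / (areaA + areaB - inter);
    }
}
```

An IoU threshold around 0.45-0.5 is a common starting point for YOLO-style detectors.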
Please check the following PR: Tested successfully with a custom-trained network.
Description
Cannot load a pre-trained YOLO model from MXNet. I have a params file and a symbol.json. MxModel seems to fail to handle the params file. If interested, I might be able to share the model privately on request.
The model was trained in an mxnet/gluoncv Python environment.
Debugging the code, I can see the parameter key is `stages.0.0.0.weight`, which is supposed to be split by ":", and this obviously fails.
Error Message
How to Reproduce?
I try to load the model with the following code:
Environment Info
djl: 0.8.0
mxnet-engine: 0.8.0
mxnet-native-mkl: 1.7.0