
Is it possible to compile ONNX models? #59

Closed
davidas1 opened this issue Jan 8, 2020 · 9 comments


davidas1 commented Jan 8, 2020

There are mentions of this capability in some docs, plus a list of supported ops, but there's no example of how to do it in practice.
I tried compiling a simple pretrained ResNet model from https://github.com/onnx/models/ and it failed with:

01/08/2020 12:51:20 PM ERROR [neuron-cc]: ***************************************************************
01/08/2020 12:51:20 PM ERROR [neuron-cc]:  An Internal Compiler Error has occurred
01/08/2020 12:51:20 PM ERROR [neuron-cc]: ***************************************************************
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Please contact Customer Support and provide the following details.
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Error message:  A process in the process pool was terminated abruptly while the future was running or pending.
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Error location: pipeline.compile.0
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Command line:   /home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile --framework ONNX /home/ubuntu/resnet18v1.onnx --output /home/ubuntu/onnx_test/output.neff
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Internal details:
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 207, in neuroncc.driver.Job.runSingleInputFn
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 247, in neuroncc.driver.Job.SingleInputJob.run
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 252, in neuroncc.driver.Job.SingleInputJob.run
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/concurrent/futures/_base.py", line 432, in result
01/08/2020 12:51:20 PM ERROR [neuron-cc]:     return self.__get_result()
01/08/2020 12:51:20 PM ERROR [neuron-cc]:   File "/home/ubuntu/anaconda3/envs/aws_neuron_tensorflow_p36/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
01/08/2020 12:51:20 PM ERROR [neuron-cc]:     raise self._exception
01/08/2020 12:51:20 PM ERROR [neuron-cc]: 
01/08/2020 12:51:20 PM ERROR [neuron-cc]: Version information:
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   Neuron Compiler version 1.0.5939.0+5849551057
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   HWM version 1.0.720.0-5848815573
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   NEFF version 0.6
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   TVM version 1.0.1416.0+5849176296
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   NumPy version 1.17.4
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   MXNet not available
01/08/2020 12:51:21 PM ERROR [neuron-cc]:   TF version 1.15.0
01/08/2020 12:51:21 PM ERROR [neuron-cc]: 
01/08/2020 12:51:21 PM ERROR [neuron-cc]: Artifacts stored in: /home/ubuntu/neuroncc-ft4i1tln
@aws-taylor (Contributor)

Hello David,

It is definitely possible to compile ONNX models.

The particular model you are attempting to compile uncovered a few bugs on our end.

Specifically:

  • If the version of ONNX used to train the model is different from the version of ONNX installed, a segfault may occur and you receive this unhelpful error message. I have opened an internal ticket for this issue. Minimally, we will be improving our error messages in a future release.
  • If you omit the `--io-config` flag when attempting to compile, you likewise receive an unhelpful error message. I have opened another internal ticket for this issue, and we will likewise be improving our error messages in a future release.

Beyond these two issues, the particular pre-trained model mentioned may have problems. I'm not sure precisely where you downloaded this model from, but the resnet18v1 model from https://s3.amazonaws.com/onnx-model-zoo/resnet/resnet18v1/resnet18v1.onnx appears to have incorrectly named operators and other issues (#59). Since you mentioned you just picked a random model, I did not spend too much time investigating. If using this specific model is important, could you attach the .onnx model you were using to this issue?

That being said, here’s an example of compilation using resnet50 using the model at https://github.com/onnx/models/tree/master/vision/classification/resnet/resnet50.

neuron-cc compile --framework ONNX resnet50/model.onnx --output /tmp/onnx.neff --io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'

Notice how the inputs and outputs are specified. For this model, the github page above conveniently specifies the input and output names and dimensions. For a more general ONNX model, you may find the net_drawer.py script provided by ONNX useful for visualizing the network.

python3 /usr/local/lib/python3.6/dist-packages/onnx/tools/net_drawer.py --input resnet50/model.onnx --output model.dot --embed_docstring
dot -Tpng model.dot -o model.png

Hopefully this helps. Please let us know if you experience any further issues.

Regards,
Taylor

@davidas1 (Author)

Just got around to testing your suggested solution, and I get the same error message with the resnet50 models as well (I tested all the models from the link you gave, opset 3 up to opset 9).

About ONNX versions - I have installed onnx 1.6.0 and onnxruntime 1.1.0
What else can I check in my environment? I'm running DLAMI 26, aws_neuron_tensorflow_p36 conda env, updated as suggested in the DLAMI with Neuron Release Notes
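One quick way to check the relevant package versions in the environment is via the standard library, without importing the packages themselves. A small sketch (the package list is just an example set to inspect):

```python
from importlib.metadata import version, PackageNotFoundError

def pkg_version(name):
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"

# Packages relevant to this thread; adjust the list as needed.
for pkg in ("onnx", "onnxruntime", "neuron-cc", "protobuf"):
    print(f"{pkg}: {pkg_version(pkg)}")
```

Comparing these against the versions in the compiler's error report can help spot mismatches like the one discussed below.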

@aws-taylor (Contributor)

Hello David,

After some debugging, it appears the issue may be related to onnx 1.6.0. I was able to reproduce the issue when using onnx 1.6.0, but compilation works fine when downgrading to 1.5.0.

python3 -m pip install neuron-cc onnx==1.5.0
wget -q https://s3.amazonaws.com/download.onnx/models/opset_9/resnet50.tar.gz
tar xvf resnet50.tar.gz
neuron-cc compile \
  --framework ONNX resnet50/model.onnx \
  --output onnx.neff \
  --io-config '{"inputs":{"gpu_0/data_0":[[1,3,224,224], "float32"]},"outputs":["gpu_0/softmax_1"]}'

ls -la onnx.neff

I'll continue to investigate and try to figure out why onnx 1.6.0 is problematic.

-Taylor

@aws-taylor (Contributor)

Hello again David,

I have some new information - the issue appears to be related to how the ONNX 1.6 binary wheel was compiled and the version of libprotobuf used. Looking at a core file, I see the SEGFAULT coming from:

0x00007f1b44b60a35 in pybind11::enum_<onnx::OpSchema::SupportType>::value(char const*, onnx::OpSchema::SupportType, char const*) ()
   from /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so

Notably, this file has a dependency on libprotobuf, and I've found some other GitHub issues that allude to this file being sensitive to the protobuf version.

ldd /usr/local/lib/python3.6/dist-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so
...
libprotobuf.so.10 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.10 (0x00007f610b038000)

I'm still investigating, but in the meantime, if you do a source install of onnx then you ought to be able to use 1.6:

python3 -m pip install --force-reinstall --no-binary onnx onnx

-Taylor

@aws-taylor (Contributor)

Seems like the same issue: schyun9212/maskrcnn-benchmark#3

@davidas1 (Author)

Thanks, that seems to solve the issue and enables me to run a sanity check of my setup.

The actual model I'm trying to compile includes an Upsample op (which looks to be supported, based on the ONNX supported ops list), and I assume you support opset 9, since Upsample was deprecated in newer ONNX versions.

For some reason the compilation now fails with:
Error message: check_upsampling() takes at least 4 positional arguments (1 given)

I've attached the log and a visualization of one of the Upsample modules in Netron, which is very simple:
neuroncc.log
onnx_upsample

If needed, I can open an issue with AWS support and share additional data (ONNX file, compiler artifacts, etc.)

@aws-taylor (Contributor)

Thanks David,

I have opened an issue internally to track this error. We'll report back once we know more.

Regards,
Taylor

@aws-zejdaj (Contributor)

David, could you please share the model with us? Full or a small version that contains the upsample operator. That will speed up our debug process.

Thank you,
Jindrich

@awsrjh (Contributor)

awsrjh commented Mar 9, 2020

Closing

@awsrjh awsrjh closed this as completed Mar 9, 2020
aws-mesharma pushed a commit that referenced this issue Sep 22, 2020
Release notes for Neuron SDK Release - August 5, 2020