[MXNET-310] [ONNX-MXNet] API to import ONNX models into Gluon. #10605

anirudhacharya · 2018-04-18T17:10:27Z

Description

API and corresponding tests to import ONNX models into Gluon. And changes to match ONNX's op set 7.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

New import to gluon API
changes to match ONNX's op set 7.

Comments

@spidydev @Roshrini @anirudh2290

piiswrong · 2018-04-18T17:44:38Z

This is unnecessary. You can use gluon's SymbolBlock to directly load symbol models.

rajanksin · 2018-04-18T17:21:13Z

python/mxnet/contrib/onnx/_import/import_onnx.py

+        data_names = [input_tensor[0] for input_tensor in metadata['input_tensor_data']]
+        data_inputs = [symbol.var(data_name) for data_name in data_names]
+
+        from ....gluon import SymbolBlock


do we need this here ? can we move it to top ?

rajanksin · 2018-04-18T17:27:16Z

tests/python-pytest/onnx/gluon_backend_test.py

+#test_elu_example
+#test_leakyrelu_example
+
+#GLUON_TEST.include('test_elu_example')


remove lines

rajanksin · 2018-04-18T17:37:32Z

python/mxnet/contrib/onnx/_import/import_model.py

+    except ImportError:
+        raise ImportError("Onnx and protobuf need to be installed. "
+                          + "Instructions to install - https://github.com/onnx/onnx")
+    model_proto = onnx.load(model_file)


add check for if file exists and throw appropriate error message

i will add the file check

rajanksin · 2018-04-18T17:39:48Z

python/mxnet/contrib/onnx/_import/import_onnx.py

+        data_inputs = [symbol.var(data_name) for data_name in data_names]
+
+        from ....gluon import SymbolBlock
+        net = SymbolBlock(outputs=sym, inputs=data_inputs)


Add few comments to explain what is the logic here

rajanksin · 2018-04-18T17:40:36Z

python/mxnet/contrib/onnx/_import/import_to_gluon.py

+    except ImportError:
+        raise ImportError("Onnx and protobuf need to be installed. Instructions to"
+                          + " install - https://github.com/onnx/onnx#installation")
+    model_proto = onnx.load(model_file)


file exists check

not needed.

rajanksin · 2018-04-18T17:45:51Z

tests/python-pytest/onnx/gluon_backend.py

+        else:
+            raise NotImplementedError("Only CPU context is supported for now")
+
+        if node.op_type in ['Conv']:


Add comment why we are doing this for 'conv'?

marcoabreu · 2018-04-18T23:48:06Z

tests/python-pytest/onnx/gluon_backend.py

+        if device == 'CPU':
+            ctx = mx.cpu()
+        else:
+            raise NotImplementedError("Only CPU context is supported for now")


Shouldn't ONNX be implementation independent and thus not care about the used device type?

ONNX is implementation independent. Here we are running a particular ONNX model using gluon, so we need to specify the context. In the CI pipeline these tests are running on a CPU, hence the above assignment. Saying "GPU is not implemented" is pretty misleading. I will correct this in the code.

ThomasDelteil · 2018-04-19T17:47:30Z

python/mxnet/contrib/onnx/_import/import_onnx.py

+        for param in aux_params:
+            if param in net_params:
+                net_params[param].shape = aux_params[param].shape
+                net_params[param]._load_init(aux_params[param], ctx=cpu())


why defaulting to CPU ? Can we not import the model on GPU straight away? We should let the user pass in a ctx argument that default to CPU

yes, will make this change.

zhreshold · 2018-05-21T22:51:54Z

Since we are recompose the network using symbol, why specifically targeting Gluon?
It make more sense to me if it is API to import ONNX model to mxnet.
It is always simple to convert a symbol to a SymbolBlock to use with Gluon.

anirudhacharya · 2018-05-23T21:30:17Z

@zhreshold there is an import API to mxnet - https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/onnx/_import/import_model.py#L24 This PR is to facilitate directly loading the ONNX model and the parameters into gluon interface. I think @piiswrong also made the same comment above.

gluonImport # Conflicts: # python/mxnet/contrib/onnx/__init__.py # python/mxnet/contrib/onnx/_import/import_model.py # python/mxnet/contrib/onnx/_import/import_onnx.py

…to gluonImport

rajanksin · 2018-05-30T17:51:47Z

Please create a folder inside python-pytest/onnx for "import", and move all the import specific files in there. "export" specific backend will be added pretty soon.

Roshrini · 2018-05-30T17:58:09Z

python/mxnet/contrib/onnx/_import/translation_utils.py

        if op_name == 'broadcast_add':
-            op_sym = symbol.broadcast_add(op_sym, inputs[0])
+            op_sym = symbol.broadcast_add(inputs[0], op_sym)
        elif op_name == 'broadcast_mul':


Calling "_fix_broadcast()" function for all broadcast_add, broadcast_sub, broadcast_mul, broadcast_div operations.
Here, you are handling only broadcas_add and broadcast_mul

Roshrini · 2018-05-30T18:00:24Z

tests/python-pytest/onnx/mxnet_backend_rep.py

@@ -16,18 +16,19 @@
 # under the License.

 # coding: utf-8
-"""backend rep for onnx test infrastructure"""
+"""MXNet backend rep for onnx test infrastructure"""
 from collections import namedtuple


Not using anywhere, remove?

rajanksin · 2018-05-30T17:53:18Z

python/mxnet/contrib/onnx/_import/import_model.py

+    except ImportError:
+        raise ImportError("Onnx and protobuf need to be installed. "
+                          + "Instructions to install - https://github.com/onnx/onnx")
+    model_proto = onnx.load(model_file)


rajanksin · 2018-05-30T18:00:57Z

tests/python-pytest/onnx/gluon_backend.py

+    """Gluon backend for ONNX"""
+
+    @staticmethod
+    def make_graph(node, inputs):


Do we need this function ?

rajanksin · 2018-05-30T18:03:04Z

tests/python-pytest/onnx/mxnet_backend.py

@@ -74,80 +74,6 @@ def make_graph(node, inputs):

        return graph_proto



def make_graph(node, inputs) function not needed any more

Roshrini · 2018-05-30T19:44:19Z

python/mxnet/contrib/onnx/_import/import_model.py

+
+    metadata = graph.get_graph_metadata(model_proto.graph)
+    return metadata
+
 def get_model_metadata(model_file):


this method is repeated?

Roshrini · 2018-05-30T19:48:29Z

python/mxnet/contrib/onnx/_import/import_onnx.py

+        data_names = [input_tensor[0] for input_tensor in metadata['input_tensor_data']]
+        data_inputs = [symbol.var(data_name) for data_name in data_names]
+
+        ctx = gpu() if context == 'GPU' else cpu()


from .... import cpu, gpu

rajanksin · 2018-05-30T21:34:45Z

ci/docker/runtime_functions.sh

@@ -514,8 +514,9 @@ integrationtest_ubuntu_cpu_onnx() {
 	set -ex
 	export PYTHONPATH=./python/
 	python example/onnx/super_resolution.py
-	pytest tests/python-pytest/onnx/onnx_backend_test.py
+	pytest tests/python-pytest/onnx/mxnet_backend_test.py


shoudnt this be onnx/import/mxnet_backend_test.py

rajanksin · 2018-05-30T21:34:55Z

ci/docker/runtime_functions.sh

 	pytest tests/python-pytest/onnx/onnx_test.py
+	pytest tests/python-pytest/onnx/gluon_backend_test.py


Roshrini · 2018-05-30T23:04:46Z

Can you rename 'tests/python-pytest/onnx/import/onnx_test.py' to 'tests/python-pytest/onnx/import/onnx_import_test.py'?

rajanksin

lgtm

…to gluonImport

anirudh2290 · 2018-05-31T17:44:20Z

python/mxnet/contrib/onnx/_import/import_onnx.py

@@ -33,6 +34,9 @@ def __init__(self):
        self._params = {}
        self._num_input = 0
        self._num_param = 0
+        self.auxDict = {}
+        self.argDict = {}


make it consistent with snake case elsewhere

anirudh2290 · 2018-05-31T18:34:44Z

python/mxnet/contrib/onnx/_import/op_translations.py

-    return 'elemwise_div', new_attr, inputs
+        broadcast_axis = attrs['axis']
+        op_value = translation_utils._fix_broadcast('broadcast_div', inputs,
+                                                    broadcast_axis, cls)


can you explain what is broadcast_axis here. what will the broadcast_axis be when adding two tensors of shape (4,5) and (1,1)

broadcast_axis comes from the ONNX's axis attribute in operators that support broadcasting - https://github.com/onnx/onnx/blob/master/docs/Changelog.md#attributes-103

With OP_SET version 6 broadcasting (1,1) on (4,5) would not be permissible. If we are broadcasting (5,) on (4,5) the broadcast_axis will be equal to 1. On the other hand if we broadcast (4,) on (4,5) broadcast axis will be equal to 0.

ONNX with their OP_SET version 7 are updating the broadcast rules to be aligned with numpy broadcasting rules. When that gets consistently updated in ONNX repo we will also update the translation code in mxnet.

anirudh2290 · 2018-05-31T18:37:02Z

python/mxnet/contrib/onnx/_import/op_translations.py

@@ -43,32 +43,42 @@ def add(attrs, inputs, cls):
    """Adding two tensors"""
    new_attr = {}
    if 'broadcast' in attrs and attrs['broadcast'] == 1:
-        op_value = translation_utils._fix_bias_shape('broadcast_add', inputs, cls)
+        broadcast_axis = attrs['axis']


Won't block here, but add, subtract, multiply divide functions code can be reused.

anirudh2290 · 2018-05-31T18:37:28Z

python/mxnet/contrib/onnx/_import/translation_utils.py

+        input0_shape = get_input_shape(inputs[0], cls)
+        #creating reshape shape
+        reshape_shape = list(len(input0_shape) * (1,))
+        reshape_shape[broadcast_axis] = -1


is the broadcast_axis always going to be a scalar ?

yes, broadcast_axis comes from the ONNX's axis attribute in operators that support broadcasting - https://github.com/onnx/onnx/blob/master/docs/Changelog.md#attributes-103

anirudh2290 · 2018-05-31T18:40:03Z

python/mxnet/contrib/onnx/_import/translation_utils.py

+        elif op_name == 'broadcast_sub':
+            op_sym = symbol.broadcast_sub(inputs[0], op_sym)
+        elif op_name == 'broadcast_div':
+            op_sym = symbol.broadcast_div(inputs[0], op_sym)


can change the above if else logic to :

op_sym = getattr(symbol, op_name)(inputs[0], op_sym)

anirudh2290 · 2018-05-31T18:46:31Z

python/mxnet/contrib/onnx/_import/translation_utils.py

@@ -148,21 +152,29 @@ def _fix_bias(op_name, attrs, num_inputs):
        raise ValueError("Unexpected number of inputs for: {}".format(op_name))
    return attrs

-def _fix_bias_shape(op_name, inputs, cls):
+def _fix_broadcast(op_name, inputs, broadcast_axis, cls):


Shouldn't this be obj instead of cls? Same question for everywhere cls is used in op_translations and translation_utils

no I meant to use cls, its just a convention, but the same way self is used to access an attribute inside the object (class) itself.cls is often used to reference class and instance variables outside the object.

Using cls for an instance is misleading. cls is the preferred variable name for anything that is meant to be class. For your case, since it is an instance of a class that is passed to _fix_broadcast and elsewhere, it should be something that indicates an instance like obj.

okay, i will rename it to proto_obj

…to gluonImport

anirudh2290 · 2018-06-01T18:31:03Z

tests/python-pytest/onnx/import/gluon_backend_test.py

+    BACKEND_TESTS.include(basic_model_test)
+
+BACKEND_TESTS.exclude('.*broadcast.*')
+BACKEND_TESTS.exclude('.*bcast.*')


Why excluding broadcast tests ?

Because there is an issue with the way broadcast operator tests are written in ONNX.

For example, if we try to broadcast (5,) dim array on (3,4,5) dim array then mxnet's forward pass will fail because the mxnet's interface expects the same batch size on the two arrays, i.e. (1,5) and (1,3,4,5)

So

x = mx.nd.array(np.random.rand(3,4,5)) y = mx.nd.array(np.random.rand(5,)) mx.nd.broadcast_add(x,y)

will pass, but the following will fail

xvar = mx.sym.var('x') yvar = mx.sym.var('y') bcast_add = mx.sym.broadcast_add(xvar, yvar)

There are broadcast operators in the various models that are being tested and they work fine, as the data in such models come with a valid batch_size.

xvar = mx.sym.var('x') yvar = mx.sym.var('y') bcast_add = mx.sym.broadcast_add(xvar, yvar)

This should not fail. Can you give a minimal reproducible script which uses mx.sym.broadcast_add and fails.

import mxnet as mx import numpy as np from collections import namedtuple x = mx.nd.array(np.random.rand(3,4,5)) y = mx.nd.array(np.random.rand(5,)) xvar = mx.sym.var('x') yvar = mx.sym.var('y') bcast_add = mx.sym.broadcast_add(xvar, yvar) data_names = ['x', 'y'] data_shapes = [] data_shapes.append(('x', x.shape)) data_shapes.append(('y', y.shape)) print("data shapes", data_shapes) mod = mx.mod.Module(symbol=bcast_add, context=mx.cpu(), data_names=data_names, label_names=None) mod.bind(for_training=False, data_shapes=data_shapes, label_shapes=None) mod.set_params(arg_params=None, aux_params=None) mod.init_params() data_forward = [] data_forward.append(x) data_forward.append(y) print("data forward", data_forward[0].shape) mod.forward(mx.io.DataBatch(data_forward)) result = mod.get_outputs() print("Model Result", result)

okay i didnt know that module api was being used to test individual operators in onnx.
bcast_add.bind call followed by forward() on the executor should work just fine. As discussed, please write a special test for broadcast if we cannot test it using the backend testing fw.

densenet121 which is ran successfully on CI, has broadcast multiply and add and tests for broadcasting - https://s3.amazonaws.com/download.onnx/models/opset_6/densenet121.tar.gz

anirudh2290 · 2018-06-02T00:18:07Z

tests/python-pytest/onnx/import/onnx_import_test.py

+@with_seed()
+def test_broadcast():
+    """Test for broadcasting in onnx operators."""
+    input1 = np.random.rand(1, 3, 4, 5).astype("float32")


Do we need to prepend the 1 ? Does the test pass with tensors of shape (3, 4, 5) and (5) ?

It will not, if it did then we could have used onnx tests itself

anirudh2290 · 2018-06-02T02:27:38Z

@piiswrong do you have any concerns ?

…e#10605) * gluon import * gluon tests * shape issues. * remove the dim_change list * onnx backend tests * changes to match onnx op set version 7 * fix * lint fix * add new folder * fix * fix * rename test file * comments * comment fix * check for opset differences. * fix * bcast test

Anirudh Acharya added 5 commits April 11, 2018 18:44

gluon import

ba43304

gluon tests

16dca96

shape issues.

3213dbc

remove the dim_change list

4d081ec

onnx backend tests

422c4ef

anirudhacharya requested a review from szha as a code owner April 18, 2018 17:10

rajanksin suggested changes Apr 18, 2018

View reviewed changes

marcoabreu reviewed Apr 18, 2018

View reviewed changes

ThomasDelteil reviewed Apr 19, 2018

View reviewed changes

szha requested review from yzhliu and zhreshold and removed request for szha May 21, 2018 22:31

Anirudh Acharya added 6 commits May 23, 2018 14:35

Merge branch 'master' of https://github.com/apache/incubator-mxnet into

159e376

gluonImport # Conflicts: # python/mxnet/contrib/onnx/__init__.py # python/mxnet/contrib/onnx/_import/import_model.py # python/mxnet/contrib/onnx/_import/import_onnx.py

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

17456ce

…to gluonImport

changes to match onnx op set version 7

7c63b5c

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

7aace0a

…to gluonImport

fix

9128cf0

lint fix

016b4d1

Roshrini reviewed May 30, 2018

View reviewed changes

rajanksin reviewed May 30, 2018

View reviewed changes

add new folder

a238d8e

Roshrini reviewed May 30, 2018

View reviewed changes

fix

59e6ad2

Roshrini reviewed May 30, 2018

View reviewed changes

rajanksin reviewed May 30, 2018

View reviewed changes

fix

31e3d7e

anirudhacharya force-pushed the gluonImport branch from 8c3d763 to 31e3d7e Compare May 30, 2018 22:08

rajanksin approved these changes May 30, 2018

View reviewed changes

Anirudh Acharya added 2 commits May 31, 2018 08:46

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

933ec17

…to gluonImport

rename test file

ed7ab27

anirudh2290 reviewed May 31, 2018

View reviewed changes

Anirudh Acharya added 5 commits May 31, 2018 13:39

comments

76df6da

comment fix

9d250fe

check for opset differences.

d40bb1d

Merge branch 'master' of https://github.com/apache/incubator-mxnet in…

d8e087d

…to gluonImport

fix

0365964

rajanksin mentioned this pull request Jun 1, 2018

[ONNX] Added Unsqueeze operator import support #11106

Merged

7 tasks

anirudh2290 reviewed Jun 1, 2018

View reviewed changes

bcast test

4a0787f

anirudh2290 reviewed Jun 2, 2018

View reviewed changes

anirudh2290 approved these changes Jun 2, 2018

View reviewed changes

anirudh2290 merged commit f754498 into apache:master Jun 4, 2018

anirudhacharya deleted the gluonImport branch June 4, 2018 23:27

		@@ -74,80 +74,6 @@ def make_graph(node, inputs):

		return graph_proto

		pytest tests/python-pytest/onnx/onnx_test.py
		pytest tests/python-pytest/onnx/gluon_backend_test.py

[MXNET-310] [ONNX-MXNet] API to import ONNX models into Gluon. #10605

[MXNET-310] [ONNX-MXNet] API to import ONNX models into Gluon. #10605

Conversation

anirudhacharya commented Apr 18, 2018 • edited

Description

Checklist

Essentials

Changes

Comments

piiswrong commented Apr 18, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ThomasDelteil Apr 19, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zhreshold commented May 21, 2018

anirudhacharya commented May 23, 2018

rajanksin commented May 30, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Roshrini commented May 30, 2018

rajanksin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anirudh2290 commented Jun 2, 2018

anirudhacharya commented Apr 18, 2018 •

edited

ThomasDelteil Apr 19, 2018 •

edited