
SSD support in NNVM #1214

Merged (22 commits into apache:master, Jun 14, 2018)
Conversation

@kevinthesun (Contributor) commented May 31, 2018

Add SSD support in NNVM, and create a tutorial showing how to deploy an MXNet SSD model on CPU. This PR also enhances nnvm.compiler.build so that users can specify that the last operator of the network be compiled for a different target device. This can be a short-term solution for running SSD on GPU, since currently it's not easy to implement the multibox operators directly on GPU. After multibox_prior is implemented on GPU (@Laurawly), we can extend the tutorial to show how to run the major part of SSD on GPU and the final multibox_detection operator on CPU.

@kevinthesun changed the title from "SSD support" to "SSD support in NNVM" on May 31, 2018
@tqchen (Member) commented May 31, 2018

@kevinthesun (Contributor, Author) commented May 31, 2018

I added the SSD inference model json file to avoid having to git clone the mxnet repo and then create the symbol. Do we have a better place to store it? dmlc/web?

@tqchen (Member) commented May 31, 2018

Do not include the json file in the repo. The test environment does have mxnet installed, so you can directly import mxnet.

@kevinthesun (Contributor, Author):

We need to call some APIs in the mxnet example folder. Alternatively, we can export the inference model and save it somewhere.

@tqchen (Member) commented May 31, 2018

OK, then maybe it makes sense to create a GitHub repo, or a gist, to host the json. In the long term, it would be nice to just use the standard gluon-cv API.

@tqchen (Member) commented May 31, 2018

@ZihengJiang can you also help review?

@Laurawly (Contributor) commented May 31, 2018

Can we write something like:

block = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
net, params = nnvm.frontend.from_mxnet(block)

to directly load the SSD model from the gluoncv model zoo?

@kevinthesun (Contributor, Author):

It seems that gluoncv implements SSD slightly differently: multibox_detection is split into smaller operators, and multibox_prior is replaced by a gluon block. Also, the gluon block now has three outputs. In this case from_mxnet(block) will give an error. For now maybe we should just use the legacy implementation.
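For intuition, the decoding work that multibox_detection performs (before NMS) can be expressed with plain tensor ops, which is roughly what the split gluoncv graph does. Below is a hedged NumPy sketch; the [cx, cy, w, h] anchor layout and the variance constants are illustrative assumptions, not taken from the gluoncv source:

```python
import numpy as np

def decode_boxes(anchors, deltas, variances=(0.1, 0.1, 0.2, 0.2)):
    """Decode predicted offsets against anchors.

    anchors: (N, 4) as [cx, cy, w, h]; deltas: (N, 4) predicted offsets.
    The layout and variance values are illustrative assumptions.
    """
    cx = anchors[:, 0] + deltas[:, 0] * variances[0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * variances[1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2] * variances[2])
    h = anchors[:, 3] * np.exp(deltas[:, 3] * variances[3])
    # convert center form to corner form [xmin, ymin, xmax, ymax]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
```

With zero deltas this just returns the anchors in corner form, which is a handy sanity check.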

@tqchen (Member) commented May 31, 2018

In that case, it would be REALLY nice if we could simply upgrade from_mxnet to support the model from gluoncv; this would make users' lives much easier.

@zhreshold (Member):

This is the list of operators exported by the gluoncv SSD network.

      "op": "Activation", 
      "op": "Concat", 
      "op": "Convolution", 
      "op": "Flatten", 
      "op": "L2Normalization", 
      "op": "Pooling", 
      "op": "Reshape", 
      "op": "SliceChannel", 
      "op": "_contrib_box_nms", 
      "op": "_div_scalar", 
      "op": "_greater_scalar", 
      "op": "_mul_scalar", 
      "op": "_plus_scalar", 
      "op": "broadcast_add", 
      "op": "broadcast_mul", 
      "op": "elemwise_add", 
      "op": "elemwise_sub", 
      "op": "exp", 
      "op": "null", 
      "op": "ones_like", 
      "op": "slice_axis", 
      "op": "slice_like", 
      "op": "softmax", 
      "op": "transpose", 
      "op": "where", 
      "op": "zeros_like", 

where

  • "op": "_contrib_box_nms" is a versatile bounding box suppression op extracted from multibox_detection operator.
  • "op": "slice_like" is used to bypass the infer_shape limitation if image size is not determined. It's a variant of slice so should be very simple to implement.

As long as we have NMS extracted in TVM, we are good to load gluoncv SSD models.
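Since _contrib_box_nms is the one op that needs extracting, here is a minimal NumPy sketch of what greedy non-maximum suppression does. This is illustrative only; the real MXNet/TVM op also handles batch dimensions, class ids, and a valid_count input:

```python
import numpy as np

def iou(box_a, box_b):
    # boxes are [xmin, ymin, xmax, ymax]
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it above iou_threshold, and repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = [j for j in order[1:] if iou(boxes[i], boxes[j]) <= iou_threshold]
        order = np.array(rest, dtype=int)
    return keep
```

For example, of two heavily overlapping boxes only the higher-scoring one survives, while a distant box is kept.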

.gitignore Outdated
@@ -98,7 +98,6 @@ build_*
Win32
*.dir
perf
nnvm
@srkreddy1238 (Contributor) commented Jun 1, 2018:
Could you rebase to master? This is already merged.

for input_name in extra_op_graph.symbol.list_input_names():
    if input_name in params:
        extra_op_params[input_name] = params[input_name]
        params.pop(input_name)
Contributor:

The extra_lib op can be used generically. But removing entries from params: what if a param is shared?

A bit unrealistic, but worth being aware of.

CHECK_EQ(in_attrs->size(), 2U) << "Inputs: [data, valid_count]";
TShape dshape = in_attrs->at(0);
TShape vshape = in_attrs->at(1);
CHECK_EQ(dshape.ndim(), 3U) << "Provided: " << dshape;
Contributor:

The error message could be more informative, to differentiate the two inputs.

TShape lshape = in_attrs->at(1);
TShape ashape = in_attrs->at(2);
CHECK_EQ(cshape.ndim(), 3U) << "Provided: " << cshape;
CHECK_EQ(lshape.ndim(), 2U) << "Provided: " << lshape;
Contributor:

The error message could be more informative, to differentiate the inputs.

@srkreddy1238 (Contributor):
@kevinthesun Thanks, you brought up the point of dmlc/web.

@tqchen
We have many generated models and data (test images, etc.) across various test cases and tutorials.
How about bringing all of these under one project (under DMLC)?

@@ -146,3 +147,39 @@ def gradients(ys, xs, grad_ys=None):
        if isinstance(xs, list) else len(xs.list_output_names())
    ret = [grad_g.symbol[i] for i in range(nx)]
    return ret

def split_last_op(graph):
Member:

Hopefully we will have a formal graph-splitting function in the future, rather than this hacky one.

Contributor (Author):

Yes. A general subgraph interface requires more thought and a more complete design. This is a short-term solution for limited use cases.

.describe("Clip out-of-boundary boxes.");
DMLC_DECLARE_FIELD(threshold).set_default(0.01)
.describe("Threshold to be a positive prediction.");
DMLC_DECLARE_FIELD(nms_threshold).set_default(0.5)
Member:

Is NMS stripped out of this op or not? I saw a standalone NMS operator below.

Contributor (Author):

Yes. NMS is now a standalone operator. multibox_detection is still preserved so that MultiBoxDetection in MXNet can be directly converted.

Member:

The conversion process does not have to be one-to-one, so we can break it into a sequence; maybe @zhreshold has specific thoughts on what that sequence is.

@@ -294,7 +331,9 @@ def build(graph, target=None, shape=None, dtype="float32",
     if params is None:
         params = {}
     params.update(init_var)
-    return graph, libmod, params
+    if not build_extra:
+        return graph, libmod, params
Contributor:

I think this should be the opposite, since the default is to return 3 values.

A.T.

Contributor:

Better to always return 4 values; then there's no need for the if check.

A.T.

Contributor (Author):

I agree that in the long term a consistent interface would be better. For now, varying the number of return values is for backward compatibility, so we don't need to modify every existing tutorial.

"""
graph_idx = graph.index
last_op_node = graph_idx.nodes[-1]
last_op_func = getattr(sym, last_op_node["op"])
Contributor:

Add a failsafe for the case where the last node doesn't have an op.

A.T.
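The suggested failsafe could look roughly like the sketch below. Here graph_idx and sym_module are hypothetical stand-ins for nnvm's graph index and the nnvm.symbol namespace, so this is a shape of the check rather than the PR's actual code:

```python
def last_op_symbol(graph_idx, sym_module):
    """Fetch the symbol constructor for the last graph node.

    graph_idx is assumed to expose .nodes as a list of dicts with an
    "op" key; sym_module is the symbol namespace. Both names are
    illustrative assumptions.
    """
    op_name = graph_idx.nodes[-1].get("op")
    if op_name in (None, "null"):
        # failsafe: the last node is a variable/placeholder, not an operator
        raise ValueError("last graph node has no operator to split off")
    return getattr(sym_module, op_name)
```

Raising early with a clear message beats the AttributeError that getattr would otherwise produce on a "null" node.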


if __name__ == "__main__":
    test_precompute_prune()
    test_compile()
    test_run()
    test_dtypes()
    test_compile_extra_lib()
Contributor:

Please add a test case verifying proper graph splitting when extra_lib_target is enabled.

A.T.

@kevinthesun (Contributor, Author) commented Jun 4, 2018:

This test case covers graph splitting. It compiles the main network with cuda and the last op with llvm, then compares the result of the split graph with the result of running the whole graph on CPU. Is there anything missing here?

Contributor:

@kevinthesun: What I meant is validating the graph output from build() post-split, not the runtime output; the runtime output validation in your test case is good.
You can create a sample model with a few nodes, run your build for the split, and check the graph output; you can refer to other test cases in nnvm/tests/python/compiler.

@@ -32,6 +32,8 @@ class BuildConfig(object):
     defaults = {
         "opt_level": 2,
         "add_pass": None,
+        "extra_lib_op": None,
Contributor:

How about naming it split_lib_op? The term extra doesn't fit well; it's not extra, it's a split part of the original graph. Rename extra_lib_target accordingly as well.

A.T.

Contributor (Author):

One reason to use extra here is that instead of returning a single lib, there is now an extra lib compiled from the split operator. Both names make sense in some respects. Since this is not a general, complete solution, I'll name it extra for now.

@@ -146,3 +147,39 @@ def gradients(ys, xs, grad_ys=None):
        if isinstance(xs, list) else len(xs.list_output_names())
    ret = [grad_g.symbol[i] for i in range(nx)]
    return ret

def split_last_op(graph):
Contributor:

I also agree with @zhreshold: let's not limit the split to only the last op node. We can make the function more generic and take input from the user, like an op_name and the sequence of its occurrence counting from the last node, so that the function can be used for formal splitting at any place, not only the last node.

For example: op_name="flatten", sequence="1" would split at the first occurrence of flatten counting from the last node. This way both last-node splits and other-node splits can be achieved.

~/A.T.
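The op_name/sequence lookup suggested above can be sketched as follows. This is a toy illustration over a plain list of op names, not the actual nnvm graph index:

```python
def find_split_index(ops, op_name, sequence=1):
    """Return the index of the `sequence`-th occurrence of `op_name`,
    counting backwards from the last node.

    ops: list of op-name strings in topological order (an assumption
    standing in for the real graph index).
    """
    seen = 0
    for i in range(len(ops) - 1, -1, -1):
        if ops[i] == op_name:
            seen += 1
            if seen == sequence:
                return i
    raise ValueError("%s (occurrence %d from the end) not found" % (op_name, sequence))
```

For example, with ops = ["conv", "flatten", "dense", "flatten", "softmax"], sequence 1 of "flatten" finds the later occurrence and sequence 2 the earlier one.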

@kevinthesun (Contributor, Author) commented Jun 4, 2018:

Agreed. But we need a more complete design to cover all cases, especially for networks with many branches. Simple logic like "the last X operators" might not work well. A rough idea: the user selects a set of nodes as subgraph inputs and another set of nodes as outputs; then we do some checks to make sure this is a valid split and compile the different subgraphs separately.

Contributor:

@kevinthesun: You are right, it is a little tricky; there are cases where we cannot split the graph so simply, and many cases need to be handled. Let's create a prototype and discuss on top of that. I am not sure whether this needs to be handled in your PR, but I brought up the point because it is a very important feature. Thanks!
@tqchen: Please provide your opinion on this. Thanks!

Member:

My suggestion is to name it more generally, but for now only support splitting on the last op (raise an exception otherwise). Once you create an API, you lose the chance to remove or rename it.

TShape oshape = TShape(3);
oshape[0] = dshape[0];
oshape[1] = dshape[1];
oshape[2] = 6; // [id, prob, xmin, ymin, xmax, ymax]
@PariksheetPinjari909 (Contributor) commented Jun 4, 2018:

Assign dshape[2], as the value 6 is already checked in previous statements.
Actually, there is no need for three assignment statements; directly assign dshape to oshape.


@tqchen (Member) commented Jun 6, 2018

Thanks for all the reviewers' comments. I see two major things we want to improve here.

  • The nms operator is separated in the code, but not separated in nnvm.
    • As @zhreshold suggested, we might want to split multibox_detection into several ops so that nms can be its own op and we don't have to introduce the bulk op here.
    • We need to support gluon-cv models.
  • The split-extra-op API is a bit hacky. This is not the fault of this PR, but it reveals a limitation in the current graph runtime. A proper solution would add a cross-device API to NNVM; I will add that in an RFC.

@tqchen (Member) commented Jun 6, 2018

created #1242

@sanallen commented Jun 7, 2018

@kevinthesun I tested an mxnet SSD model; all layers are normal except SoftmaxActivation(mode=channel). Did you encounter this problem?

@kevinthesun (Contributor, Author):

@sanallen Did you test with the SSD in the mxnet example? That should work, since that model is used in the tutorial. SoftmaxActivation conversion is also added in this PR.

@tqchen tqchen merged commit 6ab4da6 into apache:master Jun 14, 2018
@kaishijeng:

@kevinthesun

I tested deploy_ssd.py with the mxnet mobilenet-ssd-512 model and got the following error: _contrib_MultiBoxTarget is not supported in nnvm

Traceback (most recent call last):
  File "./deploy_ssd.py", line 82, in
    net, params = from_mxnet(sym, arg_params, aux_params)
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 402, in from_mxnet
    sym = _from_mxnet_impl(symbol, {})
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 353, in _from_mxnet_impl
    return [_from_mxnet_impl(s, graph) for s in symbol]
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 365, in _from_mxnet_impl
    childs = [_from_mxnet_impl(childs[i], graph) for i in range(len(childs.list_outputs()))]
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 367, in _from_mxnet_impl
    node = _convert_symbol(op_name, childs, attr)
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 326, in _convert_symbol
    _raise_not_supported('Operator: ' + op_name)
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 24, in _raise_not_supported
    raise NotImplementedError(err)
NotImplementedError: Operator: _contrib_MultiBoxTarget is not supported in nnvm.

Any idea?

Thanks,

@tqchen (Member) commented Jun 16, 2018

@kaishijeng You are more than welcome to bring further discussion to https://discuss.tvm.ai

@kevinthesun (Contributor, Author):

_contrib_MultiBoxTarget is for training. You need to use the inference model.
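A quick way to check whether a saved symbol still contains training-only ops is to scan its json for them. The set of op names below is an illustrative assumption, not an exhaustive list of what the nnvm frontend rejects:

```python
import json

# Illustrative (not exhaustive) set of ops that only appear in training graphs.
TRAINING_ONLY_OPS = {"_contrib_MultiBoxTarget", "MakeLoss", "SoftmaxOutput"}

def training_ops_in_symbol(symbol_json):
    """Return the sorted training-only op names found in an MXNet
    symbol json string (which stores the graph under "nodes")."""
    graph = json.loads(symbol_json)
    return sorted({n["op"] for n in graph["nodes"] if n["op"] in TRAINING_ONLY_OPS})
```

If this returns anything, the symbol was saved from a training graph and needs to be re-exported for inference first.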

@kaishijeng:

@kevinthesun

After running deploy.py to remove _contrib_MultiBoxTarget, I got a new error:

NotImplementedError: Operator: L2Normalization is not supported in nnvm.

@kevinthesun (Contributor, Author):

See #1157 and #1223 for the related L2Normalization issue and PR.

@ndcuong91:

@kevinthesun Can we compile the last operator of the network on a different target device now?

tqchen pushed a commit to tqchen/tvm that referenced this pull request Jul 6, 2018
mnuyens pushed a commit to mnuyens/tvm that referenced this pull request Jul 10, 2018
@kevinthesun kevinthesun deleted the SSDSupport branch July 18, 2018 22:30
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018
@2909638069:

_contrib_MultiBoxTarget is for training. You need to use inference model.

It happens to me, too. I just ran the tutorial for deploying SSD, trying to compile it for rasp3b, and got:
NotImplementedError: Operator: _contrib_MultiBoxTarget is not supported in nnvm.
I am a newbie and need more details.

@2909638069:

OK, then maybe it makes sense to create a github repo, or gist to host the json. In the long term, it would be nice to just use the standard gluon-cv API

gluon-cv depends on scipy, which is not supported on ARMv7l. I built mxnet on rasp3b, but gluoncv failed: pip3 install gluoncv does not work there, so I cannot deploy my gluoncv-based code to rasp3b. It has taken me two weeks of trying tvm, and it is still not working. It is hard for me to bridge the gap between my local machine and the Pi. So, please think about how to deploy on rasp3b. Keep it simple and stupid!

@kevinthesun (Contributor, Author):

You can follow the tutorial on deploying to edge devices. You don't necessarily need to use RPC: you can simply upload the compiled graph json, lib.so, and param files to the rasp, and install the tvm runtime on the rasp to load and run them.
