
SSD support in NNVM #1214

Merged (22 commits into apache:master, Jun 14, 2018)
Conversation

@kevinthesun (Contributor) commented May 31, 2018

Add SSD support in NNVM, and create a tutorial showing how to deploy an MXNet SSD model on CPU. This PR also enhances nnvm.compiler.build so that users can specify that the last operator of the network be compiled for a different target device. This can be a short-term solution for running SSD on GPU, since currently it's not easy to implement the multibox operators directly on GPU. After multibox_prior is implemented on GPU (@Laurawly), we can extend the tutorial to show how to run the major part of SSD on GPU and the final multibox_detection operator on CPU.

@kevinthesun changed the title from "SSD support" to "SSD support in NNVM" on May 31, 2018
@tqchen (Member) commented May 31, 2018

@kevinthesun (Contributor, Author) commented May 31, 2018

I added the SSD inference model json file to avoid having to git clone the mxnet repo and then create the symbol. Do we have a better place to store it? dmlc/web?

@tqchen (Member) commented May 31, 2018

Do not include the json file in the repo. The test environment does have mxnet installed, so you can directly import mxnet.

@kevinthesun (Contributor, Author):

We need to call some APIs in the mxnet example folder. Alternatively, we can export the inference model and save it somewhere.

@tqchen (Member) commented May 31, 2018

OK, then maybe it makes sense to create a GitHub repo, or a gist, to host the json. In the long term, it would be nice to just use the standard gluon-cv API.

@tqchen (Member) commented May 31, 2018

@ZihengJiang can you also help review?

@Laurawly (Contributor) commented May 31, 2018

Can we write something like:

block = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
net, params = nnvm.frontend.from_mxnet(block)

to directly load the SSD model from the gluoncv model zoo?

@kevinthesun (Contributor, Author):

It seems that gluoncv implements SSD slightly differently: multibox_detection is split into smaller operators, and multibox_prior is replaced by a gluon block. Also, the gluon block now has three outputs. In this case from_mxnet(block) will give an error. For now maybe we should just use the legacy implementation.
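For intuition, the decoding work that multibox_detection performs (before NMS) can be expressed with plain tensor ops, which is roughly what the split gluoncv graph does. Below is a hedged NumPy sketch; the [cx, cy, w, h] anchor layout and the variance constants are illustrative assumptions, not taken from the gluoncv source:

```python
import numpy as np

def decode_boxes(anchors, deltas, variances=(0.1, 0.1, 0.2, 0.2)):
    """Decode predicted offsets against anchors.

    anchors: (N, 4) as [cx, cy, w, h]; deltas: (N, 4) predicted offsets.
    The layout and variance values are illustrative assumptions.
    """
    cx = anchors[:, 0] + deltas[:, 0] * variances[0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * variances[1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2] * variances[2])
    h = anchors[:, 3] * np.exp(deltas[:, 3] * variances[3])
    # convert center form to corner form [xmin, ymin, xmax, ymax]
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
```

With zero deltas this just returns the anchors in corner form, which is a handy sanity check.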

@tqchen (Member) commented May 31, 2018

In that case, it would be REALLY nice if we could simply upgrade from_mxnet to support the model from gluoncv; this would make users' lives much easier.

@zhreshold (Member):

This is the list of operators exported by the gluoncv SSD network.

      "op": "Activation", 
      "op": "Concat", 
      "op": "Convolution", 
      "op": "Flatten", 
      "op": "L2Normalization", 
      "op": "Pooling", 
      "op": "Reshape", 
      "op": "SliceChannel", 
      "op": "_contrib_box_nms", 
      "op": "_div_scalar", 
      "op": "_greater_scalar", 
      "op": "_mul_scalar", 
      "op": "_plus_scalar", 
      "op": "broadcast_add", 
      "op": "broadcast_mul", 
      "op": "elemwise_add", 
      "op": "elemwise_sub", 
      "op": "exp", 
      "op": "null", 
      "op": "ones_like", 
      "op": "slice_axis", 
      "op": "slice_like", 
      "op": "softmax", 
      "op": "transpose", 
      "op": "where", 
      "op": "zeros_like", 

where

  • "op": "_contrib_box_nms" is a versatile bounding box suppression op extracted from multibox_detection operator.
  • "op": "slice_like" is used to bypass the infer_shape limitation if image size is not determined. It's a variant of slice so should be very simple to implement.

As long as we have NMS extracted in TVM, we are good to load gluoncv SSD models.
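Since _contrib_box_nms is the one op that needs extracting, here is a minimal NumPy sketch of what greedy non-maximum suppression does. This is illustrative only; the real MXNet/TVM op also handles batch dimensions, class ids, and a valid_count input:

```python
import numpy as np

def iou(box_a, box_b):
    # boxes are [xmin, ymin, xmax, ymax]
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it above iou_threshold, and repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = [j for j in order[1:] if iou(boxes[i], boxes[j]) <= iou_threshold]
        order = np.array(rest, dtype=int)
    return keep
```

For example, of two heavily overlapping boxes only the higher-scoring one survives, while a distant box is kept.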

.gitignore Outdated
@@ -98,7 +98,6 @@ build_*
Win32
*.dir
perf
nnvm
@srkreddy1238 (Contributor) commented Jun 1, 2018:
Could you rebase to master? This is already merged.

for input_name in extra_op_graph.symbol.list_input_names():
    if input_name in params:
        extra_op_params[input_name] = params[input_name]
        params.pop(input_name)
Contributor:

The extra_lib op can be used generically. But removing entries from params: what if a param is shared?

A bit unrealistic, but worth being aware of.

CHECK_EQ(in_attrs->size(), 2U) << "Inputs: [data, valid_count]";
TShape dshape = in_attrs->at(0);
TShape vshape = in_attrs->at(1);
CHECK_EQ(dshape.ndim(), 3U) << "Provided: " << dshape;
Contributor:

The error message could be more informative, to differentiate the two inputs.

TShape lshape = in_attrs->at(1);
TShape ashape = in_attrs->at(2);
CHECK_EQ(cshape.ndim(), 3U) << "Provided: " << cshape;
CHECK_EQ(lshape.ndim(), 2U) << "Provided: " << lshape;
Contributor:

The error message could be more informative, to differentiate the inputs.

@srkreddy1238 (Contributor):
@kevinthesun Thanks, you brought up the point of dmlc/web.

@tqchen
We have many generated models and data (test images, etc.) across various test cases and tutorials.
How about bringing all of these under one project (under DMLC)?

@@ -146,3 +147,39 @@ def gradients(ys, xs, grad_ys=None):
        if isinstance(xs, list) else len(xs.list_output_names())
    ret = [grad_g.symbol[i] for i in range(nx)]
    return ret

def split_last_op(graph):
Member:

Hopefully we will have a formal graph-splitting function in the future, rather than this hacky one.

Contributor (Author):

Yes. A general subgraph interface requires more thought and a more complete design. This is a short-term solution for limited use cases.

.describe("Clip out-of-boundary boxes.");
DMLC_DECLARE_FIELD(threshold).set_default(0.01)
.describe("Threshold to be a positive prediction.");
DMLC_DECLARE_FIELD(nms_threshold).set_default(0.5)
Member:

Is NMS stripped out of this op or not? I saw a standalone NMS operator below.

Contributor (Author):

Yes. NMS is now a standalone operator. multibox_detection is still preserved so that MultiBoxDetection in MXNet can be directly converted.

Member:

The conversion process does not have to be one-to-one, so we can break it into a sequence; maybe @zhreshold has specific thoughts on what that sequence is.

@@ -294,7 +331,9 @@ def build(graph, target=None, shape=None, dtype="float32",
     if params is None:
         params = {}
     params.update(init_var)
-    return graph, libmod, params
+    if not build_extra:
+        return graph, libmod, params
Contributor:

I think this should be the opposite, since the default is to return 3 values.

A.T.

Contributor:

Better to always return 4 values; then there's no need for the if check.

A.T.

Contributor (Author):

I agree that in the long term a consistent interface would be better. For now, varying the number of return values is for backward compatibility, so we don't need to modify every existing tutorial.

"""
graph_idx = graph.index
last_op_node = graph_idx.nodes[-1]
last_op_func = getattr(sym, last_op_node["op"])
Contributor:

Add a failsafe for the case where the last node doesn't have an op.

A.T.
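The suggested failsafe could look roughly like the sketch below. Here graph_idx and sym_module are hypothetical stand-ins for nnvm's graph index and the nnvm.symbol namespace, so this is a shape of the check rather than the PR's actual code:

```python
def last_op_symbol(graph_idx, sym_module):
    """Fetch the symbol constructor for the last graph node.

    graph_idx is assumed to expose .nodes as a list of dicts with an
    "op" key; sym_module is the symbol namespace. Both names are
    illustrative assumptions.
    """
    op_name = graph_idx.nodes[-1].get("op")
    if op_name in (None, "null"):
        # failsafe: the last node is a variable/placeholder, not an operator
        raise ValueError("last graph node has no operator to split off")
    return getattr(sym_module, op_name)
```

Raising early with a clear message beats the AttributeError that getattr would otherwise produce on a "null" node.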


if __name__ == "__main__":
    test_precompute_prune()
    test_compile()
    test_run()
    test_dtypes()
    test_compile_extra_lib()
Contributor:

Please add a test case verifying proper graph splitting when extra_lib_target is enabled.

A.T.

@kevinthesun (Contributor, Author) commented Jun 4, 2018:

This test case covers graph splitting. It compiles the main network with cuda and the last op with llvm, then compares the result of the split graph with the result of running the whole graph on CPU. Is there anything missing here?

Contributor:

@kevinthesun: What I meant is validating the graph output from build() post-split, not the runtime output; the runtime output validation in your test case is good.
You can create a sample model with a few nodes, run your build for the split, and check the graph output; you can refer to other test cases in nnvm/tests/python/compiler.

@@ -32,6 +32,8 @@ class BuildConfig(object):
     defaults = {
         "opt_level": 2,
         "add_pass": None,
+        "extra_lib_op": None,
Contributor:

How about naming it split_lib_op? The term extra doesn't fit well; it's not extra, it's a split part of the original graph. Rename extra_lib_target accordingly as well.

A.T.

Contributor (Author):

One reason to use extra here is that instead of returning a single lib, there is now an extra lib compiled from the split operator. Both names make sense in some respects. Since this is not a general, complete solution, I'll name it extra for now.

@@ -146,3 +147,39 @@ def gradients(ys, xs, grad_ys=None):
        if isinstance(xs, list) else len(xs.list_output_names())
    ret = [grad_g.symbol[i] for i in range(nx)]
    return ret

def split_last_op(graph):
Contributor:

I also agree with @zhreshold: let's not limit the split to only the last op node. We can make the function more generic and take input from the user, like an op_name and the sequence of its occurrence counting from the last node, so that the function can be used for formal splitting at any place, not only the last node.

For example: op_name="flatten", sequence="1" would split at the first occurrence of flatten counting from the last node. This way both last-node splits and other-node splits can be achieved.

~/A.T.
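The op_name/sequence lookup suggested above can be sketched as follows. This is a toy illustration over a plain list of op names, not the actual nnvm graph index:

```python
def find_split_index(ops, op_name, sequence=1):
    """Return the index of the `sequence`-th occurrence of `op_name`,
    counting backwards from the last node.

    ops: list of op-name strings in topological order (an assumption
    standing in for the real graph index).
    """
    seen = 0
    for i in range(len(ops) - 1, -1, -1):
        if ops[i] == op_name:
            seen += 1
            if seen == sequence:
                return i
    raise ValueError("%s (occurrence %d from the end) not found" % (op_name, sequence))
```

For example, with ops = ["conv", "flatten", "dense", "flatten", "softmax"], sequence 1 of "flatten" finds the later occurrence and sequence 2 the earlier one.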

@kevinthesun (Contributor, Author) commented Jun 4, 2018:

Agreed. But we need a more complete design to cover all cases, especially for networks with many branches. Simple logic like "the last X operators" might not work well. A rough idea: the user selects a set of nodes as subgraph inputs and another set of nodes as outputs; then we do some checks to make sure this is a valid split and compile the different subgraphs separately.

Contributor:

@kevinthesun: You are right, it is a little tricky; there are cases where we cannot split the graph so simply, and many cases need to be handled. Let's create a prototype and discuss on top of that. I am not sure whether this needs to be handled in your PR, but I brought up the point because it is a very important feature. Thanks!
@tqchen: Please provide your opinion on this. Thanks!

Member:

My suggestion is to name it more generally, but for now only support splitting on the last op (raise an exception otherwise). Once you create an API, you lose the chance to remove or rename it.

TShape oshape = TShape(3);
oshape[0] = dshape[0];
oshape[1] = dshape[1];
oshape[2] = 6; // [id, prob, xmin, ymin, xmax, ymax]
@PariksheetPinjari909 (Contributor) commented Jun 4, 2018:

Assign dshape[2], as the value 6 is already checked in previous statements.
Actually, there is no need for three assignment statements; directly assign dshape to oshape.


@tqchen (Member) commented Jun 6, 2018

Thanks for all the reviewers' comments. I see two major things we want to improve here.

  • The nms operator is separated in the code, but not separated in nnvm.
    • As @zhreshold suggested, we might want to split multibox_detection into several ops so that nms can be its own op and we don't have to introduce the bulk op here.
    • We need to support gluon-cv models.
  • The split-extra-op API is a bit hacky. This is not the fault of this PR, but it reveals a limitation in the current graph runtime. A proper solution would add a cross-device API to NNVM; I will add that in an RFC.

@tqchen (Member) commented Jun 6, 2018

created #1242

@sanallen commented Jun 7, 2018

@kevinthesun I tested an mxnet SSD model; all layers are normal except SoftmaxActivation(mode=channel). Did you encounter this problem?

@kevinthesun (Contributor, Author):

@sanallen Did you test with the SSD in the mxnet example? That should work, since that model is used in the tutorial. SoftmaxActivation conversion is also added in this PR.

@tqchen tqchen merged commit 6ab4da6 into apache:master Jun 14, 2018
@kaishijeng:

@kevinthesun

I tested deploy_ssd.py with the mxnet mobilenet-ssd-512 model and got the following error: _contrib_MultiBoxTarget is not supported in nnvm

Traceback (most recent call last):
  File "./deploy_ssd.py", line 82, in
    net, params = from_mxnet(sym, arg_params, aux_params)
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 402, in from_mxnet
    sym = _from_mxnet_impl(symbol, {})
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 353, in _from_mxnet_impl
    return [_from_mxnet_impl(s, graph) for s in symbol]
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 365, in _from_mxnet_impl
    childs = [_from_mxnet_impl(childs[i], graph) for i in range(len(childs.list_outputs()))]
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 367, in _from_mxnet_impl
    node = _convert_symbol(op_name, childs, attr)
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 326, in _convert_symbol
    _raise_not_supported('Operator: ' + op_name)
  File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/mxnet.py", line 24, in _raise_not_supported
    raise NotImplementedError(err)
NotImplementedError: Operator: _contrib_MultiBoxTarget is not supported in nnvm.

Any idea?

Thanks,

@tqchen (Member) commented Jun 16, 2018

@kaishijeng You are more than welcome to bring further discussion to https://discuss.tvm.ai

@kevinthesun (Contributor, Author):

_contrib_MultiBoxTarget is for training. You need to use the inference model.
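A quick way to check whether a saved symbol still contains training-only ops is to scan its json for them. The set of op names below is an illustrative assumption, not an exhaustive list of what the nnvm frontend rejects:

```python
import json

# Illustrative (not exhaustive) set of ops that only appear in training graphs.
TRAINING_ONLY_OPS = {"_contrib_MultiBoxTarget", "MakeLoss", "SoftmaxOutput"}

def training_ops_in_symbol(symbol_json):
    """Return the sorted training-only op names found in an MXNet
    symbol json string (which stores the graph under "nodes")."""
    graph = json.loads(symbol_json)
    return sorted({n["op"] for n in graph["nodes"] if n["op"] in TRAINING_ONLY_OPS})
```

If this returns anything, the symbol was saved from a training graph and needs to be re-exported for inference first.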

@kaishijeng:

@kevinthesun

After running deploy.py to remove _contrib_MultiBoxTarget, I got a new error:

NotImplementedError: Operator: L2Normalization is not supported in nnvm.

@kevinthesun (Contributor, Author):

See #1157 and #1223 for the related L2Normalization issue and PR.

@ndcuong91:

@kevinthesun Can we compile the last operator of the network on a different target device now?

tqchen pushed a commit to tqchen/tvm that referenced this pull request Jul 6, 2018
mnuyens pushed a commit to mnuyens/tvm that referenced this pull request Jul 10, 2018
@kevinthesun kevinthesun deleted the SSDSupport branch July 18, 2018 22:30
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018
@2909638069:

_contrib_MultiBoxTarget is for training. You need to use inference model.

It happens to me, too. I just ran the tutorial for deploying SSD, trying to compile it for rasp3b, and got:
NotImplementedError: Operator: _contrib_MultiBoxTarget is not supported in nnvm.
I am a newbie and need more details.

@2909638069:

OK, then maybe it makes sense to create a github repo, or gist to host the json. In the long term, it would be nice to just use the standard gluon-cv API

gluon-cv depends on scipy, which is not supported on ARMv7l. I built mxnet on rasp3b, but gluoncv failed: pip3 install gluoncv does not work there, so I cannot deploy my gluoncv-based code to rasp3b. It has taken me two weeks of trying tvm, and it is still not working. It is hard for me to bridge the gap between my local machine and the Pi. So, please think about how to deploy on rasp3b. Keep it simple and stupid!

@kevinthesun (Contributor, Author):

You can follow the tutorial on deploying to edge devices. You don't necessarily need to use RPC: you can simply upload the compiled graph json, lib.so, and param files to the rasp, and install the tvm runtime on the rasp to load and run them.
