Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NNVM][FRONTEND] Tensorflow frontend support #1188

Merged
merged 17 commits into from
Jun 13, 2018
Merged

Conversation

srkreddy1238
Copy link
Contributor

Tested for Inception V1 and V3

@srkreddy1238 srkreddy1238 mentioned this pull request May 29, 2018
4 tasks
@srkreddy1238
Copy link
Contributor Author

@tqchen , @Huyuwei and @FrozenGene pls review.

@srkreddy1238
Copy link
Contributor Author

With respect to nnvm base it's all the same except tutorials/nnvm/ location changed now.

# Tensor flow doesn't have seperate list for params extraction.
# Operator name 'Const' is treated as a parameter to build NNVM params dict.
if node.op == "Placeholder":
# Assuming only one input graph with type 'Placeholder'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only a Placeholder is supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike other frameworks Tensorflow doesn't list graph inputs.
Hence I assumed first 'Placeholder' or 'Const' node as input node for the graph.


self._output_shapes[node.name] = \
[tensor_util.TensorShapeProtoToList(shape) \
for shape in self._parse_attr(node.attr)['_output_shapes']]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the code require a specific GraphDef version?
When I try your feature, a problem troubles me.
""
for shape in self._parse_attr(node.attr)['_output_shapes']]
KeyError: '_output_shapes'
""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you need to freeze the tensorflow graph with add_shapes=True like below.

output_graph_def = tf.graph_util.convert_variables_to_constants(
sess,
sess.graph.as_graph_def(add_shapes=True),
['InceptionV3/Predictions/Reshape_1'],
)

It was discussed earlier @ dmlc/nnvm#472

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! "add_shapes=True" can really avoid the error which happened on Placeholder, but the same error happens th on Const even if the fix is added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_output_shapes should exist for all nodes with above option.

Can you share the dump of the Const node ? Or Attribute dump of Const node ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"convert_variables_to_constants" method does not seem to add "output_shape" field

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srkreddy1238 I want to cry! Looking forward to your updates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to share a patch for multi input later today :)

Can you share the protobuf file for reference ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colin1988
I just checked and multi input work fine. No need of any change.
Ref. added a test case for the same.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srkreddy1238 thanks very much. I will try it right now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srkreddy1238 I found the method to fix it. More precisely, I found an error in my code.
609 output_graph_def = tf.graph_util.convert_variables_to_constants(
610 sess, sess.graph.as_graph_def(add_shapes=True), ["dnn/title_scoring/final_score/Sigmoid"])
611 tf.reset_default_graph()
612 graph = tf.import_graph_def(output_graph_def, return_elements=["dnn/title_scoring/final_score/Sigmoid"], name='')
613 output_graph_def = tf.graph_util.convert_variables_to_constants(
614 sess, tf.get_default_graph().as_graph_def(add_shapes=True), ["dnn/title_scoring/final_score/Sigmoid"])
615 tf.train.write_graph(output_graph_def, 'export_model', "asq_title_score.pb.txt", as_text=True)

Line 611 is essential !!

@srkreddy1238
Copy link
Contributor Author

@Huyuwei Can you help with a final review on this ?

@kaishijeng
Copy link

@srkreddy1238

I tried from_tensorflow.py with MobileNet_v1_1.0_224 from https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md.
Unfortunately, it gives me errors below:

('x', (299, 299, 3))
Traceback (most recent call last):
File "from_tensorflow.py", line 154, in
sym, params = nnvm.frontend.from_tensorflow(graph_def)
File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/tensorflow.py", line 650, in from_tensorflow
sym, params = g.from_tensorflow(graph)
File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/tensorflow.py", line 463, in from_tensorflow
for shape in self._parse_attr(node.attr)['_output_shapes']]
KeyError: '_output_shapes'

Any idea?

Thanks

@srkreddy1238
Copy link
Contributor Author

srkreddy1238 commented Jun 2, 2018

Regenerate the tensor flow graph with add_shapes=True.

This is an assumption to have _output_shapes for all nodes.

@kaishijeng
Copy link

@srkreddy1238

Do you have a python example to convert checkpoint to frozen model with shapes=True?

Thanks,

@srkreddy1238
Copy link
Contributor Author

Ref tool for conversion: https://github.com/srkreddy1238/dmlc_data/blob/master/work/tf-to-nnvm.py

I checked Mobilenet and below operations missing in frontend now.
DepthwiseConv2dNative, FusedBatchNorm, Relu6.

I am working on mapping these operator to existing nnvm top.

Will share soon.

@srkreddy1238
Copy link
Contributor Author

@tqchen & @Huyuwei : Can you guys help to review this pull request soon?
Or
Should I continue to add additions on top of the same ?

@kaishijeng
Copy link

@srkreddy1238

I am able to convert mobilenet model with your python script. Waiting for your PR to support additional operators of mobilenetv1.

Thanks,

@tqchen
Copy link
Member

tqchen commented Jun 3, 2018

await approval from @Huyuwei

@Huyuwei
Copy link
Contributor

Huyuwei commented Jun 3, 2018

Sorry for late response due to travelling. Will do a final review Tuesday.

from tensorflow.python.framework import tensor_util


repo_base = 'https://github.com/srkreddy1238/dmlc_data/raw/master/models/tensorflow/InceptionV1/'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put data into dmlc/web-data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dmlc/web-data#74
PR for support files @ web-data.

I will update the above paths after this merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data paths changed to dmlc/web-data.


######################################################################
# Download processed tensorflow model
# ---------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rst standard: make sure the line is the same as the title

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return graph_def

class NodeLookup(object):
"""Converts integer node ID's to human readable labels."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is not very helpful in understanding in tutorials. Consider put it under nnvm.testiung

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

def load(self, label_lookup_path, uid_lookup_path):
"""Loads a human readable English name for each softmax node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@srkreddy1238
Copy link
Contributor Author

@kaishijeng

I could get through FusedBatchNorm, Relu6 and Shape operators, but depthwise convolution don't have a direct support now.
I will work on depthwise conv later this week.

@srkreddy1238 srkreddy1238 force-pushed the master branch 2 times, most recently from f361f7f to d42ae4f Compare June 5, 2018 03:12
@kaishijeng
Copy link

@srkreddy1238

If there is no depthwise conv support, how from_mxnet.py can support mobilenetv1 without any issue?

Thanks,

@srkreddy1238
Copy link
Contributor Author

@kaishijeng
I referred mxnet too and tried some initial quick trials and failed 👎 .
I will attempt this again in detailed on Thursday & hope to PR new changes by this week end.

@colin1988
Copy link

tensorflow use # Gather to implement # embedding_lookup. Is there corresponding op in tvm/nnvm?

I got the error below

Traceback (most recent call last):
File "asq_model.py", line 42, in
graph, params = nnvm.frontend.from_tensorflow(graph_def)
File "/home/disk2/liushengbing/dev/tvm/py_env/mms-hpc-devel/lib/python2.7/site-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/tensorflow.py", line 650, in from_tensorflow
sym, params = g.from_tensorflow(graph)
File "/home/disk2/liushengbing/dev/tvm/py_env/mms-hpc-devel/lib/python2.7/site-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/tensorflow.py", line 503, in from_tensorflow
op = self._convert_operator(node.op, inputs, attr)
File "/home/disk2/liushengbing/dev/tvm/py_env/mms-hpc-devel/lib/python2.7/site-packages/nnvm-0.8.0-py2.7.egg/nnvm/frontend/tensorflow.py", line 617, in _convert_operator
raise NotImplementedError("Operator {} not implemented.".format(op_name))
NotImplementedError: Operator Gather not implemented.

@srkreddy1238
Copy link
Contributor Author

srkreddy1238 commented Jun 5, 2018

@kaishijeng
srkreddy1238@c5e852e
I could compile and run mobilenet with this patch on top of this PR.
This patch need a bit of clean up before PR.

Can you help to validate the result first?

@srkreddy1238
Copy link
Contributor Author

@colin1988

Gather is not supported in tensorflow now.
I think we don't have support from nnvm/tvm too.

@kaishijeng
Copy link

kaishijeng commented Jun 6, 2018 via email

@colin1988
Copy link

@srkreddy1238 Tensorflow supported gather early, probably starting from 0.10

@kaishijeng
Copy link

@srkreddy1238

How do I pull this patch srkreddy1238/tvm@c5e852e in my code?

FC

test_forward_reshape()
test_forward_squeeze()
test_forward_concat_v2()
test_forward_multi_input()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also test 2 or 3 end-to-end networks like Inception, ResNet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Huyuwei
Added Inception V1 & V3 end-to-end test cases.

I will add mobilenet as a separate PR soon (once the dependencies are merged).
Later I will check Resnet too.

@srkreddy1238
Copy link
Contributor Author

@colin1988

I mean our tensorflow frontend now doesn't support Gather now.

I see "take" operator is added to tvm already and nnvm operator for the same is pending.
We could use this to support Gather.

@srkreddy1238
Copy link
Contributor Author

@kaishijeng
We need to add ".patch" to get it as plain diff patch file.
Use the below link to download.
https://github.com/srkreddy1238/tvm/commit/c5e852e.patch

@kaishijeng
Copy link

@srkreddy1238

Apply patch and got the following error:

Traceback (most recent call last):
File "./from_tensorflow.py", line 162, in
graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, dtype=dtype_dict, params=params)
File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.py", line 272, in build
graph = optimize(graph, shape, dtype, layout)
File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/compiler/build_module.py", line 175, in optimize
graph = graph.apply(["InferShape", "SimplifyInference"])
File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/graph.py", line 234, in apply
check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
File "/usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/_base.py", line 75, in check_call
raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: [00:00:48] src/compiler/simplify_inference.cc:27: Check failed: dshape.ndim() != 0 (0 vs. 0)

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.4.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(dmlc::StackTraceabi:cxx11+0x58) [0x7f93f387c0]
[bt] (1) /usr/local/lib/python2.7/dist-packages/tvm-0.4.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x44) [0x7f93f3916c]
[bt] (2) /usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/libnnvm_compiler.so(nnvm::compiler::BatchNormToInferUnpack(nnvm::NodeAttrs const&, nnvm::NodeEntry, nnvm::NodeEntry, nnvm::NodeEntry, nnvm::NodeEntry, nnvm::NodeEntry, nnvm::TShape, nnvm::TShape)+0xf28) [0x7f91cb8658]
[bt] (3) /usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/libnnvm_compiler.so(+0x162fd8) [0x7f91cb8fd8]
[bt] (4) /usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/libnnvm_compiler.so(+0x163ec8) [0x7f91cb9ec8]
[bt] (5) /usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/libnnvm_compiler.so(nnvm::compiler::SimplifyInference(nnvm::Graph)+0x204) [0x7f91cba91c]
[bt] (6) /usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/libnnvm_compiler.so(std::_Function_handler<nnvm::Graph (nnvm::Graph), nnvm::Graph (*)(nnvm::Graph)>::_M_invoke(std::_Any_data const&, nnvm::Graph&&)+0xc8) [0x7f91cbb1b8]
[bt] (7) /usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/libnnvm_compiler.so(nnvm::ApplyPasses(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&)+0x23c) [0x7f91d072b4]
[bt] (8) /usr/local/lib/python2.7/dist-packages/nnvm-0.8.0-py2.7.egg/nnvm/libnnvm_compiler.so(NNGraphApplyPasses+0x2a4) [0x7f91cff274]
[bt] (9) /usr/lib/aarch64-linux-gnu/libffi.so.6(ffi_call_SYSV+0x64) [0x7f9c794e60]

@srkreddy1238
Copy link
Contributor Author

@kaishijeng
Did you make corresponding changes to from_tensorflow utility ?

Mobilenet input node name is "input", hence

shape_dict = {'input': x.shape}
dtype_dict = {'input': x.dtype}

Ref. Below patch with changes to the utility for mobilenet. You just need to change tf model, image paths.
https://github.com/srkreddy1238/tvm/commit/edd72d6904b69ae266cf5067fddbb6487dc4d291.patch

@srkreddy1238
Copy link
Contributor Author

@tqchen & @Huyuwei
Done with review comments.

params: dict of str to ndarray
The parameter dictionary
"""
arg_list = graph.symbol.list_input_names()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

efficiency: build a set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@tqchen tqchen merged commit 61d5303 into apache:master Jun 13, 2018
@tqchen
Copy link
Member

tqchen commented Jun 13, 2018

Thanks to all the code reviewers and contributors for help improving this PR, this is now merged!

@srkreddy1238
Copy link
Contributor Author

Thanks for reviewing to improve the code quality.

@kaishijeng
Copy link

kaishijeng commented Jun 15, 2018

@srkreddy1238

What happens to below to support mobilenet?
https://github.com/srkreddy1238/tvm/tree/c5e852e9bc0cc763a69dbb3a7a8baf2c557e07ce
Will you create a PR for it? If not, how can I merge it to the latest TVM master?

Thanks,

@srkreddy1238
Copy link
Contributor Author

@kaishijeng
Ref. #1232 should go first before other changes to tensorflow frontend.

@kaishijeng
Copy link

@srkreddy1238

Which PR should I pull to add FusedBatchNorm operator?

Thanks

@srkreddy1238
Copy link
Contributor Author

srkreddy1238 commented Jun 15, 2018

for FusedBatchNorm only the below line should work.

srkreddy1238@c5e852e#diff-3845514369cc735d96dd63a274a21471R484

@kaishijeng
Copy link

kaishijeng commented Jun 15, 2018 via email

@srkreddy1238
Copy link
Contributor Author

:) no specific reason.

I understand you wanted mobilenet on latest code.

I will raise today for the frontend.

**You may need to take both patches to your local code for mobilenet.

@srkreddy1238
Copy link
Contributor Author

srkreddy1238 commented Jun 15, 2018

@kaishijeng
Ref. #1232 PR works for mobilenet.

@kaishijeng
Copy link

@srkreddy1238

Mobilenet works properly with 1232PR.
Now SSD is officially enabled in TVM. Do you have a plan to support tensorflow ssd_mobilenet?

Thanks,

@FrozenGene
Copy link
Member

@kaishijeng in fact, I supported ssd-mobilenet tensorflow model before tvm support it. My way is transform tensorflow’s ssd-mobilenet to CoreML, then use NNVM/TVM support it. And I don’t need any special ops to support ssd-mobilenet.

@srkreddy1238
Copy link
Contributor Author

@kaishijeng
I will work on SSD addition to TF frontend probably end of next week.

@FrozenGene
Copy link
Member

@srkreddy1238 if everything goes well, I think you will not use new ssd ops in tvm, which is for MXNet I think.

@kaishijeng
Copy link

@FrozenGene

Can you provide an instruction of your way to (via CoreML) enablessd-mobilenet? My platform is RK3399 which is arm-based linux?

Thanks,

@FrozenGene
Copy link
Member

@kaishijeng see: https://github.com/tf-coreml/tf-coreml/tree/master/examples Maybe you will implement some ops in from_coreml.py in nnvm. I previous implemented some ops haven’t contributed back to nnvm.

tqchen pushed a commit to tqchen/tvm that referenced this pull request Jul 6, 2018
mnuyens pushed a commit to mnuyens/tvm that referenced this pull request Jul 10, 2018
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018
@madhavajay
Copy link

@FrozenGene Are you able to share any of your code getting the CoreML converted Model to run on TVM?

@FrozenGene
Copy link
Member

FrozenGene commented Oct 10, 2018

@madhavajay You could start from here: https://github.com/tf-coreml/tf-coreml/blob/master/examples/ssd_example.ipynb Then you could implement related ops to support this model. In my memory, there are not many ops to implement. Currently, my CoreML frontend has many different places compared than official TVM. (For example, my TVM support 4-D padding of Conv2D, but official TVM doesn't, we want to PR it soon). So I could not figure out what modification for SSD-Mobilenet model. We will also have plan PR our CoreML frontend, but doesn't have one detail time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants