[KVStore] Add support for other data types by ZihengJiang · Pull Request #5818 · apache/mxnet

ZihengJiang · 2017-04-13T05:27:07Z

Godricly · 2017-04-13T09:42:41Z

How do you support kvstore of DType when using ps-lite? If I remember correctly, the underling ps-lite is not supporting dtype.

ZihengJiang · 2017-04-13T09:54:50Z

@Godricly it only supports kvstore='local'/'device' for now

ZihengJiang · 2017-04-19T19:51:40Z

Discussion is needed due to this pr may involve some api change. The pr is raised to address dtype problem of kvstore, and it requires user to provide a dtype field in his DataIter:

But looks like there exists some legacy case which uses tuple as provide_data? https://github.com/dmlc/mxnet/blob/master/tests/python/train/test_dtype.py#L130

@piiswrong @mli

piiswrong · 2017-04-19T20:00:07Z

model and module need to parse provide_data and convert to datadesc if necessary.
see example in module and executor_group

Godricly · 2017-04-20T15:32:38Z

I tried 2 provide a infer datatype last year, LOL. #2559 #2564
What I suggest is to refactor ps-lite with better DType support before moving on in mxnet.#58

mli · 2017-04-24T05:22:33Z

do we need to update module as well?

mli · 2017-04-24T05:24:17Z

@Godricly ps-lite supports various data types. one can use templated SArray, such as SArray. Internally ps-lite use char for communication. So any type with size >= sizeof(char) should be fine.

…vstore

ZihengJiang · 2017-04-24T18:06:10Z

@mli module has been updated, it would be more appropriate to change the argument data_shapes/label_shapes to data_attrs/label_attrs here(https://github.com/ZihengJiang/mxnet/blob/kvstore/python/mxnet/module/module.py#L323), but I keep it for compatible.

piiswrong · 2017-04-25T05:50:41Z

python/mxnet/module/base_module.py

            warnings.warn(msg)


-def _parse_data_desc(data_names, label_names, data_shapes, label_shapes):


argument name changes has no practical benefit but can cause problems. Let's keep the original names and improve documents.

piiswrong · 2017-04-25T05:50:49Z

python/mxnet/module/module.py

-            self.data_names, self.label_names, data_shapes, label_shapes)
+        data_attrs = data_shapes
+        label_attrs = label_shapes
+        self._data_attrs, self._label_attrs = _parse_data_desc(


* Fix kvstore type * Fix lint * Parse inputs to DataDesc * Make module support dtype * Fix lint * Add default dtype in Comm * Fix lint * Revert rename

…at64 as well as operator gtest framework (#5936) * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * DeviceTensor3 added, forEachFast not yet converted * DeviceTensor3 version working * DeviceTensor3 working * . * Fix for use_global_stats * fixed bug with testing suite for double (Float64) * python unit tests working for batchnorm * python unit tests * Update documentation for mxnet.initializer.Mixed (#5937) * Update documentation for SVMOutput. (#5931) * Update documentation for SVMOutput. * Update doc for SVMOutput - fix formatting. * Adding install instruction for Ubuntu-CPU-Python (#5885) * edit ndarray API docs (#5806) * edit docs in broadcast_reduce_op * edit docs in broadcast_reduce_op * minor change * lint fix * fix * mx.nd.ones * mx.nd.repeat * mx.nd.reverse * add example in repeat * optimizer update * fix nanprod * fix optimizer_op api doc * fix reduce_op api doc * fix nd.ones api doc * mx.nd.repeat doc change * Update broadcast_reduce_op.h * Symbol docs fixes (#5930) * symbol docs minor formatting changes * deepcopy, infer_shape, infer_shape_partial docs modified * Few more small fixes * arithmetic functions fixes * some more modifications * changes after review * small change * grad function note added * More API Doc Edits (#5886) * edit activation doc * doc l2_normalization * edit MakeLoss doc * edit blockgrad doc * blockgrad fileline fix * edit MakeLoss doc cont. * doc change 'tensor' to 'multidimensional array' * l2normalization doc improve * makeloss doc improve, blockgrad doc improve * fix doc in activation, l2_normalization, make_loss * fix minor grammar * use .describe to avoid build failure. * Update documentation for mxnet.image.imdecode (#5957) * Update documentation for mxnet.image.imdecode * Update documentation for mxnet.image.imdecode (clarify that we need OpenCV and not the CV2 Python library) * Fix script by adding path to Dockerfile (#5958) * Clean install script * Add test for pip installations * Remove debug statements & comments * Make test runnable as script and from framework * Fix path to Dockerfiles * Putting failing cases at the end * Update doc for Custom operator. (#5875) * Update doc for Custom operator. * Update doc for Custom operator. * Fix formating in doc for Custom operator. * Fix formating in doc for Custom operator. * Minor change to ndarray.Custom documentation. * Minor edit in doc for Custom operator. * Minor change to doc for Custom operator. Data is 'NDArray-or-Symbol'. * Minor formatting change for Custom operator documentation. * For Custom operator doc, move example into ndarray_doc.py. * Minor change in Custom operator documentation * Improve the doc of pick + Update dmlc-core (#5946) * Add PickParam to fix the docstring and the initial value for axis * Update dmlc-core * Update dmlc-core * Image docs modified (#5973) * imageIter doc modified * edited imageiter * ADD missing Libri_sample.json, FIX minor bugs in speech_recognition example (#5962) * [KVStore] Add support for other data types (#5818) * Fix kvstore type * Fix lint * Parse inputs to DataDesc * Make module support dtype * Fix lint * Add default dtype in Comm * Fix lint * Revert rename * [cpp-package] Add C++ basic tutorial and build instruction (#5971) * Add C++ basic tutorial and build instruction * Remove binaries * Fix lint * Avoid sign-compare * Update documentation for mxnet.metric.np (#5977) * Getting rid of identity (#5935) * Activation ops (#5938) * [Ops] Add op: 'relu' * Add op: 'sigmoid' * Introduce 'kernel_launch_op' * Add tests and describe; move it to elemwise_unary_op * Fix GPU version * Convert caffe AbsVal to mx.symbol.abs in caffe converter (#5984) * Correction to LSTMCell docstring (#5986) * [Module] fix input_grads order (#5980) * fix input_grads order + update dmlc-core * set label to be optional * update env_var doc (#5964) * Adjusting make, Callback removed * batch norm gpu testing * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * rearrange source into cc and cu files * lint fixes * Trigger build * Use latest mshadow * temporarily revert channel position parameter field * Add more tests for batchnorm * Add more tests for batchnorm * test_operator_gpu working for all types * Compiles after AccReal * Compiles after AccReal * All tests working * All tests working * build, run, clean gtest works (although a test is failing) * vc++ requires explicit int type for omp for loop * Repair cpp-package * signed/unsigned fixed in cuda file * lint fixes in tests and cpp-package directories * more lint * use IsWriting() helper * Fall-through for unsupported MKL shapes/types * Fall-through for unsupported MKL shapes/types * cleaner mkl_off approach * Warning only whem MKL is requested * Warning only whem MKL is requested * lint * .. * python problem fixed * python problem fixed * Merge branch 'batchnorm' into batchnorm_pr # Conflicts: # src/operator/batch_norm.cc # src/operator/batch_norm.cu # tests/cpp/operator/batchnorm_test.cc * lint fix * lint fix * lint fix * lint fix * lint fix * Fix visual c++ compile problem * . * . * All unit tests pass again * lint fix * fix strange compile errors in CUDNN batchnorm header * FInish using flags instead of bools * lint * Fix timing pass count for forward pass * Fix R script install roxygen problem * code formatting, addition of doc strings is causing IDE to add spaces before the calls * removed commented * cr comments * Change back to compilable code * For CPU mode, store as invstd * move testing code around a little * lint fix * Use AccReal in some places to avoid fp16 problems * Fix minor invstd problem in cuda version * remove unused scale param * add permutation unit test, handle cudnn doesn't like 3D * . * lint * . * Remove mkl_off * lint fix and time cudnn when enabled

…at64 as well as operator gtest framework (apache#5936) * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * DeviceTensor3 added, forEachFast not yet converted * DeviceTensor3 version working * DeviceTensor3 working * . * Fix for use_global_stats * fixed bug with testing suite for double (Float64) * python unit tests working for batchnorm * python unit tests * Update documentation for mxnet.initializer.Mixed (apache#5937) * Update documentation for SVMOutput. (apache#5931) * Update documentation for SVMOutput. * Update doc for SVMOutput - fix formatting. * Adding install instruction for Ubuntu-CPU-Python (apache#5885) * edit ndarray API docs (apache#5806) * edit docs in broadcast_reduce_op * edit docs in broadcast_reduce_op * minor change * lint fix * fix * mx.nd.ones * mx.nd.repeat * mx.nd.reverse * add example in repeat * optimizer update * fix nanprod * fix optimizer_op api doc * fix reduce_op api doc * fix nd.ones api doc * mx.nd.repeat doc change * Update broadcast_reduce_op.h * Symbol docs fixes (apache#5930) * symbol docs minor formatting changes * deepcopy, infer_shape, infer_shape_partial docs modified * Few more small fixes * arithmetic functions fixes * some more modifications * changes after review * small change * grad function note added * More API Doc Edits (apache#5886) * edit activation doc * doc l2_normalization * edit MakeLoss doc * edit blockgrad doc * blockgrad fileline fix * edit MakeLoss doc cont. * doc change 'tensor' to 'multidimensional array' * l2normalization doc improve * makeloss doc improve, blockgrad doc improve * fix doc in activation, l2_normalization, make_loss * fix minor grammar * use .describe to avoid build failure. * Update documentation for mxnet.image.imdecode (apache#5957) * Update documentation for mxnet.image.imdecode * Update documentation for mxnet.image.imdecode (clarify that we need OpenCV and not the CV2 Python library) * Fix script by adding path to Dockerfile (apache#5958) * Clean install script * Add test for pip installations * Remove debug statements & comments * Make test runnable as script and from framework * Fix path to Dockerfiles * Putting failing cases at the end * Update doc for Custom operator. (apache#5875) * Update doc for Custom operator. * Update doc for Custom operator. * Fix formating in doc for Custom operator. * Fix formating in doc for Custom operator. * Minor change to ndarray.Custom documentation. * Minor edit in doc for Custom operator. * Minor change to doc for Custom operator. Data is 'NDArray-or-Symbol'. * Minor formatting change for Custom operator documentation. * For Custom operator doc, move example into ndarray_doc.py. * Minor change in Custom operator documentation * Improve the doc of pick + Update dmlc-core (apache#5946) * Add PickParam to fix the docstring and the initial value for axis * Update dmlc-core * Update dmlc-core * Image docs modified (apache#5973) * imageIter doc modified * edited imageiter * ADD missing Libri_sample.json, FIX minor bugs in speech_recognition example (apache#5962) * [KVStore] Add support for other data types (apache#5818) * Fix kvstore type * Fix lint * Parse inputs to DataDesc * Make module support dtype * Fix lint * Add default dtype in Comm * Fix lint * Revert rename * [cpp-package] Add C++ basic tutorial and build instruction (apache#5971) * Add C++ basic tutorial and build instruction * Remove binaries * Fix lint * Avoid sign-compare * Update documentation for mxnet.metric.np (apache#5977) * Getting rid of identity (apache#5935) * Activation ops (apache#5938) * [Ops] Add op: 'relu' * Add op: 'sigmoid' * Introduce 'kernel_launch_op' * Add tests and describe; move it to elemwise_unary_op * Fix GPU version * Convert caffe AbsVal to mx.symbol.abs in caffe converter (apache#5984) * Correction to LSTMCell docstring (apache#5986) * [Module] fix input_grads order (apache#5980) * fix input_grads order + update dmlc-core * set label to be optional * update env_var doc (apache#5964) * Adjusting make, Callback removed * batch norm gpu testing * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * rearrange source into cc and cu files * lint fixes * Trigger build * Use latest mshadow * temporarily revert channel position parameter field * Add more tests for batchnorm * Add more tests for batchnorm * test_operator_gpu working for all types * Compiles after AccReal * Compiles after AccReal * All tests working * All tests working * build, run, clean gtest works (although a test is failing) * vc++ requires explicit int type for omp for loop * Repair cpp-package * signed/unsigned fixed in cuda file * lint fixes in tests and cpp-package directories * more lint * use IsWriting() helper * Fall-through for unsupported MKL shapes/types * Fall-through for unsupported MKL shapes/types * cleaner mkl_off approach * Warning only whem MKL is requested * Warning only whem MKL is requested * lint * .. * python problem fixed * python problem fixed * Merge branch 'batchnorm' into batchnorm_pr # Conflicts: # src/operator/batch_norm.cc # src/operator/batch_norm.cu # tests/cpp/operator/batchnorm_test.cc * lint fix * lint fix * lint fix * lint fix * lint fix * Fix visual c++ compile problem * . * . * All unit tests pass again * lint fix * fix strange compile errors in CUDNN batchnorm header * FInish using flags instead of bools * lint * Fix timing pass count for forward pass * Fix R script install roxygen problem * code formatting, addition of doc strings is causing IDE to add spaces before the calls * removed commented * cr comments * Change back to compilable code * For CPU mode, store as invstd * move testing code around a little * lint fix * Use AccReal in some places to avoid fp16 problems * Fix minor invstd problem in cuda version * remove unused scale param * add permutation unit test, handle cudnn doesn't like 3D * . * lint * . * Remove mkl_off * lint fix and time cudnn when enabled

* Fix kvstore type * Fix lint * Parse inputs to DataDesc * Make module support dtype * Fix lint * Add default dtype in Comm * Fix lint * Revert rename

…at64 as well as operator gtest framework (apache#5936) * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * DeviceTensor3 added, forEachFast not yet converted * DeviceTensor3 version working * DeviceTensor3 working * . * Fix for use_global_stats * fixed bug with testing suite for double (Float64) * python unit tests working for batchnorm * python unit tests * Update documentation for mxnet.initializer.Mixed (apache#5937) * Update documentation for SVMOutput. (apache#5931) * Update documentation for SVMOutput. * Update doc for SVMOutput - fix formatting. * Adding install instruction for Ubuntu-CPU-Python (apache#5885) * edit ndarray API docs (apache#5806) * edit docs in broadcast_reduce_op * edit docs in broadcast_reduce_op * minor change * lint fix * fix * mx.nd.ones * mx.nd.repeat * mx.nd.reverse * add example in repeat * optimizer update * fix nanprod * fix optimizer_op api doc * fix reduce_op api doc * fix nd.ones api doc * mx.nd.repeat doc change * Update broadcast_reduce_op.h * Symbol docs fixes (apache#5930) * symbol docs minor formatting changes * deepcopy, infer_shape, infer_shape_partial docs modified * Few more small fixes * arithmetic functions fixes * some more modifications * changes after review * small change * grad function note added * More API Doc Edits (apache#5886) * edit activation doc * doc l2_normalization * edit MakeLoss doc * edit blockgrad doc * blockgrad fileline fix * edit MakeLoss doc cont. * doc change 'tensor' to 'multidimensional array' * l2normalization doc improve * makeloss doc improve, blockgrad doc improve * fix doc in activation, l2_normalization, make_loss * fix minor grammar * use .describe to avoid build failure. * Update documentation for mxnet.image.imdecode (apache#5957) * Update documentation for mxnet.image.imdecode * Update documentation for mxnet.image.imdecode (clarify that we need OpenCV and not the CV2 Python library) * Fix script by adding path to Dockerfile (apache#5958) * Clean install script * Add test for pip installations * Remove debug statements & comments * Make test runnable as script and from framework * Fix path to Dockerfiles * Putting failing cases at the end * Update doc for Custom operator. (apache#5875) * Update doc for Custom operator. * Update doc for Custom operator. * Fix formating in doc for Custom operator. * Fix formating in doc for Custom operator. * Minor change to ndarray.Custom documentation. * Minor edit in doc for Custom operator. * Minor change to doc for Custom operator. Data is 'NDArray-or-Symbol'. * Minor formatting change for Custom operator documentation. * For Custom operator doc, move example into ndarray_doc.py. * Minor change in Custom operator documentation * Improve the doc of pick + Update dmlc-core (apache#5946) * Add PickParam to fix the docstring and the initial value for axis * Update dmlc-core * Update dmlc-core * Image docs modified (apache#5973) * imageIter doc modified * edited imageiter * ADD missing Libri_sample.json, FIX minor bugs in speech_recognition example (apache#5962) * [KVStore] Add support for other data types (apache#5818) * Fix kvstore type * Fix lint * Parse inputs to DataDesc * Make module support dtype * Fix lint * Add default dtype in Comm * Fix lint * Revert rename * [cpp-package] Add C++ basic tutorial and build instruction (apache#5971) * Add C++ basic tutorial and build instruction * Remove binaries * Fix lint * Avoid sign-compare * Update documentation for mxnet.metric.np (apache#5977) * Getting rid of identity (apache#5935) * Activation ops (apache#5938) * [Ops] Add op: 'relu' * Add op: 'sigmoid' * Introduce 'kernel_launch_op' * Add tests and describe; move it to elemwise_unary_op * Fix GPU version * Convert caffe AbsVal to mx.symbol.abs in caffe converter (apache#5984) * Correction to LSTMCell docstring (apache#5986) * [Module] fix input_grads order (apache#5980) * fix input_grads order + update dmlc-core * set label to be optional * update env_var doc (apache#5964) * Adjusting make, Callback removed * batch norm gpu testing * Batch Norm rewrite without mshadow as well as operator gtest framework * performance testing * lint fixes * use CUDNN for this test * remove superfluous omp define * Fix file names in comments * build, run, clean gtest works (although a test is failing) * CR comments * Adjust timing tests for more strenuous sample * Remove temp resource allocation * rearrange source into cc and cu files * lint fixes * Trigger build * Use latest mshadow * temporarily revert channel position parameter field * Add more tests for batchnorm * Add more tests for batchnorm * test_operator_gpu working for all types * Compiles after AccReal * Compiles after AccReal * All tests working * All tests working * build, run, clean gtest works (although a test is failing) * vc++ requires explicit int type for omp for loop * Repair cpp-package * signed/unsigned fixed in cuda file * lint fixes in tests and cpp-package directories * more lint * use IsWriting() helper * Fall-through for unsupported MKL shapes/types * Fall-through for unsupported MKL shapes/types * cleaner mkl_off approach * Warning only whem MKL is requested * Warning only whem MKL is requested * lint * .. * python problem fixed * python problem fixed * Merge branch 'batchnorm' into batchnorm_pr # Conflicts: # src/operator/batch_norm.cc # src/operator/batch_norm.cu # tests/cpp/operator/batchnorm_test.cc * lint fix * lint fix * lint fix * lint fix * lint fix * Fix visual c++ compile problem * . * . * All unit tests pass again * lint fix * fix strange compile errors in CUDNN batchnorm header * FInish using flags instead of bools * lint * Fix timing pass count for forward pass * Fix R script install roxygen problem * code formatting, addition of doc strings is causing IDE to add spaces before the calls * removed commented * cr comments * Change back to compilable code * For CPU mode, store as invstd * move testing code around a little * lint fix * Use AccReal in some places to avoid fp16 problems * Fix minor invstd problem in cuda version * remove unused scale param * add permutation unit test, handle cudnn doesn't like 3D * . * lint * . * Remove mkl_off * lint fix and time cudnn when enabled

Fix kvstore type

7f86e17

ZihengJiang changed the title ~~Fix kvstore type~~ [KVStore] Add support for other data types Apr 13, 2017

ZihengJiang and others added 2 commits April 16, 2017 06:12

Fix lint

6194971

Merge branch 'master' into kvstore

97d4029

ZihengJiang added 2 commits April 19, 2017 20:29

Parse inputs to DataDesc

c5d687f

Merge branch 'master' into kvstore

4befc82

ZihengJiang added 3 commits April 24, 2017 17:58

Make module support dtype

84487dc

Merge branch 'kvstore' of https://github.com/ZihengJiang/mxnet into k…

4b25fc1

…vstore

Fix lint

313c386

ZihengJiang added 2 commits April 24, 2017 23:35

Add default dtype in Comm

42e2307

Fix lint

7f96d21

piiswrong reviewed Apr 25, 2017

View reviewed changes

ZihengJiang and others added 2 commits April 25, 2017 05:57

Revert rename

2645d7b

Merge branch 'master' into kvstore

49966fc

mli merged commit 02671eb into apache:master Apr 25, 2017

Godricly mentioned this pull request Aug 22, 2017

There is a problem in distribute training with dtype=fp16 #7554

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[KVStore] Add support for other data types#5818

[KVStore] Add support for other data types#5818
mli merged 12 commits intoapache:masterfrom
ZihengJiang:kvstore

ZihengJiang commented Apr 13, 2017 •

edited

Loading

Uh oh!

Godricly commented Apr 13, 2017

Uh oh!

ZihengJiang commented Apr 13, 2017

Uh oh!

ZihengJiang commented Apr 19, 2017

Uh oh!

piiswrong commented Apr 19, 2017 •

edited

Loading

Uh oh!

Godricly commented Apr 20, 2017

Uh oh!

mli commented Apr 24, 2017

Uh oh!

mli commented Apr 24, 2017

Uh oh!

ZihengJiang commented Apr 24, 2017

Uh oh!

piiswrong Apr 25, 2017

Uh oh!

piiswrong Apr 25, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		warnings.warn(msg)


		def _parse_data_desc(data_names, label_names, data_shapes, label_shapes):

Conversation

ZihengJiang commented Apr 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Godricly commented Apr 13, 2017

Uh oh!

ZihengJiang commented Apr 13, 2017

Uh oh!

ZihengJiang commented Apr 19, 2017

Uh oh!

piiswrong commented Apr 19, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Godricly commented Apr 20, 2017

Uh oh!

mli commented Apr 24, 2017

Uh oh!

mli commented Apr 24, 2017

Uh oh!

ZihengJiang commented Apr 24, 2017

Uh oh!

piiswrong Apr 25, 2017

Choose a reason for hiding this comment

Uh oh!

piiswrong Apr 25, 2017

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ZihengJiang commented Apr 13, 2017 •

edited

Loading

piiswrong commented Apr 19, 2017 •

edited

Loading