Add boolean ndarray #15940

reminisce · 2019-08-19T06:24:11Z

Description

Added boolean ndarray infra in mshadow.
Implemented comparison operators: equal, not_equal, greater, greater_equal, less, and less_equal to use np.bool_ as their output tensors' dtype.
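
For reference, the output dtype these operators are meant to produce can be checked against NumPy itself. The sketch below uses plain `numpy` rather than MXNet, so it illustrates the target semantics only and says nothing about how the PR implements them; the input arrays are made up for illustration.

```python
import numpy as np

# Illustrative inputs (not taken from the PR).
a = np.array([1, 2, 3], dtype=np.float32)
b = np.array([3, 2, 1], dtype=np.float32)

# The six comparison functions named in the description.
comparison_ops = [np.equal, np.not_equal, np.greater,
                  np.greater_equal, np.less, np.less_equal]

# Apply each op and keep the result keyed by the op's name.
results = {op.__name__: op(a, b) for op in comparison_ops}

for name, out in results.items():
    # Every result is a boolean array, regardless of the input dtype.
    print(name, out.dtype, out)
```

In NumPy, every one of these results has dtype `bool` even though the inputs are `float32`, which is the behavior the description refers to.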

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Comments

Follow-up work includes:

- More operators that can consume boolean ndarrays. A few that must be finished before the next release:
  - sum
  - cast
  - boolean_mask
- ndarray boolean indexing
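
As a rough picture of what consuming a boolean ndarray looks like for the follow-up items, here is a minimal NumPy sketch. It assumes NumPy semantics as the target; `boolean_mask` is an MXNet operator with no NumPy function of the same name, so row selection by boolean indexing is used here as the closest analogue. The arrays are invented for illustration.

```python
import numpy as np

# Illustrative data: three rows, and a mask selecting rows 0 and 2.
data = np.array([[1, 2], [3, 4], [5, 6]])
mask = np.array([True, False, True])

# sum: counts the True entries; the accumulator dtype is set explicitly.
count = mask.sum(dtype=np.int64)

# cast: converts the boolean values to 1.0 / 0.0.
as_float = mask.astype(np.float32)

# boolean indexing: keeps only the rows where the mask is True.
selected = data[mask]

print(count, as_float, selected)
```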

Thanks to @yzhliu and @hzfan for their help with debugging.

reminisce · 2019-08-19T06:32:22Z

src/operator/numpy/np_elemwise_broadcast_op.cc

+  values[0].v_handle = const_cast<DLTensor*>(&(tblobs[0].dltensor()));
+
+  // scalar param
+  type_codes[1] = kDLFloat;


@yzhliu Since I need to pass a double param to the op function generated by TVM, I cannot use the Call function defined in TVMOpModule. I moved the logic that prepares the TVMArgs out of Call and up into the MXNet op's FCompute function, and added a separate CallEx in TVMOpModule that just invokes the kernel. We can discuss changing the API to cater for more use cases.
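
The split described in this comment can be pictured with a small Python sketch. Everything in it is hypothetical: `TypeCode`, `Tensor`, `pack_args`, and `call_ex` are illustrative stand-ins, not TVM or MXNet APIs, and the enum values are not TVM's real type codes. It only shows the idea that the caller builds parallel `values` and `type_codes` arrays (tensors tagged as handles, the scalar tagged as a float) and then hands them to a thin function that invokes the kernel.

```python
from enum import Enum, auto


class TypeCode(Enum):
    """Hypothetical type tags; not TVM's actual enum values."""
    FLOAT = auto()
    TENSOR_HANDLE = auto()


class Tensor:
    """Hypothetical placeholder for a tensor blob."""
    def __init__(self, name):
        self.name = name


def pack_args(args):
    """Build parallel value and type-code lists, one entry per argument."""
    values, type_codes = [], []
    for arg in args:
        if isinstance(arg, Tensor):
            type_codes.append(TypeCode.TENSOR_HANDLE)
        elif isinstance(arg, float):
            # Scalar parameter: passed by value and tagged as a float.
            type_codes.append(TypeCode.FLOAT)
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
        values.append(arg)
    return values, type_codes


def call_ex(kernel, values, type_codes):
    """Only invoke the kernel; argument preparation is the caller's job."""
    return kernel(values, type_codes)


# Usage: the caller packs a tensor, a scalar, and an output tensor itself.
values, type_codes = pack_args([Tensor("input"), 2.0, Tensor("output")])
seen = call_ex(lambda vals, codes: [c.name for c in codes], values, type_codes)
print(seen)
```

Keeping the packing step in the caller is what lets it mix tensor and scalar arguments without the invoking function needing to know about either.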

marcoabreu · 2019-10-03T08:51:49Z

Makefile

@@ -473,11 +473,13 @@ CFLAGS += -I$(TVM_PATH)/include -DMXNET_USE_TVM_OP=1
 LDFLAGS += -L$(ROOTDIR)/lib -ltvm_runtime -Wl,-rpath,'$${ORIGIN}'

 TVM_USE_CUDA := OFF
+TVM_OP_CUDA_ARCH := NONE


Any particular reason you are introducing a second set instead of using the arch set variable we already have? In which use case would these two differ?

Reverted. Thanks for the suggestion.

larroy · 2019-10-05T01:52:11Z

src/operator/operator_tune.cc

+  __macro$(__VA_ARGS__, int32_t); \
+  __macro$(__VA_ARGS__, int64_t); \
+  __macro$(__VA_ARGS__, bool)
+
 #define IMPLEMENT_WORKLOAD_VALUE_FOR_TYPE(__op$, __typ$) \


Could you please add a comment to this macro to clarify what it does?

larroy · 2019-10-05T01:54:50Z

tests/python/unittest/test_numpy_op.py

@@ -240,27 +236,38 @@ def is_int(dtype):
    in_data_dim = random.choice([2, 3, 4])
    shape = rand_shape_nd(in_data_dim, dim=3)
    acc_type = {'float16': 'float32', 'float32': 'float64', 'float64': 'float64',
-                'int8': 'int32', 'int32': 'int64', 'int64': 'int64'}
+                'int8': 'int32', 'int32': 'int64', 'int64': 'int64', 'bool': 'int64'}
    for hybridize in [False, True]:


Would using itertools.product (https://docs.python.org/3.7/library/itertools.html#itertools.product) help readability and make the code less nested?

Thanks for the suggestion. I will consider refactoring it in a follow-up PR.
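
A minimal sketch of the refactoring suggested in this thread, assuming three nested loops over test parameters. The parameter lists are illustrative (`keepdims_options` in particular is a made-up example), not the actual loops in test_numpy_op.py.

```python
import itertools

# Illustrative parameter lists (not copied from the test file).
hybridize_options = [False, True]
dtypes = ['float32', 'int32', 'bool']
keepdims_options = [False, True]

# Nested form: one indentation level per parameter.
nested = []
for hybridize in hybridize_options:
    for dtype in dtypes:
        for keepdims in keepdims_options:
            nested.append((hybridize, dtype, keepdims))

# Flat form: itertools.product yields the same combinations in the same order.
flat = []
for hybridize, dtype, keepdims in itertools.product(
        hybridize_options, dtypes, keepdims_options):
    flat.append((hybridize, dtype, keepdims))

print(len(flat), nested == flat)
```

Both forms visit the same combinations in the same order, so the change would affect indentation only and not which cases are tested.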

marcoabreu

Approve for build system

- Add np.equal implemented using tvmop
- Fix setting DLDataType conversion for boolean ndarray
- Add equal_gpu
- Fix inputs with different ndims
- Fix copying boolean ndarrays across devices
- Refactor binary logic op impl by tvm
- Add more logic ops
- Refactor TVMOpModule::Call to CallEx
- Add binary scalar logic op expr and schedule
- Add binary scalar logic ops
- Add free functions for logic ops
- Rebase with master to fix SetDLTensor bug
- Fix pylint
- Add sum op for boolean ndarrays using tvm op module
- Add sum boolean gpu compute
- Add bool type support to boolean_mask
- Boolean indexing working
- Clean up
- Fix merge
- Sync Makefile
- Rebase
- Add boolean indexing test
- Fix sanity
- Fix gpu and add autograd test
- Rebase
- Fix test for windows
- Fix tests
- Try to fix cuda arch missing error in ci
- Fix ci
- Fix windows build
- Try to fix cmake
- Fix cmake
- Fix
- Revert config.mk

haojin2

LGTM

reminisce requested review from yzhliu and haojin2 August 19, 2019 06:24

reminisce requested review from anirudh2290, eric-haibin-lin and szha as code owners August 19, 2019 06:24

reminisce removed request for anirudh2290, szha and eric-haibin-lin August 19, 2019 06:25

reminisce added the Numpy label Aug 19, 2019

reminisce added this to In progress in numpy via automation Aug 19, 2019

reminisce force-pushed the add_boolean_ndarray branch 2 times, most recently from 4b372cc to 595e2f7 Compare August 28, 2019 06:22

tingying2020 mentioned this pull request Sep 9, 2019

[numpy] [tvm] operator true_divide #16124

Open

reminisce force-pushed the add_boolean_ndarray branch 2 times, most recently from 88e63b4 to 127b036 Compare September 26, 2019 21:49

reminisce requested review from aaronmarkham and marcoabreu as code owners October 2, 2019 06:20

reminisce force-pushed the add_boolean_ndarray branch 2 times, most recently from d7f2963 to 327e0f7 Compare October 2, 2019 22:06

marcoabreu suggested changes Oct 3, 2019

numpy automation moved this from In progress to Needs review Oct 3, 2019

larroy reviewed Oct 5, 2019

reminisce force-pushed the add_boolean_ndarray branch from c7e273c to db57be4 Compare October 5, 2019 05:01

numpy automation moved this from Needs review to Reviewer approved Oct 5, 2019

marcoabreu approved these changes Oct 5, 2019

reminisce force-pushed the add_boolean_ndarray branch from db57be4 to b49cfce Compare October 8, 2019 05:33

reminisce added 2 commits October 8, 2019 13:25

Fix cmake

9a04649

reminisce added 2 commits October 8, 2019 13:25

Skip compute capability <= 52 for TVM generated ops

0911be4

Fix sanity

a6dac14

reminisce force-pushed the add_boolean_ndarray branch from 5dffe7a to a6dac14 Compare October 8, 2019 20:26

yzhliu approved these changes Oct 8, 2019

haojin2 approved these changes Oct 8, 2019

haojin2 merged commit 15ea40d into apache:master Oct 8, 2019

numpy automation moved this from Reviewer approved to Done Oct 8, 2019

reminisce mentioned this pull request Oct 9, 2019

Comparison ops implemented using mshadow #16414

Merged
