[MXNET-1185] [WIP] Support large array in several operators #13191

apeforest · 2018-11-08T23:56:23Z

Description

This PR fixes the large array issues (#13036, #13070) in the following operators:
ndarray.ones
ndarray.zeros
ndarray.sum
ndarray.arange
ndarray.slice
ndarray.random.uniform
ndarray.empty

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Support large arrays (> 3 billion elements) in certain operators
Nightly tests on arrays of 5 billion elements

Comments

More PRs will follow to support other operators
A thorough performance test is still needed
This change relies on a defined data type, index_t, which is int64_t for now. It can be tuned for different platforms.
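
For readers who want to see why a 64-bit index_t matters at this scale, here is a minimal sketch in plain Python using ctypes. It is not MXNet code. The shape (50,000,000 x 100) and the helper `flat_index` are hypothetical, chosen only so that the element count is 5 billion as in the nightly tests.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def flat_index(row, col, num_cols):
    """Row-major offset of element (row, col) in a 2-D array (hypothetical helper)."""
    return row * num_cols + col


# Hypothetical shape with 5 billion elements in total.
rows, cols = 50_000_000, 100
last_flat_index = flat_index(rows - 1, cols - 1, cols)

# ctypes integer types do no overflow checking, so a value that does not fit
# simply wraps around, much as a 32-bit index would in C/C++.
as_int32 = ctypes.c_int32(last_flat_index).value
as_int64 = ctypes.c_int64(last_flat_index).value

print("last flat index:", last_flat_index)
print("exceeds INT32_MAX:", last_flat_index > INT32_MAX)
print("stored in 32 bits:", as_int32)
print("stored in 64 bits:", as_int64)
```

The 32-bit value silently points at a different element, which is the kind of failure the 64-bit index type is meant to prevent.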

zheng-da · 2018-11-09T00:08:11Z

python/mxnet/base.py

@@ -215,6 +215,7 @@ def _load_lib():
 # type definitions
 mx_uint = ctypes.c_uint
 mx_float = ctypes.c_float
+mx_long = ctypes.c_longlong


should this be longlong or just long?

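apeforest

It should be longlong. c_long can be the same size as c_int in Python ctypes (for example on Windows, where long is 32-bit), so only c_longlong guarantees 64 bits.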
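
The size question can be checked directly on any given platform. The following sketch uses only the standard ctypes module; `mx_long` mirrors the definition added in the diff, and 5,000,000,000 is used as a sample value because it matches the array size in the nightly tests.

```python
import ctypes

# Size in bytes of each candidate type on the current platform.
sizes = {
    "c_int": ctypes.sizeof(ctypes.c_int),
    "c_long": ctypes.sizeof(ctypes.c_long),
    "c_longlong": ctypes.sizeof(ctypes.c_longlong),
}
print(sizes)

# When int and long have the same size, ctypes makes c_int an alias of c_long.
int_is_long = ctypes.c_int is ctypes.c_long
print("c_int is c_long:", int_is_long)

# The definition from the diff: a 64-bit signed integer regardless of platform.
mx_long = ctypes.c_longlong
print(mx_long(5_000_000_000).value)
```

On 64-bit Linux and macOS c_long is typically 8 bytes, while on Windows it is 4 bytes, so c_longlong is the portable choice.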

samskalicky

LGTM

yuxihu

LGTM. One question: what are the performance implications of changing the data type here? Memory usage?

anirudhacharya · 2018-11-09T19:54:58Z

@mxnet-label-bot add [pr-awaiting-review]

apeforest · 2018-11-09T21:42:57Z

@yuxihu That's a good question. We have not yet run a full performance test. @wkcn ran some initial tests and did not see much performance variation between int32_t and int64_t: https://github.com/wkcn/c_performance

wkcn · 2018-11-10T00:17:25Z

@apeforest Could you please add the test? #13070
Thank you!

apeforest · 2018-11-10T14:17:05Z

> @apeforest Could you please add the test? #13070
> Thank you!

Added: test_ndarray_empty()

lanking520 · 2018-11-15T07:01:24Z

@apeforest Regarding your changes to the JNI for Scala, I suggest applying changes to the following lines as well:

https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/LibInfo.scala#L43 Array[Long]
What is dim_t, is there any Java type I can change correspondingly? https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/LibInfo.scala#L78
https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/org/apache/mxnet/LibInfo.scala#L239

@andrewfayres WDYT?

apeforest · 2018-11-20T18:30:45Z

@lanking520 @andrewfayres After offline discussions, I have updated the PR by limiting the changes in the Scala package to its JNI interface and leaving a TODO item to systematically update the Scala code to support large arrays. Please review the code again, since there are still a few Scala unit test failures. Thanks!

yuxihu

dim_t, size_t, and index_t are used in different places within the change. It is unclear to me when to use each one, and it will be very difficult for others to follow in future changes. I think we need detailed documentation on how to choose the data type for operators in order to support large arrays. It should also be mentioned in the PR description.

In addition, only a few operators are changed to support large arrays. This makes our code base a mix of int and index_t for operators, which can be confusing. Please document this; we may also need a more systematic way of handling large arrays instead of hot fixes to a few operators.

yuxihu · 2018-11-20T18:00:37Z

R-package/src/base.h

@@ -354,8 +354,8 @@ inline std::vector<std::string> SafeGetListNames(const Rcpp::List& src) {
 * \param rshape The dimension in R
 * \return A internal vector representation of shapes in mxnet.
 */
-inline std::vector<mx_uint> Dim2InternalShape(const Rcpp::Dimension &rshape) {
-  std::vector<mx_uint> shape(rshape.size());
+inline std::vector<dim_t> Dim2InternalShape(const Rcpp::Dimension &rshape) {


QQ: what is the type of mx_uint and dim_t?
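
As a rough illustration of what is at stake in this type change, the sketch below compares value ranges with ctypes. `mx_uint` is taken from python/mxnet/base.py as shown earlier in this PR (ctypes.c_uint). The width of dim_t is not stated in this thread, so `dim_t_assumed` is an assumption (a 64-bit signed integer), and `int_range` is a hypothetical helper.

```python
import ctypes


def int_range(ctype):
    """Return (min, max) representable by a ctypes integer type (hypothetical helper)."""
    bits = 8 * ctypes.sizeof(ctype)
    # A type is signed if -1 stays negative after being stored.
    if ctype(-1).value < 0:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


mx_uint = ctypes.c_uint          # as defined in python/mxnet/base.py
dim_t_assumed = ctypes.c_int64   # assumption: dim_t is a 64-bit signed integer

uint_min, uint_max = int_range(mx_uint)
dim_min, dim_max = int_range(dim_t_assumed)

LARGE_DIM = 5_000_000_000  # a dimension of the size used in the nightly tests
fits_mx_uint = uint_min <= LARGE_DIM <= uint_max
fits_dim_t = dim_min <= LARGE_DIM <= dim_max

print("mx_uint range:", (uint_min, uint_max), "fits:", fits_mx_uint)
print("dim_t (assumed) range:", (dim_min, dim_max), "fits:", fits_dim_t)
```

Under these assumptions a dimension of 5 billion cannot be stored in an unsigned 32-bit shape entry but fits comfortably in a 64-bit one.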

yuxihu · 2018-11-20T18:09:11Z

src/operator/mxnet_op.h

-      static_cast<size_t>(N), static_cast<size_t>(omp_threads))) {
-      for (int i = 0; i < N; ++i) {
+      N, static_cast<size_t>(omp_threads))) {
+      for (size_t i = 0; i < N; ++i) {


why size_t here and index_t on L549?

apeforest

Because the OpenMP parallel block requires a signed integer loop index, which index_t is, while size_t is unsigned.

apeforest · 2018-11-27T07:49:26Z

Changed this PR to [WIP] because we want to merge a smaller part first (#13418).

apeforest · 2018-11-27T07:49:55Z

@mxnet-label-bot update [pr-work-in-progress]

roywei · 2018-12-11T01:55:52Z

@apeforest Do you still plan to use this PR for part 2? If not, maybe close it first.
Thanks!

apeforest · 2018-12-11T17:53:34Z

@roywei We can close this PR for now. I will reopen it when we get to supporting all the language bindings. Thanks!

apeforest added 8 commits November 6, 2018 09:50

Support large integer in operators

a572198

fix large array in sum

e37a06b

Fix large array issue in slice operation

e48b274

fix bug in shape

b183c3f

fix getitem with large index

fcebf5a

fix bug in slice operator

3c7557b

fix bug in random uniform op

904f09b

add nightly test

08bd8ab

apeforest requested review from anirudh2290 and szha as code owners November 8, 2018 23:56

fix lint error

244f386

zheng-da reviewed Nov 9, 2018

apeforest added 2 commits November 8, 2018 17:14

fix compilation error on gpu

3ecd257

fix gpu compilation

c70afe8

samskalicky approved these changes Nov 9, 2018

apeforest added 2 commits November 9, 2018 00:03

fix build issue

ffcd175

fix windows build error

dbe0e6c

yuxihu reviewed Nov 9, 2018

fix build issue in windows

0680184

apeforest requested a review from nswamy as a code owner November 9, 2018 21:39

apeforest added 3 commits November 9, 2018 13:55

fix omp build issue

8fda02a

fix cpp-package build error

87cd144

fix mkldnn build

7afc7a8

apeforest added 2 commits November 9, 2018 20:58

fix an array size bound

862be24

add constants in tests

22213fa

marcoabreu added the pr-awaiting-review (PR is waiting for code review) label Nov 12, 2018

fix scala unit test

5286f63

apeforest added 12 commits November 16, 2018 09:48

fix scala build

7247e6b

fix python unit test

f8839b3

update scala-package to fix unittest

e1cd1cd

Merge remote-tracking branch 'upstream/master' into bugfix/large-array

e0f4e2d

fix scala unit test

e0fe05c

fix array typecode for python 2 and python 3

024a0ce

lint it

1f3361b

Merge remote-tracking branch 'upstream/master' into bugfix/large-array

23579e1

lint it again

1cd9b88

fix python include error

335e896

fix unit test

a08c79e

lint me in

69703fc

yuxihu suggested changes Nov 20, 2018

apeforest added 4 commits November 20, 2018 14:02

fix python unit test in python2 windows

01952c5

fix perl-package unit test

a68bd97

fix perl package

c1b14d1

Merge remote-tracking branch 'upstream/master' into bugfix/large-array

a3daa9b

apeforest mentioned this pull request Nov 27, 2018

[MXNET-1185] Support large array in several operators (part 1) #13418

Merged

7 tasks

apeforest changed the title ~~[MXNET-1185] Support large array in several operators~~ [MXNET-1185] [WIP] Support large array in several operators Nov 27, 2018

marcoabreu added pr-work-in-progress (PR is still work in progress) and removed pr-awaiting-review (PR is waiting for code review) labels Nov 27, 2018

apeforest closed this Dec 11, 2018

apeforest deleted the bugfix/large-array branch February 25, 2020 18:34
