[MXNET-1158] JVM Memory Management Documentation #13105

nswamy · 2018-11-04T04:42:05Z

Description

Add Documentation for JVM Memory management explaining the various options and its usage.

@andrewfayres @lanking520 @piyushghai @yzhliu

ankkhedia · 2018-11-04T20:23:07Z

@nswamy Thanks for your contribution!

@mxnet-label-bot [pr-awaiting-review, Scala, Doc]

lanking520

Thanks for your documentation, overall looks good!

scala-package/examples/scripts/run_train_mnist.sh

lanking520 · 2018-11-05T03:39:46Z

scala-package/memory-management.md

+
+The JVM using the Garbage Collector only manages objects allocated in the JVM Heap and is not aware of the memory footprint of these objects in the native memory, hence allocation/deAllocation of the native memory has to be managed by MXNet Scala.  
+Allocating native memory is straight forward and is done during the construction of the object by a calling the associated C++ API through JNI, however since JVM languages do not have destructors, De-Allocation of these objects becomes problematic and has to explicitly de-allocated. 
+To make it easy, MXNet Scala provides a few modes of operation.


Not finished? Which operations are supported?

scala-package/memory-management.md

lanking520 · 2018-11-05T03:45:50Z

scala-package/memory-management.md

+In this approach, you do not have to write any special code to have native memory cleaned up, however this approach solely depends on the Garbage collector to run and find unreachable objects.
+You can control the frequency of Garbage Collector by calling System.gc() at strategic points such as at the end of an epoch or at the end of a mini-batch in Training.
+
+This approach could be suitable for use-cases such as inference on CPUs and you have large amount of Memory(RAM) on your system.  


So in this case, I can call system.gc() inside my code in a certain timely manner and most of my NDArray would be de-allocated thanks to Phantom Reference?

scala-package/memory-management.md

lupesko

A few suggestions.

scala-package/memory-management.md

lupesko · 2018-11-20T01:14:08Z

scala-package/memory-management.md

+The Scala and Java binding of Apache MXNet uses native memory(C++ Heap either in RAM or GPU memory) in most of the MXNet Scala objects such as NDArray, Symbol, Executor, KVStore, Data Iterators, etc.,. the Scala classes associated with them act as wrappers, 
+the operations on these objects are directed to the MXNet C++ backend via JNI for performance , so the bytes are also stored in the native heap for fast access.   
+
+The JVM using the Garbage Collector only manages objects allocated in the JVM Heap and is not aware of the memory footprint of these objects in the native memory, hence allocation/deAllocation of the native memory has to be managed by MXNet Scala.  


Nit:

Suggested change

The JVM using the Garbage Collector only manages objects allocated in the JVM Heap and is not aware of the memory footprint of these objects in the native memory, hence allocation/deAllocation of the native memory has to be managed by MXNet Scala.

The JVM using the Garbage Collector only manages objects allocated in the JVM Heap and is not aware of the memory footprint of these objects in the native memory, hence allocation/deallocation of the native memory has to be managed by MXNet Scala.

scala-package/memory-management.md

lupesko · 2018-11-20T01:16:54Z

scala-package/memory-management.md

+
+The JVM using the Garbage Collector only manages objects allocated in the JVM Heap and is not aware of the memory footprint of these objects in the native memory, hence allocation/deAllocation of the native memory has to be managed by MXNet Scala.  
+Allocating native memory is straight forward and is done during the construction of the object by a calling the associated C++ API through JNI, however since JVM languages do not have destructors, De-Allocation of these objects becomes problematic and has to explicitly de-allocated. 
+To make it easy, MXNet Scala provides a few modes of operation.


Suggested change

To make it easy, MXNet Scala provides a few modes of operation.

To make it easy, MXNet Scala provides a few modes of operation, explained in detail below.

lupesko · 2018-11-20T01:19:51Z

Thanks for the contribution @nswamy .
Pulling in @aaronmarkham to help with the doc review.

lanking520 · 2018-11-24T17:18:54Z

Hi @nswamy could you please address the change and let's get this PR forward

scala-package/memory-management.md

nswamy

addressed all comments and some offline comments i received from @ddavydenko

nswamy · 2018-11-27T23:27:16Z

scala-package/memory-management.md

+In this approach, you do not have to write any special code to have native memory cleaned up, however this approach solely depends on the Garbage collector to run and find unreachable objects.
+You can control the frequency of Garbage Collector by calling System.gc() at strategic points such as at the end of an epoch or at the end of a mini-batch in Training.
+
+This approach could be suitable for use-cases such as inference on CPUs and you have large amount of Memory(RAM) on your system.  


scala-package/memory-management.md

piyushghai · 2018-11-28T01:07:24Z

scala-package/memory-management.md

+
+### 2.  Using Phantom References (Recommended for some use cases)
+
+Apache MXNet uses [Phantom References](https://docs.oracle.com/javase/8/docs/api/java/lang/ref/PhantomReference.html) to track all MXNet Objects that has native memory associated with it. 


[nit] : 'have' instead of 'has'

scala-package/memory-management.md

piyushghai

Thanks for your contribution @nswamy. This is super useful and extremely powerful!
Looks good to me :)

scala-package/memory-management.md

harshp8l · 2018-11-28T01:16:52Z

Thanks, @nswamy this looks great. This was an informative doc - very worthwhile.

andrewfayres · 2018-11-28T01:37:26Z

Overall the content looks good. I'll work to correct some of the minor grammatical stuff after I grab some dinner.

andrewfayres · 2018-11-28T10:02:15Z

Instead of nitpicking at this I made some edits and pushed a new commit like we discussed.

ddavydenko · 2018-11-29T02:32:50Z

@nswamy , do you think you can get to failures in CI in order to push this PR to completion?

zachgk · 2018-11-29T02:32:54Z

I put some suggested edits in a PR at nswamy#3

Edits for scala memory management

nswamy · 2018-11-29T19:02:54Z

Thanks all(@andrewfayres , @zachgk , @lupesko @lanking520 ) for the review and edits. This is ready to be merge, hopefully CI passes and we can merge.

lanking520

Thanks for your contribution, it is well documented now!

andrewfayres

LGTM

zachgk

LGTM

* update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix

…ile (#13478) * updated to v1.5.0 * Bumped minor version from 1.4.0 to 1.5.0 on master * added Anirudh as maintainer for R package ... adding something useful and re-trigger PR check * Updated license file for clojure, onnx-tensorrt, gtest, R-package * Get the correct include path in pip package (#13452) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * fix include path problem in pip package * add comment * fix lint error * address reviewer comment * address reviewer comment * Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (#13431) * Skip flaky test #13446 (#13480) * Rewrite dataloader with process pool, improves responsiveness and reliability (#13447) * fix recordio.py * rewrite dataloader with pool * fix batch as tuple * fix prefetching * fix pylint * picklable function * use pickle * add missing commit * Fix errors in docstrings for subgraph op; use code directive (#13463) * [MXNET-1158] JVM Memory Management Documentation (#13105) * update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix * Update row_sparse tutorial (#13414) Update row_sparse tutorial * Add resiliency to onnx export code (#13426) * Added resiliency to onnx export code - With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code. * Fixed name of net in unittest * Fix pylint * [MXNET-1185] Support large array in several operators (part 1) (#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI * [MXNET-1210 ] Gluon Audio - Example (#13325) * Initialized the example * Addressed PR comments, about existing synset.txt file - no overwrite * RST - docstring issues fixed * added README * Addressed PR comments * Addressed PR comments, checking Divide by 0 * Raising error if format is not supported. * changed a line for ndarray of labels * Trigger CI * Trigger CI * PR comments addressed around skip_header argument * Addressed PR comments around librosa import * PR Comments * Passing lazy=lazy from argument * Added PR comments, labels to README.MD * Trigger CI * Addressing PR Comments in README * Modified README.md * Added example under audio folder * Retrigger CI * Retrigger CI * ONNX export: Instance normalization, Shape (#12920) * ONNX import/export: Make backend_rep common * ONNX export: Instance Normalization * ONNX export: Shape operator * Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495) * clarify ops faq regarding docs strings (#13492) * Add graph_compact operator. (#13436) * add graph_compact. * fix. * add doc. * add tests for graph_compact. * address comments. * update docs. * trigger CI * Deprecate Jenkinsfile (#13474) * update github location for sampled_block.py (#13508) Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py * #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499) * ONNX export: Logical operators (#12852) * Fix cmake options parsing in dev_menu (#13458) Add GPU+MKLDNN unittests to dev_menu * Revert "Manually track num_max_thread (#12380)" (#13501) This reverts commit 7541021. * Feature/mkldnn static 2 (#13503) * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * remove commented code * remove mkldnn dnaymic for unitest * force static for mkldnn lib * remove dynamic mkldnn bind * only link windows * add mkldnn.mk * try force linking * remove mkldnn dynanmic check * remove test mkldnn install * fix spacing * fix index * add artifacts * add comment about windows * remove static * update makefile * fix toctree Sphinx errors (#13489) * fix toctree errors * nudging file for CI * Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527) * [MXNET-1234] Fix shape inference problems in Activation backward (#13409) * Provide a failing test for ReLU activation shape inference bug * Fix Activation backward shape inference fixes: #13333 * Add softsign Activation to test_gluon.py * Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now * Don't disable MKLDNN

* update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix

…ile (apache#13478) * updated to v1.5.0 * Bumped minor version from 1.4.0 to 1.5.0 on master * added Anirudh as maintainer for R package ... adding something useful and re-trigger PR check * Updated license file for clojure, onnx-tensorrt, gtest, R-package * Get the correct include path in pip package (apache#13452) * add find_include_path API * address reviewer comment * change return type from list to string * add unit test * address reviewer comment * address reviewer comment * address reviewer comment * address reviewer comment * fix include path problem in pip package * add comment * fix lint error * address reviewer comment * address reviewer comment * Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (apache#13431) * Skip flaky test apache#13446 (apache#13480) * Rewrite dataloader with process pool, improves responsiveness and reliability (apache#13447) * fix recordio.py * rewrite dataloader with pool * fix batch as tuple * fix prefetching * fix pylint * picklable function * use pickle * add missing commit * Fix errors in docstrings for subgraph op; use code directive (apache#13463) * [MXNET-1158] JVM Memory Management Documentation (apache#13105) * update train_mnist * Add documentation for JVM Memory Management * update doc * address nit picks * address nit picks * Grammar and clarity edits for memory management doc * Edits for scala memory management * Update memory-management.md * Update memory-management.md * Update memory-management.md * capitalization fix * Update row_sparse tutorial (apache#13414) Update row_sparse tutorial * Add resiliency to onnx export code (apache#13426) * Added resiliency to onnx export code - With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code. * Fixed name of net in unittest * Fix pylint * [MXNET-1185] Support large array in several operators (part 1) (apache#13418) * fix a few operators with large arrays (# of elements) * fix bug in broadcast_div and add tests * address reviewer comment * add unit test * add empty line * retrigger CI * [MXNET-1210 ] Gluon Audio - Example (apache#13325) * Initialized the example * Addressed PR comments, about existing synset.txt file - no overwrite * RST - docstring issues fixed * added README * Addressed PR comments * Addressed PR comments, checking Divide by 0 * Raising error if format is not supported. * changed a line for ndarray of labels * Trigger CI * Trigger CI * PR comments addressed around skip_header argument * Addressed PR comments around librosa import * PR Comments * Passing lazy=lazy from argument * Added PR comments, labels to README.MD * Trigger CI * Addressing PR Comments in README * Modified README.md * Added example under audio folder * Retrigger CI * Retrigger CI * ONNX export: Instance normalization, Shape (apache#12920) * ONNX import/export: Make backend_rep common * ONNX export: Instance Normalization * ONNX export: Shape operator * Clarify dependency on OpenCV in CNN Visualization tutorial. (apache#13495) * clarify ops faq regarding docs strings (apache#13492) * Add graph_compact operator. (apache#13436) * add graph_compact. * fix. * add doc. * add tests for graph_compact. * address comments. * update docs. * trigger CI * Deprecate Jenkinsfile (apache#13474) * update github location for sampled_block.py (apache#13508) Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py * apache#13453 [Clojure] - Add Spec Validations to the Optimizer namespace (apache#13499) * ONNX export: Logical operators (apache#12852) * Fix cmake options parsing in dev_menu (apache#13458) Add GPU+MKLDNN unittests to dev_menu * Revert "Manually track num_max_thread (apache#12380)" (apache#13501) This reverts commit 7541021. * Feature/mkldnn static 2 (apache#13503) * build mkldnn as static lib * update makefile to statically build mkldnn * build static mkldnn * fix static name * fix static name * update static for mac * rename mkldnn dep in ci * remove moving mkldnn dynamic lib * remove commented code * remove mkldnn dnaymic for unitest * force static for mkldnn lib * remove dynamic mkldnn bind * only link windows * add mkldnn.mk * try force linking * remove mkldnn dynanmic check * remove test mkldnn install * fix spacing * fix index * add artifacts * add comment about windows * remove static * update makefile * fix toctree Sphinx errors (apache#13489) * fix toctree errors * nudging file for CI * Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (apache#13527) * [MXNET-1234] Fix shape inference problems in Activation backward (apache#13409) * Provide a failing test for ReLU activation shape inference bug * Fix Activation backward shape inference fixes: apache#13333 * Add softsign Activation to test_gluon.py * Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now * Don't disable MKLDNN

nswamy requested a review from yzhliu as a code owner November 4, 2018 04:42

nswamy force-pushed the mem_doc branch from 0a59d8d to dd2685a Compare November 4, 2018 05:57

nswamy changed the title ~~[WIP] JVM Memory Management Documentation~~ JVM Memory Management Documentation Nov 4, 2018

nswamy changed the title ~~JVM Memory Management Documentation~~ [MXNET-1158] JVM Memory Management Documentation Nov 4, 2018

marcoabreu added Doc pr-awaiting-review PR is waiting for code review Scala labels Nov 4, 2018

lanking520 reviewed Nov 5, 2018

View reviewed changes

Wei-1 reviewed Nov 6, 2018

View reviewed changes

scala-package/memory-management.md Show resolved Hide resolved

Wei-1 reviewed Nov 6, 2018

View reviewed changes

scala-package/memory-management.md Show resolved Hide resolved

lupesko reviewed Nov 20, 2018

View reviewed changes

piyushghai reviewed Nov 27, 2018

View reviewed changes

scala-package/memory-management.md Outdated Show resolved Hide resolved

nswamy added 3 commits November 27, 2018 15:14

update train_mnist

978ad14

Add documentation for JVM Memory Management

b1d57fa

update doc

96ff9f9

nswamy force-pushed the mem_doc branch from dd2685a to 96ff9f9 Compare November 28, 2018 00:52

nswamy commented Nov 28, 2018

View reviewed changes

harshp8l reviewed Nov 28, 2018

View reviewed changes

scala-package/memory-management.md Outdated Show resolved Hide resolved

piyushghai reviewed Nov 28, 2018

View reviewed changes

harshp8l reviewed Nov 28, 2018

View reviewed changes

scala-package/memory-management.md Outdated Show resolved Hide resolved

piyushghai reviewed Nov 28, 2018

View reviewed changes

address nit picks

a47d750

harshp8l reviewed Nov 28, 2018

View reviewed changes

scala-package/memory-management.md Outdated Show resolved Hide resolved

address nit picks

3ac8269

Grammar and clarity edits for memory management doc

bcd24c2

zachgk mentioned this pull request Nov 29, 2018

flaky test: test_random.test_randint_generator #13446

Closed

zachgk and others added 6 commits November 28, 2018 18:45

Edits for scala memory management

f6efe66

Update memory-management.md

e030315

Update memory-management.md

03ad8da

Update memory-management.md

1b60a2d

capitalization fix

e13f0a3

Merge pull request #3 from zachgk/mem_doc

f6627cd

Edits for scala memory management

lanking520 approved these changes Nov 29, 2018

View reviewed changes

andrewfayres approved these changes Nov 29, 2018

View reviewed changes

zachgk approved these changes Nov 29, 2018

View reviewed changes

nswamy merged commit 55acf56 into apache:master Nov 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MXNET-1158] JVM Memory Management Documentation #13105

[MXNET-1158] JVM Memory Management Documentation #13105

nswamy commented Nov 4, 2018 •

edited

ankkhedia commented Nov 4, 2018

lanking520 left a comment

lanking520 Nov 5, 2018

lanking520 Nov 5, 2018

nswamy Nov 27, 2018

lupesko left a comment

lupesko Nov 20, 2018

lupesko Nov 20, 2018

lupesko commented Nov 20, 2018

lanking520 commented Nov 24, 2018

nswamy left a comment

nswamy Nov 27, 2018

piyushghai Nov 28, 2018

piyushghai left a comment •

edited

harshp8l commented Nov 28, 2018

andrewfayres commented Nov 28, 2018

andrewfayres commented Nov 28, 2018

ddavydenko commented Nov 29, 2018

zachgk commented Nov 29, 2018

nswamy commented Nov 29, 2018

lanking520 left a comment

andrewfayres left a comment

zachgk left a comment

	The JVM using the Garbage Collector only manages objects allocated in the JVM Heap and is not aware of the memory footprint of these objects in the native memory, hence allocation/deAllocation of the native memory has to be managed by MXNet Scala.
	The JVM using the Garbage Collector only manages objects allocated in the JVM Heap and is not aware of the memory footprint of these objects in the native memory, hence allocation/deallocation of the native memory has to be managed by MXNet Scala.

	To make it easy, MXNet Scala provides a few modes of operation.
	To make it easy, MXNet Scala provides a few modes of operation, explained in detail below.


		### 2. Using Phantom References (Recommended for some use cases)

		Apache MXNet uses [Phantom References](https://docs.oracle.com/javase/8/docs/api/java/lang/ref/PhantomReference.html) to track all MXNet Objects that has native memory associated with it.

[MXNET-1158] JVM Memory Management Documentation #13105

[MXNET-1158] JVM Memory Management Documentation #13105

Conversation

nswamy commented Nov 4, 2018 • edited

Description

ankkhedia commented Nov 4, 2018

lanking520 left a comment

Choose a reason for hiding this comment

lanking520 Nov 5, 2018

Choose a reason for hiding this comment

lanking520 Nov 5, 2018

Choose a reason for hiding this comment

nswamy Nov 27, 2018

Choose a reason for hiding this comment

lupesko left a comment

Choose a reason for hiding this comment

lupesko Nov 20, 2018

Choose a reason for hiding this comment

lupesko Nov 20, 2018

Choose a reason for hiding this comment

lupesko commented Nov 20, 2018

lanking520 commented Nov 24, 2018

nswamy left a comment

Choose a reason for hiding this comment

nswamy Nov 27, 2018

Choose a reason for hiding this comment

piyushghai Nov 28, 2018

Choose a reason for hiding this comment

piyushghai left a comment • edited

Choose a reason for hiding this comment

harshp8l commented Nov 28, 2018

andrewfayres commented Nov 28, 2018

andrewfayres commented Nov 28, 2018

ddavydenko commented Nov 29, 2018

zachgk commented Nov 29, 2018

nswamy commented Nov 29, 2018

lanking520 left a comment

Choose a reason for hiding this comment

andrewfayres left a comment

Choose a reason for hiding this comment

zachgk left a comment

Choose a reason for hiding this comment

nswamy commented Nov 4, 2018 •

edited

piyushghai left a comment •

edited