Conversation
src/executor/graph_executor.cc (Outdated)
if (shared_exec != nullptr) {
  for (auto& nd : dynamic_cast<GraphExecutor*>(shared_exec)->data_pool_) {
    size_t bytes = nd.shape().Size() * mshadow::mshadow_sizeof(nd.dtype());
    shared_pool.emplace_back(nd.ctx().dev_id, bytes);
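For reference, the byte count recorded per pooled NDArray above is just element count times dtype width. A minimal Python sketch of the same arithmetic (NumPy's `dtype.itemsize` stands in for `mshadow_sizeof`; the helper name is made up):

```python
import numpy as np

def ndarray_bytes(shape, dtype):
    """Bytes needed for a dense array: product of dims times element width."""
    n_elems = 1
    for dim in shape:
        n_elems *= dim
    return n_elems * np.dtype(dtype).itemsize

# A (32, 256) float32 buffer: 32 * 256 * 4 bytes
print(ndarray_bytes((32, 256), "float32"))  # 32768
```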
Is this the same structure that was removed during nnvm refactor?
There used to be a struct like this
https://github.com/dmlc/mxnet/blob/v0.8.0/src/symbol/graph_executor.h#L60
Could you add a memory test using the bucketing module?
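A memory test along these lines could check that binding several buckets with executor sharing never allocates more than binding them independently. This is only a sketch with a stub executor; the class and function names are invented here, not the real Module API:

```python
class StubExecutor:
    """Stand-in for an executor: reuses the shared executor's memory pool
    and only allocates the extra bytes the new bucket needs on top of it."""
    def __init__(self, size, shared=None):
        self.pool = max(size, shared.pool) if shared else size
        # Bytes newly allocated by this bind (0 if the shared pool suffices):
        self.extra = max(0, size - shared.pool) if shared else size

def total_bytes(bucket_sizes, share=True):
    """Total bytes allocated across all buckets, with or without sharing."""
    shared = None
    total = 0
    for size in bucket_sizes:
        ex = StubExecutor(size, shared if share else None)
        total += ex.extra
        if share:
            shared = ex  # subsequent buckets share the grown pool
    return total

# Sharing should never allocate more than binding each bucket separately.
buckets = [10, 30, 20, 40]
assert total_bytes(buckets, share=True) <= total_bytes(buckets, share=False)
```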
python/mxnet/module/base_module.py (Outdated)
@@ -133,6 +133,7 @@ def __init__(self, logger=logging):
        self.params_initialized = False
        self.optimizer_initialized = False
        self._symbol = None
        self.total_exec_bytes = 0
Change to _total_exec_bytes to indicate users should not rely on this.
@@ -558,6 +559,8 @@ def _get_or_reshape(name, shared_data_arrays, arg_shape, arg_type, context, logg
        executor = self.symbol.bind(ctx=context, args=arg_arrays,
                                    args_grad=grad_arrays, aux_states=aux_arrays,
                                    grad_req=self.grad_req, shared_exec=shared_exec)
        # Get the total bytes allocated for this executor
        self.total_exec_bytes += int(executor.debug_str().split('\n')[-3].split()[1])
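The added line scrapes an allocation-summary line near the end of the executor's debug string. A small sketch of that parsing against a made-up debug string (the layout of `SAMPLE_DEBUG_STR` is an assumption for illustration, not the exact real `debug_str` format):

```python
SAMPLE_DEBUG_STR = """Symbol Outputs:
	output[0]=softmax(0)
Total 11 TempSpace resource requested
Total 4096 KB allocated
Total 0 KB temp workspace requested
"""

def allocated_total(debug_str):
    # The third line from the end holds "Total <N> ... allocated";
    # the second whitespace-separated token is the number.
    return int(debug_str.split('\n')[-3].split()[1])

print(allocated_total(SAMPLE_DEBUG_STR))  # 4096
```

Parsing by line position is brittle if the debug output format changes; a structured accessor on the executor would be more robust.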
Does this discount memory taken from shared exec?
No, it doesn't. But we can definitely add finer-grained profiling stats later (how much extra memory is allocated despite the shared exec, how it maps to ops, which device it lives on, etc.)
The NNVM PR is merged. Please update the submodule.
…e while binding new buckets
…y. Add test case for bucketing
@piiswrong test passed. Please merge. Thanks!
#4795
#5123
#5035
(Together with NNVM PR dmlc/nnvm#105)
Found some inefficiencies in the system and made a few changes related to memory allocation in MXNet:
- curr_module is passed in as the shared_module, but instead the module with the default_bucket_key should be passed. Link.
- The NNVM_EXEC_MATCH_RANGE variable affects the result of memory planning. Instead of letting the user choose it, the backend could just try different values and pick the best one automatically; some users are not even aware of this variable. Link.

Fixing 1 and 2 reduces memory quite a lot, while 3 and 4 bring a marginal reduction (5% ~ 10%) once 1 and 2 are fixed.
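The auto-tuning idea above can be sketched as a simple search: run the memory planner once per candidate match-range value and keep the cheapest plan. `plan_memory_bytes` here is a toy stand-in for the real backend planner, and the candidate values are illustrative:

```python
def plan_memory_bytes(match_range, request_sizes):
    """Toy planner: a freed buffer is reused for a new request only if the
    buffer size is within `match_range` times the request; otherwise a new
    buffer is allocated. Returns total bytes allocated."""
    free = []          # sizes of buffers currently available for reuse
    allocated = 0
    for need in request_sizes:
        candidates = [b for b in free if need <= b <= need * match_range]
        if candidates:
            buf = min(candidates)   # tightest fit among reusable buffers
            free.remove(buf)
        else:
            buf = need
            allocated += need       # no match: allocate a fresh buffer
        free.append(buf)            # op finishes; its buffer is free again
    return allocated

def best_match_range(request_sizes, candidates=(1, 2, 4, 8, 16, 32)):
    # Try each value and keep the one with the smallest total allocation,
    # as the PR suggests the backend could do instead of exposing the knob.
    return min(candidates, key=lambda r: plan_memory_bytes(r, request_sizes))

print(best_match_range([100, 90, 200, 95, 100]))  # 2
```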
Benchmark result on LSTM workload:
Benchmark result on Neural Style workload: