
Memory allocator bug fix #5035 #5133

Merged: 8 commits into apache:master on Feb 28, 2017

Conversation

@eric-haibin-lin (Member) commented Feb 24, 2017

#4795
#5123
#5035

(Together with NNVM PR dmlc/nnvm#105)

Found some inefficiencies in the system and made a few changes related to memory allocation in MXNet:

  1. In the bucketing module, curr_module is passed in as the shared_module; instead, the module with the default_bucket_key should be passed. Link.
  2. When the memory pool of the default bucket's module doesn't hold sufficient memory for another bucket to bind, extra NDArrays are allocated, but these NDArrays are never added back to the pool. Link
  3. When matching newly required memory against the existing memory slots in the pool, the list of new requests is not sorted by size, so a small request can end up occupying a large slot; a sketch of this idea follows below.
  4. The NNVM_EXEC_MATCH_RANGE variable affects the result of memory planning. Instead of asking the user to choose it, the backend can simply try several values and pick the best one, automating the process; many users are not even aware of this variable (a sketch of the automated search appears after the benchmark configuration below). Link

Fixing 1 and 2 reduces memory usage substantially, while 3 and 4 bring a marginal further reduction (5% ~ 10%) once 1 and 2 are fixed.
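
A rough sketch of the idea behind item 3, using hypothetical names rather than the actual graph-executor code: if requests are matched greedily in arbitrary order, a small request may claim a large free slot and force a fresh allocation for a later large request, while matching requests largest-first avoids that.

```python
def match_requests_to_pool(requests, pool):
    """Greedy matching sketch (hypothetical helper, not the MXNet API).

    requests: list of requested sizes in bytes
    pool:     list of existing slot sizes in bytes (e.g. from the shared executor)
    Returns the number of extra bytes that would need a fresh allocation.
    """
    free_slots = sorted(pool, reverse=True)
    extra_bytes = 0
    # Sort requests largest-first so big requests get first pick of big slots;
    # an unsorted pass could let a small request occupy a large slot it doesn't need.
    for req in sorted(requests, reverse=True):
        for i, slot in enumerate(free_slots):
            if slot >= req:
                free_slots.pop(i)   # reuse this existing slot
                break
        else:
            extra_bytes += req      # nothing fits; a new NDArray would be allocated
    return extra_bytes

# With an unsorted pass, the 1 MB request could take the 64 MB slot and force a
# fresh 64 MB allocation; matched largest-first, both slots are reused.
print(match_requests_to_pool([1 << 20, 64 << 20], [64 << 20, 1 << 20]))  # prints 0
```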

Benchmark result on the LSTM workload:

| Version | 22673b6 (baseline) | 1 | 1 + 2 | 1 + 2 + 3 | 1 + 2 + 3 + 4 |
| --- | --- | --- | --- | --- | --- |
| Memory (MB) | > 12288 (out of memory) | 4297 | 1956 | 1774 | 1718 |

Benchmark result on the Neural Style workload:

| Version | 22673b6 (baseline) | 1 + 2 + 3 + 4 |
| --- | --- | --- |
| Memory (MB) | > 12288 (out of memory) | 2034 |
  • LSTM configuration: `python lstm_bucketing.py --gpus=0 --num-layers=4 --num-hidden=1024 --num-embed=512 --num-epochs=1`
  • Neural Style uses the default configuration
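
Item 4 proposes that the backend search over NNVM_EXEC_MATCH_RANGE itself rather than relying on the user to set it. A minimal sketch of that search, assuming a hypothetical plan_memory(match_range) callable that runs the memory planner with a given setting and returns the total bytes it would allocate (not the actual NNVM API):

```python
def pick_best_match_range(plan_memory, candidates=(1, 2, 4, 8, 16, 32)):
    """Try each candidate match range and keep the one whose plan needs the
    least memory. `plan_memory` is a hypothetical stand-in for the planner."""
    best_range, best_bytes = None, float("inf")
    for match_range in candidates:
        total_bytes = plan_memory(match_range)
        if total_bytes < best_bytes:
            best_range, best_bytes = match_range, total_bytes
    return best_range, best_bytes

# Toy usage with a fake planner whose footprint happens to be smallest at 16.
print(pick_best_match_range(lambda r: abs(r - 16) * 10 + 1000))  # (16, 1000)
```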

```cpp
if (shared_exec != nullptr) {
  // Record the size and device of every NDArray already held by the shared executor.
  for (auto& nd : dynamic_cast<GraphExecutor*>(shared_exec)->data_pool_) {
    size_t bytes = nd.shape().Size() * mshadow::mshadow_sizeof(nd.dtype());
    shared_pool.emplace_back(nd.ctx().dev_id, bytes);
  }
}
```
Contributor:

Is this the same structure that was removed during the nnvm refactor?


@piiswrong (Contributor):

Could you add a memory test using the bucketing module?

```diff
@@ -133,6 +133,7 @@ def __init__(self, logger=logging):
         self.params_initialized = False
         self.optimizer_initialized = False
         self._symbol = None
+        self.total_exec_bytes = 0
```
Contributor:

Change to _total_exec_bytes to indicate users should not rely on this.

```diff
@@ -558,6 +559,8 @@ def _get_or_reshape(name, shared_data_arrays, arg_shape, arg_type, context, logg
         executor = self.symbol.bind(ctx=context, args=arg_arrays,
                                     args_grad=grad_arrays, aux_states=aux_arrays,
                                     grad_req=self.grad_req, shared_exec=shared_exec)
+        # Get the total bytes allocated for this executor
+        self.total_exec_bytes += int(executor.debug_str().split('\n')[-3].split()[1])
```
@piiswrong (Contributor) commented Feb 25, 2017:

Does this discount memory taken from the shared exec?

@eric-haibin-lin (Member Author):

No, it doesn't. But we can definitely add finer-grained profiling stats later (how much extra memory is allocated despite the shared exec, how it maps to ops, which device it lives on, etc.).

@piiswrong (Contributor):

The nnvm PR is merged. Please update the submodule.

@eric-haibin-lin (Member Author):

@piiswrong Tests passed. Please merge. Thanks!

@piiswrong merged commit 884b50a into apache:master on Feb 28, 2017.