Conversation
src/executor/graph_executor.cc (Outdated)
if (shared_exec != nullptr) {
  for (auto& nd : dynamic_cast<GraphExecutor*>(shared_exec)->data_pool_) {
    size_t bytes = nd.shape().Size() * mshadow::mshadow_sizeof(nd.dtype());
    shared_pool.emplace_back(nd.ctx().dev_id, bytes);
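For reference, the byte count recorded per pooled NDArray above is just element count times dtype width. A minimal Python sketch of the same arithmetic (NumPy's `dtype.itemsize` stands in for `mshadow_sizeof`; the helper name is made up):

```python
import numpy as np

def ndarray_bytes(shape, dtype):
    """Bytes needed for a dense array: product of dims times element width."""
    n_elems = 1
    for dim in shape:
        n_elems *= dim
    return n_elems * np.dtype(dtype).itemsize

# A (32, 256) float32 buffer: 32 * 256 * 4 bytes
print(ndarray_bytes((32, 256), "float32"))  # 32768
```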
Is this the same structure that was removed during nnvm refactor?
There used to be a struct like this
https://github.com/dmlc/mxnet/blob/v0.8.0/src/symbol/graph_executor.h#L60
Could you add a memory test using the bucketing module?
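A memory test along these lines could check that binding several buckets with executor sharing never allocates more than binding them independently. This is only a sketch with a stub executor; the class and function names are invented here, not the real Module API:

```python
class StubExecutor:
    """Stand-in for an executor: reuses the shared executor's memory pool
    and only allocates the extra bytes the new bucket needs on top of it."""
    def __init__(self, size, shared=None):
        self.pool = max(size, shared.pool) if shared else size
        # Bytes newly allocated by this bind (0 if the shared pool suffices):
        self.extra = max(0, size - shared.pool) if shared else size

def total_bytes(bucket_sizes, share=True):
    """Total bytes allocated across all buckets, with or without sharing."""
    shared = None
    total = 0
    for size in bucket_sizes:
        ex = StubExecutor(size, shared if share else None)
        total += ex.extra
        if share:
            shared = ex  # subsequent buckets share the grown pool
    return total

# Sharing should never allocate more than binding each bucket separately.
buckets = [10, 30, 20, 40]
assert total_bytes(buckets, share=True) <= total_bytes(buckets, share=False)
```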
python/mxnet/module/base_module.py (Outdated)
@@ -133,6 +133,7 @@ def __init__(self, logger=logging):
        self.params_initialized = False
        self.optimizer_initialized = False
        self._symbol = None
        self.total_exec_bytes = 0
Change to _total_exec_bytes to indicate users should not rely on this.
@@ -558,6 +559,8 @@ def _get_or_reshape(name, shared_data_arrays, arg_shape, arg_type, context, logg
        executor = self.symbol.bind(ctx=context, args=arg_arrays,
                                    args_grad=grad_arrays, aux_states=aux_arrays,
                                    grad_req=self.grad_req, shared_exec=shared_exec)
        # Get the total bytes allocated for this executor
        self.total_exec_bytes += int(executor.debug_str().split('\n')[-3].split()[1])
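The added line scrapes an allocation-summary line near the end of the executor's debug string. A small sketch of that parsing against a made-up debug string (the layout of `SAMPLE_DEBUG_STR` is an assumption for illustration, not the exact real `debug_str` format):

```python
SAMPLE_DEBUG_STR = """Symbol Outputs:
	output[0]=softmax(0)
Total 11 TempSpace resource requested
Total 4096 KB allocated
Total 0 KB temp workspace requested
"""

def allocated_total(debug_str):
    # The third line from the end holds "Total <N> ... allocated";
    # the second whitespace-separated token is the number.
    return int(debug_str.split('\n')[-3].split()[1])

print(allocated_total(SAMPLE_DEBUG_STR))  # 4096
```

Parsing by line position is brittle if the debug output format changes; a structured accessor on the executor would be more robust.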
Does this discount memory taken from shared exec?
No, it doesn't. But we can definitely add finer-grained profiling stats later (how much extra memory is allocated despite the shared exec, how it maps to ops, which device it lives on, etc.)
The NNVM PR is merged. Please update the submodule.
…e while binding new buckets
…y. Add test case for bucketing
@piiswrong test passed. Please merge. Thanks!
#4795
#5123
#5035
(Together with NNVM PR dmlc/nnvm#105)
Found some inefficiencies in the system and made a few changes related to memory allocation in MXNet:
- curr_module is passed in as the shared_module, but instead the module with the default_bucket_key should be passed. Link.
- The NNVM_EXEC_MATCH_RANGE variable affects the result of memory planning. Instead of letting the user choose it, the backend could just try different values and pick the best one automatically; some users are not even aware of this variable. Link.

Fixing 1 and 2 reduces memory quite a lot, while 3 and 4 bring a marginal reduction (5% ~ 10%) once 1 and 2 are fixed.
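The auto-tuning idea above can be sketched as a simple search: run the memory planner once per candidate match-range value and keep the cheapest plan. `plan_memory_bytes` here is a toy stand-in for the real backend planner, and the candidate values are illustrative:

```python
def plan_memory_bytes(match_range, request_sizes):
    """Toy planner: a freed buffer is reused for a new request only if the
    buffer size is within `match_range` times the request; otherwise a new
    buffer is allocated. Returns total bytes allocated."""
    free = []          # sizes of buffers currently available for reuse
    allocated = 0
    for need in request_sizes:
        candidates = [b for b in free if need <= b <= need * match_range]
        if candidates:
            buf = min(candidates)   # tightest fit among reusable buffers
            free.remove(buf)
        else:
            buf = need
            allocated += need       # no match: allocate a fresh buffer
        free.append(buf)            # op finishes; its buffer is free again
    return allocated

def best_match_range(request_sizes, candidates=(1, 2, 4, 8, 16, 32)):
    # Try each value and keep the one with the smallest total allocation,
    # as the PR suggests the backend could do instead of exposing the knob.
    return min(candidates, key=lambda r: plan_memory_bytes(r, request_sizes))

print(best_match_range([100, 90, 200, 95, 100]))  # 2
```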
Benchmark result on LSTM workload:
Benchmark result on Neural Style workload: