
Shrink host buffers #1712

Merged: 4 commits merged into NVIDIA:master on Feb 4, 2020
Conversation

@mzient (Contributor) commented on Feb 4, 2020:

  • Reduce the size of the underlying allocation for non-pinned host buffers when the new size is less than 90% of the current allocation size.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Why do we need this PR?

  • It fixes a problem with excessive host memory consumption when handling datasets with high size variance.

What happened in this PR?


  • What solution was applied:
    • Added a backend-dependent shrink threshold value (a sketch of the resulting policy follows below)
  • Affected modules and functionalities:
    • buffer.h
  • Key points relevant for the review:
    • the shrink logic in buffer.h
  • Validation and testing:
    • existing tests apply
    • performance tests (e.g. L3 training) still need to be run
  • Documentation (including examples):
    • N/A

JIRA TASK: N/A
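
For illustration, the intended policy behaves roughly like the sketch below: grow allocations geometrically, but give memory back when a resize requests noticeably less than what is currently held. With the default threshold of 0.9, a buffer holding 100 MB would be reallocated down as soon as a resize asks for less than 90 MB. All names here (`HostBuffer`, `Reallocate`, the static policy members and their defaults) are illustrative assumptions, not the merged `buffer.h` code:

```cpp
#include <cstddef>
#include <cstdlib>

// Illustrative sketch only; names and structure are assumptions,
// not the actual DALI Buffer implementation.
class HostBuffer {
 public:
  void Resize(size_t new_size) {
    if (new_size > capacity_) {
      // Grow geometrically so repeated small growths stay amortized O(1).
      Reallocate(static_cast<size_t>(new_size * growth_factor_));
    } else if (new_size < static_cast<size_t>(capacity_ * shrink_threshold_)) {
      // The request dropped below 90% of the allocation: release the excess
      // instead of holding the high-water mark forever. This is what fixes
      // excessive host memory use on datasets with high size variance.
      Reallocate(new_size);
    }
    size_ = new_size;
  }

 private:
  void Reallocate(size_t bytes) {
    // Error handling and zero-size corner cases omitted for brevity.
    data_ = std::realloc(data_, bytes);
    capacity_ = bytes;
  }

  void *data_ = nullptr;
  size_t size_ = 0;
  size_t capacity_ = 0;
  static double growth_factor_;     // >= 1.0; over-allocation on growth
  static double shrink_threshold_;  // in [0, 1]; 0 disables shrinking
};

double HostBuffer::growth_factor_ = 1.1;
double HostBuffer::shrink_threshold_ = 0.9;
```

Since the threshold is backend-dependent, only non-pinned host buffers opt into shrinking; a threshold of 0 would disable it entirely.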

Commit: Reduce the size of the underlying allocation for non-pinned host buffers when the new size is less than 90% of the current allocation size.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient requested a review from a team on February 4, 2020, 09:46
@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1107586]: BUILD STARTED

Commit pushed: Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

@mzient requested a review from JanuszL on February 4, 2020, 14:18
@dali-automaton (Collaborator):

CI MESSAGE: [1107944]: BUILD STARTED

```cpp
// buffer.h, from the diff hunk @@ -270,6 +271,21 @@ class Buffer:

DISABLE_COPY_MOVE_ASSIGN(Buffer);

static void SetGrowthFactor(double factor) {
  assert(factor >= 1.0);  // growing must never reserve less than requested
  growth_factor_ = factor;
}
```
Review comment (Contributor):

mutex lock?

@mzient (Contributor, Author) replied:

I think it's not necessary, and a lock would be quite costly. We could make it atomic, but even that is a bit of an overkill: this is not a critical resource, and the setter is extremely unlikely to run at the very moment the value is consumed. The timing is not critical and there can be no race condition, since the value is only read once, in ResizeHelper.

Review comment (Contributor):

fair enough
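
To make the read-once argument concrete, here is a hypothetical sketch (names assumed for illustration, not DALI's actual code) of how a resize path consumes the policy value exactly once:

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
static double growth_factor_ = 1.1;  // written rarely, via SetGrowthFactor()

size_t GrownCapacity(size_t requested) {
  // Single read into a local: the rest of the call works with one
  // consistent snapshot. A concurrent SetGrowthFactor() can change which
  // factor a *later* call sees, but cannot produce a half-applied
  // decision within this call.
  const double factor = growth_factor_;
  return static_cast<size_t>(requested * factor);
}
```

Strictly speaking, an unsynchronized read concurrent with a write of a `double` is still a data race in the C++ memory model; a `std::atomic<double>` with relaxed loads would close that gap at negligible cost, which is the atomic option considered above and judged unnecessary for a rarely-written tuning knob.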

@dali-automaton (Collaborator):

CI MESSAGE: [1107944]: BUILD FAILED

Commit pushed: Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1107971]: BUILD STARTED

```cpp
// Diff hunk @@ -40,13 +43,30 @@ (context: subscribe_signals()):

#endif

void InitializeBufferPolicies() {
  if (const char *threshold_str = std::getenv("DALI_HOST_BUFFER_SHRINK_THRESHOLD")) {
```
Review comment (Contributor):

Maybe we can extract this setter to some common place, since the valid ranges are now spread across two places.

@mzient (Contributor, Author) replied:

I moved the max growth factor to buffer.h.
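
The hunk above is truncated right after the `getenv` call. A plausible completion is sketched below; the setter name and the range check are assumptions based on this thread, not the merged code:

```cpp
#include <cstdlib>

// Hypothetical setter; per the exchange above, the real valid-range logic
// lives next to SetGrowthFactor in buffer.h.
static double g_host_shrink_threshold = 0.9;

void SetHostBufferShrinkThreshold(double threshold) {
  g_host_shrink_threshold = threshold;
}

void InitializeBufferPolicies() {
  // Let users override the shrink threshold through the environment.
  if (const char *threshold_str = std::getenv("DALI_HOST_BUFFER_SHRINK_THRESHOLD")) {
    // atof yields 0.0 for unparseable input, which here simply disables
    // shrinking rather than crashing.
    double threshold = std::atof(threshold_str);
    // A shrink threshold only makes sense in [0, 1]; out-of-range values
    // are ignored and the default stays in effect.
    if (threshold >= 0.0 && threshold <= 1.0)
      SetHostBufferShrinkThreshold(threshold);
  }
}
```

Under this scheme, `DALI_HOST_BUFFER_SHRINK_THRESHOLD=0` disables shrinking entirely, while values close to 1 release memory more eagerly.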

@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

Commit pushed: Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton (Collaborator):

CI MESSAGE: [1107586]: BUILD PASSED

@dali-automaton (Collaborator):

CI MESSAGE: [1108086]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [1107971]: BUILD PASSED

@mzient merged commit 8362dfc into NVIDIA:master on Feb 4, 2020
@dali-automaton (Collaborator):

CI MESSAGE: [1108086]: BUILD PASSED

JanuszL added a commit that referenced this pull request on Feb 7, 2020:
- updates the `Custom operator` documentation to reflect the most recent DALI operator API
- updates the `Memory consumption` docs section to reflect shape inference and the changes from `Shrink host buffers (#1712)`

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>