
Shrink host buffers #1712

Merged: 4 commits merged into NVIDIA:master on Feb 4, 2020
Conversation

@mzient (Contributor) commented on Feb 4, 2020:

  • Reduce the size of the underlying allocation for non-pinned host buffers when the new size is less than 90% of the current allocation size.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Why do we need this PR?

  • It fixes a problem with excessive host memory consumption when handling datasets with high size variance.

What happened in this PR?


  • What solution was applied:
    • Added a backend-dependent shrink threshold value (a sketch of the resulting policy follows below)
  • Affected modules and functionalities:
    • buffer.h
  • Key points relevant for the review:
    • the shrink logic in buffer.h
  • Validation and testing:
    • existing tests apply
    • performance tests (e.g. L3 training) still need to be run
  • Documentation (including examples):
    • N/A

JIRA TASK: N/A
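
For illustration, the intended policy behaves roughly like the sketch below: grow allocations geometrically, but give memory back when a resize requests noticeably less than what is currently held. With the default threshold of 0.9, a buffer holding 100 MB would be reallocated down as soon as a resize asks for less than 90 MB. All names here (`HostBuffer`, `Reallocate`, the static policy members and their defaults) are illustrative assumptions, not the merged `buffer.h` code:

```cpp
#include <cstddef>
#include <cstdlib>

// Illustrative sketch only; names and structure are assumptions,
// not the actual DALI Buffer implementation.
class HostBuffer {
 public:
  void Resize(size_t new_size) {
    if (new_size > capacity_) {
      // Grow geometrically so repeated small growths stay amortized O(1).
      Reallocate(static_cast<size_t>(new_size * growth_factor_));
    } else if (new_size < static_cast<size_t>(capacity_ * shrink_threshold_)) {
      // The request dropped below 90% of the allocation: release the excess
      // instead of holding the high-water mark forever. This is what fixes
      // excessive host memory use on datasets with high size variance.
      Reallocate(new_size);
    }
    size_ = new_size;
  }

 private:
  void Reallocate(size_t bytes) {
    // Error handling and zero-size corner cases omitted for brevity.
    data_ = std::realloc(data_, bytes);
    capacity_ = bytes;
  }

  void *data_ = nullptr;
  size_t size_ = 0;
  size_t capacity_ = 0;
  static double growth_factor_;     // >= 1.0; over-allocation on growth
  static double shrink_threshold_;  // in [0, 1]; 0 disables shrinking
};

double HostBuffer::growth_factor_ = 1.1;
double HostBuffer::shrink_threshold_ = 0.9;
```

Since the threshold is backend-dependent, only non-pinned host buffers opt into shrinking; a threshold of 0 would disable it entirely.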

Commit: Reduce the size of the underlying allocation for non-pinned host buffers when the new size is less than 90% of the current allocation size.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient requested a review from a team on February 4, 2020, 09:46
@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1107586]: BUILD STARTED

Commit pushed: Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

@mzient requested a review from JanuszL on February 4, 2020, 14:18
@dali-automaton (Collaborator):

CI MESSAGE: [1107944]: BUILD STARTED

```cpp
// buffer.h, from the diff hunk @@ -270,6 +271,21 @@ class Buffer:

DISABLE_COPY_MOVE_ASSIGN(Buffer);

static void SetGrowthFactor(double factor) {
  assert(factor >= 1.0);  // growing must never reserve less than requested
  growth_factor_ = factor;
}
```
Review comment (Contributor):

mutex lock?

@mzient (Contributor, Author) replied:

I think it's not necessary, and a lock would be quite costly. We could make it atomic, but even that is a bit of an overkill: this is not a critical resource, and the setter is extremely unlikely to run at the very moment the value is consumed. The timing is not critical and there can be no race condition, since the value is only read once, in ResizeHelper.

Review comment (Contributor):

fair enough
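
To make the read-once argument concrete, here is a hypothetical sketch (names assumed for illustration, not DALI's actual code) of how a resize path consumes the policy value exactly once:

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
static double growth_factor_ = 1.1;  // written rarely, via SetGrowthFactor()

size_t GrownCapacity(size_t requested) {
  // Single read into a local: the rest of the call works with one
  // consistent snapshot. A concurrent SetGrowthFactor() can change which
  // factor a *later* call sees, but cannot produce a half-applied
  // decision within this call.
  const double factor = growth_factor_;
  return static_cast<size_t>(requested * factor);
}
```

Strictly speaking, an unsynchronized read concurrent with a write of a `double` is still a data race in the C++ memory model; a `std::atomic<double>` with relaxed loads would close that gap at negligible cost, which is the atomic option considered above and judged unnecessary for a rarely-written tuning knob.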

@dali-automaton (Collaborator):

CI MESSAGE: [1107944]: BUILD FAILED

Commit pushed: Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1107971]: BUILD STARTED

```cpp
// Diff hunk @@ -40,13 +43,30 @@ (context: subscribe_signals()):

#endif

void InitializeBufferPolicies() {
  if (const char *threshold_str = std::getenv("DALI_HOST_BUFFER_SHRINK_THRESHOLD")) {
```
Review comment (Contributor):

Maybe we can extract this setter to some common place, since the valid ranges are now spread across two places.

@mzient (Contributor, Author) replied:

I moved the max growth factor to buffer.h.
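
The hunk above is truncated right after the `getenv` call. A plausible completion is sketched below; the setter name and the range check are assumptions based on this thread, not the merged code:

```cpp
#include <cstdlib>

// Hypothetical setter; per the exchange above, the real valid-range logic
// lives next to SetGrowthFactor in buffer.h.
static double g_host_shrink_threshold = 0.9;

void SetHostBufferShrinkThreshold(double threshold) {
  g_host_shrink_threshold = threshold;
}

void InitializeBufferPolicies() {
  // Let users override the shrink threshold through the environment.
  if (const char *threshold_str = std::getenv("DALI_HOST_BUFFER_SHRINK_THRESHOLD")) {
    // atof yields 0.0 for unparseable input, which here simply disables
    // shrinking rather than crashing.
    double threshold = std::atof(threshold_str);
    // A shrink threshold only makes sense in [0, 1]; out-of-range values
    // are ignored and the default stays in effect.
    if (threshold >= 0.0 && threshold <= 1.0)
      SetHostBufferShrinkThreshold(threshold);
  }
}
```

Under this scheme, `DALI_HOST_BUFFER_SHRINK_THRESHOLD=0` disables shrinking entirely, while values close to 1 release memory more eagerly.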

@mzient (Contributor, Author) commented on Feb 4, 2020:

!build

Commit pushed: Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton (Collaborator):

CI MESSAGE: [1107586]: BUILD PASSED

@dali-automaton (Collaborator):

CI MESSAGE: [1108086]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [1107971]: BUILD PASSED

@mzient merged commit 8362dfc into NVIDIA:master on Feb 4, 2020
@dali-automaton (Collaborator):

CI MESSAGE: [1108086]: BUILD PASSED

JanuszL added a commit that referenced this pull request on Feb 7, 2020:
- updates the `Custom operator` documentation to reflect the most recent DALI operator API
- updates the `Memory consumption` docs section to reflect shape inference and the changes from `Shrink host buffers (#1712)`

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>