Update Memory consumption and Custom operator docs sections #1719

Merged: 1 commit from fix_docs into NVIDIA:master, Feb 7, 2020

Conversation

@JanuszL (Contributor) commented Feb 6, 2020

  • updates the Custom operator documentation to reflect the most recent DALI operator API
  • updates the Memory consumption docs section to reflect shape inference and the changes from Shrink host buffers (#1712)

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Why we need this PR?

  • It updates documentation for Custom operator and Memory consumption

What happened in this PR?

  • What solution was applied:
    updates the Custom operator documentation to reflect the most recent DALI operator API
    updates the Memory consumption docs section to reflect shape inference and the changes from Shrink host buffers (#1712)
  • Affected modules and functionalities:
    docs
  • Key points relevant for the review:
    NA
  • Validation and testing:
    CI
  • Documentation (including examples):
    Custom operator and Memory consumption docs

JIRA TASK: [NA]

@@ -524,7 +540,7 @@
     "name": "python",
     "nbconvert_exporter": "python",
     "pygments_lexer": "ipython2",
-    "version": "2.7.12"
+    "version": "2.7.17"
@mzient (Contributor) commented Feb 6, 2020:

Wow. I thought it was deprecated :)

@JanuszL (Author):
Fixed.

This is most visible for operators whose output size may differ from sample to sample and from run to run. Operators with fixed-size outputs, such as crop, do not influence the overall memory consumption growth over time.
DALI uses three kinds of memory: host, host page-locked (pinned) and GPU.

GPU and pinned memory allocation and freeing require device synchronization. For this reason, DALI avoids reallocating these kinds of memory whenever possible. The buffers allocated with this kind of storage will only grow when the existing buffer is too small to accommodate the requested shape. This allocation strategy reduces the number of total memory management operations and greatly increases the processing speed after the allocations have stabilized.
Contributor:
Maybe talk about allocation and deallocation instead of allocation and freeing?

Just a suggestion, I don't have any strong opinion.

@JanuszL (Author):
Neither do I.
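
The grow-only policy described in the snippet under review can be illustrated with a short, purely hypothetical Python sketch; the class and its counters are invented for illustration and are not part of DALI:

```python
class GrowOnlyBuffer:
    """Hypothetical sketch of the grow-only policy described above:
    storage is reallocated only when a request exceeds capacity."""

    def __init__(self):
        self.capacity = 0
        self.reallocations = 0

    def resize(self, requested_size):
        if requested_size > self.capacity:
            # Stands in for an expensive, synchronizing GPU/pinned allocation.
            self.capacity = requested_size
            self.reallocations += 1
        # Smaller requests simply reuse the existing, larger allocation.

buf = GrowOnlyBuffer()
for size in (100, 80, 120, 90, 120):
    buf.resize(size)
print(buf.capacity, buf.reallocations)  # 120 2 -> allocations stabilize
```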

In contrast, ordinary host memory is relatively cheap to allocate and free. To reduce host memory consumption, the buffers may shrink when the newly requested size is smaller than a specified fraction of the old size (called the shrink threshold). The threshold can be adjusted to any value between 0 and 1, with the default being 0.9. The value can be controlled either via the environment variable `DALI_HOST_BUFFER_SHRINK_THRESHOLD` or set in Python with the `nvidia.dali.backend.SetHostBufferShrinkThreshold` function.
Contributor:
Maybe add a sentence that 1 would mean to shrink always, and 0 would mean to never shrink.

@JanuszL (Author):
Done
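
For reference, a minimal usage sketch of the two knobs named in the paragraph under review; the environment variable and the function come from the docs text, while the threshold value 0.5 is an arbitrary example:

```python
import os

# Option 1: environment variable, set before DALI starts, e.g.
#   DALI_HOST_BUFFER_SHRINK_THRESHOLD=0.5 python train.py
os.environ["DALI_HOST_BUFFER_SHRINK_THRESHOLD"] = "0.5"

# Option 2: the Python function named in the docs.
import nvidia.dali.backend as backend
backend.SetHostBufferShrinkThreshold(0.5)

# With 0.5, a host buffer shrinks only when the newly requested size
# falls below half of the currently allocated size.
```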

"-- Configuring done\n",
"-- Generating done\n",
"-- Build files have been written to: /home/dali/git/dali/docs/examples/extend/customdummy/build\n",
"-- Build files have been written to: /home/jlisiecki/Dali/dali/docs/examples/custom_operations/custom_operator/customdummy/build\n",
Contributor:
Please adjust to a generic path.

@JanuszL (Author):
Done


Additionally, both host and GPU buffers have a configurable growth factor: if it is above 1 and the requested new size exceeds the buffer capacity, the buffer will be allocated with an extra margin to potentially avoid subsequent reallocations. This functionality is disabled by default (the growth factor is set to 1). The factors can be controlled with the environment variables `DALI_HOST_BUFFER_GROWTH_FACTOR` and `DALI_DEVICE_BUFFER_GROWTH_FACTOR`, respectively, as well as with the Python API functions `nvidia.dali.backend.SetHostBufferGrowthFactor` and `nvidia.dali.backend.SetDeviceBufferGrowthFactor`. For convenience, the variable `DALI_BUFFER_GROWTH_FACTOR` and the corresponding Python function `nvidia.dali.backend.SetBufferGrowthFactor` set the same growth factor for host and GPU buffers.
Contributor:
Should we repeat here that the HOST one is also responsible for host pinned memory?

@JanuszL (Author):
Done
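
A usage sketch of the growth-factor functions named in the paragraph above; the factor values are arbitrary examples, and per this thread the HOST setting also covers pinned buffers:

```python
import nvidia.dali.backend as backend

# Over-allocate on growth so slightly larger future requests still fit.
backend.SetHostBufferGrowthFactor(1.1)    # host buffers, including pinned
backend.SetDeviceBufferGrowthFactor(1.5)  # GPU buffers

# Convenience call: one factor for both host and GPU buffers.
backend.SetBufferGrowthFactor(1.2)
```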

The GPU buffers allocated to house transformation results are as large as the largest possible batch, while the CPU buffers can be as large as the batch size multiplied by the size of the largest sample. Note that even though the CPU processes one sample at a time per thread, a whole vector of samples needs to reside in memory. It is worth noting that some CPU operators can calculate their output shape (and thus the memory required) ahead of time, in which case the output is preallocated as a single continuous buffer for the whole batch, which makes their memory consumption on par with their GPU counterparts.
Contributor:
I think this lacks a bit of context, maybe something like:

DALI works on batches of samples. For GPU Operators the batch is stored as continuous allocation which is processed in one go. This again reduces the number of necessary allocations.
For some CPU Operators, that cannot calculate their output size ahead of time, the batch is instead stored as a vector of separately allocated samples (for others it's still a single continuous allocation).

For example, if your batch consists of nine 480p images and one 4K image in random order, a single continuous allocation would be able to accommodate all possible combinations of such batches.
On the other hand, a CPU batch represented as separate buffers will end up keeping a 4K-sized allocation for every sample after several iterations.

The example at the end can be swapped for something less concrete.

@JanuszL (Author):
Done
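
A back-of-the-envelope check of the example from this thread; the resolutions (854x480 for "480p", 3840x2160 for "4K") and the 3-byte RGB layout are assumptions made purely for illustration:

```python
MB = 1024 * 1024
size_480p = 854 * 480 * 3     # ~1.2 MB per decoded 480p RGB frame (assumed)
size_4k = 3840 * 2160 * 3     # ~23.7 MB per decoded 4K RGB frame (assumed)

# Single contiguous batch buffer: grows to the worst-case batch total.
contiguous = 9 * size_480p + size_4k
print(f"contiguous batch buffer: {contiguous / MB:.1f} MB")  # ~34.3 MB

# Per-sample buffers: after enough shuffled iterations, every one of the
# 10 slots has held the 4K image at least once and stays that large.
per_sample = 10 * size_4k
print(f"10 per-sample buffers:   {per_sample / MB:.1f} MB")  # ~237.3 MB
```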


In contrast, ordinary host memory is relatively cheap to allocate and free. To reduce host memory consumption, the buffers may shrink when the newly requested size is smaller than a specified fraction of the old size (called the shrink threshold). The threshold can be adjusted to any value between 0 (never shrink) and 1 (always shrink), with the default being 0.9. The value can be controlled either via the environment variable `DALI_HOST_BUFFER_SHRINK_THRESHOLD` or set in Python with the `nvidia.dali.backend.SetHostBufferShrinkThreshold` function.

During processing, it works on batches of samples. For GPU and some CPU Operators, the batch is stored as continuous allocation which is processed in one go, which reduces the number of necessary allocations.
Contributor:
Suggested change:
- During processing, it works on batches of samples. For GPU and some CPU Operators, the batch is stored as continuous allocation which is processed in one go, which reduces the number of necessary allocations.
+ During processing, DALI works on batches of samples. For GPU and some CPU Operators, the batch is stored as continuous allocation which is processed in one go, which reduces the number of necessary allocations.

@JanuszL (Author):
Done

- updates `Custom operator` documentation to reflect the most
  recent DALI operator API
- updates `Memory consumption` docs section to reflect shape inference
  and changes from `Shrink host buffers (#1712)`

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL (Author) commented Feb 7, 2020:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [1115020]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [1115020]: BUILD PASSED

@JanuszL JanuszL merged commit eb8ff52 into NVIDIA:master Feb 7, 2020
@JanuszL JanuszL deleted the fix_docs branch February 7, 2020 15:05