
Add an ability to run DALI without GPU #2165

Merged: 8 commits merged from cpu_only_core into NVIDIA:master on Aug 28, 2020
Conversation

@JanuszL (Contributor) commented on Jul 29, 2020

Why do we need this PR?


  • It adds the ability to run DALI without a GPU

What happened in this PR?


  • What solution was applied:
    Adds a special case for device_id that skips all CUDA-related calls (see the usage sketch below).
  • Affected modules and functionalities:
    Executor
    A couple of operators
  • Key points relevant for the review:
    Some fixes are hacky; please check whether they can be improved.
  • Validation and testing:
    Added a test case for the CPU-only scenario.
  • Documentation (including examples):
    Updated the pipeline description.

JIRA TASK: [DALI-1491]
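
For reference, a minimal usage sketch of the CPU-only mode is shown below. It is written against a recent DALI release, where passing device_id=None selects CPU-only execution; the file_root path is a placeholder, and the exact spelling may differ from the API available when this PR was opened.

# Minimal CPU-only pipeline sketch. Assumes a recent DALI release where
# device_id=None maps to the CPU_ONLY_DEVICE_ID special case added here;
# the file_root path is a placeholder.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def(batch_size=8, num_threads=4, device_id=None)  # no GPU needed
def cpu_only_pipeline():
    jpegs, labels = fn.readers.file(file_root="/path/to/images", random_shuffle=True)
    images = fn.decoders.image(jpegs, device="cpu")   # keep every operator on the CPU
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

pipe = cpu_only_pipeline()
pipe.build()
images, labels = pipe.run()  # outputs are CPU TensorLists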

@dali-automaton: CI MESSAGE: [1503498]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1503498]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1503505]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1503505]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1504379]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1504495]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1504379]: BUILD FAILED

@JanuszL force-pushed the cpu_only_core branch 2 times, most recently from 254502a to 046af83 on July 29, 2020 10:57
@dali-automaton: CI MESSAGE: [1504495]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1505415]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1505415]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1505433]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1505433]: BUILD FAILED

@JanuszL force-pushed the cpu_only_core branch 2 times, most recently from 74a4ffd to f8496f0 on July 29, 2020 22:18
@dali-automaton: CI MESSAGE: [1506219]: BUILD STARTED

@JanuszL force-pushed the cpu_only_core branch 2 times, most recently from 18f38e7 to 5747322 on July 29, 2020 23:46
@dali-automaton: CI MESSAGE: [1506219]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1506588]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1506588]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1506588]: BUILD PASSED
@dali-automaton: CI MESSAGE: [1577356]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1577356]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1577356]: BUILD PASSED

"or equal to CPU_ONLY_DEVICE_ID.");
DALI_ENFORCE(graph_->NumOp(OpType::GPU) == 0 && graph_->NumOp(OpType::MIXED) == 0,
"Cannot run a pipeline with Mixed/GPU ops in CPU-only mode. Please provide "
"valid device id or change the operators device.");
A reviewer (Contributor) commented:
Small nitpick:

Suggested change:
-    "valid device id or change the operators device.");
+    "valid device id or change the operators' device.");

@JanuszL (author) replied:
Done
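
To illustrate the check quoted above: building a pipeline in CPU-only mode that still contains a mixed or GPU operator should trip this DALI_ENFORCE. A hedged sketch follows; device_id=None as the CPU-only switch, the placeholder file_root path, and the surfaced exception type are assumptions based on the current Python API.

# Hedged sketch of the failure path guarded by the DALI_ENFORCE above.
# Assumes a recent DALI release (device_id=None for CPU-only mode); the
# error may surface at or before build(), and the exception type may differ.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def(batch_size=4, num_threads=2, device_id=None)  # CPU-only mode
def bad_pipeline():
    jpegs, _ = fn.readers.file(file_root="/path/to/images")
    return fn.decoders.image(jpegs, device="mixed")  # a mixed op needs a GPU stage

try:
    pipe = bad_pipeline()
    pipe.build()  # expected to fail: "Cannot run a pipeline with Mixed/GPU ops in CPU-only mode..."
except RuntimeError as err:
    print(err)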

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton: CI MESSAGE: [1580145]: BUILD STARTED
@dali-automaton: CI MESSAGE: [1580145]: BUILD FAILED
@dali-automaton: CI MESSAGE: [1580145]: BUILD PASSED

@JanuszL merged commit c9942ce into NVIDIA:master on Aug 28, 2020
@JanuszL deleted the cpu_only_core branch on August 28, 2020 13:47
@JanuszL mentioned this pull request on Aug 28, 2020
klecki added a commit to klecki/DALI that referenced this pull request Apr 15, 2022
Remove from TensorVector the AsTensorList method
and the constructor taking an external shared_ptr<TensorList>.

Both allowed the internal state of TensorVector to be
observed from the outside, breaking the encapsulation.

Adjust a few places that relied on the changes of internal
state being externally visible:
* Initializing the data graph in workspaces.
  The TV -> TL conversion is done via the Mixed stage.
* ArgumentInputs that are produced as a TensorList
  need to be resynced via ShareData instead.

This reverts some changes from NVIDIA#2165:
  - The CPU stage cannot provide direct pipeline outputs
    due to the TensorList/TensorVector mismatch;
    we can only share data downwards, but we
    cannot share and preemptively expect the allocation
    to be mirrored.
  - The CPU-only stage still uses the Mixed stage, but
    with MakeContiguous constrained to CPU outputs.

Workspace initialization now takes CPU_ONLY_DEVICE_ID
into consideration and does not set the stream (resulting
in has_stream() being false, which in turn keeps the
AccessOrder in Mixed ops as host order only).

Two optimizations relying on contiguous
TensorVector -> TensorList were disabled just for the
time of the tests.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
klecki added a commit to klecki/DALI that referenced this pull request Apr 19, 2022
klecki added a commit to klecki/DALI that referenced this pull request May 23, 2022
klecki added a commit to klecki/DALI that referenced this pull request May 24, 2022
klecki added a commit to klecki/DALI that referenced this pull request May 25, 2022
klecki added a commit to klecki/DALI that referenced this pull request May 25, 2022
klecki added a commit to klecki/DALI that referenced this pull request May 26, 2022
klecki added a commit to klecki/DALI that referenced this pull request Jun 22, 2022
klecki added a commit to klecki/DALI that referenced this pull request Jun 27, 2022
(Each of these commits carries the same commit message as the one quoted above.)
klecki added a commit to klecki/DALI that referenced this pull request Jul 13, 2022
Remove from TensorVector the AsTensorList method
and the constructor taking an external shared_ptr<TensorList>.

Both allowed the internal state of TensorVector to be
observed from the outside, breaking the encapsulation.

Adjust a few places that relied on the changes of internal
state being externally visible:
* Initializing the data graph in workspaces.
  The TV -> TL conversion is done via the Mixed stage.
* ArgumentInputs that are produced as a TensorList
  need to be resynced via ShareData instead.

This reverts some changes from NVIDIA#2165:
  - The CPU stage cannot provide direct pipeline outputs
    due to the TensorList/TensorVector mismatch;
    we can only share data downwards, but we
    cannot share and preemptively expect the allocation
    to be mirrored.
  - The CPU-only stage still uses the Mixed stage, but
    with MakeContiguous constrained to CPU outputs.

Workspace initialization now takes CPU_ONLY_DEVICE_ID
into consideration and does not set the stream (resulting
in has_stream() being false, which in turn keeps the
AccessOrder in Mixed ops as host order only).

Memory is set to non-pinned when CPU_ONLY_DEVICE_ID is
detected.

Eager mode optimizations with contiguous
TensorVector -> TensorList were disabled; they wait
for the rework of TensorVector replacing TensorList.

TODO: This commit also removes the External Source
optimization.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
klecki added a commit to klecki/DALI that referenced this pull request Jul 14, 2022
(Same commit message as the Jul 13, 2022 commit above.)
klecki added a commit that referenced this pull request Jul 14, 2022
Remove from TensorVector the AsTensorList method
and the constructor taking an external shared_ptr<TensorList>.

Both allowed the internal state of TensorVector to be
observed from the outside, breaking the encapsulation.

Adjust a few places that relied on the changes of internal
state being externally visible:
* Initializing the data graph in workspaces.
  The TV -> TL conversion is done via the Mixed stage.
* ArgumentInputs that are produced as a TensorList
  need to be resynced via ShareData instead.

This reverts some changes from #2165:
  - The CPU stage cannot provide direct pipeline outputs
    due to the TensorList/TensorVector mismatch;
    we can only share data downwards, but we
    cannot share and preemptively expect the allocation
    to be mirrored.
  - The CPU-only stage still uses the Mixed stage, but
    with MakeContiguous constrained to CPU outputs.

Workspace initialization now takes CPU_ONLY_DEVICE_ID
into consideration and does not set the stream (resulting
in has_stream() being false, which in turn keeps the
AccessOrder in Mixed ops as host order only).

Memory is set to non-pinned when CPU_ONLY_DEVICE_ID is
detected.

Eager mode optimizations with contiguous
TensorVector -> TensorList were disabled; they wait
for the rework of TensorVector replacing TensorList.

An escape hatch to access the shared_ptr of the
allocation was ported from TensorList to TensorVector,
to allow ExternalSource to pass data without a copy.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
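
The escape hatch mentioned above is what lets ExternalSource hand buffers to the pipeline without copying. A user-level sketch of that no-copy path is given below, assuming the current Python API (external_source's no_copy argument and Pipeline.feed_input); it only illustrates the behavior the commit preserves, not the internal shared_ptr plumbing.

# Hedged sketch: feeding an ExternalSource without a copy in CPU-only mode.
# Assumes a recent DALI release; device_id=None for CPU-only is an assumption
# about how the mode added in this PR is exposed today.
import numpy as np
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn

@pipeline_def(batch_size=2, num_threads=2, device_id=None)  # CPU-only mode
def es_pipeline():
    # no_copy=True asks DALI to share the fed buffers instead of copying them;
    # the caller must keep them alive until the outputs are consumed.
    return fn.external_source(name="data", no_copy=True)

pipe = es_pipeline()
pipe.build()
batch = [np.full((2, 2), i, dtype=np.float32) for i in range(2)]
pipe.feed_input("data", batch)
out, = pipe.run()
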
staniewzki pushed a commit to staniewzki/DALI that referenced this pull request Jul 19, 2022
szkarpinski pushed a commit to szkarpinski/DALI that referenced this pull request Jul 21, 2022
szkarpinski pushed a commit to szkarpinski/DALI that referenced this pull request Jul 21, 2022
(Each of these carries the same commit message as the merged commit above.)