Enable python ExternalSource operator for the GPU data #1997
Conversation
- adds Python side support for GPU data feed to the ExternalSource operator
- extends the ExternalSource example

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Force-pushed from b2f3f6d to bb87f0a
!build
CI MESSAGE: [1371416]: BUILD STARTED
CI MESSAGE: [1371416]: BUILD FAILED
Force-pushed from f7893fb to 505712a
!build
CI MESSAGE: [1371753]: BUILD STARTED
CI MESSAGE: [1371753]: BUILD FAILED
!build
Force-pushed from 505712a to bfd2269
CI MESSAGE: [1372990]: BUILD STARTED
CI MESSAGE: [1372990]: BUILD FAILED
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Force-pushed from 60e4ce0 to 90c80b5
!build
CI MESSAGE: [1373346]: BUILD STARTED
CI MESSAGE: [1373346]: BUILD FAILED
Force-pushed from 0e0743a to 20113e6
!build
CI MESSAGE: [1373692]: BUILD STARTED
CI MESSAGE: [1373692]: BUILD FAILED
Force-pushed from 20113e6 to 3d21b88
!builld
Force-pushed from 3d21b88 to 6ce08bb
!builld
CI MESSAGE: [1401034]: BUILD FAILED
Force-pushed from c6e202f to fff2fdb
Correct, when the user calls
Again, I guess that's rather obvious: you can touch the memory after the sync function returns. If you feel it's not as obvious as I think, let's add this to the docs.
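To make the memory-ownership point concrete, here is a toy Python sketch (not the DALI API; `feed_async`/`feed_sync` are hypothetical names) of why a source buffer is only safe to reuse after a synchronous feed returns:

```python
# Toy model of the async-vs-sync feed semantics discussed above.
# feed_async / feed_sync are hypothetical names, not DALI API.

def feed_async(queue, buf):
    # Async variant: only a reference is handed over; the real copy
    # happens later, so the caller must not touch `buf` yet.
    queue.append(buf)

def feed_sync(queue, buf):
    # Sync variant: the copy completes before returning, so the
    # caller may immediately reuse or overwrite `buf`.
    queue.append(list(buf))

buf = [1, 2, 3]
q_async, q_sync = [], []
feed_async(q_async, buf)
feed_sync(q_sync, buf)

buf[0] = 99  # caller reuses the buffer right away

print(q_async[0][0])  # 99 -- the async feed observed the mutation
print(q_sync[0][0])   # 1  -- the sync feed kept the original data
```

The same reasoning applies with CUDA streams in place of Python references: after the synchronizing call returns, the data already lives in DALI's internal buffer.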
Force-pushed from fff2fdb to a001296
I added a
CI MESSAGE: [1402155]: BUILD STARTED
dali/python/backend_impl.cc (outdated)

@@ -302,7 +311,7 @@ void ExposeTensor(py::module &m) {
      layout : str
            Layout of the data
      device_id: int
-           Device of where this tensor resides
+           Device of where this tensor resides. If no is provided the current device is used.

Suggested change:
- Device of where this tensor resides. If no is provided the current device is used.
+ Device of where this tensor resides. If not provided, the current device is used.

Done
dali/python/backend_impl.cc (outdated)

- device_id: int
-     Device of where this lists of tensors resides
+ device_id : int
+     Device of where this tensor resides. If no is provided the current device is used.

Same here.

Done
CI MESSAGE: [1401034]: BUILD PASSED
 */
DLL_PUBLIC void
daliSetExternalInputAsync(daliPipelineHandle *pipe_handle, const char *name,
                          device_type_t device, const void *data_ptr,
                          dali_data_type_t data_type, const int64_t *shapes,
                          int sample_dim, const char *layout_str,
-                         cudaStream_t stream);
+                         cudaStream_t stream, int sync);

There is a question whether we still need both async and sync variants of this function if this can be handled by the parameter.

Sync has its own stream; in this variant you still need to provide one.

Ok, makes sense.
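The single-entry-point design being discussed can be sketched in Python (hypothetical names; a worker thread stands in for a CUDA stream): one asynchronous call, plus a `sync` flag that optionally blocks until the copy lands in the internal buffer:

```python
import threading

def copy_async(src, dst, done):
    """Start the copy on a worker thread (stand-in for a CUDA stream)."""
    def work():
        dst.extend(src)
        done.set()
    threading.Thread(target=work).start()

def set_external_input(src, dst, sync=False):
    """One entry point; `sync` decides whether we block until the copy finishes."""
    done = threading.Event()
    copy_async(src, dst, done)
    if sync:
        done.wait()  # block until the data is in the internal buffer
    return done      # in the async case, the caller can synchronize later

dst = []
set_external_input([1, 2, 3], dst, sync=True)
print(dst)  # [1, 2, 3] -- safe to read, the copy already completed
```

This illustrates why the separate sync variant becomes redundant once the flag exists: the synchronous path is just the asynchronous path followed by a wait.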
Force-pushed from a001296 to 57d1152
CI MESSAGE: [1402155]: BUILD FAILED
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Force-pushed from 57d1152 to f17c71a
The `sync` for SetExternalInput seems to solve the problems. I'm not sure how I feel about the name and behaviour of the `blocking` ExternalSource parameter, whether it won't bring some confusion.
include/dali/c_api.h (outdated)

@@ -139,13 +139,14 @@ DLL_PUBLIC void daliDeserializeDefault(daliPipelineHandle *pipe_handle,
  * Can be set to NULL.
  * @param stream CUDA stream to use when copying the data onto GPU. Remember to synchronize on the
  *               provided stream.
+ * @param sync If block until data provided is copied to the internal DALI buffer

Suggested change:
- * @param sync If block until data provided is copied to the internal DALI buffer
+ * @param sync Whether to block until the provided data is copied to the internal DALI buffer

Done
@@ -148,7 +144,7 @@ class ExternalSource : public Operator<Backend> {
      copy_to_storage_event = copy_to_storage_events_.GetEmpty();
    }

-   data.front()->Copy(tl, stream);
+   data.front()->Copy(t, stream);
    if (std::is_same<SrcBackend, GPUBackend>::value) {
      cudaEventRecord(*copy_to_storage_event.front(), stream);

Cool. Thanks for adding the comment.
  {
    std::unique_lock<std::mutex> busy_lock(busy_m_);
-   cv_.wait(busy_lock, [&data = tl_data_]{return !data.IsEmpty();});
+   cv_.wait(busy_lock, [&data = tl_data_, &blocking = blocking_] {

I think it's better to capture `blocking_` by copy instead of by reference, as it wouldn't change.

Done
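The capture-by-copy point has a close Python analogue (a sketch, not the DALI code): a closure looks the name up late, like a C++ capture by reference, while a default argument snapshots the value, like a capture by copy:

```python
blocking = True

# Like [&blocking] in C++: the lambda sees whatever the variable
# holds at the moment it is called.
by_reference = lambda: blocking

# Like [blocking] in C++: the default argument freezes the value
# at definition time.
by_copy = lambda b=blocking: b

blocking = False
print(by_reference())  # False -- tracked the later change
print(by_copy())       # True  -- kept the snapshot
```

Since `blocking_` never changes after construction, either capture behaves the same here; capturing by copy just states that intent explicitly.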
  {
    std::unique_lock<std::mutex> busy_lock(busy_m_);
-   cv_.wait(busy_lock, [&data = tl_data_]{return !data.IsEmpty();});
+   cv_.wait(busy_lock, [&data = tl_data_, &blocking = blocking_] {
+     return !(data.IsEmpty() && blocking);

We can also alternatively:

if (blocking) {
  cv_.wait(busy_lock, [&data = tl_data_]{return !data.IsEmpty();});
} else {
  // we have the lock, fail if there is no data.
}

Done
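The blocking vs. non-blocking behaviour under discussion can be sketched as a small Python class (hypothetical names; `threading.Condition` stands in for the C++ `busy_m_`/`cv_` pair):

```python
import threading

class ToyExternalSource:
    """Sketch of the blocking vs. non-blocking wait discussed above."""

    def __init__(self, blocking=True):
        self.blocking = blocking
        self.data = []
        self.cv = threading.Condition()

    def feed(self, sample):
        with self.cv:
            self.data.append(sample)
            self.cv.notify()

    def get(self):
        with self.cv:
            if self.blocking:
                # Blocking mode: wait until someone feeds data.
                self.cv.wait_for(lambda: bool(self.data))
            elif not self.data:
                # Non-blocking mode: we hold the lock, fail if there is no data.
                raise RuntimeError(
                    "No data was provided to the ExternalSource. "
                    "Make sure to feed it properly.")
            return self.data.pop(0)

src = ToyExternalSource(blocking=False)
src.feed(42)
print(src.get())  # 42
```

Calling `get()` on an empty non-blocking source raises immediately instead of hanging, which mirrors the DALI_FAIL path added in this PR.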
@@ -1075,7 +1089,7 @@ PYBIND11_MODULE(backend_impl, m) {
    .def("SetExternalTLInput",
      [](Pipeline *p, const string &name, const TensorList<CPUBackend> &tl,
         py::object /*cuda_stream*/) {
-       p->SetExternalInput(name, tl, 0);
+       p->SetExternalInput(name, tl, 0, true);

I assume it is fine to pass a false here as well (or rather it just doesn't matter)?

But it's more aligned with the docs, which claim it to be blocking.

It is CPU, so it should sync anyway; it is better to pass true to be aligned with what happens under the hood.
cv_.wait(busy_lock, [&data = tl_data_, &blocking = blocking_] {
  return !(data.IsEmpty() && blocking);
});
if (!blocking_ && tl_data_.IsEmpty()) {
  DALI_FAIL("No data was provided to the ExternalSource. Make sure to feed it properly.");
}

Same suggestion as for the CPU.

Done
I'm open to suggestions regarding the naming.
Force-pushed from 47f976c to e1efecd
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Force-pushed from e1efecd to d46503b
!build
CI MESSAGE: [1402575]: BUILD STARTED
CI MESSAGE: [1402575]: BUILD PASSED
- changes introduced by #1997 were not applied to the conda-based test
- the ExternalSource jupyter example is extended with a GPU case and requires cupy and imageio to run; this PR fixes this

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Why do we need this PR?
- adds Python side support for GPU data feed to the ExternalSource operator
- extends the ExternalSource example

What happened in this PR?
- ExternalSource
- Python API
- backend_impl
- NA
- new CI tests are added
- example is extended

JIRA TASK: [DALI-182]