Extend external source operator capacity #1127
Conversation
!build
CI MESSAGE: [834670]: BUILD STARTED
CI MESSAGE: [834670]: BUILD FAILED
!build
CI MESSAGE: [834754]: BUILD STARTED
CI MESSAGE: [834754]: BUILD FAILED
!build
CI MESSAGE: [835080]: BUILD STARTED
CI MESSAGE: [835080]: BUILD FAILED
!build
CI MESSAGE: [835829]: BUILD STARTED
CI MESSAGE: [835829]: BUILD FAILED
CI MESSAGE: [836294]: BUILD STARTED
CI MESSAGE: [836294]: BUILD FAILED
!build
CI MESSAGE: [837881]: BUILD FAILED
CI MESSAGE: [837896]: BUILD STARTED
CI MESSAGE: [837896]: BUILD FAILED
!build
CI MESSAGE: [837981]: BUILD STARTED
CI MESSAGE: [837981]: BUILD PASSED
}
cv_.notify_all();
output.Copy(*data, (ws->has_stream() ? ws->stream() : 0));
busy_lock.lock();
same as I wrote before, but not really strong opinion
Done
template <typename T>
class CachingList {
 public:
  CachingList() {}
Suggested change:
-  CachingList() {}
+  CachingList() = default;
Done
}

std::unique_ptr<T> GetEmpty() {
  if (!empty_data_.size()) {
Suggested change:
-  if (!empty_data_.size()) {
+  if (empty_data_.empty()) {
Done
@@ -0,0 +1,254 @@
// Copyright (c) 2017-2018, NVIDIA CORPORATION. All rights reserved.
2019
Done
CI MESSAGE: [848584]: BUILD STARTED
CI MESSAGE: [848584]: BUILD FAILED
CI MESSAGE: [848665]: BUILD STARTED
CI MESSAGE: [848665]: BUILD FAILED
!build
CI MESSAGE: [849861]: BUILD STARTED
CI MESSAGE: [849861]: BUILD FAILED
  if (cuda_event) {
    cuda_events_.Recycle(std::move(cuda_event));
  }
}
I find the RecycleHelper more complicated than it should be. Why not:

private:
  void RecycleHelper(std::unique_ptr<std::vector<Tensor<CPUBackend>>> data) {
    t_data_.Recycle(std::move(data));
  }
  void RecycleHelper(std::unique_ptr<TensorList<CPUBackend>> data) {
    tl_data_.Recycle(std::move(data));
  }

and then in RecycleBuffer you just call RecycleHelper(std::move(data));?
Done
};

template <typename DataType>
std::enable_if_t<std::is_same<DataType, std::unique_ptr<TensorList<CPUBackend>>>::value>
see my comment below about this
Done
CI MESSAGE: [850380]: BUILD STARTED
CI MESSAGE: [850380]: BUILD FAILED
CI MESSAGE: [851002]: BUILD STARTED
CI MESSAGE: [851002]: BUILD FAILED
CI MESSAGE: [851002]: BUILD PASSED
DALI_ENFORCE(!tl_data_.IsEmpty(), "ExternalSource is empty. Need to feed data first.");
tl_data = tl_data_.PopFront();
DALI_ENFORCE(OperatorBase::batch_size_ == static_cast<int>(tl_data.front()->ntensor()),
"Data list provided to ExternalSource needs to have batch_size length.");
missing indent
DALI_ENFORCE(!t_data_.IsEmpty(), "ExternalSource is empty. Need to feed data first.");
t_data = t_data_.PopFront();
DALI_ENFORCE(OperatorBase::batch_size_ == static_cast<int>(t_data.front()->size()),
"Data list provided to ExternalSource needs to have batch_size length.");
missing indent
Done
cv_.notify_all();

auto &output = ws->Output<GPUBackend>(0);
output.Copy(*(data.front()), (ws->has_stream() ? ws->stream() : 0));
You have (ws->has_stream() ? ws->stream() : 0) twice. Consider using a variable instead.
Done
// HostWorkspace doesn't have any stream
cudaStream_t stream = 0;
if (is_tl_data) {
  output.Copy(*(tl_data.front()), data_idx, stream);
Doesn't this copy the TensorList batch_size_ times? I guess it shouldn't be in a loop.
Not really. output is a tensor; Copy(*(tl_data.front()), data_idx, stream) extracts the data_idx-th tensor from tl_data and copies it to the output. I think we don't have a fast way to copy a TensorList to a vector of Tensors other than going one by one.
Missed the data_idx argument. Everything's ok.
return full_data_.empty();
}

ListT PopFront() {
Can you add a comment describing how the CachingList works and that it always operates on single elements wrapped in a list? I misread the doc for splice and was wondering why you don't handle the case of more than one element.
BTW is this really that much better than references + moves?
Description added. The idea of a one-element list was suggested by @mzient, as it saves the allocation of a new list node. I don't think it gains us much, but on the other hand it doesn't cost much (in terms of code) to do it this way either.
Ok
The small allocations can be painful, esp. if there is a lot of them.
!build
CI MESSAGE: [859754]: BUILD STARTED
CI MESSAGE: [859754]: BUILD PASSED
!build
CI MESSAGE: [863164]: BUILD STARTED
CI MESSAGE: [863164]: BUILD FAILED
- Makes the external source able to hold more than one sample at a time, so the user can feed more data ahead before it is consumed by RunCPU/RunGPU. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
!build
CI MESSAGE: [863215]: BUILD STARTED
CI MESSAGE: [863215]: BUILD PASSED
Why do we need this PR?
It makes the external source able to hold more than one sample at a time, so the user can feed more data ahead before it is consumed by RunCPU/RunGPU.
What happened in this PR?
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
JIRA TASK: DALI-954