Erase GPU operator #1971
Conversation
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
Rather minor things, otherwise looks ok.
}

 private:
  OpSpec spec_;
That smells of bad design - the base Operator class already has this field - maybe we should just rework OpImplBase into something that doesn't require this kind of ugly trick.
changed to const OpSpec&
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
!build
dali/kernels/erase/erase_gpu.h
Outdated
@@ -309,7 +310,7 @@ struct do_copy_or_erase {

template <int channel_dim = -1, typename T, int ndim = 2>
__global__ void erase_gpu_impl(erase_sample_desc<T, ndim> *samples, ivec<ndim> region_shape,
__global__ void erase_gpu_impl(erase_sample_desc<T, ndim> *samples, ivec<ndim> region_shape,
__global__ void erase_gpu_impl(const erase_sample_desc<T, ndim> *samples, ivec<ndim> region_shape,
done
CI MESSAGE: [1337975]: BUILD STARTED
dali/kernels/erase/erase_gpu.h
Outdated
auto *sample_desc_gpu = ctx.scratchpad->ToGPU(stream, make_span(sample_desc_cpu, num_samples));
auto *fill_values_gpu =
    ctx.scratchpad->ToGPU(stream, make_span(fill_values_cpu, num_fill_values));
A better option would be to use ToContiguousGPU - it will issue just one cudaMemcpy.
It's not in the scope of this task, I guess; however, if we keep some sane limit on the number of channels, the fill value could be copied to __constant__ memory - it should improve performance, since the fill value won't have to be read from global memory and won't compete for cache with the input.
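A minimal sketch of the `__constant__`-memory idea suggested above. None of these names (`kMaxChannels`, `erase_with_const_fill`, `launch`) exist in DALI; they only illustrate the pattern under the assumption of a fixed upper bound on the channel count.

```cuda
#include <cuda_runtime.h>

constexpr int kMaxChannels = 16;  // hypothetical sane limit on channel count

// Fill values live in constant memory: reads are served by the constant
// cache, so they don't compete with the input data for L1/L2.
__constant__ float c_fill_values[kMaxChannels];

__global__ void erase_with_const_fill(float *out, int n, int channels) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    out[i] = c_fill_values[i % channels];  // no global-memory read for the fill value
}

void launch(float *out, int n, const float *host_fill, int channels,
            cudaStream_t stream) {
  // One small host-to-device copy into the constant bank, then the kernel.
  cudaMemcpyToSymbolAsync(c_fill_values, host_fill, channels * sizeof(float),
                          0, cudaMemcpyHostToDevice, stream);
  erase_with_const_fill<<<(n + 255) / 256, 256, 0, stream>>>(out, n, channels);
}
```

The trade-off is the hard `kMaxChannels` limit: `__constant__` arrays must be sized at compile time, which is why the suggestion is conditioned on keeping "some sane limit" on the number of channels.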
Used ToContiguousGPU
dali/kernels/erase/erase_gpu.h
Outdated
const T *in = nullptr;
T* out = nullptr;
const T *in = nullptr;
T* out = nullptr;
const T *__restrict__ in = nullptr;
T *__restrict__ out = nullptr;
done
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
e28748a to d765359
  return {reinterpret_cast<ptrs_t>(tlv.data.data()), new_shape};
}

template <int ndim, typename Storage>
I don't think this overload is necessary - a non-const TensorListView should convert implicitly to const one.
@@ -126,8 +126,8 @@ std::tuple<std::remove_cv_t<element_t<Collections>>*...>
ToContiguousGPUMem(Scratchpad &scratchpad, cudaStream_t stream, const Collections &... c) {
  const size_t N = sizeof...(Collections);
  static_assert(
-      all_of<std::is_pod<std::remove_cv_t<element_t<Collections>>>::value...>::value,
+      all_of<std::is_trivially_copyable<std::remove_cv_t<element_t<Collections>>>::value...>::value,
      "ToContiguousGPUMem must be used with collections of POD types");
There's one more is_pod in this file - please change it, too.
  regions_shape.set_tensor_shape(i, {n_regions, 2, Dims});
}
TensorList<CPUBackend> regions_cpu;
regions_cpu.set_type(TypeTable::GetTypeInfo(TypeTable::GetTypeID<int32_t>()));
Why not:
regions_cpu.set_type(TypeTable::GetTypeInfo(TypeTable::GetTypeID<int32_t>()));
regions_cpu.set_type(TypeInfo::Create<ibox<Dims>>());
?
done
TensorList<CPUBackend> regions_cpu;
regions_cpu.set_type(TypeTable::GetTypeInfo(TypeTable::GetTypeID<int32_t>()));
regions_cpu.Resize(regions_shape);
auto regions_tlv = detail::as_boxes<Dims>(view<int32_t, 3>(regions_cpu));
And you don't need this as_boxes thing at all - you can take a view<ibox<Dims>>(regions_cpu) and place the data in it directly. AFAIK it should work.
done
Signed-off-by: Rafal <Banas.Rafal97@gmail.com>
!build
CI MESSAGE: [1340155]: BUILD STARTED
CI MESSAGE: [1340155]: BUILD PASSED
Why we need this PR?
What happened in this PR?
The GPU erase kernel was modified to support a different number of erase regions for each sample. The operator implementation mostly just instantiates the kernel.
GPU erase kernel and a new file with the GPU operator.
Instantiating the kernel.
The existing Python test was extended to GPU. The kernel test was extended to cover different numbers of erase regions per sample.
N/A
JIRA TASK: [DALI-1245]