-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SaltAndPepper GPU operator #2956
Add SaltAndPepper GPU operator #2956
Conversation
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
1824630
to
4d67107
Compare
@@ -223,7 +223,9 @@ def test_external_source(): | |||
fn.hsv, | |||
fn.hue, | |||
fn.jpeg_compression_distortion, | |||
fn.noise.gaussian, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one was missing.
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@@ -174,9 +174,15 @@ def test_flip_cpu(): | |||
def test_jpeg_compression_distortion_cpu(): | |||
check_single_input(fn.jpeg_compression_distortion, quality = 10) | |||
|
|||
def test_noise_gaussian_cpu(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one was missing
dali/operators/random/rng_base_gpu.h
Outdated
} | ||
} | ||
return blocks_num; | ||
return nsamples; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to return just output.num_samples()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's exactly that. I just put it in a variable to be used in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
auto &samples_cpu = backend_data_.sample_descs_cpu_; | ||
auto &samples_gpu = backend_data_.sample_descs_gpu_; | ||
int nsamples = SetupSampleDescs(samples_cpu.data(), out_view, in_view, channel_dim); | ||
if (nsamples == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can as well put in L132:
if (out_view.num_samples() == 0) {
return;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// For generate once, apply to all channels, the channel dimension will be | ||
// removed, so that each CUDA thread always processes all channels of a pixel. | ||
for (int s = 0; s < nsamples; s++) { | ||
shape_copy.tensor_shape_span(s)[channel_dim] = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this change affect sample_siz
inside SetupBlockDescs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the whole point. In this case we calculate blocks as number of full pixels. SetupBlockDesc is dividing a shape into blocks. Here we remove the channel dimension from the shape copy, calculate blocks, and then launch a different kernel, that uses c_count and c_stride to visit all channels in every point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think id would better to pass the original shape to SetupBlockDescs and just skip channel_dim
when computing the volume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
Signed-off-by: Joaquin Anton <janton@nvidia.com>
blockdesc_count = SetupBlockDescs( | ||
blocks_cpu, block_sz, max_nblocks, out_view, in_view); | ||
|
||
// TODO(janton): set layout explicitly from the user for RNG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TODO is about random generators, not noise generators. We should do it at some point but not in this PR.
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@@ -28,33 +28,82 @@ namespace dali { | |||
|
|||
namespace { | |||
|
|||
template <bool value> | |||
using bool_const = std::integral_constant<bool, value>; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I am dividing the implementation into four categories (cross product of is_noise_gen and is_per_channel)
can get different noise).
template <typename T, typename Dist> | ||
__device__ __inline__ void Generate(BlockDesc<true> desc, | ||
__device__ __inline__ void Generate(const SampleDesc &sample, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is_noise_gen=True, is_per_channel=True.
Noise generator (applies random noise to an input) that treats channels independently (each channel
auto n = dist.Generate(in[idx], rng); | ||
dist.Apply(out[idx], in[idx], n); | ||
} | ||
} | ||
|
||
template <typename T, typename Dist> | ||
__device__ __inline__ void Generate(BlockDesc<false> desc, | ||
__device__ __inline__ void Generate(const SampleDesc &sample, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is_noise_gen=True, is_per_channel=false.
Noise generator (applies random noise to an input) that treats all channels in a pixel as a whole (same noise is applied to all channels in a pixel)
} | ||
} | ||
|
||
template <typename T, typename Dist> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is_noise_gen=False, is_per_channel=True.
RNG (generates a random number, doesn't depend on any input) that treats channels independently (each channel can get a different value)
} | ||
} | ||
|
||
template <typename T, typename Dist> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is_noise_gen=False, is_per_channel=False.
RNG (generates a random number, doesn't depend on any input) that produces the same number for all channels in a pixel
int sample_idx; | ||
void* output; | ||
size_t size; | ||
struct SampleDesc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Divided into BlockDesc and SampleDesc, because we added some info that doesn't need to be repeated for every block (pixel stride, channel stride, number of channels...)
int64_t sample_sz = volume(sh); | ||
if (channel_dim >= 0) { | ||
int nchannels = sh[channel_dim]; | ||
samples[s].p_count = sample_sz / nchannels; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here p_count, p_stride are full "pixels"
samples[s].c_count = nchannels; | ||
samples[s].c_stride = volume(sh.begin() + channel_dim + 1, sh.end()); | ||
} else { | ||
samples[s].p_count = sample_sz; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here p_count are total number of elements (channels are flattened).
dali/operators/random/rng_base_gpu.h
Outdated
std::tie(blocks_per_sample, blocks_num) = DistributeBlocksPerSample(shape, block_sz, max_nblocks); | ||
int64_t block = 0; | ||
for (int s = 0; s < shape.size(); s++) { | ||
T *sample_data = static_cast<T *>(output[s].data); | ||
auto sample_size = volume(shape[s]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can extract the TensorShape here and doctor the channel dimension here instead of relying on the caller passing a manipulated input shape.
auto sample_size = volume(shape[s]); | |
shape_in_pixels = shape[s]; | |
if (channel_dim >= 0) | |
shape_in_pixels[channel_dim] = 1;l | |
auto sample_size = volume(shape_in_pixels); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
RNGKernel<T, Dist, false> | ||
<<<gridDim, blockDim, 0, ws.stream()>>>(blocks_gpu, rngs, dists, blockdesc_count); | ||
} | ||
VALUE_SWITCH(use_default_dist ? 1 : 0, DefaultDist, (false, true), ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A random thought: perhaps we should have a BOOL_SWITCH and get rid of this default:
statement nonsense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Let's handle that in a separate PR.
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build |
CI MESSAGE: [2383324]: BUILD STARTED |
CI MESSAGE: [2383324]: BUILD FAILED |
!build |
CI MESSAGE: [2385625]: BUILD STARTED |
CI MESSAGE: [2385625]: BUILD PASSED |
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
Extended RNGBase to allow for monochrome noise. Register a GPU SaltAndPepper operator
fn.noise. operators*
GPU RNGBase changes
Existing tests. Added new tests for variable batch size and cpu_only mode
No new documentation needed.
JIRA TASK: [DALI-1970]