
Add SaltAndPepper GPU operator #2956

Merged (20 commits into NVIDIA:master, May 19, 2021)

Conversation

jantonguirao
Contributor

Why do we need this PR?

  • It adds a new feature: generating salt-and-pepper noise on the GPU.

What happened in this PR?


  • What solution was applied:
    Extended RNGBase to allow for monochrome noise. Registered a GPU SaltAndPepper operator.
  • Affected modules and functionalities:
    fn.noise.* operators
  • Key points relevant for the review:
    GPU RNGBase changes
  • Validation and testing:
    Existing tests. Added new tests for variable batch size and cpu_only mode
  • Documentation (including examples):
    No new documentation needed.

JIRA TASK: [DALI-1970]
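Conceptually, the operator replaces each value, with some probability, by either the maximum ("salt") or the minimum ("pepper") representable value. A minimal CPU sketch of that idea (not DALI's implementation; the function name and the uint8 value range are assumptions for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical CPU reference: each element independently becomes salt (255)
// with probability prob_noise * prob_salt, pepper (0) with probability
// prob_noise * (1 - prob_salt), and is kept unchanged otherwise.
std::vector<uint8_t> salt_and_pepper(const std::vector<uint8_t> &in,
                                     float prob_noise, float prob_salt,
                                     unsigned seed = 1234) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> u(0.f, 1.f);
  std::vector<uint8_t> out(in.size());
  for (size_t i = 0; i < in.size(); i++) {
    if (u(rng) < prob_noise)
      out[i] = u(rng) < prob_salt ? 255 : 0;  // salt : pepper
    else
      out[i] = in[i];
  }
  return out;
}
```

The GPU operator parallelizes this element-wise (or pixel-wise) decision; the per-channel behavior is discussed further down in this review.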

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@JanuszL JanuszL self-assigned this May 13, 2021
@@ -223,7 +223,9 @@ def test_external_source():
fn.hsv,
fn.hue,
fn.jpeg_compression_distortion,
fn.noise.gaussian,
jantonguirao (author):
this one was missing.

@@ -174,9 +174,15 @@ def test_flip_cpu():
def test_jpeg_compression_distortion_cpu():
check_single_input(fn.jpeg_compression_distortion, quality = 10)

def test_noise_gaussian_cpu():
jantonguirao (author):

this one was missing.

}
}
return blocks_num;
return nsamples;
Reviewer:

Does it make sense to return just output.num_samples()?

jantonguirao (author):

It's exactly that; I just put it in a variable to be used in the loop.

Reviewer:

ok

auto &samples_cpu = backend_data_.sample_descs_cpu_;
auto &samples_gpu = backend_data_.sample_descs_gpu_;
int nsamples = SetupSampleDescs(samples_cpu.data(), out_view, in_view, channel_dim);
if (nsamples == 0) {
JanuszL (reviewer), May 13, 2021:

You could just as well put, at L132:

if (out_view.num_samples() == 0) {
  return;
}

jantonguirao (author):

Done.

// For "generate once, apply to all channels" mode, the channel dimension is
// removed, so that each CUDA thread always processes all channels of a pixel.
for (int s = 0; s < nsamples; s++) {
shape_copy.tensor_shape_span(s)[channel_dim] = 1;
Reviewer:

Won't this change affect sample_size inside SetupBlockDescs?

jantonguirao (author):

Yes, that's the whole point. In this case we calculate blocks as a number of full pixels. SetupBlockDescs divides a shape into blocks. Here we remove the channel dimension from the shape copy, calculate the blocks, and then launch a different kernel that uses c_count and c_stride to visit all channels at every point.
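The pixel-to-element mapping this describes can be sketched as follows (an assumed simplification; element_offset is a hypothetical helper, not code from this PR). A flat pixel index splits into the dimensions before and after the channel dimension, and c_stride places each channel:

```cpp
#include <cassert>
#include <cstdint>

// Maps a flat pixel index p (0 <= p < p_count) and a channel index c to the
// flat element index, given c_count channels spaced c_stride apart.
// c_stride is the volume of the dimensions after the channel dimension.
inline int64_t element_offset(int64_t p, int64_t c,
                              int64_t c_count, int64_t c_stride) {
  int64_t outer = p / c_stride;  // position in dims before the channel dim
  int64_t inner = p % c_stride;  // position in dims after the channel dim
  return outer * c_count * c_stride + c * c_stride + inner;
}
```

For interleaved HWC data c_stride is 1 and this reduces to p * c_count + c; for planar CHW data c_stride is H * W and it reduces to c * c_stride + p.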

Reviewer:

ok

Reviewer:

I think it would be better to pass the original shape to SetupBlockDescs and just skip channel_dim when computing the volume.

jantonguirao (author):

Will do

blockdesc_count = SetupBlockDescs(
blocks_cpu, block_sz, max_nblocks, out_view, in_view);

// TODO(janton): set layout explicitly from the user for RNG
jantonguirao (author):

This TODO is about random generators, not noise generators. We should do it at some point but not in this PR.

@@ -28,33 +28,82 @@ namespace dali {

namespace {

template <bool value>
using bool_const = std::integral_constant<bool, value>;

jantonguirao (author):

Here I am dividing the implementation into four categories (the cross product of is_noise_gen and is_per_channel).

template <typename T, typename Dist>
__device__ __inline__ void Generate(BlockDesc<true> desc,
__device__ __inline__ void Generate(const SampleDesc &sample,
jantonguirao (author):

  1. is_noise_gen=True, is_per_channel=True.
    Noise generator (applies random noise to an input) that treats channels independently (each channel can get different noise).

auto n = dist.Generate(in[idx], rng);
dist.Apply(out[idx], in[idx], n);
}
}

template <typename T, typename Dist>
__device__ __inline__ void Generate(BlockDesc<false> desc,
__device__ __inline__ void Generate(const SampleDesc &sample,
jantonguirao (author), May 17, 2021:

  2. is_noise_gen=True, is_per_channel=False.
    Noise generator (applies random noise to an input) that treats all channels in a pixel as a whole (the same noise is applied to all channels in a pixel).

}
}

template <typename T, typename Dist>
jantonguirao (author):

  3. is_noise_gen=False, is_per_channel=True.
    RNG (generates a random number, doesn't depend on any input) that treats channels independently (each channel can get a different value).

}
}

template <typename T, typename Dist>
jantonguirao (author):

  4. is_noise_gen=False, is_per_channel=False.
    RNG (generates a random number, doesn't depend on any input) that produces the same number for all channels in a pixel.
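The is_per_channel axis of this split can be sketched on the CPU like this (a hypothetical sketch assuming an interleaved HWC layout; apply_noise is not code from this PR). With per_channel=false the noise decision is drawn once per pixel and reused for every channel, so channels stay correlated:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Applies "salt" (sets 255) with probability p, either independently per
// channel or once per pixel, reusing the same draw for all channels.
std::vector<uint8_t> apply_noise(const std::vector<uint8_t> &in, int channels,
                                 float p, bool per_channel,
                                 unsigned seed = 42) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> u(0.f, 1.f);
  std::vector<uint8_t> out(in);
  int64_t npixels = static_cast<int64_t>(in.size()) / channels;
  for (int64_t px = 0; px < npixels; px++) {
    bool noise_pixel = !per_channel && u(rng) < p;  // one draw per pixel
    for (int c = 0; c < channels; c++) {
      bool noise = per_channel ? (u(rng) < p) : noise_pixel;
      if (noise) out[px * channels + c] = 255;
    }
  }
  return out;
}
```

In the per-pixel mode, all channels of any given pixel are either all noised or all left intact, which is the property the monochrome-noise extension provides.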

int sample_idx;
void* output;
size_t size;
struct SampleDesc {
jantonguirao (author):

Divided into BlockDesc and SampleDesc, because we added some info that doesn't need to be repeated for every block (pixel stride, channel stride, number of channels...)

int64_t sample_sz = volume(sh);
if (channel_dim >= 0) {
int nchannels = sh[channel_dim];
samples[s].p_count = sample_sz / nchannels;
jantonguirao (author):

Here, p_count and p_stride are in units of full "pixels".

samples[s].c_count = nchannels;
samples[s].c_stride = volume(sh.begin() + channel_dim + 1, sh.end());
} else {
samples[s].p_count = sample_sz;
jantonguirao (author):

Here p_count is the total number of elements (channels are flattened).

@jantonguirao jantonguirao assigned mzient and unassigned mzient and szalpal May 17, 2021
std::tie(blocks_per_sample, blocks_num) = DistributeBlocksPerSample(shape, block_sz, max_nblocks);
int64_t block = 0;
for (int s = 0; s < shape.size(); s++) {
T *sample_data = static_cast<T *>(output[s].data);
auto sample_size = volume(shape[s]);
Reviewer:

You can extract the TensorShape and doctor the channel dimension here, instead of relying on the caller passing a manipulated input shape.

Suggested change
auto sample_size = volume(shape[s]);
auto shape_in_pixels = shape[s];
if (channel_dim >= 0)
  shape_in_pixels[channel_dim] = 1;
auto sample_size = volume(shape_in_pixels);

jantonguirao (author):

Will do

RNGKernel<T, Dist, false>
<<<gridDim, blockDim, 0, ws.stream()>>>(blocks_gpu, rngs, dists, blockdesc_count);
}
VALUE_SWITCH(use_default_dist ? 1 : 0, DefaultDist, (false, true), (
Reviewer:

A random thought: perhaps we should have a BOOL_SWITCH and get rid of this default: statement nonsense.

jantonguirao (author):

Agree. Let's handle that in a separate PR.
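A BOOL_SWITCH along the suggested lines might look like this (a hypothetical macro sketch, not an existing DALI utility). It expands the body twice, once per compile-time value, so templates can be instantiated on the boolean without a default: branch:

```cpp
#include <cassert>

// Hypothetical BOOL_SWITCH: binds a runtime bool to a constexpr name and
// expands the body for both values, so templated code can use it.
#define BOOL_SWITCH(value, BoolConst, ...)  \
  do {                                      \
    if (value) {                            \
      constexpr bool BoolConst = true;      \
      __VA_ARGS__;                          \
    } else {                                \
      constexpr bool BoolConst = false;     \
      __VA_ARGS__;                          \
    }                                       \
  } while (0)

// Example template that needs a compile-time bool.
template <bool B>
int which() { return B ? 1 : 0; }
```

Usage: `BOOL_SWITCH(flag, B, r = which<B>());` dispatches to the right instantiation without enumerating values or writing a default case.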

@jantonguirao

!build

@dali-automaton
Collaborator

CI MESSAGE: [2383324]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2383324]: BUILD FAILED

@jantonguirao

!build

@dali-automaton
Collaborator

CI MESSAGE: [2385625]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [2385625]: BUILD PASSED

@jantonguirao jantonguirao merged commit ccdbb03 into NVIDIA:master May 19, 2021
@JanuszL JanuszL mentioned this pull request May 19, 2021
5 participants