CropMirrorNormalize full pad support #2044
Conversation
Force-pushed from 3f8d9a7 to a56fe39
!build
CI MESSAGE: [1417255]: BUILD STARTED
Force-pushed from ff80241 to 9b4a0a4
!build
CI MESSAGE: [1417381]: BUILD STARTED
CI MESSAGE: [1417381]: BUILD FAILED
Force-pushed from 9a53545 to 014832a
Two resolved (outdated) review threads on dali/kernels/slice/slice_flip_normalize_permute_pad_cuda_impl.cuh
Signed-off-by: Joaquin Anton <janton@nvidia.com>
VALUE_SWITCH(need_normalize ? 1 : 0, NeedNormalizeInt, (0, 1), (
  VALUE_SWITCH(has_channels ? 1 : 0, HasChannelsInt, (0, 1), (
    constexpr bool NeedNormalize = static_cast<bool>(NeedNormalizeInt);
    constexpr bool HasChannels = static_cast<bool>(HasChannelsInt);
Meh...
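For context on the pattern above: the runtime bools are fed to VALUE_SWITCH as ints (0/1) and converted back to constexpr bools inside the branches; a later comment in this thread notes that the int conversion avoids a compiler warning about a switch on a boolean with unreachable case labels. A macro-free sketch of the same dispatch idea, purely for illustration (bool_switch and the usage below are hypothetical, not DALI code):

#include <type_traits>

// Dispatch a runtime bool to a compile-time constant through an int switch,
// so the compiler does not warn about switching on a boolean value.
template <typename F>
void bool_switch(bool value, F &&body) {
  switch (value ? 1 : 0) {
    case 0: body(std::integral_constant<bool, false>{}); break;
    case 1: body(std::integral_constant<bool, true>{}); break;
    default: break;
  }
}

int main() {
  bool need_normalize = true;  // stands in for the runtime flag
  bool_switch(need_normalize, [&](auto nn) {
    constexpr bool NeedNormalize = decltype(nn)::value;
    (void)NeedNormalize;  // the kernel variant for NeedNormalize would be instantiated here
  });
}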
for out_of_bounds_policy in ['pad', 'trim_to_shape']:
    for device in ['gpu', 'cpu']:
        for batch_size in [1, 3]:
            for out_layout in ["HWC", "CHW"]:
Suggested change:
- for out_layout in ["HWC", "CHW"]:
+ for out_layout in ['HWC', 'CHW']:
To be consistent with other test parameters.
if (pad_before > 0) {
  if (HasChannels && d == channel_dim) {
    for (int64_t i = 0; i < pad_before; i++)
      *output++ = *fill_values++;
I would add {} around this line, as it is easy to forget that this loop executes only this line.
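A sketch of the suggested change, using the names from the snippet above:

if (HasChannels && d == channel_dim) {
  for (int64_t i = 0; i < pad_before; i++) {
    *output++ = *fill_values++;
  }
}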
done
sample_desc.need_flip |= processed_args.in_strides[d] < 0;
need_flip |= sample_desc.need_flip;

// We the last dimension with the previous if:
We < what > last...?
norm_mul_gpu = context.scratchpad->ToGPU(
    context.gpu.stream, make_span(norm_mul_cpu, num_samples * norm_args_size_));
CUDA_CALL(cudaGetLastError());
Maybe it is enough to check this only once for the whole kernel run?
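A sketch of what the suggestion could look like, assuming several such uploads happen in the same run (norm_add_cpu/norm_add_gpu are hypothetical names used only to show a second upload):

norm_mul_gpu = context.scratchpad->ToGPU(
    context.gpu.stream, make_span(norm_mul_cpu, num_samples * norm_args_size_));
norm_add_gpu = context.scratchpad->ToGPU(
    context.gpu.stream, make_span(norm_add_cpu, num_samples * norm_args_size_));  // hypothetical second upload
// ... remaining uploads and the kernel launch ...
CUDA_CALL(cudaGetLastError());  // single check for the whole kernel run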
!build
CI MESSAGE: [1431307]: BUILD STARTED
CI MESSAGE: [1431307]: BUILD FAILED
Force-pushed from 5847a50 to 61f17ec
!build
CI MESSAGE: [1433732]: BUILD STARTED
CI MESSAGE: [1433732]: BUILD PASSED
std::vector<float> mean;
std::vector<float> inv_stddev;
int normalization_dim;
float padding_val = 0.0f;
std::vector<float> fill_values;
You can use SmallVector<float, 4> here (not sure about un-processed args).
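A sketch of the member declarations with that change applied (assuming SmallVector comes from dali/core/small_vector.h; it keeps up to 4 elements inline before falling back to heap allocation):

#include "dali/core/small_vector.h"  // assumed include path

SmallVector<float, 4> mean;
SmallVector<float, 4> inv_stddev;
int normalization_dim;
float padding_val = 0.0f;
SmallVector<float, 4> fill_values;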
}
sample_desc.effective_ndim = last_dim + 1;

sample_sizes[i] = volume(processed_args.out_shape);
The sample_sizes vector is not really necessary - its elements are written once and then read once in the same order - you could calculate the volume directly:
size_t remaining = volume(processed_args_[i].out_shape);
(GitHub won't let me put this comment on that line :( )
}

if (pad_channels) {
  nchannels = NextPowerOfTwo(nchannels);  // modifies args.shape
Use next_pow2 from core/util.h and remove the redundant implementation.
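A sketch of the suggested replacement, assuming next_pow2 from core/util.h is a drop-in substitute for the local NextPowerOfTwo helper:

#include "dali/core/util.h"  // assumed include path for next_pow2

if (pad_channels) {
  nchannels = next_pow2(nchannels);  // modifies args.shape
}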
def check_cmn_with_out_of_bounds_policy_support(device, batch_size, dtype, input_layout, input_shape, output_layout,
                                                mirror_probability, mean, std, should_pad,
                                                out_of_bounds_policy=None, fill_values=(0x76, 0xb9, 0x00)):
💚
dtype = dtype,
output_layout = output_layout,
crop_d = crop_d,
crop_h = crop_h,
crop_w = crop_w,
crop_pos_x = crop_pos_x,
crop_pos_y = crop_pos_y,
crop_pos_z = crop_pos_z,
mean = mean,
std = std,
pad_output = pad_output,
out_of_bounds_policy = out_of_bounds_policy,
fill_values = fill_values)
nitpick: weird indent
VALUE_SWITCH(need_pad ? 1 : 0, NeedPad, (false, true), (
  VALUE_SWITCH(need_flip ? 1 : 0, NeedFlip, (false, true), (
Nitpick: you can put a comment about this conversion to int like:
// Convert switch argument to `int` to avoid compiler warning about unreachable case label
or similar - otherwise someone might try to "fix" it.
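Applied to the snippet quoted above, that could look like:

// Convert switch argument to `int` to avoid compiler warning about unreachable case label
VALUE_SWITCH(need_pad ? 1 : 0, NeedPad, (false, true), (
  VALUE_SWITCH(need_flip ? 1 : 0, NeedFlip, (false, true), (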
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [1437164]: BUILD STARTED
CI MESSAGE: [1437164]: BUILD PASSED
Why do we need this PR?
Adds full padding support to the CropMirrorNormalize operator.

What happened in this PR?
- Rewrote the SliceFlipNormalizePadPermute CPU and GPU kernels to support padding
- Adjusted CropMirrorNormalize to use the new kernels
- Affected modules and functionalities: CropMirrorNormalize CPU and GPU operators, SliceFlipNormalizePadPermute CPU and GPU kernels
- Key points relevant for the review: kernels implementation
- Validation and testing: new tests added
- Documentation: N/A

JIRA TASK: [DALI-1462], [DALI-1464]