CropMirrorNormalize full pad support #2044
Conversation
Force-pushed from 3f8d9a7 to a56fe39
!build
CI MESSAGE: [1417255]: BUILD STARTED
Force-pushed from ff80241 to 9b4a0a4
!build
CI MESSAGE: [1417381]: BUILD STARTED
CI MESSAGE: [1417381]: BUILD FAILED
Force-pushed from 9a53545 to 014832a
Two resolved (outdated) review threads on dali/kernels/slice/slice_flip_normalize_permute_pad_cuda_impl.cuh
Signed-off-by: Joaquin Anton <janton@nvidia.com>
VALUE_SWITCH(need_normalize ? 1 : 0, NeedNormalizeInt, (0, 1), (
  VALUE_SWITCH(has_channels ? 1 : 0, HasChannelsInt, (0, 1), (
    constexpr bool NeedNormalize = static_cast<bool>(NeedNormalizeInt);
    constexpr bool HasChannels = static_cast<bool>(HasChannelsInt);
Meh...
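For context on the pattern above: the runtime bools are fed to VALUE_SWITCH as ints (0/1) and converted back to constexpr bools inside the branches; a later comment in this thread notes that the int conversion avoids a compiler warning about a switch on a boolean with unreachable case labels. A macro-free sketch of the same dispatch idea, purely for illustration (bool_switch and the usage below are hypothetical, not DALI code):

#include <type_traits>

// Dispatch a runtime bool to a compile-time constant through an int switch,
// so the compiler does not warn about switching on a boolean value.
template <typename F>
void bool_switch(bool value, F &&body) {
  switch (value ? 1 : 0) {
    case 0: body(std::integral_constant<bool, false>{}); break;
    case 1: body(std::integral_constant<bool, true>{}); break;
    default: break;
  }
}

int main() {
  bool need_normalize = true;  // stands in for the runtime flag
  bool_switch(need_normalize, [&](auto nn) {
    constexpr bool NeedNormalize = decltype(nn)::value;
    (void)NeedNormalize;  // the kernel variant for NeedNormalize would be instantiated here
  });
}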
for out_of_bounds_policy in ['pad', 'trim_to_shape']:
    for device in ['gpu', 'cpu']:
        for batch_size in [1, 3]:
            for out_layout in ["HWC", "CHW"]:
Suggested change:
- for out_layout in ["HWC", "CHW"]:
+ for out_layout in ['HWC', 'CHW']:
To be consistent with other test parameters.
if (pad_before > 0) {
  if (HasChannels && d == channel_dim) {
    for (int64_t i = 0; i < pad_before; i++)
      *output++ = *fill_values++;
I would add {} around this line, as it is easy to forget that this loop executes only this line.
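A sketch of the suggested change, using the names from the snippet above:

if (HasChannels && d == channel_dim) {
  for (int64_t i = 0; i < pad_before; i++) {
    *output++ = *fill_values++;
  }
}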
done
sample_desc.need_flip |= processed_args.in_strides[d] < 0;
need_flip |= sample_desc.need_flip;

// We the last dimension with the previous if:
We < what > last...?
norm_mul_gpu = context.scratchpad->ToGPU(
    context.gpu.stream, make_span(norm_mul_cpu, num_samples * norm_args_size_));
CUDA_CALL(cudaGetLastError());
Maybe it is enough to check this only once for the whole kernel run?
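A sketch of what the suggestion could look like, assuming several such uploads happen in the same run (norm_add_cpu/norm_add_gpu are hypothetical names used only to show a second upload):

norm_mul_gpu = context.scratchpad->ToGPU(
    context.gpu.stream, make_span(norm_mul_cpu, num_samples * norm_args_size_));
norm_add_gpu = context.scratchpad->ToGPU(
    context.gpu.stream, make_span(norm_add_cpu, num_samples * norm_args_size_));  // hypothetical second upload
// ... remaining uploads and the kernel launch ...
CUDA_CALL(cudaGetLastError());  // single check for the whole kernel run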
!build
CI MESSAGE: [1431307]: BUILD STARTED
CI MESSAGE: [1431307]: BUILD FAILED
Force-pushed from 5847a50 to 61f17ec
!build
CI MESSAGE: [1433732]: BUILD STARTED
CI MESSAGE: [1433732]: BUILD PASSED
std::vector<float> mean;
std::vector<float> inv_stddev;
int normalization_dim;
float padding_val = 0.0f;
std::vector<float> fill_values;
You can use SmallVector<float, 4> here (not sure about un-processed args).
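A sketch of the member declarations with that change applied (assuming SmallVector comes from dali/core/small_vector.h; it keeps up to 4 elements inline before falling back to heap allocation):

#include "dali/core/small_vector.h"  // assumed include path

SmallVector<float, 4> mean;
SmallVector<float, 4> inv_stddev;
int normalization_dim;
float padding_val = 0.0f;
SmallVector<float, 4> fill_values;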
}
sample_desc.effective_ndim = last_dim + 1;

sample_sizes[i] = volume(processed_args.out_shape);
The sample_sizes vector is not really necessary - its elements are written once and then read once in the same order - you could calculate the volume directly:
size_t remaining = volume(processed_args_[i].out_shape);
(GitHub won't let me put this comment on that line :( )
}

if (pad_channels) {
  nchannels = NextPowerOfTwo(nchannels);  // modifies args.shape
Use next_pow2 from core/util.h and remove the redundant implementation.
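A sketch of the suggested replacement, assuming next_pow2 from core/util.h is a drop-in substitute for the local NextPowerOfTwo helper:

#include "dali/core/util.h"  // assumed include path for next_pow2

if (pad_channels) {
  nchannels = next_pow2(nchannels);  // modifies args.shape
}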
def check_cmn_with_out_of_bounds_policy_support(device, batch_size, dtype, input_layout, input_shape, output_layout,
                                                mirror_probability, mean, std, should_pad,
                                                out_of_bounds_policy=None, fill_values=(0x76, 0xb9, 0x00)):
💚
dtype = dtype,
output_layout = output_layout,
crop_d = crop_d,
crop_h = crop_h,
crop_w = crop_w,
crop_pos_x = crop_pos_x,
crop_pos_y = crop_pos_y,
crop_pos_z = crop_pos_z,
mean = mean,
std = std,
pad_output = pad_output,
out_of_bounds_policy = out_of_bounds_policy,
fill_values = fill_values)
nitpick: weird indent
VALUE_SWITCH(need_pad ? 1 : 0, NeedPad, (false, true), (
  VALUE_SWITCH(need_flip ? 1 : 0, NeedFlip, (false, true), (
Nitpick: you can put a comment about this conversion to int like:
// Convert switch argument to `int` to avoid compiler warning about unreachable case label
or similar - otherwise someone might try to "fix" it.
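Applied to the snippet quoted above, that could look like:

// Convert switch argument to `int` to avoid compiler warning about unreachable case label
VALUE_SWITCH(need_pad ? 1 : 0, NeedPad, (false, true), (
  VALUE_SWITCH(need_flip ? 1 : 0, NeedFlip, (false, true), (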
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [1437164]: BUILD STARTED
CI MESSAGE: [1437164]: BUILD PASSED
Why do we need this PR?
Adds full padding support to the CropMirrorNormalize operator.

What happened in this PR?
- Rewrote the SliceFlipNormalizePadPermute CPU and GPU kernels to support padding
- Adjusted CropMirrorNormalize to use the new kernels
- Affected modules and functionalities: CropMirrorNormalize CPU and GPU operators, SliceFlipNormalizePadPermute CPU and GPU kernels
- Key points relevant for the review: kernels implementation
- Validation and testing: new tests added
- Documentation: N/A

JIRA TASK: [DALI-1462], [DALI-1464]