Use a custom color space conversion kernel for all conversions #2907
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
dali/kernels/imgproc/color_manipulation/color_space_conversion_kernel.cuh
@@ -100,6 +100,14 @@ void PlanarRGBToGray(Output *output, const Input *input, int64_t npixels,
  planar_rgb_to_gray<<<num_blocks, block_size, 0, stream>>>(output, input, npixels);
}

template <typename Output, typename Input>
Note: Adding this wrapper here because calling the kernel directly breaks the build for *.cc compilation units that include the nvjpeg decoupled API header.
vec<out_pixel_sz, Out> out;
out[0] = gray[0];
out[1] = gray[0];
out[2] = gray[0];
- vec<out_pixel_sz, Out> out;
- out[0] = gray[0];
- out[1] = gray[0];
- out[2] = gray[0];
+ vec<out_pixel_sz, Out> out(gray[0]);
Will this work?
Probably yes, I'll try it out
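For reference, the suggestion relies on the vector type having a constructor that broadcasts one scalar to every component. Whether DALI's `vec` provides one is not confirmed in this thread, so the following is only a minimal sketch with a hypothetical stand-in type `Vec`:

```cpp
// Hypothetical stand-in for a small fixed-size vector type; not DALI's vec.
template <int N, typename T>
struct Vec {
  T v[N];

  Vec() = default;

  // Broadcasting constructor: copies one scalar into every component.
  explicit Vec(T scalar) {
    for (int i = 0; i < N; i++)
      v[i] = scalar;
  }

  T &operator[](int i) { return v[i]; }
  const T &operator[](int i) const { return v[i]; }
};
```

With such a constructor, `Vec<3, int> out(gray0);` replaces the three per-component assignments.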
dali/kernels/imgproc/color_manipulation/color_space_conversion_kernel.cuh
Signed-off-by: Joaquin Anton <janton@nvidia.com>
3ec8a36 to 2173d71
!build
CI MESSAGE: [2310232]: BUILD STARTED
CI MESSAGE: [2310232]: BUILD FAILED
!build
CI MESSAGE: [2313555]: BUILD STARTED
CI MESSAGE: [2313555]: BUILD FAILED
f8005e5 to bf0b690
!build
CI MESSAGE: [2314580]: BUILD STARTED
CI MESSAGE: [2314580]: BUILD FAILED
static constexpr int in_pixel_sz = 1;
static DALI_HOST_DEV vec<out_pixel_sz, Out> convert(vec<in_pixel_sz, In> gray) {
  vec<out_pixel_sz, Out> out;
  out[0] = ConvertSatNorm<Out>(gray[0]);
Shouldn't you compress the dynamic range to itu_r_bt_601 Y here?
Yes, good point.
static constexpr int in_pixel_sz = 3;
static DALI_HOST_DEV vec<out_pixel_sz, Out> convert(vec<in_pixel_sz, In> ycbcr) {
  vec<out_pixel_sz, Out> out;
  out[0] = ConvertSatNorm<Out>(ycbcr[0]);
Likewise (but reverse).
Will fix
Signed-off-by: Joaquin Anton <janton@nvidia.com>
constexpr float scale = 1 / 1.164f;
return ConvertSatNorm<Output>(y * scale + 0.0625f);
- constexpr float scale = 1 / 1.164f;
- return ConvertSatNorm<Output>(y * scale + 0.0625f);
+ constexpr float scale = 0.257f + 0.504f + 0.098f;
+ return ConvertSatNorm<Output>(y * scale + 0.0625f);
This should be exactly that.
Done
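To illustrate the range compression discussed in this thread: the sum 0.257 + 0.504 + 0.098 = 0.859 is the gain applied to a gray input (R = G = B) by the limited-range luma equation, and it is close to 1 / 1.164 (about 0.8591). The sketch below works in 8-bit units rather than the normalized values used in the PR; `RoundSatU8`, `GrayToY` and `YToGray` are hypothetical names for illustration, not DALI functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: saturate to [0, 255], then round to nearest.
inline uint8_t RoundSatU8(float v) {
  float clamped = std::min(std::max(v, 0.0f), 255.0f);
  return static_cast<uint8_t>(std::lround(clamped));
}

// Sum of the limited-range luma coefficients: the gain seen by a gray input.
constexpr float kLumaScale = 0.257f + 0.504f + 0.098f;  // 0.859
// The 0.0625 offset from the PR, expressed in 8-bit units (about 16).
constexpr float kLumaOffset = 0.0625f * 255.0f;

// Full-range gray (0..255) to limited-range luma (roughly 16..235).
inline uint8_t GrayToY(uint8_t gray) {
  return RoundSatU8(gray * kLumaScale + kLumaOffset);
}

// The reverse direction: limited-range luma back to full-range gray.
inline uint8_t YToGray(uint8_t y) {
  return RoundSatU8((y - kLumaOffset) / kLumaScale);
}
```

Under these assumptions, full-range black and white map to 16 and 235, and luma values below 16 saturate to 0 on the way back.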
auto r = clamp<uint8_t>(gray + 1.596f * tmp_r, 0, 255);
auto g = clamp<uint8_t>(gray - 0.813f * tmp_r - 0.392f * tmp_b, 0, 255);
auto b = clamp<uint8_t>(gray + 2.017f * tmp_b, 0, 255);
Why clamp instead of ConvertSat<uint8_t>? Do you wish the values to be truncated instead of rounded?
Fixed
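The difference the reviewer points out can be shown with two stand-in functions. These are not DALI's `clamp` or `ConvertSat`; `ClampTruncate` and `ClampRound` are hypothetical names, and the sketch only assumes that a plain float-to-integer cast drops the fractional part.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Clamp to [0, 255], then cast: the cast discards the fractional part.
inline uint8_t ClampTruncate(float v) {
  float clamped = std::min(std::max(v, 0.0f), 255.0f);
  return static_cast<uint8_t>(clamped);
}

// Clamp to [0, 255], then round to the nearest integer before casting.
inline uint8_t ClampRound(float v) {
  float clamped = std::min(std::max(v, 0.0f), 255.0f);
  return static_cast<uint8_t>(std::lround(clamped));
}
```

For an intermediate value of 127.9, the truncating version yields 127 and the rounding version yields 128, which is a systematic downward bias of up to one level per channel when truncating.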
auto r = clamp<uint8_t>(ycbcr.x + 1.402f * tmp_r, 0, 255);
auto g = clamp<uint8_t>(ycbcr.x - 0.34413629f * tmp_b - 0.71413629f * tmp_r, 0, 255);
auto b = clamp<uint8_t>(ycbcr.x + 1.772f * tmp_b, 0, 255);
Likewise, this will truncate instead of rounding.
Fixed
Signed-off-by: Joaquin Anton <janton@nvidia.com>
DALI_ENFORCE(layout == "HWC" || (layout.empty() && output_shape.sample_dim() == 3),
             make_string("Unexpected layout: ", layout, " shape: ", output_shape,
                         ". Expected data in HWC layout."));
Can't we have a video or a volume? We're flattening the other dimensions anyway. We're only interested in the channel being the last one.
Done
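As a sketch of why the number of leading dimensions does not matter when the channel is last: every dimension except the last can be folded into a single pixel count, so an image (HWC), a video (FHWC) or a volume (DHWC) all reduce to the same `npixels x nchannels` view. `FlattenChannelLast` is a hypothetical helper, and `std::vector<int64_t>` stands in for DALI's shape types.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: folds all dimensions except the last into a pixel
// count and returns {npixels, nchannels} for a channel-last shape.
inline std::pair<int64_t, int64_t> FlattenChannelLast(
    const std::vector<int64_t> &shape) {
  int64_t npixels = 1;
  for (std::size_t d = 0; d + 1 < shape.size(); d++)
    npixels *= shape[d];
  int64_t nchannels = shape.empty() ? 0 : shape.back();
  return {npixels, nchannels};
}
```

A 480x640 RGB image gives 307200 pixels of 3 channels; ten such frames give 3072000 pixels of 3 channels, and a per-pixel conversion kernel can treat both the same way.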
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [2319391]: BUILD STARTED
CI MESSAGE: [2319391]: BUILD FAILED
DALI_ENFORCE(
    in_layout.empty() || in_layout.find('C') == channel_dim,
    make_string("Channel dimension should be the last in the layout. Got ", in_layout));
The layouts are listed in the schema - all of them have trailing channel. Remove the check here or remove the list of layouts from the schema - whichever suits you better.
Also, see the comment above.
I'll remove this
auto ndim = in_sh.sample_dim();
int nsamples = in_sh.num_samples();
auto in_layout = input.GetLayout();
int channel_dim = ndim - 1;
I think that the way it's written now doesn't convey the idea very well.
The actual channel dimension is what we find in the layout (if any) and then we should check that it meets the constraints. If we allow planar layouts in the future, we'll simply drop the enforce.
- int channel_dim = ndim - 1;
+ int channel_dim = in_layout.contains('C') ? in_layout.find('C') : ndim - 1;
+ DALI_ENFORCE(channel_dim == ndim - 1, make_string("Channel dimension should be the last in the layout. Got ", in_layout));
If you insist on keeping the list of supported layouts in the schema, this ENFORCE is always satisfied (and the layout is never empty), so it would be:
- int channel_dim = ndim - 1;
+ int channel_dim = in_layout.find('C');
+ assert(channel_dim == ndim - 1);
Will do (the second suggestion)
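The idea behind the first suggestion (derive the channel dimension from the layout, then check the constraint separately) can be sketched outside DALI as follows. `std::string` stands in for DALI's layout type, and `ChannelDim` and `IsChannelLast` are hypothetical names.

```cpp
#include <string>

// Returns the index of 'C' in the layout, or the last dimension when the
// layout is empty or does not name a channel dimension.
inline int ChannelDim(const std::string &layout, int ndim) {
  std::string::size_type pos = layout.find('C');
  return pos == std::string::npos ? ndim - 1 : static_cast<int>(pos);
}

// The constraint is checked separately, so it could be dropped later if
// planar layouts were to be supported.
inline bool IsChannelLast(const std::string &layout, int ndim) {
  return ChannelDim(layout, ndim) == ndim - 1;
}
```

With this split, "HWC" and "FHWC" pass the check, an empty layout falls back to the last dimension, and "CHW" is reported as channel-first instead of being silently treated as channel-last.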
@@ -618,7 +618,6 @@ class nvJPEGDecoder : public Operator<MixedBackend>, CachedDecoderImpl {
  [this, sample, &in, output_data, shape](int tid) {
    SampleWorker(sample->sample_idx, sample->file_name, in.size(), tid,
                 in.data<uint8_t>(), output_data, streams_[tid]);
    CacheStore(sample->file_name, output_data, shape, streams_[tid]);
shape is unused now and Clang build fails due to unused lambda capture.
done
!build
CI MESSAGE: [2322214]: BUILD STARTED
CI MESSAGE: [2322214]: BUILD FAILED
Signed-off-by: Joaquin Anton <janton@nvidia.com>
c237fe6 to cf11ebd
!build
CI MESSAGE: [2322464]: BUILD STARTED
CI MESSAGE: [2322464]: BUILD PASSED
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
[Replaced the NPP color space conversion functions with a custom kernel. The custom kernel was already used for some of the conversions; this PR extends it to all of them.]
[ColorSpaceConversion op, ImageDecoder]
color space conversion kernel implementation
[NA]
[NA]
JIRA TASK: [DALI-2003]