Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a custom color space conversion kernel for all conversions #2907

Merged
merged 10 commits into from
Apr 30, 2021

Conversation

jantonguirao
Copy link
Contributor

@jantonguirao jantonguirao commented Apr 27, 2021

Why we need this PR?

Pick one, remove the rest

  • It fixes a bug in the usage of NPP color conversion functions that do not support in-place processing.

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

  • What solution was applied:
    [Changed usage of NPP color space conversion functions by a custom kernel. The custom kernel was already in use for some of the conversions, this PRs extends it to all conversions.]
  • Affected modules and functionalities:
    [ColorSpaceConversion op, ImageDecoder]
  • Key points relevant for the review:
    color space conversion kernel implementation
  • Validation and testing:
    [NA]
  • Documentation (including examples):
    [NA]

JIRA TASK: [DALI-2003]

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@JanuszL JanuszL self-assigned this Apr 27, 2021
@jantonguirao jantonguirao marked this pull request as draft April 27, 2021 09:47
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao jantonguirao changed the title Use a temporary buffer for NPP color conversion Use a custom color space conversion kernel for all conversions Apr 27, 2021
@@ -100,6 +100,14 @@ void PlanarRGBToGray(Output *output, const Input *input, int64_t npixels,
planar_rgb_to_gray<<<num_blocks, block_size, 0, stream>>>(output, input, npixels);
}

template <typename Output, typename Input>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Adding this wrapper here because calling the kernel directly breaks the build for *.cc compilation units that include the nvjpeg decoupled API header.

@jantonguirao jantonguirao marked this pull request as ready for review April 27, 2021 15:10
Comment on lines 63 to 66
vec<out_pixel_sz, Out> out;
out[0] = gray[0];
out[1] = gray[0];
out[2] = gray[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vec<out_pixel_sz, Out> out;
out[0] = gray[0];
out[1] = gray[0];
out[2] = gray[0];
vec<out_pixel_sz, Out> out(gray[0]);

Will this work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes, I'll try it out

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2310232]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2310232]: BUILD FAILED

@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2313555]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2313555]: BUILD FAILED

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2314580]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2314580]: BUILD FAILED

static constexpr int in_pixel_sz = 1;
static DALI_HOST_DEV vec<out_pixel_sz, Out> convert(vec<in_pixel_sz, In> gray) {
vec<out_pixel_sz, Out> out;
out[0] = ConvertSatNorm<Out>(gray[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you compress the dynamic range to itu_r_bt_601 Y here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point.

static constexpr int in_pixel_sz = 3;
static DALI_HOST_DEV vec<out_pixel_sz, Out> convert(vec<in_pixel_sz, In> ycbcr) {
vec<out_pixel_sz, Out> out;
out[0] = ConvertSatNorm<Out>(ycbcr[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise (but reverse).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Comment on lines 105 to 106
constexpr float scale = 1 / 1.164f;
return ConvertSatNorm<Output>(y * scale + 0.0625f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr float scale = 1 / 1.164f;
return ConvertSatNorm<Output>(y * scale + 0.0625f);
constexpr float scale = 0.257f + 0.504f + 0.098f;
return ConvertSatNorm<Output>(y * scale + 0.0625f);

This should be exactly that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 132 to 134
auto r = clamp<uint8_t>(gray + 1.596f * tmp_r, 0, 255);
auto g = clamp<uint8_t>(gray - 0.813f * tmp_r - 0.392f * tmp_b, 0, 255);
auto b = clamp<uint8_t>(gray + 2.017f * tmp_b, 0, 255);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clamp instead of ConvertSat<uint8_t>? Do you wish the values to be truncated instead of rounded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 197 to 199
auto r = clamp<uint8_t>(ycbcr.x + 1.402f * tmp_r, 0, 255);
auto g = clamp<uint8_t>(ycbcr.x - 0.34413629f * tmp_b - 0.71413629f * tmp_r, 0, 255);
auto b = clamp<uint8_t>(ycbcr.x + 1.772f * tmp_b, 0, 255);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, this will truncate instead of rounding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Comment on lines 50 to 52
DALI_ENFORCE(layout == "HWC" || (layout.empty() && output_shape.sample_dim() == 3),
make_string("Unexpected layout: ", layout, " shape: ", output_shape,
". Expected data in HWC layout."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we have a video or a volume? We're flattening other dimensions anyway. We're only interested in the the channel being the last one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2319391]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2319391]: BUILD FAILED

Comment on lines 48 to 50
DALI_ENFORCE(
in_layout.empty() || in_layout.find('C') == channel_dim,
make_string("Channel dimension should be the last in the layout. Got ", in_layout));
Copy link
Contributor

@mzient mzient Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The layouts are listed in the schema - all of them have trailing channel. Remove the check here or remove the list of layouts from the schema - whichever suits you better.
Also, see the comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this

auto ndim = in_sh.sample_dim();
int nsamples = in_sh.num_samples();
auto in_layout = input.GetLayout();
int channel_dim = ndim - 1;
Copy link
Contributor

@mzient mzient Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the way it's written now doesn't convey the idea very well.
The actual channel dimension is what we find in the layout (if any) and then we should check that it meets the constraints. If we allow planar layouts in the future, we'll simply drop the enforce.

Suggested change
int channel_dim = ndim - 1;
int channel_dim = in_layout.contains('C') ? in_layout.find('C') : ndim - 1;
DALI_ENFORCE(channel_dim == ndim - 1, make_string("Channel dimension should be the last in the layout. Got ", in_layout));

If you insist on keeping the list of supported layouts in the schema, this ENFORCE is always satisfied (and the layout is never empty), so it would be:

Suggested change
int channel_dim = ndim - 1;
int channel_dim = in_layout.find('C');
assert(channel_dim == ndim - 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do (the second suggestion)

@@ -618,7 +618,6 @@ class nvJPEGDecoder : public Operator<MixedBackend>, CachedDecoderImpl {
[this, sample, &in, output_data, shape](int tid) {
SampleWorker(sample->sample_idx, sample->file_name, in.size(), tid,
in.data<uint8_t>(), output_data, streams_[tid]);
CacheStore(sample->file_name, output_data, shape, streams_[tid]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shape is unused now and Clang build fails due to unused lambda capture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2322214]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2322214]: BUILD FAILED

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2322464]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2322464]: BUILD PASSED

@jantonguirao jantonguirao merged commit bf4b465 into NVIDIA:master Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants