
Image Decoder: Unified behavior across backends, Alpha channel support in PNG and JP2, YCbCr support in JP2 #2867

Merged
merged 14 commits into from Apr 20, 2021

Conversation

jantonguirao (Contributor):

Why do we need this PR?


  • Refactoring to improve consistency of Image Decoder behavior across backends.

What happened in this PR?


  • What solution was applied:
    Unified behavior in mixed and cpu versions of ImageDecoder operator
    Added support for the alpha channel in PNG and JP2 files (TIFF already supported it)
    Added support for YCbCr and BGR in JP2
    Added consistency tests
  • Affected modules and functionalities:
    Image Decoder
  • Key points relevant for the review:
    Changes in the decoders
  • Validation and testing:
    New tests added
  • Documentation (including examples):
    NA

JIRA TASK: [DALI-1948]

@jantonguirao changed the title from "Image decoder consistency" to "Image Decoder: Unified behavior across backends, Alpha channel support in PNG and JP2, YCbCr support in JP2" on Apr 15, 2021
}
data.shape = {height, width, image_info.num_components};
data.req_nchannels = NumberOfChannels(output_image_type_, data.shape[2]);
data.bpp = std::max<int>(data.bpp, comp.precision);
Contributor:

Repeats L460.

Contributor Author:

It's a bug; it should be inside the for loop.

@@ -527,7 +535,7 @@ class nvJPEGDecoder : public Operator<MixedBackend>, CachedDecoderImpl {
#endif
data.shape = {heights[0], widths[0], c};
data.subsampling = subsampling;

data.req_nchannels = NumberOfChannels(output_image_type_, c);
Contributor:

Maybe this can be extracted after L556, as the same thing is repeated in L551?

Output g = ConvertNorm<Output>(input[tid + npixels]);
Output b = ConvertNorm<Output>(input[tid + 2 * npixels]);
Output *out = output + 3 * tid;
out[0] = kernels::rgb_to_y<Output>({r, g, b});
Contributor:

Do we need to convert before feeding to rgb_to_y, or maybe it should accept any type and only convert the final result?

@mzient (Contributor), Apr 19, 2021:

I think we're looking at a different design problem here: the componentwise functions (rgb_to_y, rgb_to_cb) work on unnormalized float values, but this function normalizes the values.
ConvertNorm<smaller_type> applied to the original components will result in loss of precision. The correct approach would be to convert to float, convert to CbCr, add the half-range to Cb/Cr (if needed), normalize the range, and ConvertSat<Output>. Note that the CbCr component conversion has the nasty hardcoded 128 half-range, which will break things sooner rather than later.
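The float pipeline described above can be sketched as follows. This is a minimal illustration with hypothetical helper names (`rgb_to_y_f`, `sat_u8`, etc.), not the actual DALI kernels: all chroma math stays in unnormalized float, the half-range is added explicitly, and saturation happens only at the output boundary.

```cpp
#include <algorithm>
#include <cstdint>

// ITU-R BT.601 full-range coefficients, operating on unnormalized float [0, 255].
inline float rgb_to_y_f(float r, float g, float b) {
  return 0.299f * r + 0.587f * g + 0.114f * b;
}
inline float rgb_to_cb_f(float r, float g, float b) {
  return -0.16873589f * r - 0.33126411f * g + 0.5f * b;  // no hardcoded +128 here
}

// Saturating round-to-nearest cast, standing in for ConvertSat<uint8_t>.
inline uint8_t sat_u8(float x) {
  return static_cast<uint8_t>(std::min(std::max(x + 0.5f, 0.0f), 255.0f));
}

// One pixel: float math throughout; the uint8 half-range (128) is added
// only where the output type calls for it, then we saturate once.
inline void rgb_to_ycbcr_u8(uint8_t r8, uint8_t g8, uint8_t b8,
                            uint8_t *y, uint8_t *cb) {
  float r = r8, g = g8, b = b8;
  *y  = sat_u8(rgb_to_y_f(r, g, b));
  *cb = sat_u8(rgb_to_cb_f(r, g, b) + 128.0f);  // half-range for uint8 output only
}
```

For a mid-gray pixel (128, 128, 128) this yields Y = 128 and Cb = 128, since the Y coefficients sum to 1 and the Cb coefficients sum to 0.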

auto r = ConvertNorm<Output>(input[tid]);
auto g = ConvertNorm<Output>(input[tid + npixels]);
auto b = ConvertNorm<Output>(input[tid + 2 * npixels]);
output[tid] = kernels::rgb_to_y<Output>({r, g, b});
Contributor:

Similar to the above.

compare_pipelines(img_decoder_pipe("cpu", out_type=img_out_type, files=files),
img_decoder_pipe("mixed", out_type=img_out_type, files=files),
batch_size=batch_size_test, N_iterations=3,
eps = eps)
Contributor:

Suggested change
eps = eps)
eps=eps)

("jpeg2k", "db/single/multichannel/with_alpha", 'jp2'),
("jpeg2k", "db/single/16bit", 'jp2'),
("png", "db/single/multichannel/with_alpha", 'png')]:
subdir = '' # In those paths the images are not organized in subdirs
Contributor:

How about subdir = None and adding a way to handle it?


template <typename T>
__inline__ __device__ T rgb_to_cb(vec<3, T> rgb) {
return ConvertSat<T>(-0.16873589f * rgb.x - 0.33126411f * rgb.y + 0.50000000f * rgb.z + 128.0f);
Contributor:

I don't like this + 128.0f here - it works well for uint8 outputs, but makes no sense for signed T, and its applicability is dubious with float output, too. Perhaps we should add the half-range for unsigned T and not add anything otherwise? It would affect JPEG distortion. We need to discuss this.
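The "half-range only for unsigned T" idea could look something like the sketch below (hypothetical, not DALI code): derive the chroma offset from the output type at compile time instead of hardcoding 128.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Chroma offset derived from the output type: half the representable range
// for unsigned integer outputs, zero for signed or floating-point outputs,
// which keep Cb/Cr centered at 0.
template <typename T>
constexpr float chroma_half_range() {
  if constexpr (std::is_integral_v<T> && std::is_unsigned_v<T>)
    return (static_cast<float>(std::numeric_limits<T>::max()) + 1.0f) / 2.0f;
  else
    return 0.0f;
}
```

With this, `chroma_half_range<uint8_t>()` recovers the familiar 128, `uint16_t` gets 32768, and signed or float outputs get no offset at all.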

JanuszL and others added 11 commits April 19, 2021 11:50
- adds a check inside the decoder that the requested image format matches
  the underlying image
- adds the check only for JPEG2000, as in the case of JPEG only RGB images are supported;
  for the host fallback, DALI always converts to the expected format if requested

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
…g. RGBA) to RGB.

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao force-pushed the image_decoder_consistency branch 3 times, most recently from b031c07 to fd6a769, on April 19, 2021 15:19
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao (Contributor Author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [2282818]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [2282818]: BUILD PASSED

"component 0 has a shape of {", comp.component_height, ", ",
comp.component_width, "} and component ", c, " has a shape of {",
height, ", ", width, "}"));
data.bpp = std::max<int>(data.bpp, comp.precision);
Contributor:

Hmm... precision is defined per component, but here you take the max precision. I wonder what would happen for an image with a different precision per component... is that even possible?

Contributor:

I would put an enforce for that.
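A minimal sketch of that enforce (hypothetical struct and function names; in DALI this would presumably be a DALI_ENFORCE rather than a plain throw): reject images whose components carry different precisions instead of silently taking the max.

```cpp
#include <stdexcept>
#include <vector>

// Stand-in for the per-component info the decoder reads from the bitstream.
struct ComponentInfo { int precision; };

// Returns the common precision, or throws if the components disagree.
inline int uniform_precision(const std::vector<ComponentInfo> &comps) {
  int bpp = comps.at(0).precision;
  for (const auto &c : comps) {
    if (c.precision != bpp)
      throw std::invalid_argument("Per-component precision mismatch");
  }
  return bpp;
}
```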

Comment on lines 339 to 347
void clear() {
encoded_length = 0;
bpp = 8;
selected_decoder = nullptr;
is_progressive = false;
req_nchannels = -1;
method = DecodeMethod::Host;
subsampling = NVJPEG_CSS_UNKNOWN;
}
Contributor:

It's beyond the scope of this PR, but this method (and the fact that you can't just assign {} to reset it) is a manifestation of an underlying problem: this object serves two purposes. It is both SampleData and SampleDecoderContext or whatever the name should be. I don't think they should be aggregated this way - perhaps a valid solution would be to rename it to SampleContext and create a subaggregate (SampleData) which would contain the fields that are reset here (+ filename - which should probably (?) be reset here, too).

Contributor Author:

I agree with this. I'll add a TODO here for now. It deserves a separate PR.
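The proposed split could be sketched roughly as below (hypothetical names and a reduced field subset): the per-sample fields that clear() resets become their own aggregate, so resetting is just assigning a default-constructed value.

```cpp
#include <cstddef>
#include <string>

enum class DecodeMethod { Host, NvjpegCuda };  // stand-in for the real enum

// Fields that must be reset between iterations.
struct SampleData {
  size_t encoded_length = 0;
  int bpp = 8;
  int req_nchannels = -1;
  bool is_progressive = false;
  DecodeMethod method = DecodeMethod::Host;
  void *selected_decoder = nullptr;
  std::string filename;  // probably belongs in the reset set too
};

// Longer-lived per-sample context owning the resettable part.
struct SampleContext {
  SampleData data;
  void clear() { data = {}; }  // replaces the field-by-field reset
};
```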

@@ -182,7 +184,7 @@ def __init__(self, device, batch_size, num_threads=1, device_id=0):
super(MultichannelPipeline, self).__init__(batch_size, num_threads, device_id)
self.device = device

self.reader = ops.readers.File(file_root=multichannel_tiff_root)
self.reader = ops.readers.File(files = multichannel_tiff_files)
Contributor:

Nitpick: Shouldn't named parameters go without spaces around assignment?

@mzient (Contributor) left a review:

Only nitpicks/comments, nothing really stopping it.


namespace dali {
namespace kernels {
namespace jpeg {

// TODO(janton): Purposely not using the color space conversion utils here.
// We need better color space conversion utilities and we are risking performance
// using the conversion functions in color_space_conversion_impl.h
Contributor:

What are they for in that case?

Contributor Author:

They are used for the image decoder color conversion. Those utils can convert between types, so to allow going from a narrower to a wider type and vice versa, we have to normalize to float first. This introduces extra multiplications that don't really make sense for the JPEG distortion kernel, since we are working with uint8. Making generic, highly optimized color conversion utils is beyond the scope of this PR and should probably be tackled separately.
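The cost being described can be illustrated with the hedged sketch below (hypothetical helper names, not the DALI utils): a type-generic path normalizes inputs to float in [0, 1], does the math, then rescales to the output range, which adds multiplies that a uint8-only kernel can fold into its coefficients.

```cpp
#include <algorithm>
#include <cstdint>

// Generic-style path: input -> normalized float [0, 1] -> math -> output range.
inline uint8_t luma_generic(uint8_t r, uint8_t g, uint8_t b) {
  float rf = r * (1.0f / 255), gf = g * (1.0f / 255), bf = b * (1.0f / 255);
  float y = 0.299f * rf + 0.587f * gf + 0.114f * bf;  // y in [0, 1]
  return static_cast<uint8_t>(
      std::min(std::max(y * 255.0f + 0.5f, 0.0f), 255.0f));
}

// uint8-only path: coefficients already in output scale, so the
// normalize/denormalize multiplies disappear entirely.
inline uint8_t luma_u8(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint8_t>(0.299f * r + 0.587f * g + 0.114f * b + 0.5f);
}
```

Both paths produce the same result for uint8 inputs; the specialized one simply skips the round trip through the normalized range.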

Contributor:

Can you write this in the comment too?

Contributor Author:

I went ahead and unified the color space conversion utils (we had 3 different ones).

Comment on lines 651 to 652
DALIDataType pixel_type = sample->bpp == 16 ? DALI_UINT16 : DALI_UINT8;
output_image.pixel_type = sample->bpp == 16 ? NVJPEG2K_UINT16 : NVJPEG2K_UINT8;
Contributor:

Suggested change
DALIDataType pixel_type = sample->bpp == 16 ? DALI_UINT16 : DALI_UINT8;
output_image.pixel_type = sample->bpp == 16 ? NVJPEG2K_UINT16 : NVJPEG2K_UINT8;
DALIDataType pixel_type;
if (sample->bpp == 16) {
pixel_type = DALI_UINT16;
output_image.pixel_type = NVJPEG2K_UINT16;
} else {
pixel_type = DALI_UINT8;
output_image.pixel_type = NVJPEG2K_UINT8;
}

just a suggestion

PermuteToInterleaved(output_data, i_out, comp_size, sample->shape[2], nvjpeg2k_cu_stream_);
if (need_processing) {
if (output_image_type_ == DALI_GRAY) {
// Converting to Gray, dropping extra channels if needed
Contributor:

Suggested change
// Converting to Gray, dropping extra channels if needed
// Converting to Gray, dropping alpha channels if needed

?

Comment on lines 178 to 189
pipe0 = pipe(device, out_type=out_type, files=files)
pipe0.build()
out0, shape0 = pipe0.run()
if device == 'mixed':
out0 = out0.as_cpu()
out0 = np.array(out0[0])
shape0 = np.array(shape0[0])
expected_channels = 4 if out_type == types.ANY_DATA else \
1 if out_type == types.GRAY else \
3
assert out0.shape[2] == expected_channels, \
f"Expected {expected_channels} but got {out0.shape[2]}"
Contributor:

Suggested change
pipe0 = pipe(device, out_type=out_type, files=files)
pipe0.build()
out0, shape0 = pipe0.run()
if device == 'mixed':
out0 = out0.as_cpu()
out0 = np.array(out0[0])
shape0 = np.array(shape0[0])
expected_channels = 4 if out_type == types.ANY_DATA else \
1 if out_type == types.GRAY else \
3
assert out0.shape[2] == expected_channels, \
f"Expected {expected_channels} but got {out0.shape[2]}"
pipe = pipe(device, out_type=out_type, files=files)
pipe.build()
out, shape = pipe.run()
if device == 'mixed':
out = out.as_cpu()
out = np.array(out[0])
shape = np.array(shape[0])
expected_channels = 4 if out_type == types.ANY_DATA else \
1 if out_type == types.GRAY else \
3
assert out.shape[2] == expected_channels, \
f"Expected {expected_channels} but got {out.shape[2]}"

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao (Contributor Author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [2286089]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [2286089]: BUILD FAILED

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao (Contributor Author):

!build

@dali-automaton (Collaborator):

CI MESSAGE: [2286307]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [2286307]: BUILD PASSED

@jantonguirao merged commit 94ccc7a into NVIDIA:master on Apr 20, 2021