Image Decoder: Unified behavior across backends, Alpha channel support in PNG and JP2, YCbCr support in JP2 #2867
Conversation
}
data.shape = {height, width, image_info.num_components};
data.req_nchannels = NumberOfChannels(output_image_type_, data.shape[2]);
data.bpp = std::max<int>(data.bpp, comp.precision);
Repeats L460.
It's a bug; it should be inside the for loop.
@@ -527,7 +535,7 @@ class nvJPEGDecoder : public Operator<MixedBackend>, CachedDecoderImpl {
#endif
data.shape = {heights[0], widths[0], c};
data.subsampling = subsampling;

data.req_nchannels = NumberOfChannels(output_image_type_, c);
Maybe this can be extracted after L556, as the same thing is repeated in L551?
Output g = ConvertNorm<Output>(input[tid + npixels]);
Output b = ConvertNorm<Output>(input[tid + 2 * npixels]);
Output *out = output + 3 * tid;
out[0] = kernels::rgb_to_y<Output>({r, g, b});
Do we need to convert before feeding to rgb_to_y, or should it accept any type and convert only the final result?
I think we're looking at a different design problem here: the componentwise functions (rgb_to_y, rgb_to_cb) work on unnormalized float values, but this function normalizes values. ConvertNorm<smaller_type> applied to the original components will result in a loss of precision. The correct approach would be to convert to float, convert to CbCr, add the half-range to Cb/Cr (if needed), normalize the range, and then ConvertSat<output>. Note that the CbCr component conversion has the nasty hardcoded 128 half-range, which will break things sooner rather than later.
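To make the precision argument concrete, here is a small standalone Python sketch (illustration only, not DALI code; the weights are the standard BT.601 luma coefficients) comparing a float-first luma computation against one that quantizes the components to a narrower type first:

```python
# Illustration only: standard BT.601 luma weights, plain Python (not DALI code).
RGB_TO_Y = (0.299, 0.587, 0.114)

def y_float_first(r, g, b):
    """Compute the weighted sum in float, then round/saturate once at the end."""
    y = RGB_TO_Y[0] * r + RGB_TO_Y[1] * g + RGB_TO_Y[2] * b
    return max(0, min(255, round(y)))

def y_narrow_first(r, g, b, levels=16):
    """Quantize each component to `levels` steps first (mimicking an early
    ConvertNorm to a smaller type), then compute the weighted sum."""
    step = 256 // levels
    r, g, b = (step * (c // step) for c in (r, g, b))
    y = RGB_TO_Y[0] * r + RGB_TO_Y[1] * g + RGB_TO_Y[2] * b
    return max(0, min(255, round(y)))
```

For dark pixels the early quantization visibly shifts the result: (10, 20, 30) yields 18 with the float-first path but only 11 when the components are quantized first.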
auto r = ConvertNorm<Output>(input[tid]);
auto g = ConvertNorm<Output>(input[tid + npixels]);
auto b = ConvertNorm<Output>(input[tid + 2 * npixels]);
output[tid] = kernels::rgb_to_y<Output>({r, g, b});
Similar as above.
compare_pipelines(img_decoder_pipe("cpu", out_type=img_out_type, files=files),
                  img_decoder_pipe("mixed", out_type=img_out_type, files=files),
                  batch_size=batch_size_test, N_iterations=3,
                  eps = eps)
Suggested change:
-                  eps = eps)
+                  eps=eps)
("jpeg2k", "db/single/multichannel/with_alpha", 'jp2'),
("jpeg2k", "db/single/16bit", 'jp2'),
("png", "db/single/multichannel/with_alpha", 'png')]:
subdir = ''  # In those paths the images are not organized in subdirs
How about subdir = None, and adding a way to handle it?
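A sketch of the subdir = None idea (hypothetical helper, with "/" joins used for brevity instead of os.path.join): None would mean "no subdirectory level at all", avoiding the magic empty string.

```python
def make_path(root, subdir, fname):
    """Build an image path; subdir=None means files sit directly under root."""
    parts = [root, fname] if subdir is None else [root, subdir, fname]
    return "/".join(parts)
```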
template <typename T>
__inline__ __device__ T rgb_to_cb(vec<3, T> rgb) {
  return ConvertSat<T>(-0.16873589f * rgb.x - 0.33126411f * rgb.y + 0.50000000f * rgb.z + 128.0f);
I don't like this + 128.0f here. It works well for uint8 outputs, but makes no sense for signed T, and its applicability is dubious with float output, too. Perhaps we should add the half-range for unsigned T and not add anything otherwise? It would affect JPEG distortion. We need to discuss this.
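One possible rule, sketched as a hypothetical plain-Python helper (the bit-width/signedness flags stand in for the C++ type traits that would actually be used): add half of the unsigned range for unsigned integer outputs, and no offset for signed or floating-point ones.

```python
def chroma_offset(bits, signed, is_float):
    """Offset to add to Cb/Cr: half the unsigned range for unsigned integer
    outputs (128 for 8-bit, 32768 for 16-bit), nothing for signed or float."""
    if is_float or signed:
        return 0.0
    return float(1 << (bits - 1))
```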
Adds a check inside the decoder that the requested image format matches the underlying image. The check is added only for JPEG 2000, as in the case of JPEG only RGB images are supported; for the host fallback, DALI always converts to the expected format if requested. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
…g. RGBA) to RGB. Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [2282818]: BUILD STARTED
CI MESSAGE: [2282818]: BUILD PASSED
"component 0 has a shape of {", comp.component_height, ", ",
comp.component_width, "} and component ", c, " has a shape of {",
height, ", ", width, "}"));
data.bpp = std::max<int>(data.bpp, comp.precision);
Hmm... precision is defined per component, whereas here you take the max precision. I wonder what would happen in the case of an image with a different precision for different components... is that even possible?
I would put an enforce for that.
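Such an enforce could look like this hypothetical standalone check (plain Python for illustration; the real code would likely use DALI_ENFORCE on the parsed component info):

```python
def check_uniform_precision(precisions):
    """Raise if the components declare different bit depths; otherwise
    return the common precision."""
    if len(set(precisions)) != 1:
        raise ValueError(f"Components have mixed precisions: {precisions}")
    return precisions[0]
```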
void clear() {
  encoded_length = 0;
  bpp = 8;
  selected_decoder = nullptr;
  is_progressive = false;
  req_nchannels = -1;
  method = DecodeMethod::Host;
  subsampling = NVJPEG_CSS_UNKNOWN;
}
It's beyond the scope of this PR, but this method (and the fact that you can't just assign {} to reset it) is a manifestation of an underlying problem: this object serves two purposes. It is both SampleData and SampleDecoderContext, or whatever the name should be. I don't think they should be aggregated this way. A valid solution would be to rename it to SampleContext and create a subaggregate (SampleData) containing the fields that are reset here (plus filename, which should probably (?) be reset here, too).
I agree with this. I'll add a TODO here for now. It deserves a separate PR.
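The proposed split could look roughly like this (a Python dataclass sketch of the idea only; the actual code is a C++ struct, and the names follow the suggestion above):

```python
from dataclasses import dataclass, field

@dataclass
class SampleData:
    """Per-sample fields that must be reset between iterations."""
    encoded_length: int = 0
    bpp: int = 8
    is_progressive: bool = False
    req_nchannels: int = -1

@dataclass
class SampleContext:
    """Longer-lived decoder state; only the `data` subaggregate is reset."""
    data: SampleData = field(default_factory=SampleData)

    def clear(self):
        # Plain reassignment replaces the field-by-field reset.
        self.data = SampleData()
```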
@@ -182,7 +184,7 @@ def __init__(self, device, batch_size, num_threads=1, device_id=0):
super(MultichannelPipeline, self).__init__(batch_size, num_threads, device_id)
self.device = device

-        self.reader = ops.readers.File(file_root=multichannel_tiff_root)
+        self.reader = ops.readers.File(files = multichannel_tiff_files)
Nitpick: Shouldn't named parameters go without spaces around assignment?
Only nitpicks/comments, nothing really stopping it.
namespace dali {
namespace kernels {
namespace jpeg {

// TODO(janton): Purposely not using the color space conversion utils here.
// We need better color space conversion utilities, and we risk performance
// using the conversion functions in color_space_conversion_impl.h
What are they for in that case?
They are used for the image decoder color conversion. Those utils can convert between types, so to allow going from a narrower to a wider type and vice versa we have to normalize to float first. This introduces extra multiplications that don't really make sense for the JPEG distortion kernel, since we are working with uint8. Making generic, highly optimized color conversion utils is beyond the scope of this PR and should probably be tackled separately.
Can you write this in the comment too?
I went ahead and unified the color space conversion utils (we had 3 different ones).
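The normalize-to-float chain described above can be illustrated with a tiny hypothetical helper (not the actual ConvertNorm implementation): the value is mapped to [0, 1] in float and then rescaled to the output range, which makes narrower-to-wider (and reverse) conversions uniform but adds the extra multiplications a uint8-only kernel doesn't need.

```python
def convert_norm(value, in_max, out_max):
    """Normalize to [0, 1] in float, then scale to the output range."""
    return round(value / in_max * out_max)

# e.g. uint8 -> uint16: full range maps to full range
# convert_norm(255, 255, 65535) == 65535
```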
DALIDataType pixel_type = sample->bpp == 16 ? DALI_UINT16 : DALI_UINT8;
output_image.pixel_type = sample->bpp == 16 ? NVJPEG2K_UINT16 : NVJPEG2K_UINT8;
Suggested change:
-DALIDataType pixel_type = sample->bpp == 16 ? DALI_UINT16 : DALI_UINT8;
-output_image.pixel_type = sample->bpp == 16 ? NVJPEG2K_UINT16 : NVJPEG2K_UINT8;
+DALIDataType pixel_type;
+if (sample->bpp == 16) {
+  pixel_type = DALI_UINT16;
+  output_image.pixel_type = NVJPEG2K_UINT16;
+} else {
+  pixel_type = DALI_UINT8;
+  output_image.pixel_type = NVJPEG2K_UINT8;
+}
just a suggestion
PermuteToInterleaved(output_data, i_out, comp_size, sample->shape[2], nvjpeg2k_cu_stream_);
if (need_processing) {
  if (output_image_type_ == DALI_GRAY) {
    // Converting to Gray, dropping extra channels if needed
Suggested change:
-    // Converting to Gray, dropping extra channels if needed
+    // Converting to Gray, dropping alpha channels if needed
?
pipe0 = pipe(device, out_type=out_type, files=files)
pipe0.build()
out0, shape0 = pipe0.run()
if device == 'mixed':
    out0 = out0.as_cpu()
out0 = np.array(out0[0])
shape0 = np.array(shape0[0])
expected_channels = 4 if out_type == types.ANY_DATA else \
    1 if out_type == types.GRAY else \
    3
assert out0.shape[2] == expected_channels, \
    f"Expected {expected_channels} but got {out0.shape[2]}"
Suggested change (rename pipe0/out0/shape0 to pipe/out/shape):
pipe = pipe(device, out_type=out_type, files=files)
pipe.build()
out, shape = pipe.run()
if device == 'mixed':
    out = out.as_cpu()
out = np.array(out[0])
shape = np.array(shape[0])
expected_channels = 4 if out_type == types.ANY_DATA else \
    1 if out_type == types.GRAY else \
    3
assert out.shape[2] == expected_channels, \
    f"Expected {expected_channels} but got {out.shape[2]}"
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [2286089]: BUILD STARTED
CI MESSAGE: [2286089]: BUILD FAILED
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build
CI MESSAGE: [2286307]: BUILD STARTED
CI MESSAGE: [2286307]: BUILD PASSED
Why do we need this PR?
What happened in this PR?
- Unified behavior in `mixed` and `cpu` versions of the ImageDecoder operator
- Added support for alpha channel in PNG and JP2 files (TIFF already supported)
- Added support for YCbCr and BGR in JP2
- Added consistency tests
Image Decoder
Changes in the decoders
New tests added
NA
JIRA TASK: [DALI-1948]