-
Notifications
You must be signed in to change notification settings - Fork 607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Imgcodec module boilerplate (interfaces/placeholders/basic logic) #4029
Conversation
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build |
CI MESSAGE: [5221126]: BUILD STARTED |
CI MESSAGE: [5221126]: BUILD FAILED |
Signed-off-by: Joaquin Anton <janton@nvidia.com>
!build |
CI MESSAGE: [5221299]: BUILD STARTED |
CI MESSAGE: [5221299]: BUILD PASSED |
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@@ -0,0 +1,42 @@ | |||
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste; nothing to review here.
@@ -0,0 +1,98 @@ | |||
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an important file
include/dali/imgcodec/image_codec.h
Outdated
@@ -0,0 +1,153 @@ | |||
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an important file
@@ -0,0 +1,94 @@ | |||
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an important file
Signed-off-by: Joaquin Anton <janton@nvidia.com>
"dali_operator_test.bin" \ | ||
"dali_imgcodec_test.bin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was not added to Xavier tests? Was it intentional for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed that one by accident. Good catch
std::vector<ImageCodec*> codec_ptrs_; | ||
}; | ||
|
||
class DLL_PUBLIC ImageFormatRegistry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on what I understand this is sort of an "entry point" to this API. Can you share a bit on how this will be used by the operators? If there are any adapters or additional layers of abstraction necessary to use it in the operator maybe missing functionality should be included here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not exactly the entry point - there must be something on a higher level, but we're still considering whether that something should be a part of this library or left to the operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant entry point to the part that is in this PR.
I am asking about this because I am a bit afraid that this design is incomplete with regards on how it will be integrated with DALI. This will probably cause some, maybe significant changes, when this integration eventually comes.
I think it would be good to have that figured out before we spend some significant effort into implementing it for particular formats.
} orientation; | ||
}; | ||
|
||
class ImageParser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this interface is supposed to be implemented by others I think it requires documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
include/dali/imgcodec/image_codec.h
Outdated
} | ||
}; | ||
|
||
class DLL_PUBLIC ImageCodec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed I think using the word "codec" in this context might be misleading. As far as I understand this is something that can give you an object that can be used to decode images encoded with some particular method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rename to Decoder
include/dali/imgcodec/image_format.h
Outdated
* @param image | ||
* @return ImageFormat* | ||
*/ | ||
ImageFormat* GetImageFormat(ImageSource *image) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, we get ImageSource
from the reader, then we look here for matching ImageFormat
. We do so by using CanParse
method of a parser associated with that ImageFormat
. Is this correct?
And this is done, because ImageSource
is basically raw data with some small additions.
I think that more often than not, we know the format of a file based on its extension or some other external information (like DL dataset description). Having this, there is no need to dynamically check, which parser matches the ImageSource
and by extension what is the format.
I get that in general, maybe we do not now the format in the first place and we need this but I think that based on our use cases this is rather an exceptional situation. Is this assumption correct?
Are there any plans to include something like that? Or maybe parsers can depend on ImageSource.SourceInfo
for some additional context.
I am asking because I am a bit afraid that it will not be easy to write those CanParse
functions and this method of finding a format relies on them. Especially with many formats and matching format being at the end of the list, we might see some performance penalties, if the CanParse
methods are slow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be basically Cut/Paste (with slight modifications) from functions like CheckIsJPEG
, CheckIsPNG
, etc which we already have. The biggest difference is that the new functions will work with ImageSource rather than raw memory.
Especially with many formats and matching format being at the end of the list, we might see some performance penalties, if the CanParse methods are slow.
We already do that - and sometimes more than once, as is the case of peek_image_shape
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so the question is, do we want to replicate this approach or not?
Since this whole thing is being redone from scratch this is the time to question assumptions made before, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding CanParse. You need to check if the header of the file is the one you are looking for (reading the first few bytes). I don't think there's much point in "knowing" what the file format is, because you can have a cat.jpg
that is in fact a TIFF (we do have those in ImageNet). It's not until you read the header of the file that you know the format of the file.
I get that your concern is that most of the time, when a file is called "cat.jpg" is in fact a JPEG, and we could use that as a hint to try the JPEG parser first. I am not sure this is justified, when we are basically just checking one or two bytes in the header. It's equivalent to checking few bytes in the filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification.
I agree that generally there is not much overhead to check this. Maybe I'm just a bit skewed by the issues from video space where the only 100% solid way to check something is to decode it :)
include/dali/imgcodec/image_codec.h
Outdated
} | ||
}; | ||
|
||
class DLL_PUBLIC ImageCodec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain how these will be created and kept during the pipeline or program lifetime? Is there any registry or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see ImageFormatRegistry. You register formats and you register codecs for those formats. The ImageCodec
is used as a factory to create ImageCodecInstance
. There will likely be like a codec instance cache on a higher level abstraction that will be used by the ops. This will be part of the next PR.
// Note: We will have a single format registry for DALI
ImageFormatRegistry registry;
ImageFormat jpeg("jpeg", std::make_shared<JpegParser>());
jpeg.RegisterCodec(std::make_shared<JpegCodec>(), 0.0f);
// Inside a particular OP. We might want to abstract the logic of the cache to a separate entity (next PR)
auto format = registry.GetImageFormat(image_source);
for (auto codec : format.Codecs()) {
if (...) { // for example limiting to host mem codecs
if (codec instance in cache) {
// use codec instance
} else {
codec_instance = codec.Create(...)
// cache codec instance
// use codec instance
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was asking about where ImageCodec
comes from.
/** | ||
* @brief A source of data from image parsers and codecs | ||
*/ | ||
class DLL_PUBLIC ImageSource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain a bit how ImageSource
will be created? By the readers? Or output from the readers will be wrapped in the decoder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least initially the decoder will wrap the memory into an ImageSource
by calling FromHostMem
. In case of fused reader-decoders, we will likely employ other methods (most likely FromFilename
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are created by those From*
functions.
A decoder op will do FromHostMem(tensor_data, tensor_vol, tensor_source_info)
A reader op will do FromFilename(image_filename)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean regular reader will do FromFilename
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant a future fused reader that produces decoded images, as we are planning to have (for partial reading purposes)
Signed-off-by: Joaquin Anton <janton@nvidia.com>
3158a75
to
e550226
Compare
Signed-off-by: Joaquin Anton <janton@nvidia.com>
81f5318
to
7d09862
Compare
include/dali/imgcodec/image_format.h
Outdated
* @param image | ||
* @return ImageFormat* | ||
*/ | ||
ImageFormat* GetImageFormat(ImageSource *image) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImageFormat* GetImageFormat(ImageSource *image) const; | |
const ImageFormat* GetImageFormat(ImageSource *image) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
include/dali/imgcodec/image_codec.h
Outdated
@@ -0,0 +1,153 @@ | |||
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should filename be changed to image_decoder.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
include/dali/imgcodec/image_codec.h
Outdated
/** | ||
* @brief Creates an instance of a codec | ||
* | ||
* Note: For codecs that carry no state, this may just increase reference count on a singleton. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess with the naming ImageDecoder
instead of ImageCodec
, the name codec no longer has any meaning.
* Note: For codecs that carry no state, this may just increase reference count on a singleton. | |
* Note: For decoders that carry no state, this may just increase reference count on a singleton. |
You might want to grep for the word codec in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
include/dali/imgcodec/image_source.h
Outdated
/** | ||
* @brief Creates an image source from data in device memory | ||
*/ | ||
static ImageSource FromDeviceMem(const void *mem, size_t size, std::string source_info = ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get it is not supported now, but out of curiosity: do we want to pass the dev mem without some synchronization context or is it just the ops job and we don't want it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an excellent suggestion. I think we could use AccessOrder here.
Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
24f78dc
to
67f2d10
Compare
dali/imgcodec/parsers/bmp.h
Outdated
@@ -0,0 +1,33 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty line above license header?
!build |
CI MESSAGE: [5260824]: BUILD STARTED |
CI MESSAGE: [5260824]: BUILD PASSED |
Signed-off-by: Joaquin Anton janton@nvidia.com
Co-authored-by: Michał Zientkiewicz mzient@gmail.com
Category:
New feature
Description:
Additional information:
Affected modules and functionalities:
New module
Key points relevant for the review:
Interfaces, overall design
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2734