Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Imgcodec module boilerplate (interfaces/placeholders/basic logic) #4029

Merged
merged 9 commits into from
Jul 5, 2022

Conversation

jantonguirao
Copy link
Contributor

@jantonguirao jantonguirao commented Jun 30, 2022

Signed-off-by: Joaquin Anton janton@nvidia.com
Co-authored-by: Michał Zientkiewicz mzient@gmail.com

Category:

New feature

Description:

  • It starts a new library libdali_imgcodec, to be used to hold all the image decoding infrastructure, that we will use later to re-implement our image decoder operators.
  • The new architecture is meant to be easy to extend, and to take care of the differences between our image decoding implementations (ROI support, type conversion, CPU/GPU, etc)
  • This is an initial code drop, which will be followed up with an actual implementation

Additional information:

Affected modules and functionalities:

New module

Key points relevant for the review:

Interfaces, overall design

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2734

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [5221126]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [5221126]: BUILD FAILED

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [5221299]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [5221299]: BUILD PASSED

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@@ -0,0 +1,42 @@
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste; nothing to review here.

@@ -0,0 +1,98 @@
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important file

@@ -0,0 +1,153 @@
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important file

@@ -0,0 +1,94 @@
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important file

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Comment on lines +16 to +17
"dali_operator_test.bin" \
"dali_imgcodec_test.bin"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was not added to Xavier tests? Was it intentional for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that one by accident. Good catch

std::vector<ImageCodec*> codec_ptrs_;
};

class DLL_PUBLIC ImageFormatRegistry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what I understand this is sort of an "entry point" to this API. Can you share a bit on how this will be used by the operators? If there are any adapters or additional layers of abstraction necessary to use it in the operator maybe missing functionality should be included here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not exactly the entry point - there must be something on a higher level, but we're still considering whether that something should be a part of this library or left to the operator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant entry point to the part that is in this PR.

I am asking about this because I am a bit afraid that this design is incomplete with regards on how it will be integrated with DALI. This will probably cause some, maybe significant changes, when this integration eventually comes.

I think it would be good to have that figured out before we spend some significant effort into implementing it for particular formats.

} orientation;
};

class ImageParser {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this interface is supposed to be implemented by others I think it requires documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
};

class DLL_PUBLIC ImageCodec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed I think using the word "codec" in this context might be misleading. As far as I understand this is something that can give you an object that can be used to decode images encoded with some particular method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rename to Decoder

* @param image
* @return ImageFormat*
*/
ImageFormat* GetImageFormat(ImageSource *image) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, we get ImageSource from the reader, then we look here for matching ImageFormat. We do so by using CanParse method of a parser associated with that ImageFormat. Is this correct?
And this is done, because ImageSource is basically raw data with some small additions.

I think that more often than not, we know the format of a file based on its extension or some other external information (like DL dataset description). Having this, there is no need to dynamically check, which parser matches the ImageSource and by extension what is the format.
I get that in general, maybe we do not now the format in the first place and we need this but I think that based on our use cases this is rather an exceptional situation. Is this assumption correct?

Are there any plans to include something like that? Or maybe parsers can depend on ImageSource.SourceInfo for some additional context.

I am asking because I am a bit afraid that it will not be easy to write those CanParse functions and this method of finding a format relies on them. Especially with many formats and matching format being at the end of the list, we might see some performance penalties, if the CanParse methods are slow.

Copy link
Contributor

@mzient mzient Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be basically Cut/Paste (with slight modifications) from functions like CheckIsJPEG, CheckIsPNG, etc which we already have. The biggest difference is that the new functions will work with ImageSource rather than raw memory.

Especially with many formats and matching format being at the end of the list, we might see some performance penalties, if the CanParse methods are slow.

We already do that - and sometimes more than once, as is the case of peek_image_shape.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the question is, do we want to replicate this approach or not?

Since this whole thing is being redone from scratch this is the time to question assumptions made before, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding CanParse. You need to check if the header of the file is the one you are looking for (reading the first few bytes). I don't think there's much point in "knowing" what the file format is, because you can have a cat.jpg that is in fact a TIFF (we do have those in ImageNet). It's not until you read the header of the file that you know the format of the file.

I get that your concern is that most of the time, when a file is called "cat.jpg" is in fact a JPEG, and we could use that as a hint to try the JPEG parser first. I am not sure this is justified, when we are basically just checking one or two bytes in the header. It's equivalent to checking few bytes in the filename.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification.

I agree that generally there is not much overhead to check this. Maybe I'm just a bit skewed by the issues from video space where the only 100% solid way to check something is to decode it :)

}
};

class DLL_PUBLIC ImageCodec {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain how these will be created and kept during the pipeline or program lifetime? Is there any registry or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see ImageFormatRegistry. You register formats and you register codecs for those formats. The ImageCodec is used as a factory to create ImageCodecInstance. There will likely be like a codec instance cache on a higher level abstraction that will be used by the ops. This will be part of the next PR.

// Note: We will have a single format registry for DALI 
ImageFormatRegistry registry;
ImageFormat jpeg("jpeg", std::make_shared<JpegParser>());
jpeg.RegisterCodec(std::make_shared<JpegCodec>(), 0.0f);


// Inside a particular OP. We might want to abstract the logic of the cache to a separate entity (next PR)
auto format = registry.GetImageFormat(image_source);

for (auto codec : format.Codecs()) {
   if (...) {  // for example limiting to host mem codecs
     if (codec instance in cache) {
         // use codec instance
     } else {
         codec_instance = codec.Create(...)        
         // cache codec instance
         // use codec instance   
     }
   }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was asking about where ImageCodec comes from.

/**
* @brief A source of data from image parsers and codecs
*/
class DLL_PUBLIC ImageSource {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain a bit how ImageSource will be created? By the readers? Or output from the readers will be wrapped in the decoder?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least initially the decoder will wrap the memory into an ImageSource by calling FromHostMem. In case of fused reader-decoders, we will likely employ other methods (most likely FromFilename).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are created by those From* functions.

A decoder op will do FromHostMem(tensor_data, tensor_vol, tensor_source_info)

A reader op will do FromFilename(image_filename)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean regular reader will do FromFilename?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a future fused reader that produces decoded images, as we are planning to have (for partial reading purposes)

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
* @param image
* @return ImageFormat*
*/
ImageFormat* GetImageFormat(ImageSource *image) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ImageFormat* GetImageFormat(ImageSource *image) const;
const ImageFormat* GetImageFormat(ImageSource *image) const;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,153 @@
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should filename be changed to image_decoder.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* @brief Creates an instance of a codec
*
* Note: For codecs that carry no state, this may just increase reference count on a singleton.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess with the naming ImageDecoder instead of ImageCodec, the name codec no longer has any meaning.

Suggested change
* Note: For codecs that carry no state, this may just increase reference count on a singleton.
* Note: For decoders that carry no state, this may just increase reference count on a singleton.

You might want to grep for the word codec in the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* @brief Creates an image source from data in device memory
*/
static ImageSource FromDeviceMem(const void *mem, size_t size, std::string source_info = "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it is not supported now, but out of curiosity: do we want to pass the dev mem without some synchronization context or is it just the ops job and we don't want it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an excellent suggestion. I think we could use AccessOrder here.

Signed-off-by: Joaquin Anton <janton@nvidia.com>
Signed-off-by: Joaquin Anton <janton@nvidia.com>
@@ -0,0 +1,33 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line above license header?

Signed-off-by: Joaquin Anton <janton@nvidia.com>
@jantonguirao
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [5260824]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [5260824]: BUILD PASSED

@jantonguirao jantonguirao merged commit 10961ec into NVIDIA:main Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants