Support for Run-Length Encodings and Pixelwise Masks in COCO Reader #2248
Conversation
Force-pushed from 5b1b3a6 to 64afc53
Hi,
Hi @epaminondas, I think the easiest way would be for us (DALI) to create some samples and add them to DALI_extra so they can be used in unit tests for this PR. Since you mentioned that you have your own datasets, can you share with us how to create valid data samples and what tools we can use? Otherwise, we would be happy to accept some examples into DALI_extra, either samples from your datasets, if that is possible, or newly created ones. That requires signing the CLA, and the contributor must have the rights to the work they are adding. The images we used in that repository mostly came from pixabay and were licensed under CC0 with no attribution required. I'm open to any of these options, whichever suits you better. We will start the review; I already have one minor comment: our CI runs a linter, so please check its results.
Hi there. That said, I can certainly annotate a bunch of CC0 images from pixabay. I'll prepare a minimal dataset as soon as possible, and I'll have a look at your linter results too.
@epaminondas - I think you can use the images we already have in the repo; just create a separate COCO annotation file or update the existing one. Whatever works best for you.
OK, I'll do that!
To sum up:
- Please check that rleMerge is used as intended for masks comprising more than one polygon.
- Please wrap the allocations in either std::vector or unique_ptr, whichever suits the case better (see the sketch below).

The merging of the RLEs into one encoding looks OK to me, but I would be happy to see a test for it, if possible.
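For illustration, wrapping the cocoapi allocations could look roughly like this. This is a minimal sketch assuming cocoapi's maskApi.h API (RLE, rleInit, rleFree, rleMerge); RleHandle and MakeRle are illustrative names, not code from the PR:

#include <memory>
#include <vector>

extern "C" {
#include "maskApi.h"  // cocoapi (third_party/cocoapi/common): RLE, rleInit, rleFree, rleMerge
}

// Ties the lifetime of a heap-allocated RLE (and its internal counts buffer)
// to a unique_ptr, so rleFree runs even on early returns or exceptions.
struct RleDeleter {
  void operator()(RLE *r) const {
    if (r) {
      rleFree(r);  // frees r->cnts allocated by cocoapi
      delete r;
    }
  }
};
using RleHandle = std::unique_ptr<RLE, RleDeleter>;

RleHandle MakeRle(siz h, siz w, siz m, uint *cnts) {
  RleHandle rle(new RLE());
  rleInit(rle.get(), h, w, m, cnts);  // copies cnts into cocoapi-owned storage
  return rle;
}

// For a mask made of several polygons, the per-polygon RLEs can be kept in a
// std::vector<RLE> and merged as a union (intersect = 0), e.g.:
//   std::vector<RLE> parts = ...;  // one RLE per polygon of the same mask
//   RLE merged;
//   rleMerge(parts.data(), &merged, parts.size(), 0);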
@@ -104,8 +114,16 @@ class COCOReader : public DataReader<CPUBackend, ImageLabelWrapper> {
        coords.size() * sizeof(float));
    }

    if (pixelwise_masks_) {
      auto &masks_output = ws.Output<CPUBackend>(3);
      masks_output.Resize({1, widths_[image_id], heights_[image_id]});
This outputs the masks as a CWH image. Is that intended? Most DALI operators assume that data is HWC by default (and some can handle channel-first layouts).
You can also use Tensor::SetLayout() (https://github.com/NVIDIA/DALI/blob/master/dali/pipeline/data/tensor.h#L519)
to indicate the layout, for example for the current "CWH":
masks_output.SetLayout("CWH");
Well the thing is, run-length encoding implies a width-first scan, so it seemed natural to output CWH or WHC directly.
Of course I could just transpose afterwards, but in my pipeline I just use an Operator for that during data augmentation, and that seems to do the trick.
I'll add the SetLayout.
If RLEs are scanned first horizontally, then vertically, then the storage order should be CHW or HWC. The layouts and shapes have the outer dimension first.
masks_output.Resize({1, widths_[image_id], heights_[image_id]});   // current
masks_output.Resize({heights_[image_id], widths_[image_id], 1});   // suggested
Please double check it with a non-square mask.
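Putting the suggested shape together with the layout hint from earlier, a sketch (not the PR's final code) could look like:

auto &masks_output = ws.Output<CPUBackend>(3);
masks_output.Resize({heights_[image_id], widths_[image_id], 1});  // outermost dimension first: H, W, C
masks_output.SetLayout("HWC");  // matches the row-major image layout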
Yeah, sorry, my mistake.
pycocotools actually uses column-major order: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/mask.py#L51
so I meant a height-first scan.
Typically I use
https://numpy.org/doc/stable/reference/generated/numpy.asfortranarray.html
when creating my annotations for that very reason.
So, what's the plan? I don't think it's good to produce a mask that has a layout incompatible with the image (which is row-major) unless an overwhelming majority of the models expect the mask in that order. Otherwise, we should go with row-major or perhaps configurable layout (swapping the axes is relatively straightforward - we just need to swap w, h and the coordinates of the points passed to rleFrPoly).
I don't think it's good to produce a mask that has a layout incompatible with the image (which is row-major) unless an overwhelming majority of the models expect the mask in that order.
Can't argue with that, I'd like to output HWC. I'm not sure what the best way is, though, aside from the obvious transpose and a swap buffer after the final decoding.
we just need to swap w, h and the coordinates of the points passed to rleFrPoly
The problem is that some annotations are already in column-major RLE (the ones we pass through rleFrString).
Of course since I decode manually at the end of the function I could just write the row-major tensor column-by-column, but that's probably going to be a very inefficient memory access scheme, and the decoding is very clearly the bottleneck.
I can't figure out a way to convert from column-major RLE to row-major RLE.
What do you suggest, @mzient?
- Are we 100% sure that there are situations where the masks are indeed column-major and the image is not?
- Do we need a transposed RLE or just a transposed mask? If it's the latter, then it's quite easy to render the mask in the other direction.
- Yeah, as soon as you use pycocotools, you'll get column-major masks with RLE. And since it's the "official" tool...
- Yup, got it!
      rleMerge(&frString[ann_cnt++], &R[label], 1, 0);
    }

    // Merge all the labels into a pair of vectors:
If that isn't too much of a hassle, could you extract the part that prepares the single encoding from several RLEs for distinct labels and unit-test it on a few simple cases?
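One way such a test could look, exercised directly against cocoapi. A sketch under assumptions: the extracted helper does not exist yet, so this only tests the underlying rleMerge union of two hand-written column-major RLEs; the test name is made up, and the counts follow cocoapi's alternating runs-of-0s/1s format for 2x2 masks:

#include <gtest/gtest.h>
extern "C" {
#include "maskApi.h"  // cocoapi: RLE, rleInit, rleMerge, rleDecode, rleFree
}

TEST(CocoPixelwiseMasks, MergeTwoSimpleRles) {
  uint cnts_a[] = {0, 1, 3};  // only pixel (row 0, col 0) set
  uint cnts_b[] = {3, 1};     // only pixel (row 1, col 1) set

  RLE rles[2];
  rleInit(&rles[0], 2, 2, 3, cnts_a);
  rleInit(&rles[1], 2, 2, 2, cnts_b);

  RLE merged;
  rleMerge(rles, &merged, 2, 0);  // intersect = 0 -> union of the two masks

  byte mask[4] = {0, 0, 0, 0};    // column-major: (0,0), (1,0), (0,1), (1,1)
  rleDecode(&merged, mask, 1);
  EXPECT_EQ(mask[0], 1);
  EXPECT_EQ(mask[1], 0);
  EXPECT_EQ(mask[2], 0);
  EXPECT_EQ(mask[3], 1);

  rleFree(&rles[0]);
  rleFree(&rles[1]);
  rleFree(&merged);
}

Adapting this to the per-label (counts, values) encoding the reader builds would be the next step once that code is factored out into a testable helper.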
I'll do the unit test on the pixabay data first, since I already prepared it, but I'll see to that afterwards.
How do we move forward on this one? I still don't know the best way to output HWC. It seems obvious that I should, since that's clearly what the algorithms expect in order to compute, for example, a cross-entropy loss. But since the spec mandates annotations encoded in column-major order, I'll have to transpose the tensor at some point. What do you think?
    memset(mask, 0, h * w * sizeof(int));
    for (uint i = 0; i < A.m; i++)
      for (uint j = 0; j < A.cnts[i]; j++)
        *mask++ = A.vals[i];
If the mask is indeed column-major, you can rewrite it like this:
    memset(mask, 0, h * w * sizeof(int));
    uint x = 0, y = 0;
    for (uint i = 0; i < A.m; i++)
      for (uint j = 0; j < A.cnts[i]; j++) {
        mask[x + y * w] = A.vals[i];
        if (++y >= h) {
          y = 0;
          x++;
        }
      }
OK, I'll do that!
I was a bit worried about memory access patterns in this loop, which dominates the rest of the algorithm, but I guess that's premature optimization on my side.
OK, the format is now HWC and I added a test case. I have a commit ready for DALI_extra: it creates a new (minimal) dataset with 6 images I borrowed from db/coco/images, along with their masks. Do I still need to sign something in that case (I suppose not)? How do I reference the version of DALI_extra needed, since the test case I added will obviously depend on it?
If you have created a new set of masks on your own, you need to. Just follow this guide: https://github.com/NVIDIA/DALI_extra/blob/master/NVIDIA_CLA_v1.0.1.docx (print, sign, scan, mail), and then issue a PR there. When it is merged, bump up the SHA inside DALI_EXTRA_VERSION.
Please update DALI_EXTRA_VERSION with the corresponding commit from DALI_extra.
We use this file to make sure the code and the test data are consistent. Thanks!
@epaminondas I think you need to rebase, as DALI_EXTRA_VERSION was updated in parallel in a different commit.
Force-pushed from beb57e4 to 52fc73f
!build
CI MESSAGE: [1644615]: BUILD STARTED
CI MESSAGE: [1644615]: BUILD FAILED
@epaminondas
Could you fix them and run
Oops, misclick.
On it!
!build
CI MESSAGE: [1644724]: BUILD STARTED
CI MESSAGE: [1644724]: BUILD FAILED
cmake/Dependencies.common.cmake
set(GENERATE_POSITION_INDEPENDENT_CODE ON CACHE BOOL "-fPIC")
set(ENABLE_SHARED OFF CACHE BOOL "shared library target")
set(ENABLE_STATIC ON CACHE BOOL "static library target")

project(cocoapi)
I'm not sure you need to create a project and set these variables.
project(cocoapi)
set(SOURCE_FILES third_party/cocoapi/common/maskApi.c)
add_library(cocoapi STATIC ${SOURCE_FILES})
I would add:
target_link_libraries(cocoapi PRIVATE "-pie")
set_target_properties(cocoapi PROPERTIES POSITION_INDEPENDENT_CODE ON)
OK, I can try that. It obviously builds on my machine, though, so I don't know how to test it.

It does not build anymore on my machine if I add the -pie option; it makes GCC think I'm building an executable, not a library. Did you mean -fpic?
You are right, I confused it with the part where we build the tests.
I see that we set PIC project-wide in https://github.com/NVIDIA/DALI/blob/master/CMakeLists.txt#L140. So maybe if you remove this project(cocoapi) it will just work?
I'm able to reproduce it on my machine; I'll take a look and come back later.
It looks like it's enough to add
set_target_properties(cocoapi PROPERTIES POSITION_INDEPENDENT_CODE ON)
to make it (mostly) compile. I also needed to adjust the test to the changes in the CocoLoader constructor, changing dali/operators/reader/loader/loader_test.cc:118 to:
std::vector<int> heights;
std::vector<int> widths;
std::vector<int> offsets;
std::vector<float> boxes;
std::vector<int> labels;
std::vector<int> counts;
std::vector<std::vector<int> > masks_meta;
std::vector<std::vector<float> > masks_coords;
std::vector<std::vector<std::string> > masks_rles;
std::vector<std::vector<int> > masks_rles_idx;
std::vector<int> original_ids;
std::string file_root = testing::dali_extra_path() + "/db/coco/images";
std::string annotations_file = testing::dali_extra_path() + "/db/coco/instances.json";
auto coco_spec = OpSpec("COCOReader")
.AddArg("file_root", file_root)
.AddArg("annotations_file", annotations_file)
.AddArg("batch_size", 32)
.AddArg("device_id", 0)
.AddArg("dont_use_mmap", dont_use_mmap);
shared_ptr<dali::CocoLoader> reader(
new CocoLoader(
coco_spec, heights, widths, offsets, boxes, labels, counts, masks_meta,
masks_coords, masks_rles, masks_rles_idx, false, false, false, original_ids));
I was able to build and run the test successfully. Can you add that or a similar change there?
Sure!
Can you please add the contents of https://github.com/cocodataset/cocoapi/blob/master/license.txt to Acknowledgements.txt?
Other than the Acknowledgements.txt update and the missing parameter to the constructor, it looks OK.
!build
CI MESSAGE: [1646078]: BUILD STARTED
CI MESSAGE: [1646078]: BUILD FAILED
We have a problem with #2273 and its related test data, which hit an issue in our CI; that's why this build is also failing. We hope to resolve it soon.
Force-pushed from 285e220 to 0df3cbc
!build
@epaminondas I took the liberty of rebasing your change onto the latest master. The problems with #2273 are resolved, the build should start in a moment, and I hope we will be able to merge today. Edit: merged :)
CI MESSAGE: [1651278]: BUILD STARTED
CI MESSAGE: [1651278]: BUILD PASSED
@epaminondas - you can test this change in tomorrow's DALI nightly build.
All right, thanks a lot for all the work!
Why do we need this PR?
Run-length encoded (RLE) annotations are used instead of polygons for semantic segmentation-style tasks (I believe the COCO people call them 'stuff segmentation').
RLE is also sometimes used in instance segmentation, tagged as "is_crowd", for things like grapes in a bowl, when you don't want a separate instance for each of them.
What happened in this PR?
The strings representing the RLEs in the JSON are parsed and stored in memory, just like the polygons.
At runtime, with the corresponding option enabled, the polygons are rendered to RLEs, the strings are converted to RLEs, all RLEs are merged, and finally the result is decoded as a full-resolution mask.
An option "pixelwise_masks" was added to the COCO Reader; it is incompatible with the "masks" option.
All the non-trivial code is located inside void COCOReader::PixelwiseMasks(int image_id, int* mask).
I used CocoAPI to convert the RLE strings to runtime objects and to render the polygons to RLEs.
This introduces a dependency on https://github.com/cocodataset/cocoapi (known as pycocotools in the Python world). Its license is BSD.
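For orientation, the per-image flow described above could be sketched roughly like this. Illustrative names only, not the PR's exact code: unlike the real PixelwiseMasks, this collapses all annotations into a single binary mask and leaves the output in cocoapi's column-major order; it is only meant to show which CocoAPI calls are involved.

#include <string>
#include <vector>

extern "C" {
#include "maskApi.h"  // cocoapi: RLE, rleFrPoly, rleFrString, rleMerge, rleDecode, rleFree
}

void RenderPixelwiseMask(int h, int w,
                         const std::vector<std::vector<double>> &polygons,
                         const std::vector<std::string> &rle_strings,
                         unsigned char *out_mask /* h * w bytes */) {
  std::vector<RLE> rles;

  // 1. Rasterize each polygon annotation to an RLE.
  for (const auto &poly : polygons) {
    RLE r;
    rleFrPoly(&r, poly.data(), poly.size() / 2, h, w);  // k = number of (x, y) points
    rles.push_back(r);
  }

  // 2. Convert compressed RLE strings (as stored in the JSON) to RLEs.
  for (const auto &s : rle_strings) {
    RLE r;
    rleFrString(&r, const_cast<char *>(s.c_str()), h, w);
    rles.push_back(r);
  }
  if (rles.empty()) return;

  // 3. Merge everything into one encoding (union, intersect = 0) and decode it.
  RLE merged;
  rleMerge(rles.data(), &merged, rles.size(), 0);
  rleDecode(&merged, out_mask, 1);  // column-major output, h * w bytes

  for (auto &r : rles) rleFree(&r);
  rleFree(&merged);
}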
I suppose you require some unit testing. I tested the code on my own datasets, but I'll add unit tests if you give me directions.
I filled in the relevant strings in the Spec arguments, but I can add more details if needed.