-
Notifications
You must be signed in to change notification settings - Fork 621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix RandomBBoxCrop errors while using crop_shape argument #2605
Conversation
Signed-off-by: Kacper Kluk <kk406184@students.mimuw.edu.pl>
Signed-off-by: Kacper Kluk <kk406184@students.mimuw.edu.pl>
Signed-off-by: Kacper Kluk <kk406184@students.mimuw.edu.pl>
!build |
CI MESSAGE: [1976940]: BUILD STARTED |
CI MESSAGE: [1976940]: BUILD PASSED |
auto &anchor_out = ws.template OutputRef<CPUBackend>(0); | ||
anchor_out.Resize(uniform_list_shape(num_samples, {ndim})); | ||
auto anchor_out_view = view<float>(anchor_out); | ||
DALIDataType output_dtype = has_crop_shape_ ? DALI_INT32 : DALI_FLOAT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is a good idea to change the output type based on presence of one of the arguments.
Maybe I always should be DALI_FLOAT? Or just DALI_INT32 and use cast operator after that operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if user provided crop_shape argument (which represents the exact size in pixels of desired crop region) then output anchors and shapes should also be returned as integers. Returning them as relative float values could potentially result in output shape being off by 1 pixel due to fp arithmetic errors and this behaviour could break the pipeline in some use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about returning always FLOAT in that case (but with switch that is by default on that changes it to int - so we won't break the existing code while introducing new functionality).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requirement for different output type depending on whether crop_shape argument was provided or not is imposed because of the way Slice operator interprets its input arguments (int -> pixel coords, float -> normalized coords). I believe that current fix won't break any existing code - any previous usage of RandomBBoxCrop operator with crop_shape provided required additional cast to int to be declared explicitly in pipeline - current fix will provide this cast unnecessary, although the results will be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other solution I see is to always return int values (in pixel coordinates), even when using scaling instead of crop_shape (possibly with switch to return the output in normalized coords instead). However, this will probably require much more existing code to be rewritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another problem. We cannot return pixel coordinates with scaling provided as we lack image shape data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that returning different types should be OK, given that Slice
operator interprets those differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed internally with the team and came to the conclusion that changing the output type of the operator silently, based on the presence of another argument, is not a good user experience, since it could break some existing code. So, for the sake of backward compatibility we should keep DALI_FLOAT by default.
I can suggest some solutions:
- Use
normalized_anchor=False
,normalized_shape=False
in Slice, so that you don't need to cast to int32 - Cast to int32 (it adds extra overhead, not recommended)
- Add a output_dtype argument to this operator, and only use int32 when it was explicitly requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I've found pixel coordinates returned as a DALI_FLOAT confusing, because of the way Slice
works by default, although I understand your point. Currently I've reverted my changes (leaving only the sigsegv fix). Maybe additional clarification should be added to RandomBBoxCrop
docs?
for d in range(ndim): | ||
assert(anchor[d] >= 0.0 and anchor[d] <= input_shape[d]) | ||
assert shape[d] == expected_crop_shape[d], "{} != {}".format(shape, expected_crop_shape) | ||
assert(anchor[d] + shape[d] > 0.0 and anchor[d] + shape[d] <= input_shape[d]) | ||
assert(anchor[d] + shape[d] > 0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why removing the second check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bottom right corner can be outside original image boundaries if crop_shape > input_shape.
anchor_out.Resize(uniform_list_shape(num_samples, {ndim})); | ||
auto anchor_out_view = view<float>(anchor_out); | ||
DALIDataType output_dtype = has_crop_shape_ ? DALI_INT32 : DALI_FLOAT; | ||
TYPE_SWITCH(output_dtype, type2id, type, (int, float), ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TYPE_SWITCH(output_dtype, type2id, type, (int, float), ( | |
TYPE_SWITCH(output_dtype, type2id, T, (int, float), ( |
would be more in line with other parts of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
auto &anchor_out = ws.template OutputRef<CPUBackend>(0); | ||
anchor_out.Resize(uniform_list_shape(num_samples, {ndim})); | ||
auto anchor_out_view = view<float>(anchor_out); | ||
DALIDataType output_dtype = has_crop_shape_ ? DALI_INT32 : DALI_FLOAT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that returning different types should be OK, given that Slice
operator interprets those differently.
Signed-off-by: Kacper Kluk <kk406184@students.mimuw.edu.pl>
Signed-off-by: Kacper Kluk <kk406184@students.mimuw.edu.pl>
!build |
CI MESSAGE: [1993040]: BUILD STARTED |
CI MESSAGE: [1993040]: BUILD PASSED |
Signed-off-by: Kacper Kluk <kk406184@students.mimuw.edu.pl>
!build |
CI MESSAGE: [1997445]: BUILD STARTED |
CI MESSAGE: [1997445]: BUILD PASSED |
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
Anchor and shape outputs of RandomBBoxCrop call with provided crop_shape/input_shape are cast to int before being returned, which prevents pixel dimensions from being interpreted as relative by other operators (eg. Slice). Additionally, anchor selection is handled differently when crop_shape > input_shape, which prevents segfaults previously caused by using std::uniform_int_distribution with incorrect constructor params.
Changes to dali/operators/image/crop/bbox_crop.cc
NA
NA
NA
JIRA TASK: [Use DALI-XXXX or NA]