-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Video resize #2164
Video resize #2164
Conversation
#include "dali/operators/image/crop/random_crop_attr.h" | ||
#include "dali/kernels/imgproc/resample/params.h" | ||
|
||
namespace dali { | ||
|
||
template <typename Backend> | ||
class RandomResizedCrop : public Operator<Backend> | ||
, protected ResizeBase | ||
, protected RandomCropAttr { | ||
, protected ResamplingFilterAttr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to inherit from ResamplingFilterAttr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we can have it via composition. Other operators are already refactored.
DALI_INTERP_LINEAR, true) | ||
.AddOptionalArg("min_filter", "Filter used when scaling down", | ||
DALI_INTERP_LINEAR, true) | ||
.AddOptionalArg<DALIDataType>("dtype", "Output data type; must be same as input type of `float`. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be "or float". Typo.
void ResamplingFilterAttr::PrepareFilterParams( | ||
const OpSpec &spec, const ArgumentWorkspace &ws, int num_samples) { | ||
if (num_samples < 0) | ||
num_samples = spec.GetArgument<int>("batch_size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can slowly stop using "batch_size" at all? Just asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I'll see what I can do. It's just a fallback anyway.
} | ||
|
||
template <> | ||
void Resize<CPUBackend>::RunImpl(SampleWorkspace &ws) { | ||
void Resize<CPUBackend>::RunImpl(HostWorkspace &ws) { | ||
const int thread_idx = ws.thread_idx(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
const auto &input_shape = input.shape(); | ||
auto &attr_out = ws.OutputRef<CPUBackend>(1); | ||
const auto &attr_shape = attr_out.shape(); | ||
assert(attr_shape.num_samples() == input_shape.num_samples() && attr_shape.sample_dim() == 1 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you format in the same way as in L88
@@ -150,3 +150,7 @@ def test_video_resize(batch_size=2): | |||
for vp in video_reader_params: | |||
for rp in resize_params: | |||
yield run_for_params, batch_size, vp, rp | |||
|
|||
if __name__ == "__main__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad idea. It quickly becomes obsolete. If you really must you can try to query this module for all available functions and run all that falls into the default nosetests regexp - https://nose.readthedocs.io/en/latest/usage.html#cmdoption-m
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's here because nosetests launches new processes and defeats debugging attempts (and does other ugly things that have to be disabled with "no-smoke" and "don't-catch-fire" kind of flags).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then query this module for all available functions and run all that falls into the default nosetests regexp - https://nose.readthedocs.io/en/latest/usage.html#cmdoption-m
sounds like a way to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working on it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I'm not sure about this. With how it is implemented now, we lose all of the nose functionality and it's still impractical to debug single test. In my opinion, this should be implemented either with this official API from nose or by your IDE plugin/setup (e.g. plugin for VS Code). And in both cases it does not belong with this PR, I think. I'm not even sure, if things like "how you debug unit tests" should be part of the code base. After all, we don't include other launch/debug configs, IDEs settings etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for official API - but I'd still prefer to have a way to run the tests with fewer levels of indirections.
I think I agree that this solution is not good after all - while it catches all tests, it's not what I really want here. I'll remove it and revert the changes to test_utils.
Regarding the plugin - I think it doesn't launch GDB, but a python debugger (?).
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
- Refactoring - ResizeAttr and ResamplingFilterAttr now used by composition - Bugfixes - Add possibility to launch test without nosetests. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
out_size[d] = in_size[d] * final_scale; | ||
scale[d] = final_scale; | ||
} else if (std::abs(scale[d]) != final_scale) { | ||
scale[d] = std::copysign(final_scale, scale[d]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be moved outside of the if - else
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And do we need this if-else
at all?
!build |
CI MESSAGE: [1523577]: BUILD STARTED |
CI MESSAGE: [1523577]: BUILD FAILED |
dali/operators/image/resize/resize.h
Outdated
const TensorListShape<> &orig_shape) const { | ||
int N = orig_shape.num_samples(); | ||
int D = NumSpatialDims(); | ||
assert(orig_shape.sample_dim() == D); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't there non-spatial dimensions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not tested, apparently :\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test added.
output image is smaller than specified - e.g. 640x480 image with desired output | ||
size of 1920x1080 will actually produce 1920x1440 output. | ||
|
||
This argument is mutually exclusive with ``resize_longer`` and ``resize_shorter``)", "default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this argument is mutually exclusive with resize_longer and resize_shorter. IMO the resize mode should be compatible with those as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
.AddOptionalArg("resize_shorter", R"(The length of the shorter dimension of the resized image. | ||
This option is mutually exclusive with ``resize_longer`` and explicit size arguments | ||
The op will keep the aspect ratio of the original image. | ||
This option is equivalent to specifying the same size for all dimensions and ``mode="not_smaller"``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on why you are assuming mode="not_smaller" here? I understood that resize_shorter is equivalent to resize_x, given that x is the shorter dimension (for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually... it's not. At least it's not implemented this way. resize_shorter=S is implemented as:
resize(size=S, # broadcast to all dims
mode="not_smaller")
So it cannot be combined with any other mode, because it implies the mode.
I see no real-life use case when one would want to resize the shorter/longer edge and keep the others or combine it with other modes.
I assumed that the whole reason for having resize_shorter/resize_longer is to make the image large enough to be able to make square crops with that size. If we extend this to rectangular crops, we end up with rectangular size
and mode "not_smaller".
must be specified together with ``roi_start``. The coordinates follow the tensor shape order | ||
(same as ``size``). The coordinates can be either absolute (in pixels, the default) or | ||
relative (0..1), depending on the value of ``relative_roi`` argument. If a RoI origin is greater | ||
than RoI end, the region is flipped.))", nullptr, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
than RoI end, the region is flipped.))", nullptr, true) | |
than RoI end in any dimension, the resulting image will be flipped (mirrored) in that dimension.))", nullptr, true) |
int sample_idx) const { | ||
in_lo.resize(spatial_ndim_); | ||
in_hi.resize(spatial_ndim_); | ||
static constexpr float min_size = 1e-3f; // minimum size, in pixels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this value. This is not a number of pixels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a number in pixels, but it is a distance in pixels. You're free to have a RoI that is not placed at pixel boundary - indeed, with bicubic or Lanczos resampling, it's going to make sense and even look good.
} | ||
} | ||
if (has_max_size_) { | ||
for (int d = 0; d < spatial_ndim_; d++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be extracted to ApplyMaxSize
or something like that (I see that this code is repeated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the threshold value. It's repeated only twice and not very long. The third application of max size is different...
|
||
for (int d = 0; d < spatial_ndim_; d++) { | ||
DALI_ENFORCE(in_lo[d] != in_hi[d] || requested_size[d] == 0, | ||
"Cannot produce non-empty output from empty input"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print d, in_lo/in_hi, requested_size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the size is 0, lo and hi will always be 0.
bool flip = out_sz < 0; | ||
params.dst_size[d] = std::max(1, round_int(std::fabs(out_sz))); | ||
if (flip) | ||
std::swap(params.src_lo[d], params.src_hi[d]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If lo was > hi, and you swap them here. How does the flipping work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two ways to flip:
- negative output size
- lo > hi
The final result uses the latter, but AdjustOutputSize uses the former (hence the first swap).
|
||
template <typename Backend> | ||
struct ResizeBase<Backend>::Impl { | ||
using input_type = typename Workspace::template input_t<Backend>::element_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using input_type = typename Workspace::template input_t<Backend>::element_type; | |
using InputType = typename Workspace::template input_t<Backend>::element_type; |
Would read better IMO
…d examples. Fix imports in test_operator_resize_seq test. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Add test for save_attrs. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
!build |
CI MESSAGE: [1525664]: BUILD STARTED |
kernels::KernelContext ctx; | ||
|
||
for (int i = 0; i < GetNumFrames(); i++) { | ||
kernels::InTensorCPU<In, frame_ndim> dummy_input; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'll redo it when dealing with 3D resize - but I want to avoid merging the changes now.
vector<float> shape; | ||
|
||
int out_d = GetShapeLikeArgument<float>(shape, spec, "size", ws, -1, -1); | ||
EXPECT_EQ(out_d, D) << "Diemsnionality should match the size of the tensors in the list."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EXPECT_EQ(out_d, D) << "Diemsnionality should match the size of the tensors in the list."; | |
EXPECT_EQ(out_d, D) << "Dimensionality should match the size of the tensors in the list."; |
CI MESSAGE: [1525664]: BUILD PASSED |
Add a test for dtype. Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
!build |
CI MESSAGE: [1526193]: BUILD STARTED |
CI MESSAGE: [1526193]: BUILD FAILED |
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
!build |
CI MESSAGE: [1526525]: BUILD STARTED |
CI MESSAGE: [1526525]: BUILD FAILED |
CI MESSAGE: [1526525]: BUILD PASSED |
CI MESSAGE: [1527950]: BUILD STARTED |
CI MESSAGE: [1527950]: BUILD PASSED |
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
JIRA TASK: DALI-1077 DALI-1537 DALI-1538