Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Video resize #2164

Merged
merged 32 commits into from
Aug 7, 2020
Merged

Video resize #2164

merged 32 commits into from
Aug 7, 2020

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Jul 28, 2020

Why we need this PR?

Pick one, remove the rest

  • It adds new feature needed because we want video, channel-first (and 3D) Resize
  • Refactoring to allow different resize types and dimensionalities

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

  • What solution was applied:
    • Almost complete rewrite of Resize operators
  • Affected modules and functionalities:
    • Resize operator family
  • Key points relevant for the review:
    • ??
  • Validation and testing:
    • Existing tests for regressions
    • ResizeAttr tests for new parameterizations
    • Python tests?
  • Documentation (including examples):
    • Separate PR

JIRA TASK: DALI-1077 DALI-1537 DALI-1538

@mzient mzient requested a review from a team July 28, 2020 23:27
#include "dali/operators/image/crop/random_crop_attr.h"
#include "dali/kernels/imgproc/resample/params.h"

namespace dali {

template <typename Backend>
class RandomResizedCrop : public Operator<Backend>
, protected ResizeBase
, protected RandomCropAttr {
, protected ResamplingFilterAttr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to inherit from ResamplingFilterAttr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we can have it via composition. Other operators are already refactored.

DALI_INTERP_LINEAR, true)
.AddOptionalArg("min_filter", "Filter used when scaling down",
DALI_INTERP_LINEAR, true)
.AddOptionalArg<DALIDataType>("dtype", "Output data type; must be same as input type of `float`. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be "or float". Typo.

void ResamplingFilterAttr::PrepareFilterParams(
const OpSpec &spec, const ArgumentWorkspace &ws, int num_samples) {
if (num_samples < 0)
num_samples = spec.GetArgument<int>("batch_size");
Copy link
Contributor

@JanuszL JanuszL Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can slowly stop using "batch_size" at all? Just asking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'll see what I can do. It's just a fallback anyway.

}

template <>
void Resize<CPUBackend>::RunImpl(SampleWorkspace &ws) {
void Resize<CPUBackend>::RunImpl(HostWorkspace &ws) {
const int thread_idx = ws.thread_idx();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

const auto &input_shape = input.shape();
auto &attr_out = ws.OutputRef<CPUBackend>(1);
const auto &attr_shape = attr_out.shape();
assert(attr_shape.num_samples() == input_shape.num_samples() && attr_shape.sample_dim() == 1 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you format in the same way as in L88

@@ -150,3 +150,7 @@ def test_video_resize(batch_size=2):
for vp in video_reader_params:
for rp in resize_params:
yield run_for_params, batch_size, vp, rp

if __name__ == "__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad idea. It quickly becomes obsolete. If you really must you can try to query this module for all available functions and run all that falls into the default nosetests regexp - https://nose.readthedocs.io/en/latest/usage.html#cmdoption-m

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's here because nosetests launches new processes and defeats debugging attempts (and does other ugly things that have to be disabled with "no-smoke" and "don't-catch-fire" kind of flags).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then query this module for all available functions and run all that falls into the default nosetests regexp - https://nose.readthedocs.io/en/latest/usage.html#cmdoption-m sounds like a way to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working on it...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I'm not sure about this. With how it is implemented now, we lose all of the nose functionality and it's still impractical to debug single test. In my opinion, this should be implemented either with this official API from nose or by your IDE plugin/setup (e.g. plugin for VS Code). And in both cases it does not belong with this PR, I think. I'm not even sure, if things like "how you debug unit tests" should be part of the code base. After all, we don't include other launch/debug configs, IDEs settings etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for official API - but I'd still prefer to have a way to run the tests with fewer levels of indirections.
I think I agree that this solution is not good after all - while it catches all tests, it's not what I really want here. I'll remove it and revert the changes to test_utils.
Regarding the plugin - I think it doesn't launch GDB, but a python debugger (?).

mzient added 12 commits July 31, 2020 09:01
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
- Refactoring - ResizeAttr and ResamplingFilterAttr now used by composition
- Bugfixes
- Add possibility to launch test without nosetests.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
out_size[d] = in_size[d] * final_scale;
scale[d] = final_scale;
} else if (std::abs(scale[d]) != final_scale) {
scale[d] = std::copysign(final_scale, scale[d]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be moved outside of the if - else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we need this if-else at all?

@mzient
Copy link
Contributor Author

mzient commented Aug 5, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1523577]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1523577]: BUILD FAILED

const TensorListShape<> &orig_shape) const {
int N = orig_shape.num_samples();
int D = NumSpatialDims();
assert(orig_shape.sample_dim() == D);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't there non-spatial dimensions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not tested, apparently :\

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added.

output image is smaller than specified - e.g. 640x480 image with desired output
size of 1920x1080 will actually produce 1920x1440 output.

This argument is mutually exclusive with ``resize_longer`` and ``resize_shorter``)", "default")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this argument is mutually exclusive with resize_longer and resize_shorter. IMO the resize mode should be compatible with those as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below.

.AddOptionalArg("resize_shorter", R"(The length of the shorter dimension of the resized image.
This option is mutually exclusive with ``resize_longer`` and explicit size arguments
The op will keep the aspect ratio of the original image.
This option is equivalent to specifying the same size for all dimensions and ``mode="not_smaller"``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why you are assuming mode="not_smaller" here? I understood that resize_shorter is equivalent to resize_x, given that x is the shorter dimension (for example).

Copy link
Contributor Author

@mzient mzient Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... it's not. At least it's not implemented this way. resize_shorter=S is implemented as:

resize(size=S, # broadcast to all dims
       mode="not_smaller")

So it cannot be combined with any other mode, because it implies the mode.
I see no real-life use case when one would want to resize the shorter/longer edge and keep the others or combine it with other modes.

I assumed that the whole reason for having resize_shorter/resize_longer is to make the image large enough to be able to make square crops with that size. If we extend this to rectangular crops, we end up with rectangular size
and mode "not_smaller".

must be specified together with ``roi_start``. The coordinates follow the tensor shape order
(same as ``size``). The coordinates can be either absolute (in pixels, the default) or
relative (0..1), depending on the value of ``relative_roi`` argument. If a RoI origin is greater
than RoI end, the region is flipped.))", nullptr, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
than RoI end, the region is flipped.))", nullptr, true)
than RoI end in any dimension, the resulting image will be flipped (mirrored) in that dimension.))", nullptr, true)

int sample_idx) const {
in_lo.resize(spatial_ndim_);
in_hi.resize(spatial_ndim_);
static constexpr float min_size = 1e-3f; // minimum size, in pixels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this value. This is not a number of pixels

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a number in pixels, but it is a distance in pixels. You're free to have a RoI that is not placed at pixel boundary - indeed, with bicubic or Lanczos resampling, it's going to make sense and even look good.

}
}
if (has_max_size_) {
for (int d = 0; d < spatial_ndim_; d++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be extracted to ApplyMaxSize or something like that (I see that this code is repeated)

Copy link
Contributor Author

@mzient mzient Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the threshold value. It's repeated only twice and not very long. The third application of max size is different...


for (int d = 0; d < spatial_ndim_; d++) {
DALI_ENFORCE(in_lo[d] != in_hi[d] || requested_size[d] == 0,
"Cannot produce non-empty output from empty input");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

print d, in_lo/in_hi, requested_size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the size is 0, lo and hi will always be 0.

bool flip = out_sz < 0;
params.dst_size[d] = std::max(1, round_int(std::fabs(out_sz)));
if (flip)
std::swap(params.src_lo[d], params.src_hi[d]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lo was > hi, and you swap them here. How does the flipping work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two ways to flip:

  1. negative output size
  2. lo > hi
    The final result uses the latter, but AdjustOutputSize uses the former (hence the first swap).


template <typename Backend>
struct ResizeBase<Backend>::Impl {
using input_type = typename Workspace::template input_t<Backend>::element_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using input_type = typename Workspace::template input_t<Backend>::element_type;
using InputType = typename Workspace::template input_t<Backend>::element_type;

Would read better IMO

dali/pipeline/operator/common.h Show resolved Hide resolved
…d examples.

Fix imports in test_operator_resize_seq test.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Add test for save_attrs.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient
Copy link
Contributor Author

mzient commented Aug 6, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1525664]: BUILD STARTED

kernels::KernelContext ctx;

for (int i = 0; i < GetNumFrames(); i++) {
kernels::InTensorCPU<In, frame_ndim> dummy_input;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'll redo it when dealing with 3D resize - but I want to avoid merging the changes now.

vector<float> shape;

int out_d = GetShapeLikeArgument<float>(shape, spec, "size", ws, -1, -1);
EXPECT_EQ(out_d, D) << "Diemsnionality should match the size of the tensors in the list.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EXPECT_EQ(out_d, D) << "Diemsnionality should match the size of the tensors in the list.";
EXPECT_EQ(out_d, D) << "Dimensionality should match the size of the tensors in the list.";

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1525664]: BUILD PASSED

Add a test for dtype.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient
Copy link
Contributor Author

mzient commented Aug 6, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1526193]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1526193]: BUILD FAILED

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient
Copy link
Contributor Author

mzient commented Aug 6, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1526525]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1526525]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1526525]: BUILD PASSED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1527950]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1527950]: BUILD PASSED

@mzient mzient merged commit d931bca into NVIDIA:master Aug 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants