
Add sequence processing to warp operator #3879

Merged
merged 27 commits into from
May 30, 2022

Conversation

@stiepan (Member) commented May 9, 2022

Category:

New feature (non-breaking change which adds functionality)

Description:

Adds FHWC and FDHWC (sequence) processing to the warp affine operator. The transformation matrix can be specified per frame and provided either as a positional or a named argument.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Enabling only sequence processing with a per-frame named (CPU) matrix argument would have required about two lines of code. However, the matrix can also be specified as a positional argument, which may reside in GPU memory. Supporting that required extending the SequenceOperator functionality with broadcasting of per-sample positional arguments. For now, we still require GPU batches to be contiguous, which in turn required the SequenceOperator to actually allocate some tensor lists and perform copies. Additionally, the workspace is no longer cleared between iterations.

The utility for testing sequence processing ops with per-frame args required some changes to specify positional arguments and the intended device:
#3901

To mark positional GPU argument as per-frame, per-frame operator was extended to work with GPU backend here:
#3900

The order property of TensorVector is not properly reflected by its move assignment operator/constructor:
#3899

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: WAFFINE.09

JIRA TASK: DALI-2506

@stiepan stiepan marked this pull request as ready for review May 10, 2022 09:21
@stiepan stiepan changed the title Warp sequence processing Add sequence processing to warp operator May 10, 2022
@stiepan stiepan marked this pull request as draft May 10, 2022 09:49
@stiepan stiepan marked this pull request as ready for review May 12, 2022 09:14
@mzient mzient self-assigned this May 12, 2022
@@ -55,12 +110,13 @@ inline bool is_per_frame(const TensorVector<CPUBackend> &arg_input) {
* resize the outputs.
*/
template <typename Backend>
class SequenceOperator : public Operator<Backend> {
class SequenceOperator : public Operator<Backend>, public SampleBroadcasting<Backend> {
Contributor:

Wouldn't this suffice?

Suggested change
class SequenceOperator : public Operator<Backend>, public SampleBroadcasting<Backend> {
class SequenceOperator : public Operator<Backend>, protected SampleBroadcasting<Backend> {

You're re-exposing the function BroadcastSamples anyway.

@klecki (Contributor) left a comment:

Looks like there is a copy-paste error in the ExpandInput when unfolding; otherwise mostly stylistic comments and nitpicks.

@@ -72,6 +72,10 @@ class ArgumentWorkspace {
return *it->second.tvec;
}

TensorVector<CPUBackend>& UnsafeMutableArgumentInput(const std::string &arg_name) {
return const_cast<TensorVector<CPUBackend>&>(ArgumentInput(arg_name));
Contributor:

Not sure if I'm a fan of this const cast (+ calling a const method from a non-const one).

Contributor:

Yeah, I think we should rethink what constness means in workspaces. But that's beyond the scope of this PR, and we already have similar functions for regular inputs.

Comment on lines 41 to 51
template <typename BatchType>
struct BatchBackend;

template <template <typename> class BatchContainer, typename Backend>
struct BatchBackend<BatchContainer<Backend>> {
using Type = Backend;
};

template <typename BatchType>
using batch_backend_t =
typename BatchBackend<std::remove_cv_t<std::remove_reference_t<BatchType>>>::Type;
Contributor:

This looks more generic than just sequence_operator.

Member Author:

Does a particular suitable file come to mind?

@mzient (Contributor) May 20, 2022:

Actually, I have mixed feelings, because it can backfire terribly. Effectively, it's just a utility that extracts the type template argument from a single-type template. Just see:

    std::mutex m;
    std::lock_guard<std::mutex> g(m);
    static_assert(std::is_same<batch_backend_t<decltype(g)>, std::mutex>::value);

If we were to make it a general utility, we could make Backend a typedef inside classes templated against Backend (Buffer, TensorXxx, Operator) and then this trait could be defined as

template <typename X>
using backend_t = typename X::Backend;

or possibly

template <typename X>
using backend_t = typename std::remove_cv_t<std::remove_reference_t<X>>::Backend;

That would be safe against accidental use on something that's not related to backend.

Contributor:

Having said all that, I'm fine with keeping it as is in this limited scope - perhaps this utility should even be moved inside the class, so it's not (ab)used elsewhere.

@stiepan (Member Author) May 23, 2022:

Okay, so I moved it to type_traits.h and added an enable_if_t<is_batch_container<...>> guard; this way it can be reused but not abused.

}

template <typename Type>
void setup_expanded_like(const Type &batch, Type &expanded_batch) {
Contributor:

I have a general remark, which might be considered harmful for this PR, but I think we tend to suggest the FunctionName(outputs, inputs) order. (The camel case too, but we are totally inconsistent with it, and I also write some utils in snake case.)

Member Author:

Okay, I even had it scribbled in my notes to change the order of arguments to match the kernels'/operators' outputs/inputs order, but missed it.

Contributor:

Yup, memcpy order is preferred.

Member Author:

done

const auto &shape = data.shape();
TensorList<Backend> tl;
tl.ShareData(data);
void broadcast_samples(const TensorVector<Backend> &batch, TensorVector<Backend> &expanded_batch,
Contributor:

Can you add a group comment for those overloads of broadcast_samples? I had to look up the num_expanded_samples in the tv_builder_like to see what it is used for. Also, if we expand according to the expand_extents, why do we need the num_expanded_samples?

Also, what is the format of the expand_extents - it's the shapes of the F[C] prefix, right?

@stiepan (Member Author) May 23, 2022:

As to the format: effectively it is as you say; more broadly, the layout does not matter at this point. From a batch (batch.num_samples() == expand_extents.num_samples()) we get expanded_batch.num_samples() == expand_extents.num_elements() by repeating each sample volume(expand_extents[i]) times.

You are right that we could just compute num_expanded_samples; it is just that it is already computed in the ExpandDesc (so as not to recompute it for validation etc.), so I pass it here as an argument. Later on there is even a relevant assert in broadcast shapes: assert(num_expanded_samples == expand_extents.num_elements());

@@ -266,19 +277,102 @@ inline TensorListShape<> unfold_outer_dims(const TensorListShape<> &shape, int n
}
}

inline TensorListShape<> broadcast_samples(const TensorListShape<> &shape, int num_expanded_samples,
Contributor:

Maybe broadcast_sample_shapes? And please add a docstring :)

Member Author:

done



class ArgCb:
def __init__(self, name: Union[str, int], cb: Callable[[SampleDesc], np.ndarray], is_per_frame: bool, dest_device: str = "cpu"):
Contributor:

We lost the documentation of what the callback is supposed to be doing. We only know it's a single-arg callback.

Member Author:

done

@@ -55,12 +110,13 @@ inline bool is_per_frame(const TensorVector<CPUBackend> &arg_input) {
* resize the outputs.
*/
template <typename Backend>
class SequenceOperator : public Operator<Backend> {
class SequenceOperator : public Operator<Backend>, public SampleBroadcasting<Backend> {
Contributor:

One can also say that adding functionality via inheritance instead of composition is not desirable, but I'm no OOP evangelist :P

@stiepan (Member Author) May 17, 2022:

I had the empty base optimization "pattern" in mind (in the CPU case we do not add any members), but it probably makes no noticeable difference here, and you have a good point. On the other hand, can this hurt us in this case? Maybe I'll just follow Michał's suggestion to make it protected inheritance.

Contributor:

I got composition suggested once or twice, but I'm not asking for a rewrite; just saying it's a bit artificial.

Contributor:

I'm fine either way, with inheritance having the benefit of less typing (and being already implemented).

@stiepan (Member Author) May 23, 2022:

changed it to protected inheritance

auto expanded_arg = ExpandArgumentLikeInput(arg_input, arg_name, input_idx);
auto expanded_handle = std::make_shared<TensorVector<CPUBackend>>(std::move(expanded_arg));
ExpandedAddArgument(arg_name, std::move(expanded_handle));
ExpandArgumentLikeInput(arg_input, ExpandedArg(arg_name), arg_name, input_idx);
Contributor:

This is mostly the change from recreating the workspace every time to setting it up once and then just putting the new expanded data into the tensor vector in every iteration?

Member Author:

Yes, it is exactly that.

[&](const auto &input) { return UnfoldBatch(input, input_desc); });
ProcessInput(ws, input_idx, [&](const auto &input) {
auto &expanded_input = ExpandedInput<detail::batch_backend_t<decltype(input)>>(input_idx);
UnfoldBatch(input, expanded_input, ref_expand_desc);
Contributor:

The input_desc changed to ref_expand_desc. Is this correct?

Member Author:

Spot-on

Member Author:

done

@@ -15,6 +15,7 @@
import os
import random
import numpy as np
from typing import List, Union, Callable
Contributor:

Btw, I am not reviewing this file too much, as it's in the other PR.

}
}

void SetupExpandedWorkspace(const workspace_t<Backend> &ws) {
@mzient (Contributor) May 19, 2022:

Since it's only run once, how about

Suggested change
void SetupExpandedWorkspace(const workspace_t<Backend> &ws) {
void InitializeExpandedWorkspace(const workspace_t<Backend> &ws) {

?
(and in all related places)

Member Author:

done

@@ -607,6 +691,7 @@ class SequenceOperator : public Operator<Backend> {
private:
int expand_like_idx_ = -1;
bool is_expanding_ = false;
bool is_expanded_set_up_ = false;
Contributor:

expanded what?

Suggested change
bool is_expanded_set_up_ = false;
bool is_expanded_ws_initialized_ = false;

Member Author:

done

Comment on lines 209 to 210
template <typename Type>
void setup_expanded_like(const Type &batch, Type &expanded_batch) {
@mzient (Contributor) May 19, 2022:

This isn't a generic function that can work with anything - it has very specific requirements for the target type and that should be somehow reflected in the template parameter name.

Suggested change
template <typename Type>
void setup_expanded_like(const Type &batch, Type &expanded_batch) {
template <typename BatchObject>
void setup_expanded_like(const BatchObject &batch, BatchObject &expanded_batch) {

Member Author:

done

Comment on lines 215 to 216
template <typename Type>
std::shared_ptr<Type> expanded_like(const Type &batch) {
@mzient (Contributor) May 19, 2022:

Suggested change
template <typename Type>
std::shared_ptr<Type> expanded_like(const Type &batch) {
template <typename BatchObject>
std::shared_ptr<BatchObject> expanded_like(const BatchObject &batch) {

Member Author:

done


void push(const SliceView &view) {
void SetNext(const SliceView &view) {
Contributor:

Suggested change
void SetNext(const SliceView &view) {
void SetNext(const SliceView &view) {
assert(size < tv_.num_samples());

Member Author:

done

TensorVector<Backend> take() {
assert(size == tv_.num_samples());
return std::move(tv_);
int Size() {
@mzient (Contributor) May 19, 2022:

It's not really a size - that is constant and defined by tv_.num_samples(), isn't it?

Suggested change
int Size() {
int NextFrameIdx() const {

Member Author:

done

}

TensorVector<Backend> tv_;
private:
TensorVector<Backend> &tv_;
int size = 0;
Contributor:

Suggested change
int size = 0;
int next_ = 0;

?

Member Author:

done

stiepan added 11 commits May 20, 2022 16:53
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

Add warp video test for named matrix param

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

Enable positional arguments testing in the sequence test suite

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>

Accommodate existing per-frame tests to the updated sequence util

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
This reverts commit f9be826.

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan (Member Author) commented May 23, 2022:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [4911387]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [4911387]: BUILD PASSED

Comment on lines +89 to +90
template <template <typename> class BatchContainer, typename Backend>
struct BatchBackend<BatchContainer<Backend>,
Contributor:

Meh. I won't fight for it, but I think that adding a Backend typedef to the relevant types would be a little better.

const auto &shape = batch.shape();
auto broadcast_shape = broadcast_sample_shapes(shape, num_expanded_samples, expand_extents);
expanded_batch.SetLayout(batch.GetLayout());
expanded_batch.Resize(broadcast_shape, batch.type());
Contributor:

Suggested change
expanded_batch.Resize(broadcast_shape, batch.type());
expanded_batch.set_order(stream);
expanded_batch.Resize(broadcast_shape, batch.type());

Now that we deal with GPU stuff...

Member Author:

done

Comment on lines 29 to 40
template <typename BatchType>
struct BatchBackend;

template <template <typename> class BatchContainer, typename Backend>
struct BatchBackend<BatchContainer<Backend>> {
using Type = Backend;
};

template <typename BatchType>
using batch_backend_t =
typename BatchBackend<std::remove_cv_t<std::remove_reference_t<BatchType>>>::Type;

Contributor:

This is a duplicate - please remove.

Member Author:

Good catch!
done

this->TestUnfolding(*expanded_batch, batch, 2);
batch.Resize({{2, 2, 6}, {13, 4, 11}}, DALI_FLOAT);
batch.SetLayout("XYZ");
this->TestUnfolding(*expanded_batch, batch, 2);
@mzient (Contributor) May 27, 2022:

Suggested change
this->TestUnfolding(*expanded_batch, batch, 2);
this->TestUnfolding(*expanded_batch, batch, 1);
this->TestUnfolding(*expanded_batch, batch, 2);
this->TestUnfolding(*expanded_batch, batch, 3);

Can you also check unfolding of just one dim? I think these tests are cheap and unfolding 1 dim is where we want the best possible coverage.
I also suggest testing the corner case when we unfold down to scalars.

Member Author:

There is a separate test case that unfolds just 1 dim (and the same for all three). I added two cases with multiple iterations that test 1, 2 and 3 unfolds in a single test case between resizes.

this->TestUnfolding(*expanded_batch, batch, 2);
batch.Resize({{2, 2, 6}, {13, 4, 11}, {13, 4, 11}, {13, 4, 11}}, DALI_FLOAT);
batch.SetLayout("XYZ");
this->TestUnfolding(*expanded_batch, batch, 2);
Contributor:

Likewise.

batch.Resize({{2, 2, 6}, {13, 4, 11}, {13, 4, 11}, {13, 4, 11}}, DALI_FLOAT);
batch.SetLayout("XYZ");
this->template FillBatch<float>(batch, 7);
this->template TestBroadcasting<float>(*expanded_batch, batch, {{}, {}, {}, {}});
@mzient (Contributor) May 27, 2022:

Please test broadcasting of scalars.

batch.Resize({{}, {}, {}, {}});

Member Author:

done

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan (Member Author) commented May 29, 2022:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [4962767]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [4962767]: BUILD FAILED

@stiepan (Member Author) commented May 30, 2022:

!build

@dali-automaton (Collaborator):

CI MESSAGE: [4965559]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [4965559]: BUILD PASSED

@stiepan stiepan merged commit c30da52 into NVIDIA:main May 30, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022
* Enable sequence support and per-frame arguments for warp
* Add support for broadcasting of accompanying non-sequence inputs
* Do not clear the expanded workspace between iterations

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>