Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix layout broadcasting arithm ops #4951

Merged
merged 7 commits into from
Jul 25, 2023

Conversation

stiepan
Copy link
Member

@stiepan stiepan commented Jul 20, 2023

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

Without this change, it is possible to produce tensors whose layout does not match their dimensionality.

The arithm ops layout propagation works by picking just any non-empty layout and making sure that any other operands have either empty or equal layout. This leads to two issues:

  • layout of wrong length: if more ndim-ed tensor has no layout and the less ndim-ed tensor does, the output ends up with layout of incorrect lengt. Take a of shape (1, 2, 3, 4) and empty layout, and b of shape (4,) and layout "C". The output ends up to have 4dims and layout "C".
  • errors when the layout match but are not equal, for example FHWC and C.

The fix is to 1. check if the picked layout is not too short and if so, discard it, 2. allow for different layouts as long as one if suffix of the other.

Additional information:

It would be nice to follow-up the layout propagation with more involved mechanism that would:

  1. not discard partial information, i.e. when bigger dim-ed tensor does not have layout, but the samller dim-ed has, we should retain the information about layout suffix. Possibly with "*" wildcards in the layout (that would have consitently special treatment in equality).
  2. generalize the matching from common suffix to one of the layouts being subsequence of the other.

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-3556

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Jul 20, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9043057]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9043057]: BUILD FAILED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Jul 20, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9048568]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9048568]: BUILD PASSED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
* @brief Checks if the child is a (possibly improper) suffix of the parent.
* The child must not be longer than the parent.
*/
inline bool HasSuffix(const TensorLayout &parent, const TensorLayout &child) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More common name: EndsWith

Suggested change
inline bool HasSuffix(const TensorLayout &parent, const TensorLayout &child) {
inline bool EndsWith(const TensorLayout &parent, const TensorLayout &child) {

Copy link
Member Author

@stiepan stiepan Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, nice name.

* The child must not be longer than the parent.
*/
inline bool HasSuffix(const TensorLayout &parent, const TensorLayout &child) {
assert(parent.size() >= child.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary and actually harmful. Obviously a longer string is not a suffix of a shorter one.

Suggested change
assert(parent.size() >= child.size());
if (parent.size() < child.size())
return false;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Jul 21, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9058724]: BUILD STARTED

if (!types_layout_inferred_) {
result_type_id_ = PropagateTypes<Backend>(*expr_, ws);
result_layout_ = GetCommonLayout<Backend>(*expr_, ws);
// sample_ndim is assumed to be constant between iterations.
if (result_layout_.size() != result_shape_.sample_dim()) {
// If the largest dimentional tensors do not have layout set but some
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If the largest dimentional tensors do not have layout set but some
// If the tensors with the most dimensions do not have a layout set but some

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -134,7 +134,7 @@ TEST(ArithmeticOpsTest, TreePropagationLayoutError) {
ptr->Resize({{1}, {2}}, DALI_INT32);
}
in[0]->SetLayout(TensorLayout());
in[1]->SetLayout(TensorLayout("HW"));
in[1]->SetLayout(TensorLayout("DH"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we should test both cases - the old one should not throw, but instead extend the layout to "DHW".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have tests for that in python though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the case.

/**
* @brief Checks if the child is a (possibly improper) suffix of the parent.
*/
inline bool EndsWith(const TensorLayout &parent, const TensorLayout &child) {
Copy link
Contributor

@mzient mzient Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick:
parent and child don't capture the relation of the arguments well.

Suggested change
inline bool EndsWith(const TensorLayout &parent, const TensorLayout &child) {
inline bool EndsWith(const TensorLayout &what, const TensorLayout &with) {

or

Suggested change
inline bool EndsWith(const TensorLayout &parent, const TensorLayout &child) {
inline bool EndsWith(const TensorLayout &whole, const TensorLayout &suffix) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@stiepan
Copy link
Member Author

stiepan commented Jul 21, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9059365]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [9059365]: BUILD PASSED

@stiepan stiepan merged commit 7b954bf into NVIDIA:main Jul 25, 2023
5 checks passed
JanuszL pushed a commit to JanuszL/DALI that referenced this pull request Oct 13, 2023
Fix layout broadcasting in arithm ops
* Allow layouts of different lenghts as long as they share common suffix
* Use empty layout if the inferred one is too short

---------
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants