Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add min, max and clamp arithmetic ops #2298

Merged
merged 21 commits into from
Oct 7, 2020
Merged

Conversation

klecki
Copy link
Contributor

@klecki klecki commented Sep 24, 2020

Why we need this PR?

Add min(a, b), max(a, b), clamp(v, lo, hi) as arithmetic ops.

Todo:

  • add tests with 0-dim inputs
  • docs

What happened in this PR?

  • What solution was applied:
    Added support for ternary operators + their type/value switching.
  • Affected modules and functionalities:
    Arithm Ops
  • Key points relevant for the review:
    Where to put named arithm op functions in DALI Python package?
  • Validation and testing:
    Nosetest that doesn't end in L1, probably should limit that
  • Documentation (including examples):
    TODO

JIRA TASK: [DALI-1628]

Comment on lines 955 to 962
def min(left, right):
return _arithm_op("min", left, right)

def max(left, right):
return _arithm_op("max", left, right)

def clamp(value, lo, hi):
return _arithm_op("clamp", value, lo, hi)
Copy link
Contributor

@mzient mzient Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe they should land in dali.fn? Both the usage mode and lowercase naming would play better with dali.fn.

Besides the module:

Suggested change
def min(left, right):
return _arithm_op("min", left, right)
def max(left, right):
return _arithm_op("max", left, right)
def clamp(value, lo, hi):
return _arithm_op("clamp", value, lo, hi)
def min(left, right):
"""Fills the output with minima of corresponding values in ``left`` and ``rigt``"""
return _arithm_op("min", left, right)
def max(left, right):
"""Fills the output with maxima of corresponding values in ``left`` and ``rigt``"""
return _arithm_op("max", left, right)
def clamp(value, lo, hi):
"""Produces a tensor of values from ``value`` clamped to the range ``lo``..``hi``."""
return _arithm_op("clamp", value, lo, hi)

@@ -115,8 +115,8 @@ DLL_PUBLIC DALIDataType PropagateTypes(ExprNode &expr, const workspace_t<Backend
}
auto &func = dynamic_cast<ExprFunc &>(expr);
int subexpression_count = func.GetSubexpressionCount();
DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
"Only unary and binary expressions are supported");
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
DALI_ENFORCE(0 < subexpression_count && subexpression_count <= kMaxArity,

To be consistent with L340.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -193,8 +193,8 @@ DLL_PUBLIC inline const TensorListShape<> &PropagateShapes(ExprNode &expr,
}
auto &func = dynamic_cast<ExprFunc &>(expr);
int subexpression_count = expr.GetSubexpressionCount();
DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
"Only unary and binary expressions are supported");
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

} else {
DALI_FAIL("Expression cannot have three scalar operands");
}
), DALI_FAIL("No suitable type found");); // NOLINT(whitespace/parens)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you print type as well and tell which argument is the faulty one?

Comment on lines 429 to 433
auto v_ = static_cast<result_t<T, Min, Max>>(v);
auto lo_ = static_cast<result_t<T, Min, Max>>(lo);
auto hi_ = static_cast<result_t<T, Min, Max>>(hi);
auto lo_clamp_ = v_ <= lo_ ? lo_ : v_;
return lo_clamp_ >= hi_ ? hi_ : lo_clamp_;
Copy link
Contributor

@mzient mzient Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto v_ = static_cast<result_t<T, Min, Max>>(v);
auto lo_ = static_cast<result_t<T, Min, Max>>(lo);
auto hi_ = static_cast<result_t<T, Min, Max>>(hi);
auto lo_clamp_ = v_ <= lo_ ? lo_ : v_;
return lo_clamp_ >= hi_ ? hi_ : lo_clamp_;
return clamp<result_t<T, Min, Max>>(v, lo, hi);

dali/core/math_util.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

CUDA_CALL(cudaEventElapsedTime(&time, start, end));
std::cerr << "Elapsed Time: " << time << " s\n";

// time *= (1e+6f / kIters); // convert to nanoseconds / 100 samples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove all of the profiling before posting final version.

@@ -195,6 +258,18 @@ using ExprImplGpuCT = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, fa
template <ArithmeticOp op, typename Result, typename Left, typename Right>
using ExprImplGpuTC = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, true, false>>;

// template <ArithmeticOp op, typename Result, typename First, typename Second, typename Third,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@klecki klecki force-pushed the ternary-ops branch 2 times, most recently from 2a3a457 to 2d2ecac Compare October 6, 2020 16:30
@klecki
Copy link
Contributor Author

klecki commented Oct 6, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1680060]: BUILD STARTED

@klecki
Copy link
Contributor Author

klecki commented Oct 6, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1680100]: BUILD STARTED

import nvidia.dali.ops
# Fully circular imports don't work. We need to import _arithm_op late and
# replace this trampoline function.
setattr(sys.modules[__name__], "_arithm_op", nvidia.dali.ops._arithm_op)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the following simpler - but it's a matter of taste, I guess.

Suggested change
setattr(sys.modules[__name__], "_arithm_op", nvidia.dali.ops._arithm_op)
global _arithm_op
_arithm_op = nvidia.dali.ops._arithm_op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied your code from data_node.py :)

numpy_in = get_numpy_input(dali_in, kinds[i], dali_in.dtype.type, target_type if target_type is not None else dali_in.dtype.type)
inputs.append(numpy_in)
out = as_cpu(pipe_out[arity]).at(sample_id)
return tuple(inputs) + (out,)
Copy link
Contributor

@mzient mzient Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More Pythonic? ;)

Suggested change
return tuple(inputs) + (out,)
return (*inputs, out)

Comment on lines +136 to +139
auto output = static_cast<Result *>(tile.output);
const void *first = tile.args[0];
const void *second = tile.args[1];
const void *third = tile.args[2];
Copy link
Contributor

@mzient mzient Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto output = static_cast<Result *>(tile.output);
const void *first = tile.args[0];
const void *second = tile.args[1];
const void *third = tile.args[2];
auto *__restrict__ output = static_cast<Result *>(tile.output);
const void *__restrict__ first = tile.args[0];
const void *__restrict__ second = tile.args[1];
const void *__restrict__ third = tile.args[2];

Just today I saw adding __restrict__ outperform caching in shared memory (in another kernel). The speedup was 1.7x

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1680100]: BUILD FAILED

* based on `as_ptr`
*/
template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void *__restrict__ ptr, DALIDataType type_id) {

template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
TYPE_SWITCH(type_id, type2id, AccessType, ARITHMETIC_ALLOWED_TYPES, (
const auto *access = reinterpret_cast<const AccessType*>(ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const auto *access = reinterpret_cast<const AccessType*>(ptr);
const auto *__restrict__ access = reinterpret_cast<const AccessType*>(ptr);

}

template <typename T>
DALI_HOST_DEV T Access(const T* ptr, int64_t idx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DALI_HOST_DEV T Access(const T* ptr, int64_t idx) {
DALI_HOST_DEV T Access(const T* __restrict__ ptr, int64_t idx) {

}

template <typename T>
DALI_HOST_DEV T Access(const void* ptr, int64_t idx, DALIDataType type_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DALI_HOST_DEV T Access(const void* ptr, int64_t idx, DALIDataType type_id) {
DALI_HOST_DEV T Access(const void* __restrict__ ptr, int64_t idx, DALIDataType type_id) {

Copy link
Contributor

@mzient mzient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there could be some gain from doing this:
const ExtendedTileDesc *__restrict__ tiles
the tile is read many times, so making it cacheable is desireable. There is some type-punning going on, so the compiler might be quite conservative here.

Copy link
Contributor

@mzient mzient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can try restrict, but it's optional.

@@ -0,0 +1,57 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}
DALIDataType result = BinaryTypePromotion(types[0], types[1]);
if (types.size() > 2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this if is redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed 👍

DALI_HOST_DEV static constexpr result_t<L, R> impl(L l, R r) {
auto l_ = static_cast<result_t<L, R>>(l);
auto r_ = static_cast<result_t<L, R>>(r);
return l_ <= r_ ? l_ : r_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return l_ <= r_ ? l_ : r_;
return l_ < r_ ? l_ : r_;

This would work as well, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will change

@@ -0,0 +1,24 @@

// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,25 @@

// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

docs/math.rst Outdated
.. note::
Type promotion is commutative.

For more than two arguments, the resulting type is calculated as reduction from left to right
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For more than two arguments, the resulting type is calculated as reduction from left to right
For more than two arguments, the resulting type is calculated as a reduction from left to right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

docs/math.rst Outdated

For more than two arguments, the resulting type is calculated as reduction from left to right
- first calculating the result of operating on first two arguments, next between that intermediate
result and thirs argument and so on, untill we have only the result type left.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result and thirs argument and so on, untill we have only the result type left.
result and the third argument and so on, until we have only the result type left.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

docs/math.rst Outdated
Similarly to arithmetic expressions, one can use selected mathematical functions in the Pipeline
graph definition. They also accept :class:`nvidia.dali.pipeline.DataNode`,
:meth:`nvidia.dali.types.Constant` or regular Python value of type ``bool``, ``int``, or ``float``
as arguments. At least one of the inputs must be output of other DALI Operator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
as arguments. At least one of the inputs must be output of other DALI Operator.
as arguments. At least one of the inputs must be the output of other DALI Operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

from invoking other operators. Full documentation can be found in the dedicated documentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from invoking other operators. Full documentation can be found in the dedicated documentation
from invoking other operators. Full documentation can be found in the dedicated section of the documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

from invoking other operators. Full documentation can be found in the dedicated documentation
for :ref:`mathematical expressions`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for :ref:`mathematical expressions`.
:ref:`mathematical expressions`.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
There appears to be an issue with the clamp tests

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
docs/supported_ops.rst Outdated Show resolved Hide resolved
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Copy link
Contributor Author

klecki commented Oct 7, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1682623]: BUILD STARTED

The same behaviour for invalid ranges

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Copy link
Contributor Author

klecki commented Oct 7, 2020

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1682793]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1682623]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1682793]: BUILD PASSED

@klecki klecki merged commit 6c5d611 into NVIDIA:master Oct 7, 2020
@klecki klecki deleted the ternary-ops branch October 7, 2020 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants