Add min, max and clamp arithmetic ops #2298
Conversation
dali/python/nvidia/dali/ops.py
Outdated
def min(left, right):
    return _arithm_op("min", left, right)

def max(left, right):
    return _arithm_op("max", left, right)

def clamp(value, lo, hi):
    return _arithm_op("clamp", value, lo, hi)
Maybe they should land in dali.fn? Both the usage mode and lowercase naming would play better with dali.fn.
Besides the module:
def min(left, right):
    """Fills the output with minima of corresponding values in ``left`` and ``right``"""
    return _arithm_op("min", left, right)

def max(left, right):
    """Fills the output with maxima of corresponding values in ``left`` and ``right``"""
    return _arithm_op("max", left, right)

def clamp(value, lo, hi):
    """Produces a tensor of values from ``value`` clamped to the range ``lo``..``hi``."""
    return _arithm_op("clamp", value, lo, hi)
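For reference, the element-wise behaviour that the suggested `min` and `max` docstrings describe can be sketched in plain Python. This is only an illustration on lists: `elementwise_min` and `elementwise_max` are hypothetical names that do not exist in DALI, and the real functions dispatch through `_arithm_op` instead.

```python
def elementwise_min(left, right):
    """Return the minima of corresponding values in ``left`` and ``right``."""
    # Hypothetical list-based stand-in, not the DALI implementation.
    return [l if l < r else r for l, r in zip(left, right)]


def elementwise_max(left, right):
    """Return the maxima of corresponding values in ``left`` and ``right``."""
    # Hypothetical list-based stand-in, not the DALI implementation.
    return [l if l > r else r for l, r in zip(left, right)]
```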
@@ -115,8 +115,8 @@ DLL_PUBLIC DALIDataType PropagateTypes(ExprNode &expr, const workspace_t<Backend
}
auto &func = dynamic_cast<ExprFunc &>(expr);
int subexpression_count = func.GetSubexpressionCount();
DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
"Only unary and binary expressions are supported");
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
DALI_ENFORCE(0 < subexpression_count && subexpression_count <= kMaxArity,
To be consistent with L340.
Done
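A rough Python rendering of the range check discussed above, for illustration only. `K_MAX_ARITY = 3` is an assumption (the value of `kMaxArity` is not shown in this thread), and `check_arity` and its error message are hypothetical.

```python
K_MAX_ARITY = 3  # assumption: unary, binary and ternary expressions


def check_arity(subexpression_count, max_arity=K_MAX_ARITY):
    """Raise ValueError unless 0 < subexpression_count <= max_arity."""
    if not (0 < subexpression_count <= max_arity):
        raise ValueError(
            f"Expected between 1 and {max_arity} subexpressions, "
            f"got {subexpression_count}"
        )
    return subexpression_count
```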
@@ -193,8 +193,8 @@ DLL_PUBLIC inline const TensorListShape<> &PropagateShapes(ExprNode &expr,
}
auto &func = dynamic_cast<ExprFunc &>(expr);
int subexpression_count = expr.GetSubexpressionCount();
DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
"Only unary and binary expressions are supported");
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
As above
Done
} else {
  DALI_FAIL("Expression cannot have three scalar operands");
}
), DALI_FAIL("No suitable type found"););  // NOLINT(whitespace/parens)
Could you print the type as well and say which argument is the faulty one?
dali/operators/math/expressions/expression_factory_instances/expression_impl_factory.h
Outdated
auto v_ = static_cast<result_t<T, Min, Max>>(v);
auto lo_ = static_cast<result_t<T, Min, Max>>(lo);
auto hi_ = static_cast<result_t<T, Min, Max>>(hi);
auto lo_clamp_ = v_ <= lo_ ? lo_ : v_;
return lo_clamp_ >= hi_ ? hi_ : lo_clamp_;
return clamp<result_t<T, Min, Max>>(v, lo, hi);
dali/core/math_util.h
done
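To make the two comparisons in the original snippet easier to reason about, here is a plain-Python sketch that mirrors them step by step. `clamp_scalar` is a hypothetical name. The sketch says nothing about how the `clamp` helper in `dali/core/math_util.h` is written, only how the hand-written version quoted above behaves.

```python
def clamp_scalar(v, lo, hi):
    """Mirror the two ternaries from the snippet under review."""
    # First step: raise v to at least lo.
    lo_clamped = lo if v <= lo else v
    # Second step: cap the intermediate result at hi.
    return hi if lo_clamped >= hi else lo_clamped
```

With this ordering, an invalid range (`lo > hi`) always yields `hi`, because the second comparison is true whichever branch the first one took.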
CUDA_CALL(cudaEventElapsedTime(&time, start, end));
std::cerr << "Elapsed Time: " << time << " s\n";

// time *= (1e+6f / kIters);  // convert to nanoseconds / 100 samples
?
Will remove all of the profiling before posting final version.
@@ -195,6 +258,18 @@ using ExprImplGpuCT = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, fa
template <ArithmeticOp op, typename Result, typename Left, typename Right>
using ExprImplGpuTC = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, true, false>>;

// template <ArithmeticOp op, typename Result, typename First, typename Second, typename Third,
?
Force-pushed from 2a3a457 to 2d2ecac
!build
CI MESSAGE: [1680060]: BUILD STARTED
!build
CI MESSAGE: [1680100]: BUILD STARTED
import nvidia.dali.ops
# Fully circular imports don't work. We need to import _arithm_op late and
# replace this trampoline function.
setattr(sys.modules[__name__], "_arithm_op", nvidia.dali.ops._arithm_op)
I find the following simpler - but it's a matter of taste, I guess.
global _arithm_op
_arithm_op = nvidia.dali.ops._arithm_op
I just copied your code from data_node.py :)
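For readers unfamiliar with the trampoline pattern being discussed, a self-contained sketch of the `global` variant follows. The module `late_ops_demo` is a hypothetical stand-in registered by hand so the example runs on its own; in the PR the late import targets `nvidia.dali.ops`.

```python
import sys
import types

# Hypothetical stand-in for a module that cannot be imported at load time
# because of a circular dependency.
late_module = types.ModuleType("late_ops_demo")
late_module._arithm_op = lambda name, *inputs: (name, inputs)
sys.modules["late_ops_demo"] = late_module


def _arithm_op(*inputs, **kwargs):
    """Trampoline: import late, rebind the module-level name, then delegate."""
    global _arithm_op
    import late_ops_demo  # late import, resolved from sys.modules
    _arithm_op = late_ops_demo._arithm_op
    return _arithm_op(*inputs, **kwargs)


def clamp(value, lo, hi):
    # Looks up _arithm_op at call time, so it picks up the rebound function.
    return _arithm_op("clamp", value, lo, hi)
```

The first call to `clamp` goes through the trampoline; later calls reach the real function directly because the module-level name has been replaced.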
numpy_in = get_numpy_input(dali_in, kinds[i], dali_in.dtype.type, target_type if target_type is not None else dali_in.dtype.type)
inputs.append(numpy_in)
out = as_cpu(pipe_out[arity]).at(sample_id)
return tuple(inputs) + (out,)
More Pythonic? ;)
return (*inputs, out)
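A quick check that the two spellings build the same tuple. The values below are placeholders for illustration, not data from the test suite.

```python
inputs = [1, 2, 3]  # placeholder values
out = 4

concatenated = tuple(inputs) + (out,)  # original spelling
unpacked = (*inputs, out)              # suggested spelling
```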
auto output = static_cast<Result *>(tile.output);
const void *first = tile.args[0];
const void *second = tile.args[1];
const void *third = tile.args[2];
auto *__restrict__ output = static_cast<Result *>(tile.output);
const void *__restrict__ first = tile.args[0];
const void *__restrict__ second = tile.args[1];
const void *__restrict__ third = tile.args[2];
Just today I saw adding __restrict__ outperform caching in shared memory (in another kernel). The speedup was 1.7x.
CI MESSAGE: [1680100]: BUILD FAILED
 * based on `as_ptr`
 */
template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void *__restrict__ ptr, DALIDataType type_id) {
template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
TYPE_SWITCH(type_id, type2id, AccessType, ARITHMETIC_ALLOWED_TYPES, (
const auto *access = reinterpret_cast<const AccessType*>(ptr);
const auto *__restrict__ access = reinterpret_cast<const AccessType*>(ptr);
}

template <typename T>
DALI_HOST_DEV T Access(const T* ptr, int64_t idx) {
DALI_HOST_DEV T Access(const T* __restrict__ ptr, int64_t idx) {
}

template <typename T>
DALI_HOST_DEV T Access(const void* ptr, int64_t idx, DALIDataType type_id) {
DALI_HOST_DEV T Access(const void* __restrict__ ptr, int64_t idx, DALIDataType type_id) {
I think there could be some gain from doing this:
const ExtendedTileDesc *__restrict__ tiles
The tile is read many times, so making it cacheable is desirable. There is some type-punning going on, so the compiler might be quite conservative here.
You can try restrict, but it's optional.
@@ -0,0 +1,57 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
done
}
}
DALIDataType result = BinaryTypePromotion(types[0], types[1]);
if (types.size() > 2) {
This `if` is redundant.
Yes, removed 👍
DALI_HOST_DEV static constexpr result_t<L, R> impl(L l, R r) {
auto l_ = static_cast<result_t<L, R>>(l);
auto r_ = static_cast<result_t<L, R>>(r);
return l_ <= r_ ? l_ : r_;
return l_ < r_ ? l_ : r_;
This would work as well, right?
Yes, will change
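A small Python check of the claim that `<` and `<=` give the same minimum. The only difference is which operand is returned on a tie: `<=` returns the left one, `<` the right one, and for equal values those are the same. `min_le` and `min_lt` are hypothetical names, and the check only covers a small range of integers.

```python
import itertools


def min_le(l, r):
    """Minimum using <=, as in the original line."""
    return l if l <= r else r


def min_lt(l, r):
    """Minimum using <, as in the suggestion."""
    return l if l < r else r


# Compare both variants on every pair drawn from a small integer range.
values = range(-3, 4)
agree = all(
    min_le(a, b) == min_lt(a, b)
    for a, b in itertools.product(values, repeat=2)
)
```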
@@ -0,0 +1,24 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
done
@@ -0,0 +1,25 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
done
docs/math.rst
Outdated
.. note::
    Type promotion is commutative.

For more than two arguments, the resulting type is calculated as reduction from left to right
For more than two arguments, the resulting type is calculated as a reduction from left to right
done
docs/math.rst
Outdated

For more than two arguments, the resulting type is calculated as reduction from left to right
- first calculating the result of operating on first two arguments, next between that intermediate
result and thirs argument and so on, untill we have only the result type left.
result and the third argument and so on, until we have only the result type left.
done
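The left-to-right reduction described in the docs text can be sketched with `functools.reduce`. The ranking in `TOY_RANK` is a made-up toy and is not DALI's actual promotion table; `binary_type_promotion` and `promote` are hypothetical Python names that only echo the `BinaryTypePromotion` call seen in the C++ diff.

```python
from functools import reduce

# Toy ranking for illustration only, NOT DALI's real promotion rules.
TOY_RANK = {"bool": 0, "int32": 1, "float32": 2, "float64": 3}


def binary_type_promotion(left, right):
    """Pick the higher-ranked of two type names (toy rule)."""
    return left if TOY_RANK[left] >= TOY_RANK[right] else right


def promote(types):
    """Fold the binary rule over the argument types from left to right."""
    return reduce(binary_type_promotion, types)
```

For three arguments this computes `binary_type_promotion(binary_type_promotion(t0, t1), t2)`, which matches the wording "first two arguments, then the intermediate result and the third".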
docs/math.rst
Outdated
Similarly to arithmetic expressions, one can use selected mathematical functions in the Pipeline
graph definition. They also accept :class:`nvidia.dali.pipeline.DataNode`,
:meth:`nvidia.dali.types.Constant` or regular Python value of type ``bool``, ``int``, or ``float``
as arguments. At least one of the inputs must be output of other DALI Operator.
as arguments. At least one of the inputs must be the output of other DALI Operator.
done
docs/supported_ops.rst
Outdated
from invoking other operators. Full documentation can be found in the dedicated documentation
from invoking other operators. Full documentation can be found in the dedicated section of the documentation
Done
docs/supported_ops.rst
Outdated
from invoking other operators. Full documentation can be found in the dedicated documentation
for :ref:`mathematical expressions`.
:ref:`mathematical expressions`.
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
dali/operators/math/expressions/expression_factory_instances/expression_impl_factory.h
!build
CI MESSAGE: [1682623]: BUILD STARTED
The same behaviour for invalid ranges Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build
CI MESSAGE: [1682793]: BUILD STARTED
CI MESSAGE: [1682623]: BUILD FAILED
CI MESSAGE: [1682793]: BUILD PASSED
Why we need this PR?
Add min(a, b), max(a, b), clamp(v, lo, hi) as arithmetic ops.
Todo:
What happened in this PR?
Added support for ternary operators + their type/value switching.
Arithm Ops
Where to put named arithm op functions in DALI Python package?
Nosetest that doesn't end in L1, probably should limit that
TODO
JIRA TASK: [DALI-1628]