Reductions: min, max #2342
Conversation
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1688761]: BUILD STARTED
DALI_SCHEMA(ReduceBase)
  .AddOptionalArg(
      "axes",
Also support "axis_names", perhaps?
For sure. How about doing that in a follow-up PR? It is a nice extension of the API, but the op can already be useful as it is.
Fine with me. There are already several operators using it (the SliceAttr family, Erase, ...). Maybe it'd be good to make a PR that extracts that into an AxesAttr or something similar, so we can reuse the implementation and the argument documentation.
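For context, a minimal sketch of what an "axis_names" argument could add on top of "axes": translating dimension names from a layout string (e.g. "HWC") into the integer axis indices the reduction already accepts. The function name and behavior below are illustrative, not DALI's actual API.

```python
def axis_names_to_axes(layout, axis_names):
    """Map each named dimension in `axis_names` to its index in `layout`."""
    missing = [n for n in axis_names if n not in layout]
    if missing:
        raise ValueError(f"dimensions {missing} not in layout {layout!r}")
    return tuple(layout.index(n) for n in axis_names)

# Reducing over height and width of an "HWC" image:
assert axis_names_to_axes("HWC", "HW") == (0, 1)
# Channel-only reduction:
assert axis_names_to_axes("HWC", "C") == (2,)
```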
CI MESSAGE: [1688761]: BUILD FAILED
CI MESSAGE: [1688761]: BUILD PASSED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1691238]: BUILD STARTED
CI MESSAGE: [1691238]: BUILD FAILED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
pipe = Pipeline(batch_size=batch_size, num_threads=4, device_id=0)

with pipe:
    input = fn.external_source(source = get_batch)
Can't you just:

Suggested change:
- input = fn.external_source(source = get_batch)
+ input = fn.external_source(source = batch_fn)
No. The ExternalSource API currently does not work with some callables. I tried a bound method and a partial, and both failed on a check inside. I think it could be reworked, but not in this PR. For now I just wrapped it in an ad hoc regular function so it works.
I put a comment about it.
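The workaround described above can be sketched like this: a callable that ExternalSource rejects (e.g. a `functools.partial`) is wrapped in a plain function. `fn.external_source` is DALI's API; the helper names below are illustrative.

```python
from functools import partial

def make_batch(batch_size):
    """Produce a dummy batch of the requested size."""
    return list(range(batch_size))

bound = partial(make_batch, 8)   # the kind of callable that failed the check

def get_batch():                 # ad hoc plain-function wrapper
    return bound()

# The pipeline then uses the wrapper instead of the partial:
#   input = fn.external_source(source=get_batch)
assert get_batch() == list(range(8))
```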
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1693996]: BUILD STARTED
dali/kernels/reduce/reduce_gpu.cu (Outdated)
template class SumGPU<uint16_t, uint16_t>;
template class SumGPU<uint8_t, uint8_t>;
One crazy idea: if we plan to have many reductions that follow the same pattern, we could have a:
#define REDUCTION_IMPL(Kernel, Impl) \
template <typename Out, typename In> \
class Kernel<Out, In>::Impl : public Impl<Out, In> { \
...
Basically it'd cover all the repeated code (including template instantiation) and later you just do:
REDUCTION_IMPL(MinGPU, reduce_impl::MinImplGPU);
REDUCTION_IMPL(MaxGPU, reduce_impl::MaxImplGPU);
...
I have mixed feelings about having so much inside a macro, but on the other hand, there's a lot of boilerplate here now. Second opinion maybe? @mzient ?
CI MESSAGE: [1693996]: BUILD PASSED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1697300]: BUILD STARTED
@@ -1263,6 +1263,18 @@ class SumImplGPU : public ReduceImplGPU<Out, In, default_sum_acc_t<Out, In>, Sum
  reductions::sum GetReduction() const { return {}; }
};

template <typename Out, typename In>
class MinImplGPU : public ReduceImplGPU<Out, In, default_sum_acc_t<Out, In>, MinImplGPU<Out, In>> {
Suggested change:
- class MinImplGPU : public ReduceImplGPU<Out, In, default_sum_acc_t<Out, In>, MinImplGPU<Out, In>> {
+ class MinImplGPU : public ReduceImplGPU<Out, In, In, MinImplGPU<Out, In>> {
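The suggestion above hinges on accumulator types: a sum of many narrow values needs a wider accumulator, while the min/max of values of a type is always itself a value of that type. A small sketch of that difference, simulating uint8 arithmetic in plain Python (the `wrap_u8` helper is illustrative):

```python
def wrap_u8(x):
    """Simulate uint8 arithmetic: wrap modulo 256."""
    return x & 0xFF

values = [200, 100, 50]          # each value fits in uint8

# Summing in a uint8 accumulator wraps around...
acc = 0
for v in values:
    acc = wrap_u8(acc + v)
print(acc)                       # 94, not the true sum 350

# ...but min/max never leave the uint8 range.
print(min(values), max(values))  # 50 200
```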
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1697500]: BUILD STARTED
CI MESSAGE: [1697500]: BUILD PASSED
Why do we need this PR?
What happened in this PR?
Added CPU operator for reductions: sum, min, max
Operators, Kernels
Operator implementation
Added a Python test comparing against NumPy
Added docstrings
JIRA TASK: [DALI-1621]
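The testing pattern mentioned in the description (operator output compared against a reference) can be sketched without DALI as follows; the actual test compares the operator against NumPy, while here both the implementation and the reference values are plain Python for self-containment.

```python
def reduce_2d(data, op, axis):
    """Reduce a 2D list with `op` (e.g. min or max) along `axis`."""
    if axis == 0:   # reduce over rows: one result per column
        return [op(row[c] for row in data) for c in range(len(data[0]))]
    else:           # reduce over columns: one result per row
        return [op(row) for row in data]

batch = [[3, 1, 4],
         [1, 5, 9]]

# Implementation output vs. hand-computed reference, per axis and op.
assert reduce_2d(batch, min, axis=0) == [1, 1, 4]
assert reduce_2d(batch, max, axis=0) == [3, 5, 9]
assert reduce_2d(batch, min, axis=1) == [1, 1]
assert reduce_2d(batch, max, axis=1) == [4, 9]
```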