Separable convolution #2009
Force-pushed from 398df59 to 206941e.
!build
CI MESSAGE: [1395861]: BUILD STARTED
Wraps Convolution CPU kernel
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Force-pushed from 206941e to 3ee9b3e.
!build
CI MESSAGE: [1395876]: BUILD STARTED
CI MESSAGE: [1395928]: BUILD STARTED
void Run(KernelContext& ctx, const TensorView<StorageCPU, Out, ndim> out,
         const TensorView<StorageCPU, const In, ndim>& in,
         const std::array<TensorView<StorageCPU, const W, 1>, axes>& windows,
         const std::array<W, axes>& scales = uniform_array<axes, W>(1.f)) {
Any real life use case for per-axis scale? They will be multiplied anyway.
Not really, got carried away, will simplify.
It's still there - simply missed or intentional?
Missed, now it's here
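For illustration, a standalone sketch (not DALI code) of the point above: since the separable passes compose multiplicatively, per-axis scales are equivalent to a single scale equal to their product, so one scalar parameter suffices.

// Standalone sketch, not DALI code: per-axis scales s0, s1, ... applied in
// successive separable passes end up multiplied together, so they can be
// collapsed into a single scale factor.
#include <array>
#include <cstdio>
#include <functional>
#include <numeric>

int main() {
  std::array<float, 3> per_axis_scales = {0.5f, 2.0f, 0.25f};
  float combined = std::accumulate(per_axis_scales.begin(), per_axis_scales.end(),
                                   1.0f, std::multiplies<float>());
  std::printf("equivalent single scale: %f\n", combined);  // prints 0.250000
}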
KernelRequirements req;

ScratchpadEstimator se;
se.add<W>(AllocType::Host, volume(in_shape));
Why do you want the intermediate image to be of type W? What if you have float input/output and integer weights?
Also, you don't need this buffer when your intermediate element type is the same as Out.
Why do you want the intermediate image to be of type W? What if you have float input/output and integer weights?

For now I assume that W is float. It can be generalized as well, and we can parametrize every possible step here with a configurable type. Do you want me to do it?

Also, you don't need this buffer when your intermediate element type is the same as Out.

Yes, I don't need it, but doing the first step in place would be slower, and for everything other than W == Out the buffer is still needed, so I just opted for that.
As for the intermediate data, maybe it would indeed be better to just store the result of the arithmetic operation. Will do that.
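A minimal standalone sketch (not the actual DALI code) of the approach agreed on above, i.e. deriving the intermediate element type from "the result of the arithmetic operation" rather than hard-coding W, so float input with integer weights keeps a floating-point intermediate:

// Standalone sketch, not DALI code: derive the intermediate element type from
// the multiplication of input and window types instead of hard-coding W.
#include <type_traits>
#include <utility>

template <typename In, typename W>
using Intermediate = decltype(std::declval<In>() * std::declval<W>());

// float input with integer weights -> float intermediate, not int.
static_assert(std::is_same<Intermediate<float, int>, float>::value, "");
// uint8 input with float weights -> float intermediate.
static_assert(std::is_same<Intermediate<unsigned char, float>, float>::value, "");

int main() {}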
template <typename Out, typename In, typename W>
struct SeparableConvolutionCpu<Out, In, W, 1, false>
    : public SeparableConvolutionCpuImpl<Out, In, W, 1, false> {};

template <typename Out, typename In, typename W>
struct SeparableConvolutionCpu<Out, In, W, 2, true>
    : public SeparableConvolutionCpuImpl<Out, In, W, 1, true> {};

template <typename Out, typename In, typename W>
struct SeparableConvolutionCpu<Out, In, W, 2, false>
    : public SeparableConvolutionCpuImpl<Out, In, W, 2, false> {};

template <typename Out, typename In, typename W>
struct SeparableConvolutionCpu<Out, In, W, 3, true>
    : public SeparableConvolutionCpuImpl<Out, In, W, 2, true> {};

template <typename Out, typename In, typename W>
struct SeparableConvolutionCpu<Out, In, W, 3, false>
    : public SeparableConvolutionCpuImpl<Out, In, W, 3, false> {};

template <typename Out, typename In, typename W>
struct SeparableConvolutionCpu<Out, In, W, 4, true>
    : public SeparableConvolutionCpuImpl<Out, In, W, 3, true> {};
What is it for? Why not just rename XxxImpl to Xxx?
So the kernel is parametrized with the number of actual dimensions, and not only the data dimensions plus a bool for the channels.
It's a personal opinion, but I find it harder to use this way. I mean, I find it more intuitive to parameterize 2D convolution with and without channels as <2>, not as <2> or <3>. From the operator standpoint, you still need to handle it explicitly, either in a value switch or an if, so there's no difference.
Removed; now it's less code, which has some benefits.
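For illustration, a standalone mock (not the DALI operator code; all names here are made up) of the point raised above: however the kernel template is specialized, the operator still has to branch explicitly on whether a channel dimension is present, e.g. with an if or a value switch.

// Standalone mock, not DALI code: the operator-side dispatch looks essentially
// the same whether the kernel is parametrized by total dimensions or by
// spatial axes plus a channel flag.
#include <cstdio>

template <int axes, bool has_channels>
struct MockSeparableConv {
  void Run() const {
    std::printf("axes=%d, channels=%d\n", axes, static_cast<int>(has_channels));
  }
};

void RunOperator(bool has_channels) {
  if (has_channels) {
    MockSeparableConv<2, true> k;   // 2D data with an innermost channel dimension
    k.Run();
  } else {
    MockSeparableConv<2, false> k;  // plain 2D data
    k.Run();
  }
}

int main() {
  RunOperator(true);
  RunOperator(false);
}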
* TODO(klecki): For more dimensions, fusing a permute step when writing the result
* could allow processing all steps with the innermost, contiguous dimension.
* For example DHWC->DWHC->HWDC->DHWC, while applying convolutions for W, H, D respectively.
I wouldn't mark it as TODO - it doesn't seem like a good idea to transpose on the fly; when we employ some automatic vectorization, it will most certainly be defeated by transposition.
Done
KernelRequirements req;

ScratchpadEstimator se;
se.add<W>(AllocType::Host, volume(in_shape));
Likewise - you may get by without this buffer.
I know, but then I would probably get a comment that it can be faster. I also left it the same for all variants for simplicity.
CI MESSAGE: [1395876]: BUILD PASSED
CI MESSAGE: [1395928]: BUILD PASSED
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build
CI MESSAGE: [1406047]: BUILD STARTED
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
CI MESSAGE: [1406047]: BUILD PASSED
!build
CI MESSAGE: [1406394]: BUILD STARTED
CI MESSAGE: [1406394]: BUILD FAILED
!build
CI MESSAGE: [1408269]: BUILD STARTED
CI MESSAGE: [1408269]: BUILD PASSED
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build
CI MESSAGE: [1408534]: BUILD STARTED
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build
CI MESSAGE: [1408572]: BUILD STARTED
CI MESSAGE: [1408572]: BUILD PASSED
Why we need this PR?
Adds separable convolution using the Convolution kernel, with one pass over each axis.
What happened in this PR?
SeparableConvolution kernel built by using several passes of the Convolution kernel (a standalone sketch of the idea is included below).
Kernels, kernels tests
Nothing fancy, mostly boilerplate
GTest for the kernel; the baseline implementation moved to a separate file.
[ Describe here if documentation and examples were updated. ]
JIRA TASK: [DALI-1425]
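For readers new to the idea, a standalone sketch (not the DALI kernel, and much simpler than it) of what "several passes of the Convolution kernel" means: a separable 2D convolution is computed as two 1D convolution passes, one per axis, with the intermediate image feeding the second pass.

// Standalone sketch, not DALI code: separable 2D convolution as two 1D passes.
#include <algorithm>
#include <cstdio>
#include <vector>

// Naive 1D convolution along one axis of a row-major H x W image (borders clamped).
std::vector<float> ConvolveAxis(const std::vector<float>& in, int H, int W,
                                const std::vector<float>& window, bool along_rows) {
  std::vector<float> out(in.size());
  int r = static_cast<int>(window.size()) / 2;
  for (int y = 0; y < H; y++) {
    for (int x = 0; x < W; x++) {
      float acc = 0.f;
      for (int k = -r; k <= r; k++) {
        int yy = y, xx = x;
        if (along_rows)
          xx = std::min(std::max(x + k, 0), W - 1);
        else
          yy = std::min(std::max(y + k, 0), H - 1);
        acc += in[yy * W + xx] * window[k + r];
      }
      out[y * W + x] = acc;
    }
  }
  return out;
}

int main() {
  const int H = 4, W = 4;
  std::vector<float> img(H * W, 1.f);
  std::vector<float> win = {0.25f, 0.5f, 0.25f};                  // normalized 1D window
  auto tmp = ConvolveAxis(img, H, W, win, /*along_rows=*/true);   // pass over the W axis
  auto out = ConvolveAxis(tmp, H, W, win, /*along_rows=*/false);  // pass over the H axis
  std::printf("out[0] = %f\n", out[0]);  // constant image stays constant: 1.000000
}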