Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NewWarpAffine -> WarpAffine; optimize CPU warp for affine mapping. #1387

Merged
merged 5 commits into from
Oct 18, 2019

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Oct 15, 2019

Signed-off-by: Michal Zientkiewicz michalz@nvidia.com

Why we need this PR?

  • It replaces old WarpAffine with new implementation

What happened in this PR?

  • Renaming
  • Adjusted tests
  • Added specialized implementation for CPU warp affine (strength reduction)
  • Rename output_type to output_dtype and border to fill_value to match old API

JIRA TASK: [DALI-1095]

@@ -34,6 +34,12 @@ namespace dali {

#define SAVE_TMP_IMAGES 0

#if SAVE_TMP_IMAGES
namespace {
int tmp_img_idx = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global indexing - avoid overwriting results from different template instance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we consider any thread safety here?


Sampler<static_interp, InputType> sampler(in);

vec2 dsdx = mapping.transform.col(0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since affine mapping is linear, we can do this trick to have one vector addition instead of matrix multiplication per pixel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd write that as a comment here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Also, I'll add tiling to prevent excessive accumulation of error in case of very wide images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mzient mzient force-pushed the ReplaceWarp branch 2 times, most recently from 1fb6d43 to fc9644b Compare October 15, 2019 18:38
@mzient
Copy link
Contributor Author

mzient commented Oct 15, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [945947]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [945947]: BUILD FAILED

@mzient
Copy link
Contributor Author

mzient commented Oct 16, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [947389]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [947389]: BUILD FAILED

@@ -85,38 +87,56 @@ class WarpCPU {
}

private:
template <DALIInterpType static_interp>
template <DALIInterpType static_interp, typename Mapping_>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not Mapping (no underscore) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because when Mapping is AffineMapping2D, then we've got multiple function definition.


Sampler<static_interp, InputType> sampler(in);

vec2 dsdx = mapping.transform.col(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd write that as a comment here

fill_value = 128,
interp_type = types.INTERP_LINEAR,
use_image_center = True)
matrix = [1.0, 0.8, -0.8*112, 0.0, 1.2, -0.2*112],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you fix the indentation here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's something more substantial, I'll do that, otherwise it'll go with rotate PR. Is that OK?

@@ -63,9 +63,9 @@ def warp_fixed(img):
return warp_fn


class NewWarpPipeline(Pipeline):
class WarpPipeline(Pipeline):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you run (locally is enough) a simple test that compares WarpPipeline to NewWarpPipeline (use compare_pipelines for that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test in test_pipeline.py which compares against OpenCV and it still passes after the switch - I only needed to remove the use_image_center flag, which now is not supported (it makes little sense for WarpAffine).

@@ -25,6 +25,7 @@ if (BUILD_BENCHMARK)
"${CMAKE_CURRENT_SOURCE_DIR}/displacement_cpu_bench.cc"
"${CMAKE_CURRENT_SOURCE_DIR}/crop_bench.cc"
"${CMAKE_CURRENT_SOURCE_DIR}/crop_mirror_normalize_bench.cc"
"${CMAKE_CURRENT_SOURCE_DIR}/warp_affine_gpu_bench.cc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: warp affine CPU benchmark

const OpArg params = {"matrix", "1.0, 0.8, 0.0, 0.0, 1.2, 0.0", DALI_FLOAT_VEC};
this->RunTest("WarpAffine", &params, 1);
this->RunTest("OldWarpAffine", &params, 1, false, 0.1 /* 0.1 percent */);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we:

  1. verify (with compare_pipelines in python) that both WarpAffine and OldWarpAffine produce the same results (with a reasonable eps)
  2. Fix and run the CPU benchmarks and there is no performance degradation
    I'd say we remove OldWarpAffine completely

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient mzient force-pushed the ReplaceWarp branch 3 times, most recently from dc82910 to 4edfca4 Compare October 17, 2019 15:08
@mzient
Copy link
Contributor Author

mzient commented Oct 17, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [949161]: BUILD STARTED

Removed OldWarpAffine.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Copy link
Contributor Author

mzient commented Oct 17, 2019

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [949199]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [949199]: BUILD PASSED

@@ -97,6 +97,7 @@ DALI_HOST_DEV
constexpr vec<out_n> affine(const mat<out_n, in_n + 1> &transform, const vec<in_n> &v) {
vec<out_n> out = {};
for (int i = 0; i < out_n; i++) {
// NOTE: accumulating directly in out[i] prodced noticeably slower code in GCC 7.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// NOTE: accumulating directly in out[i] prodced noticeably slower code in GCC 7.4
// NOTE: accumulating directly in out[i] produced noticeably slower code in GCC 7.4

Copy link
Contributor Author

@mzient mzient Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Thanks! Will do the same as with indent.

@mzient mzient merged commit 1e5b845 into NVIDIA:master Oct 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants