-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NewWarpAffine -> WarpAffine; optimize CPU warp for affine mapping. #1387
Conversation
@@ -34,6 +34,12 @@ namespace dali { | |||
|
|||
#define SAVE_TMP_IMAGES 0 | |||
|
|||
#if SAVE_TMP_IMAGES | |||
namespace { | |||
int tmp_img_idx = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global indexing - avoid overwriting results from different template instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we consider any thread safety here?
|
||
Sampler<static_interp, InputType> sampler(in); | ||
|
||
vec2 dsdx = mapping.transform.col(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since affine mapping is linear, we can do this trick to have one vector addition instead of matrix multiplication per pixel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd write that as a comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Also, I'll add tiling to prevent excessive accumulation of error in case of very wide images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
1fb6d43
to
fc9644b
Compare
!build |
CI MESSAGE: [945947]: BUILD STARTED |
CI MESSAGE: [945947]: BUILD FAILED |
!build |
CI MESSAGE: [947389]: BUILD STARTED |
CI MESSAGE: [947389]: BUILD FAILED |
@@ -85,38 +87,56 @@ class WarpCPU { | |||
} | |||
|
|||
private: | |||
template <DALIInterpType static_interp> | |||
template <DALIInterpType static_interp, typename Mapping_> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not Mapping
(no underscore) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because when Mapping is AffineMapping2D, then we've got multiple function definition.
|
||
Sampler<static_interp, InputType> sampler(in); | ||
|
||
vec2 dsdx = mapping.transform.col(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd write that as a comment here
fill_value = 128, | ||
interp_type = types.INTERP_LINEAR, | ||
use_image_center = True) | ||
matrix = [1.0, 0.8, -0.8*112, 0.0, 1.2, -0.2*112], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you fix the indentation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's something more substantial, I'll do that, otherwise it'll go with rotate PR. Is that OK?
@@ -63,9 +63,9 @@ def warp_fixed(img): | |||
return warp_fn | |||
|
|||
|
|||
class NewWarpPipeline(Pipeline): | |||
class WarpPipeline(Pipeline): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you run (locally is enough) a simple test that compares WarpPipeline
to NewWarpPipeline
(use compare_pipelines
for that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a test in test_pipeline.py which compares against OpenCV and it still passes after the switch - I only needed to remove the use_image_center
flag, which now is not supported (it makes little sense for WarpAffine).
dali/benchmark/CMakeLists.txt
Outdated
@@ -25,6 +25,7 @@ if (BUILD_BENCHMARK) | |||
"${CMAKE_CURRENT_SOURCE_DIR}/displacement_cpu_bench.cc" | |||
"${CMAKE_CURRENT_SOURCE_DIR}/crop_bench.cc" | |||
"${CMAKE_CURRENT_SOURCE_DIR}/crop_mirror_normalize_bench.cc" | |||
"${CMAKE_CURRENT_SOURCE_DIR}/warp_affine_gpu_bench.cc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: warp affine CPU benchmark
const OpArg params = {"matrix", "1.0, 0.8, 0.0, 0.0, 1.2, 0.0", DALI_FLOAT_VEC}; | ||
this->RunTest("WarpAffine", ¶ms, 1); | ||
this->RunTest("OldWarpAffine", ¶ms, 1, false, 0.1 /* 0.1 percent */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After we:
- verify (with compare_pipelines in python) that both WarpAffine and OldWarpAffine produce the same results (with a reasonable eps)
- Fix and run the CPU benchmarks and there is no performance degradation
I'd say we removeOldWarpAffine
completely
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
dc82910
to
4edfca4
Compare
!build |
CI MESSAGE: [949161]: BUILD STARTED |
Removed OldWarpAffine. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
!build |
CI MESSAGE: [949199]: BUILD STARTED |
CI MESSAGE: [949199]: BUILD PASSED |
@@ -97,6 +97,7 @@ DALI_HOST_DEV | |||
constexpr vec<out_n> affine(const mat<out_n, in_n + 1> &transform, const vec<in_n> &v) { | |||
vec<out_n> out = {}; | |||
for (int i = 0; i < out_n; i++) { | |||
// NOTE: accumulating directly in out[i] prodced noticeably slower code in GCC 7.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// NOTE: accumulating directly in out[i] prodced noticeably slower code in GCC 7.4 | |
// NOTE: accumulating directly in out[i] produced noticeably slower code in GCC 7.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Thanks! Will do the same as with indent.
Signed-off-by: Michal Zientkiewicz michalz@nvidia.com
Why we need this PR?
What happened in this PR?
output_type
tooutput_dtype
andborder
tofill_value
to match old APIJIRA TASK: [DALI-1095]