As of 5eacc87, the motion feature extractor has been reimplemented to operate on integer buffers. This has already resulted in a considerable speedup (see below), but there is still room for more optimization. Although motion is already the fastest feature extractor, it is a good first one to optimize: it is relatively simple, and the work will lay the foundations for further optimizations on the VIF and ADM feature extractors in the coming weeks.
commit 5eacc87ba1eb8c4c7ce543333de5b56e75bc6218
Author: Kyle Swanson <kswanson@netflix.com>
Date: Thu Apr 30 10:37:40 2020 -0700
libvmaf: implement fixed point motion feature extractor
Compared to the previous implementation, this is a ~4x speedup
float_motion motion
fps="37.07" fps="153.00"
Co-authored-by: IttiamVijayakumarGR <62744303+IttiamVijayakumarGR@users.noreply.github.com>
The algorithm for this feature extractor is as follows:
- Blur the input buffers using a 5-tap Gaussian convolution.
- Calculate the SAD between these blurred buffers.
- Normalize the SAD score.
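The three steps above can be sketched in plain C as follows. This is only a minimal illustration for the 8-bit path, not the code in `src/feature/integer_motion.c`: the binomial kernel weights, the edge-replicating border handling, the normalization by pixel count, and all function names (`blur_5tap`, `sad`, `motion_score`) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative 5-tap binomial weights (sum = 16). This is an assumption:
 * the coefficients libvmaf actually uses may differ. */
static const int kernel[5] = { 1, 4, 6, 4, 1 };

/* Clamp an index into [0, n - 1]; edge replication is an assumed border policy. */
static int clamp_idx(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Step 1: separable blur of an 8-bit plane. tmp and dst each hold w * h entries. */
static void blur_5tap(const uint8_t *src, uint16_t *tmp, uint16_t *dst,
                      int w, int h)
{
    assert(w > 0 && h > 0);

    /* Vertical pass: keep the unnormalized sum (at most 255 * 16 = 4080). */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int acc = 0;
            for (int k = 0; k < 5; k++)
                acc += kernel[k] * src[clamp_idx(y + k - 2, h) * w + x];
            tmp[y * w + x] = (uint16_t)acc;
        }
    }

    /* Horizontal pass: divide by 16 * 16 = 256 with rounding. */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int acc = 0;
            for (int k = 0; k < 5; k++)
                acc += kernel[k] * tmp[y * w + clamp_idx(x + k - 2, w)];
            dst[y * w + x] = (uint16_t)((acc + 128) >> 8);
        }
    }
}

/* Step 2: sum of absolute differences between two blurred planes. */
static uint64_t sad(const uint16_t *a, const uint16_t *b, int w, int h)
{
    uint64_t sum = 0;
    for (int i = 0; i < w * h; i++)
        sum += (uint64_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Step 3: normalize by the pixel count (an assumed normalization). */
static double motion_score(uint64_t sad_sum, int w, int h)
{
    return (double)sad_sum / ((double)w * (double)h);
}
```

Because the inner loops are simple multiply-accumulate and absolute-difference reductions over contiguous rows, they map naturally onto SIMD lanes, which is why these two functions are the obvious candidates for AVX2 and AVX-512 versions.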
In terms of time spent in this code, it is mostly convolution (64.5%), with the SAD calculation in second place (12.7%). Please note that for the convolution there is a separate path for 8-bit inputs and 10-bit inputs. Please download the SVG flame graph and have a look at it in a web browser; you can click through call stacks and see the time spent in each individual function.
Both the SAD function and the convolution function are set in the feature extractor initialization via function pointers here. These functions should be re-implemented using AVX2 and/or AVX-512, and the function pointers should be set to the new versions if those instruction sets are available and not masked. For historical reasons, there is a global variable called cpu where you may read these CPU flags (this global variable will be moved into the feature extractor context soon, but for now use the global variable).
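As a rough sketch of the dispatch pattern described above, the initialization could pick a SAD implementation like this. Everything here is hypothetical and for illustration only: the `DEMO_CPU_*` flag bits, the `DemoMotionState` struct, the function names, and the assumption that a set bit in the mask disables the matching instruction set. The real flag values and the `cpu` global are defined in libvmaf itself, and the stand-in "SIMD" functions below simply call the C version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bits, not libvmaf's real values. */
enum { DEMO_CPU_AVX2 = 1u << 0, DEMO_CPU_AVX512 = 1u << 1 };

typedef uint64_t (*demo_sad_fn)(const uint16_t *a, const uint16_t *b,
                                unsigned w, unsigned h);

/* Hypothetical per-extractor state holding the selected function. */
typedef struct {
    demo_sad_fn sad;
} DemoMotionState;

/* Pure C reference implementation. */
static uint64_t demo_sad_c(const uint16_t *a, const uint16_t *b,
                           unsigned w, unsigned h)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < (size_t)w * h; i++)
        sum += (uint64_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Stand-ins for SIMD versions, which would live in their own files
 * (for example under src/feature/x86/) and be exposed through a header. */
static uint64_t demo_sad_avx2(const uint16_t *a, const uint16_t *b,
                              unsigned w, unsigned h)
{
    return demo_sad_c(a, b, w, h);
}

static uint64_t demo_sad_avx512(const uint16_t *a, const uint16_t *b,
                                unsigned w, unsigned h)
{
    return demo_sad_c(a, b, w, h);
}

/* Start from the C version, then upgrade if a flag is set and not masked. */
static void demo_motion_init(DemoMotionState *s, unsigned cpu_flags,
                             unsigned cpu_mask)
{
    assert(s != NULL);
    unsigned usable = cpu_flags & ~cpu_mask;

    s->sad = demo_sad_c;
    if (usable & DEMO_CPU_AVX2)
        s->sad = demo_sad_avx2;
    if (usable & DEMO_CPU_AVX512)
        s->sad = demo_sad_avx512;
}
```

With this layout, a mask with all bits set (what a value of -1 amounts to when read as an unsigned mask) leaves `usable` at zero, so the C reference is always selected. That makes it straightforward to compare a SIMD version against the reference on the same machine.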
As for implementation, please leave src/feature/integer_motion.c pure C. This file should serve as a C reference implementation, but also be cross-platform and free of #ifdefs, assembly, and intrinsics. If you write an AVX-512 version of one of these functions, please do so in a new file, src/feature/x86/motion_avx512.c, and provide a header which exposes your functions.
To run vmaf_rc with only the motion feature extractor enabled, use the following command. To manually mask the CPU instruction sets and force the C versions of the functions, additionally set the --cpumask flag to -1.
./build/tools/vmaf_rc \
--no_prediction \
--reference ./y4ms/ducks.y4m \
--distorted ./y4ms/ducks_dist.y4m \
--output output.xml \
--feature motion
To measure speedups, I've been relying on /usr/bin/time, but you may also read the fps from the output XML.