Optimise dedispersion routine #62
After a discussion with @csbnw, we noticed the following:
Note that the original code has more granular benchmarking. We should add the same here to enable a more direct comparison, but in release mode.
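The more granular benchmarking discussed in this comment could look roughly like the following minimal sketch. `StageTimer`, its `Scope` guard and the stage names are hypothetical and not part of either codebase; the sketch simply accumulates wall-clock time per named stage with `std::chrono::steady_clock`, so that initialization, FFTs, the kernel and data copies can be reported separately. It assumes host-side timing: if a stage runs asynchronously (e.g. on a GPU), the caller has to synchronize before the guard goes out of scope, otherwise only launch overhead is measured.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of either codebase): accumulates wall-clock
// time per named stage so that e.g. initialization, FFTs, the dedispersion
// kernel and data copies can be reported separately.
class StageTimer {
public:
    using Clock = std::chrono::steady_clock;

    // RAII guard: adds the time between construction and destruction to the
    // given stage. If the stage runs asynchronously (e.g. on a GPU), the
    // caller must synchronize before the guard is destroyed.
    class Scope {
    public:
        Scope(StageTimer& timer, std::string stage)
            : timer_(timer), stage_(std::move(stage)), start_(Clock::now()) {}
        ~Scope() { timer_.add(stage_, Clock::now() - start_); }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        StageTimer& timer_;
        std::string stage_;
        Clock::time_point start_;
    };

    // Accumulate an elapsed duration for a stage (repeated calls add up).
    void add(const std::string& stage, Clock::duration elapsed) {
        totals_[stage] += elapsed;
    }

    // Total time spent in a stage, in seconds (0 if never measured).
    double seconds(const std::string& stage) const {
        const auto it = totals_.find(stage);
        if (it == totals_.end()) {
            return 0.0;
        }
        return std::chrono::duration<double>(it->second).count();
    }

    // Print one line per stage, sorted by stage name.
    void report() const {
        for (const auto& entry : totals_) {
            std::printf("%-16s %8.3f s\n", entry.first.c_str(),
                        std::chrono::duration<double>(entry.second).count());
        }
    }

private:
    std::map<std::string, Clock::duration> totals_;
};
```

Each stage would be wrapped in its own block, e.g. `{ StageTimer::Scope scope(timer, "fft"); /* run FFTs */ }`, with a single `timer.report()` at the end of the run. Keeping the totals in a `std::map` means a stage that is executed several times (for example per batch) is summed rather than overwritten.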
Added the timings.

Conclusions:
Whoops. I was comparing the PACE variant that uses the reference kernel with the optimized FDD kernel of the original implementation. Using both … This matches the PACE implementation! The PACE implementation is 3% slower, but I think that's to be expected. Surprisingly, our kernel is slightly faster, but the initialization, FFTs, and data copying take more time in our implementation.
csbnw left a comment:
I have one minor comment, other than that it looks good to go. 🙌
Apply suggestion

Co-authored-by: Bram Veenboer <bram.veenboer@gmail.com>
This MR relates to #58, with the goal of bringing the execution time of the dedispersion kernel down to 1.5-2 seconds. Here we specifically use the `testfdd` binary as a benchmark.

Some improvements, in no particular order:
- Use `std::cosf()` and `std::sinf()` instead of `std::cos()` and `std::sin()`, which seems to speed up the kernel by around 0.5 seconds.

Other additions:
- `FDDPlan::execute()`
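The `std::cosf()`/`std::sinf()` change listed above is about keeping the trigonometry in single precision. One plausible explanation for the ~0.5 s gain, which is an assumption and not something this MR states, is that the phase argument was being evaluated in double precision (for example because of a double literal such as `2.0` or `M_PI`), so `std::cos()` and `std::sin()` resolved to their `double` overloads. The minimal sketch below contrasts the two paths; `phasor_promoting`, `phasor_float` and the `2π·f·τ` phase are hypothetical stand-ins, not the actual kernel code. It calls the `float` overload of `std::cos`/`std::sin`, which computes the same thing as `std::cosf`/`std::sinf`, because some standard library versions do not declare `std::cosf`.

```cpp
#include <cmath>
#include <complex>
#include <type_traits>

// Hypothetical helper: the double literals make the whole phase expression a
// double, so std::cos/std::sin resolve to their double-precision overloads.
std::complex<float> phasor_promoting(float frequency, float delay) {
    const double phase = 2.0 * 3.14159265358979323846 * frequency * delay;
    return {static_cast<float>(std::cos(phase)),
            static_cast<float>(std::sin(phase))};
}

// Hypothetical helper: every operand is a float, so the float overloads are
// selected; this is the same computation std::cosf/std::sinf perform.
std::complex<float> phasor_float(float frequency, float delay) {
    const float phase = 2.0f * 3.14159265f * frequency * delay;
    return {std::cos(phase), std::sin(phase)};
}

// Compile-time checks of which overload the argument type selects.
static_assert(std::is_same<decltype(std::cos(1.0f)), float>::value,
              "a float argument selects the float overload");
static_assert(std::is_same<decltype(std::cos(2.0 * 1.0f)), double>::value,
              "mixing in a double literal promotes the argument to double");
```

For example, `phasor_float(0.125f, 0.5f)` evaluates to approximately (0.92388, 0.38268), i.e. cos(π/8) and sin(π/8), and agrees with `phasor_promoting` to within float rounding. Whether the single-precision result is accurate enough for large phase arguments is a separate question that would need checking against the reference output.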