Reduce usage of pow() and log()#92
Conversation
* Adds profiling tags, standardise seed and set num threads to 1 in issl_rhi140 example Co-authored-by: Coco Yeung <coco.yeung22@imperial.ac.uk>
Add input files for different weather conditions --------- Co-authored-by: Coco Yeung <coco.yeung22@imperial.ac.uk>
* Add profiling flags * Replaced pow() * Aerosol * Added more optimisations * Code clean-up --------- Co-authored-by: Coco Yeung <coco.yeung22@imperial.ac.uk>
Co-authored-by: Coco Yeung <coco.yeung22@imperial.ac.uk>
|
Thanks for the PR! This looks very good. The performance improvements here are really good, I've gotten a 1.47x speedup on Hex c029 vs main. The proposed changes make sense and reduce a lot of the repeat calculations! I put a few comments in the review mostly on readibility changes as expanding the powers into multiplications adds a lot of clutter. There are few things I think we'd want to address:
Other minor notes: You already did a lot of optimization work on the functions in Comparison with mainComparing it to outputs on The maximum relative differences for any variable at any point in the grid is are of the order of 1e-4 to 1e-8 which I think is acceptable. There are a couple of relative differences at 1e-1 but those correspond to absolute differences of at most 1e-8 so I am not worried about those. Here's the full output of the diff if you are interestedI am happy to discuss any of this more, and thanks again! I'd be curious to have @sdeastham 's opinion on the divergence from main, other than that once you've addressed this changes this looks good to me. |
| inline const Vector_1D& getBinVCenters() const { return bin_VCenters; }; | ||
| inline const Vector_1D& getBinEdges() const { return bin_Edges; }; | ||
| inline const Vector_1D& getBinSizes() const { return bin_Sizes; }; | ||
| inline const Vector_1D& getLogBinEdges() const { return log_Bin_Edges; }; |
There was a problem hiding this comment.
Saving the log bins edges is great!
| case 1: return Moment<1>(); | ||
| case 2: return Moment<2>(); | ||
| case 3: return Moment<3>(); | ||
| default: return Moment<4>(); |
There was a problem hiding this comment.
The templating is a good idea! I do think we need to handle the default case better, right now any input > 3 returns the 4th moment which is definitely wrong. I don't think we use moments larger than 3 anyway so to keep the template we could either throw (I don't if that works with templating) or revert back to the default implementation.
But I think this is a silent error that'd be very confusing.
This applies to all other implementations of the template scheme
There was a problem hiding this comment.
Made changes to return the correct moment for n > 3
lrobion
left a comment
There was a problem hiding this comment.
Thanks for making all the changes, and thanks for updating the PR with the latest main!
I added two small comments based on your changes, and could you please remove the following example input files:
examples/issl_rhi140/default.yamlexamples/issl_rhi140/rhi.yamlexamples/issl_rhi140/shear.yamlexamples/issl_rhi140/temperature.yamlexamples/issl_rhi140/w.yaml
In the current version of the code running the issl_rhi140 example, around 20% of computational time is spent on the

pow()function when running on 1 thread: see perf profiling belowChanges have been made to the code to reduce the reliance on
pow(). The most notable changes were:nis only ever 0, 1, 2, or 3, and savinglog()results in the loop to prevent repeated calculationspow()as it is only called by LAGRIDPlumeModel.cpp where the powernis 0; it also returns the value before it is multiplied by thelogBinRatiosince the loop which calls it also calculates it.Other changes made are to compute
pow(x, 2)asx*xorpow(x, 1/3)ascbrt(x)instead, although I believe that made minimal difference as the compiler could already make those optimisations.This resulted in a decrease in % time spent in the functions


pow()andlog(): see perf profiling belowThe real time decrease for the different weather conditions are below:
<style> </style>The changes are also zero-diff with the current main branch, verified by taking the difference between the ice mass over time and the number of ice particles at each output time.