Minor optimizations for ZHermiteSpline #2035
Don't communicate unless needed.
Use templated interpolate_internal method to avoid code duplication.
Make y_offset const (and so remove setYOffset method). Also remove 'const int ncz' from inner loop of ZHermiteSpline interpolation.
Another small optimization, avoids need for modulo operators in methods called from rhs function.
Allows us to access k_corner with a flat index, which is mostly what is needed. Should be a small optimization.
Although the cost of using a |
Co-authored-by: Peter Hill <zed.three@gmail.com>
efeef05 to 34e5fff
@ZedThree for my use-case I actually don't need a BOUT-dev/src/mesh/interpolation/hermite_spline_z.cxx Lines 129 to 134 in 34e5fff
I think this should let the compiler move the conditional on `skip_mask` out of the loop body when I've never set a mask (so `has_mask = false`).
Edit: Added a comment to the code to say this.
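To illustrate the idea in the comment above, here is a minimal sketch, not the code in `hermite_spline_z.cxx`. The function `scale_unmasked` and its arguments are hypothetical stand-ins. Because `has_mask` does not change inside the loop and `&&` short-circuits, `skip_mask` is never read when no mask has been set; an optimizing compiler may additionally hoist the loop-invariant test out of the loop, though whether it does depends on the compiler and flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an interpolation loop with an optional skip mask.
// When has_mask is false, skip_mask is never indexed (short-circuit &&),
// so it may safely be empty.
std::vector<double> scale_unmasked(const std::vector<double>& in,
                                   const std::vector<bool>& skip_mask,
                                   bool has_mask) {
  std::vector<double> out(in.size(), 0.0);
  for (std::size_t i = 0; i < in.size(); ++i) {
    // has_mask is loop-invariant, so the compiler is free to test it once
    // and emit a mask-free version of the loop.
    if (has_mask && skip_mask[i]) {
      continue;
    }
    out[i] = 2.0 * in[i]; // placeholder for the real per-point work
  }
  return out;
}
```

Calling `scale_unmasked(values, {}, false)` processes every point without touching the empty mask.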
Simplifies the loop in interpolate().
34e5fff to dcefe70
// make k_corner be in the range 0<=k_corner<nz
k_corner[i.ind] = ((k_corner[i.ind] % ncz) + ncz) % ncz;
// Convert z-index to Ind3D
k_corner[i.ind] = Ind3D((i.x()*ncy + i.y())*ncz + corner_zind, ncy, ncz);
This is because we want the (x, y) indices from the current loop iteration but we know the z index?
Yes - basically we want `k_corner(x, y, z) = {x, y, floor(delta_z(x, y, z))}`.
I notice now I should have used `x` and `y` instead of `i.x()` and `i.y()`, since they're there already.
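The index arithmetic discussed in this thread can be sketched as follows. This is an illustrative example only: `wrap_z`, `flat_index`, and `corner_flat_index` are hypothetical helpers, plain `int` is used in place of `Ind3D`, and the layout is assumed to be row-major with z varying fastest, matching the `(x*ncy + y)*ncz + z` expression in the diff.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: wrap a possibly negative z-index into [0, nz).
inline int wrap_z(int k, int nz) {
  assert(nz > 0);
  return ((k % nz) + nz) % nz;
}

// Hypothetical helper: flat index of (x, y, z), with z varying fastest.
inline int flat_index(int x, int y, int z, int ny, int nz) {
  return (x * ny + y) * nz + z;
}

// Hypothetical helper: keep (x, y) from the current point, and take the
// z-index from the floor of the (fractional) target z position.
inline int corner_flat_index(int x, int y, double z_position, int ny, int nz) {
  const int corner_z = wrap_z(static_cast<int>(std::floor(z_position)), nz);
  return flat_index(x, y, corner_z, ny, nz);
}
```

Doing the wrap once, when the corner index is stored, means code that later reads the stored flat index needs no modulo operation.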
I tried running simulations to compare `shiftedmetric` and `shiftedmetricinterp`. I haven't got as far as comparing the results yet, but was expecting `shiftedmetricinterp` to run a bit faster, because of not needing to do FFTs. Actually, both seem to take similar numbers of iterations, but `shiftedmetricinterp` is slower - the whole simulation is about 10% slower, I haven't separated out the `toFieldAligned`/`fromFieldAligned` calls yet...

This PR is an attempt to optimize `ZHermiteSpline.interpolate()`, but doesn't seem to have made a noticeable difference.

Any suggestions welcome... I'm guessing this loop might be failing to vectorize?
BOUT-dev/src/mesh/interpolation/hermite_spline_z.cxx
Lines 156 to 179 in 8342203
The one other idea I had is that if we required an axisymmetric grid, we could use a `Field2D` index-shift instead of allowing the `Field3D` `k_corner`, and then there might be more optimizations possible. I'm not currently planning to run big simulations with `shiftedmetricinterp` that might make this worthwhile, and it would remove some functionality in the interests of performance, so I don't think it's worth pursuing (at least for the moment).