Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor optimizations for ZHermiteSpline #2035

Merged
merged 12 commits into from
May 29, 2020
Merged

Conversation

johnomotani
Copy link
Contributor

I tried running simulations to compare shiftedmetric and shiftedmetricinterp. I haven't got as far as comparing the results yet, but was expecting shiftedmetricinterp to run a bit faster, because of not needing to do FFTs. Actually, both seem to take similar numbers of iterations, but shiftedmetricinterp is slower - the whole simulation is about 10% slower, I haven't separated out the toFieldAligned/fromFieldAligned calls yet...

This PR is an attempt to optimize ZHermiteSpline.interpolate(), but doesn't seem to have made a noticeable difference.

Any suggestions welcome... I'm guessing this loop might be failing to vectorize?

BOUT_FOR(i, f.getRegion(region)) {
const int x = i.x();
const int y = i.y();
const int z = i.z();
if (with_mask) {
if (skip_mask(x, y, z))
continue;
}
// Due to lack of guard cells in z-direction, we need to ensure z-index
// wraps around
const int z_mod = k_corner[i.ind];
const int z_mod_p1 = (z_mod < ncz - 1) ? (z_mod + 1) : 0;
const int y_next = y + y_offset;
// Interpolate in Z
f_interp(x, y_next, z) =
+f(x, y_next, z_mod) * h00[i] + f(x, y_next, z_mod_p1) * h01[i]
+ fz(x, y_next, z_mod) * h10[i] + fz(x, y_next, z_mod_p1) * h11[i];
ASSERT2(finite(f_interp(x, y_next, z)) || x < localmesh->xstart
|| x > localmesh->xend);
}

The one other idea I had is that if we required an axisymmetric grid, we could use a Field2D index-shift instead of allowing the Field3D k_corner, and then there might be more optimizations possible. I'm not currently planning to run big simulations with shiftedmetricinterp that might make this worthwhile, and it would remove some functionality in the interests of performance, so I don't think it's worth pursuing (at least for the moment).

Don't communicate unless needed.
Use templated interpolate_internal method to avoid code duplication.
Make y_offset const (and so remove setYOffset method). Also remove
'const int ncz' from inner loop of ZHermiteSpline interpolation.
Another small optimization, avoids need for modulo operators in methods
called from rhs function.
Allows us to access k_corner with a flat index, which is mostly what is
needed. Should be a small optimization.
@ZedThree
Copy link
Member

ZedThree commented May 26, 2020

skip_mask was used before we had Region. If we were to write these interpolations from scratch today, we'd just use Region instead. That's probably the next place to go.

Although the cost of using a Region to do the masking is that we take a Region as an argument to interpolate, so we'd have to construct a mask on each call. We could mitigate that cost with memoisation -- cache the newly constructed masked region and region argument name, and check if it's the same as last call. That's likely the case for the vast majority of calls.

Co-authored-by: Peter Hill <zed.three@gmail.com>
@johnomotani
Copy link
Contributor Author

johnomotani commented May 26, 2020

@ZedThree for my use-case I actually don't need a skip_mask. Agreed a custom Region is the long-term solution. My short term fix was to create two branches by templating the body of the interpolate method

Field3D ZHermiteSpline::interpolate(const Field3D& f, const std::string& region) const {
if (has_mask) {
return interpolate_internal<true>(f, region);
}
return interpolate_internal<false>(f, region);
}

I think this should optimize out the conditional for skip_mask from the inside of the loop when I've never set one (so has_mask = false).

Edit: Added a comment to the code to say this.

// make k_corner be in the range 0<=k_corner<nz
k_corner[i.ind] = ((k_corner[i.ind] % ncz) + ncz) % ncz;
// Convert z-index to Ind3D
k_corner[i.ind] = Ind3D((i.x()*ncy + i.y())*ncz + corner_zind, ncy, ncz);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because we want the (x, y) indices from the current loop iteration but we know the z index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - basically we want `k_corner(x, y, z) = {x, y, floor(delta_z(x, y, z))}

I notice now I should have used x and y instead of i.x() and i.y(), since they're there already.

ZedThree
ZedThree previously approved these changes May 29, 2020
@johnomotani johnomotani merged commit 63b2c9d into next May 29, 2020
@johnomotani johnomotani deleted the zhermitespline-optimize branch May 29, 2020 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants