Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache Coordinates::zlength() result #2471

Merged
merged 7 commits into from
Jan 20, 2022
Merged

Conversation

ZedThree
Copy link
Member

@ZedThree ZedThree commented Dec 6, 2021

At least a partial fix for #2470

We could use a Field2D directly for zlength_cache and check zlength_cache.isAllocated(), except that we don't expose a method for de-allocating a field's data array, and so can't (easily) invalidate the cache. Hence, using a unique_ptr<Field2D>.

I think the only thing to be aware of now is if you change dz and don't call Coordinates::geometry then zlength won't be updated.

ZedThree and others added 2 commits December 6, 2021 14:32
At least a partial fix for #2470

We could use a `Field2D` directly for `zlength_cache` and check
`zlength_cache.isAllocated()`, except that we don't expose a method
for de-allocating a field's data array, and so can't (easily)
invalidate the cache. Hence, using a `unique_ptr<Field2D>`
@ZedThree ZedThree added the bugfix label Dec 6, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Dec 6, 2021

clang-tidy review says "All clean, LGTM! 👍"

This completely recovers the performance of 4.4
@ZedThree
Copy link
Member Author

ZedThree commented Dec 6, 2021

Returning const Field2D& from zlength now seems to completely recover the performance of v4.4

src/mesh/coordinates.cxx Outdated Show resolved Hide resolved
@ZedThree
Copy link
Member Author

ZedThree commented Dec 7, 2021

Looks like a potential race condition with the OpenMP build. I guess calculating zlength needs to be in a BOUT_OMP(critical) region?

@d7919
Copy link
Member

d7919 commented Dec 7, 2021

Looks like a potential race condition with the OpenMP build. I guess calculating zlength needs to be in a BOUT_OMP(critical) region?

I guess when you were recalculating in geometry we didn't hit this as we'd always have the cached version available in our OMP regions later? I suspect it might make sense to pull the zlength call out of the OMP loops (presumably in the Lapace methods) as a minor optimisation anyway.

@ZedThree
Copy link
Member Author

ZedThree commented Dec 7, 2021

I guess when you were recalculating in geometry we didn't hit this as we'd always have the cached version available in our OMP regions later?

Yes, I suspect so.

I suspect it might make sense to pull the zlength call out of the OMP loops (presumably in the Lapace methods) as a minor optimisation anyway.

I'm not sure that will have much of an effect since we're now returning a reference, so zlength() should be pretty cheap (after it's been cached). It wouldn't hurt though.

Although, I think it's probably wise to make the check in zlength a critical region (so that it will always work correctly) and as we'd be hitting that on each iteration, that could be a significant hit.

Comment on lines +982 to +991
BOUT_OMP(critical)
if (not zlength_cache) {
zlength_cache = std::make_unique<Field2D>(0., localmesh);

#if BOUT_USE_METRIC_3D
BOUT_FOR_SERIAL(i, dz.getRegion("RGN_ALL")) { (*zlength_cache)[i] += dz[i]; }
#else
(*zlength_cache) = dz * nz;
#endif
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BOUT_OMP(critical)
if (not zlength_cache) {
zlength_cache = std::make_unique<Field2D>(0., localmesh);
#if BOUT_USE_METRIC_3D
BOUT_FOR_SERIAL(i, dz.getRegion("RGN_ALL")) { (*zlength_cache)[i] += dz[i]; }
#else
(*zlength_cache) = dz * nz;
#endif
}
if (not zlength_cache) {
BOUT_OMP(critical)
if (not zlength_cache) {
auto tmp = std::make_unique<Field2D>(0., localmesh);
#if BOUT_USE_METRIC_3D
BOUT_FOR_SERIAL(i, dz.getRegion("RGN_ALL")) { (*tmp)[i] += dz[i]; }
#else
(*tmp) = dz * nz;
#endif
zlength_cache = tmp;
}
}

I haven't tested it, but I think something like that should avoid the locking in most cases - as the final update is (mostly?) atomic ...
However, not sure it is worth it - as it should be easy enough to pull out of OMP loops ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand checking the condition twice, but why is tmp necessary?

It's easy enough to pull out of most loops, although it's a bit tricky for Delp2 because it's a few function calls deep inside the loop, so would need to be passed in through several layers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could Delp2 be changed to use the tridag interface which takes kwave directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp is needed so zlength_cache is immediately valid, and not first set to 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dschwoerer Ah, I see, ok

@d7919 In principle, but currently the kwave overload is protected in Laplacian and not public.

Looking further, Delp2 doesn't use OpenMP, so this wouldn't affect it anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of the opinion this change probably isn't worth it, as all of the potentially affected library code is now fixed or not using OpenMP, and I suspect most/all user code that uses zlength won't be using OpenMP

@bendudson bendudson merged commit af3828f into next Jan 20, 2022
@bendudson bendudson deleted the fix-delp2-performance-regression branch January 20, 2022 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants