Bit shaving results in non-layout independent History? #1941
Comments
Part of the scheme involves extracting the average before the bit shaving. I suspect that this is the culprit.
Ok, as @tclune pointed out, the bit-shaving algorithm that we still use, which was implemented by Arlindo a long time ago, is a little more complicated than I realized (I have not looked at it in at least half a decade). It may be doing some form of math to prevent bias after the shaving, in which case doing this on the distributed arrays may be the issue. I will confirm this by seeing whether doing the bit shaving on the gathered arrays on the server side fixes it, although this brings up other thorny design issues we will have to address if it does indeed solve the problem.
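To illustrate why removing an average before shaving would break layout independence, here is a minimal sketch in pure Python. It is not the MAPL routine: the function names, the choice to subtract the array mean, and the number of mantissa bits kept are all assumptions made for illustration. It only shows that a purely element-wise mask gives the same answer however the field is split, while a variant that shaves anomalies about the local mean does not.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def shave(x, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of a float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

def shave_plain(values, keep_bits):
    """Purely element-wise shaving: each result depends on one input only."""
    return [shave(v, keep_bits) for v in values]

def shave_about_mean(values, keep_bits):
    """Hypothetical bias-reducing variant (an assumption, not MAPL's code):
    shave the anomaly about the mean of whatever array it is handed."""
    mean = to_f32(sum(values) / len(values))
    return [to_f32(shave(to_f32(v - mean), keep_bits) + mean) for v in values]

# A made-up 1D "field" and two decompositions of it.
field = [to_f32(280.0 + 0.37 * i) for i in range(8)]
tile_a, tile_b = field[:4], field[4:]
keep = 10  # mantissa bits retained; arbitrary for this sketch

# Element-wise shaving: gathering before or after makes no difference.
plain_whole = shave_plain(field, keep)
plain_tiled = shave_plain(tile_a, keep) + shave_plain(tile_b, keep)

# Mean-based shaving: each tile sees its own mean, so results change.
mean_whole = shave_about_mean(field, keep)
mean_tiled = shave_about_mean(tile_a, keep) + shave_about_mean(tile_b, keep)

print("plain shaving layout independent:", plain_whole == plain_tiled)
print("mean-based shaving layout independent:", mean_whole == mean_tiled)
```

In this sketch the mean-based variant becomes layout independent again only if it is applied to the full gathered array, which is the motivation for trying the shaving on the server side.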
@rlucches @rtodling @lltakacs I've confirmed that if I hack the bit shaving onto the server, where the data is gathered, it seems to fix any bit-shaving-related layout problem (rather than doing the bit shaving on the History/griddedio side, when the data is still distributed). Now the trickier part is doing this in a clean way in the current history/griddedio/pfio output server system we have. Hopefully it is not too bad to shift this over to the server. I will let everyone know when I have a solution. I imagine whatever I do in MAPL develop will need to be backported if GEOS-IT is using the MAPL 2.8.0.X series, where we have been collecting patches as needed to avoid forcing a bigger version number update.
The GMAO-OPS team (@rlucches) reported that the GEOS-IT system gives different diagnostic output at different layouts, even though the model itself is layout independent as far as the checkpoints are concerned. @lltakacs traced the culprit down to the bit shaving.
Indeed, I've been able to reproduce this with the "main" branch of the GEOSgcm fixture as of 1/20/2023 (that's when the model I just ran was cloned and built). At C48, I have a collection that is output on the native grid of the fields, and the fields are 2D, so there's no processing. If I turn on bit shaving, I get different results at 1x12 and 4x48 for the collection at the first write. If I turn off bit shaving, the collection is layout independent.
I'll investigate. I'm perplexed, as the bit shaving should be an element-wise operation; I just don't see how the layout could possibly matter, but apparently it does.