MPI IO file write with >2GB individual writes #12873
My thoughts on solving this so far: While A) seems to be the simplest and best option, it doesn't work for … I find C) to be quite complicated, and I don't want to include a library just for that purpose. That leaves B), which is also a bit annoying because it requires changes to the file layout / offset computation. Am I missing something?
Requiring MPI 4 is not great. I don't know of an installation that has it; in fact, I didn't know it was even out. I don't understand the comment about padding. Let's say you have a buffer of 678 bytes to write: couldn't you just create an MPI data type of 678 bytes? (That only leaves the issue of the bizarre bug @tamiko discovered a while ago whereby, if I recall correctly, his MPI implementation did not release the memory associated with the data structure objects. It may also have been about custom MPI operators.)
You have two options: … You seem to suggest A), while I was thinking B) might be a lot easier to pull off. I have to admit that I don't understand exactly what https://github.com/jeffhammond/BigMPI/blob/5300b18cc8ec1b2431bf269ee494054ee7bd9f72/src/type_contiguous_x.c#L74 does to get the right datatype. Edit: The code is actually much simpler for char. I think we can go with that.
Yes, this code looks correct. It creates the MPI equivalent of
which has exactly the right size and, if you map it directly onto your buffer, covers its entirety. You'd then call
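As far as I can tell, the linked BigMPI helper builds such a type roughly like this: split the count n into c = n / INT_MAX full blocks and a remainder r = n % INT_MAX, describe the full blocks as a contiguous type of c blocks of INT_MAX elements and the remainder as a contiguous type of r elements, then combine the two with MPI_Type_create_struct, placing the remainder at displacement c * INT_MAX * (element size). For MPI_CHAR the element size is 1 byte, which is why the char case is simpler. The sketch below shows only that index arithmetic in plain C, without any MPI calls, so it can be checked without an MPI installation. It assumes the block size is INT_MAX, and the names `split_large_count` and `large_count_split` are hypothetical, not part of BigMPI or deal.II.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result type: n == n_blocks * INT_MAX + remainder. */
typedef struct {
  int n_blocks;               /* number of full blocks of INT_MAX chars   */
  int remainder;              /* leftover chars, 0 <= remainder < INT_MAX */
  long long remainder_offset; /* byte displacement of the leftover chars  */
} large_count_split;

/* Split a byte count that may exceed INT_MAX into pieces that each fit
   into an int, as the int-count MPI datatype constructors require.
   Only the arithmetic is shown; building the actual MPI datatypes
   (contiguous blocks + remainder, glued with MPI_Type_create_struct)
   is left out. */
static large_count_split split_large_count(long long n)
{
  /* The number of blocks must itself fit into an int. */
  assert(n >= 0 && n / INT_MAX <= INT_MAX);

  const long long block = INT_MAX;
  large_count_split s;
  s.n_blocks = (int)(n / block);
  s.remainder = (int)(n % block);
  /* With MPI_CHAR each element is 1 byte, so the remainder starts
     right after n_blocks * INT_MAX bytes. */
  s.remainder_offset = (long long)s.n_blocks * block;
  return s;
}
```

For example, `split_large_count(5000000000LL)` (about 5 GB) gives 2 blocks of INT_MAX bytes plus 705032706 remaining bytes at byte offset 4294967294, while a small buffer such as 678 bytes gives 0 blocks and a remainder of 678. With the resulting datatype, the write itself would then use a count of 1.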
This fixes save/load of fixed and variable checkpointing where individual ranks write more than 2 GB of data. Part of dealii#12873 and dealii#12752.
#13611 is the facility to use
We are using MPI_File_write_at with data type CHAR to write large blobs of data in several places, like dealii/source/distributed/tria_base.cc (lines 1525 to 1531 in 61d7023).

Note that the count parameter (the number of bytes to write) is a signed int. It can easily overflow if we try to write more than 2 GB from a single rank. This call (and many other places) needs to be fixed.
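One straightforward workaround, not necessarily one of the options A) to C) discussed above, is to issue several writes of at most INT_MAX bytes each at consecutive file offsets, so that no single count exceeds the int range. The sketch below assumes a hypothetical writer callback standing in for a call like MPI_File_write_at(fh, offset, buf, count, MPI_CHAR, &status), with `long long` standing in for MPI_Offset. An in-memory "file" (`memory_file`, also hypothetical) lets it run without MPI. In real code, `max_chunk` would be INT_MAX; it is a parameter here only so the logic can be exercised with small buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an int-count write such as
   MPI_File_write_at(fh, offset, buf, count, MPI_CHAR, &status).
   Returns 0 on success. */
typedef int (*int_count_writer)(void *context, long long offset,
                                const char *buf, int count);

/* Write n_bytes (which may exceed INT_MAX) starting at file offset
   'offset' by issuing several writes of at most max_chunk bytes each.
   Returns 0 on success or the first non-zero code from the writer. */
static int write_in_chunks(int_count_writer write, void *context,
                           long long offset, const char *buf,
                           size_t n_bytes, int max_chunk)
{
  assert(max_chunk > 0);
  size_t written = 0;
  while (written < n_bytes) {
    size_t left = n_bytes - written;
    /* Each individual count is guaranteed to fit into an int. */
    int count = left > (size_t)max_chunk ? max_chunk : (int)left;
    int err = write(context, offset + (long long)written,
                    buf + written, count);
    if (err != 0)
      return err;
    written += (size_t)count;
  }
  return 0;
}

/* Hypothetical in-memory "file" used to exercise write_in_chunks. */
typedef struct {
  char *data;  /* backing storage               */
  size_t size; /* capacity in bytes             */
  int n_calls; /* number of successful writes   */
} memory_file;

static int memory_file_write(void *context, long long offset,
                             const char *buf, int count)
{
  memory_file *f = (memory_file *)context;
  if (offset < 0 || (size_t)offset + (size_t)count > f->size)
    return 1; /* write would fall outside the buffer */
  memcpy(f->data + offset, buf, (size_t)count);
  f->n_calls++;
  return 0;
}
```

For example, writing the 10 bytes "0123456789" at offset 2 with `max_chunk = 3` issues four writes (3 + 3 + 3 + 1 bytes). One caveat with this approach: for the collective variants (e.g. MPI_File_write_at_all), every rank has to make the same number of calls, so the number of chunks would have to be agreed upon across ranks.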