-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PnetCDF module #456
PnetCDF module #456
Conversation
I'm reviewing this now, but probably a good starting point here would be to work on rebasing this on top of #726, which increases Darshan's maximum modules from 16 to 64 -- these PnetCDF modules push us beyond our current limits on |
This is now rebased on our |
I started a small review for more concise things, but there are a few things that stood out to me at a high-level that it might be good for us all to discuss:
|
Like in MPI-IO, file create and open in PnetCDF are collective only. I think using INDEP_OPENS/COLL_OPENS in Darshan was a mistake (and misleading). |
This sounds fine with me. |
The last time PnetCDF added new I/O-related APIs was version 1.6.0, released in February 2, 2015. (FYI. the latest release is 1.12.3.) We can make 1.6.0 the minimum requirement in configure.ac. Its Release Note contains all history, just keyword search "New APIs". |
What if we assigned this module a different ID and just treated them as two separate things? That way if the parser encounters a log with the old-style pnetcdf module it will still emit it in case someone needs that. It doesn't seem worth the effort to up-convert since the scope is so different. If we go that route (new module id, keep ability to print old module id) then I guess we want to make sure the prefix is slightly different. I'm not sure about this idea yeah, just wanted to offer it as an idea for discussion. |
Awesome, thanks for the details. I'll add in an autoconf check for this version number and will mention in the darshan-runtime docs about the version dependency (though it sounds like these functions are all sufficiently old, that we're unlikely to hit this problem in practice). |
Backward compatibility refers to a new version (Darshan parser) being able to read the log files created by an older version. So, when the new parser encountered PNETCDF_INDEP_OPENS and PNETCDF_COLL_OPENS, it should merge them into PNETCDF_OPEN (same for PNETCDF_CREATE). One question, how does Darshan tell an open/create is independent? |
Okay, I'll take a shot at updating the module along these lines. |
I think one key downside to this approach is just that utilities would have to be aware of both formats (old/new) and try to handle gracefully. It probably would just simplify things generally for us to only expose the new format going forward? Looking closer, the only issue with the up-conversion is just that there are no longer Also, FWIW, we've already taken a similar approach in the HDF5 module, in that it was previously a barebones module that is now being upconverted into the new more detailed format. |
Looks like in the PnetCDF module, an open is marked as independent if the input communicator size == 1. I'm not sure it's semantically useful to capture that distinction, though. Presumably if the input communicator only contains 1 process, it will be clear from the rank information captured by Darshan how many processes opened it? |
That makes sense. Upconverting seems reasonable to me. |
Yeah, that's right. I agree it's not a useful counter distinction for pnetcdf moving forward. |
@wkliao -- is there a reliable way to grab the PnetCDF library version across versions that you recommend? I found |
In PnetCDF header file, pnetcdf.h, there are a few C constants defined for version numbers. Below is an example.
|
To answer my own question from the release notes:
I'll just use this unless you have any objections. |
Ahh, thanks. I'll try this out. |
Here is a code fragment I have been using. |
Autoconf code enforcing library version >= 1.6.0 has been pushed via bd6369a |
This should be done via 1792807. There is one additional simplification we could consider in that we could just get rid of the |
Is the same implemented for HDF5? given H5Fcreate and H5Fopen are two separate APIs. |
The HDF5 module bins both of those calls up into |
I am wondering whether separating the counters can be useful for |
I agree it's useful to be able to distinguish opens/creates in the theoretical sense, since they likely have different performance characteristics on different systems. We maybe have just been careful binning things in a That's just some background on why I think we've avoided |
I added a more detailed review that captures most of my comments/feedback for the PnetCDF wrappers contained in the m4 code. I'm happy to make the code changes necessary to address these, just wanted to make sure you had a chance to respond to any feedback. |
I'll use this comment to keep track of things that aren't currently tracked in a review comment, but that we would like to see addressed before merging this in.
|
- this allows Darshan to capture whether variables are record variables or not
* allows minimal PnetCDF data captured in prior versions of Darshan to be upconverted to new module formats so it can be parsed by users
@@ -12,7 +12,7 @@ def mod_agg_iohist(self, mod, mode='append'): | |||
""" | |||
|
|||
# sanitation and guards | |||
supported = ["POSIX", "MPI-IO", "H5D"] | |||
supported = ["POSIX", "MPI-IO", "H5D", "PNETCDF_VAR", "PNETCDF_FILE"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerjereddy just noticed this when doing a final read-through
I think we can remove PNETCDF_FILE
from the supported list, as there are no access histogram counters in the file-level records for PnetCDF (they are stored in the variable-level counters). I removed locally and tests pass, so just wanted to double check there wasn't a reason before removing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the changes to cli/summary.py
you can see there are similar guards for supported modules for I/O histograms that does not include PNETCDF_FILE
, which makes total sense, so I just removed it from the supported list here, too. Must have been an oversight.
@@ -436,15 +436,21 @@ def register_figures(self): | |||
opcounts_mods.append("H5D") | |||
elif "H5F" in self.report.modules: | |||
opcounts_mods.append("H5F") | |||
elif "PNETCDF_VAR" in self.report.modules: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerjereddy Shouldn't this be if
rather than elif
, otherwise we'd run into issues if a log had both PnetCDF and HDF5 modules active? i.e., use an if/elif block for HDF5 and a separate if/elif block for PnetCDF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, and can't the same be said for the elif
above as well with the new config modularity? I suspect we need more test files to catch that stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed s.t. a different if
/elif
clause is used for PNETCDF_FILE
/PNETCDF_VAR
than for H5F
/H5D
. Otherwise, it doesn't look like opcounts plots would be generated for both PNETCDF and HDF5 if a log happened to have both (probably very rare).
I'm not sure if there's other places in this logic that need fixed. My understanding from the preceding comment is that you only want one of H5F
/H5D
or PNETCDF_FILE
/PNETCDF_VAR
, as the H5D
and PNETCDF_VAR
modules know how to additionally include the file-level data.
I spot checked a couple of summary reports and confirmed tests succeed with the changes, so think I'm content with the changes for now, but could still be some corner cases to sort out.
If you click through the sub-100 % patch diff coverage in CI, those missing lines may be related to your concern above, assuming it isn't a codecov fusing issue. |
* PnetCDF module and HDF5 module could conceivably be included in the same report
It will be great to see this PR merged soon. |
@wkliao @shanedsnyder is it possible to silence some of the warnings that pop out with -Wall? I'm seeing a large volume of warnings like the following with gcc 12.2:
I don't believe these are important; I suspect that it's just an artifact of how the m4 generated code is used for various wrappers. It does make the build output noisy, though, so it could obfuscate meaningful warnings. I don't know what the options are for this situation; is there something simple like a gcc pragma maybe? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. If there is a way to make the -Wall warnings quieter that would be great, but I approve regardless. Everything runs fine for me. It's need to see the file:variable notation in the record name fields for the VAR module.
Those variables are probably used before, not longer. |
Add/update the PnetCDF module.