add E3SM-IO logs from Theta with heatmap/dxt data #23
Hey, now you can help fix the CI ;) Looks like DXT tracing adds two orders of magnitude to the binary log file size?
Yeah, I'll try to sort out the CI issues, then get these both merged in. Though from @nawtrey's #22 (comment), sounds like CI is going to be failing until we get heatmap support properly in pydarshan via darshan-hpc/darshan#615
That's right, we can't escape chicken-and-egg problems once the CI gets to the proper test failures.
DXT tracing is especially expensive in these examples because of the sheer number of operations (there are around a million of them).
Well, we could merge in conservative guards in
When I was trying to produce some sample cross-comparison plots on branch
Could be something in that branch, since it was just designed as a quick hack to compare the heatmaps on top of what Jakob did.
I hit that same error you mention ("Record nbins is not consistent with current heatmap."), just using darshan-hpc/darshan#665 directly, so I don't think it's anything related to changes in your fork. I also don't hit the error with the diagonal log you submitted, so I don't think it's something that affects all logs with heatmap data. The logs in this PR do have APMPI data, which your logs do not, so maybe it's somehow related to that. I don't know, but I'll keep looking into it before merging.
I suppose unintentionally problematic logs are especially welcome in the logs repo
This is true. Also, I think there could be a bug in the heatmap module bindings PR that's just being triggered by this larger-scale log. I'm going to merge both of these and then provide details back on darshan-hpc/darshan#665
This PR includes 2 darshan logs of a larger-scale run of the E3SM-IO benchmark (I case) on the Theta supercomputer at ALCF:
- e3sm_io_heatmap_and_dxt.darshan contains both heatmap and DXT module data
- e3sm_io_heatmap_only.darshan contains just heatmap data for a separate run of the same benchmark

The README provides more details on job size, benchmark configuration, etc.
These logs serve a similar purpose as the ones in #22, just from a larger-scale job with lots of read/write activity.
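
For anyone wanting to check the "two orders of magnitude" size difference mentioned in the conversation, here is a minimal sketch using only the Python standard library. The helper name `size_ratio_orders` is hypothetical, and the script assumes both log files from this PR sit in the current working directory. Keep in mind the two logs come from separate runs, so the ratio is only a rough indication of DXT overhead.

```python
import math
import os


def size_ratio_orders(larger_path, smaller_path):
    """Return (ratio, orders_of_magnitude) comparing two file sizes in bytes."""
    larger = os.path.getsize(larger_path)
    smaller = os.path.getsize(smaller_path)
    if larger == 0 or smaller == 0:
        raise ValueError("both files must be non-empty to compare sizes")
    ratio = larger / smaller
    return ratio, math.log10(ratio)


if __name__ == "__main__":
    # File names are taken from this PR; their location in the current
    # working directory is an assumption.
    dxt_log = "e3sm_io_heatmap_and_dxt.darshan"
    heatmap_log = "e3sm_io_heatmap_only.darshan"
    if os.path.exists(dxt_log) and os.path.exists(heatmap_log):
        ratio, orders = size_ratio_orders(dxt_log, heatmap_log)
        print(f"DXT log is {ratio:.1f}x larger (~{orders:.1f} orders of magnitude)")
    else:
        print("log files not found; fetch them from the logs repo first")
```

A result near 2 for the second value would match the estimate given in the first comment.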