A bug in older ROMIO versions (see pmodels/mpich@538f8abb) can lead to crashes in Darshan's log writing to Lustre file systems (floating point exceptions due to divide by zero).
A couple of workarounds include:
clearing Darshan's default log hints (either at runtime or at configure time)
forcing Darshan to open its log file with a ufs: prefix to avoid going through the ROMIO Lustre driver
Since unsetting Darshan's hints can lead to poor performance due to every application rank opening the log file, we should consider adding a configure option to Darshan that forces a ufs: prefix on its log file, which would sidestep a potentially buggy ROMIO Lustre driver.
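As a rough illustration of the two workarounds, here is a minimal Python sketch of what a job-launch wrapper might do. The helper names `darshan_env_without_hints` and `with_ufs_prefix` are hypothetical. It assumes that an empty `DARSHAN_LOGHINTS` value clears Darshan's default hints at runtime, and that ROMIO selects its generic driver when the file name starts with `ufs:`. Whether a given Darshan build accepts a prefixed log path is not established in this issue, so the second helper only shows the string transformation the proposed configure option would apply.

```python
import os

# Prefix that asks ROMIO to use its generic UNIX file system driver.
UFS_PREFIX = "ufs:"


def darshan_env_without_hints(base_env=None):
    """Return a copy of the environment with Darshan's log hints cleared."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: an empty value overrides the hints compiled into Darshan.
    env["DARSHAN_LOGHINTS"] = ""
    return env


def with_ufs_prefix(log_path):
    """Prepend the "ufs:" prefix unless the path already carries it."""
    if log_path.startswith(UFS_PREFIX):
        return log_path
    return UFS_PREFIX + log_path
```

The environment returned by `darshan_env_without_hints()` could be passed as the `env` argument of `subprocess.run` when launching the instrumented application. Keep in mind the performance cost of clearing the hints described above.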
We still see issues similar to this from time to time, but the Lustre ADIO fix cited here is nearly a decade old. It's likely that this has been resolved on the vast majority of systems running ROMIO.
We should continue to keep an eye out for issues like this, see if there are new or otherwise unfixed bugs in ROMIO code, and update Darshan documentation to better describe known issues and workarounds.
Closing this issue for now, though, since there aren't really actionable details here.