Darshan's log write can trigger crashes in older ROMIO versions #424

Closed
shanedsnyder opened this issue Jul 29, 2021 · 1 comment

@shanedsnyder
Contributor

A bug in older ROMIO versions (see pmodels/mpich@538f8abb) can cause Darshan to crash with a floating point exception (divide by zero) when it writes its log to a Lustre file system.

Two workarounds:

  • clearing Darshan's default log hints (either at runtime or at configure time)
  • forcing Darshan to open its log file with a ufs: prefix to avoid going through the ROMIO Lustre driver

Since unsetting Darshan's hints can lead to poor performance (every application rank ends up opening the log file), we should consider adding a configure option to Darshan that forces a ufs: prefix on its log file, to allow sidestepping a potentially buggy ROMIO Lustre driver.
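As a rough illustration of the second workaround, here is a minimal sketch of how a log path might be given a ufs: prefix before it is opened. The helper name `add_ufs_prefix` and the example paths are hypothetical and are not taken from Darshan's source. It assumes only that ROMIO accepts a file-system-type prefix such as ufs: at the front of the file name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Darshan: writes "ufs:" followed by
 * `path` into `out`. A path that already starts with "ufs:" is copied
 * unchanged. Returns 0 on success, -1 if `out` is too small. */
static int add_ufs_prefix(const char *path, char *out, size_t out_len)
{
    const char *prefix = "ufs:";
    int n;

    if (strncmp(path, prefix, strlen(prefix)) == 0)
        n = snprintf(out, out_len, "%s", path);            /* already prefixed */
    else
        n = snprintf(out, out_len, "%s%s", prefix, path);  /* prepend prefix */

    /* snprintf reports the length it wanted to write; treat truncation
     * as an error rather than opening a mangled path. */
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

The resulting string would then be passed as the file name to `MPI_File_open`, so that ROMIO selects its generic UFS driver instead of the Lustre one. A configure option like the one proposed above could simply decide whether this step is applied.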

@shanedsnyder shanedsnyder added this to the 3.3.2 milestone Jul 29, 2021
@shanedsnyder shanedsnyder modified the milestones: 3.4, 3.4.1 Jun 15, 2022
@shanedsnyder shanedsnyder modified the milestones: 3.4.1, 3.4.2 Dec 1, 2022
@shanedsnyder
Contributor Author

We still see issues similar to this from time to time, but the Lustre ADIO fix cited here is nearly a decade old. It's likely that this has been resolved on the vast majority of systems running ROMIO.

We should continue to keep an eye out for issues like this, see if there are new or otherwise unfixed bugs in ROMIO code, and update Darshan documentation to better describe known issues and workarounds.

Closing this issue for now, though, since there aren't really any actionable details here.
