Output directory creation can fail on lustre filesystems #53

Open
dtrudg opened this issue Feb 15, 2016 · 8 comments

Comments

@dtrudg

dtrudg commented Feb 15, 2016

We are using cufflinks on an HPC system with the main project space on a lustre filesystem. We noticed cufflinks routinely failing for users when they specified an absolute path for an output directory with -o.

On further investigation it appears that the mkpath function in src/common.cpp is recursive and calls mkdir at each level of the requested path, from root to tip. If a level already exists, it expects, and handles, an EEXIST error from mkdir.

Unfortunately lustre can return EPERM for mkdir against an existing directory:

https://jira.hpdd.intel.com/browse/LU-4185

This is a long-standing issue with lustre, which doesn't appear to be going away. We don't hit it in other software that does recursive directory creation, because such software generally calls stat to find how much of the path already exists, rather than relying on EEXIST being returned from mkdir.
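
For illustration, here is a minimal sketch of the stat-first approach described above. It is not the cufflinks code: the names `is_directory` and `mkpath_stat_first` are hypothetical, and the default mode of 0777 (subject to the umask) is an assumption. It checks each level of the path with stat and calls mkdir only for levels that are missing, so it never depends on which errno mkdir reports for an existing directory.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <cerrno>
#include <string>

// Hypothetical helper (not from cufflinks): true if `path` exists and is a directory.
static bool is_directory(const std::string& path) {
    struct stat st;
    return stat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode);
}

// Hypothetical replacement for a mkdir-and-expect-EEXIST loop.
// Walks the path from root to tip; mkdir is only called for missing levels.
// Returns 0 on success, -1 on failure with errno set by the failing call.
int mkpath_stat_first(const std::string& path, mode_t mode = 0777) {
    if (path.empty()) {
        errno = ENOENT;
        return -1;
    }
    std::string::size_type pos = 0;
    while (pos != std::string::npos) {
        // Find the end of the next path component (skipping a leading '/').
        pos = path.find('/', pos + 1);
        const std::string prefix = path.substr(0, pos);
        if (prefix.empty() || is_directory(prefix)) {
            continue;  // this level already exists, nothing to create
        }
        if (mkdir(prefix.c_str(), mode) != 0) {
            return -1;  // genuine failure: the level does not exist and cannot be created
        }
    }
    return 0;
}
```

One caveat with this sketch: if another process creates a level between the stat and the mkdir, mkdir still fails with EEXIST, so callers that run concurrently would need to re-check the directory after a failed mkdir.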

I will patch cufflinks locally for this issue, to use stat in this manner. Would this be considered as a PR?

@dtrudg dtrudg changed the title Output directory can fail on lustre filesystems. Output directory creation can fail on lustre filesystems Feb 15, 2016
@jrdemasi

Hey, @ctrapnell! Just curious: if I tried to tackle this one and submitted a PR, how open would you be to actually accepting a patch? Some people like their code to always be their code.

@ctrapnell
Contributor

That would be great! Thanks for offering to help.

Cole


@jrdemasi

@dtrudg Didn't actually read the entirety of your post, but I'm not going to double up on a PR if you've already patched it. Can you let me know your progress on this?

Thanks!

@dtrudg
Author

dtrudg commented Sep 17, 2016

@jrdemasi - sorry for the very slow reply. Didn't actually patch locally in the end. We have a workaround for our workflows, which is to just create the directories first, and our vendor is going to supply a patched lustre as we have similar issues with other software, particularly MATLAB.

@thiell

thiell commented Mar 4, 2017

cufflinks should fix mkpath (particularly this: https://github.com/cole-trapnell-lab/cufflinks/blob/master/src/common.cpp#L283-L289) and call stat(), because POSIX doesn't dictate that mkdir must report EEXIST first when several errors apply, such as EPERM and EEXIST. That happens, for example, when the directories were pre-created by another user and are thus not writable. This behavior can occur on NFS too and should be fixed by the application.

@jrdemasi

jrdemasi commented Mar 4, 2017

@thiell Is that issue directly related to the lustre bug described here? I actually have a patch for this but have been extremely lazy about making a PR (sorry, @ctrapnell!)

@thiell

thiell commented Mar 4, 2017

@jrdemasi yes, it is related to this issue, but my point is that it is first a cufflinks bug and not a lustre bug, as it is (weird) POSIX behavior that can also occur on NFS. I just noticed that another person seems to have proposed a PR (#80) for this issue too. We have several reports of users having this problem on our site, so I ended up here. Recent versions of Lustre will include a workaround for this, as it is recognized that this POSIX behavior is inconsistent, but cufflinks should be fixed too IMHO.

@jrdemasi

jrdemasi commented Mar 4, 2017 via email


4 participants