why does os.rename only sometimes fail on s3fs? #100
Are those failed files greater than 5 GB?

---
No, much smaller. However, I'm thinking this is actually a different issue. It looks like maybe the file is in the manifest but doesn't actually exist? What is happening here?

---
There seem to be three separate threads trying to download the same file, which basically causes a race condition. I'm not sure whether the file appears multiple times in the manifest or whether something went wrong in the download tool.
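One client-side mitigation, if the duplicate entries can't be avoided upstream, is to serialize writes per destination path so concurrent threads never race on the same file. A minimal sketch (`download_serialized` and `download_fn` are hypothetical names for illustration, not the nda-tools API):

```python
import threading
from collections import defaultdict

# One lock per destination path: downloads to different paths still run in
# parallel, but duplicate manifest entries for the same path are serialized.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()

def download_serialized(dest_path, download_fn):
    with _locks_guard:          # defaultdict mutation isn't thread-safe on its own
        lock = _locks[dest_path]
    with lock:                  # only one thread writes dest_path at a time
        return download_fn(dest_path)
```

This avoids two threads performing the copy-then-rename dance on the same path simultaneously, at the cost of redundant downloads; de-duplicating the manifest first would be cheaper.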
---
Yes, it turns out the manifest has several versions of the file that are all specified to be written to the same path:
What can we do about this? Or do we just ignore it?

---
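Pending a backend fix, one option is to de-duplicate the manifest by destination path on the client before downloading. A sketch, assuming each record is a dict with hypothetical `path` and `version` fields (not the actual nda-tools manifest schema); it keeps the last occurrence of each path:

```python
def dedupe_by_path(records):
    """Keep one record per destination path.

    Keeps the last occurrence, on the assumption that later manifest
    entries correspond to newer file versions.
    """
    by_path = {}
    for rec in records:
        by_path[rec["path"]] = rec
    return list(by_path.values())
```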
And to go back to the original question: what is the issue with os.rename on s3fs? In my testing, os.rename works just fine on s3fs (unless there is some other reason why the rename would fail on any filesystem).

---
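When the rename does fail only intermittently, a simple retry wrapper is a low-effort mitigation, since on a copy-plus-delete backend a transient failure can succeed on a later attempt. A sketch (the attempt count and delay are arbitrary choices, not tuned values):

```python
import os
import time

def rename_with_retry(src, dst, attempts=3, delay=1.0):
    """os.rename with simple retries for flaky network-backed mounts."""
    for i in range(attempts):
        try:
            os.rename(src, dst)
            return
        except OSError:
            if i == attempts - 1:
                raise               # out of attempts: surface the real error
            time.sleep(delay)       # back off before retrying
```

Note this papers over, rather than fixes, the underlying non-atomicity of rename on s3fs.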
NDA should either fix their backend to not generate manifests with duplicated files, or have the client side do de-duplication.

@gregmagdits In terms of s3fs: unlike on a real filesystem, a rename is implemented as a copy plus a remove, so it is not atomic. For files larger than 5 GB, s3fs has to use the multipart upload interface to do the server-side copy instead of a single object copy. This kind of operation is generally less efficient on S3 and more error-prone; for example, the official AWS Mountpoint for Amazon S3 doesn't even want to support it (awslabs/mountpoint-s3#506). I think the C++ s3fs-fuse handles large-file renames correctly.

May I ask: if your final destination is S3, why use s3fs at all? I think nda-tools technically supports downloading to S3 directly. (We ran into issues with the NDA implementation before, and they were supposed to be fixed.)

---
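For intuition, the copy-plus-remove behavior can be emulated locally. This is a sketch, not s3fs's actual implementation: on real S3 the copy is a server-side CopyObject (or a multipart copy for objects over 5 GB) and the remove is a DeleteObject, and either step can fail independently:

```python
import os
import shutil

def s3fs_style_rename(src, dst):
    """Emulate how an S3-backed mount implements rename.

    Not atomic: between the two steps both names exist, and if the
    copy fails partway there is no transaction to roll back.
    """
    shutil.copy2(src, dst)   # stands in for S3 CopyObject / multipart copy
    os.remove(src)           # stands in for S3 DeleteObject
```

With several threads racing on the same destination path, one thread's remove can land between another thread's copy and remove, which would explain a rename failing only sometimes.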
Because we're writing a fairly generic tool (rapidtide-cloud, which runs rapidtide in AWS Batch), and one of the things I want to keep generic is the backing of our data-ingest location. I suppose I could write two code paths depending on whether the backing is s3fs or a traditional filesystem, but that would be annoying.

---
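If two code paths ever do become necessary, the backing filesystem type can at least be detected automatically on Linux by reading /proc/mounts. This helper is a sketch under stated assumptions (Linux only, longest-prefix mount match, ignores octal escapes in mount paths) and is not part of nda-tools:

```python
import os

def fs_type_for(path):
    """Return the filesystem type of the mount containing `path`,
    or None if it can't be determined (e.g. not on Linux)."""
    path = os.path.realpath(path)
    best, best_type = "", None
    try:
        with open("/proc/mounts") as f:
            for line in f:
                fields = line.split()
                if len(fields) < 3:
                    continue
                mount_point, fs_type = fields[1], fields[2]
                # Longest matching mount point wins.
                if path.startswith(mount_point) and len(mount_point) > len(best):
                    best, best_type = mount_point, fs_type
    except OSError:
        return None
    return best_type
```

A caller could then branch, e.g. use os.rename directly unless the type reported for the destination looks like a FUSE S3 mount (s3fs typically shows up as `fuse.s3fs`).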
There is already a back-end procedure that renames files when there is a name collision (related to #88). We will look into the cause of the duplicated entries in package 1184998 and send an update.
Package 1184998 is the original package that was created by HCP before the de-dupe procedure was in place. |
nda-tools/NDATools/Download.py, line 464 in f459a02
I'm finding that this os.rename call only sometimes fails on an s3fs mountpoint. Any idea why that is?