why does os.rename only sometimes fail on s3fs? #100

Open
dmd opened this issue Jun 13, 2024 · 9 comments

@dmd

dmd commented Jun 13, 2024

# TODO - this doesnt work when using s3fs...add ticket to make it easy to download using s3fs

I'm finding that this os.rename call only sometimes fails on an s3fs mountpoint. Any idea why that is?

@liningpan

Are those failed files greater than 5 GB?

@dmd
Author

dmd commented Jun 13, 2024

No, much smaller. However, I'm now thinking this is actually a different issue. It looks like the file may be listed in the manifest but not actually exist. What is happening here?

$ downloadcmd --username bfrederick -dp 1184998 --file-regex Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv
Running NDATools Version 0.2.27
Using configuration file from /Users/dmd/.NDATools/settings.cfg
proceeding as nda user: bfrederick

No value specified for --workerThreads. Using the default option of 7
Important - You can configure the thread count setting using the --workerThreads argument to maximize your download speed.


Getting Package Information...

Package-id: 1184998
Name: HCPAgingAllFiles
Has associated files?: Yes
Number of files in package: 1414736
Total Package Size: 22.33TB

Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/package_file_metadata_1184998.txt.gz.partial
Completed download /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/package_file_metadata_1184998.txt.gz

S3 links for files that failed to download will be written out to /Users/dmd/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20240613T1226584kewmcog.csv. You can attempt to download these files later by running:
	downloadcmd -dp 1184998 --file-regex Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv -u bfrederick -d /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998 -wt 7 -t "/Users/dmd/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20240613T1226584kewmcog.csv"


Beginning download of 3 files (9.72MB) matching Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv to /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998 using 7 threads
Adding 5 files to download queue. Queue contains 5 files

Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/imagingcollection01/HCA6110138_V1_MR/unprocessed/rfMRI_REST2_PA/LINKED_DATA/PHYSIO/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/image03/HCA6110138_V1_MR/unprocessed/rfMRI_REST2_PA/LINKED_DATA/PHYSIO/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
Completed download /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv
[Errno 2] No such file or directory: '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial' -> '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv'
Traceback (most recent call last):
  File "/Users/dmd/venvs/apc-ve/lib/python3.11/site-packages/NDATools/Download.py", line 590, in download_from_s3link
    self.download_local(download_request, err_if_exists)
  File "/Users/dmd/venvs/apc-ve/lib/python3.11/site-packages/NDATools/Download.py", line 465, in download_local
    os.rename(download_request.partial_download_abs_path, download_request.completed_download_abs_path)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial' -> '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv'

Completed download /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/imagingcollection01/HCA6110138_V1_MR/unprocessed/rfMRI_REST2_PA/LINKED_DATA/PHYSIO/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv
Completed download /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/image03/HCA6110138_V1_MR/unprocessed/rfMRI_REST2_PA/LINKED_DATA/PHYSIO/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv
[Errno 2] No such file or directory: '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial' -> '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv'
Traceback (most recent call last):
  File "/Users/dmd/venvs/apc-ve/lib/python3.11/site-packages/NDATools/Download.py", line 590, in download_from_s3link
    self.download_local(download_request, err_if_exists)
  File "/Users/dmd/venvs/apc-ve/lib/python3.11/site-packages/NDATools/Download.py", line 465, in download_local
    os.rename(download_request.partial_download_abs_path, download_request.completed_download_abs_path)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial' -> '/Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv'


Finished processing all download requests @ 2024-06-13 12:27:00.466081.
     Total download requests 5
     Total errors encountered: 2

 Exiting Program...

@liningpan

There seem to be 3 separate threads trying to download the same file, which caused a race condition. I'm not sure whether the file appears multiple times in the manifest or something went wrong in the download tool.

Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
Starting download: /Users/dmd/NDA/nda-tools/downloadcmd/packages/1184998/fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv.partial
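
For what it's worth, this failure mode can be reproduced without s3fs at all. Below is a minimal sketch, assuming two workers are handed the same destination path. The file names and the `finish_download` helper are made up for illustration; this is not the NDATools code. It runs the two workers one after the other so the result is deterministic; with real threads the outcome would depend on timing.

```python
import os
import tempfile


def finish_download(partial_path, final_path):
    """Hypothetical last step of a download: move the .partial file into place."""
    os.rename(partial_path, final_path)


with tempfile.TemporaryDirectory() as workdir:
    final_path = os.path.join(workdir, "Physio_combined.csv")
    partial_path = final_path + ".partial"

    # Both workers were given the same destination, so both write
    # to the same .partial file.
    for worker in ("worker-1", "worker-2"):
        with open(partial_path, "w") as fh:
            fh.write(f"data written by {worker}\n")

    outcomes = []
    for worker in ("worker-1", "worker-2"):
        try:
            finish_download(partial_path, final_path)
            outcomes.append((worker, "renamed"))
        except FileNotFoundError:
            # The other worker already moved the shared .partial file away.
            outcomes.append((worker, "FileNotFoundError"))

    print(outcomes)
```

The first rename succeeds and the second raises `FileNotFoundError`, which matches the shape of the traceback above: one "Completed download" for the shared path and errors for the other workers.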

@dmd
Author

dmd commented Jun 13, 2024

Yes, it turns out the manifest has several versions of the file that are all specified to be written to the same path:

$ grep Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f package_file_metadata_1184998.txt | csvcut  -c 4 | sort | uniq -c
   3 fmriresults01/HCA6110138_V1_MR/MNINonLinear/Results/rfMRI_REST2_PA/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv
   1 image03/HCA6110138_V1_MR/unprocessed/rfMRI_REST2_PA/LINKED_DATA/PHYSIO/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv
   1 imagingcollection01/HCA6110138_V1_MR/unprocessed/rfMRI_REST2_PA/LINKED_DATA/PHYSIO/Physio_combined_76a6ae9e-b032-42a0-a0be-a30e9cf6c52f.csv
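
The same check can be sketched in Python for anyone without csvkit. This assumes the manifest is comma-separated with the destination path in the 4th column (to match `csvcut -c 4`); the sample rows and the `count_destination_paths` name are invented for illustration.

```python
import csv
import io
from collections import Counter


def count_destination_paths(manifest_lines, path_column=3):
    """Count how many manifest rows target each destination path.

    path_column is the 0-based column index; 3 mirrors `csvcut -c 4`
    and is an assumption about the manifest layout.
    """
    reader = csv.reader(manifest_lines)
    return Counter(row[path_column] for row in reader if len(row) > path_column)


# Made-up rows standing in for the real package_file_metadata file.
sample = io.StringIO(
    "1,s3://bucket/v1,100,fmriresults01/subj/Physio.csv\n"
    "2,s3://bucket/v2,100,fmriresults01/subj/Physio.csv\n"
    "3,s3://bucket/v3,100,fmriresults01/subj/Physio.csv\n"
    "4,s3://bucket/v4,100,image03/subj/Physio.csv\n"
)

counts = count_destination_paths(sample)
duplicates = {path: n for path, n in counts.items() if n > 1}
print(duplicates)
```

Any path with a count above 1 will have more than one worker writing to the same `.partial` file.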

What can we do about this? Or do we just ignore it?

@dmd
Author

dmd commented Jun 13, 2024

And to go back to the original question: what is the issue with os.rename on s3fs? In my testing, os.rename works just fine on s3fs (as long as there isn't some other reason why the rename would fail on ANY filesystem).

@liningpan

NDA should either fix their backend so it doesn't generate manifests with duplicated files, or have the client side do the de-duplication. @gregmagdits
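
A rough sketch of what client-side de-duplication could look like, assuming download requests can be reduced to `(s3_url, destination_path)` pairs. That shape and the `dedupe_by_destination` name are hypothetical, not the NDATools data model.

```python
def dedupe_by_destination(requests):
    """Keep the first request for each destination path and drop the rest.

    `requests` is a list of (s3_url, destination_path) tuples; this shape
    is an assumption made for the example.
    """
    seen = set()
    unique = []
    for s3_url, dest in requests:
        if dest in seen:
            continue  # another request already targets this path
        seen.add(dest)
        unique.append((s3_url, dest))
    return unique


queue = [
    ("s3://bucket/v1", "fmriresults01/subj/Physio.csv"),
    ("s3://bucket/v2", "fmriresults01/subj/Physio.csv"),
    ("s3://bucket/v3", "image03/subj/Physio.csv"),
]
deduped = dedupe_by_destination(queue)
print(deduped)
```

Note that this silently keeps only one version of each file, so it only makes sense if the duplicates really are interchangeable.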

In terms of s3fs: unlike on a real filesystem, a rename is done as a copy followed by a remove, so it is not atomic. For files larger than 5GB, it has to use the multipart upload interface to do a server-side copy instead of a straight object copy. This kind of operation is generally less efficient on S3 and more error-prone. For example, the official AWS S3 VFS mountpoint doesn't even want to support this operation (awslabs/mountpoint-s3#506). I think the C++ s3fs-fuse handles large-file renames correctly.
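
To make the copy-then-remove point concrete, here is a toy model in which a plain dict stands in for a bucket. It is not the s3fs or boto3 API, and `rename_object` is a made-up name; it only shows that there is an intermediate state where both keys exist.

```python
def rename_object(bucket, src_key, dst_key):
    """Emulate a rename on an object store as copy + delete.

    `bucket` is a dict used as a stand-in for an S3 bucket. Returns the
    list of keys visible after each step.
    """
    states = []
    bucket[dst_key] = bucket[src_key]  # step 1: copy the object
    states.append(sorted(bucket))      # both keys are visible at this point
    del bucket[src_key]                # step 2: delete the source
    states.append(sorted(bucket))
    return states


bucket = {"data/file.csv.partial": b"payload"}
states = rename_object(bucket, "data/file.csv.partial", "data/file.csv")
print(states)
```

If anything fails between the two steps, or another client lists the bucket in between, it sees both the `.partial` object and the final one, which cannot happen with an atomic rename on a local filesystem.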

May I ask: if your final destination is S3, why use s3fs? I think nda-tools technically supports downloading to S3 directly. (We ran into issues with NDA's implementation before, and they were supposed to be fixed.)

@dmd
Author

dmd commented Jun 14, 2024

Because we're writing a fairly generic tool (rapidtide-cloud to run rapidtide in AWS Batch), and one of the things that I want to be generic is the backing of our data-ingest location. I suppose I could write two codepaths depending on whether the backing is s3fs or a traditional filesystem, but that would be annoying.

@gregmagdits
Contributor

There is already a back-end procedure that renames files when there is a name collision (related to #88). We will look into the cause of the duplicated entries in package 1184998 and send an update.

@gregmagdits
Contributor

Package 1184998 is the original package that was created by HCP before the de-dupe procedure was in place.
You can create a new package from the original in order to get rid of the duplicate entries. To do this, log in to NDA, navigate to the packages dashboard, select 'shared packages' from the drop-down, and click 'add to my data packages' in the actions menu. This creates a new package from the original and runs the procedure that removes duplicates by appending unique suffixes to files as needed.
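
As a rough illustration of the idea only (not the actual back-end procedure), appending unique suffixes on collision could look something like this. The `_<n>` suffix format and the `add_unique_suffixes` name are invented for the example; the suffixes NDA really uses may differ.

```python
import os


def add_unique_suffixes(paths):
    """Return the paths in order, renaming later duplicates so all are unique.

    The `_<n>` suffix placed before the extension is a made-up convention.
    """
    used = set()
    result = []
    for path in paths:
        root, ext = os.path.splitext(path)
        candidate = path
        n = 0
        while candidate in used:
            n += 1
            candidate = f"{root}_{n}{ext}"
        used.add(candidate)
        result.append(candidate)
    return result


renamed = add_unique_suffixes(["d/a.csv", "d/a.csv", "d/a.csv", "d/b.csv"])
print(renamed)
```

With unique destination paths, each download gets its own `.partial` file and the rename collision above no longer occurs.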
