downloadcmd renames files from .nii.gz to .nii_N.gz #88
Comments
My workaround right now is simply:
but I'd much rather not have to do that.
Mappings from s3 urls to locations on disk are stored in the package metadata file. These mappings are generated at the time of package creation. S3-urls are converted to relative file locations by removing the s3://NDAR_Central_x/submission_xxxxx prefix and replacing it with the data-structure to which the file was submitted. For example
should both become
Renaming is necessary when files from two submissions have the same name after the submission_xxxx prefix in the S3 URL. However, it appears the logic only takes the file name into account, not the file path. We will adjust the logic and, at the same time, update it to ensure the extension remains intact. This will not require an update to the client tool, but users will need to create new packages to see the changes. We will post an update to this ticket when the fix has been deployed.
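To make the mapping described above concrete, here is a minimal sketch, not NDA's actual implementation. It assumes the path segment right after the bucket is the submission_xxxxx folder. The function names, the bucket and submission numbers, and the data-structure name `example_structure01` are all hypothetical. Collisions are detected on the full relative path rather than on the file name alone.

```python
from collections import defaultdict
from urllib.parse import urlparse


def package_relative_path(s3_url: str, data_structure: str) -> str:
    """Drop the bucket and submission_xxxxx prefix, then prepend the data structure."""
    parts = urlparse(s3_url).path.lstrip("/").split("/")
    # Assumption: the first path segment is always the submission folder.
    if not parts[0].startswith("submission_"):
        raise ValueError(f"unexpected S3 layout: {s3_url}")
    return "/".join([data_structure, *parts[1:]])


def find_collisions(s3_urls, data_structure):
    """Return {relative path: [urls]} for paths that more than one URL maps to."""
    by_path = defaultdict(list)
    for url in s3_urls:
        by_path[package_relative_path(url, data_structure)].append(url)
    return {path: urls for path, urls in by_path.items() if len(urls) > 1}


if __name__ == "__main__":
    # Hypothetical URLs: two submissions share rest1/..., rest2/... is unique.
    urls = [
        "s3://NDAR_Central_1/submission_11111/sub-01/rest1/brainmask.nii.gz",
        "s3://NDAR_Central_2/submission_22222/sub-01/rest1/brainmask.nii.gz",
        "s3://NDAR_Central_1/submission_11111/sub-01/rest2/brainmask.nii.gz",
    ]
    for path, sources in find_collisions(urls, "example_structure01").items():
        print(path, "<-", sources)
```

Keying on the full relative path means identically named files in different directories (rest1 vs rest2) are not treated as collisions; only files that would land on the same location on disk are.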
Is this still planned to be fixed, or ...
After some more investigation, it turned out that the procedure correctly renamed the files in your package, because other files in the package had exactly the same file name and path. The current logic already tries to preserve the extension by adding the _# before the last ".". That logic isn't sophisticated enough to preserve extensions like .nii.gz. We started looking into potential solutions, but for the time being this ticket has been de-prioritized.
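The difference between the two behaviors can be sketched as follows. The first function mimics the rule described in this comment (insert _# before the last "."); the second is one possible alternative, not something NDA has committed to. The function names and the list of compound extensions are assumptions for illustration.

```python
import os

# Assumed, non-exhaustive list of multi-part extensions worth preserving.
COMPOUND_EXTENSIONS = (".nii.gz", ".tar.gz", ".tar.bz2")


def suffix_before_last_dot(filename: str, n: int) -> str:
    """Rule described in the thread: insert _N before the last '.'."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{n}{ext}"


def suffix_before_full_extension(filename: str, n: int) -> str:
    """Alternative sketch: insert _N before a known compound extension."""
    for ext in COMPOUND_EXTENSIONS:
        if filename.lower().endswith(ext):
            cut = len(filename) - len(ext)
            return f"{filename[:cut]}_{n}{filename[cut:]}"
    # Fall back to the simple rule for single-part extensions.
    return suffix_before_last_dot(filename, n)


if __name__ == "__main__":
    name = "brainmask_fs.2.nii.gz"
    print(suffix_before_last_dot(name, 1))        # brainmask_fs.2.nii_1.gz
    print(suffix_before_full_extension(name, 1))  # brainmask_fs.2_1.nii.gz
```

Either way the file is still renamed, so this only addresses the broken extension, not the unpredictability of the name.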
I'm not sure I understand how a package can have multiple files with the same filename and path. That seems like a bug in the design of the packaging system: it should not be possible to create a package like that. The package build should fail, and the creator should try again.
Yes, it was overlooked when the system was first designed. In our s3 repo, files are grouped under a submission-id, i.e.
When users create packages, we group files by data-structure instead of submission-id, so each of the three files mentioned above would map to
When the system was designed, it did not take into consideration that the same file name can be used across multiple submissions, which results in naming collisions during packaging. I do not think throwing an error during package creation would be ideal, since it would unnecessarily prevent users from accessing the rest of the data in their package, which at the moment takes a fair amount of time to create.
Maybe I'm misunderstanding what a submission is, but if a file like that is submitted multiple times, is it expected that all the files are identical, or not? If they are not identical, could either the first or the last one submitted keep the "original" (correct) filename, with all the others renamed? The problem is that analyses expect the files to have specific names: the names often have a structure or meaning to them, or are even referenced by name in other files.
NDA doesn't know whether the files are identical. In this case, since they come from the same collection, I would suspect that they are, but you can calculate and compare the md5 sums to double-check.
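One way to run that check locally is sketched below. It uses only the Python standard library; the helper names and the example file names are hypothetical, and the files are read in chunks so that large images do not need to fit in memory.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_identical(paths):
    """True if every file in `paths` has the same MD5 digest."""
    return len({md5sum(p) for p in paths}) <= 1


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names):
    #   python compare_md5.py brainmask.nii.gz brainmask.nii_1.gz
    for p in sys.argv[1:]:
        print(md5sum(p), p)
    print("identical" if all_identical(sys.argv[1:]) else "different")
```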
downloadcmd appears to be renaming some files it downloads.
Here we have four identically named files in four different directories (rest1 vs 2, AP vs PA). When I download them, I expect them all to end up named brainmask_fs.2.nii.gz in those four directories. I use that to create a four-line file listing the S3 files I request.
Run the downloadcmd:
Note it has renamed the files!
This is problematic for two reasons:
The filenames are not predictable - I expect to be able to operate on the filename I told it to download, not one downloadcmd makes up
FSL tools like FLIRT refuse to open these files because the extension is wrong.
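For the second problem, a post-download cleanup is one option. The sketch below is not the workaround mentioned elsewhere in this thread and is not part of downloadcmd; it assumes renamed files match the pattern name.nii_N.gz from the issue title, and it moves the counter in front of the extension so the file ends in .nii.gz again. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed pattern for renamed files: <stem>.nii_<N>.gz
_RENAMED = re.compile(r"^(?P<stem>.+)\.nii_(?P<n>\d+)\.gz$")


def repaired_name(filename: str) -> str:
    """Map 'x.nii_3.gz' to 'x_3.nii.gz'; leave any other name unchanged."""
    match = _RENAMED.match(filename)
    if match is None:
        return filename
    return f"{match['stem']}_{match['n']}.nii.gz"


def repair_tree(root):
    """Rename matching files under `root`; return the (old, new) path pairs."""
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(root).rglob("*.gz")):
        if not path.is_file():
            continue
        target = path.with_name(repaired_name(path.name))
        # Never overwrite an existing file.
        if target != path and not target.exists():
            path.rename(target)
            renamed.append((path, target))
    return renamed
```

This restores a valid extension but does not address the first problem: the resulting name still differs from the one that was requested.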