Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

having full local paths in .csv for data files still requires some location to be entered #14

Open
yarikoptic opened this issue Jun 26, 2019 · 4 comments

Comments

@yarikoptic
Copy link
Contributor

which makes little sense. I think analysis should be done in code and if paths are absolute, there should be no question asked. So I had to enter / directory as the location:

/data/R01/bids/Brain/wpc-7642/sub-719531/ses-01/func/sub-719531_ses-01_task-recognitionFood_acq-MB8_run-02_bold.nii.gz
/data/R01/bids/Brain/wpc-7642/sub-719531/ses-01/func/sub-719531_ses-01_task-rest_acq-MB8_run-01_bold.nii.gz
Press the "Enter" key to specify directory/directories OR an s3 location by entering -s3 <bucket name> to locate your associated files:/
Building Package

which later on failed anyways with

  File "/home/XXX/proj/nda-tools/NDATools/Submission.py", line 352, in batch_update_status
    list_data = self.generate_data_for_request(status)
  File "/home/XXX/proj/nda-tools/NDATools/Submission.py", line 132, in generate_data_for_request
    size = self.full_file_path[file][1]
KeyError: 'data/NDA/output_v2/sub-719218_ses-01_task-rest_acq-MB8_run-01_bold.metadata.zip'
@obenshaindw
Copy link
Contributor

@yarikoptic thanks for the issue. We can consider adding first a try that looks for the files using the paths provided (make the assumption paths are absolute), then prompt user for help in locating the files.

We definitely want to preserve the feature that allows paths to be relative and not start at the root, as NDA will preserve these paths into the creation of S3 Objects and future downloads of the data. The inability for the tool to match the strings in the CSV file to file locations on the user's system (or S3 Object storage) is one of the more frequent HelpDesk tickets, and we definitely need to look at how to make this easier and/or better documented. @ericearl had similar frustrations in getting the tool to locate manifest files and the associated files therein.

Any suggestions for how to make this better are welcome, I think we will try to address all current open issues in our next sprint. Depending on what that looks like we may try to tackle some improvements to bids2nda as well.

@yarikoptic
Copy link
Contributor Author

Well, in the simplest case it is just a matter of using os.path.isabs on the paths and making sure that leading / is not stripped away for them.

But related issue is that ATM that AFAIK the entirety of the full path is now used to establish the "paths into the creation of S3 Objects", so for a file with full path /home/login/sensitiveinfo/blah/datasetroot/subdir/file.nii.gz all of the components (home/login/sensitiveinfo/blah/datasetroot/subdir/file.nii.gz) will be reflected in the path on S3, correct? Ideally, home/login/sensitiveinfo/blah/datasetroot/ should not be a part of it since not really pertinent to the internal dataset hierarchy. A few ideas on possible ways to handle that:

  • strip away all path prefixes using those provided in -l option thus leaving only relative paths within datasets. Additional check should be done that there is no conflict(s) (if there is multiple root directories - relative paths from their tops might collide)
    • may be worth forcing having all materials to be available from under a single top level directory (i.e. /home/login/sensitiveinfo/blah/datasetroot/) and thus forcing having only a single path to be provided to -l.
    • may be, if any absolute path is found, worth forcing user to provide "dataset top directory" via -l, checking that full path is under it first.
  • make it automagical by stripping away common prefix in all the paths (which will be /home/login/sensitiveinfo/blah/datasetroot/ if there is multiple subdirs to be uploaded), but it might be too magical/easy to break.

@obenshaindw
Copy link
Contributor

which makes little sense. I think analysis should be done in code and if paths are absolute, there should be no question asked. So I had to enter / directory as the location:

/data/R01/bids/Brain/wpc-7642/sub-719531/ses-01/func/sub-719531_ses-01_task-recognitionFood_acq-MB8_run-02_bold.nii.gz
/data/R01/bids/Brain/wpc-7642/sub-719531/ses-01/func/sub-719531_ses-01_task-rest_acq-MB8_run-01_bold.nii.gz
Press the "Enter" key to specify directory/directories OR an s3 location by entering -s3 <bucket name> to locate your associated files:/
Building Package

which later on failed anyways with

  File "/home/XXX/proj/nda-tools/NDATools/Submission.py", line 352, in batch_update_status
    list_data = self.generate_data_for_request(status)
  File "/home/XXX/proj/nda-tools/NDATools/Submission.py", line 132, in generate_data_for_request
    size = self.full_file_path[file][1]
KeyError: 'data/NDA/output_v2/sub-719218_ses-01_task-rest_acq-MB8_run-01_bold.metadata.zip'

@yarikoptic can you provide a copy of the file you started with and the arguments you provided? We are looking to make some improvements to how the script handles file paths and are having a little difficulty reproducing the exact issue. Thanks.

@yarikoptic
Copy link
Contributor Author

@yarikoptic can you provide a copy of the file you started with and the arguments you provided?

eh heh, quite a bit of time has passed. We are still looking around but I am afraid it might be gone. So far got only the one with relative paths

We are looking to make some improvements to how the script handles file paths and are having a little difficulty reproducing the exact issue.

I am surprised now that I have not posted the version (commit) of the nda-tools we used, but judging from the date, if anything fixed it should be in the e9ba5b6..66c48a0 range. But I do not spot anything which could be relevant. I guess for the next upload we will try to protocol the situation better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants