New "standard" path resolution logic #3435
…commands. From the diff: Any path is returned as an absolute path. If, and only if, a dataset object instance is given as `ds`, relative paths are interpreted as relative to the given dataset. In all other cases, relative paths are treated as relative to the current working directory.
This should be a standard pattern, but currently it has to be worked around with external list comprehensions. That requires localizing the dataset once per path, which is needlessly slow.
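A minimal sketch of the cost described above. `locate_reference()`, `resolve_one()` and `resolve_many()` are hypothetical helpers, not DataLad functions: `locate_reference()` stands in for the costly dataset localization and counts its calls. The sketch deliberately ignores the string-versus-instance rule and only illustrates why a helper that accepts several paths beats a list comprehension around a single-path helper.

```python
from pathlib import Path
from typing import Iterable, List

# Counts how often the (hypothetical) expensive lookup runs.
LOOKUPS = 0


def locate_reference(spec: str) -> Path:
    """Hypothetical stand-in for dataset localization; assumed to be costly."""
    global LOOKUPS
    LOOKUPS += 1
    return Path(spec).absolute()


def resolve_one(path: str, spec: str) -> Path:
    # Single-path helper: every call repeats the lookup.
    ref = locate_reference(spec)
    p = Path(path)
    return p if p.is_absolute() else ref / p


def resolve_many(paths: Iterable[str], spec: str) -> List[Path]:
    # Multi-path helper: look up the reference once, then reuse it.
    ref = locate_reference(spec)
    return [p if p.is_absolute() else ref / p for p in map(Path, paths)]


paths = ["a.txt", "sub/b.txt", "/abs/c.txt"]

LOOKUPS = 0
slow = [resolve_one(p, "/tmp/ds") for p in paths]  # external list comprehension
slow_lookups = LOOKUPS

LOOKUPS = 0
fast = resolve_many(paths, "/tmp/ds")  # one call, one lookup
fast_lookups = LOOKUPS
```

Both variants return the same paths; only the number of lookups differs (three versus one here), and the gap grows with the number of paths.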
Codecov Report
@@            Coverage Diff            @@
##           master    #3435     +/-   ##
==========================================
- Coverage    91.2%   91.13%   -0.08%
==========================================
  Files         265      265
  Lines       34489    34766     +277
==========================================
+ Hits        31457    31684     +227
- Misses       3032     3082      +50
All tests pass, except for the neuroimaging procedure issue, which is not specific to this PR.
Just to make sure I/we get it right: it means that on the command line, paths will always be relative to the CWD, so for any operation on relative paths users would need to
I was hoping this would be doable thanks to all the centralization of path handling (AnnotatePaths?)... I think we really wouldn't want to treat paths differently, or is that already the case?
Not sure I can follow. If you have a relative path on the command line, it is typically relative to the CWD (where else would it come from?). So, in general, no cd-ing is needed. If you have a path relative to some other reference, then yes, because there is no way for us to know which reference that would be. We used to make the distinction between
I cannot comprehensively describe the status quo. I have no plans to make any changes to annotate paths other than a final
Thanks. Looks good, and from my POV shouldn't wait for all annotated-paths-based commands to be reworked or for someone to try to make annotate_paths follow the same convention.
Let's go for incremental improvement now, instead of ultimate awesomeness later.
download_url() needs to be updated in two ways for the new path resolution logic from gh-3435: (1) the dataset argument is assumed to be a dataset instance if it is specified, but as of fc25ac5 EnsureDataset no longer converts a string dataset to an instance, and (2) paths are to be taken as relative to the current working directory unless a dataset _instance_ is given. We could update get_dataset_pwds(), but let's instead stop using it in download_url(): it was used there to avoid duplicating some logic, and with the new path resolution most of that logic has fallen away. Also, the fate of get_dataset_pwds() isn't clear, because its other caller, run(), also needs to be updated for gh-3435, which is likely to be more involved. Fixes datalad#3468.
EnsureDataset no longer converts a string to a dataset instance as of fc25ac5 (gh-3435). This causes issues in commands that assume the dataset argument, if specified, is a dataset instance. A few spots have already been updated (9af316f, bba278f, f38d72c). Fix what are hopefully the only remaining issues, aside from those with run(), which will be dealt with separately. Note that most of the commands were checked with a combination of a command line call with `-d .` and a look at the code for the dataset handling, so a couple of issues have likely been overlooked, but hopefully the bulk of the problems are resolved at this point.
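A minimal sketch of why the constraint change matters and how a command might cope with it. Everything here is hypothetical and illustrative, not DataLad's code: a `Dataset` class that only records its root path, two functions modeling the old (converting) and new (NOOP) `EnsureDataset` behavior, and a `split_dataset_arg()` helper that decides the path anchor from the argument as given before creating an instance.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


class Dataset:
    """Hypothetical minimal dataset object; only its root path is modeled."""

    def __init__(self, path: Union[str, Path]) -> None:
        self.pathobj = Path(path).absolute()


def ensure_dataset_old(value):
    # Hypothetical model of the former behavior: every string becomes an
    # instance, so a command can no longer tell what the user passed.
    if value is None or isinstance(value, Dataset):
        return value
    return Dataset(value)


def ensure_dataset_new(value):
    # Hypothetical model of the current behavior: a NOOP, value kept as given.
    return value


def split_dataset_arg(
    dataset: Optional[Union[str, Dataset]]
) -> Tuple[Optional[Dataset], bool]:
    """Return an instance to operate on (or None) and whether relative
    paths should be anchored at that dataset (only if an instance was given)."""
    anchored = isinstance(dataset, Dataset)
    if dataset is None or anchored:
        return dataset, anchored
    # A string still identifies the dataset to operate on, but it does not
    # change how relative paths are interpreted.
    return Dataset(dataset), False
```

With the converting model, `split_dataset_arg(ensure_dataset_old("."))` always reports an anchored dataset, so a `-d .` call from the command line would be indistinguishable from a Python call with an instance.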
get_dataset_pwds() was moved to utils.py to avoid duplicating some logic, but it is no longer used anywhere outside of run and run wrappers. To update get_dataset_pwds() for the new path handling (gh-3435), it will need to use distribution.dataset.Dataset, which doesn't seem appropriate to import into utils.py, so let's move the helper back to run.py. We can move it somewhere else if the need arises.
Following the changes in gh-3435, teach run() that a specified dataset might not be a dataset instance and that paths should be relative to the dataset only if an instance is given. The main change in behavior is then exposed when using --dataset with a run call from the command line. This can lead to some pretty weird cases (e.g., the directory in which the command is executed can be outside the dataset where the record is made). But using 'run --dataset ...' from the command line isn't typical or (arguably) recommended, so these cases probably aren't something to guard against.
Talked about it multiple times: here it is.
Any absolute `path` given to a command is used as such. Any relative `path` is interpreted as being relative to the CWD, unless an also provided `dataset` argument is a `Dataset` instance (not just a path to or inside a dataset).
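A minimal sketch of this decision logic, assuming a hypothetical `Dataset` class that only records its root path and a hypothetical `resolve_path()` helper. DataLad's own `rev_resolve_path()` may differ in details (for example, path normalization), so treat this as an illustration of the rule only.

```python
from pathlib import Path
from typing import Optional, Union


class Dataset:
    """Hypothetical stand-in for a dataset object; only its root path is modeled."""

    def __init__(self, path: Union[str, Path]) -> None:
        self.pathobj = Path(path).absolute()


def resolve_path(path: Union[str, Path],
                 ds: Optional[Union[str, Dataset]] = None) -> Path:
    """Return `path` as an absolute path following the rule above."""
    p = Path(path)
    if p.is_absolute():
        # Absolute paths are used as such.
        return p
    if isinstance(ds, Dataset):
        # Only a dataset *instance* anchors relative paths.
        return ds.pathobj / p
    # No dataset, or a dataset given as a plain string/path:
    # relative paths stay relative to the current working directory.
    return Path.cwd() / p
```

Passing `"/tmp/ds"` as a string and passing `Dataset("/tmp/ds")` therefore give different results for `"sub/file"`: the first resolves against the CWD, the second against `/tmp/ds`.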
I briefly looked into implementing this for all commands, but that seems to be a bottomless pit, too expensive for me and for now. Instead, I implemented it for:

- `rev_resolve_path()`, which is used by all commands that already comply with the structure proposed in #3192.
- `distribution.subdatasets` -> `local.subdatasets` (#3429)
- `EnsureDataset` is turned into a NOOP with documentation purpose only. This constraint class used to auto-instantiate `Dataset` objects based on command line arguments (paths), essentially disabling the ability to detect and act on the key distinction of the decision logic above.

km: Looks fine (and outputs come in as absolute paths, so string/instance for dataset would be the same anyway).