Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New "standard" path resolution logic #3435

Merged
merged 9 commits into from
May 21, 2019
Merged

New "standard" path resolution logic #3435

merged 9 commits into from
May 21, 2019

Conversation

mih
Copy link
Member

@mih mih commented May 20, 2019

Talked about it multiple times: here it is.

Any absolute path given to a command is used as such. Any relative pathis interpreted as being
relative to the CWD, unless an also provided dataset argument is a Dataset instance (not just a
path to or inside a dataset).

I briefly looked into implementing this for all commands, but that seems to be a bottomless pit -- to expensive for me and for now. Instead, I implemented it for rev_resolve_path() that is used by all commands that already comply with the structure proposed in #3192.

  • implement logic change
  • adjust tests
  • add ability to process multiple paths without re-localization a dataset each and every time (immediately benefits distribution.subdatasets -> local.subdatasets #3429)
  • turn EnsureDataset into a NOOP with documentation purpose only. This constraint class auto instantiates Dataset objects based on command line arguments (paths), essentially disabling the ability to detect and act on the key distinction of the decision logic above.
  • @kyleam please double-check 971620e
    km: Looks fine (and outputs come in as absolute paths, so string/instance for dataset would be the same anyway).

mih added 3 commits May 20, 2019 07:46
…commands

From the diff:

    Any path is returned as an absolute path. If, and only if, a dataset
    object instance is given as `ds`, relative paths are interpreted as
    relative to the given dataset. In all other cases, relative paths are
    treated as relative to the current working directory.
This should be a standard pattern, but has to be worked around with
external list comprehensions. But this requires dataset localization
each time, hence needlessly slow.
@codecov
Copy link

codecov bot commented May 20, 2019

Codecov Report

Merging #3435 into master will decrease coverage by 0.07%.
The diff coverage is 97.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3435      +/-   ##
==========================================
- Coverage    91.2%   91.13%   -0.08%     
==========================================
  Files         265      265              
  Lines       34489    34766     +277     
==========================================
+ Hits        31457    31684     +227     
- Misses       3032     3082      +50
Impacted Files Coverage Δ
datalad/distribution/tests/test_dataset.py 99.67% <100%> (ø) ⬆️
datalad/interface/run.py 100% <100%> (ø) ⬆️
datalad/core/local/tests/test_save.py 100% <100%> (ø) ⬆️
datalad/distribution/dataset.py 96.51% <96.42%> (-0.28%) ⬇️
datalad/support/gitrepo.py 86.58% <0%> (-3.4%) ⬇️
datalad/downloaders/http.py 83.73% <0%> (-2.78%) ⬇️
datalad/metadata/search.py 83.68% <0%> (+0.47%) ⬆️
datalad/downloaders/tests/test_http.py 92.32% <0%> (+1.23%) ⬆️
datalad/interface/run_procedure.py 91.82% <0%> (+1.88%) ⬆️
datalad/metadata/metadata.py 89.69% <0%> (+4.17%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cddb04c...9308db9. Read the comment docs.

@mih
Copy link
Member Author

mih commented May 20, 2019

All tests pass -- minus the neuroimaging procedure issue that is not specific to this PR.

@yarikoptic
Copy link
Member

Any absolute path given to a command is used as such. Any relative pathis interpreted as being
relative to the CWD, unless an also provided dataset argument is a Dataset instance (not just a
path to or inside a dataset).

Just to make sure I/we get it right. It means, that in the command line it will always be relative to the CWD, so for any operation on relative paths users would need to cd to that directory, correct? If so, in my experience it is an ok compromise -- I don't remember relying on relative paths with specification of the dataset (as the ways to achive cd effect) much since paths completion is not there then.

I briefly looked into implementing this for all commands, but that seems to be a bottomless pit -- to expensive for me and for now.

I was hoping that it should be doable due to all the centralization (AnnotatePaths?) of paths handling... I think we really wouldn't want to treat paths differently or is that already the case?

@mih
Copy link
Member Author

mih commented May 20, 2019

Just to make sure I/we get it right. It means, that in the command line it will always be relative to the CWD, so for any operation on relative paths users would need to cd to that directory, correct?

Not sure I can follow. If you have a relative path at the cmdline, it is typically relative to CWD (where else would you have it from?). So there is no CDing needed in general. If you have a path relative to any other reference then yes, as there is no way we could know which one that would be. We used to make the distinction between ./here and here the former being relative to CWD and latter relative to a given datasets, but only if there was any, relative to CWD otherwise -- but we decided that this is overcomplicated and prone to failure.

I was hoping that it should be doable due to all the centralization (AnnotatePaths?) of paths handling... I think we really wouldn't want to treat paths differently or is that already the case?

I cannot comprehensively describe the status quo. I have no plans to make any changes to annotate paths other then a final git rm. If this has to be done in one go, it needs to wait to all commands are reimplemented without it.

@kyleam
Copy link
Contributor

kyleam commented May 20, 2019

For posterity: Related to #3282 and #2589.

Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good, and from my POV shouldn't wait for all annotated-paths-based commands to be reworked or for someone to try to make annotate_paths follow the same convention.

@mih
Copy link
Member Author

mih commented May 21, 2019

Let's go for incremental improvement now, instead of ultimate awesomeness later.

@mih mih merged commit 8514268 into datalad:master May 21, 2019
@mih mih deleted the rf-paths branch May 21, 2019 04:45
kyleam added a commit to kyleam/datalad that referenced this pull request Jun 4, 2019
download_url() needs to be updated in two ways for the new path
resolution logic from dataladgh-3435: (1) the dataset argument is assumed to
be a dataset instance if it is specified but as of fc25ac5
EnsureDataset no longer converts a string dataset to an instance
and (2) paths are to be taken as relative to the current working
directory unless a dataset _instance_ is given.

We could update get_dataset_pwds(), but let's instead stop using it in
download_url() because it was used to avoid duplicating some logic
and, with the new path resolution, most of that logic has fallen away.
Also, the fate of get_dataset_pwds() isn't clear because its other
caller, run(), also needs to be updated for dataladgh-3435 but that is likely
to be more involved.

Fixes datalad#3468.
kyleam added a commit to kyleam/datalad that referenced this pull request Jun 4, 2019
download_url() needs to be updated in two ways for the new path
resolution logic from dataladgh-3435: (1) the dataset argument is assumed to
be a dataset instance if it is specified but as of fc25ac5
EnsureDataset no longer converts a string dataset to an instance
and (2) paths are to be taken as relative to the current working
directory unless a dataset _instance_ is given.

We could update get_dataset_pwds(), but let's instead stop using it in
download_url() because it was used to avoid duplicating some logic
and, with the new path resolution, most of that logic has fallen away.
Also, the fate of get_dataset_pwds() isn't clear because its other
caller, run(), also needs to be updated for dataladgh-3435 but that is likely
to be more involved.

Fixes datalad#3468.
kyleam added a commit to kyleam/datalad that referenced this pull request Jun 4, 2019
download_url() needs to be updated in two ways for the new path
resolution logic from dataladgh-3435: (1) the dataset argument is assumed to
be a dataset instance if it is specified but as of fc25ac5
EnsureDataset no longer converts a string dataset to an instance
and (2) paths are to be taken as relative to the current working
directory unless a dataset _instance_ is given.

We could update get_dataset_pwds(), but let's instead stop using it in
download_url() because it was used to avoid duplicating some logic
and, with the new path resolution, most of that logic has fallen away.
Also, the fate of get_dataset_pwds() isn't clear because its other
caller, run(), also needs to be updated for dataladgh-3435 but that is likely
to be more involved.

Fixes datalad#3468.
kyleam added a commit to kyleam/datalad that referenced this pull request Jun 25, 2019
download_url() needs to be updated in two ways for the new path
resolution logic from dataladgh-3435: (1) the dataset argument is assumed to
be a dataset instance if it is specified but as of fc25ac5
EnsureDataset no longer converts a string dataset to an instance
and (2) paths are to be taken as relative to the current working
directory unless a dataset _instance_ is given.

We could update get_dataset_pwds(), but let's instead stop using it in
download_url() because it was used to avoid duplicating some logic
and, with the new path resolution, most of that logic has fallen away.
Also, the fate of get_dataset_pwds() isn't clear because its other
caller, run(), also needs to be updated for dataladgh-3435 but that is likely
to be more involved.

Fixes datalad#3468.
kyleam added a commit to kyleam/datalad that referenced this pull request Jul 11, 2019
EnsureDataset no longer converts a string to a dataset instance as of
fc25ac5 (dataladgh-3435).  This causes issues in commands that assume the
dataset argument, if specified, is a dataset instance.  A few spots
have already been updated (9af316f, bba278f, f38d72c).  Fix what
are hopefully the only remaining issues, aside from those with run(),
which will be dealt with separately.

Note that most of the commands were checked with a combination of a
command line call with `-d .` and looking at the code for the dataset
handling, so it is likely that a couple of issues have been
overlooked, but hopefully the bulk of the problems are resolved at
this point.
kyleam added a commit to kyleam/datalad that referenced this pull request Jul 11, 2019
This function was moved to utils.py to avoid duplicating some logic,
but it is no longer used anywhere outside of run and run wrappers.  To
update get_dataset_pwds() for the new path handling (dataladgh-3435), it will
need to use distribution.dataset.Dataset, which doesn't seem
appropriate to import into utils.py, so let's move the helper back to
run.py.  We can move it somewhere else if the need arises.
kyleam added a commit to kyleam/datalad that referenced this pull request Jul 11, 2019
Following the changes in dataladgh-3435, teach run() that a specified dataset
might not be a dataset instance and that paths should be relative to
the dataset only if an instance is given.

The main change in behavior then is exposed when using --dataset with
a run call from the command line.  This can lead to some pretty weird
cases (e.g., the directory in which the command is executed can be
outside the dataset where the record is made).  But using 'run
--dataset ...'  from the command line isn't typical or (arguably)
recommended, so these cases probably aren't something to guard
against.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants