ENH: subdatasets(contains=...) will report matching paths in result #3743

Merged: 3 commits, Oct 2, 2019

mih (Member) commented Oct 1, 2019

Sorting paths (for locally present content, and unavailable stuff) into
containing (present) datasets is a key task for helper code such as
annotate_paths(). This change equips subdatasets() with the ability
to make similar reports by including a 'contains' property in its
results whenever a respective set of paths has been provided.
Such a result looks like this:

```
{
  "action": "subdataset",
  "contains": [
    "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr/sub1/sub2"
  ],
  "gitmodule_datalad-id": "1b1e4b70-e472-11e9-9214-f0d5bf7b5561",
  "gitmodule_name": "sub1",
  "gitmodule_url": "./sub1",
  "gitshasum": "15428154a3573488503151ddc42051281221e8f8",
  "parentds": "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr",
  "path": "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr/sub1",
  "refds": "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr",
  "revision": "15428154a3573488503151ddc42051281221e8f8",
  "state": "absent",
  "status": "ok",
  "type": "dataset"
}
```

where paths are reported as a 'contains' list with fully resolved items.

This seems like a useful feature on its own, but it is also part of a move towards fixing #3368 and #3469.

Reporting fully resolved paths in 'contains' further improves its utility
for sorting paths into datasets. Anything that doesn't match a subdataset
must either be part of the existing base dataset, or not exist at all.
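
To illustrate the sorting described above, here is a minimal sketch in plain Python. It does not call DataLad; it only assumes result records shaped like the example, with a `path` key and an optional `contains` list. The helper name `sort_paths_into_subdatasets` and the shortened `/tmp/ds` paths are made up for illustration.

```python
from collections import defaultdict


def sort_paths_into_subdatasets(requested, results):
    """Group requested paths by the subdataset that reported them.

    Hypothetical helper: `results` is any iterable of records that carry
    a 'path' key and, when paths were given, a 'contains' list.
    Paths are compared as strings, so `requested` must already be
    fully resolved, like the items in 'contains'.
    """
    by_subds = defaultdict(list)
    matched = set()
    for res in results:
        for p in res.get("contains", []):
            by_subds[res["path"]].append(p)
            matched.add(p)
    # Whatever no subdataset claimed belongs to the base dataset
    # or does not exist at all.
    leftover = [p for p in requested if p not in matched]
    return dict(by_subds), leftover


# Illustrative record, trimmed to the keys used here.
example_results = [
    {
        "action": "subdataset",
        "contains": ["/tmp/ds/sub1/sub2"],
        "path": "/tmp/ds/sub1",
        "state": "absent",
        "status": "ok",
        "type": "dataset",
    }
]

grouped, leftover = sort_paths_into_subdatasets(
    ["/tmp/ds/sub1/sub2", "/tmp/ds/file.txt"], example_results
)
```

In this sketch `grouped` maps each reported subdataset to the requested paths it contains, and `leftover` holds the paths that no subdataset claimed.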

codecov bot commented Oct 1, 2019

Codecov Report

Merging #3743 into master will decrease coverage by 22.14%.
The diff coverage is 87.5%.

```
@@             Coverage Diff             @@
##           master    #3743       +/-   ##
===========================================
- Coverage   82.98%   60.83%   -22.15%     
===========================================
  Files         273      273               
  Lines       35918    35933       +15     
===========================================
- Hits        29807    21861     -7946     
- Misses       6111    14072     +7961
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| datalad/local/tests/test_subdataset.py | `36.55% <0%> (-63.45%)` | ⬇️ |
| datalad/local/subdatasets.py | `75% <100%> (+3.15%)` | ⬆️ |
| datalad/distribution/get.py | `83% <100%> (+0.13%)` | ⬆️ |
| datalad/support/tests/test_fileinfo.py | `12.24% <0%> (-87.76%)` | ⬇️ |
| datalad/support/tests/test_repodates.py | `12.96% <0%> (-87.04%)` | ⬇️ |
| datalad/interface/tests/test_ls_webui.py | `14.28% <0%> (-85.72%)` | ⬇️ |
| datalad/tests/test_protocols.py | `16.36% <0%> (-83.64%)` | ⬇️ |
| ...ad/distributed/tests/test_create_sibling_gitlab.py | `17.07% <0%> (-82.93%)` | ⬇️ |
| datalad/interface/tests/test_save.py | `17.2% <0%> (-82.8%)` | ⬇️ |
| datalad/metadata/tests/test_aggregation.py | `16.26% <0%> (-82.78%)` | ⬇️ |

... and 121 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3bb3d22...f634bea.

kyleam (Contributor) left a comment

LGTM (assuming my push gets the tests in a passing state)

```
if not contains_hits:
    # we are not looking for this subds, because it doesn't
    # match the target path
    continue
```

So the main price we pay for this added information is not aborting the `for c in contains` iteration early if one of the elements is a hit. OK.
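
The trade-off can be sketched as follows. This is an illustration rather than the code under review: the function names are hypothetical, and the matching rule (a target is the subdataset itself or lies underneath it) is an assumption.

```python
from pathlib import PurePosixPath


def is_within(subds, target):
    """Assumed matching rule: target equals subds or lies underneath it."""
    return target == subds or subds in target.parents


def has_any_hit(subds, contains):
    # Short-circuits on the first match, but cannot say which paths matched.
    return any(is_within(subds, c) for c in contains)


def collect_hits(subds, contains):
    # Visits every element so that all matching paths can be reported.
    return [c for c in contains if is_within(subds, c)]


sub = PurePosixPath("/ds/sub1")
targets = [
    PurePosixPath("/ds/sub1/a"),
    PurePosixPath("/ds/other"),
    PurePosixPath("/ds/sub1"),
]
hits = collect_hits(sub, targets)
```

The extra cost is linear in the number of `contains` arguments per subdataset, which is presumably small in typical calls.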

If an argument given to `contains` doesn't have a subdataset match,
subdatasets() now yields a record with status="impossible" rather than
nothing.
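
A caller could separate such records from regular matches by checking `status`. The sketch below is plain Python with a hypothetical helper `split_by_status`. It assumes each record carries a `path` key; the exact fields of an "impossible" record are not shown in this thread, so the sample records are invented for illustration.

```python
def split_by_status(results):
    """Partition result records into matched and unmatched paths.

    Hypothetical helper: only the 'status' and 'path' keys are read.
    """
    matched, unmatched = [], []
    for res in results:
        if res.get("status") == "impossible":
            unmatched.append(res.get("path"))
        else:
            matched.append(res.get("path"))
    return matched, unmatched


# Invented sample records, trimmed to the keys used here.
sample = [
    {"status": "ok", "path": "/tmp/ds/sub1"},
    {"status": "impossible", "path": "/tmp/ds/not_in_any_subds"},
]
matched, unmatched = split_by_status(sample)
```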

kyleam (Contributor) commented Oct 2, 2019

The AppVeyor failure is from the known stalling of `test_autoenabled_remote_msg`.

mih (Member, Author) commented Oct 2, 2019

Thx @kyleam for the review and this fix! I will keep this open for now, until I know it is suitable for use in `get()`.

mih (Member, Author) commented Oct 2, 2019

Ok, it can go in now. `get` is kinda done.

@mih mih merged commit 13ee498 into datalad:master Oct 2, 2019
@mih mih deleted the enh-subdatasets branch October 2, 2019 14:44