-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: subdatasets(contains=...) will report matching paths in result #3743
Conversation
Sorting paths (for locally present content, and unavailable stuff) into containing (present) datasets is a key task for helper code such as annotate_paths(). This change equips `subdatasets()` with the ability to make similar reports by including a 'contains' property in its results whenever a respective set of paths has been provided. Such a result looks like this: ``` { "action": "subdataset", "contains": [ "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr/sub1/sub2" ], "gitmodule_datalad-id": "1b1e4b70-e472-11e9-9214-f0d5bf7b5561", "gitmodule_name": "sub1", "gitmodule_url": "./sub1", "gitshasum": "15428154a3573488503151ddc42051281221e8f8", "parentds": "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr", "path": "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr/sub1", "refds": "/tmp/datalad_temp_test_get_in_unavailable_subdatasetdzceqobr", "revision": "15428154a3573488503151ddc42051281221e8f8", "state": "absent", "status": "ok", "type": "dataset" } ``` where paths are reported as a 'contains' list with fully resolved items.
This is improves its utility for sorting parts into datasets further. Anything that doesn't match a subdataset must either be part of the existing base dataset, or does not exist at all.
Codecov Report
@@ Coverage Diff @@
## master #3743 +/- ##
===========================================
- Coverage 82.98% 60.83% -22.15%
===========================================
Files 273 273
Lines 35918 35933 +15
===========================================
- Hits 29807 21861 -7946
- Misses 6111 14072 +7961
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (assuming my push gets the tests in a passing state)
if not contains_hits: | ||
# we are not looking for this subds, because it doesn't | ||
# match the target path | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the main price we pay for this added information is not aborting the for c in contains
iteration early if one of the elements is a hit. OK.
If an argument given to `contains` doesn't have a subdataset match, subdatasets() now yields a record with status="impossible" rather than nothing.
7f1eed9
to
f634bea
Compare
appveyor failure from the known stalling of |
Thx @kyleam for the review and this fix! I will keep this open for now, until I know it is suitable for use in |
Ok, it can go in now. |
Sorting paths (for locally present content, and unavailable stuff) into
containing (present) datasets is a key task for helper code such as
annotate_paths(). This change equips
subdatasets()
with the abilityto make similar reports by including a 'contains' property in its
results whenever a respective set of paths has been provided.
Such a result looks like this:
where paths are reported as a 'contains' list with fully resolved items.
This seems like a useful feature on its own, but is part of a move towards fixing #3368 and #3469