Limit recursive install on a per-subdataset basis #1598

mih · 2017-06-21T13:47:14Z

Docs are in the diff.

Data dependency datasets in the studyforrest collection need to be marked up with this before they come on board.

yarikoptic · 2017-06-21T13:58:04Z

datalad/distribution/subdatasets.py

+        If set to 'skip', the respective subdataset is skipped when DataLad
+        is recursively installing its superdataset. However, the subdataset
+        remains installable when explicitly requested, and no other features
+        are impaired.
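
A minimal sketch of the rule this docstring describes, not DataLad's actual implementation. It assumes subdataset records are plain dicts shaped like query results, with the property carried under the `gitmodule_` prefix. The function name `select_for_recursive_install`, the `explicit` argument, and the example paths are all hypothetical.

```python
# Assumed key: the property name with the 'gitmodule_' prefix used in query results.
PROP = "gitmodule_datalad-recursiveinstall"


def select_for_recursive_install(records, explicit=()):
    """Return the paths a recursive install would descend into.

    records  -- iterable of dicts with a 'path' key and optional properties
    explicit -- paths the user asked for by name; these are never skipped
    """
    explicit = set(explicit)
    selected = []
    for rec in records:
        marked = rec.get(PROP) == "skip"
        if marked and rec["path"] not in explicit:
            continue  # marked by the maintainer and not explicitly requested
        selected.append(rec["path"])
    return selected


records = [
    {"path": "src/anatomy"},
    {"path": "src/rawdata", PROP: "skip"},
]
print(select_for_recursive_install(records))
print(select_for_recursive_install(records, explicit=["src/rawdata"]))
```

The first call leaves out the marked subdataset; the second includes it because it was requested by name, which mirrors "remains installable when explicitly requested".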


But what if I do want to get the full tree/hierarchy? Since it must stop at some point, how would I do it?

If that is an actual use case it would need a switch.

I wonder whether we had better just record the dataset UUID and have a configurable, per-install strategy for what to do with datasets that were already installed elsewhere (above the current dataset). With all the new ways of handling args, could we also pass such information inside?
My point is that I am not sure this is the right place to decide... Sure, we could add more switches and knobs as we discover more.

mih · 2017-06-21T14:26:33Z

WOW, replicated the publish segfault!!! With a completely different diff. I guess we exceeded the amount of possible changes...

mih · 2017-06-21T16:50:47Z

So you are saying that anyone that installs /// should receive a list of uuids that they should supply to prevent 7 levels of redundant datasets? Maybe not...

On Jun 21, 2017 18:27, "Yaroslav Halchenko" wrote:

In datalad/distribution/subdatasets.py:

-    notably slower (performs one call to Git per dataset versus a single call
-    for all combined).
+    Performance note: Property modification, requesting `bottomup` reporting
+    order, or a particular numerical `recursion_limit` implies an internal
+    switch to an alternative query implementation for recursive query that is
+    more flexible, but also notably slower (performs one call to Git per
+    dataset versus a single call for all combined).
+
+    The following properties for subdatasets are recognized by DataLad
+    (without the 'gitmodule_' prefix that is used in the query results):
+
+    "datalad-recursiveinstall"
+        If set to 'skip', the respective subdataset is skipped when DataLad
+        is recursively installing its superdataset. However, the subdataset
+        remains installable when explicitly requested, and no other features
+        are impaired.

mih · 2017-06-21T16:58:43Z

Just to clarify: without a switch that lets a maintainer indicate that a subdataset is not meant to be installed, we will have a problem once studyforrest is part of ///.

Simply because some subdatasets will never be available, install -r /// will always fail from that day onwards.
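
To illustrate the failure mode, here is a toy model of a recursive install, not DataLad code. The function `install_recursive`, the `children`/`available` structures, and the dataset names `public-data` and `private-input` are hypothetical and exist only for this sketch.

```python
def install_recursive(name, children, available, skip=frozenset()):
    """Install `name` and everything below it, depth first.

    children  -- maps a dataset name to the names of its subdatasets
    available -- names that can actually be obtained from somewhere
    skip      -- names a maintainer marked as not for recursive install
    """
    if name not in available:
        raise LookupError(f"cannot install {name!r}: no source available")
    installed = [name]
    for child in children.get(name, ()):
        if child in skip:
            continue  # never attempted, so it cannot fail
        installed.extend(install_recursive(child, children, available, skip))
    return installed


children = {
    "///": ["studyforrest"],
    "studyforrest": ["public-data", "private-input"],
}
available = {"///", "studyforrest", "public-data"}

try:
    install_recursive("///", children, available)
except LookupError as exc:
    print("recursive install failed:", exc)

print(install_recursive("///", children, available, skip={"private-input"}))
```

Without the marker, a single unavailable subdataset aborts the whole recursive install; with it, the rest of the hierarchy installs normally.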

yarikoptic · 2017-06-21T17:03:53Z

No. When you install the top level, you know the UUIDs of the datasets at that level. When you install a subdataset recursively, you can know which datasets are available at the levels above.
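
A rough sketch of what such a UUID-based strategy could look like, under assumptions that are not in this PR: datasets are modeled as nested dicts with `name`, `uuid`, and `subdatasets` keys, and `plan_install` is a hypothetical helper. Each level registers the UUIDs of its subdatasets before descending, so deeper levels can see what already exists above them.

```python
def plan_install(dataset, seen=None, path=""):
    """Return install paths, skipping datasets whose UUID was already seen."""
    seen = set() if seen is None else seen
    fresh = []
    # Register this level first, so deeper levels know what exists above them.
    for sub in dataset.get("subdatasets", []):
        if sub["uuid"] in seen:
            continue  # same dataset is already planned elsewhere
        seen.add(sub["uuid"])
        fresh.append(sub)
    plan = []
    for sub in fresh:
        subpath = f"{path}/{sub['name']}" if path else sub["name"]
        plan.append(subpath)
        plan.extend(plan_install(sub, seen, subpath))
    return plan


top = {
    "uuid": "u-top",
    "subdatasets": [
        {"name": "a", "uuid": "u-a", "subdatasets": [
            {"name": "b", "uuid": "u-b", "subdatasets": []},
        ]},
        {"name": "b", "uuid": "u-b", "subdatasets": []},
    ],
}
print(plan_install(top))  # ['a', 'b']; the nested copy 'a/b' is skipped
```

Because `seen` is shared across the whole walk, this sketch also deduplicates across sibling branches, which is slightly broader than "the levels above".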

mih · 2017-06-21T17:06:07Z

That does not help in the situation I outlined: such a dataset will never be installed and will always cause a failure.

yarikoptic · 2017-06-21T17:24:48Z

Could you please point to that "outline"? ;-)

mih · 2017-06-21T17:25:31Z

5cm up: #1598 (comment)

yarikoptic · 2017-06-21T17:33:08Z

Ah, so this is for some internal subdatasets, not to be shared (yet), etc. Yeah, those need an explicit marker. I thought it was also to be used for marking the ones included in multiple places within the bigger collection.

mih · 2017-06-21T17:38:28Z

In studyforrest I use that same marker for both. There is simply no point in having them installed on install -r, but not because some variant of some dataset might have popped up somewhere due to a magically "correct" order. These datasets are inputs to their parents. They only ever need to be there to recompute the parent. If I want that I use datalad to grab the files right from the non-installed subdatasets. The included test documents that this functionality is not impaired.

mih · 2017-06-21T17:42:11Z

That being said, the marker could have a different name. For example, it would be nice to be able to say "uninstall all input datasets --recursive"

codecov-io · 2017-06-22T05:51:32Z

Codecov Report

Merging #1598 into master will decrease coverage by 35.74%.
The diff coverage is 27.02%.

@@             Coverage Diff             @@
##           master    #1598       +/-   ##
===========================================
- Coverage   85.54%   49.79%   -35.75%     
===========================================
  Files         260      254        -6     
  Lines       29848    27750     -2098     
===========================================
- Hits        25532    13817    -11715     
- Misses       4316    13933     +9617

Impacted Files	Coverage Δ
datalad/support/annexrepo.py	`54.87% <100%> (-31.34%)`	⬇️
datalad/interface/tests/test_annotate_paths.py	`18.88% <19.23%> (-81.12%)`	⬇️
datalad/distribution/get.py	`84.97% <33.33%> (-5.03%)`	⬇️
datalad/distribution/subdatasets.py	`83.33% <42.85%> (-13.32%)`	⬇️
datalad/crawler/oldconfig/tests/test_config.py	`12.5% <0%> (-87.5%)`	⬇️
datalad/support/tests/test_stats.py	`13.11% <0%> (-86.89%)`	⬇️
datalad/support/tests/utils.py	`14.28% <0%> (-85.72%)`	⬇️
datalad/tests/test_config.py	`14.45% <0%> (-85.55%)`	⬇️
datalad/tests/test_protocols.py	`14.81% <0%> (-85.19%)`	⬇️
datalad/support/tests/test_gitrepo.py	`14.83% <0%> (-85.17%)`	⬇️
... and 215 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27b6f1c...1a98765. Read the comment docs.

mih · 2017-06-22T05:53:07Z

So there you have it. This PR has virtually no new code that actually runs, yet it still segfaults. My point: the cause is not in the PRs; it is already in master, and we are sampling random variations that finally make it go KABOOM.

I am out of ideas. In any case, holding off on merging PRs because of this segfault seems counterproductive.

https://travis-ci.org/datalad/datalad/jobs/245645529

yarikoptic · 2017-06-22T15:14:21Z

I will look at the segfault as soon as I find an Ethernet port/internet for the laptop.

yarikoptic · 2017-07-07T01:00:43Z

So now we could install studyforrest? :-)

mih · 2017-07-07T04:37:32Z

No, functionality is not in master...

mih added the enhancement label Jun 21, 2017

yarikoptic reviewed Jun 21, 2017

mih force-pushed the enh-recursion branch from b6e469c to af516f7 Compare June 22, 2017 05:27

mih force-pushed the enh-recursion branch 2 times, most recently from 28766dd to d0c5eb6 Compare June 24, 2017 07:44

mih added 6 commits July 3, 2017 14:15

e9482dc  DOC: Better arg description
571b748  BF: Commit a modified .gitmodules, might otherwise end up in annex later on
         See also dataladgh-1597
2f08a95  BF: Limit recursive install on a per-subdataset basis
         Docs are inside.
e51e70f  BF: Protect against crash due to incomplete config
f45f79e  DOC+BF: Passify sphinx link checker
1a98765  BF: Prevent subdatasets from setting invalid options

mih force-pushed the enh-recursion branch from 2dec3a8 to 1a98765 Compare July 3, 2017 12:16

mih changed the base branch from master to dev-master July 6, 2017 18:37

mih merged commit dae6154 into datalad:dev-master Jul 6, 2017

mih deleted the enh-recursion branch July 7, 2017 07:42
