Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit recursive install on a per-subdataset basis #1598

Merged
merged 6 commits into from
Jul 6, 2017

Conversation

mih
Copy link
Member

@mih mih commented Jun 21, 2017

Docs are in the diff.

Data dependency datasets in the studyforrest collection want to marked up with this, before they come on board.

If set to 'skip', the respective subdataset is skipped when DataLad
is recursively installing its superdataset. However, the subdataset
remains installable when explicitly requested, and no other features
are impaired.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if I do want to get the full tree/hierarchy, since it must stop at some point, how would I do it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is an actual use case it would need a switch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we better just record dataset uuid and have a configurable per install strategy on what to do with datasets which were already installed elsewhere (above current dataset). With all the new ways of handling args, could we also pass such information inside?
My point is that I am not sure if it is the right place to decide... Sure thing we could add more switches and knobs as we discover more

@mih
Copy link
Member Author

mih commented Jun 21, 2017

WOW, replicated the publish segfault!!! With completely different diff. I guess we exceeded the amount of possible changes....

@mih
Copy link
Member Author

mih commented Jun 21, 2017 via email

@mih
Copy link
Member Author

mih commented Jun 21, 2017

Just to clarify: without a switch for a maintainer to indicate that a subdataset is not for install we will have a problem once studyforrest is part of ///

Simply because some subdatasets will never be available, hence install -r /// always fail from that day onwards.

@yarikoptic
Copy link
Member

No. When you install top level you know uuids of datasets at that level. When you install a subdataset recursively, you can know what datasets available at the levels above

@mih
Copy link
Member Author

mih commented Jun 21, 2017

It does not help in the situation that I outlined, such dataset will never be installed and always cause failure.

@yarikoptic
Copy link
Member

Could you please point to that "outline"? ;-)

@mih
Copy link
Member Author

mih commented Jun 21, 2017

5cm up: #1598 (comment)

@yarikoptic
Copy link
Member

Ah, so for some internal subdatasets, not to be shared (yet) etc. Yeah, for those needs explicit marker. I thought it was too be used also for marking the ones included in multiple places within the bigger collection

@mih
Copy link
Member Author

mih commented Jun 21, 2017

In studyforrest I use that same marker for both. There is simply no point in having them installed on install -r, but not because some variant of some dataset might have popped up somewhere due to a magically "correct" order. These datasets are inputs to their parents. They only ever need to be there to recompute the parent. If I want that I use datalad to grab the files right from the non-installed subdatasets. The included test documents that this functionality is not impaired.

@mih
Copy link
Member Author

mih commented Jun 21, 2017

That being said, the marker could have a different name. For example, it would be nice to be able to say "uninstall all input datasets --recursive"

@codecov-io
Copy link

codecov-io commented Jun 22, 2017

Codecov Report

Merging #1598 into master will decrease coverage by 35.74%.
The diff coverage is 27.02%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1598       +/-   ##
===========================================
- Coverage   85.54%   49.79%   -35.75%     
===========================================
  Files         260      254        -6     
  Lines       29848    27750     -2098     
===========================================
- Hits        25532    13817    -11715     
- Misses       4316    13933     +9617
Impacted Files Coverage Δ
datalad/support/annexrepo.py 54.87% <100%> (-31.34%) ⬇️
datalad/interface/tests/test_annotate_paths.py 18.88% <19.23%> (-81.12%) ⬇️
datalad/distribution/get.py 84.97% <33.33%> (-5.03%) ⬇️
datalad/distribution/subdatasets.py 83.33% <42.85%> (-13.32%) ⬇️
datalad/crawler/oldconfig/tests/test_config.py 12.5% <0%> (-87.5%) ⬇️
datalad/support/tests/test_stats.py 13.11% <0%> (-86.89%) ⬇️
datalad/support/tests/utils.py 14.28% <0%> (-85.72%) ⬇️
datalad/tests/test_config.py 14.45% <0%> (-85.55%) ⬇️
datalad/tests/test_protocols.py 14.81% <0%> (-85.19%) ⬇️
datalad/support/tests/test_gitrepo.py 14.83% <0%> (-85.17%) ⬇️
... and 215 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27b6f1c...1a98765. Read the comment docs.

@mih
Copy link
Member Author

mih commented Jun 22, 2017

So there you have it. This PR has virtually no new code that actually runs, yet it still segfaults. My point: the cause is not in the PRs, but it is already in master and we are sampling random variations that make it finally go KABOOM.

I am out of ideas. In any case, stopping to merge PRs because of this segfault seems counterproductive.

https://travis-ci.org/datalad/datalad/jobs/245645529

@yarikoptic
Copy link
Member

I will look at segfault as soon as I find Ethernet Port/internet for the laptop

@mih mih force-pushed the enh-recursion branch 2 times, most recently from 28766dd to d0c5eb6 Compare June 24, 2017 07:44
@mih mih changed the base branch from master to dev-master July 6, 2017 18:37
@mih mih merged commit dae6154 into datalad:dev-master Jul 6, 2017
@yarikoptic
Copy link
Member

So now we could install study Forrest? :-)

@mih
Copy link
Member Author

mih commented Jul 7, 2017 via email

@mih mih deleted the enh-recursion branch July 7, 2017 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants