Updates for (unreleased) DataLad 0.13.0 #506
Codecov Report
@@ Coverage Diff @@
## master #506 +/- ##
==========================================
+ Coverage 89.60% 89.66% +0.05%
==========================================
Files 148 148
Lines 12290 12388 +98
==========================================
+ Hits 11013 11108 +95
- Misses 1277 1280 +3
The Travis run isn't included in the checks box (at least at the moment), but here's the run: https://travis-ci.org/github/ReproNim/reproman/builds/673081638 The build that covers the most (*) edit: As expected, the
Updated to resolve conflicts with master. range-diff
The upcoming commits will depend on features and fixes that will be a part of DataLad's 0.13 release.
The changes in this series depend on the 0.13 release. While not done by this commit, we should eventually do version checks on the remote end as well (issue 477). In that case, datalad v0.12.x would probably be sufficient, though I'm not sure it's worth making the distinction.
add() was deprecated in DataLad v0.12 and has been dropped in the upcoming 0.13 release. Closes ReproNim#463.
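A minimal sketch of what the add() to save() migration looks like at a call site. The helper name record_paths is hypothetical, and the sketch assumes the object passed in behaves like datalad.api.Dataset, whose save() accepts path and message keyword arguments.

```python
def record_paths(dataset, paths, message):
    """Save `paths` in `dataset` with save(), which supersedes add().

    Assumption: `dataset` behaves like datalad.api.Dataset.
    `record_paths` itself is a hypothetical helper, not ReproMan code.
    """
    # With DataLad 0.11.x this call site might have used
    # dataset.add(path=paths, message=message) instead.
    return dataset.save(path=paths, message=message)
```

Because the helper only relies on the save() method, it can be exercised with a stub object when DataLad is not installed.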
In DataLad v0.12 the module was moved to datalad.core.local.run; datalad.interface.run is now just a compatibility shim.
DataLad v0.12 exposed a call_git() method that should be used instead of _git_custom_command().
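As an illustration of the public helper, here is a small sketch that reads the current commit through call_git(). The function name current_commit is hypothetical, and the sketch assumes `repo` behaves like DataLad's GitRepo, with call_git() taking a list of git arguments and returning the command's standard output as text.

```python
def current_commit(repo):
    """Return the full hash of HEAD via the public call_git() helper.

    Assumption: `repo` behaves like datalad.support.gitrepo.GitRepo.
    `current_commit` is a hypothetical name used only for this sketch.
    """
    # call_git() replaces direct use of the private _git_custom_command().
    return repo.call_git(["rev-parse", "HEAD"]).strip()
```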
With the upcoming DataLad 0.13 release, create-sibling now supports local paths in addition to SSH URLs. Closes ReproNim#462.
In order to run something that involves submodules, we of course need to populate them. create-sibling won't do this for us as of DataLad's 78e00dcd23 (RF: do not run submodule update --init in the post-update hook, 2019-04-10), so let's do a blanket 'update --init'. This will probably need to be refined later, as getting all subdatasets is unlikely to be desirable. And we very likely should try harder to stay on a branch. This shouldn't functionally matter with the upcoming 'update --follow=parent' change, but it risks confusing users that inspect the remote directly.
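The blanket submodule population described above could be sketched as follows. This is not the ReproMan implementation: populate_submodules and its `run` parameter are hypothetical, and the command is executed locally with subprocess here, whereas the real code would run it on the remote resource.

```python
import subprocess


def populate_submodules(repo_path, run=subprocess.run):
    """Run a blanket 'git submodule update --init' in `repo_path`.

    `run` defaults to subprocess.run; it is a parameter so that a caller
    can substitute another executor (for example, one bound to a remote
    shell). Both names are assumptions made for this sketch.
    """
    cmd = ["git", "-C", str(repo_path), "submodule", "update", "--init"]
    run(cmd, check=True)
    return cmd
```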
By default 'git fetch' will try to fetch submodules if the recorded commit isn't available locally. Avoid this because 1) with the upcoming 0.13 release of DataLad, update() tries to fetch a commit even if it isn't in a ref that's brought down by the initial fetch [*] and 2) submodule.c hard codes 'origin' as a fallback to fetch from, so the fetch is likely to fail, as 'run' names remotes by the resource name. [*] See DataLad's dbd84e5662 (ENH: update: Try to fetch submodule commit explicitly if needed, 2020-02-07). Fixes ReproNimgh-499.
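One way to keep 'git fetch' from descending into submodules is git's --no-recurse-submodules option. The sketch below assumes that option is the mechanism used, which the commit message does not state; fetch_without_submodules and its parameters are hypothetical names.

```python
import subprocess


def fetch_without_submodules(repo_path, remote, refspec, run=subprocess.run):
    """Fetch `refspec` from `remote` without fetching submodules.

    Assumption: --no-recurse-submodules is an acceptable way to disable
    the on-demand submodule fetch; the actual change may differ.
    """
    cmd = [
        "git", "-C", str(repo_path),
        "fetch", "--no-recurse-submodules", remote, refspec,
    ]
    run(cmd, check=True)
    return cmd
```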
head_at() already aborts with a dirty repository to avoid losing any changes. But, if a submodule changed between the current and target commit, the repository may become dirty after checking out the target. In that case, the submodule is carrying an undefined set of changes that could cause confusion (e.g., if the submodule change came in with a save).
datalad-pair's fetch() fetches the job ref from the resource and then relies on 'datalad update' to bring down changes from the remote. If the merged in branch doesn't contain the job ref, then we temporarily switch to the job ref, get the outputs, and then let the caller know that the changes weren't brought in. This approach is problematic with submodules. One issue is that the head switch mentioned above does not check out submodules, and in general DataLad doesn't yet offer a way to set/reload a dataset hierarchy to a particular state. Another issue is that update() selects a merge target from the submodule remote that doesn't necessarily contain the registered commit. See DataLad's 94efa8637a (NF: update: Add option to merge in recorded submodule commit, 2020-02-17) for more information. In what will be the 0.13 release, update() learned a --follow=parentds mode that makes update() take the registered commit as the merge target when updating submodules. Use it. However, this still doesn't deal with bringing a non-mainline job ref into the top-level dataset because update() unconditionally merges in a branch from the remote. Try harder to bring in the job ref by doing an explicit 'git merge' of the ref before the update() call.
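The ordering described above (explicit merge of the job ref first, then an update that follows the commits registered in the parent dataset) might look roughly like this. The function name, the `job_ref` argument, and the choice of recursive=True are assumptions for illustration; the sketch also assumes `dataset` behaves like a DataLad 0.13 Dataset, whose update() accepts sibling, merge, recursive, and follow.

```python
def bring_in_job_ref(dataset, sibling, job_ref):
    """Merge `job_ref`, then update submodules to their registered commits.

    Assumptions: `dataset` behaves like datalad.api.Dataset (0.13 or
    later) and `job_ref` has already been fetched from the resource.
    """
    # update() only merges a branch from the remote, so a job ref that is
    # not on that branch has to be merged explicitly first.
    dataset.repo.call_git(["merge", job_ref])
    # follow="parentds" makes update() use the commit registered in the
    # parent dataset as the merge target for each submodule.
    return dataset.update(
        sibling=sibling, merge=True, recursive=True, follow="parentds")
```

The sketch leaves out error handling, such as what to do when the explicit merge produces conflicts.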
5dd5d8a to cdce122
tools/ci/install_datalad
@@ -7,5 +7,5 @@ sudo apt-get install git-annex-standalone
 # Install datalad system-wide for use with localhost ssh
 sudo apt-get install datalad
 # ... and install it into the virtualenv.
-pip install datalad
+pip install git+https://github.com/datalad/datalad.git
Now that there is a DataLad 0.13.0rc1, I think it should be sufficient to declare ~= 0.13.0rc1 for datalad. That would, though, limit us to the 0.13 "major" series of DataLad.
This line will be reverted and setup.py adjusted with the dependency anyway.
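For reference, PEP 440 defines the compatible-release operator so that ~= 0.13.0rc1 means ">= 0.13.0rc1, == 0.13.*", which is why it pins the dependency to the 0.13 series. The helper below is a simplified, standard-library-only illustration of that expansion; expand_compatible_release is a hypothetical name, and the sketch ignores epochs and does not validate the version string beyond its leading release segment.

```python
import re


def expand_compatible_release(version):
    """Expand '~= version' into its two-clause PEP 440 equivalent.

    Simplified sketch: only the leading numeric release segment is
    parsed; epochs ("1!...") are not handled.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no release segment in {!r}".format(version))
    release = match.group(0).split(".")
    if len(release) < 2:
        raise ValueError("~= needs at least two release components")
    # Drop the last release component and match the rest as a prefix.
    prefix = ".".join(release[:-1])
    return ">= {}, == {}.*".format(version, prefix)
```

For example, expand_compatible_release("0.13.0rc1") yields ">= 0.13.0rc1, == 0.13.*".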
To help debug failures, make all results available at the DEBUG level, and display status="error" results directly in the OrchestratorError message. At this point, we're not doing any analysis of the error message, so remove the datalad-update suggestion. As demonstrated by ReproNimgh-441, this may fail for a variety of reasons, and in some cases the datalad-update snippet will just be confusing. Eventually it might be good to do more analysis on an error to give targeted suggestions. And at the very least it'd probably be good to format the message file if it's coming in as a (<format string>, <args>) tuple. Closes ReproNim#441. Re: ReproNim#511
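The reporting behavior described above can be sketched as follows. This is an illustration under assumptions, not the actual ReproMan code: the OrchestratorError class here is a local stand-in, check_results and the logger name are hypothetical, and result records are assumed to be dictionaries with "status", "path", and "message" keys in the style of DataLad results.

```python
import logging

lgr = logging.getLogger("reproman.sketch")  # hypothetical logger name


class OrchestratorError(RuntimeError):
    """Local stand-in for ReproMan's exception of the same name."""


def check_results(results, action):
    """Log every result at DEBUG and raise if any has status="error"."""
    results = list(results)
    failed = []
    for res in results:
        lgr.debug("%s result: %s", action, res)
        if res.get("status") == "error":
            failed.append(res)
    if failed:
        # Show the raw error records; no analysis of the message is done.
        details = "\n".join(
            "  {}: {}".format(r.get("path"), r.get("message"))
            for r in failed)
        raise OrchestratorError("{} failed:\n{}".format(action, details))
    return results
```

As the commit message notes, a message arriving as a (format string, args) tuple is shown unformatted in this sketch.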
A recursive create-sibling is called only if the top-level remote dataset doesn't exist. That means that, after an initial run, another run that has new subdatasets locally will fail because the subdataset won't yet have a remote for this resource. Update prepare_remote() to always call create-sibling, skipping existing targets, to handle this situation. Ref: ReproNim#511 (comment)
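A rough sketch of the "always call create-sibling, skipping existing targets" approach. The function name ensure_siblings and its parameters are hypothetical; the sketch assumes `dataset` behaves like a DataLad Dataset whose create_sibling() accepts name, recursive, and existing arguments, with existing="skip" leaving already-configured targets alone.

```python
def ensure_siblings(dataset, target, name):
    """Create siblings for `dataset` and any subdatasets that lack one.

    Assumption: `dataset` behaves like datalad.api.Dataset. `target` is
    an SSH URL or, with DataLad 0.13, a local path.
    """
    # Calling this unconditionally lets subdatasets added after the first
    # run gain a sibling, while existing targets are skipped.
    return dataset.create_sibling(
        target, name=name, recursive=True, existing="skip")
```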
This should have been done when we started using DataLad functionality. Adjust the .travis.yml install target so that DataLad continues to only be pulled in for the INSTALL_DATALAD=1 run.
This series updates run-related functionality for changes in the next feature release of DataLad. That version will contain some functionality that was prompted by issues on ReproMan's end, in particular improvements to datalad update and datalad create-sibling.

The changes above will require bumping our dependency to DataLad 0.13.0, so other commits in this series prune no longer needed compatibility kludges. In addition, 0.13 removes the deprecated datalad add command, so a commit here migrates to datalad save (which was held off to retain compatibility with DataLad 0.11.x line).

I'm marking this as a draft because (1) this shouldn't be merged until DataLad 0.13.0 is released and (2) this should resolve at least a few open issues, but I still need to reference them in the commit messages and add relevant tests. The latter will probably point out the need for at least minor tweaks.