Retire distribution.create and promote rev_create #3383
There is no additional commit in the submodule; it just has staged changes.
Unlike distribution.create, rev_create puts a .noannex file in the repository if --no-annex was given. Adjust test_unlock_raises to remove the .noannex file before trying to convert the existing repository into an annex repository.
8389502 (RF: internal use of `create` -> `rev_create`, 2019-03-22) converted most spots, but there are still a few places outside of distribution/tests/test_create.py where we use distribution.create. Convert the remaining spots, leaving only a few in test_add.py (to be addressed next) and the ones in the benchmarks.
These tests rely on create's save argument, which rev_create dropped. In test_add_dirty_tree, the save=False is unnecessary, so use a plain rev_create call. In the other spots, use an unbound rev_create call to avoid saving the subdataset in the superdataset. The results aren't identical, but the differences don't matter in the context of these tests.
Codecov Report
@@ Coverage Diff @@
## master #3383 +/- ##
==========================================
- Coverage 91.24% 91.08% -0.16%
==========================================
Files 265 263 -2
Lines 34452 34152 -300
==========================================
- Hits 31436 31109 -327
- Misses 3016 3043 +27
With rev-create (now create), anything after the path is taken as an option to 'git init'.
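A minimal sketch of the argument handling described above, assuming a hypothetical helper `split_create_args` (it is not DataLad code): the first argument is treated as the path, and everything after it is forwarded unchanged as `git init` options.

```python
from typing import List, Tuple


def split_create_args(argv: List[str]) -> Tuple[str, List[str]]:
    """Split arguments into a path and trailing `git init` options.

    Hypothetical helper for illustration only; not part of DataLad.
    """
    if not argv:
        raise ValueError("a path is required")
    # The first item is the path; the rest is passed through untouched.
    path, *init_opts = argv
    return path, init_opts


def build_git_init(argv: List[str]) -> List[str]:
    """Build the `git init` command line that such a call would imply."""
    path, init_opts = split_create_args(argv)
    return ["git", "init", *init_opts, path]
```

Under these assumptions, `build_git_init(["myds", "--bare"])` yields a command list in which `--bare` ends up as an option to `git init` rather than to the create command itself.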
datalad/datalad#3383 replaces core's create with rev-create. Add back a compatibility alias, like we already do for diff and status.
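A compatibility alias of this kind can be sketched roughly as follows. The function bodies are stand-ins, and the deprecation warning is an assumption added for illustration; the actual alias may simply delegate without warning.

```python
import warnings


def create(path):
    """Stand-in for the promoted command (hypothetical body)."""
    return {"path": path, "created_by": "create"}


def rev_create(*args, **kwargs):
    """Compatibility alias: warn (an assumption), then delegate to `create`."""
    warnings.warn(
        "rev_create is a compatibility alias; use create instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return create(*args, **kwargs)
```

Callers that still use the old name keep working, while the result is produced by the single remaining implementation.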
The diff looks pleasantly straightforward. Thx.
The crawler used the
What's the cleanest way for a Python caller to set this for one invocation of
I would take one further step back and ignore the create default entirely. I see no problem or disadvantage of resetting the value after create is done, as this is a setting that lives in the local clone, and does not affect any content in the create commit. Hence:
So that is doable and less complicated than any global overrides. However, I wonder why the crawler needs a different default, and what requirements it has that demand it to be different than MD5E -- and whether we should have the same default in -core? Looking into this I see:
So it seems to me that those settings are redundant, the requirements are identical, and @yarikoptic what do you think?
It is more than just a setting in the local clone. Both the old create and rev-create set the annex backend in .gitattributes with

import os
import tempfile
import datalad.api as dl
ds = dl.create(tempfile.mkdtemp(prefix="dl-")) # rev_create
ds.config.set('annex.backends', 'SHA256E', where='local')
(ds.pathobj / "blah").write_text("ooo")
ds.rev_save()
print(os.readlink(str(ds.pathobj / "blah")))
# .git/annex/objects/37/Mv/MD5E-s3--7f94dd413148ff9ac9e9e4b6ff2b6ca9/MD5E-s3--7f94dd413148ff9ac9e9e4b6ff2b6ca9

Before asking about this, I briefly explored doing it after, with something like

diff --git a/datalad_crawler/nodes/annex.py b/datalad_crawler/nodes/annex.py
index 442397f..af802d3 100644
--- a/datalad_crawler/nodes/annex.py
+++ b/datalad_crawler/nodes/annex.py
@@ -138,8 +138,6 @@ def _initiate_dataset(self, path, name):
# TODO: RF whenevever create becomes a dedicated factory/method
# and/or branch becomes an option for the "creator"
- backend = self.backend or cfg.obtain('datalad.crawl.default_backend', default='MD5E')
-
ds = create(
path=path,
force=False,
@@ -148,11 +146,14 @@ def _initiate_dataset(self, path, name):
# custom backend was specified, but now with dataset id -- should always save
# save=not bool(backend),
# annex_version=None,
- annex_backend=backend,
#git_opts=None,
#annex_opts=None,
#annex_init_opts=None
)
+ ds.repo.set_default_backend(
+ self.backend or cfg.obtain(
+ 'datalad.crawl.default_backend', default='MD5E'),
+ persistent=True, commit=True)
if self.add_to_super:
# place hack from 'add-to-super' times here
# MIH: tests indicate that this wants to discover any dataset above

But
Sorry, I missed the attributes. I see no better way than using cfg overrides (as you initially pointed out). You can also dump a file with just

But @yarikoptic's feedback is still outstanding on why we have this second, identical default to begin with.
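The cfg-override idea for a single invocation could look roughly like the sketch below. It operates on a plain dictionary standing in for the configuration; `override_config` is a hypothetical helper, not a DataLad API, and how overrides are actually wired into the config manager is left open here.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for keys that were absent before the override


@contextmanager
def override_config(cfg, overrides):
    """Temporarily apply `overrides` to the mapping `cfg`, then restore it.

    Hypothetical helper for illustration; `cfg` is any mutable mapping.
    """
    saved = {key: cfg.get(key, _MISSING) for key in overrides}
    cfg.update(overrides)
    try:
        yield cfg
    finally:
        # Put back the previous values, dropping keys that did not exist.
        for key, old in saved.items():
            if old is _MISSING:
                cfg.pop(key, None)
            else:
                cfg[key] = old
```

A caller would wrap only the create call in `with override_config(cfg, {"datalad.repo.backend": "SHA256E"}):`, so the previous value is back in place as soon as the block exits, even if the call raises.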
I think that ideally all Python API commands should support passing a dictionary of configuration variables, to be in line with the command line API. As for the crawler, we could indeed retire it until anyone asks for it back. I typically rely on the default anyway for any new dataset.
rev-create, soon to be the main create, doesn't allow configuring the backend via an argument. To be compatible with the new create, we could set datalad.repo.backend to whatever value is configured via a crawler-specific mechanism. But it's not clear that any crawler-specific mechanisms are needed to configure the backend, so instead warn that the crawler setting isn't honored and point to datalad.repo.backend. Re: datalad/datalad#3383 (comment)
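A rough sketch of the warning behaviour described in this commit message, using a plain dictionary in place of the real configuration object. The function name, logger name, and message wording are assumptions for illustration only.

```python
import logging

lgr = logging.getLogger("crawler.sketch")  # hypothetical logger name


def warn_if_crawler_backend_set(cfg, node_backend=None):
    """Warn when a crawler-specific backend is configured but not honored.

    `cfg` is a plain mapping here; returns True if a warning was logged.
    """
    configured = node_backend or cfg.get("datalad.crawl.default_backend")
    if configured:
        lgr.warning(
            "Crawler backend setting %r is not honored; "
            "set datalad.repo.backend instead",
            configured,
        )
        return True
    return False
```

The point of the design is that the crawler no longer tries to pass a backend to create; it only tells the user where the supported setting lives.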
This PR and the sister PRs (datalad/datalad-revolution#121 and datalad/datalad-crawler#41) are ready to be reviewed.
Looks great! I was hoping for the diff to be like this, and now I am happy that it actually worked out like this. Thanks much for taking care of this PR.
datalad/datalad-revolution#121 is now waiting for this to be merged. I think an rc3 release is also due after the merge.
@yarikoptic I will merge this now in order to get rc3 out as a dependency for the metadata work. Given the nature of the diff (almost exclusively a renaming), I think the chances of unexpected things are low. And we can always adjust later on.
Remaining things to do:
adjust benchmarks
Hmm, based off of datalad/benchmarks/api.py (lines 11 to 17 in df1596c), perhaps there's not anything that requires adjustment.
adjust -revolution to provide rev_create alias
adjust -crawler (https://travis-ci.org/datalad/datalad/jobs/527865603#L1891)
Closes #3379.