Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: clone: --reckless='ephemeral' #4099

Merged
merged 14 commits into from Feb 13, 2020

Conversation

bpoldrack
Copy link
Member

@bpoldrack bpoldrack commented Jan 28, 2020

Introduce reckless mode 'ephemeral' for throw-away clones, symlinking their .git/annex to origin's .git/annex.
In addition declare 'here' dead to not propagate availability info from it when publishing captured changes back.

bpoldrack added 2 commits Jan 28, 2020
Introduce reckless mode 'ephemeral' for throw-away clones, symlinking their .git/annex to origin's .git/annex.
In addition declare 'here' dead to not propagate availability info from it when publishing captured changes back.
(Closes datalad#4008)
    For use in a datalad command context, we shouldn't
    assume to be able to write to tmpfile and also not import a whole lot from
    datalad's test machinery. Finally, we want to know, whether we can create a
    symlink at a specific location, not just somewhere. Therefore use
    arbitrary path to test-build a symlink and delete afterwards. Suiteable
    location can therefore be determined by high lever code.
    datalad.tests.utils.has_symlink_capability RF'ed to make use of the
    new one.
@codecov
Copy link

@codecov codecov bot commented Jan 28, 2020

Codecov Report

Merging #4099 into master will increase coverage by 0.11%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4099      +/-   ##
==========================================
+ Coverage   89.63%   89.74%   +0.11%     
==========================================
  Files         274      275       +1     
  Lines       36742    37772    +1030     
==========================================
+ Hits        32932    33897     +965     
- Misses       3810     3875      +65     
Impacted Files Coverage Δ
datalad/support/tests/test_fileinfo.py 100.00% <0.00%> (ø) ⬆️
datalad/local/tests/test_subdataset.py 100.00% <0.00%> (ø) ⬆️
datalad/support/tests/test_repodates.py 100.00% <0.00%> (ø) ⬆️
datalad/interface/tests/test_ls_webui.py 100.00% <0.00%> (ø) ⬆️
datalad/core/local/tests/test_save.py 97.86% <0.00%> (+0.53%) ⬆️
datalad/tests/test_protocols.py 100.00% <0.00%> (ø) ⬆️
datalad/distribution/tests/test_dataset.py 99.67% <0.00%> (ø) ⬆️
...ad/distributed/tests/test_create_sibling_gitlab.py 100.00% <0.00%> (ø) ⬆️
datalad/interface/tests/test_save.py 100.00% <0.00%> (ø) ⬆️
datalad/support/tests/test_ansi_colors.py 100.00% <0.00%> (ø) ⬆️
... and 169 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e3d8e8c...f4bcb8b. Read the comment docs.

On windows annex v7 adjusted branch is some kind of link file
containing path to annex object. We have no logic to deal with
that yet. So, for now assume that when annex does the right
thing everywhere else it will likely do it there as well.
bpoldrack added 3 commits Jan 29, 2020
the only thing remaining of ephemeral is git-annex-dead 'here', if we couldn't symlink annex
If clone did run with a ria+ URL and therefore postclonecfg_ria was called,
check whether there's a corresponding, enabled special remote and set
publish-depends on it for 'origin'
@bpoldrack
Copy link
Member Author

@bpoldrack bpoldrack commented Jan 29, 2020

Currently not able to grasp what's going on in the windows test. Also it fails differently on our windows testing box. Any thoughts, @mih?

Edit:

Hm. I know. It's the issue with short paths again. So, solution pretty much depends on #4057.

@mih mih mentioned this pull request Feb 7, 2020
2 tasks
@mih mih added this to the 0.12.3 milestone Feb 7, 2020
@bpoldrack bpoldrack added the merge-if-ok OP considers this work done, and requests review/merge label Feb 11, 2020
Copy link
Contributor

@kyleam kyleam left a comment

Thanks @bpoldrack. The patches look good to me (all my comments are minor nits).

Playing around with this a bit, I noticed that I could trigger a failure if I did anything in the original repository (the one the symlink points to). Have you hit into any issues on your end? For example, with git-annex 7.20191230, I can trigger a failure with the following change to the added test:

diff --git a/datalad/core/distributed/tests/test_clone.py b/datalad/core/distributed/tests/test_clone.py
index bb8a148ac2..2c354ab230 100644
--- a/datalad/core/distributed/tests/test_clone.py
+++ b/datalad/core/distributed/tests/test_clone.py
@@ -785,6 +785,9 @@ def test_ephemeral(origin_path, clone1_path, clone2_path):
     # 1. clone via path
     clone1 = clone(origin_path, clone1_path, reckless='ephemeral')
 
+    (origin.pathobj / "foo").write_text("foo")
+    origin.save()
+
     can_symlink = has_symlink_capability()
 
     if can_symlink:
======================================================================
ERROR: datalad.core.distributed.tests.test_clone.test_ephemeral
----------------------------------------------------------------------
Traceback (most recent call last):
[...]
  File "/home/kyle/src/python/datalad/datalad/core/distributed/tests/test_clone.py", line 819, in test_ephemeral
    clone1.save()
[...]
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 3717, in _save_post
    careless=True,
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 1488, in commit
    index_file=index_file)
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 316, in newfunc
    result = func(self, files_new, *args, **kwargs)
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 1896, in _git_custom_command
    expect_fail=expect_fail)
  File "/home/kyle/src/python/datalad/datalad/support/gitrepo.py", line 1933, in _run_command_files_split
    *args, **kwargs)
  File "/home/kyle/src/python/datalad/datalad/cmd.py", line 711, in run
    cmd, env=self.get_git_environ_adjusted(env), *args, **kwargs)
  File "/home/kyle/src/python/datalad/datalad/cmd.py", line 544, in run
    raise CommandError(str(cmd), msg, status, out[0], out[1])
datalad.support.exceptions.CommandError: CommandError: command '['git', 'commit', '-m', '[DATALAD] Recorded changes']' failed with exitcode 1
Failed to run ['git', 'commit', '-m', '[DATALAD] Recorded changes'] under '/tmp/datalad_temp_test_ephemeralrtns097w'. Exit code=1. out= err=(recording state in git...)
error: invalid object 100644 9ac791b8d1472b37d15a298e27474cef8c7111e2 for '576/8dd/MD5E-s3--acbd18db4cc2f85cedef654fccc4a4d8.log'
fatal: git-write-tree: error building trees
git-annex: failed to read sha from git write-tree
CallStack (from HasCallStack):
  error, called at ./Git/Sha.hs:18:15 in main:Git.Sha
[...]

repo._run_annex_command('untrust', annex_options=['here'])

elif reckless == 'ephemeral':
# with ephemeral we declare 'here' as 'dead' right away, whenever
# we symlink origins annex. Because we want annex to copy to
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/origins/origin's/

# with ephemeral we declare 'here' as 'dead' right away, whenever
# we symlink origins annex. Because we want annex to copy to
# the ria remote to get availability info correct for an eventual
# git-push into the store
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time parsing/understanding from "Because [...]" to here. If it is meant to convey what is said in the commit message (re: not propagating back), I think that phrasing is clearer.

'remote.origin.annex-ignore', 'true',
where='local')

ds.repo._run_annex_command('dead', annex_options=['here'])
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ds.repo.set_remote_dead('here')?

if origin_annex_url:
try:
# deal with file:// scheme URLs as well as plain paths
# if origin isn't local, we have nothing to do
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment nit-pick: I'm frequently tripped up by comments in this project that drop a period and treat a new line as a sentence separator. As a reader, I would prefer to not have to guess from the context. Like here, I'd guess that it should be "paths. If", but I'm not sure.

origin_git_path = Path(PathRI(origin_annex_url).localpath)
if origin_git_path.name != '.git':
origin_git_path /= '.git'
except Exception:
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a more specific exception be caught here?

if origin_git_path.name != '.git':
origin_git_path /= '.git'
except Exception:
# TODO: What level? + note, that annex-dead is independ
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/independ/independent/? I'm not sure what the note part of the to-do is about.

target_is_directory=True)
else:
# TODO: What level? + note, that annex-dead is independ
lgr.warning("Unable to create symlinks.")
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something about "reckless=ephemeral" should be added to this message.

@@ -763,3 +765,72 @@ def test_ria_http_storedataladorg(path):
ok_(ds.is_installed())
eq_(ds.id, datalad_store_testds_id)


@skip_if_on_windows # see gh-
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/gh-/&4131/

@bpoldrack
Copy link
Member Author

@bpoldrack bpoldrack commented Feb 12, 2020

Thanks, @kyleam - will address those comments.

Playing around with this a bit, I noticed that I could trigger a failure if I did anything in the original repository (the one the symlink points to). Have you hit into any issues on your end?

I can reproduce that, yes. However, I don't think we can do much about that, other than maybe enhancing the doc for it (and possibly complain to Joey). Annex complains about the object for the file you saved in origin, since in clone1 it doesn't know about it. IMO this shouldn't necessarily lead to failure to commit on annex' end, since an unknown object in the object tree doesn't really hurt (particularly not the new object that we tried to commit).

So, in my (current) view, we should add to the doc, that a symlinked annex comes with some potential issues (it's called reckless after all). A save should prob. be prepended by an update ( git fetch actually is sufficient), which solves the case you set up. The test case would still fail on publishing back, since it's then not fast-forward, of course.

The things we possibly could do, I think, are:

  1. catch that error on save, detect whether .git/annex is symlinked and suggest to update first via error message OR

  2. save ephemeral mode in repo (we prob. should anyway, since parameters to clone should be stored as the default for subdatasets - just noticed, that I forgot to make sure that's true for reckless as well) and based on that config have an automatic fetch prior to a save.

What do you think @mih ?

bpoldrack added 3 commits Feb 12, 2020
and use RI rather than PathRI as this was the intention: Figure, whether
something is local, not declaring it local.
PathRI('https://somewhere/remote').localpath would yield a local path
'https://somewhere/remote' instead of failure.
@mih
Copy link
Member

@mih mih commented Feb 12, 2020

@kyleam Thanks for the review!

@bpoldrack your reasoning seems sound!

@kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 12, 2020

@bpoldrack Thanks for the pushed updates. Those look good to me.

@kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 12, 2020

For posterity:

  • original discussion around this idea: gh-3521
  • related issue that is partially fulfilled by this pr: gh-3528

@bpoldrack
Copy link
Member Author

@bpoldrack bpoldrack commented Feb 12, 2020

Changed storing the reckless mode accordingly. And added to the doc.
From my point of view this is it for this PR. If you guys agree, @kyleam , @mih , I would make an issue regarding dealing with detecting the potential issue from within save and have a separate PR for that.

@kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 12, 2020

Sounds good to me. Thanks @bpoldrack.

'datalad.clone.reckless', reckless,
where='local',
# delay reload until all config IO is done
reload=False)
Copy link
Contributor

@kyleam kyleam Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lack of a reload is what's behind the new test failures. In the deleted call above, the very next line has reload=True.

Copy link
Member Author

@bpoldrack bpoldrack Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx!

@mih mih mentioned this pull request Feb 13, 2020
@bpoldrack bpoldrack merged commit 217f977 into datalad:master Feb 13, 2020
16 of 17 checks passed
@yarikoptic yarikoptic modified the milestones: 0.12.3, 0.12.x Feb 27, 2020
@bpoldrack bpoldrack deleted the enh-clone-ephemeral branch Sep 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-if-ok OP considers this work done, and requests review/merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants