DOC+TST: save: Update for "no commits" sub-repo fix in Git
Saving an untracked sub-repository without a commit checked out [0]
results in an ugly state: the sub-repository is _not_ added as a
submodule but its untracked files are added as blobs in the
_superdataset_.  This is not usually an issue for datasets because
these get initial commits on creation, so we documented this as a
known issue rather than introducing additional checks that would come
with a performance penalty.
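For reference, the kind of check that the paragraph above decided against could look like the sketch below. It is hypothetical: `is_on_unborn_branch` is not a DataLad or Git function. It reads the repository's `.git` directory directly and assumes the classic files ref backend. In that backend, HEAD names a branch as `ref: refs/heads/<name>`, and the branch has a commit only if that ref exists as a loose file or as an entry in `packed-refs`. Git itself can answer the same question with `git rev-parse --verify --quiet HEAD`, which exits non-zero on an unborn branch. Either way, running such a check for every sub-directory under a dataset is the performance cost the commit wanted to avoid.

```python
from pathlib import Path


def is_on_unborn_branch(repo: Path) -> bool:
    """Return True if HEAD of `repo` names a branch that has no commit yet.

    Hypothetical helper, not part of DataLad or Git. Assumes `repo/.git`
    is a directory using the files ref backend (loose refs plus
    packed-refs); ".git" files, linked worktrees and reftable are ignored.
    """
    gitdir = repo / ".git"
    if not gitdir.is_dir():
        return False  # no repository, or one this sketch does not handle
    head = (gitdir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return False  # a detached HEAD always names a commit
    ref = head[len("ref: "):]
    if (gitdir / ref).is_file():
        return False  # a loose ref exists, so the branch has a commit
    packed = gitdir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip the header comment and "^<sha>" peeled-tag lines.
            if line.startswith(("#", "^")):
                continue
            _sha, _, name = line.partition(" ")
            if name == ref:
                return False  # the branch ref was packed
    return True
```

This covers both cases from footnote [0]: a fresh `git init` directory (HEAD points at a branch whose ref file does not exist yet) and a repository switched with `git checkout --orphan`.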

The core problem is a combination of 'git submodule add' accepting a
sub-repository on an unborn branch and 'git ls-files' considering such
a sub-repository a directory rather than a repository.  Both issues
are fixed in Git 2.22.0 [1].  Update the documentation note and tests
accordingly.
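Gating behavior on the fix requires a version comparison that orders 2.9 before 2.22. The updated test delegates this to `external_versions['cmd:git']`, which presumably returns a version object rather than a plain string. Outside DataLad, a minimal sketch that parses `git --version` output could look like this. The names `parse_git_version`, `has_unborn_subrepo_fix` and `FIXED_GIT` are hypothetical.

```python
import re
from typing import Tuple

# First Git release with both fixes, according to the commit message.
FIXED_GIT = (2, 22, 0)


def parse_git_version(output: str) -> Tuple[int, int, int]:
    """Turn `git --version` output into a comparable (major, minor, patch).

    Only the leading dotted numeric run is used, so suffixes such as
    ".windows.1" or " (Apple Git-117)" are ignored, and so are
    pre-release suffixes like ".rc1".
    """
    match = re.search(r"\d+(?:\.\d+)*", output)
    if match is None:
        raise ValueError(f"unrecognized git version string: {output!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad short versions ("2.22" -> 2.22.0) and drop extra numeric parts.
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return (major, minor, patch)


def has_unborn_subrepo_fix(output: str) -> bool:
    """True if the reported Git version is at least 2.22.0."""
    return parse_git_version(output) >= FIXED_GIT
```

Integer tuples are used because plain string comparison gets this wrong: `"2.9.5" >= "2.22.0"` is `True` lexicographically.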

[0] Usually this is just a repository without any commits, but it
    could be a repository that has commits but is currently on an
    unborn branch.
[1] https://public-inbox.org/git/20190409230737.26809-1-kyle@kyleam.com/
    Commits e13811189b and b22827045e.

Re: #3139
kyleam committed Jun 10, 2019
1 parent 4ac23e7 commit 2d38011
Showing 2 changed files with 22 additions and 9 deletions.
8 changes: 4 additions & 4 deletions datalad/core/local/save.py
@@ -81,10 +81,10 @@ class Save(Interface):
       % dataset save -d <path_to_dataset> --version-tag bestyet
     .. note::
-      For performance reasons, any Git repository without an initial commit
-      located inside a Dataset is ignored, and content underneath it will be
-      saved to the respective superdataset. DataLad datasets always have an
-      initial commit, hence are not affected by this behavior.
+      Before Git v2.22, any Git repository without an initial commit located
+      inside a Dataset is ignored, and content underneath it will be saved to
+      the respective superdataset. DataLad datasets always have an initial
+      commit, hence are not affected by this behavior.
     """
     # note above documents that out behavior is like that of `git add`, but
     # does not explicitly mention the connection to keep it simple.
23 changes: 18 additions & 5 deletions datalad/core/local/tests/test_save.py
@@ -42,6 +42,7 @@
 from datalad.distribution.dataset import Dataset
 from datalad.support.annexrepo import AnnexRepo
 from datalad.support.exceptions import CommandError
+from datalad.support.external_versions import external_versions
 from datalad.api import (
     save,
     create,
@@ -662,18 +663,30 @@ def test_surprise_subds(path):
     # If subrepo is an adjusted branch, it would have a commit, making most of
     # this test irrelevant because it is about the unborn branch edge case.
     adjusted = somerepo.is_managed_branch()
+    # This edge case goes away with Git v2.22.0.
+    fixed_git = external_versions['cmd:git'] >= '2.22.0'
 
     # save non-recursive
-    ds.save(recursive=False)
+    res = ds.save(recursive=False, on_failure='ignore')
+    if not adjusted and fixed_git:
+        # We get an appropriate error about no commit being checked out.
+        assert_in_results(res, action='add_submodule', status='error')
+
     # the content of both subds and subrepo are not added to their
     # respective parent as no --recursive was given
     assert_repo_status(subds.path, untracked=['subfile'])
     assert_repo_status(somerepo.path, untracked=['subfile'])
 
-    if adjusted:
-        # adjusted branch: #datalad/3178 (that would have a commit)
-        modified = [subds.pathobj, somerepo.pathobj]
-        untracked = []
+    if adjusted or fixed_git:
+        if adjusted:
+            # adjusted branch: #datalad/3178 (that would have a commit)
+            modified = [subds.pathobj, somerepo.pathobj]
+            untracked = []
+        else:
+            # Newer Git versions refuse to add a sub-repository with no commits
+            # checked out.
+            modified = [subds.pathobj]
+            untracked = ['d1']
         assert_repo_status(ds.path, modified=modified, untracked=untracked)
         assert_not_in(ds.repo.pathobj / 'd1' / 'subrepo' / 'subfile',
                       ds.repo.get_content_info())