Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

save: Update for "no commits" sub-repo fix in Git #3476

Merged
merged 4 commits into from Jul 1, 2019

Conversation

@kyleam
Copy link
Contributor

@kyleam kyleam commented Jun 10, 2019

Saving an untracked sub-repository without a commit checked out [0]
results in an ugly state: the sub-repository is _not_ added as a
submodule but its untracked files are added as blobs in the
_superdataset_.  This is not usually an issue for datasets because
these get initial commits on creation, so we documented this as a
known issue rather than introducing additional checks that would come
with a performance penalty.

The core problem is a combination of 'git submodule add' accepting a
sub-repository on an unborn branch and 'git ls-files' considering such
a sub-repository a directory rather than a repository.  Both issues
are fixed in Git 2.22.0 [1].  Update the documentation note and tests
accordingly.

[0] Usually this is just a repository without any commits, but it
    could be repository that has commits but is currently on an unborn
    branch.
[1] https://public-inbox.org/git/20190409230737.26809-1-kyle@kyleam.com/
    Commits e13811189b and b22827045e.

Re: #3139


Marking as "do not merge" because there's a temporary commit to enable the Travis run with the latest Git and because this needs and sits on top of gh-3470.

@codecov
Copy link

@codecov codecov bot commented Jun 12, 2019

Codecov Report

Merging #3476 into master will decrease coverage by 16.54%.
The diff coverage is 46.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #3476       +/-   ##
==========================================
- Coverage   83.05%   66.5%   -16.55%     
==========================================
  Files         269     269               
  Lines       34933   34946       +13     
==========================================
- Hits        29014   23242     -5772     
- Misses       5919   11704     +5785
Impacted Files Coverage Δ
datalad/core/local/save.py 76% <ø> (-12%) ⬇️
datalad/support/gitrepo.py 51.64% <100%> (-24.33%) ⬇️
datalad/core/local/tests/test_save.py 97.28% <42.85%> (-2.72%) ⬇️
datalad/tests/utils_testdatasets.py 11.76% <0%> (-88.24%) ⬇️
datalad/interface/tests/test_annotate_paths.py 23.74% <0%> (-76.26%) ⬇️
datalad/core/local/run.py 25.64% <0%> (-70.26%) ⬇️
datalad/metadata/metadata.py 18.38% <0%> (-69.92%) ⬇️
datalad/distribution/utils.py 24.39% <0%> (-65.86%) ⬇️
datalad/support/stats.py 32% <0%> (-65.34%) ⬇️
datalad/interface/run_procedure.py 23.89% <0%> (-64.16%) ⬇️
... and 101 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c9aef0...de954bb. Read the comment docs.

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Jun 12, 2019

@kyleam kyleam force-pushed the save-no-commits branch from e5eba6f to 2d38011 Jun 12, 2019
kyleam added 2 commits Jun 25, 2019
As the comment says, this is about adjusted branches rather than
Windows specifically, so use is_managed_branch().  (That method also
returns true for a repository in direct mode, but that's not relevant
for master.)
This test is about the unborn branch edge case, but we might as well
check that the adjusted branch is in the state that we expect.
@kyleam kyleam force-pushed the save-no-commits branch from 2d38011 to 874623d Jun 25, 2019
@kyleam kyleam force-pushed the save-no-commits branch 3 times, most recently from dc8d436 to de8c1c7 Jun 25, 2019
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Jun 25, 2019

OK, since I opened this PR, ppa:git-core/ppa has been updated to include the latest Git (v2.22.0). So testing v2.22.0-specific bits of this PR on Travis should be as easy as a temporary commit that comments out the cron bit of upstream git build. But

Any ideas about what the issue is?

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Jun 26, 2019

There's a lot of noise on this PR (mostly from me trying to debug a Travis-specific failure), so let me summarize the outstanding issues:

  • Travis recently switched its default image from Trusty to Xenial, and the Xenial image doesn't have third-party apt repositories enabled by default. When I enable them, the build exits (successfully) right after our install-latest-git.sh script runs. If I try to declare "dist: trusty" (f8504fc), I see the same thing.

    This isn't a blocker because I can work around it by setting "apt: packages: git" (656c525). That isn't ideal because a build that is usually on cron will do more work than needed, but at least it actually executes the tests and seems to use the new Git version.

    This workaround should be applied on 0.11.x and merged down.

    Update: Looks like "run upstream git build" logic was broken from the beginning. Stupid mistake by me. Fixed in gh-3494.

  • test_save.test_surprise_subds is failing on Travis with the latest Git.

    https://travis-ci.org/datalad/datalad/builds/550815965#L1154

    This is confusing because it passes for me locally with v2.22.0. I'm still trying to figure out what could be going on.

kyleam added a commit to kyleam/datalad that referenced this issue Jun 26, 2019
Travis recently switched to using Xenial by default, and the Xenial
image, unlike the Trusty one, doesn't have third-party
apt-repositories [0].  Selectively enable ppa:git-core/ppa in the
upstream Git build because we rely on that repository to get the
latest Git.

Note that we install git through "addons: apt: packages:" rather than
our tools/ci/install-latest-git.sh script.  This has the downside that
Git is installed unnecessarily, both for builds that have _DL_CRON set
and for builds where the bundled Git matches the latest upstream one.
We do it because otherwise the build exits (with a status of 0) right
after the installation happens [1].  The same happens if we use "dist:
trusty", which should have the third-party repositories enabled [2].

[0]: https://docs.travis-ci.com/user/reference/xenial/#third-party-apt-repositories-removed
[1]: https://travis-ci.org/datalad/datalad/jobs/550376442
[2]: https://travis-ci.org/datalad/datalad/builds/550835880

Re: datalad#3476
If we see an untracked directory in the get_content_info(...,
untracked="all") output, we know it's a sub-repository because
otherwise git would show the untracked files individually (or nothing
at all in the case of an empty directory).

We call add_submodule on these paths, which takes care of adding the
paths to the index, there's no reason to later pass these paths to
'git add' or 'git annex add'.

Most of the time that is just a noop, but it is problematic in one
situation, stemming from a change in how Git treats sub-repositories
without any commits checked out.  Starting with v2.22.0, these are
treated as any other repository, so the call to add_submodule will
appropriately fail.  Before v2.22.0, untracked files in the
sub-repository would confusingly be added to the _parent_ repo.  The
problems comes when the all of the following conditions are met:
DATALAD_USE_DEFAULT_GIT is set, the system git is at least v2.22.0,
and the bundled git is below v2.22.0.  In this case, 'git annex add
empty-subrepo' will have the old and undesirable behavior of adding
the file to the parent repo.

Avoid this by dropping the unnecessary sub-repository paths from the
'git annex add' call.
Saving an untracked sub-repository without a commit checked out [0]
results in an ugly state: the sub-repository is _not_ added as a
submodule but its untracked files are added as blobs in the
_superdataset_.  This is not usually an issue for datasets because
these get initial commits on creation, so we documented this as a
known issue rather than introducing additional checks that would come
with a performance penalty.

The core problem is a combination of 'git submodule add' accepting a
sub-repository on an unborn branch and 'git ls-files' considering such
a sub-repository a directory rather than a repository.  Both issues
are fixed in Git 2.22.0 [1].  Update the documentation note and tests
accordingly.

[0] Usually this is just a repository without any commits, but it
    could be repository that has commits but is currently on an unborn
    branch.
[1] https://public-inbox.org/git/20190409230737.26809-1-kyle@kyleam.com/
    Commits e13811189b and b22827045e.

Re: datalad#3139
@kyleam kyleam force-pushed the save-no-commits branch 2 times, most recently from 1daf394 to 1b78c01 Jun 27, 2019
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Jun 27, 2019

test_save.test_surprise_subds is failing on Travis with the latest Git.

OK, this passed on the latest upstream git run. Explanation in 4e73078 (BF: gitrepo: Don't pass added submodules to git{,-annex} add).

The only failing test on the upstream git run was the webui test that was fixed by gh-3492 (which has yet to make its way to master).

https://travis-ci.org/datalad/datalad/jobs/551290494

I'll drop 1b78c01 ([drop] un-cron upstream git build for testing) and from my POV this is good to go.

@kyleam kyleam force-pushed the save-no-commits branch from 1b78c01 to de954bb Jun 27, 2019
kyleam added a commit to kyleam/datalad that referenced this issue Jun 27, 2019
Travis recently switched to using Xenial by default, and the Xenial
image, unlike the Trusty one, doesn't have third-party
apt-repositories [0].  Selectively enable ppa:git-core/ppa in the
upstream Git build because we rely on that repository to get the
latest Git.

[0]: https://docs.travis-ci.com/user/reference/xenial/#third-party-apt-repositories-removed

Re: datalad#3476
@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Jul 1, 2019

Looks proper to me, filed #3500 for appveyor failure, adjusted here test passes there:

datalad.core.local.tests.test_save.test_surprise_subds ... ok

so merging

@yarikoptic yarikoptic merged commit 29ca196 into datalad:master Jul 1, 2019
2 of 3 checks passed
@kyleam kyleam deleted the save-no-commits branch Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants