Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: gitrepo: Work around change in 'ls-files -o' output #3904

Merged
merged 2 commits into from Dec 8, 2019

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented Dec 7, 2019

save_() calls 'ls-files -o' on a list of untracked directories to
determine which directories correspond to untracked submodules.  If
the directories remain unexpanded in the 'ls-files -o' output, they
are taken as submodules.

This method of identifying untracked submodules breaks [0,1] with the
latest release of Git (v2.24.0) due to an unintentional change in the
'ls-files -o' output [2]: when given _multiple_ pathspecs, 'ls-files
-o' recurses into untracked submodules and lists the files from the
submodule (even the tracked ones).  save_() filters the output to
directories, so most of the additional entries are removed, but when
the submodule itself has submodules (untracked or tracked) these and
those from any deeper levels are included in the output.  save_()
passes these deeper repositories to add_submodule(), which as expected
leads to 'git submodule add' failing.

This behavior will hopefully be addressed in Git itself [2], but we
should still provide a workaround for the current version of Git.  Add
a helper that filters these deeper repositories from the list of
submodules that save_() feeds to add_submodule().  This approach takes
advantage of the fact that, even when 'ls-files -o' misbehaves and
traverses into the submodule, it still reports an unexpanded entry.
If it didn't, we wouldn't be able to distinguish a submodule from a
regular directory.

Note that the helper is conditionally defined at the module level; for
earlier versions of Git that don't need this kludge, this reduces the
cost per save_() call to a single call to an identity function.

0: #3890 (comment)
1: #3902 (comment)
2: https://lore.kernel.org/git/87fti15agv.fsf@kyleam.com/T/#u


Marking as "do not merge" because the tip commit should be dropped before merging.

@kyleam kyleam added the do not merge label Dec 7, 2019
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 7, 2019

Looks like that resolved the two tests that were failing due to changed ls-files -o output interacting with GitRepo.save_. There was still one failure (core.local.tests.test_diff.test_path_diff) in the Git 2.24.0 run. Assuming that's related to the ls-files -o change, that should probably be bundled with this PR. I'll have to look into that.

https://travis-ci.org/datalad/datalad/jobs/621884338#L1153

@kyleam kyleam added the WIP label Dec 7, 2019
@mih
Copy link
Member

@mih mih commented Dec 7, 2019

Thanks a lot @kyleam for the analysis and the fix!!

@mih mih mentioned this pull request Dec 7, 2019
kyleam added a commit to kyleam/datalad that referenced this issue Dec 7, 2019
As described in the previous commit, the traversal behavior of
'ls-files -o' in the latest release of Git, v2.24.0, has changed.  The
workaround in that commit fixed two tests that failed with v2.24.0,
but there is a remaining failure, test_path_diff [0].  Local testing
confirmed that it is due to the same change in Git, 89a1f4aaf7 (dir:
if our pathspec might match files under a dir, recurse into it,
2019-09-17).

The failures addressed in the last commit were about 'ls-files -o'
without --directory traversing into untracked nested repositories when
multiple pathspecs are given.  In contrast the test_path_diff failure
happens due to traversing into a plain untracked directory when
--directory is specified along with multiple pathspecs.  For example,
before 89a1f4aaf7 the following will not traverse into an untracked
directory:

  git-89a1f4aaf7^ $ git ls-files -o --directory . a
  a
  untracked-dir/

As of 89a1f4aaf7 the same command includes untracked content from the
untracked directory:

  git-89a1f4aaf7 $ git ls-files -o --directory . a
  a
  untracked-dir/
  untracked-dir/b

Note that if the "a" pathspec were dropped, the call would _not_
traverse into the untracked directory (i.e., "untracked-dir/b" would
not be included).

For now let's just skip the assertion on Git v2.24 because (1) unlike
the GitRepo.save_() case from the previous commit, it's not clear that
this discrepancy in the 'datalad diff' output is critical in any way,
and (2) it looks like the change in behavior may be addressed
upstream.

[0]: datalad#3904 (comment)
@codecov
Copy link

@codecov codecov bot commented Dec 7, 2019

Codecov Report

Merging #3904 into master will increase coverage by 33.91%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #3904       +/-   ##
===========================================
+ Coverage   46.74%   80.66%   +33.91%     
===========================================
  Files         269      272        +3     
  Lines       36045    36065       +20     
===========================================
+ Hits        16850    29091    +12241     
+ Misses      19195     6974    -12221
Impacted Files Coverage Δ
datalad/support/gitrepo.py 83.62% <100%> (+0.79%) ⬆️
datalad/core/local/tests/test_diff.py 99.41% <100%> (+0.6%) ⬆️
tools/coverage-bin/git-annex-remote-datalad 100% <0%> (ø)
tools/coverage-bin/datalad 100% <0%> (ø)
...ols/coverage-bin/git-annex-remote-datalad-archives 100% <0%> (ø)
datalad/config.py 98.85% <0%> (+0.38%) ⬆️
datalad/ui/progressbars.py 83.1% <0%> (+0.67%) ⬆️
datalad/support/external_versions.py 93.98% <0%> (+1.5%) ⬆️
... and 140 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20bda0f...a4a6e29. Read the comment docs.

kyleam added 2 commits Dec 7, 2019
save_() calls 'ls-files -o' on a list of untracked directories to
determine which directories correspond to untracked submodules.  If
the directories remain unexpanded in the 'ls-files -o' output, they
are taken as submodules.

This method of identifying untracked submodules breaks [0,1] with the
latest release of Git (v2.24.0) due to an unintentional change in the
'ls-files -o' output [2]: when given _multiple_ pathspecs, 'ls-files
-o' recurses into untracked submodules and lists the files from the
submodule (even the tracked ones).  save_() filters the output to
directories, so most of the additional entries are removed, but when
the submodule itself has submodules (untracked or tracked) these and
those from any deeper levels are included in the output.  save_()
passes these deeper repositories to add_submodule(), which as expected
leads to 'git submodule add' failing.

This behavior will hopefully be addressed in Git itself [2], but we
should still provide a workaround for the current version of Git.  Add
a helper that filters these deeper repositories from the list of
submodules that save_() feeds to add_submodule().  This approach takes
advantage of the fact that, even when 'ls-files -o' misbehaves and
traverses into the submodule, it still reports an unexpanded entry.
If the unexpanded entry weren't present, we wouldn't be able to
distinguish a submodule from a regular directory.

Note that the helper is conditionally defined at the module level; for
earlier versions of Git that don't need this kludge, this reduces the
cost per save_() call to a single call to an identity function.

[0]: datalad#3890 (comment)
[1]: datalad#3902 (comment)
[2]: https://lore.kernel.org/git/87fti15agv.fsf@kyleam.com/T/#u
As described in the previous commit, the traversal behavior of
'ls-files -o' in the latest release of Git, v2.24.0, has changed.  The
workaround in that commit fixed two tests that failed with v2.24.0,
but there is a remaining failure, test_path_diff [0].  Local testing
confirmed that it is due to the same change in Git, 89a1f4aaf7 (dir:
if our pathspec might match files under a dir, recurse into it,
2019-09-17).

The failures addressed in the last commit were about 'ls-files -o'
without --directory traversing into untracked nested repositories when
multiple pathspecs are given.  In contrast the test_path_diff failure
happens due to traversing into a plain untracked directory when
--directory is specified along with multiple pathspecs.  For example,
before 89a1f4aaf7 the following will not traverse into an untracked
directory:

  git-89a1f4aaf7^ $ git ls-files -o --directory . a
  a
  untracked-dir/

As of 89a1f4aaf7 the same command includes untracked content from the
untracked directory:

  git-89a1f4aaf7 $ git ls-files -o --directory . a
  a
  untracked-dir/
  untracked-dir/b

Note that if the "a" pathspec were dropped, the call would _not_
traverse into the untracked directory (i.e., "untracked-dir/b" would
not be included).

For now let's just skip the assertion on Git v2.24 because (1) unlike
the GitRepo.save_() case from the previous commit, it's not clear that
this discrepancy in the 'datalad diff' output is critical in any way,
and (2) it looks like the change in behavior may be addressed
upstream.

[0]: datalad#3904 (comment)
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 7, 2019

Assuming that's related to the ls-files -o change, that should probably be bundled with this PR. I'll have to look into that.

The failure was indeed related to the same ls-files -o change in behavior.

range-diff
1:  6d4f237d1 ! 1:  ca234631a BF: gitrepo: Work around change in 'ls-files -o' output
    @@ Commit message
         submodules that save_() feeds to add_submodule().  This approach takes
         advantage of the fact that, even when 'ls-files -o' misbehaves and
         traverses into the submodule, it still reports an unexpanded entry.
    -    If it didn't, we wouldn't be able to distinguish a submodule from a
    -    regular directory.
    +    If the unexpanded entry weren't present, we wouldn't be able to
    +    distinguish a submodule from a regular directory.
     
         Note that the helper is conditionally defined at the module level; for
         earlier versions of Git that don't need this kludge, this reduces the
-:  --------- > 2:  a4a6e2955 TST: diff: Skip an assertion that fails with Git v2.24.0
2:  fe6da1383 = 3:  86457008a [drop] travis: Use upstream Git

@kyleam kyleam removed the WIP label Dec 7, 2019
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 8, 2019

The travis build with the latest git and the github action that uses the latest git both passed. I'll drop the tip commit.

@kyleam kyleam removed the do not merge label Dec 8, 2019
@mih
Copy link
Member

@mih mih commented Dec 8, 2019

Great! Appveyor stalls in the 2nd run, but that is nothing new. Will merge.

@mih mih merged commit 6031944 into datalad:master Dec 8, 2019
14 of 15 checks passed
@kyleam kyleam deleted the ls-files-kludge branch Dec 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants