Issues with fresh git-annex 7.20191114+git43-ge29663773-1~ndall+1 #3890
Comments
I've started to look into some of these. I can trigger the
Given the flakiness, I haven't tried to bisect it against the annex repo, but I'm able to trigger it with the latest annex release (7.20191114) too. I've tried to reduce the DataLad script to a shell script that does the same operations, but file1.txt is left unmodified (the expected behavior) in the many times I've run the script. The other tests I've tried locally:
It boils down to a change in Git v2.24.0 that leads to |
save_() calls 'ls-files -o' on a list of untracked directories to determine which directories correspond to untracked submodules. If the directories remain unexpanded in the 'ls-files -o' output, they are taken as submodules.

This method of identifying untracked submodules breaks [0,1] with the latest release of Git (v2.24.0) due to an unintentional change in the 'ls-files -o' output [2]: when given _multiple_ pathspecs, 'ls-files -o' recurses into untracked submodules and lists the files from the submodule (even the tracked ones). save_() filters the output to directories, so most of the additional entries are removed, but when the submodule itself has submodules (untracked or tracked) these and those from any deeper levels are included in the output. save_() passes these deeper repositories to add_submodule(), which as expected leads to 'git submodule add' failing.

This behavior will hopefully be addressed in Git itself [2], but we should still provide a workaround for the current version of Git.

Add a helper that filters these deeper repositories from the list of submodules that save_() feeds to add_submodule(). This approach takes advantage of the fact that, even when 'ls-files -o' misbehaves and traverses into the submodule, it still reports an unexpanded entry. If it didn't, we wouldn't be able to distinguish a submodule from a regular directory.

Note that the helper is conditionally defined at the module level; for earlier versions of Git that don't need this kludge, this reduces the cost per save_() call to a single call to an identity function.

[0]: datalad#3890 (comment)
[1]: datalad#3902 (comment)
[2]: https://lore.kernel.org/git/87fti15agv.fsf@kyleam.com/T/#u
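The filtering idea in that commit message can be sketched roughly as follows. This is not the helper from the DataLad code base: the names `prune_deeper_repos`, `identity`, and `select_submodule_filter` are hypothetical, the Git version is assumed to be available as a tuple of integers, and only the lower bound (v2.24.0) comes from the text above. Whether later Git releases still need the workaround is not something this sketch knows.

```python
from pathlib import PurePosixPath


def prune_deeper_repos(repos):
    """Keep only the paths that do not lie beneath another path in `repos`.

    `repos` is assumed to be an iterable of relative directory paths as
    reported by 'ls-files -o' (a trailing slash is tolerated).
    """
    # Visit shallow paths first so that a parent is always seen before
    # any of its descendants; the string key makes the order deterministic.
    paths = sorted({PurePosixPath(r) for r in repos},
                   key=lambda p: (len(p.parts), str(p)))
    kept = []
    for path in paths:
        # Skip a path when one of the already-kept paths is its ancestor.
        if not any(parent in path.parents for parent in kept):
            kept.append(path)
    return [str(p) for p in kept]


def identity(repos):
    """No-op variant for Git versions that do not need the workaround."""
    return list(repos)


def select_submodule_filter(git_version):
    """Pick the filter once, e.g. at import time (hypothetical helper)."""
    # Assumption: only v2.24.0 and later show the changed 'ls-files -o' output.
    if git_version >= (2, 24, 0):
        return prune_deeper_repos
    return identity
```

With the input `["sub/", "sub/deeper/", "sub/deeper/deepest/", "other/"]`, the pruning variant keeps only `other` and `sub`. Note that this sketch normalizes away the trailing slash, which the real helper may or may not do.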
Going through the last cron builds for master and 0.11.x, here's the list of failures I spot.
The last two were papered over by gh-3904, with the underlying issue resolved upstream with this in-flight series.
I revisited it and was able to come up with a script that triggered it on my end. Reported on the git-annex site.
Here's what's being tested there: datalad/datalad/distribution/tests/test_create_sibling.py Lines 317 to 322 in c01fe8f
It seems very likely that the failure is due to using a later version of Git, but I'd guess the only problem that points to is that |
This particular case has been fixed upstream in ea3cb7d27 (2019-12-27). See Joey's comments in that bug report for why in general using |
As of git-annex 7.20191024, test_AnnexRepo_dirty has started to intermittently fail [0] due to the repository being in an unexpectedly dirty state, with content that was previously tracked by git converted to annexed content. The failure happens due to the combination of two things:

1) Calling `git -c annex.largefiles=anything annex add -- FILES` can in some cases result in git running git-annex's clean filter on files that are _not_ part of FILES. Importantly the clean filter runs within the `-c annex.largefiles=anything` context.

2) As of 7.20191024, git-annex's clean filter remembers the inode for annexed content, leading git-annex to conclude that the regular git file in the working tree (file1.txt in this test) should be annexed, because the content was tagged as such in 1.

See the associated git-annex bug report [1] for a more complete description.

git-annex 7.20191230, specifically commit ea3cb7d27, works around this edge case. For git-annex versions before this fix but after 7.20191024, skip the test if the "is clean" assertion fails.

[0]: datalad#3890 (comment)
[1]: https://git-annex.branchable.com/bugs/A_case_where_file_tracked_by_git_unexpectedly_becomes_annex_pointer_file/
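The "skip only on affected versions" logic described there could look something like the sketch below. It is an illustration, not the code from the actual change: `parse_annex_version`, `is_affected`, and `assert_clean_or_skip` are hypothetical names, and the version bounds are taken from the commit message (7.20191024 treated as the first affected release, 7.20191230 as the first fixed one).

```python
import unittest

# Bounds taken from the commit message above.
AFFECTED_FROM = (7, 20191024)
FIXED_IN = (7, 20191230)


def parse_annex_version(version):
    """Turn e.g. '7.20191114+git43-ge29663773-1~ndall+1' into (7, 20191114)."""
    major, rest = version.split(".", 1)
    digits = ""
    for ch in rest:
        # Stop at the first non-digit ('+', '-', '~', ...).
        if not ch.isdigit():
            break
        digits += ch
    return int(major), int(digits)


def is_affected(version):
    """True for versions at or after 7.20191024 and before 7.20191230."""
    return AFFECTED_FROM <= parse_annex_version(version) < FIXED_IN


def assert_clean_or_skip(is_dirty, annex_version):
    """Fail on a dirty repo, unless the known git-annex issue explains it."""
    try:
        assert not is_dirty, "repository is unexpectedly dirty"
    except AssertionError:
        if is_affected(annex_version):
            raise unittest.SkipTest(
                "known git-annex clean filter issue before 7.20191230")
        raise
```

The design point is that the assertion still runs everywhere: on unaffected versions a dirty repository remains a hard failure, and on affected versions a clean repository still passes rather than being skipped unconditionally.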
A check in test_target_ssh_simple assumes that executing the post-update hook will change the modification times of .git/info/refs and .git/objects/info/packs. In test runs with more recent Git versions, this check has started to fail because the modification times of the files are not updated [0]. This change in behavior is likely due to Git's f4f476b6a1 (update-server-info: avoid needless overwrites, 2019-05-13), which was part of the v2.23.0 release.

We could condition the check on the Git version, but, as the now failing check demonstrates, this check is much too concerned with internal implementation details of Git. To avoid the test starting to fail due to another change within .git/ that we do not control, let's instead update the check to not assume that it knows the strict set of files that's modified.

[0]: Re: datalad#3890 (comment)
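One way to read "not assume that it knows the strict set of files that's modified" is to check only that nothing outside an allowed set changed, without requiring every allowed file to change. The sketch below follows that reading. It is an assumption about the shape of the relaxed check, not the test's actual code, and `snapshot_mtimes`, `modified_files`, and `unexpected_modifications` are hypothetical names.

```python
import os


def snapshot_mtimes(root):
    """Map each file under `root` (relative path) to its mtime in ns."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            mtimes[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return mtimes


def modified_files(before, after):
    """Paths present in both snapshots whose mtime differs."""
    return {p for p in before.keys() & after.keys() if before[p] != after[p]}


def unexpected_modifications(before, after, may_change):
    """Modified paths that are not in the allowed set `may_change`."""
    return modified_files(before, after) - set(may_change)
```

A test would then take one snapshot before and one after running the hook and assert that `unexpected_modifications(...)` is empty, passing paths such as `.git/info/refs` and `.git/objects/info/packs` as `may_change`. Under this check, a Git version that leaves those two files untouched no longer causes a failure.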
OK, to recap the status indicated by the check list, two of six failures remain unaddressed in any way. Both of these tests appear flaky based on Travis runs. One of these I've never triggered locally, and the other one I was able to trigger early on but not recently. I don't have any good ideas how to troubleshoot those. I'm going to open dedicated issues for them and then close this issue.
0.11.x
https://travis-ci.org/datalad/datalad/builds/615882850?utm_source=github_status&utm_medium=notification
py 2.7 and 3.5 direct (so I guess upgrades to 7)
and for v5 (also I guess upgrades to 7):
master: https://travis-ci.org/datalad/datalad/builds/615883142?utm_source=github_status&utm_medium=notification also has
may be more ... didn't look into details yet, but there are changes in behavior for annex (and/or git?)