Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update for latest git-annex #4214

Merged
merged 6 commits into from Feb 28, 2020
Merged

Update for latest git-annex #4214

merged 6 commits into from Feb 28, 2020

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented Feb 27, 2020

This series includes a few git-annex-related updates. Most importantly, it deals with the removal of --include-dotfiles in git-annex 8.20200226 and switches to using add --force-{small,large} when available (7.20200202.7 and later) (update: should do, but put off for now).

@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

Once the initial test run completes [0], this should get a temporary commit that tests with the latest -devel annex.

[0]: ... but tests don't seem to be running at all right now, as mentioned in another PR.

@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

Doing a fuller test sweep locally, it looks like this will need an update (not sure how much is due to the proposed changes here vs more fixups needed for newer git-annex).

@kyleam kyleam added the WIP label Feb 27, 2020
kyleam added 4 commits Feb 27, 2020
This is particularly worth doing because the latest git-annex release
(8.20200226) changes its treatment of dotfiles and removes
--include-dotfiles.
Unless git-annex is passed dotfiles explicitly, it skips them by
default.  This differs from git's behavior of adding the dotfiles
regardless of whether they are given as explicit pathspecs.  To make
'git annex add' behave more consistently with 'git add', AnnexRepo
passes --include-dotfiles to it in two independent spots, _save_add()
and add_().

However, git-annex version 8.20200226, specifically 3cd375723
(annex.dotfiles, 2019-12-26), drops --include-dotfiles and changes the
behavior described above: git-annex no longer skips dotfiles, but
instead adds them to git (unless instructed otherwise by
annex.dotfiles or annex.largefiles/--force-large).

Try to make the behavior more consistent with older git-annex versions
we support by

  * continuing to use --include-dotfiles if it is available.  We cache
    its availability in a new class method.  Note that the method is
    written in a more general way to anticipate the need for more such
    kludges, including the new --force-{large,small} options.

  * Unconditionally use '-c annex.dotfiles=true' in our git-annex
    calls so that the addition of dotfiles honors annex.largefiles and
    friends, like they do in earlier git-annex versions.

At this point, the main behavior discrepancy that we expose between
8.20200226 and earlier versions is that `repo.add(<directory>,
git=False)` will _not_ skip dotfiles with 8.20200226 while it will
with older versions.  This doesn't seem worth worrying about because
core.local.save()/repo._save_add() aren't susceptible to this issue
because they deal with status-expanded paths (i.e. they don't end up
calling `git annex add` with directories, making git-annex's old
dotfile skipping behavior irrelevant).

Re: datalad#4185
Re: datalad#4213
Right now, we only need to clean up the fsck() result in one spot, but
with 8.20200226, we also need to filter "error-messages" to several
spots.  Move this to a dedicated helper.
As of git-annex 8.20200226, test_publish_simple() and
test_publish_with_data() will fail because the remote=None fsck() will
include an message along the lines of

  test-annex.dat: Can be upgraded to an improved key format. You can
  do so by running: git annex migrate --backend=SHA256E test-annex.dat
@codecov
Copy link

codecov bot commented Feb 27, 2020

Codecov Report

Merging #4214 into maint will increase coverage by 0.21%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #4214      +/-   ##
==========================================
+ Coverage   89.10%   89.31%   +0.21%     
==========================================
  Files         275      275              
  Lines       36788    36836      +48     
==========================================
+ Hits        32779    32900     +121     
+ Misses       4009     3936      -73     
Impacted Files Coverage Δ
datalad/downloaders/http.py 70.91% <0.00%> (-2.79%) ⬇️
datalad/downloaders/tests/test_http.py 58.39% <0.00%> (-2.19%) ⬇️
datalad/utils.py 84.14% <0.00%> (+0.09%) ⬆️
datalad/distribution/siblings.py 76.89% <0.00%> (+0.30%) ⬆️
datalad/support/tests/test_annexrepo.py 95.76% <0.00%> (+0.31%) ⬆️
datalad/support/gitrepo.py 89.73% <0.00%> (+0.31%) ⬆️
datalad/tests/test_cmd.py 97.32% <0.00%> (+0.53%) ⬆️
datalad/core/local/create.py 94.73% <0.00%> (+0.75%) ⬆️
datalad/tests/utils_testrepos.py 93.80% <0.00%> (+0.88%) ⬆️
datalad/tests/test_log.py 99.01% <0.00%> (+0.98%) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f32fc43...191a4e0. Read the comment docs.

@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

With the latest push, I've reworked/rethought the --include-dotfiles kludge and added a workaround for two publish failures that showed up with 8.20200226. I've also removed the --force-{large,small} commits. We should eventually do that, but it's not necessary to do for compatibility with the latest git-annex and it was causing a few failures I haven't yet looked into.

@kyleam kyleam removed the WIP label Feb 27, 2020
@yarikoptic yarikoptic added this to the 0.12.3 milestone Feb 27, 2020
@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

Travis is now green with git-annex 7.20190819+git2-g908476a9b-1 run: https://travis-ci.org/datalad/datalad/builds/655920337

On the other hand, the Windows builds, which use 8.20200226-g2d3ef2c07, have several failures.

Locally with git-annex 7.20200219 and 4316d92b48 (which is after v8 branch merge), I don't see these failures. So it is not just the newer annex on the Windows builds.


I'll push a temporary commit that tests with git-annex -devel on Travis.

@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

I've canceled the Travis run because the version check for the presence of --include-dotfiles won't do for the current neurodebian -devel git-annex version. The check for "has --include-dotfiles" is currently ver <= "7.20200226". The value is "7.20200226" because that is the last release with --include-dotfiles. However, -devel reports as 7.20200219-135-g34d726cba based on the git describe output, so the check will decide it has --include-dotfiles despite that not being the case.

I'm not seeing a version check that would work in this case. @yarikoptic, when you have the chance, could you please build -devel from a commit after the merge of 7.20200226 into master (bfa015ae4 or later)?

@yarikoptic
Copy link
Member

yarikoptic commented Feb 27, 2020

doing now for 8.20200226-16-ge156a2b74

@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

Windows builds, which use 8.20200226-g2d3ef2c07, have several failures.

  • The datalad.core run [...]

I can trigger these core failures if I use an adjusted branch. The issue related to using -c annex.dotfiles=true for the calls, but then the files going back into git afterward. If I set annex.dotfiles in .git/config, these tests pass. I suspect that -c annex.dotfiles=true is problematic in general when overridden transiently (as is the case for -c annex.largefiles).

@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

doing now for 8.20200226-16-ge156a2b74

Thanks!

@yarikoptic
Copy link
Member

yarikoptic commented Feb 27, 2020

it is in
$> apt-cache policy git-annex-standalone 
git-annex-standalone:
  Installed: 7.20200219+git135-g34d726cba-1~ndall+1
  Candidate: 8.20200226+git16-ge156a2b74-1~ndall+1
  Version table:
     8.20200226+git16-ge156a2b74-1~ndall+1 500
        500 http://neuro.debian.net/debian-devel buster/main amd64 Packages
 *** 7.20200219+git135-g34d726cba-1~ndall+1 100
        100 /var/lib/dpkg/status
     7.20190819+git2-g908476a9b-1~ndall+1 500
        500 http://neuro.debian.net/debian buster/main amd64 Packages

@kyleam
Copy link
Contributor Author

kyleam commented Feb 27, 2020

it is in

Thanks. I'll trigger a Travis run. At this point I'd guess it is just going to run into the hanging from gh-4118.

And we have the larger conceptual question I mentioned above of how to handle annex.dotfiles on the latest git-annex. I think we're going to need to set it to "true" somewhere, because we have to be able to put ".datalad/metadata/objects/" into annex. But -c annex.dotfiles=true fails on adjusted branches (and probably with annex.addunlocked set).

Although given that all the Windows CI builds use the latest git-annex and without this PR will be failing across the board with the git-annex update, perhaps it makes sense to skip those tests on adjusted branches, mark the other failures as known windows failures, and leave it to subsequent PRs to work out what approach to use.

@kyleam kyleam force-pushed the annex-updates branch 2 times, most recently from 6884beb to cf6124c Compare Feb 27, 2020
kyleam added 2 commits Feb 28, 2020
These tests are failing on the Windows GitHub CI build with
8.20200226-g2d3ef2c07.  Due to the combination of an adjusted branch
and us using a `-c annex.dotfiles=true` override in our calls to
git-annex, the repos end up in an unexpectedly dirty state when on an
adjusted branch.  (Dotfiles switch from pointer files back to
git-tracked files.)

The issue goes away if we set annex.dotfiles in .git/config or
git-annex's config.  So the two main options seem to be to (1) not set
annex.dotfiles=true or (2) to set it more persistently.  The former
isn't likely to be viable because it would prevent use from annexing
content like ".datalad/metadata/objects/", and the latter needs more
thought.  For now, make these tests pass by adjusting them to set
annex.dotfiles=true in .git/config when running on an adjusted branch.

Re: datalad#4214 (comment)
The Windows GitHub CI builds are now running with
8.20200226-g2d3ef2c07.  The --include-dotfiles adjustment from this
series removed the widespread failures, and the previous commit worked
around unresolved issues due to adjusted branches and using `-c
annex.dotfiles=true`.  These are the remaining failures.

Re: datalad#4214 (comment)
@kyleam
Copy link
Contributor Author

kyleam commented Feb 28, 2020

I'll trigger a Travis run. At this point I'd guess it is just going to run into the hanging from gh-4118.

It did stall: https://travis-ci.org/datalad/datalad/builds/656020395

@kyleam
Copy link
Contributor Author

kyleam commented Feb 28, 2020

I said:

Although given that all the Windows CI builds use the latest git-annex and without this PR will be failing across the board with the git-annex update, perhaps it makes sense to skip those tests on adjusted branches, mark the other failures as known windows failures [...]

I've done mostly that with the latest push, with the difference that "skip those tests on adjusted branch" isn't a skip, but rather a workaround with a FIXME.

I'm not sure whether appveyor needs the known failure marks, so I've restricted it to known_failure_githubci_win for now.

@mih
Copy link
Member

mih commented Feb 28, 2020

Given the urgency of this (all tests on windows break now), I much appreciate this PR!

@mih mih merged commit 4122228 into datalad:maint Feb 28, 2020
17 checks passed
@yarikoptic
Copy link
Member

yarikoptic commented Feb 28, 2020

But -c annex.dotfiles=true fails on adjusted branches (and probably with annex.addunlocked set).

is it git-annex issue? reported?

@kyleam kyleam deleted the annex-updates branch Feb 28, 2020
@kyleam
Copy link
Contributor Author

kyleam commented Feb 28, 2020

is it git-annex issue?

I'm not sure. I've explained my understanding of it in comments above and in 547c129 (TST: Work around annex.dotfiles-related failures on adjusted branches, 2020-02-28).

reported?

I plan on doing so, assuming that, when I try to understand the issue better and write it up, I don't convince myself that it is intended behavior on git-annex's side. (As I mentioned above, at this point, I think the answer will be "as with annex.largefiles, you shouldn't override annex.dotfiles transiently; it will lead to problems with the filters").

edit: posts on git-annex site

kyleam added a commit to kyleam/datalad that referenced this issue Mar 6, 2020
As of version 8.20200226, git-annex no longer has --include-dotfiles
and, instead of skipping dotfiles that are not explicitly given as
paths, now adds dotfiles to git by default.  When the new
annex.dotfiles option is configured to true, git-annex decides the
fate of dotfiles based on the normal annex.largefiles mechanism.

97c28c7 (RF: annexrepo: Adjust for removal of --include-dotfiles,
dataladgh-4214) dealt with the removal of --include-dotfiles, but it only
_partially_ dealt with the new "dot files to git" behavior.  That
commit set annex.dotfiles=true for our calls to git-annex.  However,
when we set this transiently, later actions can result in annexed
dotfiles being added as regular git files [0], a problem that was
hinted at by the workarounds needed for test_run.test_sidecar and
test_save.test_save_dotfiles in adjusted mode.

We rely on annexing dotfiles (e.g., `.datalad/metadata/objects`), so
it seems we can't get around setting annex.dotfiles=true in the
repository.  When AnnexRepo is initialized, check whether
annex.dotfiles is configured in .git/config.  If it's not, set it to
true unless it has explicitly been set to false outside of DataLad.

Note that this solution doesn't consider the annex.dotfiles
configuration that could be stored in git-annex:config.log, which is
overridden by the value in .git/config.  This could be added to the
check if we decide the correctness is worth the expense for each
AnnexRepo initialization.

[0] Two posts on the git-annex site have discussion and demo scripts
    related to this:
    https://git-annex.branchable.com/forum/Get_annex.dotfiles__61__true_behavior_without_persistent_configuration__63__/
    https://git-annex.branchable.com/bugs/Guard_against_previously_annexed_dotfiles_being_converted_to_git_files__63__/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants