Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Don't assume default annex repo version is v5 #3648

Merged
merged 7 commits into from Sep 25, 2019
Merged

Conversation

@kyleam
Copy link
Contributor

@kyleam kyleam commented Sep 3, 2019

This PR fixes test failures that I found locally when running with a git-annex built from annex's v7-default branch (b421004d75). I'm marking it as a draft because before merging we should have a Travis run with a git-annex version that auto-upgrades v5 repositories.

On a throw-away merge of this PR's branch into master, the datalad/core tests passed for me, but master's tests may need additional fixes elsewhere. Edit: master looks good too. After merging this to 0.11.x, a throw-away merge of 0.11.x into master passed).

Closes #3628.


  • Verify that this branch passes on Travis with a version of git-annex that defaults to v7 repos (that change is currently in annex's v7-default branch has been released and is now included in the latest git-annex release, 7.20190912).
@codecov
Copy link

@codecov codecov bot commented Sep 3, 2019

Codecov Report

Merging #3648 into 0.11.x will decrease coverage by 2.45%.
The diff coverage is 65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           0.11.x    #3648      +/-   ##
==========================================
- Coverage   83.69%   81.23%   -2.46%     
==========================================
  Files         278      256      -22     
  Lines       48124    33933   -14191     
==========================================
- Hits        40275    27565   -12710     
+ Misses       7849     6368    -1481
Impacted Files Coverage Δ
datalad/support/gitrepo.py 69.13% <ø> (-6.44%) ⬇️
datalad/support/tests/test_gitrepo.py 99.88% <ø> (+0.04%) ⬆️
datalad/tests/utils.py 89.15% <100%> (-3.56%) ⬇️
datalad/support/tests/test_annexrepo.py 94.32% <100%> (-1.91%) ⬇️
datalad/support/annexrepo.py 57.75% <22.22%> (-6.21%) ⬇️
datalad/interface/rerun.py 18.86% <0%> (-30.58%) ⬇️
datalad/ui/utils.py 46.87% <0%> (-25%) ⬇️
datalad/distribution/publish.py 16.42% <0%> (-21.11%) ⬇️
datalad/tests/test_direct_mode.py 30.35% <0%> (-18.3%) ⬇️
datalad/downloaders/providers.py 68.34% <0%> (-16.72%) ⬇️
... and 136 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 60542a0...9410516. Read the comment docs.

@@ -121,6 +121,7 @@ class AnnexRepo(GitRepo, RepoInterface):
GIT_ANNEX_MIN_VERSION = '6.20180913'
git_annex_version = None
supports_direct_mode = None
repo_version_info = None
Copy link
Member

@yarikoptic yarikoptic Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good to me.

minor notes/ideas:

  1. I wondered if a better fit name could be chosen for this class variable though. repo_ prefix somewhat visually occludes with our .repo and possibly suggests that it might be about an instance, not overall annex functionality. May be
  • annex_versions (and thus annex_versions["supported"] and annex_versions["upgradable"])
  • repository_versions (closer to verbatim wording of git annex version output)
  1. So we use check_ for other two class level variables. I do understand that get_ is closer in what it does here, but may be to stay consistent with the other two, it should be e.g. check_annex_versions?

Copy link
Contributor Author

@kyleam kyleam Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a look.

  1. I wondered if a better fit name could be chosen for this class variable though. repo_ prefix somewhat visually occludes with our .repo and possibly suggests that it might be about an instance, not overall annex functionality. May be
  • annex_versions (and thus annex_versions["supported"] and annex_versions["upgradable"])

I'm worried it wouldn't be clear whether we're talking about the version of git-annex or repository versions, even if a reader doesn't gloss over the "s".

  • repository_versions (closer to verbatim wording of git annex version output)

This clears up the above problem. There is some risk that the reader thinks it is the version of the current repo (they don't register the "s" or somehow think an annex repo can be in some hybrid state :>), which was my motivation for tacking on "_info". But I'm not sure "repository_version_info" would clear up any real confusion, so I went with your shorter "repository_versions" suggestion.

  1. So we use check_ for other two class level variables. I do understand that get_ is closer in what it does here, but may be to stay consistent with the other two, it should be e.g. check_annex_versions?

Fair enough.

range-diff
1:  f547890d8 ! 1:  9932d5377 ENH: annexrepo: Add get_repo_version_info method
    @@ Metadata
     Author: Kyle Meyer <kyle@kyleam.com>
     
      ## Commit message ##
    -    ENH: annexrepo: Add get_repo_version_info method
    +    ENH: annexrepo: Add check_repository_versions method
     
         Provide a method for getting the supported and upgradable repository
         versions.  In the current release of git-annex (7.20190819), the
    @@ datalad/support/annexrepo.py: class AnnexRepo(GitRepo, RepoInterface):
          GIT_ANNEX_MIN_VERSION = '6.20180913'
          git_annex_version = None
          supports_direct_mode = None
    -+    repo_version_info = None
    ++    repository_versions = None
      
          # Class wide setting to allow insecure URLs. Used during testing, since
          # git annex 6.20180626 those will by default be not allowed for security
    @@ datalad/support/annexrepo.py: def check_direct_mode_support(cls):
              return cls.supports_direct_mode
      
     +    @classmethod
    -+    def get_repo_version_info(cls):
    ++    def check_repository_versions(cls):
     +        """Get information on supported and upgradable repository versions.
     +
    -+        The result is cached at `cls.repo_version_info`.
    ++        The result is cached at `cls.repository_versions`.
     +
     +        Returns
     +        -------
    @@ datalad/support/annexrepo.py: def check_direct_mode_support(cls):
     +          supported -> list of supported versions (int)
     +          upgradable -> list of upgradable versions (int)
     +        """
    -+        if cls.repo_version_info is None:
    ++        if cls.repository_versions is None:
     +            from datalad.cmd import Runner
     +            key_remap = {
     +                "supported repository versions": "supported",
     +                "upgrade supported from repository versions": "upgradable"}
     +            out, _ = Runner().run(["git", "annex", "version"])
     +            kvs = (ln.split(":", 1) for ln in out.splitlines())
    -+            cls.repo_version_info = {
    ++            cls.repository_versions = {
     +                key_remap[k]: list(map(int, v.strip().split()))
     +                for k, v in kvs if k in key_remap}
    -+        return cls.repo_version_info
    ++        return cls.repository_versions
     +
          @staticmethod
          def get_size_from_key(key):
2:  29bb64f11 ! 2:  5db9ee007 TST: skip_v6_or_later: Consider whether v5 is supported by git-annex
    @@ datalad/tests/utils.py: def dm_func(*args, **kwargs):
     +    from datalad.support.annexrepo import AnnexRepo
     +
          version = cfg.obtain("datalad.repo.version")
    -+    info = AnnexRepo.get_repo_version_info()
    ++    info = AnnexRepo.check_repository_versions()
      
     -    @skip_if(version >= 6, msg="Skip test in v6+ test run", method=method)
     +    @skip_if(version >= 6 or 5 not in info["supported"],
3:  fef8786e5 ! 3:  2e666e11f TST: annexrepo: Rework test to use get_repo_version_info()
    @@ Metadata
     Author: Kyle Meyer <kyle@kyleam.com>
     
      ## Commit message ##
    -    TST: annexrepo: Rework test to use get_repo_version_info()
    +    TST: annexrepo: Rework test to use check_repository_versions()
     
         When testing that AnnexRepo(..., version=6) has the intended effect,
         test_repo_version() asserts that we expect either a v6 or v7 repo
    @@ datalad/support/tests/test_annexrepo.py: def test_repo_version(path1, path2, pat
     -    # the one below to eq_(version, 7).
     -    assert_in(version, [6, 7])
     +    # Since git-annex 7.20181031, v6 repos upgrade to v7.
    -+    supported_versions = AnnexRepo.get_repo_version_info()["supported"]
    ++    supported_versions = AnnexRepo.check_repository_versions()["supported"]
     +    v6_lands_on = next(i for i in supported_versions if i >= 6)
     +    eq_(version, v6_lands_on)
      
4:  b8270590f = 4:  721eb8b82 TST: annexrepo: Don't assume v5 is the default repository version

kyleam added 4 commits Sep 10, 2019
Provide a method for getting the supported and upgradable repository
versions.  In the current release of git-annex (7.20190819), the
supported versions will be 5 and 7 and the upgradable versions will be
0 through 6 (on non-Windows systems).  In an upcoming release, 5 will
likely be dropped from the supported versions [*].

This method will allow us to update code (particularly tests) that
assume v5 is the default supported version.

Use a classmethod for the same reason described in 66cf13b (ENH:
annexrepo: Add check_direct_mode_support method, 2019-08-29).

[*] https://git-annex.branchable.com/todo/v7_path_toward_default/

Re: datalad#3628
An upcoming release of git-annex will likely drop v5 from the
supported versions and autoupgrade it to v7 or later.
When testing that AnnexRepo(..., version=6) has the intended effect,
test_repo_version() asserts that we expect either a v6 or v7 repo
because more recent versions of git-annex auto-upgrade v6 to v7.  This
workaround requires us to keep up with the latest git-annex because v7
may start to auto-upgrade to a later version.

Instead determine which version the repository will end up at by
taking the closest supported version that git-annex reports.
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Sep 13, 2019

git-annex's v7 branch has been merged in eae5e9647 (2019-09-12). @yarikoptic, when you have the time, could you build it and trigger a _DL_DEVEL_ANNEX run for this PR?

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Sep 19, 2019

git-annex's v7 branch has been merged in eae5e9647 (2019-09-12). @yarikoptic, when you have the time, could you build it and trigger a _DL_DEVEL_ANNEX run for this PR?

pushed

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Sep 22, 2019

tests seems to pass but new fun: #3698

As an arbitrary toy analysis, 3rdparty_analysis_workflow.sh calls
`file` on locked and present annex files, generating a result.txt file
where the first line looks something like

  src/[...]T1w.nii.gz: symbolic link to ../../.git/annex/objects/[...]

In v6+ repos, it seems that this looks close enough to an unlocked
pointer file to confuse git-annex, leading to the example to
fail (e.g., [0]).

Avoid generating a result.txt that has a git-annex object path in the
first line to make this example work with v6+ repositories (which
git-annex now uses by default).

[0]: https://travis-ci.org/datalad/datalad/jobs/587149871#L1336

Closes datalad#3698.
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Sep 23, 2019

The 3rdparty example now passes, but there is still a failure in the crawler build:

https://travis-ci.org/datalad/datalad/jobs/588452670#L1239

======================================================================
ERROR: datalad_crawler.pipelines.tests.test_openfmri.test_openfmri_pipeline1
----------------------------------------------------------------------
[...]
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully [{'message': 'not available; No other repository is known to contain the file.', 'status': 'error', 'action': 'get', 'path': '/tmp/datalad_temp_test_openfmri_pipeline1x_5wmm5q/.datalad/crawl/crawl.cfg', 'annexkey': 'MD5E-s55--32de35ddd4afe2053bd7d10433180b8f.cfg', 'type': 'file', 'refds': '/tmp/datalad_temp_test_openfmri_pipeline1x_5wmm5q'}, {'message': ('could not get some content in %s %s', '/tmp/datalad_temp_test_openfmri_pipeline1x_5wmm5q', ['/tmp/datalad_temp_test_openfmri_pipeline1x_5wmm5q/.datalad/crawl/crawl.cfg']), 'type': 'directory', 'path': '/tmp/datalad_temp_test_openfmri_pipeline1x_5wmm5q', 'action': 'get', 'refds': '/tmp/datalad_temp_test_openfmri_pipeline1x_5wmm5q', 'status': 'impossible'}]

My wild guess is that the crawler instantiates a GitRepo on an annex repository and then calls .add(), bypassing the -c annex.largefiles=nothing value that would be set if AnnexRepo.add(..., git=True) were called. I think that GitRepo.add should always use -c annex.largefiles=nothing. I'll try to make that change and see if it helps. Otherwise I'll disable the crawler tests and open an issue in the crawler repo about the tests needing to be adjusted for the latest git-annex.

With v6+ repositories, 'git add' adds the file to annex unless
annex.largefiles specifies that the file should go to git.  Typically
an annex repository will be represented as an AnnexRepo and files can
be added to git with `.add(..., git=True)`.  However, instantiating an
existing annex repo as a GitRepo leads to the surprising behavior that
GitRepo.add() adds the file to annex.  Configure annex.largefiles so
that GitRepo.add() always adds files to git.
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Sep 23, 2019

but there is still a failure in the crawler build:
[...]
I think that GitRepo.add should always use -c annex.largefiles=nothing. I'll try to make that change and see if it helps. Otherwise I'll disable the crawler tests and open an issue in the crawler repo about the tests needing to be adjusted for the latest git-annex.

The crawler run still failed.

I've filed an issue on the crawler's repo, disabled the crawler test, but kept the GitRepo.add change and associated test because I still think that's the behavior we want. Without the crawler run, the build passes. I'll drop the tip commit that forces the annex-devel build.

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Sep 24, 2019

d'oh -- I forgot that this PR no longer carries the enabling of debian-devel, will push that commit now

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Sep 24, 2019

d'oh -- I forgot that this PR no longer carries the enabling of debian-devel, will push that commit now

What you just pushed is equivalent to the run for a7370d6. Here's the same link as above for the failing run: https://travis-ci.org/datalad/datalad/jobs/588507087#L1238

So, if the crawler run doesn't fail now, there's some sort of flakiness to it.

range-diff
1:  9932d5377 = 1:  9932d5377 ENH: annexrepo: Add check_repository_versions method
2:  5db9ee007 = 2:  5db9ee007 TST: skip_v6_or_later: Consider whether v5 is supported by git-annex
3:  2e666e11f = 3:  2e666e11f TST: annexrepo: Rework test to use check_repository_versions()
4:  721eb8b82 = 4:  721eb8b82 TST: annexrepo: Don't assume v5 is the default repository version
5:  de4d7545b = 5:  de4d7545b BF: 3rdparty_analysis_workflow: Make example compatible with v6+
6:  ad0158cb1 = 6:  ad0158cb1 BF(v7): gitrepo: Avoid adding files to annex
7:  a7370d6db = 7:  f253c1bfa TEMP: annex devel run

…it-annex

We already made similar adjustments for @skip_v6_or_later, which is
used in this code base.  @known_failure_v6_or_later isn't currently
used in datalad core, but the crawler extension still uses it and the
tests fail without it (e.g., [0]).

Closes datalad#3706.
Closes datalad/datalad-crawler#55.
Re: datalad/datalad-crawler#57

[0]: https://travis-ci.org/datalad/datalad/jobs/588452670#L1239
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Sep 24, 2019

I've adjusted known_failure_v6_or_later, as discussed in #3706 and datalad/datalad-crawler#57.

@yarikoptic, thanks for spotting that the failing crawler test was decorated with known_failure_v6_or_later. Hopefully that push gets us back to green. The crawler is in no worse condition then, but of course we should get to the bottom of why those json files are switching to annex files in v6+ repos.

range-diff
1:  9932d5377 = 1:  9932d5377 ENH: annexrepo: Add check_repository_versions method
2:  5db9ee007 = 2:  5db9ee007 TST: skip_v6_or_later: Consider whether v5 is supported by git-annex
3:  2e666e11f = 3:  2e666e11f TST: annexrepo: Rework test to use check_repository_versions()
4:  721eb8b82 = 4:  721eb8b82 TST: annexrepo: Don't assume v5 is the default repository version
5:  de4d7545b = 5:  de4d7545b BF: 3rdparty_analysis_workflow: Make example compatible with v6+
6:  ad0158cb1 = 6:  ad0158cb1 BF(v7): gitrepo: Avoid adding files to annex
-:  --------- > 7:  941051611 TST: known_failure_v6_or_later: Consider whether v5 is supported by git-annex
7:  f253c1bfa = 8:  8a040db87 TEMP: annex devel run

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Sep 25, 2019

With 8a040db, crawler run is back to green: https://travis-ci.org/datalad/datalad/jobs/589121771

All other failures are due to unrelated test_download_ftp failure. I'll drop the temporary tip commit. There's not much point in having the non-devel annex build run yet again, so I'll merge without waiting for another one to go through.

@kyleam kyleam merged commit 9410516 into datalad:0.11.x Sep 25, 2019
1 of 3 checks passed
@kyleam kyleam deleted the v7-default branch Sep 25, 2019
yarikoptic added a commit that referenced this issue Oct 14, 2019
0.11.8 (Oct 11, 2019) -- annex-we-are-catching-up

Fixes

- Our internal command runner failed to capture output in some cases.
  ([#3656][])
- Workaround in the tests around python in cPython >= 3.7.5 ';' in
  the filename confusing mimetypes ([#3769][]) ([#3770][])

Enhancements and new features

- Prepared for upstream changes in git-annex, including support for
  the latest git-annex
  - 7.20190912 auto-upgrades v5 repositories to v7.  ([#3648][]) ([#3682][])
  - 7.20191009 fixed treatment of (larger/smaller)than in .gitattributes ([#3765][])

- The `cfg_text2git` procedure, as well the `--text-no-annex` option
  of [create][], now configure .gitattributes so that empty files are
  stored in git rather than annex.  ([#3667][])

* tag '0.11.8': (27 commits)
  DOC: add CHANGELOG entry about mimetypes workaround, and regenerate changelog.rst
  RF: reuse fn*obscure* variables from test_archives for testing archives custom remote
  BF(TST,workaround): do not use ; in the test archive filenames
  Finalize changelog and boost version
  DOC: Adjust CHANGELOG for the fix of test
  RF(TST): use 'willgetshort' name to correctly reflect file behavior
  BF(TST): reflect the fact that since 7.20191009 file would jump from annex to git based on current size
  CHANGELOG.md: Add entry for gh-3667
  CHANGELOG.md: First batch for 0.11.8
  RF: simplify the expression for largefiles based on size
  ENH: exit with dedicated 99 exit code if installed annex is newer than -devel
  TST: known_failure_v6_or_later: Consider whether v5 is supported by git-annex
  BF(v7): gitrepo: Avoid adding files to annex
  BF: 3rdparty_analysis_workflow: Make example compatible with v6+
  ENH: annexrepo: Give more informative assertion error
  BF: annexrepo: Skip empty lines when expecting one output line
  TST: create: Adjust --text-no-annex test for aa6b8dc
  ENH: add file size rule to --text-no-annex
  TST: basic test for empty files in text2git ds
  ENH: exclude empty files from being annexed after text2git
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants