Merge tag '0.11.8' into debian
0.11.8 (Oct 11, 2019) -- annex-we-are-catching-up

Fixes

- Our internal command runner failed to capture output in some cases.
  ([#3656][])
- Worked around, in the tests, a ';' in a filename confusing `mimetypes`
  under CPython >= 3.7.5 ([#3769][]) ([#3770][])

Enhancements and new features

- Prepared for upstream changes in git-annex, including support for
  the latest git-annex
  - 7.20190912 auto-upgrades v5 repositories to v7.  ([#3648][]) ([#3682][])
  - 7.20191009 fixed treatment of (larger/smaller)than in .gitattributes ([#3765][])

- The `cfg_text2git` procedure, as well as the `--text-no-annex` option
  of [create][], now configure .gitattributes so that empty files are
  stored in git rather than annex.  ([#3667][])

* tag '0.11.8': (27 commits)
  DOC: add CHANGELOG entry about mimetypes workaround, and regenerate changelog.rst
  RF: reuse fn*obscure* variables from test_archives for testing archives custom remote
  BF(TST,workaround): do not use ; in the test archive filenames
  Finalize changelog and boost version
  DOC: Adjust CHANGELOG for the fix of test
  RF(TST): use 'willgetshort' name to correctly reflect file behavior
  BF(TST): reflect the fact that since 7.20191009 file would jump from annex to git based on current size
  CHANGELOG.md: Add entry for gh-3667
  CHANGELOG.md: First batch for 0.11.8
  RF: simplify the expression for largefiles based on size
  ENH: exit with dedicated 99 exit code if installed annex is newer than -devel
  TST: known_failure_v6_or_later: Consider whether v5 is supported by git-annex
  BF(v7): gitrepo: Avoid adding files to annex
  BF: 3rdparty_analysis_workflow: Make example compatible with v6+
  ENH: annexrepo: Give more informative assertion error
  BF: annexrepo: Skip empty lines when expecting one output line
  TST: create: Adjust --text-no-annex test for aa6b8dc
  ENH: add file size rule to --text-no-annex
  TST: basic test for empty files in text2git ds
  ENH: exclude empty files from being annexed after text2git
  ...
yarikoptic committed Oct 11, 2019
2 parents 4a60c12 + 0fc39f9 commit 8f03e47
Showing 20 changed files with 294 additions and 103 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -269,7 +269,7 @@ before_install:
- if [ ! -z "${_DL_UPSTREAM_GITPYTHON:-}" ]; then pip install https://github.com/gitpython-developers/GitPython/archive/master.zip; fi
- if [ ! -z "${_DL_UPSTREAM_GITANNEX:-}" ]; then sudo tools/ci/install-annex-snapshot.sh; sudo ln -s `find /usr/local/lib/git-annex.linux -maxdepth 1 -type f -perm /+x` /usr/local/bin/; else sudo eatmydata apt-get install git-annex-standalone ; fi
# Install optionally -devel version of annex, and if goes wrong (we have most recent), exit right away
- if [ ! -z "${_DL_DEVEL_ANNEX:-}" ]; then tools/ci/prep-travis-devel-annex.sh || exit 0; fi
- if [ ! -z "${_DL_DEVEL_ANNEX:-}" ]; then tools/ci/prep-travis-devel-annex.sh || { ex="$?"; if [ "$ex" -eq 99 ]; then exit 0; else exit "$ex"; fi; }; fi
# Optionally install the latest Git. Exit code 100 indicates that bundled is same as the latest.
- if [ ! -z "${_DL_UPSTREAM_GIT:-}" ]; then
sudo tools/ci/install-latest-git.sh || { [ $? -eq 100 ] && exit 0; } || exit 1;
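The changed `before_install` line distinguishes exit code 99 from `prep-travis-devel-annex.sh` (per the commit list, "installed annex is newer than -devel") from any other failure. The same mapping can be sketched in Python; the function name `step_exit_code` and the constant name are hypothetical, and this only models the shell logic shown in the hunk:

```python
from typing import Optional

# Exit code the commit list describes for "installed annex is newer than -devel".
ANNEX_NEWER_THAN_DEVEL = 99


def step_exit_code(script_status: int) -> Optional[int]:
    """Return the code the CI step should exit with, or None to keep going."""
    if script_status == 0:
        # The script succeeded, so the `||` branch is never taken.
        return None
    if script_status == ANNEX_NEWER_THAN_DEVEL:
        # Nothing to test against -devel: stop early, but report success.
        return 0
    # Any other failure is propagated unchanged.
    return script_status
```

Compared with the old `|| exit 0`, this no longer hides genuine failures of the preparation script.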
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,27 @@ This is a high level and scarce summary of the changes between releases.
We would recommend to consult log of the
[DataLad git repository](http://github.com/datalad/datalad) for more details.

## 0.11.8 (Oct 11, 2019) -- annex-we-are-catching-up

### Fixes

- Our internal command runner failed to capture output in some cases.
([#3656][])
- Worked around, in the tests, a ';' in a filename confusing `mimetypes`
under CPython >= 3.7.5 ([#3769][]) ([#3770][])

### Enhancements and new features

- Prepared for upstream changes in git-annex, including support for
the latest git-annex
- 7.20190912 auto-upgrades v5 repositories to v7. ([#3648][]) ([#3682][])
- 7.20191009 fixed treatment of (larger/smaller)than in .gitattributes ([#3765][])

- The `cfg_text2git` procedure, as well as the `--text-no-annex` option
of [create][], now configure .gitattributes so that empty files are
stored in git rather than annex. ([#3667][])


## 0.11.7 (Sep 06, 2019) -- python2-we-still-love-you-but-...

Primarily bugfixes with some optimizations and refactorings.
@@ -1461,3 +1482,10 @@ publishing
[#3626]: https://github.com/datalad/datalad/issues/3626
[#3631]: https://github.com/datalad/datalad/issues/3631
[#3646]: https://github.com/datalad/datalad/issues/3646
[#3648]: https://github.com/datalad/datalad/issues/3648
[#3656]: https://github.com/datalad/datalad/issues/3656
[#3667]: https://github.com/datalad/datalad/issues/3667
[#3682]: https://github.com/datalad/datalad/issues/3682
[#3765]: https://github.com/datalad/datalad/issues/3765
[#3769]: https://github.com/datalad/datalad/issues/3769
[#3770]: https://github.com/datalad/datalad/issues/3770
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -545,7 +545,7 @@ Refer datalad/config.py for information on how to add these environment variable

For the upcoming release use this template

## 0.11.8 (??? ??, 2019) -- will be better than ever
## 0.11.9 (??? ??, 2019) -- will be better than ever

bet we will fix some bugs and make a world even a better place.

8 changes: 8 additions & 0 deletions datalad/cmd.py
@@ -322,6 +322,14 @@ def _get_output_online(self, proc,
time.sleep(0.001)

# Handle possible remaining output
if log_stdout_ and log_stderr_:
# If Popen was called with more than two pipes, calling
# communicate() after we partially read the stream will return
# empty output.
stdout += self._process_remaining_output(
outputstream, proc.stdout.read(), *stdout_args)
stderr += self._process_remaining_output(
errstream, proc.stderr.read(), *stderr_args)
stdout_, stderr_ = proc.communicate()
# ??? should we condition it on log_stdout in {'offline'} ???
stdout += self._process_remaining_output(outputstream, stdout_, *stdout_args)
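The added branch drains both pipes with `read()` once the process has been read from partially, instead of relying only on `communicate()`. A minimal standalone sketch of that draining pattern follows; the child command and the function name `run_and_drain` are made up for illustration, and the sketch does not try to reproduce the `communicate()` behaviour described in the code comment:

```python
import subprocess
import sys

# Hypothetical child: two lines on stdout, one on stderr.
CHILD = "import sys; print('out1'); print('out2'); sys.stderr.write('err1\\n')"


def run_and_drain():
    """Read one line "online", then drain whatever is left on both pipes."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    ) as proc:
        first = proc.stdout.readline()  # partial read while the process runs
        rest_out = proc.stdout.read()   # remaining stdout, up to EOF
        rest_err = proc.stderr.read()   # remaining stderr, up to EOF
    # Leaving the `with` block closes the pipes and waits for the child.
    return first, rest_out, rest_err
```

Reading one pipe to EOF before the other is only safe for small outputs like this one; with large outputs the child could block on the pipe that is not being read.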
31 changes: 15 additions & 16 deletions datalad/customremotes/tests/test_archives.py
@@ -61,12 +61,11 @@
from . import _get_custom_runner


# both files will have the same content
# fn_inarchive_obscure = 'test.dat'
# fn_extracted_obscure = 'test2.dat'
fn_inarchive_obscure = get_most_obscure_supported_name()
fn_archive_obscure = fn_inarchive_obscure.replace('a', 'b') + '.tar.gz'
fn_extracted_obscure = fn_inarchive_obscure.replace('a', 'z')
from ...tests.test_archives import (
fn_in_archive_obscure,
fn_archive_obscure,
fn_archive_obscure_ext,
)

#import line_profiler
#prof = line_profiler.LineProfiler()
@@ -76,13 +75,13 @@
# matching archive name, so it will be a/d/test.dat ... we don't want that probably
@with_direct
@with_tree(
tree=(('a.tar.gz', {'d': {fn_inarchive_obscure: '123'}}),
tree=(('a.tar.gz', {'d': {fn_in_archive_obscure: '123'}}),
('simple.txt', '123'),
(fn_archive_obscure, (('d', ((fn_inarchive_obscure, '123'),)),)),
(fn_extracted_obscure, '123')))
(fn_archive_obscure_ext, (('d', ((fn_in_archive_obscure, '123'),)),)),
(fn_archive_obscure, '123')))
@with_tempfile()
def test_basic_scenario(direct, d, d2):
fn_archive, fn_extracted = fn_archive_obscure, fn_extracted_obscure
fn_archive, fn_extracted = fn_archive_obscure_ext, fn_archive_obscure
annex = AnnexRepo(d, runner=_get_custom_runner(d), direct=direct)
annex.init_remote(
ARCHIVES_SPECIAL_REMOTE,
@@ -91,7 +90,7 @@ def test_basic_scenario(direct, d, d2):
])
assert annex.is_special_annex_remote(ARCHIVES_SPECIAL_REMOTE)
# We want two maximally obscure names, which are also different
assert(fn_extracted != fn_inarchive_obscure)
assert(fn_extracted != fn_in_archive_obscure)
annex.add(fn_archive)
annex.commit(msg="Added tarball")
annex.add(fn_extracted)
@@ -114,7 +113,7 @@ def test_basic_scenario(direct, d, d2):

file_url = annexcr.get_file_url(
archive_file=fn_archive,
file=fn_archive.replace('.tar.gz', '') + '/d/' + fn_inarchive_obscure)
file=fn_archive.replace('.tar.gz', '') + '/d/' + fn_in_archive_obscure)

annex.add_url_to_file(fn_extracted, file_url, ['--relaxed'])
annex.drop(fn_extracted)
@@ -165,7 +164,7 @@ def test_basic_scenario(direct, d, d2):


@with_tree(
tree={'a.tar.gz': {'d': {fn_inarchive_obscure: '123'}}}
tree={'a.tar.gz': {'d': {fn_in_archive_obscure: '123'}}}
)
@known_failure_direct_mode #FIXME
def test_annex_get_from_subdir(topdir):
@@ -174,13 +173,13 @@ def test_annex_get_from_subdir(topdir):
annex.add('a.tar.gz')
annex.commit()
add_archive_content('a.tar.gz', annex=annex, delete=True)
fpath = op.join(topdir, 'a', 'd', fn_inarchive_obscure)
fpath = op.join(topdir, 'a', 'd', fn_in_archive_obscure)

with chpwd(op.join(topdir, 'a', 'd')):
runner = Runner()
runner(['git', 'annex', 'drop', '--', fn_inarchive_obscure]) # run git annex drop
runner(['git', 'annex', 'drop', '--', fn_in_archive_obscure]) # run git annex drop
assert_false(annex.file_has_content(fpath)) # and verify if file deleted from directory
runner(['git', 'annex', 'get', '--', fn_inarchive_obscure]) # run git annex get
runner(['git', 'annex', 'get', '--', fn_in_archive_obscure]) # run git annex get
assert_true(annex.file_has_content(fpath)) # and verify if file got into directory


4 changes: 2 additions & 2 deletions datalad/distribution/create.py
@@ -343,9 +343,9 @@ def __call__(
attrs = tbrepo.get_gitattributes('.')
# some basic protection against useless duplication
# on rerun with --force
if not attrs.get('.', {}).get('annex.largefiles', None) == '(not(mimetype=text/*))':
if not attrs.get('.', {}).get('annex.largefiles', None) == '(not(mimetype=text/*)and(largerthan=0))':
tbrepo.set_gitattributes([
('*', {'annex.largefiles': '(not(mimetype=text/*))'})])
('*', {'annex.largefiles': '(not(mimetype=text/*)and(largerthan=0))'})])
add_to_git.append('.gitattributes')

if native_metadata_type is not None:
7 changes: 3 additions & 4 deletions datalad/distribution/tests/test_create.py
@@ -334,7 +334,7 @@ def test_create_text_no_annex(path):
import re
ok_file_has_content(
_path_(path, '.gitattributes'),
content='\* annex\.largefiles=\(not\(mimetype=text/\*\)\)',
content='\* annex\.largefiles=\(not\(mimetype=text/\*\)and\(largerthan=0\)\)',
re_=True,
match=False,
flags=re.MULTILINE
@@ -344,13 +344,12 @@ def test_create_text_no_annex(path):
create_tree(path,
{
't': 'some text',
'b': '' # empty file is not considered to be a text file
# should we adjust the rule to consider only non empty files?
'b': '' # Empty file is considered text file.
}
)
ds.add(['t', 'b'])
ok_file_under_git(path, 't', annexed=False)
ok_file_under_git(path, 'b', annexed=True)
ok_file_under_git(path, 'b', annexed=False)


@with_tempfile(mkdir=True)
16 changes: 16 additions & 0 deletions datalad/interface/tests/test_run_procedure.py
@@ -34,6 +34,7 @@
from datalad.tests.utils import on_windows
from datalad.tests.utils import known_failure_direct_mode
from datalad.tests.utils import known_failure_windows
from datalad.tests.utils import skip_if_on_windows
from datalad.distribution.dataset import Dataset
from datalad.support.exceptions import (
CommandError,
@@ -347,3 +348,18 @@ def test_quoting(path):
"datalad run-procedure just2args \"with ' sing\" 'with \" doub'")
with assert_raises(CommandError):
runner.run("datalad run-procedure just2args 'still-one arg'")

@skip_if_on_windows
@with_tempfile
def test_text2git_empty(path):
    """
    Tests that empty files are not annexed in a ds configured with text2git.
    """
    ds = Dataset(path).create(force=True)
    ds.run_procedure('cfg_text2git')
    ok_clean_git(ds.path)
    # create an empty file, no extension
    open(op.join(path, 'emptyfile'), 'a').close()
    ds.save(message="add empty file")
    # check that it's not annexed
    assert_false(ds.repo.is_under_annex("emptyfile"))
2 changes: 1 addition & 1 deletion datalad/resources/procedures/cfg_text2git.py
@@ -10,7 +10,7 @@
check_installed=True,
purpose='configuration')

annex_largefiles = '(not(mimetype=text/*))'
annex_largefiles = '(not(mimetype=text/*)and(largerthan=0))'
attrs = ds.repo.get_gitattributes('*')
if not attrs.get('*', {}).get(
'annex.largefiles', None) == annex_largefiles:
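Both `create --text-no-annex` and `cfg_text2git` now write the expression `(not(mimetype=text/*)and(largerthan=0))`. As a rough illustration of what that rule selects, here is a simplified Python model. The helper names are hypothetical, the MIME types in the usage note are only examples, and git-annex evaluates the real expression itself:

```python
# The expression written to .gitattributes by this commit.
TEXT2GIT_LARGEFILES = "(not(mimetype=text/*)and(largerthan=0))"


def gitattributes_line(pattern: str = "*") -> str:
    """Build the .gitattributes line for the given path pattern."""
    return "{} annex.largefiles={}".format(pattern, TEXT2GIT_LARGEFILES)


def goes_to_annex(mimetype: str, size: int) -> bool:
    """Simplified model: annex only non-text files larger than 0 bytes."""
    return (not mimetype.startswith("text/")) and size > 0
```

Under this model `goes_to_annex("text/plain", 9)` and `goes_to_annex("inode/x-empty", 0)` are both false, while a non-empty binary file is still annexed. That matches the updated expectation in `test_create_text_no_annex`, where the empty file `b` ends up in git.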
31 changes: 29 additions & 2 deletions datalad/support/annexrepo.py
@@ -121,6 +121,7 @@ class AnnexRepo(GitRepo, RepoInterface):
GIT_ANNEX_MIN_VERSION = '6.20180913'
git_annex_version = None
supports_direct_mode = None
repository_versions = None

# Class wide setting to allow insecure URLs. Used during testing, since
# git annex 6.20180626 those will by default be not allowed for security
@@ -799,6 +800,30 @@ def check_direct_mode_support(cls):
cls.supports_direct_mode = cls.git_annex_version <= "7.20190819"
return cls.supports_direct_mode

    @classmethod
    def check_repository_versions(cls):
        """Get information on supported and upgradable repository versions.

        The result is cached at `cls.repository_versions`.

        Returns
        -------
        dict
          supported -> list of supported versions (int)
          upgradable -> list of upgradable versions (int)
        """
        if cls.repository_versions is None:
            from datalad.cmd import Runner
            key_remap = {
                "supported repository versions": "supported",
                "upgrade supported from repository versions": "upgradable"}
            out, _ = Runner().run(["git", "annex", "version"])
            kvs = (ln.split(":", 1) for ln in out.splitlines())
            cls.repository_versions = {
                key_remap[k]: list(map(int, v.strip().split()))
                for k, v in kvs if k in key_remap}
        return cls.repository_versions

@staticmethod
def get_size_from_key(key):
"""A little helper to obtain size encoded in a key"""
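The parsing step inside `check_repository_versions` can be tried without git-annex installed. The sketch below reuses the two key names from the hunk; the `SAMPLE` text is invented for illustration and is not captured `git annex version` output, and `parse_repository_versions` is a hypothetical standalone helper:

```python
# Key names as used in the hunk above.
KEY_REMAP = {
    "supported repository versions": "supported",
    "upgrade supported from repository versions": "upgradable",
}

# Invented sample, shaped like "key: value" lines.
SAMPLE = """\
git-annex version: 7.20191009
supported repository versions: 5 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
"""


def parse_repository_versions(out):
    """Map the two version lines to lists of ints; ignore everything else."""
    result = {}
    for line in out.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in KEY_REMAP:
            result[KEY_REMAP[key]] = [int(v) for v in value.split()]
    return result
```

Using `partition` rather than `split(":", 1)` is a choice of this sketch: it tolerates lines without a colon, which would otherwise fail to unpack into a key and a value.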
@@ -1161,11 +1186,13 @@ def _run_simple_annex_command(self, *args, **kwargs):
# see https://git-annex.branchable.com/todo/output_of_wanted___40__and_possibly_group_etc__41___should_not_be_polluted_with___34__informational__34___messages/
lines_ = [
l for l in lines
if not re.search(
if l and not re.search(
r'\((merging .* into git-annex|recording state ).*\.\.\.\)', l
)
]
assert(len(lines_) <= 1)

if len(lines_) > 1:
raise AssertionError("Expected one line but got {}".format(lines_))
return lines_[0] if lines_ else None

def _is_direct_mode_from_config(self):
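The change to `_run_simple_annex_command` skips empty lines and replaces the bare `assert` with an error that shows the offending lines. A standalone sketch of that filtering, with the regular expression copied from the hunk and a hypothetical function name:

```python
import re

# Informational messages that may pollute the output (regex from the hunk).
INFO_RE = re.compile(
    r"\((merging .* into git-annex|recording state ).*\.\.\.\)")


def single_output_line(lines):
    """Return the one meaningful line, None if there is none, else raise."""
    kept = [l for l in lines if l and not INFO_RE.search(l)]
    if len(kept) > 1:
        raise AssertionError("Expected one line but got {}".format(kept))
    return kept[0] if kept else None
```

For example, `single_output_line(["", "(merging origin/git-annex into git-annex...)", "standard"])` yields `"standard"`, whereas two real output lines raise an `AssertionError` that lists them.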
5 changes: 4 additions & 1 deletion datalad/support/gitrepo.py
@@ -1146,7 +1146,10 @@ def add_(self, files, git=True, git_options=None, update=False):
# without --verbose git 2.9.3 add does not return anything
add_out = self._git_custom_command(
files,
['git', 'add'] + assure_list(git_options) +
# Set annex.largefiles to prevent storing files in annex when
# GitRepo() is instantiated with a v6+ annex repo.
['git', '-c', 'annex.largefiles=nothing', 'add'] +
assure_list(git_options) +
to_options(update=update) + ['--verbose']
)
# get all the entries
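The `add_` change passes `-c annex.largefiles=nothing` so that a plain `git add` in a v6+ annex repository does not route files into the annex. The command assembly can be sketched as follows; `build_git_add_cmd` is a hypothetical helper, and rendering `update=True` as `--update` is an assumption about what `to_options(update=update)` produces:

```python
def build_git_add_cmd(git_options=None, update=False):
    """Assemble a `git add` call that keeps files out of the annex."""
    # A per-invocation config override leaves the repository config untouched.
    cmd = ["git", "-c", "annex.largefiles=nothing", "add"]
    cmd += list(git_options or [])
    if update:
        cmd.append("--update")  # assumed rendering of update=True
    cmd.append("--verbose")
    return cmd
```

With no options this yields `git -c annex.largefiles=nothing add --verbose`, to which the file paths are then appended by the caller.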
35 changes: 19 additions & 16 deletions datalad/support/tests/test_annexrepo.py
@@ -1315,21 +1315,23 @@ def test_repo_version(path1, path2, path3):
annex = AnnexRepo(path1, create=True, version=6)
ok_clean_git(path1, annex=True)
version = annex.repo.config_reader().get_value('annex', 'version')
# TODO: Since git-annex 7.20181031, v6 repos upgrade to v7. Once that
# version or later is our minimum required version, update this test and
# the one below to eq_(version, 7).
assert_in(version, [6, 7])
# Since git-annex 7.20181031, v6 repos upgrade to v7.
supported_versions = AnnexRepo.check_repository_versions()["supported"]
v6_lands_on = next(i for i in supported_versions if i >= 6)
eq_(version, v6_lands_on)

# default from config item (via env var):
with patch.dict('os.environ', {'DATALAD_REPO_VERSION': '6'}):
annex = AnnexRepo(path2, create=True)
version = annex.repo.config_reader().get_value('annex', 'version')
assert_in(version, [6, 7])
eq_(version, v6_lands_on)

# parameter `version` still has priority over default config:
annex = AnnexRepo(path3, create=True, version=5)
version = annex.repo.config_reader().get_value('annex', 'version')
eq_(version, 5)
# Assuming specified version is a supported version...
if 5 in supported_versions:
# ...parameter `version` still has priority over default config:
annex = AnnexRepo(path3, create=True, version=5)
version = annex.repo.config_reader().get_value('annex', 'version')
eq_(version, 5)
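The updated test derives the expected version from what git-annex reports as supported, instead of accepting either 6 or 7. The selection it performs amounts to the following; `lands_on` is a hypothetical name, and the sketch assumes the supported list is sorted in ascending order:

```python
def lands_on(requested, supported):
    """Lowest supported repository version that is >= the requested one."""
    # Raises StopIteration if no supported version is high enough.
    return next(v for v in supported if v >= requested)
```

With `supported == [5, 7]` a request for version 6 resolves to 7, and with `[5, 6, 7]` it stays at 6, so the same assertion holds across git-annex releases.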


@with_testrepos('.*annex.*', flavors=['clone'])
@@ -2494,7 +2496,7 @@ def test_error_reporting(path):
@with_tree(tree={
'.gitattributes': "** annex.largefiles=(largerthan=4b)",
'alwaysbig': 'a'*10,
'willnotgetshort': 'b'*10,
'willgetshort': 'b'*10,
'tobechanged-git': 'a',
'tobechanged-annex': 'a'*10,
})
@@ -2521,7 +2523,7 @@ def check_commit_annex_commit_changed(unlock, path):
path
, {
'alwaysbig': 'a'*11,
'willnotgetshort': 'b',
'willgetshort': 'b',
'tobechanged-git': 'aa',
'tobechanged-annex': 'a'*11,
'untracked': 'unique'
@@ -2533,19 +2535,20 @@ def check_commit_annex_commit_changed(unlock, path):
, index_modified=files if not unannex else ['tobechanged-git']
, untracked=['untracked'] if not unannex else
# all but the one in git now
['alwaysbig', 'tobechanged-annex', 'untracked', 'willnotgetshort']
['alwaysbig', 'tobechanged-annex', 'untracked', 'willgetshort']
)

ar.commit("message", files=['alwaysbig', 'willnotgetshort'])
ar.commit("message", files=['alwaysbig', 'willgetshort'])
ok_clean_git(
path
, index_modified=['tobechanged-git', 'tobechanged-annex']
, untracked=['untracked']
)
ok_file_under_git(path, 'alwaysbig', annexed=True)
# This one is actually "questionable" since might be "correct" either way
# but it would be nice to have it at least consistent
ok_file_under_git(path, 'willnotgetshort', annexed=True)
# 7.20191009 included a fix to evaluate current filesize not old one.
# So if size got short - it will get committed to git
ok_file_under_git(path, 'willgetshort',
annexed=external_versions['cmd:annex']<'7.20191009')

ar.commit("message2", options=['-a']) # commit all changed
ok_clean_git(
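The expectation for `willgetshort` now depends on whether the installed git-annex predates 7.20191009. How `external_versions['cmd:annex']` implements `<` is not shown in this diff, so the following is only a sketch of a numeric comparison on such version strings, with hypothetical helper names:

```python
import re


def version_key(version):
    """Turn '7.20191009' into (7, 20191009), ignoring any trailing suffix."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    if m is None:
        raise ValueError("no leading numeric version in {!r}".format(version))
    return tuple(int(p) for p in m.group(0).split("."))


def expect_annexed(annex_version, fix_version="7.20191009"):
    """True if the shrunken file is still expected to end up in the annex."""
    return version_key(annex_version) < version_key(fix_version)
```

Comparing tuples of integers avoids a pitfall of plain string comparison: `"10.20230126" < "7.20191009"` is true for strings, although the first version is the newer one.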
12 changes: 11 additions & 1 deletion datalad/support/tests/test_gitrepo.py
@@ -1465,4 +1465,14 @@ def test_duecredit(path):
if external_versions['duecredit']:
assert_in('Data management and distribution platform', outs)
else:
eq_(outs, '')
eq_(outs, '')


@with_tree({"foo": "foo"})
def test_gitrepo_add_to_git_with_annex_v7(path):
    from datalad.support.annexrepo import AnnexRepo
    ar = AnnexRepo(path, create=True, version=7)
    gr = GitRepo(path)
    gr.add("foo")
    gr.commit(msg="c1")
    assert_false(ar.is_under_annex("foo"))
