
BFing direct mode: "git commit file(s)" etc #2770

Merged: 26 commits, merged on Aug 27, 2018

@yarikoptic
Member

yarikoptic commented Aug 14, 2018

Disclaimer: this sits on top of #2724, which I intend to merge if tests pass

This pull request fixes # ... TODO ... there might be a few

  • (Nearly) all test decorators now set a nose attribute identical to their name, so you can easily run only the subset of tests carrying a given decorator. Some set further attributes; e.g. direct_mode marks tests related to direct mode in some way: tests that explicitly exercise direct mode, are known to fail in it, or are skipped in it.
  • git annex invocations no longer get -c core.bare=False --work-tree=., since they must not!
  • In direct mode, for git commit file(s) we resort to plain git commit, but first analyze the given file(s):
    • if any of them is not staged, we error out
    • if some staged files are not among file(s), we prepare a new index file in which we reset those that are not to be committed
    • gotcha/shortcoming: if a modified file is staged in some earlier state (so the worktree has newer changes), those newer changes will not get committed. We could analyze whether the staged version is annexed or in git and then run git add or git annex add accordingly, but I am not sure it is worth the hassle ATM. At least this fix gets us ~95% of correct functionality in direct mode ;)
    • The current "index" trick could later be superseded if git supports a git commit --cached file(s) invocation, but that would not eliminate the need for the analysis if we decide to handle files that are not yet staged (and thus might need git annex add).
  • redecorate @skip_direct_mode into @known_failure_direct_mode
  • undecorate tests which no longer fail in direct mode (there should be a few)
  • possibly a dedicated test for git commit file(s) (datalad save file(s)) behavior when some files are staged and some are not... but I might skip it if coverage indicates that all possible lines are already tested
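The file analysis in the direct-mode bullet can be sketched as plain set logic. This is a minimal illustration under stated assumptions, not the code in this PR: `plan_direct_mode_commit` is a hypothetical helper, both arguments are assumed to be repo-relative paths, and `ValueError` stands in for whatever exception the real code raises.

```python
def plan_direct_mode_commit(files, staged):
    """Return the staged paths to reset in an alternative index (hypothetical).

    - Every path in `files` must already be staged; otherwise we error out.
    - Staged paths not in `files` must be reset in an alternative index
      before `git commit`, so that they stay out of the commit.
    """
    files = set(files)
    staged = set(staged)
    not_staged = files - staged
    if not_staged:
        raise ValueError(
            "To commit files in direct mode, git or git-annex add them "
            "first! Files: %s" % sorted(not_staged))
    # Staged but not requested: these get reset in the alternative index
    return sorted(staged - files)


print(plan_direct_mode_commit(['a'], ['a', 'b']))  # ['b']
```

Note that a later commit in this series relaxes the first rule and simply ignores unstaged files; the sketch follows the description above.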

Please have a look @datalad/developers

yarikoptic added some commits Aug 11, 2018

ENH(TST): provide a bunch of tags for tests
Some tags match the decorator function name (skip_if_on_windows), but some
also add a "feature" tag such as "direct_mode", which could mark tests
known to fail there (thus in addition to the "known_failure_direct_mode" tag)
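A minimal sketch of how such tags can drive test selection, assuming nose's attrib plugin, which selects tests whose function has a truthy attribute of the given name (e.g. `nosetests -a direct_mode`). The `tag` and `select` helpers and the simplified `known_failure_direct_mode` below are hypothetical; the real decorators also skip or mark failures, which this sketch omits.

```python
def tag(*names):
    """Return a decorator that sets each of `names` as a truthy attribute."""
    def decorator(func):
        for name in names:
            setattr(func, name, True)
        return func
    return decorator


def known_failure_direct_mode(func):
    """Simplified stand-in: only tags the test (with its own name and the
    'direct_mode' feature tag); it does not skip or expect a failure."""
    return tag('known_failure_direct_mode', 'direct_mode')(func)


def select(tests, attr):
    """Roughly mimic `nosetests -a <attr>`: keep tests with a truthy attr."""
    return [t for t in tests if getattr(t, attr, False)]


@known_failure_direct_mode
def test_commit_in_direct_mode():
    pass


def test_plain():
    pass


print([t.__name__ for t in select([test_commit_in_direct_mode, test_plain],
                                  'direct_mode')])
# ['test_commit_in_direct_mode']
```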
BF+RF: direct_mode - no worktree etc to annex commands, use separate index for commit file(s)

The index trick is largely a workaround for now.  A proper fix would be to
implement a --cached option for the git commit file(s) invocation, which
might happen later on.
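A minimal sketch of the command-assembly rule from this PR: in direct mode, plain git calls get `-c core.bare=False --work-tree=.`, while git annex calls must not. `build_git_cmd` is a hypothetical helper; the `-c receive.autogc=0 -c gc.auto=0` prefix is copied from the failing command lines quoted in this conversation.

```python
def build_git_cmd(args, direct_mode=False):
    """Assemble a git command line (hypothetical helper).

    In direct mode, plain git commands get core.bare=False and an explicit
    work tree; per this PR, `git annex ...` calls must NOT get them.
    """
    cmd = ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0']
    is_annex = bool(args) and args[0] == 'annex'
    if direct_mode and not is_annex:
        cmd += ['-c', 'core.bare=False', '--work-tree=.']
    return cmd + list(args)
```

For example, `build_git_cmd(['rm', '-r', '--', 'two', 'one'], direct_mode=True)` yields the same argument list as the failing `git rm` call shown in the test_remove_more_than_one commit message, while an `['annex', 'add', ...]` call gets no work-tree options.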
RF(TEMP?): just ignore if some provided files are not staged
Apparently add(".") would not "filter out" files that are clean and
unmodified, so we might have no good way to detect those.  What if we skip
them?  What will fail?
@kyleam

Unless I've confused myself, I think the temp index logic is wrong.

@@ -2307,17 +2314,43 @@ def count_objects(self):
if len(item.split(': ')) == 2]}
return count
def get_changed_files(self, staged=False, filter='', index_file=None):
"""Return a dictionary with files and their status code

@kyleam

kyleam Aug 15, 2018

Member

None of the current callers use the dict, so it'd be simpler to just use git diff [--staged] -z --name-only --diff-filter=<filter>. But I could imagine a caller finding the dict useful in the future, so OK.

Also, this will include modified submodules. Do you want that?

@yarikoptic

yarikoptic Aug 23, 2018

Member

re submodules: hm... I was just trying to generalize/centralize the calls from the previous commands in get_missing_files and get_deleted_files. Did I change their behavior somehow?

@kyleam

kyleam Aug 23, 2018

Member

Did I change their behavior somehow?

No.

Parameters
----------
staged: bool, optional
Either operate on staged (index) files instead of workdir

@kyleam

kyleam Aug 15, 2018

Member

s/Either/Whether to/

filter: str, optional

@kyleam

kyleam Aug 15, 2018

Member

diff_filter to avoid masking filter?

index_file: str, optional
Alternative index file for git to use
"""
opts = ['--raw', '--name-status']

@kyleam

kyleam Aug 15, 2018

Member

This is true for the previous code too, but just a note that git will quote file names with "weird" characters here. The -z --name-only suggestion above would avoid that, but with anything other than --name-only, the parsing would be more involved.

if staged:
opts.append('--staged')
if filter:
opts.append('--diff-filter=%s' % filter)

@kyleam

kyleam Aug 15, 2018

Member

Fine as is, but you could just unconditionally append this since the default value is '' and Git treats --diff-filter='' as no filter.

@yarikoptic

yarikoptic Aug 15, 2018

Member

But for what benefit?

@kyleam

kyleam Aug 15, 2018

Member

It drops a conditional, which I find more readable. It also makes using the non-standard filter='' rather than filter=None have a purpose.
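A minimal sketch of the unconditional form suggested in the review, relying on the reviewer's note that git treats an empty `--diff-filter=` as no filter. `diff_opts` is a hypothetical stand-in for the option-building part of `get_changed_files`, using the parameter name `diff_filter` proposed elsewhere in this review.

```python
def diff_opts(staged=False, diff_filter=''):
    """Build `git diff` options (hypothetical helper).

    The filter option is appended unconditionally: with the default '',
    git receives --diff-filter= which, per the review, means no filter.
    """
    opts = ['--raw', '--name-status']
    if staged:
        opts.append('--staged')
    opts.append('--diff-filter=%s' % diff_filter)
    return opts
```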

# import pdb; pdb.set_trace()
# raise RuntimeError(
# "To commit files in direct mode, please first git "
# "or git-annex add them first! Files: %s"

@kyleam

kyleam Aug 15, 2018

Member

In case this commented code gets added back: Two "first"s.

# Need an alternative index_file
with make_tempfile() as index_file:
alt_index_file = index_file
# Reset the files we are not to be committed

@kyleam

kyleam Aug 15, 2018

Member

I can't parse this comment.

self._git_custom_command(
list(staged_not_to_commit),
['git', 'reset'],
index_file=alt_index_file)

@kyleam

kyleam Aug 15, 2018

Member

Huh, I expected a write-tree/read-tree call somewhere, or a copy of the current index. AFAICS what you end up with is an index that only has the files in staged_not_to_commit (at HEAD's version). IOW, you have all files in the working tree except for those in staged_not_to_commit staged for deletion, and modifications in staged_not_to_commit as unstaged changes. Once you go back to .git/index, you'd have the modifications from staged_not_to_commit staged, but you'd also have "A"s for all the other files in the working tree.

This comment has been minimized.

@kyleam

kyleam Aug 16, 2018

Member

As of cc0bcd6, this copies the current index.

@@ -1575,6 +1579,9 @@ def _git_custom_command(self, files, cmd_str,
if check_fake_dates and self.fake_dates_enabled:
env = self.add_fake_dates(env)
if index_file:
env['GIT_INDEX_FILE'] = index_file

@kyleam

kyleam Aug 15, 2018

Member

env may be None here.

@kyleam

kyleam Aug 16, 2018

Member

Fixed in 7ca7c10.

kyleam added some commits Aug 16, 2018

BF: gitrepo: Verify that env is a dict before setting GIT_INDEX_FILE
env may be None at this point.  Copy env to follow the behavior of
other handlers that avoid modifying it.
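A minimal sketch of the env handling described in this commit. `with_index_file` is hypothetical, and treating `env=None` as "start from the current process environment" is an assumption mirroring `subprocess` semantics, not necessarily what DataLad's runner does. `GIT_INDEX_FILE` is the standard git variable for pointing git at an alternative index.

```python
import os


def with_index_file(env, index_file):
    """Return a new env dict pointing git at `index_file` (hypothetical).

    Assumption: env=None means "inherit the current environment", as with
    subprocess.  The caller's dict is copied, never modified in place.
    """
    new_env = dict(os.environ if env is None else env)
    if index_file:
        new_env['GIT_INDEX_FILE'] = index_file
    return new_env
```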
BF: annexrepo: Copy alternative index from current index
Otherwise the alternative index will only contain entries for the
files in the subsequent 'git reset' call.  When we commit and return
to the real index, all the other tracked files will be shown as
additions.

Re: https://github.com/datalad/datalad/pull/2770/files#r210359518
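A minimal sketch of seeding the alternative index from the current one, as this commit describes. `make_alt_index` and its `git_dir` argument are hypothetical; the resulting path would then be handed to git via `GIT_INDEX_FILE` for the subsequent `git reset` and `git commit`.

```python
import os
import shutil
import tempfile


def make_alt_index(git_dir):
    """Copy `git_dir`/index to a new temporary file and return its path.

    Hypothetical helper; the caller is responsible for removing the file.
    """
    fd, alt_index = tempfile.mkstemp(prefix='alt_index_')
    os.close(fd)
    shutil.copyfile(os.path.join(git_dir, 'index'), alt_index)
    return alt_index
```

Starting from a copy matters because `git reset -- <paths>` only rewrites the entries for the given paths. With an empty starting index, every other tracked file would be missing, and, as the review above explains, those files would show up as additions once you return to the real index.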
@codecov

codecov bot commented Aug 16, 2018

Codecov Report

Merging #2770 into master will decrease coverage by 1.1%.
The diff coverage is 93.61%.

@@            Coverage Diff             @@
##           master    #2770      +/-   ##
==========================================
- Coverage   90.15%   89.04%   -1.11%     
==========================================
  Files         245      245              
  Lines       31320    31777     +457     
==========================================
+ Hits        28236    28297      +61     
- Misses       3084     3480     +396
Impacted Files Coverage Δ
datalad/interface/tests/test_download_url.py 100% <ø> (ø) ⬆️
datalad/metadata/tests/test_search.py 93.18% <ø> (-0.06%) ⬇️
datalad/plugin/tests/test_plugins.py 89.15% <ø> (-0.13%) ⬇️
datalad/distribution/tests/test_get.py 100% <ø> (ø) ⬆️
datalad/metadata/extractors/tests/test_base.py 87.17% <ø> (ø) ⬆️
datalad/distribution/tests/test_install.py 82.09% <ø> (-17.5%) ⬇️
datalad/distribution/tests/test_create.py 100% <ø> (ø) ⬆️
datalad/distribution/tests/test_create_sibling.py 88.46% <ø> (-0.05%) ⬇️
datalad/tests/test_tests_utils.py 98.55% <100%> (ø) ⬆️
datalad/interface/tests/test_run.py 76.9% <100%> (-22.93%) ⬇️
... and 17 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 410f20a...a2e70aa.

kyleam added some commits Aug 16, 2018

TST: Update skip_if_no_network test for 107f036
As of that commit, the function is assigned a "network" attribute
directly rather than a "tags" attribute.
BF: annexrepo: Don't drop files to commit() if not in direct mode
As of 90d13c9, test_AnnexRepo_{commit,status} and
test_recursive_save fail because the files passed to commit() are
ignored regardless of whether we're in direct mode.
BF: annexrepo: Assure files is a list in all codepaths of commit()
`files` is documented as accepting a str or list of str, but the
non-proxy codepath doesn't assure that `files` is a list.  Move the
assure_list earlier so that it affects both codepaths.
@kyleam

Member

kyleam commented Aug 16, 2018

As of ccd071b, the test failures are restricted to the direct mode runs.

kyleam added some commits Aug 17, 2018

BF: annexrepo: Raise expected error when using alternative index
Commit should raise FileNotInRepositoryError for untracked files.  To
avoid getting a list of all tracked files, make a separate commit
--dry-run call.

This fixes the failing test_annexrepo.test_AnnexRepo_commit.
BF: annexrepo: Don't assume files are absolute
This fixes the failing test_AnnexRepo_status and
test_symlinked_relpath (which have been failing since 90d13c9).
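A minimal sketch of accepting both absolute and relative paths, as this fix requires. `to_repo_relpath` is hypothetical, and it assumes relative inputs are already relative to the repository root.

```python
import os


def to_repo_relpath(path, repo_path):
    """Normalize `path` to a path relative to `repo_path` (hypothetical).

    Absolute paths are made relative to the repository; relative paths
    are assumed to already be relative to the repository root.
    """
    if os.path.isabs(path):
        path = os.path.relpath(path, repo_path)
    return os.path.normpath(path)
```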
BF: export_archive: Don't readlink() annex files in direct mode
As of 90d13c9, '--work-tree=.' and '-c core.bare=false' are no
longer passed to git annex calls.  As a result, file_has_content and
is_under_annex return True, which means that export_archive's readlink
call now happens in direct mode.  Prevent it from calling readlink on
annex files that aren't symbolic links.
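A minimal sketch of the guard: only call `os.readlink()` on an actual symbolic link, since in direct mode annexed files are regular files and `os.readlink()` raises `OSError` on them. `link_target_or_none` is a hypothetical helper, not export_archive's code.

```python
import os


def link_target_or_none(path):
    """Return the symlink target of `path`, or None for a regular file."""
    if not os.path.islink(path):
        # e.g. an annexed file in direct mode: content, not a link
        return None
    return os.readlink(path)
```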
TST: run: Make some status checks more specific
In direct mode, dataset.diff sees all annex files as type changes.
Because of this, some of the blanket "all status X" checks in
test_basics are failing in direct mode.  Where we expect all
"notneeded"s, there's an "ok" because of the typechange diff result.
Where we expect all "ok"s, there's a "notneeded" because of an add
call on a typechanged annex file that is already in the dataset.

These weren't failing before 90d13c9 because the files were added to
git instead of the annex.

Adjust the checks on the result list to be more specific.
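A minimal sketch of a more specific check: count statuses per action instead of asserting one status for every record. DataLad result records are dicts with `action` and `status` keys; `status_counts` is a hypothetical test helper, and the records below are made up for illustration.

```python
from collections import Counter


def status_counts(results, action=None):
    """Count result statuses, optionally only for records of `action`."""
    return Counter(r['status'] for r in results
                   if action is None or r.get('action') == action)


results = [
    {'action': 'add', 'status': 'notneeded'},
    {'action': 'save', 'status': 'ok'},
]
# A blanket "all notneeded" check fails here...
print(all(r['status'] == 'notneeded' for r in results))  # False
# ...while a per-action check states what the test actually cares about.
print(status_counts(results, 'add'))  # Counter({'notneeded': 1})
```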
TST: Mark test_rerun_cherry_pick as a known direct mode failure
Before 90d13c9, this test passed because files were added to git
rather than annex.  This is no longer the case, which means that git
refuses to let us check out another revision for the rerun because the
annex files are seen as type changes.  Thus this is a legitimate
direct mode failure that was masked by an issue with add().
TST: Mark test_remove_more_than_one as a known direct mode failure
Before 90d13c9, this test passed because files were added to git
rather than annex.  This is no longer the case, and git fails with

  Failed to run ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0',
  '-c', 'core.bare=False', '--work-tree=.', 'rm', '-r', '--', 'two',
  'one'] under [...]. Exit code=1. out= err=error: the following files
  have local modifications:
      one
      two
  (use --cached to keep the file, or -f to force removal)

Perhaps we should run 'git rm' with --force when dataset.remove is
called with check=False, but, at any rate, this is a legitimate
failure that was masked by an issue with add().
@kyleam

Member

kyleam commented Aug 21, 2018

I haven't clicked through every remaining Travis failure, but I think they're all triggered by the Sphinx warning from #2774.

@yarikoptic

Member

yarikoptic commented Aug 23, 2018

@kyleam since you have worked around the sphinx issue for now (thanks for that), would you mind pushing an amended or a new commit to trigger a "rebuild"?

TST: run: Fix 663590e
That commit makes the test fail outside of direct mode because the
results include "add" but not "save."

@kyleam kyleam force-pushed the yarikoptic:bf-direct-mode branch from 557ff43 to 850fee3 Aug 23, 2018

@kyleam

Member

kyleam commented Aug 23, 2018

would you mind pushing an amended or a new commit to trigger "rebuild"?

done

@yarikoptic

Member

yarikoptic commented Aug 23, 2018

whoohoo -- travis seems to be ok! awesome. It had an unrelated error for the crawler tests, which I restarted. appveyor, though, isn't happy with some tests, including now the dreaded old "The process cannot access the file because it is being used by another process" (e.g. #147) and test_create tests failing with "Repo unexpectly dirty "

@kyleam

Member

kyleam commented Aug 24, 2018

test_create tests failing with "Repo unexpectly dirty "

All right, after some unpleasant setup and debugging on smaug's Windows VM, I think I know what's failing in these tests. The files to add are coming in like \\full\\path\\to\\repo\\.datalad\\config, and then _normalize_path converts them to something like .datalad\\config. All's fine in the Windows world at this point, I guess. The problem arises when we check this list against the files returned by repo.get_changed_files, which look like .datalad/config. So any path that is not in the root of the repo doesn't get committed.

Obviously, we'll need to harmonize those sets, but I'll have to review the Windows path discussions to figure out who should change.

@yarikoptic

Member

yarikoptic commented Aug 24, 2018

Oh, good catch! My guess is that get_changed_files should return them in OS-style form.

CLN: get_changed_files: Rename filter to diff_filter
The new name makes it clearer what the parameter is by name alone, and
it avoids masking the built-in filter().

kyleam added some commits Aug 24, 2018

DOC: gitrepo: Clarify description of diff_filter
Avoid saying "from ACDMRTUXB" because that (1) includes a lot of ones
the user would never want to use and (2) could be interpreted as a
complete list of acceptable values, but it's not.
RF: gitrepo: Drop noop --raw option in git diff call
--raw has no effect when given with --name-status.
RF+BF: gitrepo: Simplify return value of get_changed_files
get_changed_files returns a dict of files mapping to status codes, but
it has two bugs.  It would break in the inadvisable and rare situation
where someone includes a newline in a file name.  It's not a priority
to handle such file names, but we might as well if we can.  More
importantly, the diff call doesn't disable rename detection, so it
will incorrectly parse an R or C entry, which includes two file names.

One solution would be to properly parse the full output of "-z --raw",
dropping "--name-status".  (Or better still, use git-diff-files and
git-diff-index, which aren't influenced by the diff.renames variable
like the porcelain git-diff is).  But no callers currently use status
codes, so let's simply use "--name-only -z".  Also, interface.diff
already does parsing of the raw format, so it makes sense to keep this
low-level interface as simple as we need it to be, and try to use
interface.diff wherever we can.
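A minimal sketch of parsing `git diff --name-only -z` output, as adopted here. With `-z`, git terminates each path with a NUL and does not quote unusual names, so a simple split suffices. `parse_name_only_z` is a hypothetical helper operating on already-decoded text.

```python
def parse_name_only_z(output):
    """Split NUL-terminated `git diff --name-only -z` output into paths."""
    # The trailing NUL leaves an empty last element; drop empties.
    return [p for p in output.split('\0') if p]
```

A name containing a newline, which would break line-based parsing, comes through intact.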
BF(windows): gitrepo: Convert "/" in output of get_changed_files
Git will use "/" in paths even on Windows, but downstream
code (particularly annexrepo.commit) expects the paths to have native
separators.

This should fix the failing test_create* runs on appveyor.
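A minimal sketch of the separator conversion. `to_native_sep` is hypothetical; the `sep` parameter exists only so the Windows case can be shown on any platform.

```python
import os


def to_native_sep(path, sep=os.sep):
    """Convert a git-style ("/"-separated) path to native separators."""
    return path.replace('/', sep)
```

For example, git reports `.datalad/config`, and on Windows this becomes `.datalad\config`, which matches the form produced by `_normalize_path` in the debugging comment above, so the two sets can be compared directly.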
@kyleam

Member

kyleam commented Aug 24, 2018

24dfa7c
[...]
This should fix the failing test_create* runs on appveyor.

Well, mostly. One of the three test_create*, test_create_text_no_annex, fails with a different error now:

======================================================================
ERROR: datalad.distribution.tests.test_create.test_create_text_no_annex
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\tests\utils.py", line 591, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "C:\Miniconda35\envs\test-environment\lib\contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\utils.py", line 1440, in make_tempfile
    yield filename
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\tests\utils.py", line 591, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\distribution\tests\test_create.py", line 347, in test_create_text_no_annex
    ds.add(['t', 'b'])
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\distribution\dataset.py", line 444, in apply_func
    return f(**kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 478, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 466, in return_func
    results = list(results)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 421, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 490, in _process_results
    for res in results:
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\distribution\add.py", line 421, in __call__
    for a in added:
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 1577, in add_
    expect_stderr=True):
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 2430, in _run_annex_command_json
    raise e
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 2346, in _run_annex_command_json
    **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 1087, in _run_annex_command
    raise e
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 1079, in _run_annex_command
    return self.cmd_call_wrapper.run(cmd_list, env=env, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\cmd.py", line 672, in run
    cmd, env=self.get_git_environ_adjusted(env), *args, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\cmd.py", line 532, in run
    raise CommandError(str(cmd), msg, status, out[0], out[1])
datalad.support.exceptions.CommandError: CommandError: command '['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'add', '--json', '--', 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_5li_huzj\\b', 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_5li_huzj\\t']' failed with exitcode 1
Failed to run ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'add', '--json', '--', 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_5li_huzj\\b', 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_5li_huzj\\t'] under 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_5li_huzj'. Exit code=1. out= err=git-annex: bad annex.largefiles configuration: Parse failure: "mimetype" not supported; not built with MagicMime support
======================================================================

test_install_recursive_repeat also continues to fail.

@yarikoptic

Member

yarikoptic commented Aug 24, 2018

Oh, so the Windows build of git annex lacks mime support, @joeyh? Then we might need to skip this test on Windows. Ideally (not here, I guess) we should record somehow/somewhere (externals or AnnexRepo attributes) which optional features annex supports, and then skip dynamically.
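A minimal sketch of the dynamic skip suggested here. It assumes `git annex version` output contains a line of the form `build flags: Flag1 Flag2 ...` (with MagicMime listed when mime support is built in). Both helpers are hypothetical and take the version text as input rather than running git-annex. nose treats a raised `unittest.SkipTest` as a skip.

```python
import unittest


def annex_build_flags(version_output):
    """Extract build flags from `git annex version` output (assumed format)."""
    for line in version_output.splitlines():
        if line.startswith('build flags:'):
            return set(line[len('build flags:'):].split())
    return set()


def skip_if_annex_lacks(flag, version_output):
    """Raise unittest.SkipTest if the annex build lacks `flag`."""
    if flag not in annex_build_flags(version_output):
        raise unittest.SkipTest("git-annex built without %s" % flag)
```

A test like test_create_text_no_annex could then call `skip_if_annex_lacks('MagicMime', ...)` instead of being skipped on Windows unconditionally.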

kyleam added some commits Aug 24, 2018

TST: Skip test_create_text_no_annex on Windows
As of 2e39ce5(RF: disable some direct-mode failing tests which might
pass now, 2018-08-15), this test isn't skipped on AppVeyor.  It will
fail with

  git-annex: bad annex.largefiles configuration: Parse failure:
  "mimetype" not supported; not built with MagicMime support

This should be addressed elsewhere, but skip this test for now.

Re: #2770 (comment)
TST: Skip test_install_recursive_repeat on Windows
There are a couple of things going on here.  Before 2e39ce5, this
was skipped on AppVeyor as a known direct mode failure.  If it
wasn't, it would have failed with "file used by another process
error".  It's happening in a codepath of rmtemp not covered by
a670ddd (Add explicit call of gc and loop inside rmtemp(),
2015-05-06).

However, even if we avoid the above error by not removing the test
directories, the ok_clean_git call fails.  It seems to stem from the
git annex status calls in annexrepo.get_status.  These repos look
clean on manual inspection.  'git annex status' reports some of the
Git tracked files (like .datalad/config) as modified, but
the file on the system and the one from the Git tree are identical.

Punt on this for now since it isn't directly related to this series.

@kyleam kyleam force-pushed the yarikoptic:bf-direct-mode branch from cb2f170 to 4464bde Aug 25, 2018

Merge branch 'master' into bf-direct-mode
Resolve incompatibility of test_gitrepo.py changes with
ad8ce54 (TST: Plain RF to remove 'import *' madness, 2018-08-07).
@yarikoptic

Member

yarikoptic commented Aug 25, 2018

heh, tests fail since the Germans did a massive RF in ad8ce54 to avoid import * via a flood of from datalad.tests.utils import ... lines, and that somehow only now got to master, so now the test doesn't get its opj and should use op.join. I will merge master into this branch and make it use op.join, but I will also use datalad.support.path rather than os.path, since AFAIK that is what we should generally use now for workarounds for unicode etc.
edit: @kyleam feel free to reset to before the merge and do it another way if you like that
edit2: ah, you are still not asleep, so I will leave it to you then ;-)
edit3: FWIW this is my patch after the merge http://www.onerussian.com/tmp/opjandpath.patch

@kyleam

Member

kyleam commented Aug 25, 2018

and that somehow only now got to master

It was just that all the old code was already adjusted for the rename. I added an opj in this series, and the Travis merge of course didn't take care of that.

I will merge master into

Sorry, pushed around the same time as your message came in.

and make it be used but also will use datalad.support.path not os.path since that is what we should use generally now AFAIK for workarounds for unicodes etc

I just used os.path, so feel free to fix it up how you'd like. But there is a lot of datalad code that uses os.path.

@yarikoptic yarikoptic merged commit 2ebfddd into datalad:master Aug 27, 2018

9 checks passed

codecov/patch 96.49% of diff hit (target 90.15%)
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +6.33% compared to 410f20a
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
datalad-pr-dl-osx-64 DEV build done.
datalad-pr-docker-dl-nd14_04 DEV build done.
datalad-pr-docker-dl-nd16_04 DEV build done.
datalad-pr-docker-dl-nd80 DEV build done.
datalad-pr-docker-dl-nd90 DEV build done.

kyleam added a commit to kyleam/datalad that referenced this pull request Aug 28, 2018

TST: run: Unmark two tests as known direct mode failures
These pass following 98039cc from gh-2770.

Note that gh-2782 reported that test_create_1test_dataset now passes
in direct mode, but this is because the test doesn't fail
consistently, not because it was fixed by gh-2770.

@yarikoptic yarikoptic added this to the Release 0.10.3 milestone Sep 12, 2018

yarikoptic added a commit that referenced this pull request Sep 13, 2018

Merge tag '0.10.3' into debian
0.10.3 (Sep 13, 2018) -- Almost-perfect

This is largely a bugfix release which addressed many (but not yet all)
issues of working with git-annex direct and version 6 modes, and operation
on Windows in general.  Among enhancements you will see the
support of public S3 buckets (even with periods in their names),
ability to configure new providers interactively, and improved `egrep`
search backend.

Although we do not require it with this release, it is recommended to make
sure that you are using a recent `git-annex`, since it also had a variety
of fixes and enhancements in the past months.

Fixes

- Parsing of combined short options has been broken since DataLad
  v0.10.0. ([#2710])
- The `datalad save` instructions shown by `datalad run` for a command
  with a non-zero exit were incorrectly formatted. ([#2692])
- Decompression of zip files (e.g., through `datalad
  add-archive-content`) failed on Python 3.  ([#2702])
- Windows:
  - colored log output was not being processed by colorama.  ([#2707])
  - more codepaths now try multiple times when removing a file to deal
    with latency and locking issues on Windows.  ([#2795])
- Internal git fetch calls have been updated to work around a
  GitPython `BadName` issue.  ([#2712]), ([#2794])
- The progress bar for annex file transferring was unable to handle an
  empty file.  ([#2717])
- `datalad add-readme` halted when no aggregated metadata was found
  rather than displaying a warning.  ([#2731])
- `datalad rerun` failed if `--onto` was specified and the history
  contained no run commits.  ([#2761])
- Processing of a command's results failed on a result record with a
  missing value (e.g., absent field or subfield in metadata).  Now the
  missing value is rendered as "N/A".  ([#2725]).
- A couple of documentation links in the "Delineation from related
  solutions" were misformatted.  ([#2773])
- With the latest git-annex, several known V6 failures are no longer
  an issue.  ([#2777])
- In direct mode, commit changes would often commit annexed content as
  regular Git files.  A new approach fixes this and resolves a good
  number of known failures.  ([#2770])
- The reporting of command results failed if the current working
  directory was removed (e.g., after an unsuccessful `install`). ([#2788])
- When installing into an existing empty directory, `datalad install`
  removed the directory after a failed clone.  ([#2788])
- `datalad run` incorrectly handled inputs and outputs for paths with
  spaces and other characters that require shell escaping.  ([#2798])
- Globbing inputs and outputs for `datalad run` didn't work correctly
  if a subdataset wasn't installed.  ([#2796])
- Minor (in)compatibility with git 2.19 - (no) trailing period
  in an error message now. ([#2815])

Enhancements and new features

- Anonymous access is now supported for S3 and other downloaders.  ([#2708])
- A new interface is available to ease setting up new providers.  ([#2708])
- Metadata: changes to egrep mode search  ([#2735])
  - Queries in egrep mode are now case-sensitive when the query
    contains any uppercase letters and are case-insensitive otherwise.
    The new mode egrepcs can be used to perform a case-sensitive query
    with all lower-case letters.
  - Search can now be limited to a specific key.
  - Multiple queries (list of expressions) are evaluated using AND to
    determine whether something is a hit.
  - A single multi-field query (e.g., `pa*:findme`) is a hit, when any
    matching field matches the query.
  - All matching key/value combinations across all (multi-field)
    queries are reported in the query_matched result field.
  - egrep mode now shows all hits rather than limiting the results to
    the top 20 hits.
- The documentation on how to format commands for `datalad run` has
  been improved.  ([#2703])
- The method for determining the current working directory on Windows
  has been improved.  ([#2707])
- `datalad --version` now simply shows the version without the
  license.  ([#2733])
- `datalad export-archive` learned to export under an existing
  directory via its `--filename` option.  ([#2723])
- `datalad export-to-figshare` now generates the zip archive in the
  root of the dataset unless `--filename` is specified.  ([#2723])
- After importing `datalad.api`, `help(datalad.api)` (or
  `datalad.api?` in IPython) now shows a summary of the available
  DataLad commands.  ([#2728])
- Support for using `datalad` from IPython has been improved.  ([#2722])
- `datalad wtf` now returns structured data and reports the version of
  each extension.  ([#2741])
- The internal handling of gitattributes information has been
  improved.  A user-visible consequence is that `datalad create
  --force` no longer duplicates existing attributes.  ([#2744])
- The "annex" metadata extractor can now be used even when no content
  is present.  ([#2724])
- The `add_url_to_file` method (called by commands like `datalad
  download-url` and `datalad add-archive-content`) learned how to
  display a progress bar.  ([#2738])
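A minimal sketch of two of the egrep-mode rules listed above: "smart case" matching (case-insensitive unless the query contains an uppercase letter) and AND-combination of multiple queries. It matches against a single string using Python's `re`, whose syntax differs from POSIX egrep; the real search matches against metadata fields, and both helpers are hypothetical.

```python
import re


def compile_egrep_query(query):
    """Case-insensitive unless the query contains an uppercase letter."""
    flags = 0 if any(c.isupper() for c in query) else re.IGNORECASE
    return re.compile(query, flags)


def is_hit(text, queries):
    """AND semantics: every query must match somewhere in `text`."""
    return all(compile_egrep_query(q).search(text) for q in queries)
```

For instance, `is_hit('Hello World', ['hello'])` is true, whereas `is_hit('hello world', ['Hello'])` is false because the uppercase letter makes that query case-sensitive.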

* tag '0.10.3': (214 commits)
  Changelog entry for 2.19 git compat fix
  DOC: slight tune ups to the ChangeLog
  ENH: link issue/pull #s in CHANGELOG, use tools/link_issues_CHANGELOG
  BF: remove trailing period while matching a message from git
  DOC: try_multiple_dec: Add description of `duration` parameter
  CLN: annexrepo: Fix grammar in a recently added comment
  TST: auto: Reformat comment from 900ee08
  DOC: _rmtree: Drop a word from summary line
  DOC: Info on IDs (fixes gh-2801)
  BF: diff -- when reraising - just raise, do not raise that instance of exception from new location
  BF(TST): precommit before removing submodule so there is no batched processes
  ENH+BF(TST): close established ssh sockets upon test finish
  BF(TST): One more "close all log handlers in the test"
  CHANGELOG.md: Start adding entries for v0.10.3
  BF(TST): close cookies db in the teardown since atexit is later, so cannot assure no open files
  BF(TST): explicitly close created log handlers
  ENH(TST): @known_failure_windows to replace plain @skip_if_on_windows where there is hope
  BF(TST): do not swallow output while testing AutomagicIO to not cause some open files issue
  ENH(TST): Skip a test when cannot remove current directory
  BF(TST): explicitly precommit a repo used under swallow_outputs
  ...