Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: GitRepo.commit to work correctly on directory paths (Fixes #3087) #3106

Merged
merged 6 commits into from Apr 29, 2019

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Jan 6, 2019

This pull request fixes #3087 through comparing to the git diff for the files provided to the commit call, so should behave properly in case of directories etc.

Changes

  • Needs fixing since breaks the tests
  • Make checks optional for the rev- functionality since it cares not about staged or not and about direct mode, so I guess a plain commit(files) should suffice in that case - probably should better be done in master upon merge
@mih
Copy link
Member

@mih mih commented Jan 7, 2019

Please make the new behavior optional and (ideally) off by default. The -revolution code stack performs this kind of normalization very early on (it is needed in other places too). Doing it unconditionally this late in the game would have no benefit and slow things down in its context.

If this is not desirable, I can overwrite commit() in RevolutionGitRepo, but I wanted to mention it.

It might be worth investigating whether the ls-files call done here should be modified to honor .gitignore setttings too. You could have a look at the specific call done in RevolutionGitRepo.get_content_info() -- it took a few cycles to arrive at this particular setup.

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Jan 7, 2019

Thanks for the info/pointers!
Unless someone gets time before then, I really hope to push on #2926 in a week or two so may be this one would even be superseded ;)

@yarikoptic yarikoptic added this to the Release 0.11.3 milestone Feb 10, 2019
@yarikoptic yarikoptic removed this from the Release 0.11.3 milestone Feb 20, 2019
@yarikoptic yarikoptic added this to the Release 0.11.4 milestone Feb 20, 2019
@yarikoptic yarikoptic removed this from the Release 0.11.4 milestone Feb 20, 2019
@yarikoptic yarikoptic added this to the 0.11.x milestone Feb 20, 2019
yarikoptic added 4 commits Apr 23, 2019
BK: causes datalad.support.tests.test_annexrepo.test_AnnexRepo_status
failure and probably more
…of provided files

This allows to explicitly check for what staged files not to be committed,
thus it should work for directories
@codecov
Copy link

@codecov codecov bot commented Apr 24, 2019

Codecov Report

Merging #3106 into 0.11.x will increase coverage by <.01%.
The diff coverage is 97.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           0.11.x    #3106      +/-   ##
==========================================
+ Coverage   90.84%   90.84%   +<.01%     
==========================================
  Files         252      252              
  Lines       33105    33127      +22     
==========================================
+ Hits        30073    30095      +22     
  Misses       3032     3032
Impacted Files Coverage Δ
datalad/utils.py 87.28% <100%> (+0.09%) ⬆️
datalad/interface/tests/test_save.py 100% <100%> (ø) ⬆️
datalad/support/annexrepo.py 88.98% <100%> (ø) ⬆️
datalad/support/tests/test_gitrepo.py 99.87% <100%> (ø) ⬆️
datalad/support/gitrepo.py 89.1% <92.3%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b12a4cc...db0a844. Read the comment docs.

@yarikoptic yarikoptic changed the base branch from master to 0.11.x Apr 24, 2019
@@ -2624,8 +2609,23 @@ def get_changed_files(self, staged=False, diff_filter='', index_file=None):
opts.append('--diff-filter=%s' % diff_filter)
if index_file:
kwargs['env'] = {'GIT_INDEX_FILE': index_file}
return [normpath(f) # Call normpath to convert separators on Windows.
for f in self.repo.git.diff(*opts, **kwargs).split('\0') if f]
if files:
Copy link
Member Author

@yarikoptic yarikoptic Apr 24, 2019

surprised that test didn't fail (locally) -- I intended to make it if files is not None here and I thought I added explicit test (may be died in intermediate RF), TODO

Copy link
Member Author

@yarikoptic yarikoptic Apr 24, 2019

done now in db0a844

@yarikoptic
Copy link
Member Author

@yarikoptic yarikoptic commented Apr 24, 2019

ok, @kyleam - should be ready for review.
I've done some limited benchmarking -- no notable change in comparison to 0.11.x ;)

@yarikoptic yarikoptic removed this from the 0.11.x milestone Apr 24, 2019
@yarikoptic yarikoptic added this to the Release 0.11.5 milestone Apr 24, 2019
@yarikoptic yarikoptic changed the title BF 3087 BF: GitRepo.commit to work correctly on directory paths (Fixes #3087) Apr 24, 2019
kyleam
kyleam approved these changes Apr 29, 2019
Copy link
Contributor

@kyleam kyleam left a comment

This change is a clear improvement on the current state. (gh-3009 has at least one other issue, but that is unrelated to this change, so I'll deal with that separately.)

Re: disabling this in master: AFAICS rev-save already avoids calling AnnexRepo.commit(), so there is no need to add a mechanism for disabling this.

@kyleam kyleam merged commit db0a844 into datalad:0.11.x Apr 29, 2019
5 checks passed
kyleam added a commit that referenced this issue Apr 29, 2019
As of gh-3009, if path arguments are given, AnnexRepo.commit() uses a
temporary index that excludes any staged changes that are not included
in the specified paths.  It does this to avoid git-annex failing with
"Cannot make a partial commit with unlocked annexed files".  This
logic broke calling 'datalad save' with a subdirectory argument
because the subdirectory was incorrectly counted as a "staged change
not be committed".

Fix this by teaching get_changed_files() how to restrict its output to
the specified paths and then finding the "staged changes not be
committed" by comparing the output of the file-restricted and
non-restricted calls of get_changed_files(staged=True).

Note: rev-save does not go through AnnexRepo.commit(), so there is no
need to add a switch to disable this in master.
kyleam added a commit to kyleam/datalad-container that referenced this issue Apr 29, 2019
extra_inputs was introduced in v0.11.2, but use v0.11.5 because that
will include 514545a46 (datalad/datalad#3106) as a fix for a
datalad-save regression that broke the docker adapter.
@yarikoptic yarikoptic deleted the bf-3087 branch May 7, 2019
yarikoptic added a commit that referenced this issue May 28, 2019
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

 Fixes

- [create-sibling][]  ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a
    web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the
    post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a
    warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env`
  argument had unicode values.  ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory
  that already contains tracked files for another repository failed on
  Git versions before 2.14.  For older Git versions, we now warn the
  caller that the safeguard is not active.  ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing
  changes under a subdirectory when the subdirectory was specified as
  a path argument.  ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to
  do a partial commit with an annex file that has gone below the
  `annex.largefiles` threshold.  The logic of this workaround was
  faulty, leading to files being displayed as typechanged in the index
  following the commit.  ([#3365][])
- The resolve_path() helper confused paths that had a semicolon for
  SSH RIs.  ([#3425][])
- The detection of SSH RIs has been improved.  ([#3425][])

 Enhancements and new features

- The internal command runner was too aggressive in its decision to
  sleep.  ([#3322][])
- The "INFO" label in log messages now retains the default text color
  for the terminal rather than using white, which only worked well for
  terminals with dark backgrounds.  ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit` flag,
  a flag shared by several subcommands.  ([#3340][])
- The authentication logic for [create-sibling-github][] has been
  revamped and now supports 2FA.  ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to
  configure the default backend for progress reporting ("none", for
  example, results in no progress bars being shown).  ([#3396][])
- A new progress backend, available by setting datalad.ui.progressbar
  to "log", replaces progress bars with a log message upon completion
  of an action.  ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and
  the new `datalad.ui.color` configuration option when deciding to
  color output.  The default value, "auto", retains the current
  behavior of coloring output if attached to a TTY ([#3407][]).
- [clean][] now removes annex transfer directories, which is useful
  for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks
  like a URL, making its behavior consistent with `git clone`.
  ([#3425][])
- [wtf][]
  - Learned to fall back to the `dist` package if `platform.dist`,
    which has been removed in the yet-to-be-release Python 3.8, does
    not exist.  ([#3439][])
  - Gained a `--section` option for limiting the output to specific
    sections and a `--decor` option, which currently knows how to
    format the output as GitHub's `<details>` section.  ([#3440][])

* tag '0.11.5': (96 commits)
  [DATALAD RUNCMD] make update-changelog
  Version boost and finalize CHANGELOG.md record
  ENH: new Makefile rule linkissues-changelog to link issues, which now will also be prerequisite for update-changelog
  CHANGELOG.md: Add entries for recently merged PRs
  ENH: require "distro" for python >= 3.8
  ENH: compat with python 3.8 which removed .dist -- try distro
  CLN: wtf: Remove unused (and duplicated) import
  DOC: wtf: Avoid double period in -S's description
  ENH: -D|--decor html_details -- to make it ready for pasting to github issue without clutter
  BF: assure bytes while giving to pyperclip upon its demand (on Py2)
  RF: move always present path + type "section" into "location" section, retain order of sections from cmdline
  RF: switch from nargs="*" to action=append for wtf -S
  ENH: wtf -S to specify which sections to query/display (by default -- all)
  MNT: Avoid invalid escape sequences in strings
  BF: export_to_figshare: Don't test identity of string literal
  BF(TST): do not assume user naiveness - treat any url-like looking path as a path
  BF: Check for /, \ or # in the username@hostname part while detecting SSHRI
  CHANGELOG.md: Add entry for gh-3374
  BF: revert back (remove) check for path being PathRI
  BF: list annex-transfer also in cmdline opt choice for "what"
  ...
yarikoptic added a commit that referenced this issue May 28, 2019
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

 Fixes

- [create-sibling][]  ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a
    web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the
    post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a
    warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env`
  argument had unicode values.  ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory
  that already contains tracked files for another repository failed on
  Git versions before 2.14.  For older Git versions, we now warn the
  caller that the safeguard is not active.  ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing
  changes under a subdirectory when the subdirectory was specified as
  a path argument.  ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to
  do a partial commit with an annex file that has gone below the
  `annex.largefiles` threshold.  The logic of this workaround was
  faulty, leading to files being displayed as typechanged in the index
  following the commit.  ([#3365][])
- The resolve_path() helper confused paths that had a semicolon for
  SSH RIs.  ([#3425][])
- The detection of SSH RIs has been improved.  ([#3425][])

 Enhancements and new features

- The internal command runner was too aggressive in its decision to
  sleep.  ([#3322][])
- The "INFO" label in log messages now retains the default text color
  for the terminal rather than using white, which only worked well for
  terminals with dark backgrounds.  ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit` flag,
  a flag shared by several subcommands.  ([#3340][])
- The authentication logic for [create-sibling-github][] has been
  revamped and now supports 2FA.  ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to
  configure the default backend for progress reporting ("none", for
  example, results in no progress bars being shown).  ([#3396][])
- A new progress backend, available by setting datalad.ui.progressbar
  to "log", replaces progress bars with a log message upon completion
  of an action.  ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and
  the new `datalad.ui.color` configuration option when deciding to
  color output.  The default value, "auto", retains the current
  behavior of coloring output if attached to a TTY ([#3407][]).
- [clean][] now removes annex transfer directories, which is useful
  for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks
  like a URL, making its behavior consistent with `git clone`.
  ([#3425][])
- [wtf][]
  - Learned to fall back to the `dist` package if `platform.dist`,
    which has been removed in the yet-to-be-release Python 3.8, does
    not exist.  ([#3439][])
  - Gained a `--section` option for limiting the output to specific
    sections and a `--decor` option, which currently knows how to
    format the output as GitHub's `<details>` section.  ([#3440][])

* tag '0.11.5':
  BF: make test for url download more reliable in cases where connection fails
  RF: remove stale commented out duecredit in setup.py.  It has now its own section
kyleam added a commit to kyleam/datalad-container that referenced this issue May 28, 2019
extra_inputs was introduced in v0.11.2, but use v0.11.5 because that
will include 514545a46 (datalad/datalad#3106) as a fix for a
datalad-save regression that broke the docker adapter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants