Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: GitRepo -- make use of --pathspec-from-file where possible and chunking needed #6932

Merged
merged 4 commits into from
Aug 20, 2022

Conversation

yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Aug 10, 2022

Chunking is a kludge to workaround the problem of DataLad needing to provide a
long list of files to git/git-annex commands. Whenever git-annex has --batch mode
which we use typically for commands which could get lots of file paths arguments, we
have not used any special means like that for git commands. Many commands support
--pathspec-file-nul and --pathspec-from-file=FILE options to pass a list of pathspec(s)
to operate on via a FILE instead of command line.

In this PR we decide to pass via such a file in cases where more than a single
chunk is necessary.

The main change is to GitRepo.commit which used to resort to sequence of commits with --amend to establish the desired commit.

  • rebase on master (I think branch is really on top of maint, might not matter)
  • establish "thorough testing". Since tests do not operate on heavy lists of files, and not just one but a number of commands make use of pathspec_from_file, I will establish datalad.runtime.prefer_pathspec_from_file: bool which would cause use of --pathspec-from-file regardless on either chunking is needed. Then one of the matrix runs of travis will enable that config setting to get commands covered (probably the one for "bleeding edge everything")

Closes #6922

@yarikoptic yarikoptic added severity-minor a problem which doesn't affect the package's usefulness, and is presumably trivial to fix semver-minor Increment the minor version when merged and removed severity-minor a problem which doesn't affect the package's usefulness, and is presumably trivial to fix labels Aug 10, 2022
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Aug 11, 2022
https://github.com/datalad/datalad/runs/7793747019?check_suite_focus=true
and others in datalad#6932 are now failing all of a sudden!  Smells the like newest
and greatest setuptools added some deprecation
@yarikoptic
Copy link
Member Author

ok, the issue is reported pypa/setuptools#3501 ... so seems to relate to versionner and workarounds seems to be suggested such as https://github.com/shapely/shapely/pull/1475/files from pypa/setuptools#3501 (comment) . I guess I will submit a dedicated PR with that one.

yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Aug 12, 2022
https://github.com/datalad/datalad/runs/7793747019?check_suite_focus=true
and others in datalad#6932 are now failing all of a sudden!  Smells the like newest
and greatest setuptools added some deprecation

one more after presumably addresed in setuptools
https://github.com/pypa/setuptools/releases/tag/v64.0.2
pypa/setuptools#3506
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request Aug 12, 2022
…present

A little more discussion at datalad#6930 and initially
identified while working on datalad#6932 and being
unable to get CommandError.stdout for matching while using call_git (but works
with _call_git using different code path, which does not have this git_ignore_check).

One of the two locations where this function is used has stdout=None, so this
lead to wiping out of e.stdout even if it was present.  I think it was not intentional
but @christian-monch would know better
@codecov
Copy link

codecov bot commented Aug 12, 2022

Codecov Report

Merging #6932 (a2b00af) into master (bacdc8e) will increase coverage by 12.65%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master    #6932       +/-   ##
===========================================
+ Coverage   78.29%   90.95%   +12.65%     
===========================================
  Files         354      354               
  Lines       46259    46276       +17     
===========================================
+ Hits        36219    42090     +5871     
+ Misses      10040     4186     -5854     
Impacted Files Coverage Δ
datalad/interface/common_cfg.py 100.00% <ø> (ø)
datalad/runner/tests/test_gitrunner.py 100.00% <ø> (ø)
datalad/dataset/gitrepo.py 96.83% <100.00%> (+0.01%) ⬆️
datalad/runner/gitrunner.py 93.40% <100.00%> (+2.15%) ⬆️
datalad/support/gitrepo.py 92.24% <100.00%> (+1.89%) ⬆️
datalad/customremotes/tests/test_datalad.py 95.34% <0.00%> (-2.28%) ⬇️
datalad/cli/common_args.py 100.00% <0.00%> (ø)
datalad/tests/test_dochelpers.py 100.00% <0.00%> (ø)
datalad/support/tests/test_network.py 100.00% <0.00%> (ø)
datalad/support/tests/test_locking.py 95.87% <0.00%> (+0.04%) ⬆️
... and 141 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

…hunking needed

Chunking is a kludge to workaround the problem of DataLad needing to provide a
long list of files to git/git-annex commands.  Whenever git-annex has --batch mode
which we use typically for commands which could get lots of file paths arguments, we
have not used any special means like that for git commands.  Many commands support
--pathspec-file-nul and --pathspec-from-file=FILE options to pass a list of pathspec(s)
to operate on via a FILE instead of command line.

In this PR we decide to pass via such a file in cases where more than a single
chunk is necessary.
To guarantee that we do test operation of datalad whenever paths are to
be provided to git commands (we think) already support --pathspec-from-file
option(s).  I added that run to existing runs of Travis which already check
for nonmiltiplexed ssh
… the logs

Now it logs with all the variables not expanded etc, so hard to replicate invocation
locally
@codeclimate
Copy link

codeclimate bot commented Aug 16, 2022

Code Climate has analyzed commit a2b00af and detected 2 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 2

View more on Code Climate.

@yarikoptic yarikoptic requested a review from mih August 17, 2022 18:28
Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense and is a verz useful feature. Thx!

@yarikoptic-gitmate
Copy link
Collaborator

PR released in 0.18.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-minor Increment the minor version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid passing long lists of paths to git in CLI
3 participants