ATM we just pass file paths on the command line, which quite often requires splitting the call into multiple invocations on "chunks" of file paths, and even using --amend on the
previous commit. That can lead to "odd" situations whenever such a "partial commit" is pushed to a sibling (dandi/dandisets#230).
So ideally we should avoid doing that whenever possible. Hence I asked git folks how we could do that.
A full reply to my question
Date: Mon, 8 Aug 2022 14:12:56 +0100
From: Phillip Wood <xxx>
To: Yaroslav Halchenko <xxx>, "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: --batch or some --paths-file for very long lists of paths
Hi Yaroslav
On 04/08/2022 23:59, Yaroslav Halchenko wrote:
> Dear Git Gurus,
>
> In DataLad (https://datalad.org) we are doing "our own" analysis of what
> specific files (not entire directories) should git and git-annex operate
> on. Obviously, in large repositories (and we have with >100k files)
> that might require invoking git add or git diff etc with a long list
> of paths specified in the command line. For that we often split
> invocation into multiple and even resort to git commit --amend to
> combine multiple commits then into a single one.
>
> But I wondered if may be there is already some trick to make such
> commands as status, diff, add, commit to operate on arbitrarily long
> list of paths passed to that git command somehow.
A number of porcelain commands have a --pathspec-from-file option that takes a
file with a list of pathspecs or reads them from stdin. When combined with
--pathspec-file-nul this handles paths containing newline correctly or you can
quote them without this option. You can pass --literal-pathspecs if you have a
list of paths rather than pathspecs.
At the plumbing level you can use "update-index" to add/delete/update paths in
the index which will read paths from stdin and "checkout-index" will also read
paths from stdin.
The diff family do not have any support for --pathspec-from-file at the moment
but I'd be happy to see someone implement it (I think it would be fairly
straight forward).
> Note that gitglossary (at least in 2.35.1 git on debian) says that
>
> Pathspecs are used on the command line of "git ls-files", "git
> ls-tree", "git add", "git grep", "git diff", "git checkout", and many other
> commands ...
>
> but
>
> $> git ls-tree -h | head -n1
> usage: git ls-tree [<options>] <tree-ish> [<path>...]
>
> so it is <path> not the <pathspec> like (why in stderr this time?)
>
> $> git commit -h 2>&1 | head -n1
> usage: git commit [<options>] [--] <pathspec>...
>
> So if in both cases it is pathspec, may be pathspec could support some
> other magical keyword like :(filelist)/tmp/mylonglistofpaths ?
I like that path magic idea, but as we already have --pathspec-from-file I
think we'd be better improving support for that.
Best Wishes
Phillip
> Thanks in advance for your time and thoughts,
So I was pointed to two "options":
- use --pathspec-from-file where possible (see below for the list of commands supporting it, but we might need to check since which versions of git!?)
- and, at the plumbing level, update-index, which can read paths from stdin (ATM we do use it for removal in save_)
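To make the plumbing option concrete, here is a minimal sketch of feeding an arbitrarily long list of paths to git update-index over stdin. The helper names (build_update_index_call, update_index) are hypothetical and are not DataLad API; the only git options used are --add, --force-remove, -z and --stdin. It assumes the paths are relative to the repository root.

```python
import os
import subprocess
from typing import Iterable, List, Tuple


def build_update_index_call(
    paths: Iterable[str], remove: bool = False
) -> Tuple[List[str], bytes]:
    """Return (argv, stdin_payload) for a single `git update-index` call.

    Hypothetical helper for illustration only.
    """
    # --add registers files not yet in the index; --force-remove drops the
    # index entry even if the file still exists in the worktree.
    action = "--force-remove" if remove else "--add"
    # -z switches --stdin to NUL-terminated paths, so names with newlines
    # are safe; --stdin is kept as the last argument.
    argv = ["git", "update-index", action, "-z", "--stdin"]
    payload = b"".join(os.fsencode(p) + b"\0" for p in paths)
    return argv, payload


def update_index(repo: str, paths: Iterable[str], remove: bool = False) -> None:
    """Run one update-index process regardless of how many paths there are."""
    argv, payload = build_update_index_call(paths, remove=remove)
    subprocess.run(argv, cwd=repo, input=payload, check=True)
```

Since the paths travel over stdin, the command-line length limit no longer forces chunking, and a single invocation covers the whole list.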
(git)lena:~/proj/misc/git[master]git
$> git describe
v2.37.1-377-g679aad9e82
$> git grep -e '^--pathspec-from-file=' -- 'Documentation/git*'
Documentation/git-add.txt:--pathspec-from-file=<file>::
Documentation/git-checkout.txt:--pathspec-from-file=<file>::
Documentation/git-commit.txt:--pathspec-from-file=<file>::
Documentation/git-reset.txt:--pathspec-from-file=<file>::
Documentation/git-restore.txt:--pathspec-from-file=<file>::
Documentation/git-rm.txt:--pathspec-from-file=<file>::
Documentation/git-stash.txt:--pathspec-from-file=<file>::
and we should then pair it with --literal-pathspecs
whenever we know that it is a list of PATHs and not pathspecs (a dirty PR was sent as #6921).
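A rough sketch of how the porcelain route could look from Python, under a few assumptions: the helper names (build_pathspec_call, run_with_paths) are made up for illustration and are not DataLad API, the set of supporting subcommands is simply copied from the grep output above (so it reflects that git checkout, not any particular released version), and paths are relative to the repository root.

```python
import os
import subprocess
from typing import Iterable, List, Sequence, Tuple

# Subcommands documenting --pathspec-from-file in the git checkout grepped above.
SUPPORTED = frozenset(
    {"add", "checkout", "commit", "reset", "restore", "rm", "stash"}
)


def build_pathspec_call(
    subcommand: str,
    paths: Iterable[str],
    extra_args: Sequence[str] = (),
    literal: bool = True,
) -> Tuple[List[str], bytes]:
    """Return (argv, stdin_payload); hypothetical helper for illustration."""
    if subcommand not in SUPPORTED:
        raise ValueError(f"git {subcommand} has no --pathspec-from-file")
    argv = ["git"]
    if literal:
        # Global option: treat every entry as a plain path, no pathspec magic.
        argv.append("--literal-pathspecs")
    argv += [subcommand, *extra_args]
    # "-" means "read from stdin"; NUL separation keeps odd file names intact.
    argv += ["--pathspec-from-file=-", "--pathspec-file-nul"]
    payload = b"".join(os.fsencode(p) + b"\0" for p in paths)
    return argv, payload


def run_with_paths(repo: str, subcommand: str, paths: Iterable[str],
                   extra_args: Sequence[str] = ()) -> None:
    """Run the command once for the whole list; skip if there are no paths."""
    argv, payload = build_pathspec_call(subcommand, paths, extra_args)
    if payload:
        subprocess.run(argv, cwd=repo, input=payload, check=True)
```

For example, run_with_paths(repo, "commit", paths, extra_args=["-m", "msg"]) would commit all listed paths in one invocation, which is what removes the need for the chunk-and---amend dance. For stash the pathspec options belong to the push form, so "push" would have to go into extra_args.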