
Avoid passing long lists of paths to git in CLI #6922

Closed
@yarikoptic


ATM we just pass them on the command line, which quite often requires splitting a call into multiple invocations over "chunks" of file paths, and even using --amend on the previous commit to combine the results. That can lead to "odd" situations whenever such a "partial commit" gets pushed to a sibling (dandi/dandisets#230).

So ideally we should avoid doing that whenever possible. Hence I asked the git folks how we could do that.

A full reply to my question
Date: Mon, 8 Aug 2022 14:12:56 +0100                                                                                                                  
From: Phillip Wood <xxx>                                                                                                        
To: Yaroslav Halchenko <xxx>, "git@vger.kernel.org" <git@vger.kernel.org>                                                              
Subject: Re: --batch or some --paths-file for very long lists of paths                                                                                
                                                                                                                                                      
Hi Yaroslav                                                                                                                                           
                                                                                                                                                      
On 04/08/2022 23:59, Yaroslav Halchenko wrote:                                                                                                        
> Dear Git Gurus,                                                                                                                                     
>                                                                                                                                                     
> In DataLad (https://datalad.org) we are doing "our own" analysis of what                                                                            
> specific files (not entire directories) should git and git-annex operate                                                                            
> on.  Obviously, in large repositories (and we have with >100k files)                                                                                
> that might require invoking  git add  or  git diff  etc with a long list                                                                            
> of paths specified in the command line.  For that we often split                                                                                    
> invocation into multiple and even resort to   git commit --amend  to                                                                                
> combine multiple commits then into a single one.                                                                                                    
>                                                                                                                                                     
> But I wondered if may be there is already some trick to make such                                                                                   
> commands as   status, diff, add, commit   to operate on arbitrarily long                                                                            
> list of paths passed to that git command somehow.                                                                                                   
                                                                                                                                                      
A number of porcelain commands have a --pathspec-from-file option that takes a                                                                        
file with a list of pathspecs or reads them from stdin. When combined with                                                                            
--pathspec-file-nul this handles paths containing newline correctly or you can                                                                        
quote them without this option. You can pass --literal-pathspecs if you have a                                                                        
list of paths rather than pathspecs.                                                                                                                  
                                                                                                                                                      
At the plumbing level you can use "update-index" to add/delete/update paths in                                                                        
the index which will read paths from stdin and "checkout-index" will also read                                                                        
paths from stdin.                                                                                                                                     
                                                                                                                                                      
The diff family do not have any support for --pathspec-from-file at the moment                                                                        
but I'd be happy to see someone implement it (I think it would be fairly                                                                              
straight forward).                                                                                                                                    
                                                                                                                                                      
> Note that gitglossary (at least in 2.35.1 git on debian) says that                                                                                  
>                                                                                                                                                     
>       Pathspecs are used on the command line of "git ls-files", "git                                                                                
>      ls-tree", "git add", "git grep", "git diff", "git checkout", and many other                                                                    
>      commands ...                                                                                                                                   
>                                                                                                                                                     
> but                                                                                                                                                 
>                                                                                                                                                     
>       $> git ls-tree -h | head -n1                                                                                                                  
>       usage: git ls-tree [<options>] <tree-ish> [<path>...]                                                                                         
>                                                                                                                                                     
> so it is <path> not the <pathspec> like (why in stderr this time?)                                                                                  
>                                                                                                                                                     
>       $> git commit -h 2>&1 | head -n1                                                                                                              
>       usage: git commit [<options>] [--] <pathspec>...                                                                                              
>                                                                                                                                                     
> So if in both cases it is pathspec, may be pathspec could support some                                                                              
> other magical keyword like :(filelist)/tmp/mylonglistofpaths ?                                                                                      
                                                                                                                                                      
I like that path magic idea, but as we already have --pathspec-from-file I                                                                            
think we'd be better improving support for that.                                                                                                      
                                                                                                                                                      
Best Wishes                                                                                                                                           
                                                                                                                                                      
Phillip                                                                                                                                               
                                                                                                                                                      
> Thanks in advance for your time and thoughts,   

So I was pointed to two "options":

  • use --pathspec-from-file where possible (see below for the list of commands supporting it, but we might need to check since which versions of git!?)
  • and the plumbing-level update-index, which can read paths from stdin (ATM we do use it for removal in save_)
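
For the second option, here is a rough sketch of feeding an arbitrarily long list of paths to `git update-index` over stdin instead of argv. It is only an illustration: the helper names (`update_index_cmd`, `encode_paths`, `update_index`) are made up for this comment and are not existing DataLad API, and it assumes paths are given relative to the directory git is run in.

```python
import os
import subprocess
from typing import Iterable, List


def update_index_cmd(add: bool = False, remove: bool = False) -> List[str]:
    """Build a `git update-index` call that reads NUL-separated paths from stdin."""
    cmd = ["git", "update-index"]
    if add:
        cmd.append("--add")     # allow paths that are not yet in the index
    if remove:
        cmd.append("--remove")  # drop index entries whose files are missing
    # -z switches the stdin separator from LF to NUL; --stdin is kept last
    cmd += ["-z", "--stdin"]
    return cmd


def encode_paths(paths: Iterable[str]) -> bytes:
    """Encode paths as NUL-terminated bytes, safe for names containing newlines."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def update_index(repo: str, paths: Iterable[str],
                 add: bool = False, remove: bool = False) -> None:
    """Hypothetical helper: one git invocation regardless of how many paths."""
    payload = encode_paths(paths)
    if not payload:
        return  # nothing to do, do not call git at all
    subprocess.run(update_index_cmd(add, remove), cwd=repo,
                   input=payload, check=True)
```

Since the list travels over a pipe, the command-line length limit no longer applies, so no chunking (and no --amend) should be needed for the index update itself.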
(git)lena:~/proj/misc/git[master]git
$> git describe
v2.37.1-377-g679aad9e82

$> git grep -e '^--pathspec-from-file=' -- 'Documentation/git*'
Documentation/git-add.txt:--pathspec-from-file=<file>::
Documentation/git-checkout.txt:--pathspec-from-file=<file>::
Documentation/git-commit.txt:--pathspec-from-file=<file>::
Documentation/git-reset.txt:--pathspec-from-file=<file>::
Documentation/git-restore.txt:--pathspec-from-file=<file>::
Documentation/git-rm.txt:--pathspec-from-file=<file>::
Documentation/git-stash.txt:--pathspec-from-file=<file>::

and we should then pair it with --literal-pathspecs whenever we know that we have PATHs and not pathspecs (a dirty PR was sent as #6921).
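
To make the idea concrete, a minimal sketch of how a porcelain command could get its paths via `--pathspec-from-file=-` (stdin) together with `--pathspec-file-nul` and the global `--literal-pathspecs` option. The function names (`pathspec_stdin_cmd`, `nul_joined`, `run_with_paths`) are hypothetical and only for illustration, not what #6921 does, and the sketch assumes a git recent enough to support `--pathspec-from-file` for the given subcommand.

```python
import os
import subprocess
from typing import Iterable, List


def pathspec_stdin_cmd(subcommand: str, *options: str) -> List[str]:
    """Build a git call that reads literal, NUL-separated paths from stdin."""
    return [
        "git",
        "--literal-pathspecs",     # global option: treat input as plain paths
        subcommand,
        *options,
        "--pathspec-from-file=-",  # "-" means: read the list from stdin
        "--pathspec-file-nul",     # NUL-separated, so newlines in names are fine
    ]


def nul_joined(paths: Iterable[str]) -> bytes:
    """Encode paths as a NUL-terminated byte stream."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def run_with_paths(repo: str, subcommand: str, paths: Iterable[str],
                   *options: str) -> None:
    """Hypothetical helper: a single invocation for any number of paths."""
    payload = nul_joined(paths)
    if not payload:
        return  # avoid calling git with an empty path list
    subprocess.run(pathspec_stdin_cmd(subcommand, *options), cwd=repo,
                   input=payload, check=True)
```

With something like this, `run_with_paths(repo, "add", paths)` followed by `run_with_paths(repo, "commit", paths, "-m", msg)` would be two git calls in total, with no chunking and no --amend. It would not help for the diff family, which per the reply above has no --pathspec-from-file yet.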

Labels: enhancement, team-git (Git interface: GitRepo, protocols, helpers, ...; https://github.com/datalad/datalad/issues/6365)
