Expand glob paths on Windows #234

Open
troplin opened this issue Nov 13, 2016 · 33 comments
Labels
icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon. question An issue that is lacking clarity on one or more points.

Comments

@troplin

troplin commented Nov 13, 2016

On Windows, glob expansion is not done by the shell (cmd.exe), but left to the individual program.
That means that something like this currently doesn't work:

>rg PATTERN *.txt
*.txt: The filename, directory name, or volume label syntax is incorrect. (os error 123)
No files were searched, which means ripgrep probably applied a filter you didn't expect. Try running again with --debug.

It tries to open a file called *.txt, which obviously doesn't exist.

@BurntSushi
Owner

What do other command line tools like grep do?

@BurntSushi BurntSushi added the question An issue that is lacking clarity on one or more points. label Nov 13, 2016
@troplin
Author

troplin commented Nov 13, 2016

grep does work as expected for me.
I installed it with cygwin though, so I'm not sure if this is a cygwin or grep feature.

But all native Windows tools I know do the expansion themselves, at least if it makes sense to expand.

@troplin
Author

troplin commented Nov 13, 2016

Usually you just use FindFirstFile/FindNextFile for this.
Not sure if it uses the exact same rules as unix glob.

@BurntSushi
Owner

Sigh. Indeed, it looks like globbing is done as part of the command line program: https://cygwin.com/ml/cygwin/2009-12/msg01097.html

Other instances of the same problem:

I think what this means is that I need to add a glob iterator to globset. Alternatively, we could use the existing iterator in the glob crate, but it doesn't support {a,b} syntax and gets some non-UTF-8 corner cases wrong.

@BurntSushi
Owner

I think what this means is that I need to add a glob iterator to globset. Alternatively, we could use the existing iterator in the glob crate, but it doesn't support {a,b} syntax and gets some non-UTF-8 corner cases wrong.

The other other alternative is to use the standard Windows APIs for this. It seems like the kinds of globs it supports are not as nice as Unix-style globbing, but perhaps that's what Windows users expect, so it could be defensible to use that.

I don't anticipate working on this soon unfortunately.
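For readers weighing the two options above, here is a minimal sketch of program-side expansion using only the Rust standard library. It is not ripgrep's code, and it uses neither `globset` nor the Windows `FindFirstFile`/`FindNextFile` APIs. The function names `wildcard_match` and `expand_arg` are hypothetical. The sketch assumes that only `*` and `?` are supported, that wildcards appear only in the last path component, that matching is case-sensitive (unlike typical Windows behaviour), and that an argument with no matches is passed through unchanged.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Match `name` against a pattern in which `*` matches any run of characters
/// (including none) and `?` matches exactly one character.
fn wildcard_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0, 0);
    // Pattern index of the most recent `*` and the name index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((star_pi, star_ni)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            pi = star_pi + 1;
            ni = star_ni + 1;
            backtrack = Some((star_pi, star_ni + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Expand one command-line argument (hypothetical helper). Only the final
/// path component may contain wildcards. The argument is returned unchanged
/// when it contains no wildcard or when nothing matches.
fn expand_arg(arg: &str) -> Vec<PathBuf> {
    let path = Path::new(arg);
    let leaf = match path.file_name().and_then(|s| s.to_str()) {
        Some(leaf) if leaf.contains('*') || leaf.contains('?') => leaf,
        _ => return vec![PathBuf::from(arg)],
    };
    let parent = path.parent().unwrap_or_else(|| Path::new(""));
    // An empty parent means "current directory" for the purpose of listing.
    let read_from = if parent.as_os_str().is_empty() {
        Path::new(".")
    } else {
        parent
    };
    let mut matches: Vec<PathBuf> = match fs::read_dir(read_from) {
        Ok(entries) => entries
            .filter_map(|entry| entry.ok())
            // Names that are not valid Unicode are skipped in this sketch.
            .filter_map(|entry| entry.file_name().into_string().ok())
            .filter(|name| wildcard_match(leaf, name))
            .map(|name| parent.join(name))
            .collect(),
        Err(_) => Vec::new(),
    };
    matches.sort();
    if matches.is_empty() {
        vec![PathBuf::from(arg)]
    } else {
        matches
    }
}
```

A caller would map `expand_arg` over each positional path argument before searching. Supporting `{a,b}` alternation or non-UTF-8 names, the concerns raised above, would need considerably more than this.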

@BurntSushi BurntSushi added the icebox A feature that is recognized as possibly desirable, but is unlikely to implemented any time soon. label Mar 13, 2017
@troplin
Author

troplin commented Mar 14, 2017

I think using the Windows API is better than nothing and probably easier to implement.
I might give it a try if I find the time.

@BurntSushi
Owner

@troplin Thanks! If you do, I would hope to see the Windows logic put behind a separate crate. :-) (Which could live inside ripgrep, or could be yours to maintain.)

@BurntSushi
Owner

BurntSushi commented Feb 2, 2018

Question: should ripgrep support Unix-style globbing here, or should it use the standard FindFirstFile/FindNextFile APIs, presumably to be consistent with other Windows CLI tools?

cc @retep998 @roblourens

@roblourens
Contributor

Do the Windows APIs support anything besides * and ?? It's not clear.

This doesn't affect vscode scenarios, but as a CLI user, I'd prefer more powerful patterns.

@sabi0

sabi0 commented Sep 26, 2018

How about -g <GLOB> using Unix-style globs (as it does already) and just <GLOB> falling back to Windows API?

Basically, the same separation between ripgrep's own globbing and shell globbing already exists on Linux. One might argue that the Windows CLI is just an "inferior shell", so it would be natural for (emulated) "shell globbing" to work in the shell-native way.

@HerbM

HerbM commented Oct 28, 2018

FYI:
cmd.exe does NOT support anything but * and ? (specifically NOT [a-z] character classes).
PowerShell does support character classes as part of "wildcards" (what it calls globbing).

Current -g switch in RipGrep seems to work fine with ?, *, [class]

It does NOT work without the -g (which is as designed but a bit unexpected for a Windows only user.)

It would be worth printing a Windows-specific warning when the last argument on the command line is *.txt or similar and there is nothing to search, instead of the current warning:

*.txt: The filename, directory name, or volume label syntax is incorrect. (os error 123)

Maybe add,

use '-g FilePattern' for globbing (wildcard file patterns)
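The suggestion above could be sketched roughly as follows. This is an illustration only and not ripgrep's actual error handling: the function name `glob_hint` is hypothetical, and the check for `*` or `?` is an assumed heuristic for "looks like an unexpanded wildcard". The hint wording is taken from the comment above.

```rust
/// Return an extra hint when a path argument that could not be opened looks
/// like an unexpanded wildcard pattern (hypothetical helper).
fn glob_hint(path_arg: &str) -> Option<String> {
    if path_arg.contains('*') || path_arg.contains('?') {
        Some(format!(
            "{}: this looks like a wildcard pattern; \
             use '-g FilePattern' for globbing (wildcard file patterns)",
            path_arg
        ))
    } else {
        None
    }
}
```

The hint would be printed in addition to the OS error, so a user who typed `rg PATTERN *.txt` is pointed at `rg PATTERN -g "*.txt"`.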

@dracan

dracan commented Feb 6, 2020

+1 for @HerbM's suggestion about mentioning the -g flag with the error message. This would have saved me time Googling how to grep against a file pattern, to eventually find this GitHub issue.

I've hit this issue before, but due to being in the middle of something, I've just moved over to searching in vscode instead of using RipGrep. I'm guessing that plenty of other Windows users have hit the same issue. Updating the error message would resolve this and make it immediately obvious how to do what the user was trying to do.

ps. Awesome tool btw! This is the only issue I've hit - other than that, it's amazing! Thanks for the great work!

@BurntSushi
Owner

@dracan Thanks for the feedback! Funny note though: VS Code's search uses ripgrep. :)

@Taverius

Definite +1 for yelling at the operator about -g, at the very least.

Counterpoint though:

findstr PATTERN *.txt

Functions as one would expect with glob-expansion ... as does ag.

I know Windows path expansion is a pain in the behind, but there's a case to be made for functional parity when even findstr manages it :D

@BurntSushi
Owner

Folks, +1 comments are not all that useful. Instead of basically just saying "me too", it would help if Windows users could answer questions I've asked. For example: #234 (comment)

@sabi0

sabi0 commented Feb 13, 2020

So I suppose you didn't like my proposal from #234 (comment) ?

@BurntSushi
Owner

BurntSushi commented Feb 13, 2020

I have two answers to my question so far. Both of them are different. Yours is one of them. More input from others would be great. In particular, I would love to hear how existing CLI utilities work. Do they use standard Unix globbing? Or shell-native globbing? And more importantly, do users actually like that?

@sabi0

sabi0 commented Feb 13, 2020

I believe most utilities use FindFirstFile / FindNextFile. For sure the standard ones like findstr.
But also e.g. pcre2grep and git (see https://github.com/git/git/blob/master/compat/win32/dirent.c).

IMO "consistency" is more important than "liking". So I would vote for using shell native globbing with FindFirstFile / FindNextFile on Windows.
However, as long as a special -g parameter is considered it would be fine IMO to make it behave consistently across all OSes (i.e. use Unix globbing).

@Taverius

findstr has limitations - it supports file globbing but not directory. findstr /s PATTERN path/to/*.file works but findstr /s PATTERN path/*/*.file doesn't.

That would be an acceptable minimum but I would strongly prefer unix parity here so that tools that rely on ripgrep don't need to split code paths for windows.

@pfmoore

pfmoore commented Feb 26, 2020

Question: should ripgrep support Unix-style globbing here, or should it use the standard FindFirstFile/FindNextFile APIs

As a Windows user, IMO just * and ?. Reasons:

  • They are standard for Windows tools. It's what the MS C runtime does when building argv, so command line users will be used to it.
  • They are invalid filename characters, so you don't need to worry about escaping mechanisms (which is good, as backslash is unavailable due to it being the path separator).

I'd argue for globbing in all elements of the path, not just on the leaf element, and supporting ** meaning "one or more levels of subdirectory", but these are nice to have rather than essential, and are not standard for the CRT, so it's entirely justifiable not to offer them.

Unix-style globbing is powerful, and might be a bit more consistent cross-platform, but (a) you'd have to add an escaping mechanism (see above) and (b) not all Unix shells use exactly the same globbing syntax, so it's still not entirely portable. I don't think it's worth it, personally.

@BurntSushi
Owner

BurntSushi commented Feb 26, 2020

@pfmoore Thanks for the feedback! I think the two choices here are "support Windows-style globbing using the corresponding winapi calls" or "support Unix-style globbing." I don't think I'm willing to do some inbetween state.

Also, note that ripgrep already supports Unix-style globbing on Windows with the -g/--glob flag, along with the various gitignore support. This is why the underlying glob library permits disabling backslash as an escape, which is indeed disabled on Windows by default. Moreover, globs already have an additional escaping mechanism built in. e.g., You can use [*] to refer to a literal *.
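The bracket-based escaping mentioned above can be illustrated with a small helper that turns a literal file name into a glob matching only that name, without relying on backslash. This is a sketch: `escape_glob_literal` is a hypothetical name, not part of ripgrep or `globset`. The comment above confirms only the `[*]` form, so treating `?` the same way is an assumption made by analogy, and other metacharacters such as `[` are deliberately left alone.

```rust
/// Wrap each `*` or `?` in a single-character class so that it is matched
/// literally, e.g. `*` becomes `[*]` (hypothetical helper).
fn escape_glob_literal(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for c in name.chars() {
        match c {
            '*' | '?' => {
                out.push('[');
                out.push(c);
                out.push(']');
            }
            // Backslashes pass through untouched, since they are path
            // separators on Windows rather than escapes.
            _ => out.push(c),
        }
    }
    out
}
```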

@SandraEickel

I have filed issue #1667 after browsing open issues which brought me here.

IMHO the best option would be for ripgrep to behave

  • Windows-like when running from CMD or PowerShell
  • UNIX-like when running from Git Bash (MINGW64) or CygWin
    including input and output of path separators, not only globbing mechanisms.

So, this is no inbetween state - it is just checking for the type of environment (shell) and behaving accordingly.

There's no Rust for OS/2, but there we have the same situation, the normal OS/2 CMD (or 4OS2) and a UNIX-like environment with Dash (instead of Bash).
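One way the environment check proposed above might look is sketched below. Everything here is an assumption rather than an established method: the names `ShellStyle` and `guess_shell_style` are hypothetical, and the heuristic (treat a non-empty `MSYSTEM` or `SHELL` variable as a sign of a Unix-like shell such as Git Bash or Cygwin) is unverified and easy to fool, for example when cmd.exe is started from inside a Git Bash session and inherits its variables.

```rust
#[derive(Debug, PartialEq)]
enum ShellStyle {
    Unix,
    Windows,
}

/// Guess the calling environment from environment variables (hypothetical
/// heuristic). `lookup` stands in for `std::env::var` so that the logic can
/// be exercised without touching the real environment.
fn guess_shell_style<F>(lookup: F) -> ShellStyle
where
    F: Fn(&str) -> Option<String>,
{
    let is_set = |key: &str| lookup(key).map_or(false, |value| !value.is_empty());
    // Assumption: MSYS2-based shells set MSYSTEM, and Unix-like shells
    // usually export SHELL. Neither is guaranteed.
    if is_set("MSYSTEM") || is_set("SHELL") {
        ShellStyle::Unix
    } else {
        ShellStyle::Windows
    }
}
```

In a real program the call would be `guess_shell_style(|key| std::env::var(key).ok())`.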

@workingjubilee

workingjubilee commented Jul 4, 2023

I believe ripgrep should not reimplement Unix globbing by default because it creates an incorrect expectation that a responsibility of the shell should be the responsibility of individual programs, demanding everyone repeat the work that the shell is very well capable of doing itself.

@BurntSushi
Owner

That's a fair point, but if end users are hitting this problem in a popular shell and there is no work-around for them, then it makes sense to me for ripgrep to expand the globs. AIUI, this is standard for Windows CLI tools? Although I'm not 100% sure on that. It might also help here to enumerate the shells that fail to expand globs. Is it just cmd.exe or is it PowerShell too?

@pfmoore

pfmoore commented Jul 4, 2023

PowerShell leaves it to the program as well. It's the standard on Windows that globbing is done by the individual program - in fact, the C runtime glob-expands argv automatically, so programs written in C get this behaviour without needing any explicit code for it.

@workingjubilee

Is the argv globbing behavior exposed as a user function to call, at least?

It's slightly disappointing to hear PowerShell is afflicted with this. Globbing may start with a string, yes, but morally it expands to generate a list, the kind of nicely-structured data PowerShell likes. e.g. note this deviation between "classic" shell globbing and POSIX-compliant globbing:

    Empty lists
       The nice and simple rule given above: "expand a wildcard pattern
       into the list of matching pathnames" was the original UNIX
       definition.  It allowed one to have patterns that expand into an
       empty list, as in

           xv -wait 0 *.gif *.jpg

       where perhaps no *.gif files are present (and this is not an
       error).  However, POSIX requires that a wildcard pattern is left
       unchanged when it is syntactically incorrect, or the list of
       matching pathnames is empty.  With bash one can force the
       classical behavior using this command:

           shopt -s nullglob

       (Similar problems occur elsewhere.  For example, where old
       scripts have

           rm `find . -name "*~"`

       new scripts require

           rm -f nosuchfile `find . -name "*~"`

       to avoid error messages from rm called with an empty argument
       list.)

glob(7) man page
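The two behaviours described in the excerpt differ only in what happens when a pattern matches nothing, so any program doing its own expansion has to pick one. A minimal sketch of that choice follows. The names `NoMatch` and `finish_expansion` are hypothetical, and the list of matches is assumed to come from some earlier expansion step.

```rust
/// What to do when a wildcard pattern matches no files.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NoMatch {
    /// POSIX behaviour: pass the pattern through unchanged.
    KeepPattern,
    /// Classic behaviour (bash with `nullglob`): expand to an empty list.
    Empty,
}

/// Apply the chosen policy to the result of expanding `pattern`.
fn finish_expansion(pattern: &str, matches: Vec<String>, policy: NoMatch) -> Vec<String> {
    if !matches.is_empty() {
        return matches;
    }
    match policy {
        NoMatch::KeepPattern => vec![pattern.to_string()],
        NoMatch::Empty => Vec::new(),
    }
}
```

With `KeepPattern`, a program would go on to try opening a file literally named `*.gif` and report an error. With `Empty`, the argument silently disappears.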

@BurntSushi
Owner

BurntSushi commented Jul 4, 2023

@workingjubilee AIUI, you're supposed to use FindFirstFile and FindNextFile. The problem with those, IMO, is that the globbing syntax supported is extremely limited. I think it's basically just ? and *. Better than nothing I suppose. There is discussion above talking about whether to use the Windows native functions or to just do our own globbing.

@pfmoore

in fact, the C runtime glob-expands argv automatically

Is this documented anywhere?

@pfmoore

pfmoore commented Jul 4, 2023

It's slightly disappointing to hear PowerShell is afflicted with this. Globbing may start with a string, yes, but morally it expands to generate a list, the kind of nicely-structured data PowerShell likes.

The thing is, it's not a matter of being "afflicted". It's a design choice - not all arguments are filenames, so why should the shell arbitrarily choose to treat them as such? But regardless of philosophy, it is a reality that Windows shells don't glob arguments, and it's extremely clumsy (probably impossible if you use cmd.exe) to manually glob in the shell so that a program sees a file list. So not globbing in the application is, whether you like it or not, a usability issue for Windows users.

Is this documented anywhere?

I found this. I'm surprised it says it's not the default - maybe MSVC links that in by default and that document is about the C runtime, not the compiler. Or maybe my memory is faulty, and it always needed the extra object file linking in. It's been quite a long while since I programmed in C in earnest. But all of that's very much for C; I don't know how or if it would relate to Rust.

@workingjubilee

The thing is, it's not a matter of being "afflicted". It's a design choice - not all arguments are filenames, so why should the shell arbitrarily choose to treat them as such?

So not globbing... is, whether you like it or not, a usability issue for Windows users.

Choose one!

@pfmoore

pfmoore commented Jul 5, 2023

Choose one!

Personally, I choose having the application do the globbing on operating systems where that's the norm...

@workingjubilee

Please go open an issue in the thousands of other application repos where this problem will remain unsolved by addressing it here, then.

@BurntSushi
Owner

Can we stop this tit-for-tat? This issue tracker is for ripgrep. I appreciate the argument that it would be better if this were solved in shell, but that is not a trump card IMO. It's just one consideration of many.

This issue is open because I generally think this is worth fixing on ripgrep's side, but I also simultaneously am not currently prioritizing it.

@pfmoore

pfmoore commented Jul 5, 2023

Sorry, you're right.


10 participants