Escaping glob patterns #51

Closed
cocowalla opened this issue Aug 10, 2018 · 9 comments

Comments

@cocowalla
Contributor

Is escaping glob patterns supported?

For example, let's say I have a literal path like:

/my*files/more[stuff]/is-there-more?/

Is there a supported mechanism by which *, [, ] and ? can be escaped, so that they are treated as literals instead of glob characters? For example, can I escape characters with a backslash?

/my\*files/more\[stuff\]/is-there-more\?/

@dazinator
Owner

dazinator commented Aug 10, 2018

Escaping isn't currently supported by the parser. However, you should be able to achieve it with the fluent builder:

var glob = new GlobBuilder()
    .PathSeperator()
    .Literal("my*files")
    .PathSeperator()
    .Literal("more[stuff]")
    .PathSeperator()
    .Literal("is-there-more?")
    .PathSeperator()
    .ToGlob();

Supporting it at the parser level is something I'd like to add in the future.

I guess you have to give some thought to the escape sequence. If we went with a backslash, you could not necessarily assume it is always an escape character, as someone might just be matching a path like this: \foo\bar. So the parser would only need to interpret it as an escape character if the following character would not ordinarily be parsed as a literal. I think with that additional check it should work. Sound right to you?

@cocowalla
Contributor Author

cocowalla commented Aug 10, 2018

The fluent builder wouldn't work for my use case, as I need to parse arbitrary patterns (the one I used was just an example).

Escaping with a backslash and only treating it as an escape sequence if followed by a special character sounds generally like the correct approach to me too.

@cocowalla
Contributor Author

cocowalla commented Aug 11, 2018

So, I thought I'd have a crack at this! I thought I was done, and was just adding some more test cases... then I realised that I don't think escaping with a backslash is going to work :(

Consider this pattern (which is likely to be very common):
c:\MyFolder\*.txt

The intent is of course to look for *.txt files within c:\MyFolder, but the asterisk would be treated as a literal, so that's not going to work.

Same deal:

c:\MyFolder\[abc]de

If we use backslash as an escape character, the result would be a literal c:\MyFolder[abc]de.

You might think that at least it's not an issue on Linux, but technically you can have backslashes in filenames (even if it's uncommon).

I'm not sure what a good solution is here. A simple solution is to make the escape character configurable, effectively pushing the problem to devs that use this library, but at least allowing them to choose something that works for their scenario.
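
To make the conflict concrete, here is a minimal Python sketch of the rule discussed above (a backslash only escapes when the next character is a glob special). The names `tokenize` and `has_wildcard` are hypothetical and are not part of DotNet.Glob; the sketch only illustrates why a Windows-style pattern loses its wildcard under that rule.

```python
# Hypothetical illustration only; not the DotNet.Glob parser.
GLOB_SPECIALS = frozenset("*?[]")


def tokenize(pattern, escape="\\"):
    """Split a pattern into ("literal", ch) and ("special", ch) tokens.

    The escape character is consumed only when it is directly followed by a
    glob special; otherwise it is kept as an ordinary literal character.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == escape and nxt in GLOB_SPECIALS:
            tokens.append(("literal", nxt))  # escaped special becomes a literal
            i += 2
        elif ch in GLOB_SPECIALS:
            tokens.append(("special", ch))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens


def has_wildcard(pattern):
    """True if any glob special survives tokenizing."""
    return any(kind == "special" for kind, _ in tokenize(pattern))


# The path separator before '*' is read as an escape, so the wildcard is lost.
print(has_wildcard(r"c:\MyFolder\*.txt"))  # False
# With forward slashes the same pattern keeps its wildcard.
print(has_wildcard("c:/MyFolder/*.txt"))   # True
```

A configurable `escape` argument, as suggested above, moves the decision to the caller but does not remove the ambiguity for whichever character is chosen.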

@dazinator
Owner

dazinator commented Aug 11, 2018

Yeah, it's tricky, thanks for having a go at it.

I was just reading how other glob libraries approach it and came across this https://docs.python.org/3/library/glob.html

It mentions:

For a literal match, wrap the meta-characters in brackets. For example, '[?]' matches the character '?'.

:-) perhaps that's an easier approach
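
For reference, this is how the bracket convention from the Python docs behaves in Python itself. The snippet uses only the standard library (`fnmatch.fnmatchcase` and `glob.escape`); it shows the convention in Python and makes no claim about how DotNet.Glob parses the same patterns.

```python
import fnmatch
import glob

# A meta-character wrapped in brackets matches only itself.
print(fnmatch.fnmatchcase("is-there-more?", "is-there-more[?]"))  # True
print(fnmatch.fnmatchcase("is-there-moreX", "is-there-more[?]"))  # False

# glob.escape wraps '*', '?' and '[' in brackets so the path is matched literally.
escaped = glob.escape("/my*files/more[stuff]/is-there-more?/")
print(escaped)  # /my[*]files/more[[]stuff]/is-there-more[?]/

# The escaped pattern matches the original literal path.
print(fnmatch.fnmatchcase("/my*files/more[stuff]/is-there-more?/", escaped))  # True
```

Because no backslash is involved, Windows-style separators in a pattern are unaffected by this kind of escaping.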

@cocowalla
Contributor Author

Huh, that could be quite a clever solution! I'll give it a try tonight and see how it looks.

dazinator added a commit that referenced this issue Aug 15, 2018
@dazinator
Owner

dazinator commented Aug 24, 2018

@cocowalla

I have done some more investigation on this, and it turns out that no logic changes are necessary for handling escaping. It should already just work. So I ended up removing the escape sequence parsing.

I added these test cases to IsMatch and they all passed:

        [InlineData(@"C:\myergen\[[]a]tor", @"C:\myergen\[a]tor")]
        [InlineData(@"C:\myergen\[[]ator", @"C:\myergen\[ator")]
        [InlineData(@"C:\myergen\[[][]]ator", @"C:\myergen\[]ator")]
        [InlineData(@"C:\myergen[*]ator", @"C:\myergen*ator")]
        [InlineData(@"C:\myergen[*][]]ator", @"C:\myergen*]ator")]
        [InlineData(@"C:\myergen[*]]ator", @"C:\myergen*ator", @"C:\myergen]ator")]
        [InlineData(@"C:\myergen[?]ator", @"C:\myergen?ator")]
        [InlineData(@"/path[\]hatstand", @"/path\hatstand")]
        public void IsMatch(string pattern, params string[] testStrings)

I noticed that in one of the test cases you added, I think you were expecting the following to match:

the pattern C:\myergen[*]]ator to match "C:\myergen*]ator"

This isn't actually how this is currently interpreted. The above pattern is actually still a character list, so it will expect to match any one character in that list, which means

C:\myergen[*]]ator matches either "C:\myergen*ator" or "C:\myergen]ator".

To match "C:\myergen*]ator" you would want to use this pattern:

C:\myergen[*][]]ator

Hopefully that makes sense.

This feature should already work, but as part of the feature branch we just need to extend the README with a section explaining how escaping works. I'll get around to that at some point no doubt.
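
For anyone who needs to escape arbitrary literal text, a small helper that wraps each special character in its own brackets produces patterns of the `[*][]]` form shown above. The sketch below is in Python and is checked against Python's `fnmatch` as a stand-in; `escape_literal` is a hypothetical name, not a DotNet.Glob API, and other glob implementations may parse edge cases such as `[*]]` differently, which is one reason to wrap every character separately.

```python
import fnmatch


def escape_literal(text):
    """Wrap each glob special in its own brackets so it matches literally."""
    return "".join(f"[{ch}]" if ch in "*?[]" else ch for ch in text)


# Each special character gets its own one-character list.
print(escape_literal("myergen*]ator"))  # myergen[*][]]ator
print(escape_literal("[a]tor"))         # [[]a[]]tor

# Check the escaped patterns with Python's matcher (a stand-in only).
for name in ["myergen*]ator", "[a]tor", "is-there-more?"]:
    assert fnmatch.fnmatchcase(name, escape_literal(name))
```

The helper is intentionally conservative: it also wraps a lone `]`, which may not strictly need escaping in every implementation.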

@dazinator
Owner

Negation also passes:

        [InlineData(@"/foo/bar[!!].baz", @"/foo/bar7.baz")] // anything except an exclamation mark after bar
        [InlineData(@"/foo/bar[!]].baz", @"/foo/bar9.baz")] // anything except a ] after bar
        [InlineData(@"/foo/bar[!?].baz", @"/foo/bar7.baz")] // anything except a ? after bar
        [InlineData(@"/foo/bar[![].baz", @"/foo/bar7.baz")] // anything except a [ after bar
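
The same four negated patterns can be cross-checked against Python's `fnmatch`, which uses the same `[!...]` negation syntax. This is only a sanity check in a different implementation, under the assumption that both follow the usual shell convention here; it says nothing about DotNet.Glob internals.

```python
import fnmatch

# (pattern, name that should match, name that should be rejected)
cases = [
    ("/foo/bar[!!].baz", "/foo/bar7.baz", "/foo/bar!.baz"),
    ("/foo/bar[!]].baz", "/foo/bar9.baz", "/foo/bar].baz"),
    ("/foo/bar[!?].baz", "/foo/bar7.baz", "/foo/bar?.baz"),
    ("/foo/bar[![].baz", "/foo/bar7.baz", "/foo/bar[.baz"),
]

for pattern, accepted, rejected in cases:
    # The negated list matches any single character except the one listed.
    assert fnmatch.fnmatchcase(accepted, pattern)
    assert not fnmatch.fnmatchcase(rejected, pattern)
```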

@cocowalla
Contributor Author

The above pattern is actually still a character list, so it will expect to match any one character

^ emphasis mine :) Yeah, I messed that one up!

it turns out that no logic changes are necessary for handling escaping.. It should already just work!

Excellent 👍

@dazinator
Owner

Cool. Well, thanks for the PR, the added tests, and the removal of the unnecessary options. I think this has helped tidy it up a bit. I've merged this to develop, and I'll probably merge to master pretty soon as an incremental release.
