Characters missing from list of allowable path characters #52

Closed
cocowalla opened this issue Aug 10, 2018 · 3 comments

@cocowalla
Contributor

The readme says:

By default, when your glob pattern is parsed, DotNet.Glob will only allow literals which are valid for path / directory names. These are:

Any Letter (A-Z, a-z) or Digit
., , !, #, -, ;, =, @, ~, _, :

Maybe I'm misunderstanding this section, but on Windows, Linux and macOS, lots of other characters are valid in file system paths, such as:

  • Any printable Unicode character, e.g. 你好!
  • { } [ ] ( ) + ; % ? *

Also, on Windows : is not valid.
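To make the platform differences concrete, here is a minimal sketch that reports which characters in a single file *name* (not a full path) a platform rejects. It is not part of DotNet.Glob, and the helper name `invalid_name_chars` is hypothetical. The Windows rule assumes Microsoft's documented reserved characters (`< > : " / \ | ? *` plus ASCII control characters 0-31). The POSIX rule assumes the usual kernel-level restriction of only `/` and NUL. macOS-specific behaviour, Windows reserved names such as `CON`, and trailing dots or spaces are deliberately not modelled. The sketch also shows that on Windows, `?` and `*` are reserved in file names, just like `:`.

```python
# Hypothetical helper, for illustration only: which characters of a single
# file name are rejected on a given platform?
WINDOWS_RESERVED = set('<>:"/\\|?*')


def invalid_name_chars(name: str, platform: str) -> set[str]:
    """Return the set of characters in `name` that `platform` does not allow."""
    if platform == "windows":
        # Reserved punctuation plus ASCII control characters 0-31.
        return {c for c in name if c in WINDOWS_RESERVED or ord(c) < 32}
    if platform == "posix":
        # Typical Linux/POSIX file systems only forbid the separator and NUL.
        return {c for c in name if c in ("/", "\0")}
    raise ValueError(f"unknown platform: {platform!r}")


for name in ["你好!", "{}[]()+;%", "a:b", "?*"]:
    print(repr(name),
          "windows:", sorted(invalid_name_chars(name, "windows")),
          "posix:", sorted(invalid_name_chars(name, "posix")))
```

Because each platform rejects a different set of characters, any single fixed allow-list of "valid path characters" is bound to be wrong for at least one platform.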

@dazinator
Owner

dazinator commented Aug 10, 2018

Yeah, you are right; this needs sorting.

Firstly, the docs should say:

  • Any Letter or Digit

Not just A-Z, a-z or Digit, because the check actually calls Char.IsLetterOrDigit(). So in your example 你好!, the first two characters return true for the IsLetterOrDigit check, so they are currently allowed.

The last character, !, however, is categorised as Unicode punctuation and so fails the current check for a valid literal, but yes, it is valid in a filename.
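The classification described above can be reproduced outside .NET by looking at each character's Unicode general category. This is a minimal sketch, assuming that `Char.IsLetterOrDigit` accepts exactly the letter categories (Lu, Ll, Lt, Lm, Lo) plus decimal digits (Nd), as its documentation describes. The helper name `is_letter_or_digit` is hypothetical, and results can differ at the edges because Python and .NET may ship different Unicode versions.

```python
import unicodedata

# Unicode general categories treated as "letter or digit": the five letter
# categories plus decimal digits (Nd), mirroring .NET's Char.IsLetterOrDigit.
LETTER_OR_DIGIT = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}


def is_letter_or_digit(ch: str) -> bool:
    """Approximate Char.IsLetterOrDigit for a single character."""
    return unicodedata.category(ch) in LETTER_OR_DIGIT


for ch in "你好!":
    # Prints the character, its Unicode category, and whether it passes.
    print(ch, unicodedata.category(ch), is_letter_or_digit(ch))
```

Both CJK characters fall in category Lo (other letter) and pass, while `!` falls in Po (other punctuation) and fails, which matches the behaviour described in the comment.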

However, taken in a wider context, I don't think this "allowed literal characters" limitation is really helping much, and I think I should just remove it. Like you say, the set of allowed characters differs per platform, and that's not something I really want to get into.

This originally evolved because I wanted to identify whether a character was a literal, and I thought that if there was a small subset / array of characters, I could check against it pretty quickly. But the better approach seems to be to parse for literals last, after checking for the other kinds of tokens, and then assume that, since what remains is not any other kind of token, it must be a literal. This way you only need to establish that the next character does not start any other token, rather than that it is in a list of known-good literal characters. The two checks are roughly the same performance-wise.
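The "parse literals last" idea can be sketched as a tokenizer that first tries every special token at the current position and only falls back to a literal when none matches, with no allow-list involved. This is a minimal illustration, not DotNet.Glob's implementation: the `Token` class, the token kind names and the helper `special_token_at` are hypothetical, and real glob syntax has more token kinds (for example `**`) that are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    kind: str  # hypothetical kinds: separator, wildcard, any_char, char_list, literal
    text: str


def special_token_at(pattern: str, i: int) -> Optional[Token]:
    """Return the special (non-literal) token starting at index i, or None."""
    ch = pattern[i]
    if ch in "/\\":
        return Token("separator", ch)
    if ch == "*":
        return Token("wildcard", ch)
    if ch == "?":
        return Token("any_char", ch)
    if ch == "[":
        end = pattern.find("]", i + 1)
        if end != -1:  # an unclosed '[' is not special, so it becomes a literal
            return Token("char_list", pattern[i:end + 1])
    return None


def tokenize(pattern: str) -> list[Token]:
    tokens: list[Token] = []
    i = 0
    while i < len(pattern):
        special = special_token_at(pattern, i)
        if special is not None:
            tokens.append(special)
            i += len(special.text)
            continue
        # Not any other kind of token, so it must be a literal: consume the run
        # of characters up to the next special token. No allow-list is consulted.
        start = i
        while i < len(pattern) and special_token_at(pattern, i) is None:
            i += 1
        tokens.append(Token("literal", pattern[start:i]))
    return tokens


for t in tokenize("docs/你好!%+{x}/*.txt"):
    print(t.kind, repr(t.text))
```

Characters such as `你`, `!`, `%`, `+` and `{` need no special handling: they end up in a literal simply because no other token starts with them, which is the default behaviour proposed below.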

So here is what I think I should do:

  1. Drop the set of allowed literal characters.
  2. Remove the option AllowInvalidPathCharacters.

The default behaviour will then be that a character is assumed to be a literal if it isn't parsed as any other token first, which is how AllowInvalidPathCharacters = true currently behaves.

@cocowalla
Contributor Author

Yep, treating anything that isn't a special character as a literal makes sense to me.

dazinator added a commit that referenced this issue Aug 15, 2018
@dazinator
Owner

Thanks @cocowalla, this will be released in 2.1.0.
