Characters missing from list of allowable path characters #52

cocowalla · 2018-08-10T15:45:28Z

The readme says:

By default, when your glob pattern is parsed, DotNet.Glob will only allow literals which are valid for path / directory names. These are:
Any Letter (A-Z, a-z) or Digit
., , !, #, -, ;, =, @, ~, _, :

Maybe I'm misunderstanding this section, but on Windows, Linux and macOS alike, lots of other characters are valid in file system paths, such as:

Any printable Unicode character, e.g. 你好！
{ } [ ] ( ) + ; % ? *

Also, on Windows the colon (:) is not valid.


dazinator · 2018-08-10T16:50:12Z

Yeah you are right - this needs sorting.

Firstly the doc should say

Any Letter or Digit

rather than "A-Z, a-z or Digit", because the check actually calls Char.IsLetterOrDigit(). So in your example 你好！, the first two characters return true for IsLetterOrDigit and are therefore already allowed.

The last character, ！, however, is categorised as Unicode punctuation, so it fails the current valid-literal check even though it is valid in a filename.
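To make that classification concrete, here is a rough Python approximation of the old check (not DotNet.Glob code). It assumes that .NET's Char.IsLetterOrDigit can be emulated by testing for the Unicode letter categories (Lu, Ll, Lt, Lm, Lo) plus decimal digits (Nd), and it takes the punctuation from the readme list quoted above. The function names are hypothetical.

```python
import unicodedata

# Approximation of .NET's Char.IsLetterOrDigit: true for the Unicode letter
# categories (Lu, Ll, Lt, Lm, Lo) and for decimal digits (Nd).
LETTER_OR_DIGIT_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}

# Punctuation from the readme list quoted in the issue. The second entry in
# that list appears to be a space, so a space is included here (assumption).
README_ALLOWED_PUNCTUATION = set(". !#-;=@~_:")


def is_letter_or_digit(ch: str) -> bool:
    return unicodedata.category(ch) in LETTER_OR_DIGIT_CATEGORIES


def is_allowed_literal(ch: str) -> bool:
    """Hypothetical re-creation of the old allow-list check for literals."""
    return is_letter_or_digit(ch) or ch in README_ALLOWED_PUNCTUATION


for ch in "你好！":
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} allowed={is_allowed_literal(ch)}")
```

This prints `U+4F60 Lo allowed=True`, `U+597D Lo allowed=True` and `U+FF01 Po allowed=False`. Note that the ASCII `!` passes only because it is in the punctuation list, while the fullwidth `！` is a different code point and is rejected.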

However, taken in a wider context, I don't think this "allowed literal characters" limitation is really helping much, and I think I should just remove it. As you say, the set of allowed characters differs per platform, and that's not something I really want to get into.

This originally evolved because I wanted to identify whether a character was a literal, and I thought that if there was a small subset/array of characters I could check it pretty quickly. But the better approach seems to be to parse for literals last, after checking for other kinds of tokens, and then assume that whatever remains must be a literal, since it isn't any other kind of token. This way you only need to establish that the next character does not start any other token, rather than that it is in a list of known-good literal characters. The two checks perform roughly the same.
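A minimal sketch of the "parse literals last" idea, written in Python for illustration (DotNet.Glob itself is C#, and this is not its parser). The token kinds, the `SPECIAL` set and the `tokenize` function are hypothetical; the point is only the ordering: special tokens are tried first, and anything else is consumed as a literal without consulting an allow-list.

```python
from typing import List, Tuple

# Characters assumed to start a non-literal token in this sketch. This is an
# illustrative subset, not DotNet.Glob's actual grammar.
SPECIAL = {"*", "?", "[", "/", "\\"}


def tokenize(pattern: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in ("/", "\\"):
            tokens.append(("separator", ch))
            i += 1
        elif ch == "*":
            # '**' is treated as a multi-directory wildcard, '*' as a single-segment one.
            if pattern.startswith("**", i):
                tokens.append(("wildcard_dirs", "**"))
                i += 2
            else:
                tokens.append(("wildcard", "*"))
                i += 1
        elif ch == "?":
            tokens.append(("any_char", "?"))
            i += 1
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError(f"unterminated '[' at index {i}")
            tokens.append(("char_list", pattern[i + 1:end]))
            i = end + 1
        else:
            # Literal parsed last: consume everything up to the next special character.
            start = i
            while i < len(pattern) and pattern[i] not in SPECIAL:
                i += 1
            tokens.append(("literal", pattern[start:i]))
    return tokens


print(tokenize("docs/你好！{draft}.md"))
print(tokenize("src/**/*.cs"))
```

With this ordering, `docs/你好！{draft}.md` yields a literal `docs`, a separator, and a single literal `你好！{draft}.md`: the fullwidth `！` and the braces are accepted simply because they do not start any other token in this sketch.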

So here is what I think I should do:

Drop the set of allowed literal characters.
Remove the option AllowInvalidPathCharacters.

The default behaviour will then be that a character is assumed to be a literal if it isn't first parsed as any other token, which is how AllowInvalidPathCharacters = true behaves.

cocowalla · 2018-08-10T16:52:47Z

Yep, treating anything that isn't a special character as a literal makes sense to me


dazinator · 2018-08-24T22:05:25Z

Thanks @cocowalla, this will be released in 2.1.0.

dazinator added the enhancement label Aug 10, 2018

dazinator added a commit that referenced this issue Aug 15, 2018

Merge pull request #53 from cocowalla/develop

eaaf212

Fixes for #51 & #52

dazinator closed this as completed Aug 24, 2018

dazinator mentioned this issue Sep 14, 2018

What characters does the default setting allow in a file/folder path/name? #60

Closed
