Description
Please describe the minimum necessary steps to reproduce this issue:
Run this Node.js script:
const nanomatch = require('nanomatch');
const reg = nanomatch.makeRe('é/**/*');
console.log(reg.test('é/foo.txt'));
What is happening (but shouldn't):
Output is false because the RegExp test fails.
What should be happening instead?
Output is true because the RegExp test succeeds.
What's happening
Here is the RegExp produced by nanomatch:
/^(?:(?:\.[\\\/](?=.))?é[\\\/]?\b(?!\.)(?:(?!(?:[\\\/]|^)\.).)*?[\\\/](?!\.)(?=.)[^\\\/]*?(?:[\\\/]|$))$/
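The behavior can also be checked without installing nanomatch by pasting the RegExp above into a script as a literal. This is a minimal sketch that assumes the literal is exactly what makeRe('é/**/*') returned in the reporter's nanomatch version; other versions may generate a different pattern.

```javascript
// Assumption: copied verbatim from the RegExp shown above (nanomatch output for 'é/**/*').
const reg = /^(?:(?:\.[\\\/](?=.))?é[\\\/]?\b(?!\.)(?:(?!(?:[\\\/]|^)\.).)*?[\\\/](?!\.)(?=.)[^\\\/]*?(?:[\\\/]|$))$/;

// Same check as the nanomatch script, no dependency required.
console.log(reg.test('é/foo.txt'));   // false
// One extra directory level makes the same pattern match.
console.log(reg.test('é/a/foo.txt')); // true
```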
The word boundary matcher \b (right after é[\\\/]?) is the culprit. It requires a word boundary at the end of the first part of the glob, é. There are two problems with this matcher:
- According to ECMA-262, \b only treats the ASCII characters [A-Za-z0-9_] as word characters, so é is a non-word character and no boundary is found between é and the following /. One suggested solution is to add the Unicode flag u to the end of the RegExp. This is at best a partial solution, because of the second problem.
- Directory names can end in other non-word characters, such as #. If you replace the é in this example with #, the test fails even with the Unicode flag.
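Both problems come down to which characters JavaScript's \w (and therefore \b) considers word characters. This sketch uses only the built-in RegExp; isWordChar is a hypothetical helper name introduced for illustration.

```javascript
// Hypothetical helper: true if ch is a word character for \w (ASCII [A-Za-z0-9_]).
const isWordChar = (ch) => /\w/.test(ch);

console.log(isWordChar('e')); // true
console.log(isWordChar('é')); // false
console.log(isWordChar('#')); // false

// \b needs a word character on exactly one side of the position.
console.log(/^e\b/.test('e/'));  // true: "e" is a word char, "/" is not
console.log(/^é\b/.test('é/'));  // false: "é" and "/" are both non-word chars
console.log(/^#\b/.test('#/'));  // false: "#" and "/" are both non-word chars
console.log(/^#\b/u.test('#/')); // false: the u flag does not make "#" a word char
```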
Another odd behavior with this RegExp is that the first test here fails but the second test passes:
reg.test('é/foo.txt'); // false
reg.test('é/a/foo.txt'); // true
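The asymmetry follows from where \b sits in the RegExp: right after é[\\\/]?, so it is evaluated either directly after é (separator not consumed) or directly after é/ (separator consumed). The consumed-separator path only succeeds if another / remains for the [\\\/] that follows the ** part. The sketch below uses a hypothetical helper, isBoundaryAt, which relies on a sticky (y) RegExp to test \b at one index only; é is written as \u00e9 so it is a single UTF-16 code unit and the indices are exact.

```javascript
// Hypothetical helper (not part of nanomatch): does \b match at index i of s?
// A sticky RegExp only attempts a match at lastIndex, so the zero-width \b
// is tested at exactly that position.
function isBoundaryAt(s, i) {
  const boundary = /\b/y;
  boundary.lastIndex = i;
  return boundary.test(s);
}

const oneLevel = '\u00e9/foo.txt';    // "é/foo.txt"
const twoLevels = '\u00e9/a/foo.txt'; // "é/a/foo.txt"

// Separator not consumed: \b is tested between "é" and "/", both non-word.
console.log(isBoundaryAt(oneLevel, 1));  // false
// Separator consumed: \b holds between "/" and "f", but no later "/" is left
// for the [\\\/] after the ** part, so this path fails too.
console.log(isBoundaryAt(oneLevel, 2));  // true
// With an extra directory, the consumed-separator path has a "/" to spare.
console.log(isBoundaryAt(twoLevels, 2)); // true
```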
The Unicode flag would be a good addition to un-break certain consumers of this library (see gulpjs/gulp#2153), but given the odd behavior above and the second problem, some further change to the word boundary matcher seems necessary.