
Certain globs ending with "non-word" characters fail to match #18

Open
@pe8ter

Description


Please describe the minimum necessary steps to reproduce this issue:

Run this Node.js script:

const nanomatch = require('nanomatch');
const reg = nanomatch.makeRe('é/**/*');
console.log(reg.test('é/foo.txt'));

What is happening (but shouldn't be):

The script prints false because the generated RegExp does not match é/foo.txt.

What should be happening instead?

The script prints true. Because ** may match zero directories, é/foo.txt should match é/**/*.

What's happening

Here is the RegExp produced by nanomatch:

/^(?:(?:\.[\\\/](?=.))?é[\\\/]?\b(?!\.)(?:(?!(?:[\\\/]|^)\.).)*?[\\\/](?!\.)(?=.)[^\\\/]*?(?:[\\\/]|$))$/
                               **

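The word-boundary assertion \b (starred) is the culprit. It requires a word boundary immediately after é, the first segment of the glob. ECMAScript only finds a boundary where a word character sits on one side and a non-word character (or the start or end of the input) sits on the other. There are two problems with this: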

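  1. According to ECMA-262, \b only matches where a word character ([A-Za-z0-9_]) meets a non-word character. é therefore counts as a non-word character, and there is no boundary between é and the following /. One suggested remedy is to add the Unicode flag u to the RegExp. However, even under u the set of word characters stays ASCII-only (u combined with i adds only a couple of extra characters, and é is not among them). This is at best a partial solution, especially because...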
  2. Directory names can also end in characters such as #, which no flag turns into a word character. If you replace the é in this example with #, the test fails even with the Unicode flag.
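Both points can be checked in isolation. The minimal sketch below asks whether \b finds a boundary right after a directory name that is followed by /. The helper boundaryBeforeSeparator is hypothetical and written only for this illustration (it is not part of nanomatch), and the check runs without the u flag.

```javascript
// Hypothetical helper (not part of nanomatch): reports whether ECMAScript's \b
// sees a word boundary between `name` and a following path separator.
function boundaryBeforeSeparator(name, sep = '/') {
  // The sticky flag (y) forces the match to start exactly at lastIndex,
  // so \b is evaluated only at the position between `name` and `sep`.
  // Note: name.length counts UTF-16 code units, which is fine for these inputs.
  const re = /\b/y;
  re.lastIndex = name.length;
  return re.test(name + sep);
}

for (const name of ['foo', 'v2', 'é', 'café', '#']) {
  console.log(name, boundaryBeforeSeparator(name));
}
```

Names ending in an ASCII letter or digit (foo, v2) report true. Names ending in é or # report false, which is exactly the position the starred \b is checking.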

Another odd behavior of this RegExp is that the first test here fails, but the second test passes:

reg.test('é/foo.txt'); // false
reg.test('é/a/foo.txt'); // true
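The two tests can be reproduced without installing nanomatch by pasting the generated RegExp directly. This is a minimal Node.js sketch, assuming the pattern printed above is exactly what makeRe('é/**/*') returns. The asciiReg variant is an illustration built by swapping é for a, not something nanomatch produced in this report.

```javascript
// RegExp reported above for nanomatch.makeRe('é/**/*'), pasted verbatim
// so the behaviour can be checked without installing nanomatch.
const reg = /^(?:(?:\.[\\\/](?=.))?é[\\\/]?\b(?!\.)(?:(?!(?:[\\\/]|^)\.).)*?[\\\/](?!\.)(?=.)[^\\\/]*?(?:[\\\/]|$))$/;

console.log(reg.test('é/foo.txt'));   // false: no directory between é and foo.txt
console.log(reg.test('é/a/foo.txt')); // true: one directory (a) in between

// For comparison: the same pattern with é swapped for the ASCII letter a.
// (Illustrative only; not produced by nanomatch in this report.)
const asciiReg = new RegExp(reg.source.replace('é', 'a'));
console.log(asciiReg.test('a/foo.txt'));   // true
console.log(asciiReg.test('a/b/foo.txt')); // true
```

With an ASCII first segment, both the zero-directory and one-directory cases match. This isolates the failure to the character that precedes \b.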

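The difference between these two tests comes from the optional separator [\\\/]? just before \b. In é/a/foo.txt, that separator can consume the first /, which leaves \b between / and the word character a, and the rest of the pattern then matches a/foo.txt. In é/foo.txt, that branch fails because no second separator follows. The only other branch puts \b between é and /, where neither side is a word character. The zero-directory case of ** is therefore precisely the one that depends on \b after é.

The Unicode flag may still be a good addition for un-breaking certain consumers of this library (see gulpjs/gulp#2153). However, given the odd behavior above and problem (2), a fix likely needs to reconsider the \b check itself rather than only the RegExp flags.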
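One way to see why \b cannot simply be deleted is to rebuild the reported pattern with different checks after the first segment. In the sketch below, buildGlobstarRe, escapeRegExp and the separatorLookahead variant are hypothetical: they are not nanomatch's API or an endorsed fix. The head and tail strings are copied from the RegExp in this report, and the variants have only been checked against the inputs shown.

```javascript
// Escape regex metacharacters in a literal path segment (common idiom).
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical builder: reassembles the pattern reported for 'é/**/*',
// substituting `afterSegment` for the original `[\\\/]?\b`.
function buildGlobstarRe(segment, afterSegment) {
  // Optional leading "./" or ".\" as in the reported RegExp.
  const head = String.raw`(?:\.[\\\/](?=.))?`;
  // The "**/*" part of the reported RegExp, copied verbatim.
  const tail = String.raw`(?!\.)(?:(?!(?:[\\\/]|^)\.).)*?[\\\/](?!\.)(?=.)[^\\\/]*?(?:[\\\/]|$)`;
  return new RegExp('^(?:' + head + escapeRegExp(segment) + afterSegment + tail + ')$');
}

const variants = {
  original: String.raw`[\\\/]?\b`,                       // as reported
  noBoundary: String.raw`[\\\/]?`,                       // simply drop \b
  separatorLookahead: String.raw`(?:[\\\/]|(?=[\\\/]))`, // hypothetical alternative
};

const inputs = ['é/foo.txt', 'é/a/foo.txt', 'éx/foo.txt'];
for (const [name, after] of Object.entries(variants)) {
  const re = buildGlobstarRe('é', after);
  console.log(name, inputs.map((s) => re.test(s)));
}
```

With these inputs, original rejects é/foo.txt. noBoundary accepts it, but it also accepts éx/foo.txt, which é/**/* should not match. In other words, \b is what stops the lazy zero-directory branch from absorbing extra characters into the first segment. separatorLookahead accepts the first two inputs, rejects éx/foo.txt, and behaves the same with a # segment. Whether it preserves the rest of nanomatch's semantics (dotfiles, Windows separators, other glob shapes) is untested here.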
