@mstoykov
Contributor

No description provided.

@mstoykov
Contributor Author

This was found by a user of a project I worked on and the original regexp is a lot more ... involved and is here.

I would also like the input of @dop251 (sorry about pinging you), as I am using this library through goja. I don't think this should be "fixed" in goja though, but maybe I have missed something.

@mstoykov mstoykov changed the title from "Support \u{HEX} syntax in ECMAScript and Unicode" to "Support \u{HEX} syntax in ECMAScript with Unicode flag" Jun 29, 2022
@dlclark
Owner

dlclark commented Jun 29, 2022

It appears the \x{hex} syntax is widely supported across regex engines. I suggest we change this PR to just add support for it without needing a specific flag. Thoughts?

It’s a syntax error in .NET but I’m fine allowing new syntax as long as existing .NET patterns work in the engine.

@dop251
Contributor

dop251 commented Jun 29, 2022

I'm not sure about other modes, but as far as I'm aware the \u{...} syntax is supported in ECMAScript regardless of any other flags.

(Note for @dlclark, the \u{...} syntax appears to be unique to ECMAScript, others use \x{...})

@mstoykov
Contributor Author

@dlclark I also couldn't find this to be a syntax anywhere else ... but to be honest googling unicode escape sequence doesn't really give good results 🤷

@dop251 I originally saw that it is required in https://exploringjs.com/es6/ch_unicode.html#_where-can-escape-sequences-be-used

And I couldn't really find a good reference in the actual specification. Now that I tried again I found https://tc39.es/ecma262/multipage/additional-ecmascript-features-for-web-browsers.html#_ref_20735 and it seems like the unicode escapes should only work at all when u is added 🤔

@dop251
Contributor

dop251 commented Jun 29, 2022

Yes, you're right, my bad.

@dlclark
Owner

dlclark commented Jun 29, 2022

Ha, in my head I apparently just switched to \x instead of \u. Ignore my previous comment.

The only other thing I’d ask is some documentation in the readme about what the new Unicode option does.

@mstoykov
Contributor Author

I triple checked on Firefox, gjs and Node.js: they all have the behaviour explained in the first link I gave.

Looking at the specification more closely I just now realised that next to one of the lines there is a ? and next to the other a + 🤦

So yes, this is more or less the expected behaviour. There seem to be more cases to be implemented, but this is the one I have hit, so let's have only this for now 😬

> The only other thing I’d ask is some documentation in the readme about what the new Unicode option does.

Will try to come up with something, but it will likely be "enables ECMAScript's unicode mode, has no effect without ECMAScript mode enabled as well"
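For readers unfamiliar with the `\u{HEX}` form discussed in this thread, here is a rough sketch of how such an escape could be decoded into a code point. `decodeBraceEscape` is a hypothetical helper written only for illustration, using the Go standard library; it is not the code merged in this PR and does not reflect regexp2's parser.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// decodeBraceEscape is a hypothetical helper, not regexp2's parser.
// It decodes a leading `\u{HEX}` escape and returns the code point
// plus the number of bytes consumed from s.
func decodeBraceEscape(s string) (rune, int, error) {
	const prefix = `\u{`
	if !strings.HasPrefix(s, prefix) {
		return 0, 0, errors.New(`missing \u{ prefix`)
	}
	end := strings.IndexByte(s, '}')
	if end < 0 {
		return 0, 0, errors.New("missing closing brace")
	}
	digits := s[len(prefix):end]
	// Base 16 with no sign or 0x prefix; an empty string is an error.
	v, err := strconv.ParseUint(digits, 16, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid hex digits %q: %w", digits, err)
	}
	// Reject values beyond the last valid Unicode code point, U+10FFFF.
	if v > utf8.MaxRune {
		return 0, 0, fmt.Errorf("code point U+%X out of range", v)
	}
	return rune(v), end + 1, nil
}

func main() {
	r, n, err := decodeBraceEscape(`\u{1F600} rest`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("U+%04X, %d bytes consumed\n", r, n)
}
```

In ECMAScript this decoding applies only when the `u` flag is set, which is why the PR ties it to a separate Unicode option rather than enabling it unconditionally.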

@dlclark dlclark merged commit 3511044 into dlclark:master Jul 17, 2022
@mstoykov mstoykov deleted the ecmascriptUnicodeEscape branch January 22, 2024 17:10

3 participants