Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ES2018 RegExp lookbehind assertions when available #1708

Open
ExE-Boss opened this issue Jan 15, 2019 · 7 comments
Open

Use ES2018 RegExp lookbehind assertions when available #1708

ExE-Boss opened this issue Jan 15, 2019 · 7 comments

Comments

@ExE-Boss
Copy link
Contributor

ExE-Boss commented Jan 15, 2019

ECMAScript 2018 adds built‑in support for lookbehind assertions to regular expressions, which can be a lot faster than the current pseudo‑lookbehind implemented in Prism.

Also, it’d be nice to use the official syntax and only use the pre‑ES2018 implementation when necessary.

@RunDevelopment
Copy link
Member

Sorry, but this feature is so new that not even FireFox (64.0.2) supports it at this point in time. It will probably take a few years until the majority of browsers support this feature.
So Prism is likely to be stuck with the pseudo-lookbehinds for at least 5 years or so.

Also, I don't think that a polyfill is possible for this as the underlying Regex-engine itself has to be changed for this.
Sure, you can implement one yourself in JS but then we don't have to talk about performance.

@ExE-Boss
Copy link
Contributor Author

I don’t mean implement a polyfill, I mean keep using the current code path for browsers which don’t support lookbehind, but add a new code path for browsers which do support this, which would convert the first ( into (?<= for lookbehind: true during runtime if it’s supported.

@RunDevelopment
Copy link
Member

I was actually also thinking that this might be a good idea at some point. "The browser (and Node) can optimize way better than we ever could and everything would be so much nicer", I was thinking.

But unfortunately, we can't do this because Prism's lookbehinds are not real lookbehinds; they behave differently.

To illustrate this, suppose the following pattern: /(a*)./ which is used to highlight this string: aa.

  • With Prism's lookbehind, we find 1 match which is the second a.
  • With real lookbehinds, we find both as.

The real lookbehind obviously behaves correct but the point is that some patterns actually use this 'incorrect' behavior of Prism's lookbehinds, so we might get different highlighting results depending on the browser (version).

Another problem is that real lookbehinds can even access parts of the string before lastIndex. Prism cannot. This means that greedy matching will behave differently. So we would have to change that implementation as well.

I do think that we should totally go for real Regex lookbehinds as soon as they become available on a majority of browsers. We can then remove the fake Prism lookbehinds and live happily ever after.
But until then the hassle of making Prism work with 2 slightly different lookbehinds is just too much to be worth it.


To be clear:
One can use ECMAScript 2018 lookbehinds in patterns right now of course.

@mAAdhaTTah
Copy link
Member

We have #1578 to track when we can drop IE11 support. We can't use new RegEx features until we've decided that we're going to abandon IE11.

@RunDevelopment
Copy link
Member

@mAAdhaTTah We can't even use it IF we were to abandon IE11. This Regex feature is so new that not even the latest release of FF supports it. Only V8 implements this at the moment.

@joshgoebel
Copy link

AFAIK Safari is the major hold out now on support for look-behind. We don't plan to support it in Highlight.js until Safari has added proper support so that we have the same behavior across all modern browsers.

@RunDevelopment
Copy link
Member

Interesting. Right now, about 75% use a browser that supports lookbehinds. If all Safari browsers added support for them now, we could be at >=90% in maybe a year. Well, that's a big if, so Prism will likely not use lookbehinds for another two years or so.

Also, Prism has to be conservative about this because we heavily use RegExp literals. Browsers that don't support lookbehinds in RegExp literals will throw a SyntaxError at parse time meaning that our whole library won't work instead of just one or two regexes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants