Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed a few cases of exponential backtracking #2268

merged 12 commits into from Apr 16, 2020


Copy link

@RunDevelopment RunDevelopment commented Mar 25, 2020

I'm currently writing a library to construct NFA from JS RegExp, so plugged it into Prism's pattern test to check for exponential backtracking and fixed (almost) all offending patterns. Please note that my test was not great. It only tested for one kind of exponential backtracking and it couldn't handle assertions and lookarounds at all, so there might very well be other patterns that still have exponential backtracking.

Now for the complex part: Textile.
The modifierRegex looks like this: \([^|)]+\)|\[[^\]]+\]|\{[^}]+\} with the important parting being the \([^|)]+\). Many pattern then use the pattern like this: (?:<MOD>|[<>=()])+ to get modifiers and some other characters.
The problem is that these other characters include ( and ), so for a string of the form /(?:\(=\)){100}/ there are 2^100 ways to match it.

The problem here is that I can't just remove the () from the set of other characters because unholy creations like p((. some text are valid.
So, how do we solve this?

(I refactored textile a little anyway.)

Copy link

@Golmote Golmote commented Mar 26, 2020

Nice work!

Regarding Textile, would it help to constrain the modifier regexp more?
Inside the pair of parentheses, there can only be either a classname followed by an optional id or just an id, I believe. So the content could be restricted to [\w-][\w\d-]*(?:#[\w-][\w\d-]*)?|#[\w-][\w\d-] (yes, this is not the exact grammar for identifiers in CSS but it's good enough). Would that help mitigate the backtracking issue?
Similarly, the content inside square brackets could be constrained too but I'm not sure there is the need.

Actually, maybe it's even simpler to just ban [<>=()] from the modifier? \([^|<>=()]+\)|\[[^\]]+\]|\{[^}]+\}

Copy link
Member Author

@RunDevelopment RunDevelopment commented Mar 26, 2020

Ok. I solved it by using a special pattern for parentheses.
The "only classnames and attributes" approach (was my first thought as well) wouldn't have worked because textile basically allows anything in between brackets.

@RunDevelopment RunDevelopment added this to the 1.20.0 milestone Mar 26, 2020
@mAAdhaTTah mAAdhaTTah removed this from the 1.20.0 milestone Apr 6, 2020
@RunDevelopment RunDevelopment merged commit 7a554b5 into PrismJS:master Apr 16, 2020
1 check passed
@RunDevelopment RunDevelopment deleted the backtracking branch Apr 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants