Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Make attribute parsing possessive #12342

Merged
merged 6 commits into from Nov 30, 2018

Conversation

dmsnell
Copy link
Contributor

@dmsnell dmsnell commented Nov 27, 2018

Bug introduced in #11369

Someone discovered high CPU usage due to catastrophic backtracking on
an invalid block comment delimiter. The following input crashed the
parser on the server:

<!-- wp:block {"a":0} / -->

In another case a truncated excerpt crashed the server as well.

<!-- wp:image {"li<br /><br /><strong>This post was too long to display in full. Content has been truncated.</strong>

The optimization introduced in #11369 ended up opening a place for
backtracking that shouldn't be there. In this patch we're grouping
the attribute parsing section of the tokenizing RegExp pattern so
that we can make the group itself possessive so that we abort
any backtracking.

Status

  • needs a test to verify we don't struggle on this input
  • need to verify the JS parser, seems to be fine (was still slow but didn't fail as quickly)
  • need to explain confusing changes in JS parser (JS RegExp has no possessive qualifier)

@dmsnell dmsnell added [Type] Bug An existing feature does not function as intended [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f labels Nov 27, 2018
@dmsnell
Copy link
Contributor Author

dmsnell commented Nov 27, 2018

Looks like this introduced no substantial difference in parsing performance for the test documents.

comparator-ywayevznec now sh_

@dmsnell dmsnell force-pushed the parser/fix-catastrophic-backtracking branch from 65b0e30 to 073f839 Compare November 27, 2018 05:52
@catehstn catehstn added this to the 4.6 milestone Nov 27, 2018
@catehstn
Copy link

Added to the 4.6 milestone, would like to prioritise fixing this given potential effects.

@mtias mtias added the [Type] Regression Related to a regression in the latest release label Nov 27, 2018
@dmsnell dmsnell force-pushed the parser/fix-catastrophic-backtracking branch 2 times, most recently from c7f954a to 910ba2d Compare November 27, 2018 21:07
@mtias mtias modified the milestones: 4.6, 4.8 Nov 29, 2018
@dmsnell dmsnell force-pushed the parser/fix-catastrophic-backtracking branch from 910ba2d to 41e37c7 Compare November 29, 2018 17:45
Bug introduced in #11369

Someone discovered high CPU usage due to catastrophic backtracking on
an invalid block comment delimiter. The following input crashed the
parser on the server:

```html
<!-- wp:block {"a":0} / -->
```

The optimization introduced in #11369 ended up opening a place for
backtracking that shouldn't be there. In this patch we're grouping
the attribute parsing section of the tokenizing RegExp pattern so
that we can make the group itself _possessive_ so that we abort
any backtracking.
@dmsnell dmsnell force-pushed the parser/fix-catastrophic-backtracking branch from 41e37c7 to 49c296c Compare November 30, 2018 18:10
@youknowriad youknowriad modified the milestones: 4.8, 4.6 Nov 30, 2018
Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@youknowriad youknowriad merged commit b650f4a into master Nov 30, 2018
@youknowriad youknowriad deleted the parser/fix-catastrophic-backtracking branch November 30, 2018 20:13
youknowriad pushed a commit that referenced this pull request Nov 30, 2018
* Parser: Make attribute parsing possessive

Bug introduced in #11369

Someone discovered high CPU usage due to catastrophic backtracking on
an invalid block comment delimiter. The following input crashed the
parser on the server:

```html
<!-- wp:block {"a":0} / -->
```

The optimization introduced in #11369 ended up opening a place for
backtracking that shouldn't be there. In this patch we're grouping
the attribute parsing section of the tokenizing RegExp pattern so
that we can make the group itself _possessive_ so that we abort
any backtracking.

* add test and fix broken fix

* really fix default JS parser

* add explanatory comment

* add @SInCE comment with updated target version

* version bumps
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Type] Bug An existing feature does not function as intended [Type] Regression Related to a regression in the latest release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants