Fix quadratic behavior with inline HTML #380
Merged
Repeated starting sequences like `<?`, `<!DECL` or `<![CDATA[` could lead to
quadratic behavior if no matching ending sequence was found.
Separate the inline HTML scanners. Remember if scanning the whole input
for a specific ending sequence failed and skip subsequent scans.
The basic idea is to remove suffixes `>`, `?>` and `]]>` from the respective
regex. Since these regexes are already constructed to match
lazily, they will stop before an ending sequence. To check whether an
ending sequence was found, we can simply test whether the input buffer
is large enough to hold the match plus a potential suffix. If the regex
doesn't find the ending sequence, it will match so many characters that
this test is guaranteed to fail. In this case, we set a flag to avoid
further attempts to execute the regex.
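
The flag logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `scanner` struct, `scan_pi_body`, `match_pi` and `count_scans` are hypothetical names, and a hand-written loop stands in for the lazy regex whose `?>` suffix has been removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner state (names are assumptions, not from the PR). */
typedef struct {
    const char *buf;
    size_t len;
    bool no_pi_end; /* set once a scan for "?>" has run off the end of input */
    size_t scans;   /* number of body scans performed, for illustration only */
} scanner;

/* Stand-in for the lazy regex without its suffix: counts the bytes from
 * pos up to, but not including, the first "?>". If there is no "?>",
 * it consumes everything up to the end of the buffer. */
static size_t scan_pi_body(const scanner *s, size_t pos) {
    assert(pos <= s->len);
    size_t i = pos;
    while (i < s->len) {
        if (s->buf[i] == '?' && i + 1 < s->len && s->buf[i + 1] == '>')
            break;
        i++;
    }
    return i - pos;
}

/* Tries to match "<? ... ?>" at pos. Returns the full match length, or 0. */
static size_t match_pi(scanner *s, size_t pos) {
    if (s->no_pi_end)
        return 0; /* an earlier scan already proved there is no "?>" ahead */
    if (pos + 2 > s->len || s->buf[pos] != '<' || s->buf[pos + 1] != '?')
        return 0;
    size_t body = scan_pi_body(s, pos + 2);
    s->scans++;
    /* Is the buffer large enough to hold prefix + body + suffix? */
    if (pos + 2 + body + 2 > s->len) {
        s->no_pi_end = true; /* remember the failure, skip later scans */
        return 0;
    }
    return 2 + body + 2;
}

/* Walks the input left to right and reports how many body scans ran. */
static size_t count_scans(const char *text) {
    scanner s = { text, strlen(text), false, 0 };
    size_t pos = 0;
    while (pos < s.len) {
        size_t n = match_pi(&s, pos);
        pos += n ? n : 1;
    }
    return s.scans;
}
```

Because the input is processed left to right, a scan that reaches the end of the buffer without finding `?>` proves that no later position can find one either, so an input made of repeated `<?` triggers only one full scan instead of one per occurrence.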
To check which inline HTML regex to use, we inspect the start of the
text buffer. This allows some fixed characters to be removed from the
start of some regexes.
`matchlen` is adjusted with a single addition that accounts for both the
relevant prefix and suffix.
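
As a rough sketch of the dispatch and the `matchlen` adjustment, assuming a simplified scanner: `html_kind`, `kinds`, `scan_until` and `match_inline_html` are hypothetical names, the declaration case ignores the requirement that a letter follows `<!`, and the per-kind failure flags are left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of fixed prefixes and the suffixes that close them. */
typedef struct {
    const char *prefix;
    const char *suffix;
} html_kind;

/* CDATA must be tried before the generic "<!" declaration prefix. */
static const html_kind kinds[] = {
    { "<![CDATA[", "]]>" },
    { "<?", "?>" },
    { "<!", ">" }, /* simplified: the letter after "<!" is not checked */
};

/* Stand-in for a lazy regex without prefix or suffix: returns the number
 * of bytes before the first occurrence of `end`, or n if there is none. */
static size_t scan_until(const char *p, size_t n, const char *end) {
    size_t elen = strlen(end);
    size_t i = 0;
    while (i < n) {
        if (i + elen <= n && memcmp(p + i, end, elen) == 0)
            break;
        i++;
    }
    return i;
}

/* Returns the length of the construct at the start of text, or 0. */
static size_t match_inline_html(const char *text, size_t len) {
    assert(text != NULL);
    for (size_t k = 0; k < sizeof kinds / sizeof kinds[0]; k++) {
        size_t plen = strlen(kinds[k].prefix);
        size_t slen = strlen(kinds[k].suffix);
        /* Inspect the start of the buffer to pick the scanner. */
        if (len < plen || memcmp(text, kinds[k].prefix, plen) != 0)
            continue;
        size_t matchlen = scan_until(text + plen, len - plen, kinds[k].suffix);
        /* A single addition accounts for both prefix and suffix. */
        matchlen += plen + slen;
        /* If no suffix was found, matchlen now exceeds the buffer. */
        return matchlen <= len ? matchlen : 0;
    }
    return 0;
}
```

For example, `match_inline_html("<?php?> x", 9)` covers the seven bytes of `<?php?>`, while an unterminated `<?never closed` yields 0 because the body scan consumed the rest of the buffer and left no room for the suffix.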
Fixes #299.