Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shorthand syntax parser fails silently and delivers raw template #556

Open
moenadic opened this issue Jul 8, 2021 · 4 comments
Open

shorthand syntax parser fails silently and delivers raw template #556

moenadic opened this issue Jul 8, 2021 · 4 comments

Comments

@moenadic
Copy link

moenadic commented Jul 8, 2021

When upgrading a legacy PHP app from 7.2 to 7.4, I encountered an error where the regex engine errored out with preg_last_error() = 2, preg_last_error_msg() = "Backtrack limit exhausted".

I propose the following actions:

  1. Add error handling throwing an Exception according to preg_last_error()!==0. Getting false for "expression did not match" should be treated differently from "we ran into catastrophic backtracking and gave up". In the latter case returning the raw string might do anything, from breaking the site to exposing secrets; and in complete silence, not even a log entry to be found.
  2. Change Patterns::$SPLIT_PATTERN_SHORTHANDSYNTAX and remove the + quantifier from the |\s alternate pattern. Repetition of the outer non-capturing group also matches all the spaces but by disambiguating quantifiers will prevent catastrophic backtracking (at least in all cases I could throw against it).
  3. Add a suitable test case (see below) to the unit tests to try and detect regressions.
    /**
     * Pattern which splits the shorthand syntax into different tokens. The
     * "shorthand syntax" is everything like {...}
     */
    static public $SPLIT_PATTERN_SHORTHANDSYNTAX = '/
		(
			{                                 # Start of shorthand syntax
				(?:                           # Shorthand syntax is either composed of...
					[a-zA-Z0-9\|\->_:=,.()*+\^\/\%] # Various characters including math operations
					|"(?:\\\"|[^"])*"         # Double-quoted strings
					|\'(?:\\\\\'|[^\'])*\'    # Single-quoted strings
					|(?R)                     # Other shorthand syntaxes inside, albeit not in a quoted string
					#|\s+ # Spaces	← ORIGINAL, AMBIGUOUS, EXPLODES
					|\s # Spaces	← DEFER LOOPING TO GROUP, WORKS
				)+
			}                                 # End of shorthand syntax
		)/x';

See in action here on regex101 to debug and test. This sample includes all the test cases I found in the unit tests and correctly recognizes them. The last part with the style tag is a crafted example that reliably provokes catastrophic backtracking for the original regex.

Disclaimer: The above fix should work correctly. So far I have not found any adverse effects on test systems. But since the unit tests only cover part of the problem space, a definitive verdict cannot be given.

@NamelessCoder
Copy link
Member

Very nicely spotted, @moenadic!

This seems like a valid solution and I've not been able to determine any negative effects from it; the original intention of the + in matching whitespace was to allow multiple whitespaces (e.g. when you add line breaks and indents to a long inline expression) - but this is supported equally well without the +.

The unit tests should indeed be extended here, to include examples of inline expressions with multiple spaces and linebreaks/indentations, plus some examples of arbitrary multiple whitespace within expressions. We might also add a unit test that confirms that things like inline style brackets at least won't be detected as Fluid - but we can't do the same for embedded JSON or JS function curlies since the only reason CSS isn't mis-detected is the ; within the block, and this isn't necessarily part of a JS block. Unfortunate, but having a test that asserts we're not detecting CSS as potential inline Fluid syntax would already be a nice improvement.

@cerlestes
Copy link

Great find and solid proposed fix.

I remember running into the same error with the same regex in 2016 when upgrading to PHP 7.0, but I don't remember the exact circumstances. Most likely they were similiar to those experienced by @moenadic . I think we found a workaround at the time, but this was most likely the root cause. Please fix like proposed, as already mentioned the behaviour of the regex shouldn't change at all from removing the duplicate quantity modifier.

@moenadic
Copy link
Author

Would it be possible to fix this in a release soon? Is any more input required?

I would hate to have to have a private fork for this single issue. It is currently blocking our PHP version upgrade, though.

@mbrodala
Copy link
Member

@moenadic You can apply a Composer patch in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants