Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overly complex default regex in SuppressWarnings format #6453

Closed
Vampire opened this issue Feb 22, 2019 · 3 comments

Comments

Projects
None yet
2 participants
@Vampire
Copy link
Contributor

commented Feb 22, 2019

The default regex for SuppressWarnings format is ^$|^\s+$.
I'd suggest to simplify this to just ^\s*$ which should have the same effect.
Actually I would even use ^\s*+$ as little performance tweak.

@romani romani added the approved label Feb 23, 2019

@romani

This comment has been minimized.

Copy link
Member

commented Feb 23, 2019

Please explain why extra performance could be gained with last variant of regexp.

@Vampire

This comment has been minimized.

Copy link
Contributor Author

commented Feb 23, 2019

Assume you have a line with 50 spaces and then an a.

With greedy qualifier like ^\s*$, the regex engine will math the string start, then the 50 spaces, then it sees the a cannot match the end of string anchor. Then it will backtrack and give up one character, so the \s* matches only 49 spaces and then checks whether the now following space matches against the end of string anchor and so on, until \s* matches no character at all and the end of string character still does not allow the regex to complete and the match fails.

With the possessive quantifier backtracking is prevented, so once matched the match is atomic, so with ^\s*+$, the regex engine will math the string start, then the 50 spaces, then it sees the a cannot match the end of string anchor. Then as there is no dynamic part into which you can backtrack the match instantly fails.

Possessive quantifiers are a kind of fail-fast measure if you know that what comes after the quantifier cannot be matched inside the quantifier ever. In this case, the end of string can never occur within a sequence of spaces.

What would not work is ^\s*+\s$, it would fail even on a space-only string, because backtracking would be necessary, so that the \s*+ only matches 49 spaces in the second step and the second \s can match a single space which will not gonna happen due to the possessiveness, there ^\s*\s$ would be necessary..

@romani

This comment has been minimized.

Copy link
Member

commented Feb 23, 2019

Thanks a lot for details.
Default should be ^\s*+$.

Vampire added a commit to Vampire/checkstyle that referenced this issue Mar 7, 2019

@romani romani closed this in #6521 Mar 8, 2019

romani added a commit that referenced this issue Mar 8, 2019

@romani romani added the bug label Mar 8, 2019

@romani romani added this to the 8.19 milestone Mar 8, 2019

Vantuz added a commit to Vantuz/checkstyle that referenced this issue Apr 3, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.