New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[2.1] fixed smiley bottleneck, major performance improvement #2116
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the technical issues:
- Speed up parseSmilies in MessageParser #2113 (comment)
- I wonder how the performance of
preg_replace_callback
is. That way you don't need to loop manually and start the regex engine just once.
//$text = preg_replace('~(?<!&\w{2}|&\w{3}|&\w{4}|&\w{5}|&\w{6}|&#\d{2}|&#\d{3}|&#\d{4}|&#\d{5})'.preg_quote((!$enableHtml ? StringUtil::encodeHTML($code) : $code), '~').'(?![^<]*>)~', $html, $text); | ||
$text = preg_replace('~(?<=^|\s|<li>)'.preg_quote((!$enableHtml ? StringUtil::encodeHTML($code) : $code), '~').'(?=$|\s|</li>'.(!$enableHtml ? '|<br />' : '').')~', $html, $text); | ||
$codes = array_keys($this->smilies); | ||
$pattern = '~(?<=^|\s|<li>)'.str_replace('@', '|', preg_quote((!$enableHtml ? StringUtil::encodeHTML(implode('@', $codes)) : implode('|', $codes)), '~')).'(?=$|\s|</li>'.(!$enableHtml ? '|<br />' : '').')~'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for the str_replace('@', '|')
? This breaks smileys containing an @
sign.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I use a character which is probably not used in any smiley palceholders and doesn't get escaped when calling preg_quote. I don't see a way around it without sacrificing performance. If you got any idea, that would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get escaped when calling preg_quote
This already does not work properly when $enableHtml = true
.
implode('|', array_map('preg_quote', $codes))
would work, if you use the \wcf\system\Regex
class. implode('|', array_map(function ($item) { return preg_quote($item, '~'); }))
would work directly with preg_replace
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which way do you prefer? Calling preg_quote multiple times will probably reduce performance although it makes the replace obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer correct over fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant between your two alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I'd prefer the Regex class. I find the closure pretty ugly for that job. But first: Let @dtdesign take a look at the basic idea behind this PR.
$pattern = '~(?<=^|\s|<li>)'.str_replace('@', '|', preg_quote((!$enableHtml ? StringUtil::encodeHTML(implode('@', $codes)) : implode('|', $codes)), '~')).'(?=$|\s|</li>'.(!$enableHtml ? '|<br />' : '').')~'; | ||
$matches = array(); | ||
preg_match_all($pattern, $text, $matches); | ||
if (isset($matches[0])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (preg_match_all($pattern, $text, $matches))
preg_match_all($pattern, $text, $matches); | ||
if (isset($matches[0])) { | ||
foreach ($matches[0] as $key => $value) { | ||
$value = StringUtil::decodeHTML($value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should check for !$enableHtml
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I thought I keep the original parsing where possible, hence the double en/de-code.
I tried that but failed to get it working. Not really experienced in that kind of work. If you could provide me with a working example I could profile it. |
Regarding #2113 I see that there is no need for it in that case, after all the encode is still being called all the time which doesn't give a noticeable performance improvement. But in this case the improvement is major, 20% (default smileys), up to +1000% (+1k smileys). |
Before investing too much time into this one I'll let @dtdesign take a look. Basically instead of looking up the smileys inside a loop you would look them up inside the callback. Could look like this:
|
Alright. Hmm. I tried that approach, looked exactly like this besides the additional html check and predefined pattern. |
@Slind14 I really do appreciate your effort, but as I said it earlier in #2113, this change is beyond of what I'm comfortable with. I can see that there may be a slight slowdown in some rare cases, but these aren't significant enough to justify such a move. The current code works just fine for almost everyone and this method has proven to be stable and reliable for many years. Again: If you require such a change for your own purposes, go ahead and edit the file right away. This is not going to be merged in the 2.1 branch ever. |
hmm I have it in use on multiple forums with +500k page views, no issues so far. Shouldn't this go at least into the new version then? |
WSC 3 no longer actively uses this method, but still ships it for backward compatibility. The new method works completely different including a single regex that includes all smiley codes. On top of that, the new version parses the smilies during save only, no longer having an impact on render time. |
Ahh thanks for explaining. |
ahh I see you just implemented it :D |
@TimWolla as I was also curious I did a quick test with the preg_replace_callback. It turns out it is +5 times slower. |
I've updated my implementation for WSC 3.0, it will now split the smiley code pattern to enforce a maximum length of ~30k characters per pattern. This is the result of an actual bug report from a user that uses rather long smiley codes and managed to hit PCRE's maximum length of a pattern. Using your code and delivering it with a stability update for WCF 2.1 would have crashed the user's forums in production. Don't get me wrong here, I'm not blaming you for overlooking it, as I wasn't aware of that limit either and running some tests wouldn't have yielded that error. Yet, it might help you to understand why we're extra cautious with changes to the stable version that don't appear to be absolutely required. |
Sure thanks for sharing. |
Disclaimer
Pre
Post
+1k Smileys
Board
Pre-Improvement:
WoltLab Benchmark:
Execution time: 5.1334s (98,34% PHP, 1,66% SQL) | SQL queries: 61 | Memory-Usage: 24,06 MB
XDebug:
1358 different functions called in 5152 milliseconds (1 runs, 23 shown)
Post-Improvement:
WoltLab Benchmark:
Execution time: 2.3657s (98,54% PHP, 1,46% SQL) | SQL queries: 60 | Memory-Usage: 24,06 MB
XDebug:
1334 different functions called in 2384 milliseconds (1 runs, 48 shown)
Thread
Pre-Improvement:
WoltLab Benchmark:
Execution time: 1.0229s (98,9% PHP, 1,09% SQL) | SQL queries: 24 | Memory-Usage: 16,4 MB
XDebug:
1275 different functions called in 1044 milliseconds (1 runs, 63 shown)
Post-Improvement:
WoltLab Benchmark:
Execution time: 0.7464s (97,9% PHP, 2,09% SQL) | SQL queries: 24 | Memory-Usage: 16,27 MB
XDebug:
1272 different functions called in 767 milliseconds (1 runs, 96 shown)
Default Smileys
Board
Pre-Improvement:
WoltLab Benchmark:
Execution time: 0.5504s (88,65% PHP, 11,33% SQL) | SQL queries: 37 | Memory-Usage: 11,61 MB
XDebug:
1162 different functions called in 523 milliseconds (1 runs, 130 shown)
Post-Improvement:
WoltLab Benchmark:
Execution time: 0.4693s (97,42% PHP, 2,56% SQL) | SQL queries: 37 | Memory-Usage: 11,63 MB
XDebug:
1162 different functions called in 472 milliseconds (1 runs, 139 shown)
Thread
Pre-Improvement:
WoltLab Benchmark:
Execution time: 0.3511s (95,5% PHP, 4,47% SQL) | SQL queries: 26 | Memory-Usage: 16,05 MB
XDebug:
1170 different functions called in 368 milliseconds (1 runs, 126 shown)
Post-Improvement:
WoltLab Benchmark:
Execution time: 0.3042s (91,65% PHP, 8,31% SQL) | SQL queries: 26 | Memory-Usage: 16,08 MB
XDebug:
1170 different functions called in 330 milliseconds (1 runs, 146 shown)