Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.1] fixed smiley bottleneck, major performance improvement #2116

Closed
wants to merge 1 commit into from

Conversation

Slind14
Copy link

@Slind14 Slind14 commented Oct 18, 2016

Disclaimer

Please keep in mind that these are production profilings and there are addons installed which decrease the performance drastically already e.g. lexicon. If we calculate these out of the results, the performance improvement is way bigger than it looks like at the beginning.

The profiling for these results happened during rush hour (~load of 5 with 8 logical core CPU). While writing the performance improvement and benchmarking it I got even better results, due to being at 8 AM with a load of 0.8.

Pre
image
Post
image

I'm not a php expert and I'm sure that there is a better way to implement such improvement and even improve it further. The goal is to limit the encodeHTML calls as much as possible as it is by far the most resource hungry part. On top the regex query can be limited to run only once and another time for every smiley found instead of once for every smiley in the database. At this point the current preg_match_all result could be filtered by duplicates to reduce further iterations.

+1k Smileys

Board

Pre-Improvement:

WoltLab Benchmark: Execution time: 5.1334s (98,34% PHP, 1,66% SQL) | SQL queries: 61 | Memory-Usage: 24,06 MB
XDebug: 1358 different functions called in 5152 milliseconds (1 runs, 23 shown)

image

Post-Improvement:

WoltLab Benchmark: Execution time: 2.3657s (98,54% PHP, 1,46% SQL) | SQL queries: 60 | Memory-Usage: 24,06 MB
XDebug: 1334 different functions called in 2384 milliseconds (1 runs, 48 shown)

image

Thread

Pre-Improvement:

WoltLab Benchmark: Execution time: 1.0229s (98,9% PHP, 1,09% SQL) | SQL queries: 24 | Memory-Usage: 16,4 MB
XDebug: 1275 different functions called in 1044 milliseconds (1 runs, 63 shown)

image

Post-Improvement:

WoltLab Benchmark: Execution time: 0.7464s (97,9% PHP, 2,09% SQL) | SQL queries: 24 | Memory-Usage: 16,27 MB
XDebug: 1272 different functions called in 767 milliseconds (1 runs, 96 shown)

image

Default Smileys

Board

Pre-Improvement:

WoltLab Benchmark: Execution time: 0.5504s (88,65% PHP, 11,33% SQL) | SQL queries: 37 | Memory-Usage: 11,61 MB
XDebug: 1162 different functions called in 523 milliseconds (1 runs, 130 shown)

image

Post-Improvement:

WoltLab Benchmark: Execution time: 0.4693s (97,42% PHP, 2,56% SQL) | SQL queries: 37 | Memory-Usage: 11,63 MB
XDebug: 1162 different functions called in 472 milliseconds (1 runs, 139 shown)

image

Thread

Pre-Improvement:

WoltLab Benchmark: Execution time: 0.3511s (95,5% PHP, 4,47% SQL) | SQL queries: 26 | Memory-Usage: 16,05 MB
XDebug: 1170 different functions called in 368 milliseconds (1 runs, 126 shown)

image

Post-Improvement:

WoltLab Benchmark: Execution time: 0.3042s (91,65% PHP, 8,31% SQL) | SQL queries: 26 | Memory-Usage: 16,08 MB
XDebug: 1170 different functions called in 330 milliseconds (1 runs, 146 shown)

image

Slind14 added a commit to Slind14/WCF that referenced this pull request Oct 18, 2016
@Slind14 Slind14 changed the title fixed smiley bottleneck, major performance improvement [2.1] fixed smiley bottleneck, major performance improvement Oct 18, 2016
Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the technical issues:

//$text = preg_replace('~(?<!&\w{2}|&\w{3}|&\w{4}|&\w{5}|&\w{6}|&#\d{2}|&#\d{3}|&#\d{4}|&#\d{5})'.preg_quote((!$enableHtml ? StringUtil::encodeHTML($code) : $code), '~').'(?![^<]*>)~', $html, $text);
$text = preg_replace('~(?<=^|\s|<li>)'.preg_quote((!$enableHtml ? StringUtil::encodeHTML($code) : $code), '~').'(?=$|\s|</li>'.(!$enableHtml ? '|<br />' : '').')~', $html, $text);
$codes = array_keys($this->smilies);
$pattern = '~(?<=^|\s|<li>)'.str_replace('@', '|', preg_quote((!$enableHtml ? StringUtil::encodeHTML(implode('@', $codes)) : implode('|', $codes)), '~')).'(?=$|\s|</li>'.(!$enableHtml ? '|<br />' : '').')~';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for the str_replace('@', '|')? This breaks smileys containing an @ sign.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I use a character which is probably not used in any smiley palceholders and doesn't get escaped when calling preg_quote. I don't see a way around it without sacrificing performance. If you got any idea, that would be nice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get escaped when calling preg_quote

This already does not work properly when $enableHtml = true.

implode('|', array_map('preg_quote', $codes)) would work, if you use the \wcf\system\Regex class. implode('|', array_map(function ($item) { return preg_quote($item, '~'); })) would work directly with preg_replace.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which way do you prefer? Calling preg_quote multiple times will probably reduce performance although it makes the replace obsolete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer correct over fast.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant between your two alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd prefer the Regex class. I find the closure pretty ugly for that job. But first: Let @dtdesign take a look at the basic idea behind this PR.

$pattern = '~(?<=^|\s|<li>)'.str_replace('@', '|', preg_quote((!$enableHtml ? StringUtil::encodeHTML(implode('@', $codes)) : implode('|', $codes)), '~')).'(?=$|\s|</li>'.(!$enableHtml ? '|<br />' : '').')~';
$matches = array();
preg_match_all($pattern, $text, $matches);
if (isset($matches[0])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (preg_match_all($pattern, $text, $matches))

preg_match_all($pattern, $text, $matches);
if (isset($matches[0])) {
foreach ($matches[0] as $key => $value) {
$value = StringUtil::decodeHTML($value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check for !$enableHtml, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought I keep the original parsing where possible, hence the double en/de-code.

@Slind14
Copy link
Author

Slind14 commented Oct 18, 2016

I wonder how the performance of preg_replace_callback is. That way you don't need to loop manually and start the regex engine just once.

I tried that but failed to get it working. Not really experienced in that kind of work. If you could provide me with a working example I could profile it.

@Slind14
Copy link
Author

Slind14 commented Oct 18, 2016

Regarding #2113 I see that there is no need for it in that case, after all the encode is still being called all the time which doesn't give a noticeable performance improvement. But in this case the improvement is major, 20% (default smileys), up to +1000% (+1k smileys).

@TimWolla
Copy link
Member

I tried that but failed to get it working. Not really experienced in that kind of work. If you could provide me with a working example I could profile it.

Before investing too much time into this one I'll let @dtdesign take a look. Basically instead of looking up the smileys inside a loop you would look them up inside the callback. Could look like this:

preg_replace_callback('~yadayada~', function ($matches) {
   return $this->smilies[$matches[0]];
}, $text);

@Slind14
Copy link
Author

Slind14 commented Oct 18, 2016

Alright.

Hmm. I tried that approach, looked exactly like this besides the additional html check and predefined pattern.

@dtdesign
Copy link
Member

@Slind14 I really do appreciate your effort, but as I said it earlier in #2113, this change is beyond of what I'm comfortable with.

I can see that there may be a slight slowdown in some rare cases, but these aren't significant enough to justify such a move. The current code works just fine for almost everyone and this method has proven to be stable and reliable for many years.

Again: If you require such a change for your own purposes, go ahead and edit the file right away. This is not going to be merged in the 2.1 branch ever.

@dtdesign dtdesign closed this Oct 27, 2016
@Slind14
Copy link
Author

Slind14 commented Nov 9, 2016

hmm I have it in use on multiple forums with +500k page views, no issues so far.

Shouldn't this go at least into the new version then?
I know smileys are somewhat cached there, but at some point they still need to go through this process, else you probably would have removed the method or made it deprecated?

@dtdesign
Copy link
Member

dtdesign commented Nov 9, 2016

WSC 3 no longer actively uses this method, but still ships it for backward compatibility. The new method works completely different including a single regex that includes all smiley codes. On top of that, the new version parses the smilies during save only, no longer having an impact on render time.

@Slind14
Copy link
Author

Slind14 commented Nov 9, 2016

Ahh thanks for explaining.

@Slind14
Copy link
Author

Slind14 commented Nov 9, 2016

ahh I see you just implemented it :D

@Slind14
Copy link
Author

Slind14 commented Nov 12, 2016

@TimWolla as I was also curious I did a quick test with the preg_replace_callback. It turns out it is +5 times slower.

@dtdesign
Copy link
Member

I've updated my implementation for WSC 3.0, it will now split the smiley code pattern to enforce a maximum length of ~30k characters per pattern. This is the result of an actual bug report from a user that uses rather long smiley codes and managed to hit PCRE's maximum length of a pattern.

Using your code and delivering it with a stability update for WCF 2.1 would have crashed the user's forums in production.

Don't get me wrong here, I'm not blaming you for overlooking it, as I wasn't aware of that limit either and running some tests wouldn't have yielded that error. Yet, it might help you to understand why we're extra cautious with changes to the stable version that don't appear to be absolutely required.

@Slind14
Copy link
Author

Slind14 commented Nov 18, 2016

Sure thanks for sharing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants