Performance comparison of various elements #1813
Also - I forgot to mention - just look at the memory usage! Not just at the peaks (which, interestingly, don't follow the timings as |
I've added a new test to the script[1] which checks for memory leaks by ensuring we
As you can see, even destroying the DomPDF object does not free up the memory, so the memory leak will persist even if the calling code tidies up nicely. |
Nice tests! I just saw that memory_get_usage() internally triggers a garbage collection, so your values should be correct. But if you don't use memory_get_usage() in a script, just unsetting a variable does not free memory directly. You have to call gc_collect_cycles() to force a garbage collection:
Make sure you have zend.enable_gc enabled in your php.ini, or call gc_enable() to activate the circular reference collector.
To speed up DomPDF, it might make sense to disable the automatic garbage collector in places where nothing can be collected anyway and the collector just wastes time. A nice read:
PHP 7.3 will have some nice improvements in garbage collection:
There is a difference between memory_get_usage() and memory_get_usage(true); maybe the second one gives more insight.
And of course: please use a recent version of PHP that is still supported and has a better garbage collector. It has improved over the years. PHP 5.4 is 6 years old. Please use PHP 7.1 or 7.2, which are the "fully supported" versions today:
DomPDF definitely needs some performance improvements; see the issues labeled "performance":
I totally support you working on performance improvements!
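A minimal sketch of the point about forcing a collection, assuming PHP 7 or later. The Node class and makeCycles() helper are hypothetical names invented for illustration and are not part of DomPDF. It builds objects that reference each other, lets them go out of scope, and then calls gc_collect_cycles() so that the freed memory shows up in memory_get_usage():

```php
<?php
// Hypothetical class, used only to build reference cycles.
class Node
{
    public $peer;
}

// Hypothetical helper: creates $count pairs of objects that point at each other.
function makeCycles(int $count): void
{
    for ($i = 0; $i < $count; $i++) {
        $a = new Node();
        $b = new Node();
        $a->peer = $b;
        $b->peer = $a; // $a and $b now keep each other alive
    }
}

gc_enable();                        // make sure the cycle collector is active
gc_collect_cycles();                // start from a clean state

makeCycles(1000);                   // the pairs are unreachable once this returns
$before    = memory_get_usage();
$collected = gc_collect_cycles();   // force a collection run
$after     = memory_get_usage();

printf(
    "collected: %d, used before: %d bytes, used after: %d bytes\n",
    $collected,
    $before,
    $after
);
```

The pair count is kept small on purpose so that the automatic collector is unlikely to run before the explicit call; with much larger counts it may already have reclaimed part of the memory by the time $before is read.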
Thanks for replying! :-) My understanding about PHP garbage collection is that really you don't need to worry about it. It doesn't really matter when garbage collection happens; it just happens when it needs to. The reason Personally, I'm sceptical that we'll get much of a performance improvement by playing with garbage collection, but I'd be happy to be proved wrong... :-)
Actually, I think this will give less insight, as this reports how much memory PHP has requested from the operating system, which is only loosely coupled with the memory requirements of the script it is running. For example, if your script requires 5k of memory, PHP might decide it is most efficient to request 1MB of memory from the OS, but knowing this doesn't really help to optimise your code at all, and you can't control it. Your point about PHP versions is good. However, I have re-tested on PHP 5.6.18 and the profile is broadly similar. Therefore, I think this is about inefficiencies in the DomPDF code (or its dependencies) rather than anything related to PHP internal optimisations, and therefore that is definitely where I will be focussing my efforts. If you're able to test on PHP 7.2, it might be interesting to see how it compares...
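The difference between the two calls can be seen with a small sketch (the variable names are illustrative only). memory_get_usage() reports the bytes currently used by the script, while memory_get_usage(true) reports what PHP's allocator has reserved from the system:

```php
<?php
$usedBefore = memory_get_usage();       // bytes in use by the script
$realBefore = memory_get_usage(true);   // bytes reserved from the OS by PHP's allocator

$data = str_repeat('x', 5 * 1024);      // roughly 5 KB of script data

$usedAfter = memory_get_usage();
$realAfter = memory_get_usage(true);

printf(
    "used: +%d bytes, reserved: +%d bytes (total reserved: %d bytes)\n",
    $usedAfter - $usedBefore,
    $realAfter - $realBefore,
    $realAfter
);
```

For an allocation this small the reserved figure will often not move at all, which is the loose coupling described above. The exact numbers depend on the PHP build and version.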
Thanks for the pointer - this ticket should probably have that tag added, too. |
Don't make a PR just yet but I am interested in discussing the topic more. There have been some people who have worked to improve performance, but considering some of the outstanding deficiencies in basic HTML and CSS support it hasn't been a priority. Most of the performance notes you've made can be explained by how the code actually works. For example, anything that breaks a line results in additional layout calculation, and breaking within the same frame increases the line box count which requires more tracking within that frame. ... or something like that. I'd really have to sit down and think about these things to provide a truly good, accurate explanation. There are a number of circular references that likely cause heartburn for the GC. I wonder if At one point we had made significantly more effort to clean up object references with destructors in v0.6.2, but that was in relation to a memory leak. FYI, I've noticed a significant performance boost moving to PHP 7.x and encourage you to try setting up a system to test. Or use a project like phpbrew, which is how I manage testing across PHP variants. |
I've found memprof is a really nice tool for profiling scripts for memory leaks. I've tested @MarkMaldaba script on PHP7.2.10. You HAVE to call And a small nitpick: you forgot to Using both small additions I get these:
As you can see the GC of PHP7.2 made a huge leap forward and correctly collects (almost) all formerly claimed memory of DomPDF. |
Hi, I just ran the tests on PHP 7.2.10 on an Ubuntu Server using Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
I'm not sure why sometimes my memory usage is higher than yours, PHP7 should bring some memory savings (10-20%). Maybe it's because I'm using 64bit, and you 32bit PHP/Operating-System?
@MatthiasKuehneEllerhold That looks interesting! I trusted some comments on php.net and thought that memory_get_usage() triggers gc_collect_cycles() internally. But you proved that to be wrong; I should not trust others and should check myself :-( I just did a very quick test, disabling GC at the start of the render() method and re-enabling it at the end:
before:
after:
The memory consumption did not increase, but the speed is 2.5 times better (2.7 seconds instead of 7 seconds)! @bsweeney Maybe you can tell where it makes sense to do this (disable gc and re-enable later) in the code, it should be tested everywhere where hard work is done on lots of objects (loadHtml() maybe?). |
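A rough way to reproduce this kind of comparison outside DomPDF, as a sketch only. buildTree() and timeIt() are hypothetical helpers, and the workload merely imitates "lots of objects with back-references"; it is not the render() code and the timings it prints say nothing about DomPDF itself:

```php
<?php
// Hypothetical workload: many small objects that point back at a shared parent,
// loosely imitating a tree of layout frames. Not DomPDF code.
function buildTree(int $count): void
{
    $root = new stdClass();
    $root->children = [];
    for ($i = 0; $i < $count; $i++) {
        $child = new stdClass();
        $child->parent = $root;       // back-reference creates a cycle
        $root->children[] = $child;
    }
}

// Returns the wall-clock seconds taken by $work.
function timeIt(callable $work): float
{
    $start = microtime(true);
    $work();
    return microtime(true) - $start;
}

$work = function (): void {
    buildTree(20000);
};

gc_enable();
$withGc = timeIt($work);      // collector may run while the tree is being built

gc_disable();
$withoutGc = timeIt($work);   // collector is paused for the whole run
gc_enable();                  // switch it back on afterwards
gc_collect_cycles();          // reclaim the cycles left over from both runs

printf("GC enabled: %.3fs, GC disabled: %.3fs\n", $withGc, $withoutGc);
```

How large the gap is, if any, will depend on the PHP version and on how many live objects the workload holds at once.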
@PHPGangsta Did you add a Composer did something similar see https://news.ycombinator.com/item?id=8686934 . |
@MatthiasKuehneEllerhold No, I did not add it somewhere else; I just did a very quick test of disabling and re-enabling GC inside render(). I trust you when you say that you have to call it manually before memory_get_usage(). I wrote about the Composer installer 5 comments above: they did exactly the same, disabling the GC before some hard work on lots of objects and re-enabling it afterwards. See my comment above, and the 2 links inside to tideways and blackfire:
Oh sorry, my bad. |
Hmmm... I can report a similar saving (better, even -> from ~82s -> ~15s) by adding the gc_collect() lines in, when running the The savings appear to be less good when running the
I stand corrected! It appears that there may be some good performance wins to be got by playing with PHP's garbage collection!
Is there any reason not to add the gc_enable/disable lines suggested by @PHPGangsta into the main codebase? We probably need to do a bit more testing on real-world renderings, I suppose, but from what I can see so far, the speed improvement is substantial whilst the memory use is not much different, which makes it a clear win, in my book.
I started this investigation because a relatively long, but relatively simple HTML file, with very light mark-up (no tables, images, floats, etc.) and very little tag nesting, is taking about 8 hours to render! I will re-test with the gc_enable/disable lines in place, to see what difference it makes. If it matches what we've seen above it will be a massive improvement, but would still be taking about 1.5 hours... in which case, this won't be the end of the optimisation story.
In response to @MatthiasKuehneEllerhold:
No I didn't - that is deliberate. For testing memory leaks, we want to compare |
Update: I've re-run my problem PDF, with the gc_enable/disable lines in place, and execution time has dropped from 8 hours, to 55 minutes! Awesome improvement - we will certainly be deploying this fix. |
FYI: I've just spotted the following on the PHP.net documentation for garbage collection:
Therefore, this strategy isn't wholly risk free, plus we should probably add a call to |
@MarkMaldaba Good catch! I just thought about something else: What happens if gc is already disabled by the user, outside of dompdf? Then we would enable it at the end of render(), which might not be what the user expects... I guess we have to check the status of the gc in the beginning, only disable it if it is not already disabled, and only re-enable it at the end if it was enabled at the beginning of the function. Like this:
I think we agree that we should implement it. Question is: Where in the code, in which functions? Just in render()? And maybe later in other functions? |
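The save-and-restore idea described in this comment could look roughly like the following. This is a sketch under assumptions, not the code that was posted or merged: runWithGcPaused() is a hypothetical wrapper name, and in DomPDF the equivalent logic would sit inside render() rather than around a callable.

```php
<?php
// Hypothetical wrapper: pauses the cycle collector around $work, but only if
// it was enabled to begin with, and restores the caller's setting afterwards.
function runWithGcPaused(callable $work)
{
    $wasEnabled = gc_enabled();   // remember the caller's setting
    if ($wasEnabled) {
        gc_disable();
    }
    try {
        return $work();
    } finally {
        // Runs even if $work throws, so the collector is never left off by accident.
        if ($wasEnabled) {
            gc_enable();
        }
    }
}

// Usage: the callable stands in for the heavy rendering work.
$stateDuringWork = runWithGcPaused(function (): bool {
    return gc_enabled();
});
var_dump($stateDuringWork);   // false: the collector is paused inside the callable
```

Using try/finally is a design choice of this sketch; it keeps the restore step from being skipped when the wrapped work throws an exception.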
Is this something which a Bachelor student at our college could investigate for his Bachelor thesis? Kind regards |
I just ran the performance test "break_tag" with PHP 7.3.0 RC4 without disabling the GC:
As you can see, PHP 7.3 fixed the GC-performance-hit, it is even faster than PHP 7.2 with disabled GC! Summary:
If DomPDF disables the GC in the render() function, it improves performance of PHP <= 7.2. For 7.3 it's not needed anymore to disable the GC. @MarkMaldaba Could you please test your big document that takes hours with PHP 7.3? It should be faster than 55 minutes with 7.3. |
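If the workaround were limited to the PHP versions that benefit from it, the check could be as small as the sketch below. shouldPauseGc() is a hypothetical helper, and the 7.3 cut-off is taken only from the benchmark reported in this thread, so treat it as an assumption rather than a documented threshold.

```php
<?php
// Hypothetical helper: true when the running PHP predates 7.3, where the
// thread's benchmark suggests pausing the collector pays off.
function shouldPauseGc(int $versionId = PHP_VERSION_ID): bool
{
    return $versionId < 70300;   // PHP_VERSION_ID is 70300 for PHP 7.3.0
}

printf(
    "PHP %s: %s\n",
    PHP_VERSION,
    shouldPauseGc() ? 'pause GC during render' : 'leave GC alone'
);
```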
@eothein it certainly seems to be the case that this might be a good area for investigation. I don't know whether the release of PHP 7.3 changes your thoughts on that, since it appears that GC has significantly improved. Your student could independently study pre-7.3 GC methods and analyze for efficiency in a white box environment. Or perform a comparison between the methods used in the various PHP versions as compared with other research on the topic. PHP is certainly not the only managed memory programming environment.
@PHPGangsta this is great news. I've been pretty happy with where PHP has been moving in its 7.x releases (though tbh I'm not able to keep up with the latest developments since my day-job is now in other environments). |
I have written a test script to check performance of various constructs, when rendered by DomPDF.
I have committed it to my local fork of DomPDF: MarkMaldaba@95b59f2
I don't know if this is something you would want to merge into the main repository - if so, let me know and I will create a PR. I'll be happy to make changes, e.g. to coding style, if guidance is provided.
The script takes a simple HTML snippet (e.g. <div></div>) and creates an HTML file whose body contains many instances of this snippet (10,000 in my tests). It then renders it, outputting timing and memory usage information.
It has thrown up some interesting and (to me, at least) surprising results:
So, <span> tags are 5 times faster than <div> tags, which are nearly 5 times faster than <br> tags!
Initial take-home seems to be to avoid <br> tags, but some more research needed. I suggest this ticket be used for general discussion around this issue, with any specific actionable items being spun off as separate tickets.
(For reference, the tests were run using the current 'develop' branch using PHP 5.4.45 on Windows.)
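For readers without access to the linked commit, the approach described above (repeat a snippet many times, render, report time and memory) can be sketched as follows. This is not the committed script: buildDocument() and benchmark() are hypothetical names, and the renderer is passed in as a callable so that the sketch runs without DomPDF installed.

```php
<?php
// Hypothetical helper: wraps $count copies of $snippet in a minimal HTML page.
function buildDocument(string $snippet, int $count): string
{
    return '<html><body>' . str_repeat($snippet, $count) . '</body></html>';
}

// Hypothetical helper: times $render on the generated page and reports memory.
function benchmark(string $snippet, int $count, callable $render): array
{
    $html      = buildDocument($snippet, $count);
    $memBefore = memory_get_usage();
    $start     = microtime(true);

    $render($html);

    return [
        'seconds'        => microtime(true) - $start,
        'peak_bytes'     => memory_get_peak_usage(),
        'retained_bytes' => memory_get_usage() - $memBefore, // still held after rendering
    ];
}

// Stand-in renderer so the sketch is self-contained; it only measures the string.
$fakeRender = function (string $html): void {
    strlen($html);
};

foreach (['<span></span>', '<div></div>', '<br>'] as $snippet) {
    $result = benchmark($snippet, 10000, $fakeRender);
    printf(
        "%-14s %.4fs, peak %d bytes, retained %d bytes\n",
        $snippet,
        $result['seconds'],
        $result['peak_bytes'],
        $result['retained_bytes']
    );
}
```

To measure DomPDF itself, the stand-in would be replaced by a closure that creates a Dompdf instance and calls loadHtml() and render() on it. Note that memory_get_peak_usage() reports the peak for the whole process, so per-snippet peaks are only comparable when each snippet is run in a separate process.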