Autoloading speedup #1529

Closed
dlsniper opened this Issue Jan 26, 2013 · 28 comments

@dlsniper

Hello,

I found this article and thought I should open an issue here to have it discussed and see whether it could help.

@Seldaek
Member
Seldaek commented Feb 14, 2013

It's hard to do because a single namespace prefix can be mapped to two dirs. In that case the symlinks wouldn't work. Then there is the added complexity of symlink management and all. It's not impossible, but it'll most likely introduce more bugs than it'll speed things up. Unless it's proven to be faster than the current optimized mode, I don't think it's worth adding to Composer.

@Seldaek Seldaek closed this Feb 14, 2013
@igorw
Contributor
igorw commented Feb 14, 2013

Would it be possible to make one symlink per file while keeping the performance characteristics? It seems quite easy to do.

ping @patrickallaert

@stof
Contributor
stof commented Feb 14, 2013

@igorw but you would then have to recreate the symlinks each time you create or remove a file. Not better than classmap generation (and more complex)
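
For reference, the one-symlink-per-file idea could be sketched roughly as below. This is only an illustration under assumptions: the directory names (`src1`, `src2`, `psr-0`) and the class files are made up, and nothing here is generated by Composer. It shows that two directories mapped to the same prefix can be merged when directories stay real and only files are linked, and also why the tree has to be rebuilt whenever a file is added or removed.

```shell
#!/bin/sh
# Sketch: one symlink per file, so two directories mapped to the same
# namespace prefix can be merged into a single tree. Paths are hypothetical.
set -eu

work=$(mktemp -d)

# Two roots that both provide classes under A\B.
mkdir -p "$work/src1/A/B" "$work/src2/A/B"
touch "$work/src1/A/B/Foo.php" "$work/src2/A/B/Bar.php"

tree="$work/psr-0"

for root in "$work/src1" "$work/src2"; do
    # Walk every PHP file under the root and mirror it as a symlink.
    find "$root" -type f -name '*.php' | while IFS= read -r file; do
        rel=${file#"$root"/}                # e.g. A/B/Foo.php
        mkdir -p "$tree/$(dirname "$rel")"  # real directories, never links
        ln -sf "$file" "$tree/$rel"         # later roots overwrite earlier ones
    done
done

# List the links that were created.
find "$tree" -type l | sort
```

Letting later roots overwrite earlier ones is an arbitrary choice made for the sketch; a real implementation would have to follow whatever precedence the autoloader itself uses.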

@igorw
Contributor
igorw commented Feb 14, 2013

@stof if it's more performant then it is better.

@patrickallaert

@Seldaek It's not that difficult and this is how we are able to speed things up a lot (benchmarks to come on my blog page) for eZ Publish.
Even if you have multiple directories for one NS.

Example:
A\B* => /directory0/B
A\B\C => /directory1/B/C
A\B\D => /directory2/B/D

$ mkdir -p /psr-0/A/B
$ cd /psr-0/A/B
$ ln -s /directory0/B/* .
$ ln -s /directory1/B/C
$ ln -s /directory2/B/D

If some sub-NS exists in another directory (for example: A\B\C\X\Y in /directoryN/...) then it means that "C" must be a directory instead of a symlink, and all sub-NS must be linked independently.
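
The same layout can be tried out safely in a throwaway directory. The sketch below assumes made-up class files (`Foo.php`, `Bar.php`, `Baz.php`) and places everything under a temporary directory instead of `/`; it is only meant to show the resulting tree, not how Composer would build it.

```shell
#!/bin/sh
# Sketch of the merged PSR-0 tree from the example above, built in a
# throwaway directory. All paths and class files are made up for illustration.
set -eu

work=$(mktemp -d)

# Three source roots that all contribute to the A\B namespace.
mkdir -p "$work/directory0/B" "$work/directory1/B/C" "$work/directory2/B/D"
touch "$work/directory0/B/Foo.php"      # class A\B\Foo
touch "$work/directory1/B/C/Bar.php"    # class A\B\C\Bar
touch "$work/directory2/B/D/Baz.php"    # class A\B\D\Baz

# Single tree the autoloader would be pointed at.
mkdir -p "$work/psr-0/A/B"
ln -s "$work"/directory0/B/* "$work/psr-0/A/B/"   # every entry of A\B
ln -s "$work/directory1/B/C" "$work/psr-0/A/B/C"  # whole sub-namespace A\B\C
ln -s "$work/directory2/B/D" "$work/psr-0/A/B/D"  # whole sub-namespace A\B\D

# Each class is now reachable through one predictable path.
ls -l "$work/psr-0/A/B"
```

After this, `A\B\C\Bar` is reachable as `psr-0/A/B/C/Bar.php` without the autoloader needing to know which root it came from.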

@patrickallaert

@stof Have you already profiled (xhprof?) an application that is using a classmap of several MiB?
Remember that APC or Zend Optimizer+ will only help partially because the variables still need to be reassigned inside ZVAL structures on every request, even if bytecode is cached.

@dlsniper

As an addition to what @patrickallaert said above, you can check the discussion here: symfony/symfony-standard#437 and view some performance benchmarks for Symfony2 here.

While classmaps might be good for a small number of classes, the more classes you have in your project, the more time the autoloader will take to resolve and load them individually, compared with having a bunch of them loaded into memory from the start.

Obviously, this approach has its own disadvantages, as it introduces a large file that needs to be parsed at once, and not all the classes in there might be used for every request or console application. But until more people contribute to this kind of benchmark, it is all purely theoretical imho.

@patrickallaert

@dlsniper Will the technique I proposed be added to the benchmark document?

@Seldaek
Member
Seldaek commented Feb 14, 2013

@patrickallaert you can't have one symlink pointing to two files. What if A\B points to directory0/B OR directory1/B? You can symlink all files as @igorw said, but I don't know how slow that is.

With your technique you still do a file_exists check too, which the classmap (--optimize-autoloader composer flag) does not do. I realize those are cached by php, but I would like to see benchmarks against the optimized composer autoloader for various amounts of classes etc. I am really not against changing it, I would be very happy to make this the fastest solution so nobody has to think about this ever again, but I want solid evidence before changing anything, because as far as my research went the classmap approach we have was the fastest.

@Seldaek
Member
Seldaek commented Feb 14, 2013

@dlsniper I'm not sure I got your comment. You seem to say classmaps are bad and good at the same time. From your benchmark document (maybe I misinterpret the numbers?) it seems like the composer optimized loader (classmap) is faster than the ApcClassLoader?

@patrickallaert

@Seldaek I fully understand, I will put this in my TODO list. Classmap has been the fastest option for us for years but we:

  1. didn't have PSR-0 classes or even a computational way to reach the file containing a class given its name.
  2. didn't have so many classes before.

This is why xhprof now showed us ~10% of time spent in including the classmap file (with APC).

I will ping you once the benchmarks are online.

@Seldaek
Member
Seldaek commented Feb 14, 2013

OK great, I'll reopen this so it's easier to find it again. Worst case we'll add a -o2 that does static analysis to find the most used classes, put those on the classmap, and rely on PSR0 for the others, with -o3 to symlink all the files on top of it to speed up the PSR0 part. Yay.

@Seldaek Seldaek reopened this Feb 14, 2013
@Seldaek
Member
Seldaek commented Mar 8, 2013

Thanks, I answered on your blog, but to keep this thread up to date:

Very interesting results, but I think that a few things are missing to properly test the classmap vs PSR-0 approach in Composer:

  • What about testing a real page view (not pre-cached html)? The current runs you showed don't load many classes so they probably don't offset the cost of loading the classmap.
  • Could you do the test on 5.5 with O+ enabled? Curious to see if it handles loading the huge classmap better than APC (who knows).
@dlsniper
dlsniper commented Mar 8, 2013

@patrickallaert inspired by you, I did some tests using the gist here: https://gist.github.com/dlsniper/5120578
From my tests (only APC enabled, no xdebug, nothing else), fetching the array from APC was faster by 50% each time (and it would make sense for it to be like that imho).

Could you please confirm the above results on your end as well?

Thanks.

@dlsniper
dlsniper commented Mar 8, 2013

Note: the above statement would be true only above a certain number of elements in the array, which I'm willing to bet will vary from configuration to configuration. For small enough arrays, the results would tend to be very close to each other.

@patrickallaert

@dlsniper No, that is not faster if the classmap is created in an optimal way[1]. Please note the comments I added on my blog post with new benchmarks (line: "Registering the whole classmap in APC as suggested by Anonymous").

[1] One possible optimization of the current classmap is to not prepend every entry with $baseDir, but rather to save the whole file with every entry already containing its full path, as the prefix requires extra CPU (a concat operation) on every request too. Note that APC stores the bytecode of that file, not the result of executing the classmap PHP file.
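
To make the footnote concrete, here is a rough sketch of what "every entry already with full path" could look like as a post-processing step. Everything in it is an assumption for illustration: the sample file only imitates the general shape of a generated classmap, `/var/www/app` is a hypothetical deploy path, and Composer has no such step built in. It only makes sense when the project lives at the same path on every machine.

```shell
#!/bin/sh
# Sketch: expand the "$baseDir . '...'" prefix in a classmap-like file into
# literal absolute paths. The sample file and the deploy path are made up.
set -eu

work=$(mktemp -d)
base=/var/www/app   # hypothetical path, identical on every server

# A tiny hand-written file in the style of a generated classmap.
cat > "$work/autoload_classmap.php" <<'EOF'
<?php
$vendorDir = dirname(__DIR__);
$baseDir = dirname($vendorDir);

return array(
    'Foo\\Bar' => $baseDir . '/src/Foo/Bar.php',
    'Lib\\Baz' => $vendorDir . '/acme/lib/Baz.php',
);
EOF

# Replace each concatenation with a plain string literal.
sed -e "s|\\\$baseDir \\. '|'$base|" \
    -e "s|\\\$vendorDir \\. '|'$base/vendor|" \
    "$work/autoload_classmap.php" > "$work/autoload_classmap.static.php"

cat "$work/autoload_classmap.static.php"
```

The two variable assignments at the top of the file are left in place by this sketch; they are simply unused once the entries are literal strings.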

@patrickallaert

@Seldaek I answered you with new data on my blog post, copying the results here:

Using fresh content (non cached HTML):

With APC[1]:
stock: 30.94 reqs/sec (avg: 32.320 ms)
removing loadClassCache(): 27.53 reqs/sec (avg: 36.324 ms)
(restoring loadClassCache() as it seems interesting for non cached pages)
activating ApcClassLoader: 35.52 reqs/sec (avg: 28.152 ms)
Using Composer --optimize: 32.26 reqs/sec (avg: 31.000 ms)
+ registering the whole classmap in APC as suggested by Anonymous: 30.95 reqs/sec (avg: 32.307 ms)
Using PSR-0 tree: 35.50 reqs/sec (avg: 28.166 ms)

ApcClassLoader seems very slightly faster than the PSR-0 tree, but I would say they are about the same.

With Zend Optimizer+[2]:
stock: 29.21 reqs/sec (avg: 34.238 ms)
removing loadClassCache(): 27.75 reqs/sec (avg: 36.042 ms)
(restoring loadClassCache() as it seems interesting for non cached pages)
Using Composer --optimize: 30.67 reqs/sec (avg: 32.608 ms)
Using PSR-0 tree: 32.69 reqs/sec (avg: 30.590 ms)

[1] APC settings:
apc.max_file_size=5M
apc.shm_segments=4

[2] ZO+ settings:
zend_optimizerplus.memory_consumption=256
zend_optimizerplus.interned_strings_buffer=16
zend_optimizerplus.max_accelerated_files=4000
zend_optimizerplus.revalidate_freq=0
zend_optimizerplus.fast_shutdown=1
zend_optimizerplus.enable_cli=1
zend_optimizerplus.save_comments=0
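
To read the APC figures above in relative terms, a small helper along these lines can be used. The numbers are copied from the results as posted; the shortened labels and the output format are just choices made for this sketch.

```shell
#!/bin/sh
# Relative change versus "stock" for the APC figures quoted above.
# Input lines are "label|requests per second", copied from the results.
set -eu

report=$(awk -F'|' '
    NR == 1 { stock = $2 }   # first row is the baseline
    { printf "%-32s %6.2f req/s  %+6.1f%%\n", $1, $2, ($2 / stock - 1) * 100 }
' <<'EOF'
stock|30.94
removing loadClassCache()|27.53
ApcClassLoader|35.52
Composer --optimize|32.26
--optimize + classmap in APC|30.95
PSR-0 tree|35.50
EOF
)

printf '%s\n' "$report"
```

The same script works for the Zend Optimizer+ block by swapping in its four rows.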

@Seldaek
Member
Seldaek commented Mar 11, 2013

Thanks for the update. So it seems that your PSR0 tree and APC are two
viable options. However they both have problems:

  • PSR0 tree is not very portable across machines (windows symlinks are
    meh, and moving symlinks around can sometimes cause issues depending on
    programs)
  • APC is obviously not deployed everywhere, and the introduction of ZO+
    will probably not help.

That said, --optimize is still faster than doing nothing, so I would
suggest we keep this as the default since it's portable, and then
introduce new --optimize=apc and --optimize=symlinks options for
people to use when it makes sense.

@fprochazka
Contributor

I really see no point in optimizing this for Windows: on Windows it should just work, and on Linux it should be blazing fast. I know no sane person who would host on Windows.

@stof
Contributor
stof commented Mar 11, 2013

@beberlei will be glad to learn he is not sane when deploying on Azure...

@Seldaek
Member
Seldaek commented Mar 11, 2013

@HosipLan it's not about optimizing for windows, it's about having something that works cross-platform. For example if someone installs (on windows or not to be honest) and then uses rsync or sftp or whatever to transfer the files to a server, things work out of the box right now. If you start relying on APC or symlinks things might break. So I am happy to allow people to do this, but it won't be the default behavior. People doing it right will have a build/deploy script anyway, so for them the cost of adding --optimize=foo is near zero. But making it harder for newcomers or people without proper infrastructure in place serves no purpose.

@fprochazka
Contributor

Well, if you use symlinks and then copy stuff using rsync (or ftp(s)) there is basically zero chance it will not break :)

@Seldaek
Member
Seldaek commented Mar 11, 2013

With rsync -l and relative symlinks it could work, but again maybe not on every platform.
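
A quick way to see why relative links matter here is sketched below. The layout (`app/src`, `app/psr-0`) is hypothetical, and `cp -RP` stands in for the transfer step; `rsync -a` implies `-l` and would likewise copy the link itself instead of following it.

```shell
#!/bin/sh
# Sketch: a relative symlink keeps working after the whole tree is copied
# to another location. Directory names are made up for illustration.
set -eu

work=$(mktemp -d)
mkdir -p "$work/app/src/B" "$work/app/psr-0/A"
touch "$work/app/src/B/Foo.php"

# Relative target, resolved from the directory that holds the link.
ln -s ../../src/B "$work/app/psr-0/A/B"

# Simulate shipping the tree elsewhere without dereferencing links.
cp -RP "$work/app" "$work/deployed"

# The copied link still resolves, now inside the deployed tree.
ls -l "$work/deployed/psr-0/A/B/"
```

An absolute link would keep pointing at the original location after the copy, which is the breakage being discussed.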

@dlsniper

@Seldaek given the fact that the concatenation is surfacing again (after I signaled it here first: #1585), would it be cool if one were to add a --autoload-no-concat option?

For example, we have the same folder structure on all the machines as it allows us to keep things under a tight control and I don't see how this could affect someone who's not passing that option to Composer.

What do you think?

@Seldaek
Member
Seldaek commented Mar 11, 2013

Yes that's another optional mode we could have to speed things up a bit.

@patrickallaert

Is there any valid reason to concat? In my opinion, there is none (that would have been different if the prefix had been set in another file). Because of that, I'm -1 on another option when the existing one can just be optimized for everyone.

Back to the current issue: there are probably cases where it doesn't fit, but it has never been suggested to be the default or anything close to that. I am only asking for PSR-0 tree support in Composer and think it would make sense for it to be based on symbolic links, but I'm open to alternatives of course.

If I had time I would propose a Pull Request, but unfortunately I can't afford to take time for this right now. Any volunteers out there?

@stof
Contributor
stof commented Mar 12, 2013

@patrickallaert See my answer in #1585 (comment) giving 2 reasons

@dlsniper dlsniper closed this Mar 3, 2015