New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues with the $getPrevious callable #21

Open
moufmouf opened this Issue Apr 18, 2016 · 15 comments

Comments

Projects
None yet
6 participants
@moufmouf
Contributor

moufmouf commented Apr 18, 2016

Hey guys,

After working on adapters for Yaco and Symfony, I've come to wonder whether the decision we took when we replaced the $previous argument by a callable $getPrevious is the good one. For the record, the issue was discussed in #9 and #10.

A bit of history: the issue we were trying to solve was that in case of "overriding" a service, the $previous argument had to be resolved by the container, even if it wasn't used. This is obviously suboptimal. To solve this, we decided to transform the $previous argument into a $getPrevious callable that would resolve to $previous. This way, the service could be resolved "on demand" by the service provider. Problem solved...

... but in the meantime, this change caused some severe changes in the way services are extended. We did this to gain some performance in the case of overridden services but we traded that versus some performance loss on extending services (now, we have to create a closure, and then execute it each time we extend a service).

Here is an imaginary sample of container code generated by Yaco (with one service provider providing a Twig_Environment service and 2 service providers adding extensions):

...
'twig_environemnt' => function ($container) {
    return TwigExtension2ServiceProvider::addExtension($container, 
        function () use ($container) {
            return TwigExtensionServiceProvider::addExtension($container,
                function () use ($container) {
                    return TwigServiceProvider::createService($container);
                }
            );
        }
    );
},
...

And here is what it looks like if we directly pass the $previous instead of a callable:

...
'twig_environemnt' => function ($container) {
    return TwigExtension2ServiceProvider::addExtension($container, 
        return TwigExtensionServiceProvider::addExtension($container,
            return TwigServiceProvider::createService($container);
        );
    );
},
...

In Symfony, it gets even worse. With the $getPrevious callables, I cannot use the native "extension" mechanism of Symfony (that enables inlining most of the code). Therefore, I end up making a big number of services (one service that wraps the entry in a callable, one that extends this wrapped service, etc...). In order to extend a service, I have to call 5 or 6 methods where one single call should be enough.

Let's be clear, I haven't made proper performance tests so far. But it seems to me pretty sure that the $getPrevious callable enables us to gain some performance on overriden services, but also reduces performance on extended services. If I think about use cases, I'm pretty sure that in a typical application, the number of extended services will be greater than the number of overridden services.

So I've been thinking about alternatives to the $getPrevious callable.

Alternative 1

One obvious alternative was identified by @mnapoli in #9: we could add another method to the ServiceProvider interface that handles specifically the extension of services.

class MyServiceProvider implements ServiceProvider
{
    public function declareServices()
    {
        return [
            'my_service' => 'MyServiceProvider::createMyService',
        ];
    }

    public static function createMyService(ContainerInterface $container)
    {
        ...
    }

    public function extendServices()
    {
        return [
            'event_dispatcher' => 'MyServiceProvider::extendEventDispatcher',
        ];
    }

    public static function extendEventDispatcher(ContainerInterface $container, EventDispatcher $previous)
    {
        $previous->registerListener(new MyListener());
        return $previous;
    }
}

What I like in this solution:

I like the fact it is very expressive.

It is obvious from the method names that service providers can not only provide definitions, but also extend definitions.
We can also type-hint on the $previous parameter to get errors if this is not an entry from the type we expect.

Alternative 2

Another alternative is to simply revert back to the $previous signature (instead of callable $getPrevious), and rely on implementations being smart enough to optimize and adopt the correct behaviour. Here is a sample:

class MyServiceProvider implements ServiceProvider
{
    public function getServices()
    {
        return [
            "service" => function(Container $c) {
                return MyService($c->get('my_dependency');
            }
        ];
    }
}

It is obvious from the factory signature that this service will not extend another service. Containers could therefore analyze the factory signature and decide to provide $previous entry to the factory based on the number of arguments expected. It is as simple as analyzing the number of parameters on the ReflectionMethod object representing the closure. Of course, this kind of access to the ReflectionMethod in a compiled container (although runtime containers can also do it with some level of caching).

If we really want to go deeper in code analysis, we can even do some static code analyzing (for instance using Roave/BetterReflection):

"service" => function(Container $c, $previous) {
    return MyService($c->get('my_dependency');
}

In the code above, some static analysis (or even a simple regex) can tell us that the $previous variable is not used in the method body. Hence, no extension.

What I like in this solution:

Well, it is really the most simple solution.

We can also type-hint on the $previous parameter.

What I don't like in this solution:

It is easy to optimize for compiled/cached containers, but for more "naive" containers, optimization is harder. Now, the container is "naive", so does it really need this kind of optimization?

Conclusion

I'd be interested in knowing what you think. The more I think about it, the more I'd like to get rid of the $getPrevious callable. We did that as an optimisation that ends up preventing us from optimising extended services.

I'm ok with both alternatives I mentioned above. Any idea of another alternative we did not think about? Or would you still prefer to keep the $getPrevious callable? If you, why?

@mnapoli

This comment has been minimized.

Member

mnapoli commented Apr 18, 2016

Your point makes sense to me. I have a preference with the second solution since it's simpler, but I also fear it might imply too much magic… Also it wouldn't work with service providers using, for example, func_get_args(). I don't see why one would use that though.

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented Apr 18, 2016

Yes, I was thinking about that. A really solid implementation should:

  • parse the function body and look if func_get_args is used anywhere
  • check that no variadics are used
  • if the $previous argument is passed to the function, check that the $previous argument is indeed used (i.e. appears somewhere in the method body) or that no $$xxx is used (otherwise, the parameter might be accessed by name).

Also, there is still a way to access the arguments of the function using debug-backtrace, but that becomes a bit crazy :)

So indeed, solution 2 implies some kind of magic to be properly implemented (or at least enough magic to be reserved to compiled/cached containers)

So far, I don't really have a proper preference between the 2 solutions yet. Shall we ping more people to get some feedback?

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented Apr 18, 2016

Ping @Giuseppe-Mazzapica @stof @ziege @dragoonis @mindplay-dk @xhuberty @webmozart

Guys, you have been involved in container-interop/service-provider at some point. Any preference/comment regarding this issue?

@gmazzap

This comment has been minimized.

gmazzap commented Apr 18, 2016

I have no strong opinion on this, The second alternative souds better to me, but is a very personal point of view, because I always try to treat services resolved in the container as read-only, avoiding any "extending".

When I need something to change, I resolve a dedicated factory in the container, using that to return the service.

But I'm aware that it is not a valid an argument that can be used to build a standard. So, I'll leave this to you :)

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented Apr 19, 2016

For the record, I've coded a simple class that tries to detect if the $previous argument is used or not.

https://github.com/thecodingmachine/service-provider-utils

I've put it in its own separate package to be usable by anyone easily.
Also, @mnapoli : it uses CallableReflection from PHP-DI/Invoker. Very useful class :)

Feedback is welcome.

@webmozart

This comment has been minimized.

webmozart commented Apr 19, 2016

I also prefer the second solution. This solution leave the option to optimize calls, but doesn't force the container to do so (they could also simply resolve the previous value all the time to keep it simple).

@weierophinney

This comment has been minimized.

weierophinney commented Apr 19, 2016

There is a performance optimization to having $getPrevious callable: in cases where you may decide to return an alternate instance based on inspecting other services.

As an example, let's consider a repository service. Perhaps, for testing, we can return a test spy, and the way we know we're in testing is based on a configuration value, which is available in a "config" service. In that particular case, we'll return our test spy, but otherwise, we can return the original service.

function ($container, callable $getPrevious = null)
{
    if (! $container->has('config')) {
        return $getPrevious();
    }

    $config = $container->get('config');
    if (! isset($config['testing_environment']) || true !== $config['testing_environment']) {
        return $getPrevious();
    }

    return new TestingSpy();
}

Contrived, yes, but here's the thing: if the conditions are correct, we will never want to create the original instance. As such, passing a callable that will generate the original instance saves cycles — which might include removing the need for a database connection, or a cache service, etc.

I'd argue keep it callable, and still recommend that compiling containers check signatures to determine what the expected behavior of a given factory might be. This is compatible with your second alternative; factories can choose to define the argument or not, and static analysis tools can check to see if it is called within the factory itself. The only difference is that the callable is passed, vs the generated instance.

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented Apr 20, 2016

@weierophinney Oh, indeed, I did not think about the use case where $getPrevious argument is actually used in the function body but ignored.

This definitely needs a bit more thinking than I initially thought.

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented May 31, 2016

Hey @weierophinney, hey all!

I've been thinking for quite some time about Matthew's comment (about keeping the callable $getPrevious signature).

@weierophinney: After some thoughts, I'm not sure your argument is strong enough to justify keeping the callable $getPrevious signature. @mindplay-dk made a good point in #28 that the current proposed signature is dangerous at best.

Regarding your example, I perfectly understand that in a test scenario, we might not want to create the full object. You say:

As such, passing a callable that will generate the original instance saves cycles — which might include removing the need for a database connection, or a cache service, etc.

This implies that creating the original object might be long because the object might need another service that connects to a database, etc... But in practice, I've never seen a service that is slow to instantiate because it does connect to a database on instantiation. Most libraries do this lazily (the __construct simply stores connection credentials and the real connection is performed when needed).

For instance:

What I mean is that even in the worst case scenario, creating a "real" service cannot take an order of magnitude more time than a "test" service. If the "real" service is created for nothing and not used, this is not going to multiply the time of your tests by 10. And even if it does, it probably means your code needs optimization.

Furthermore, if you really want to create a TestingSpy using a service provider, I'd advocate to put it in its own service provider and to load that service provider from your tests.

Something like:

class TestingSpyServiceProvider implements ServiceLocator {
    public function getServices() {
        'MyService::class' => function() {
            return new TestingSpy();
        }
    }
}

In your code:

if ($container->has('debug')) {
    // By looking at the signature of the factory (no 2nd argument), "clever" containers can completely bypass instantiation of the "real" service.
    $container->register(TestingSpyServiceProvider::class);
}

Conclusion

I'd really like to push one of the 2 alternatives I'm proposing again (see the first post in this issue)

I have no personal preference on one of those 2 alternatives (as long as we choose one, or something else that is better than the nullable callable we currently have)

To sum up expressed voices, we have:

  • @mnapoli : slight preference for "Alternative 2"
  • @webmozart : preference for "Alternative 2"
  • @Giuseppe-Mazzapica : slight preference for "Alternative 2"
  • @mindplay-dk : preference for "Alternative 1" (if I understand correctly, see #28 )
  • @weierophinney : preference to keep things as they are

Guys, before taking a decision, could you have a look at @mindplay-dk 's issue here (#28) and tell me what you think?
Also, I'd be interested in getting @mindplay-dk opinion about Alternative 1.

I'm sure we can find the best solution by discussing this a bit more!

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented Jun 6, 2016

Hey, we just received a comment from @skalpa here. The comment makes a lot of sense, but was opened on a closed issue.

I'm reposting @skalpa message below.

My first contribution here, so before everything, hello and thanks for your work (and please accept my apologies for the length of this post).

I think the solution 2 that you adopted is too limited and doesn't allow some simple use cases like conditional extensions

To give you an example:

  • You have a Twig provider that registers a twig service
class TwigProvider implements ServiceProvider
{
    public function getServices()
    {
        return [
            'twig' => function () {
                return new \Twig_Environment();
            }
        ];
    }
}
  • You have a Quote provider that registers its own services and wants to automatically add an extension to Twig. Now this provider should be usable by itself and does not require the Twig provider/service. With the actual solution I would have to do this:
class QuoteProvider implements ServiceProvider
{
    public function getServices()
    {
        return [
            'my_quote_service' => function () {
                return new Something();
            },
            'twig' => function (ContainerInterface $container, callable $getPrevious = null) {
                $twig = $getPrevious();
                $twig->addExtension(new QuoteTwigExtension($container->get('my_quote_service')));
                return $twig;
            }
        ];
    }
}

Here are the problems I see with this:

  • As a user of those two providers, I will have to know that for this to work, TwigProvider needs to be registered before QuoteProvider. This can quickly become a nightmare. I believe it shouldn't be the users responsibility to understand the dependencies between all the providers they use.
  • We can't determine if the twig service exists anymore :-( If I use the QuoteProvider but not the TwigProvider any attempt to call $container->has('twig') will return true, which is wrong.

The only way to implement this correctly with the current solution would be to split the QuoteProvider into two separate providers:

  • One QuoteProvider that providers the quote service
  • One QuoteTwigExtensionProvider that extends Twig

Again, this far from optimal. That will basically mean splitting code into dozens of optional extensions, and asking the users to manage them manually.

On the other side, a slightly modified version of the solution 1 would solve all this:

interface ServiceProvider
{
    public function getServices();
    public function getServicesExtensions(ContainerInterface $container);
}

Which in return will allow this kind of provider:

class QuoteProvider implements ServiceProvider
{
    public function getServices()
    {
        return [
            'my_quote_service' => function () {
                return new Something();
            }
        ];
    }
    public function getServicesExtensions(ContainerInterface $container)
    {
        $extensions = [];
        if ($container->has('twig')) {
            $extensions['twig'] = [$this, 'addTwigExtension'];
        }
        return $extensions;
    }
    public function addTwigExtension(ContainerInterface $container, callable $getPrevious = null) {
            $twig = $getPrevious();
            $twig->addExtension(new QuoteTwigExtension($container->get('my_quote_service')));
            return $twig;
    }
}

If the standard specifies that all the services MUST be registered by the containers before the services extensions, i.e:

// In the container
$this->register($provider1->getServices());
$this->register($provider2->getServices());
$this->register($provider3->getServices());

$this->register($provider1->getServicesExtensions());
$this->register($provider2->getServicesExtensions());
$this->register($provider3->getServicesExtensions());

Then users don't even have to worry about the providers registration order.

This is basically the approach adopted by Silex/Pimple, where you have two available methods: register() and boot().

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented Jun 6, 2016

@skalpa, a lot of this makes sense.

You are pinpointing 2 very interesting facts:

  • a service might want to extend another service "if it exists"
  • with the current implementation, service providers should be inserted in the right order (we always assumed that but it would be great if this wasn't an issue)

I like a lot your suggestion:

interface ServiceProvider
{
    public function getServices();
    public function getServicesExtensions(ContainerInterface $container);
}

However, implementation might be a bit tricky, especially for compiled containers.

Indeed, with compiled container, the container does not exist yet when the service providers getServicesXXX methods are called. It is compiled at the end. Therefore, it might be tricky to provide a $container object at this point of time.

This might not be impossible though, I'll try to see how this works out with Symfony.

@mnapoli

This comment has been minimized.

Member

mnapoli commented Jun 7, 2016

Considering the problem with "alternative 2":

where $getPrevious argument is actually used in the function body but ignored

I'm tending towards +1 on separating service declaration and extensions.

However @skalpa while I like your suggestion of this signature: function getServicesExtensions(ContainerInterface $container) I'm afraid it will make it might be hard to implement in all containers (because it requires having the container built and ready to use while it's still being constructed). We can give it a try.

I do understand though that sometimes we want to register extensions only if the entry actually exists, and sometimes we want to override/extend the previous entry even if it doesn't exist. So we can't assume one or the other behavior.

@skalpa

This comment has been minimized.

skalpa commented Jun 8, 2016

[EDIT] This thread was about the signature of extensions, so I'll open a different issue to discuss the getServicesExtensions signature

@moufmouf

This comment has been minimized.

Contributor

moufmouf commented Jun 10, 2016

For the record, I continued discussion about @skalpa idea here: #27 (comment)

@webmozart

This comment has been minimized.

webmozart commented Jun 23, 2016

IMO we're not approaching this problem from the right angle. If I write a Twig extension in some service provider (as example), I don't want to need to know what is necessary to initialize that extension (i.e. the imperative approach). I just want to tell the container that "this service is a Twig extension" (optionally with some additional parameters), now do whatever you want (the declarative approach).

This declarative approach is one of the big strengths of tags in Symfony.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment