Introduce background workers #5405

Merged
merged 20 commits into from Jan 9, 2023

@Toflar Toflar commented Oct 18, 2022

This introduces real background workers based on cronjobs and Symfony Messenger.

Motivation

Modern applications often need background workers to move heavier processes into the background. We already have things in place that would be better done asynchronously in order to speed up the front end:

  • Sending e-mails via Symfony Mailer
  • Search indexing

This PR also opens up the opportunity for extension developers to e.g. put other heavy workloads in the background. Here are some ideas:

  • Preparing huge ZIP files (combining lots of files in the BE anyone?)
  • Creating complex reports
  • Sending lots of things to a slow external API
  • etc.

It's also relevant for us if we want to work on a more complex search index, because that might (we don't know yet) increase the time it takes to index a page, for example.

Concept

The concept builds on top of Symfony Messenger. This means you can technically route your messages to RabbitMQ or any other transport. Contao ships with the Doctrine based transport by default which means there are 0 additional requirements.
In order to make this work, however, we need real cronjobs. We don't want to make them a system requirement, because simple/small sites probably don't need real cronjobs, and neither do simple development/testing instances etc.

Here's how it works from a developer's perspective (default Contao Managed Edition):

  • We have three priorities: High, Normal, Low. As an extension developer, you don't need to do anything at all in terms of configuration. All you need to do is prepare your message, implement one of the priority interfaces (LowPriorityMessageInterface, NormalPriorityMessageInterface or HighPriorityMessageInterface) and pass it to the message bus (MessageBusInterface). Then you're good to go and can build your message handler. Contao takes care of the rest.
  • Taking care of the rest means it automatically detects if there is a worker running for the specific queue. If so, messages are dispatched to the configured transport (Doctrine by default). If not, messages are processed synchronously immediately.
  • In terms of the SearchIndexListener, for example, this means that nothing changes at all if you do not have any worker running. Contao detects that there's no worker and indexes synchronously on every request like it does today. If, however, you have configured a real worker, indexing will automatically happen asynchronously in the background, making your front end considerably faster and freeing up your FPM child processes etc.
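
To make the fallback behavior described above a bit more tangible, here is a minimal sketch of the decision it boils down to: send to the queue when a worker is running, otherwise handle the message right away. All names in it (`Transport`, `SyncTransport`, `QueueTransport`, `AutoFallbackTransport` and the worker-check closure) are hypothetical stand-ins for illustration, not the classes added by this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins: none of these types are Contao's or Symfony's real classes.
interface Transport
{
    public function send(object $message): string;
}

final class SyncTransport implements Transport
{
    public function send(object $message): string
    {
        // A sync transport handles the message immediately, within the current request.
        return 'handled synchronously: '.$message::class;
    }
}

final class QueueTransport implements Transport
{
    /** @var list<object> */
    public array $queue = [];

    public function send(object $message): string
    {
        // A queue transport only stores the message; a worker picks it up later.
        $this->queue[] = $message;

        return 'queued: '.$message::class;
    }
}

final class AutoFallbackTransport implements Transport
{
    /** @param \Closure(): bool $isWorkerRunning */
    public function __construct(
        private Transport $target,
        private Transport $fallback,
        private \Closure $isWorkerRunning,
    ) {
    }

    public function send(object $message): string
    {
        // Use the real queue only if somebody is there to consume it.
        $transport = ($this->isWorkerRunning)() ? $this->target : $this->fallback;

        return $transport->send($message);
    }
}

$workerRunning = false;
$transport = new AutoFallbackTransport(
    new QueueTransport(),
    new SyncTransport(),
    function () use (&$workerRunning): bool { return $workerRunning; },
);

echo $transport->send(new stdClass()), "\n"; // handled synchronously: stdClass

$workerRunning = true;
echo $transport->send(new stdClass()), "\n"; // queued: stdClass
```

The point of the sketch is that the caller never needs to know which path was taken; it just sends the message.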

Here's how it works from a Contao user's perspective (default Contao Managed Edition):

  • Running workers and having the benefit of asynchronous processes is as easy as configuring a real, minutely cronjob that calls our cronjob framework (contao:cron) as documented. Done, Contao does the rest - magic! 🥳
  • Even better: Contao even does autoscaling! When there are more messages to be processed on the queue, more workers are started automatically. But this too is managed by Contao completely 😎
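
For reference, a minutely cronjob of the kind described above could look roughly like the crontab entry below. The `php` binary and the `/path/to/project` prefix are placeholders that depend on your hosting; only the `vendor/bin/contao-console` path and the `contao:cron` command come from this PR.

```
# Hypothetical crontab entry: runs the Contao cron framework once per minute.
# Adjust the PHP binary and the project path to your environment.
* * * * * php /path/to/project/vendor/bin/contao-console contao:cron
```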

Technical details and remarks:

  • Workers are messenger:consume commands and started via our new async cronjobs support.
  • The implementation is completely independent from our cron job framework. The default configuration for the Managed Edition looks like this, and I have some remarks on it:
framework:
    messenger:
        transports:
            sync: sync://
            contao_auto_fallback_prio_high: contao_auto_fallback://contao_prio_high?fallback=sync
            contao_auto_fallback_prio_normal: contao_auto_fallback://contao_prio_normal?fallback=sync
            contao_auto_fallback_prio_low: contao_auto_fallback://contao_prio_low?fallback=sync
            contao_prio_high: doctrine://default?table_name=tl_message_queue&queue_name=prio_high&auto_setup=false
            contao_prio_normal: doctrine://default?table_name=tl_message_queue&queue_name=prio_normal&auto_setup=false
            contao_prio_low: doctrine://default?table_name=tl_message_queue&queue_name=prio_low&auto_setup=false
        routing:
            'Symfony\Component\Mailer\Messenger\SendEmailMessage': contao_auto_fallback_prio_low
            'Contao\CoreBundle\Messenger\Message\HighPriorityMessageInterface': contao_auto_fallback_prio_high
            'Contao\CoreBundle\Messenger\Message\NormalPriorityMessageInterface': contao_auto_fallback_prio_normal
            'Contao\CoreBundle\Messenger\Message\LowPriorityMessageInterface': contao_auto_fallback_prio_low

# Contao configuration
contao:
    messenger:
        console_path: '%kernel.project_dir%/vendor/bin/contao-console'
        workers:
            -
                transports:
                    - contao_prio_high
                options:
                    - --time-limit=60
                    - --sleep=5
                autoscale:
                    desired_size: 5
                    max: 10
            -
                transports:
                    - contao_prio_normal
                options:
                    - --time-limit=60
                    - --sleep=10
                autoscale:
                    desired_size: 10
                    max: 10
            -
                transports:
                    - contao_prio_low
                options:
                    - --time-limit=60
                    - --sleep=20
                autoscale:
                    desired_size: 20
                    max: 10

As you can see, we configure the routing for the Symfony Messenger in such a way that our interfaces map to one of our three queues (contao_prio_low, contao_prio_normal and contao_prio_high). This has a major advantage: As a developer, you don't need to register any config. You can just implement the desired interface and that's it. So you can have

class FoobarMessage implements NormalPriorityMessageInterface
{
}

and

$this->messageBus->dispatch(new FoobarMessage());

and that's it. Contao automatically does the rest for you and ensures that if there is no worker running, the sync transport will be used. So your message is processed either way: asynchronously when workers are running, or synchronously when they are not.
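
For completeness, here is a rough sketch of what a message with some payload and its handler could look like. `GenerateReportMessage` and `GenerateReportHandler` are hypothetical names, and the interface is declared locally as a stand-in so the snippet runs on its own; in a real extension you would import Contao's `NormalPriorityMessageInterface` and let Symfony Messenger call the handler for you.

```php
<?php

declare(strict_types=1);

use Symfony\Component\Messenger\Attribute\AsMessageHandler;

// Stand-in so this sketch runs on its own. In a real project you would import
// Contao\CoreBundle\Messenger\Message\NormalPriorityMessageInterface instead.
interface NormalPriorityMessageInterface
{
}

// Hypothetical message: it only carries the data the handler needs.
final class GenerateReportMessage implements NormalPriorityMessageInterface
{
    public function __construct(public readonly int $reportId)
    {
    }
}

// Hypothetical handler. The attribute is how Symfony Messenger usually discovers
// handlers; PHP ignores it in this standalone sketch because nothing reads it.
#[AsMessageHandler]
final class GenerateReportHandler
{
    /** @var list<int> */
    public array $generated = [];

    public function __invoke(GenerateReportMessage $message): void
    {
        // The heavy work would go here; the sketch just records the ID.
        $this->generated[] = $message->reportId;
    }
}

// Calling the handler directly mimics what happens when no worker is running.
$handler = new GenerateReportHandler();
$handler(new GenerateReportMessage(42));

echo implode(',', $handler->generated), "\n"; // 42
```

Keeping the message a small, serializable value object matters here, because with a running worker it is stored in the queue and handled in a different process.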

Using the interface is a very nice default. It provides the mentioned DX and it still allows manual configuration adjustments if needed. So you could adjust your framework.messenger.routing config like so, if you like:

framework:
    messenger:
        routing:
            'App\Messenger\FoobarMessage': my_other_queue

The concrete class name always wins over the interface implemented.

Even the transport logic is completely independent from Contao. The automatic fallback to sync is implemented by a custom contao_auto_fallback transport, which you are free to use. So instead of using the Contao worker framework and auto detection magic, you can disable it all and configure Contao the way you like. Or you can combine things as you like:

# This would map all messages implementing our default interfaces to the same
# RabbitMQ queue, with our workers disabled. So you are responsible for running
# `messenger:consume rabbit_mq ...` yourself using supervisord or whatever you prefer.

framework:
    messenger:
        transports:
            rabbit_mq: amqp://guest:guest@localhost:5672/%2f/messages
        routing:
            'Contao\CoreBundle\Messenger\Message\HighPriorityMessageInterface': rabbit_mq
            'Contao\CoreBundle\Messenger\Message\NormalPriorityMessageInterface': rabbit_mq
            'Contao\CoreBundle\Messenger\Message\LowPriorityMessageInterface': rabbit_mq

contao:
    messenger:
        workers: []

# This would map all messages implementing our default interfaces to the same RabbitMQ
# queue if a worker is running. If Contao does not detect any running worker, it will fall back to
# the sync transport. Again, you are responsible for running `messenger:consume rabbit_mq ...`
# yourself using supervisord or whatever you prefer, but in this case, if no workers are running,
# Contao falls back to sync.

framework:
    messenger:
        transports:
            sync: sync://
            contao_auto_fallback_rabbit_mq: contao_auto_fallback://rabbit_mq?fallback=sync
            rabbit_mq: amqp://guest:guest@localhost:5672/%2f/messages
        routing:
            'Contao\CoreBundle\Messenger\Message\HighPriorityMessageInterface': contao_auto_fallback_rabbit_mq
            'Contao\CoreBundle\Messenger\Message\NormalPriorityMessageInterface': contao_auto_fallback_rabbit_mq
            'Contao\CoreBundle\Messenger\Message\LowPriorityMessageInterface': contao_auto_fallback_rabbit_mq

contao:
    messenger:
        workers: []

Maximum flexibility! 😎

To finish, let's look at our default workers configuration in the Managed Edition:

contao:
    messenger:
        console_path: '%kernel.project_dir%/vendor/bin/contao-console'
        workers:
            -
                transports:
                    - contao_prio_high
                options:
                    - --time-limit=60
                    - --sleep=5
                autoscale:
                    desired_size: 5
                    max: 10
            -
                transports:
                    - contao_prio_normal
                options:
                    - --time-limit=60
                    - --sleep=10
                autoscale:
                    desired_size: 10
                    max: 10
            -
                transports:
                    - contao_prio_low
                options:
                    - --time-limit=60
                    - --sleep=20
                autoscale:
                    desired_size: 20
                    max: 10

As you can see, I've configured autoscaling for all 3 queues, but in such a way that there are never more than 10 workers per queue. Also, you can see that --sleep differs. Important note: sleep does not define the sleep time between all messages. Instead, the worker only sleeps when idle. This means it only sleeps e.g. 20 seconds if there was no message the last time it asked. If there was one, it will ask for the next message immediately after it has finished processing. Hence, I configured them to 5, 10 and 20 seconds. In other words: In the worst case, it takes 5 seconds until a running worker picks up a message on the "high prio" queue and 20 seconds on the "low prio" queue.
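
The "sleep only when idle" behavior can be illustrated with a tiny simulation. This is just a sketch of the polling pattern with a simulated clock, assuming every message takes a fixed time to process; `simulateWorker()` and its parameters are made up for the example and this is not how the messenger:consume command is implemented.

```php
<?php

declare(strict_types=1);

/**
 * Simulates a polling worker on a virtual clock (in seconds).
 *
 * @param array<string, int> $arrivals message name => time the message arrives
 *
 * @return array<string, int> message name => time the worker picked it up
 */
function simulateWorker(array $arrivals, int $sleep, int $processing, int $timeLimit): array
{
    $clock = 0;
    $pickedUp = [];
    asort($arrivals);

    while ($clock < $timeLimit) {
        // Ask the queue for the oldest message that has arrived and is unhandled.
        $next = null;

        foreach ($arrivals as $name => $arrivedAt) {
            if ($arrivedAt <= $clock && !isset($pickedUp[$name])) {
                $next = $name;
                break;
            }
        }

        if (null === $next) {
            // Idle: this is the only place where the worker sleeps.
            $clock += $sleep;
            continue;
        }

        // Busy: handle the message and ask again right away, without sleeping.
        $pickedUp[$next] = $clock;
        $clock += $processing;
    }

    return $pickedUp;
}

// "Low prio" style worker: --sleep=20, --time-limit=60, 1 second per message.
echo json_encode(simulateWorker(['a' => 1, 'b' => 2, 'c' => 3], 20, 1, 60)), "\n";
// {"a":20,"b":21,"c":22}
```

Message `a` arrives just after the worker went to sleep and waits 19 seconds, while `b` and `c` are handled back to back with no sleep in between.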

Pretty proud of this implementation, I hope it will serve us well and allow for some cool new use cases for extensions 😎

@leofeyer leofeyer added this to the 5.1 milestone Oct 18, 2022
@leofeyer leofeyer changed the title [POC] Introduce background worker Introduce background worker Oct 18, 2022
@leofeyer leofeyer changed the title Introduce background worker Introduce background workers Oct 18, 2022

Toflar commented Oct 21, 2022

Notes for myself:

  • The current implementation would block other cronjobs. Async crons should be a top level feature.
  • Could make the number of workers dynamic based on MessageCountAwareInterface 😎
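
As a rough idea of what such dynamic scaling could look like: the sketch below assumes that `desired_size` means the number of queued messages one worker should be responsible for (the description does not spell this out), that the queue length comes from something like `MessageCountAwareInterface::getMessageCount()`, and that at least one worker is always kept. `desiredWorkerCount()` is a hypothetical helper, not code from this PR.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: how many workers to run for a queue of a given length.
 * Assumes $desiredSize is the target number of queued messages per worker.
 */
function desiredWorkerCount(int $messageCount, int $desiredSize, int $max): int
{
    if ($desiredSize < 1 || $max < 1) {
        throw new \InvalidArgumentException('desired_size and max must be at least 1.');
    }

    // Enough workers so that each one has at most $desiredSize messages waiting.
    $needed = (int) ceil($messageCount / $desiredSize);

    // Keep at least one worker running and never exceed the configured maximum.
    return max(1, min($max, $needed));
}

echo desiredWorkerCount(0, 5, 10), "\n";    // 1
echo desiredWorkerCount(12, 5, 10), "\n";   // 3
echo desiredWorkerCount(500, 20, 10), "\n"; // 10
```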

# Conflicts:
#	core-bundle/src/Cron/Cron.php
#	core-bundle/tests/Cron/CronTest.php
@Toflar Toflar marked this pull request as ready for review December 29, 2022 14:00
@Toflar Toflar requested a review from a team December 29, 2022 14:00

@aschempp aschempp left a comment

I don't think I'm able to fully review this without any messenger knowledge, but it looks good in general. One thought though:

Why do we need the full worker configuration in the Contao bundle? I mean we are setting up three priority queues for devs that "just want it to work". If you have other needs, use your own transport. Why would I ever reconfigure the workers? Also, if I want to run workers differently, because I might have special needs, wouldn't I just want to disable the cron jobs and run the actual messenger:consume command with my priority etc?

Toflar commented Jan 4, 2023

> Why do we need the full worker configuration in the Contao bundle? I mean we are setting up three priority queues for devs that "just want it to work". If you have other needs, use your own transport. Why would I ever reconfigure the workers? Also, if I want to run workers differently, because I might have special needs, wouldn't I just want to disable the cron jobs and run the actual messenger:consume command with my priority etc?

Not sure I understand all the questions here but let me try.

> Why do we need the full worker configuration in the Contao bundle? I mean we are setting up three priority queues for devs that "just want it to work". If you have other needs, use your own transport. Why would I ever reconfigure the workers?

Why do we need it? Because Contao uses it itself as of this PR. The SearchIndexListener no longer indexes right away but instead sends the new SearchIndexMessage to the message bus. This means that if no workers are running, the message is processed immediately, which is the same behavior as today. But if workers are running, it happens in the background.

> Why would I ever reconfigure the workers?

You mean reconfigure the routing? Because maybe you want the SearchIndexMessage to be treated as high priority. Then you can adjust that easily by providing a specific entry for the SearchIndexMessage.
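
Such an override could look roughly like this. The transport name is taken from the default configuration in the description; the fully qualified class name of SearchIndexMessage is an assumption based on the namespace of the priority interfaces.

```yaml
framework:
    messenger:
        routing:
            # Assumed class name, check the actual namespace in the core bundle
            'Contao\CoreBundle\Messenger\Message\SearchIndexMessage': contao_auto_fallback_prio_high
```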

> Also, if I want to run workers differently, because I might have special needs, wouldn't I just want to disable the cron jobs and run the actual messenger:consume command with my priority etc?

I think you're mixing things up. "Priority" is not a thing that exists in Symfony Messenger. It's just a different transport, and we make it a priority thing because prio_high is polled more often and is more volatile in terms of its autoscaling configuration than prio_low.
Also, there is no "disable the cron jobs and run the actual messenger:consume command". We do run the actual messenger:consume commands as well. We just do it via a minutely cron job, which is super handy for all the shared hosting environments etc. that will certainly not allow you to configure your own workers with supervisord or whatever.

aschempp commented Jan 5, 2023

I don't think that's what I meant 😂 I meant: why do we need the stuff in the Contao bundle configuration? Isn't it "just" used to configure the messenger queues? So if I really desperately need to change the default Contao queues, can't I "just" override the messenger configuration that is applied by default by Contao?

Toflar commented Jan 5, 2023

Sorry but I have absolutely no clue what you're talking about.

@Toflar Toflar requested review from aschempp and ausi January 6, 2023 13:09
ausi previously approved these changes Jan 8, 2023

@ausi ausi left a comment

Very nice 😎

@leofeyer leofeyer enabled auto-merge (squash) January 9, 2023 09:40

leofeyer commented Jan 9, 2023

Thank you @Toflar.
