Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cronjob causes many console.CRITICAL errors in log file #6907

Closed
LIVID-Media opened this issue Feb 18, 2024 · 21 comments · Fixed by #6995
Closed

Cronjob causes many console.CRITICAL errors in log file #6907

LIVID-Media opened this issue Feb 18, 2024 · 21 comments · Fixed by #6995
Assignees
Labels
Milestone

Comments

@LIVID-Media
Copy link

Affected version(s)

5.3.0

Description

Since the update from 5.2.8 to 5.3 there are a lot of critical errors in the log file after cronjob is executed. The main problem is "more than 'max_user_connections' active connections".

Here's an excerpt from the log file:

[2024-02-18T20:40:23.227068+01:00] console.CRITICAL: Error thrown while running command "messenger:consume --time-limit=60 --sleep=5 contao_prio_high". Message: "An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections" {"exception":"[object] (Doctrine\\DBAL\\Exception\\DriverException(code: 1203): An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/API/MySQL/ExceptionConverter.php:117)\n[previous exception] [object] (Doctrine\\DBAL\\Driver\\PDO\\Exception(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/Exception.php:28)\n[previous exception] [object] (PDOException(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/MySQL/Driver.php:33)","command":"messenger:consume --time-limit=60 --sleep=5 contao_prio_high","message":"An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections"} []
[2024-02-18T20:40:23.228826+01:00] console.CRITICAL: Error thrown while running command "messenger:consume --time-limit=60 --sleep=10 contao_prio_normal". Message: "An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections" {"exception":"[object] (Doctrine\\DBAL\\Exception\\DriverException(code: 1203): An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/API/MySQL/ExceptionConverter.php:117)\n[previous exception] [object] (Doctrine\\DBAL\\Driver\\PDO\\Exception(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/Exception.php:28)\n[previous exception] [object] (PDOException(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/MySQL/Driver.php:33)","command":"messenger:consume --time-limit=60 --sleep=10 contao_prio_normal","message":"An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections"} []
[2024-02-18T20:40:28.286152+01:00] console.CRITICAL: Error thrown while running command "messenger:consume --time-limit=60 --sleep=5 contao_prio_high". Message: "An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections" {"exception":"[object] (Doctrine\\DBAL\\Exception\\DriverException(code: 1203): An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/API/MySQL/ExceptionConverter.php:117)\n[previous exception] [object] (Doctrine\\DBAL\\Driver\\PDO\\Exception(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/Exception.php:28)\n[previous exception] [object] (PDOException(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/MySQL/Driver.php:33)","command":"messenger:consume --time-limit=60 --sleep=5 contao_prio_high","message":"An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections"} []
[2024-02-18T20:40:28.286574+01:00] console.CRITICAL: Error thrown while running command "messenger:consume --time-limit=60 --sleep=10 contao_prio_normal". Message: "An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections" {"exception":"[object] (Doctrine\\DBAL\\Exception\\DriverException(code: 1203): An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/API/MySQL/ExceptionConverter.php:117)\n[previous exception] [object] (Doctrine\\DBAL\\Driver\\PDO\\Exception(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/Exception.php:28)\n[previous exception] [object] (PDOException(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/MySQL/Driver.php:33)","command":"messenger:consume --time-limit=60 --sleep=10 contao_prio_normal","message":"An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections"} []
[2024-02-18T20:40:28.301905+01:00] console.CRITICAL: Error thrown while running command "messenger:consume --time-limit=60 --sleep=20 contao_prio_low". Message: "An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections" {"exception":"[object] (Doctrine\\DBAL\\Exception\\DriverException(code: 1203): An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/API/MySQL/ExceptionConverter.php:117)\n[previous exception] [object] (Doctrine\\DBAL\\Driver\\PDO\\Exception(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/Exception.php:28)\n[previous exception] [object] (PDOException(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/MySQL/Driver.php:33)","command":"messenger:consume --time-limit=60 --sleep=20 contao_prio_low","message":"An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections"} []
[2024-02-18T20:40:33.323472+01:00] console.CRITICAL: Error thrown while running command "messenger:consume --time-limit=60 --sleep=10 contao_prio_normal". Message: "An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections" {"exception":"[object] (Doctrine\\DBAL\\Exception\\DriverException(code: 1203): An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/API/MySQL/ExceptionConverter.php:117)\n[previous exception] [object] (Doctrine\\DBAL\\Driver\\PDO\\Exception(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/Exception.php:28)\n[previous exception] [object] (PDOException(code: 1203): SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections at /home/www/xxx.de/vendor/doctrine/dbal/src/Driver/PDO/MySQL/Driver.php:33)","command":"messenger:consume --time-limit=60 --sleep=10 contao_prio_normal","message":"An exception occurred in the driver: SQLSTATE[HY000] [1203] User webxxx already has more than 'max_user_connections' active connections"} []

The error didn't occur in Contao 5.2. Maybe it has something to do with the new SuperviseWorkersCommand script?!

@leofeyer
Copy link
Member

@LIVID-Media What is the value of max_user_connections on your system?

@Toflar /cc

@Toflar
Copy link
Member

Toflar commented Feb 20, 2024

I can hardly imagine this error did not happen before. In 5.2 we ran the commands every minute and did not supervise whether they were already running. Now we do. Or in other words, if you now have too many connections, you should've had this before as well :)

@LIVID-Media
Copy link
Author

@leofeyer Only 15. I know that's not very much, but this was never a problem before.

@Toflar I check the log files regularly, the error only occurres since the update to 5.3.

I will contact my web host and ask them to increase the value.

@fritzmg
Copy link
Contributor

fritzmg commented Feb 20, 2024

Have the visitors increased? This could also be the reason, though Contao 5 having lots of message workers hogging your connections is not ideal of course 🙈 . Unfortunately a low number of max_user_connections is pretty common with cheap shared hostings. You could enable the HTTP cache in order to alleviate the issue.

@Toflar
Copy link
Member

Toflar commented Feb 20, 2024

Maybe in 5.2 there were just less messages, was it used in prod? If you run ps aux | grep messenger regularly, you can see how many workers are started. But by default, autoscaling allows to start 30 processes = could use 30 connections on CLI alone (plus your web ones).
See https://github.com/contao/contao/blob/5.x/manager-bundle/skeleton/config/config.yaml#L63-L93

We can of course reduce the defaults there. 10 is a bit high. But this has been the case in 5.2 too, so I don't think there's anything wrong in the supervision introduced in 5.3.

@LIVID-Media
Copy link
Author

@fritzmg No, the number of visitors has not changed.
@Toflar Yes, I have two live websites, both of which now have the same problem.

@LIVID-Media
Copy link
Author

I've a Contao 5.2.8 test installation on the same server with the same setup. When I call contao-console contao:cron --force there, a maximum of four processes are displayed in MySQL.

In 5.3 it's getting more and more during the cron job until the connection is lost and the site is no longer accessible.

In my local system (with max_user_connections = 0), the problem also occurs. I'd already deactivated all extensions.

@leofeyer leofeyer added this to the 5.3 milestone Feb 20, 2024
@leofeyer
Copy link
Member

@Toflar Is there an easy way to disable the supervisor without disabling the background workers, so @LIVID-Media can test if the problem is really caused by the supervisor?

@fritzmg
Copy link
Contributor

fritzmg commented Feb 20, 2024

I did some local tests. When I execute contao:cron "Contao\CoreBundle\Cron\SuperviseWorkersCron" in Contao 5.3.0 the PHP processes slowly increase from 4 to 30 over the course of the next 55 seconds (even though there is no work to do).

But when I execute contao:cron "Contao\CoreBundle\Cron\MessengerCron" in Contao 5.2.10 the number of PHP process stays at 4 (1 process for contao:cron and 3 worker processes).

So I can confirm the issue.

@leofeyer
Copy link
Member

@LIVID-Media While we debug the problem, you can disable the background workers by adding this in your config.yaml file:

contao:
    messenger:
        workers: []

@fritzmg
Copy link
Contributor

fritzmg commented Feb 20, 2024

Careful, this will disable email sending!

@leofeyer
Copy link
Member

No, it won't. I have been using this for some time on contao.org and the emails are sent without any problems. Not asynchronously, of course.

@fritzmg
Copy link
Contributor

fritzmg commented Feb 20, 2024

Ah sorry, did not know that. Then yes, that's the way to go.

@LIVID-Media
Copy link
Author

@leofeyer Thanks for the information. This solved the problem for now.

@fritzmg
Copy link
Contributor

fritzmg commented Feb 20, 2024

My bad, in my local setup the issue was actually the lack of support for Windows from toflar/cronjob-supervisor (see Toflar/cronjob-supervisor#1). With the added Windows support, I do not notice any run-away process creation via contao:supervise-worker.

@fritzmg
Copy link
Contributor

fritzmg commented Feb 21, 2024

@LIVID-Media is there a possibility to gain test access to your server environment?

@LIVID-Media
Copy link
Author

@fritzmg Sure. Please tell me what you need.

@fritzmg
Copy link
Contributor

fritzmg commented Feb 21, 2024

I can confirm the issue with the credentials provided by @LIVID-Media. It's a Webgo shared hosting environment, where the ps command does not work:

$ ps
Error, do this: mount -t proc proc /proc

Thus Supervisor::isRunningPid() always returns false and therefore the Supervisor keeps starting processes every 5 seconds.

This will likely to be the case on all of Webgo's shared hosting environments (I was also able confirm this in the Webgo hostings of two of our customers).

@LIVID-Media as a first step you could ask Webgo's support to make the ps command work in the shell environment (I will do the same).

There could of course also be other shared hosting environments where ps does not work. We will probably have to implement a failover in the Supervisor, if ps does not work (not sure how though).

@Toflar
Copy link
Member

Toflar commented Feb 26, 2024

With Toflar/cronjob-supervisor#2 we can probably fix this as posix_getpgid() might be supported. But then the question here is still what do we do if we still cannot supervise. We can now check this using Supervisor::canSupervise() in v2 but where would we check this?

We have 2 places where we can check it:

a) Check it at Container runtime. This is hard because the Supervisor instance is not created there at the moment so it means we need to do quite some work.
b) Check it when running contao:supervise-workers. This should be easy to implement.

And then the question is also what to do if the check fails.

Case a): When building the container, we could disable the entire worker configuration - > but where would we log?

Case b): In case of doing it in the contao:supervise-workers command we have also all sorts of questions to clarify -> where do we log? (probably just inject the logger, but what level then?) and should we still run 1 worker but ignore the autoscaling? (= it could still lead to too many processes running if they all run more than a minute) or should we disable it completely? Should it exit with exit code 0 or with an error? Would contao:cron fail then?

@leofeyer leofeyer added the up for discussion Issues and PRs which will be discussed in our monthly Mumble calls. label Feb 27, 2024
@LIVID-Media
Copy link
Author

Even though there may soon be a solution. Nevertheless, here is the answer from WebGo in short:
It's not possible to limited the ps command to a specific chroot environment on shared hosting. This would display all running processes of the server, including processes of other customers and the root user. Therefore they can't activate ps in shared hosting.

@leofeyer
Copy link
Member

leofeyer commented Mar 7, 2024

As discussed in the Contao call, we want to set contao.messenger.workers to an empty array when building the container and supervising is not supported. This can then be checked with container:debug (similar to the image library).

@leofeyer leofeyer removed the up for discussion Issues and PRs which will be discussed in our monthly Mumble calls. label Mar 7, 2024
leofeyer pushed a commit that referenced this issue Mar 11, 2024
Description
-----------

Fixes #6907.

Ready for review but needs an update of the `composer.json` entries before merging. Requires Toflar/cronjob-supervisor#2 to be released as 2.0.0 which I'll only do once this PR is approved concept-wise.

Commits
-------

b63f645 Update to toflar/cronjob-supervisor v2 and disable workers if not sup…
7b81d99 Use static helpers
c1e55c0 Update to stable release
leofeyer pushed a commit to contao/core-bundle that referenced this issue Mar 11, 2024
Description
-----------

Fixes contao/contao#6907.

Ready for review but needs an update of the `composer.json` entries before merging. Requires Toflar/cronjob-supervisor#2 to be released as 2.0.0 which I'll only do once this PR is approved concept-wise.

Commits
-------

b63f6452 Update to toflar/cronjob-supervisor v2 and disable workers if not sup…
7b81d990 Use static helpers
c1e55c04 Update to stable release
@leofeyer leofeyer linked a pull request Mar 20, 2024 that will close this issue
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants