[systemd] Frequent agent service restarts can lead to chef run failures #467

olivielpeau · 2017-09-19T22:56:29Z

This happens on systemd-based systems when the cookbook adds many integrations in a single run. The datadog_monitor resource restarts the agent after each integration file is added, so the service can hit systemd's default start rate limit (5 starts every 10 seconds). That limit also applies to "manual" restarts of the service.

We get the following error in systemd's journal:

datadog-agent.service: Start request repeated too quickly.
systemd[1]: Failed to start "Datadog Agent".
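
To illustrate why a burst of restarts trips the limit, here is a simplified Ruby model of a "5 starts per 10 seconds" window. It is only a sketch, not systemd's actual implementation: the class name `StartRateLimiter`, its parameters, and the 0.5-second spacing between restarts are assumptions made for the example.

```ruby
# Simplified model of a start rate limit (NOT systemd's actual code).
# Assumed defaults, taken from the issue text: 5 starts per 10-second interval.
class StartRateLimiter
  def initialize(burst: 5, interval: 10.0)
    @burst = burst
    @interval = interval
    @window_start = nil
    @count = 0
  end

  # Returns true if a start at time `now` (in seconds) is allowed.
  def allow?(now)
    # Open a new window on the first start, or once the interval has elapsed.
    if @window_start.nil? || now - @window_start > @interval
      @window_start = now
      @count = 1
      return true
    end
    # Inside the current window: refuse once the burst is used up.
    return false if @count >= @burst

    @count += 1
    true
  end
end

# Eight integrations added in quick succession, one restart each,
# spaced 0.5 seconds apart (hypothetical timing).
limiter = StartRateLimiter.new
results = (0...8).map { |i| limiter.allow?(i * 0.5) }
puts "allowed: #{results.count(true)}, refused: #{results.count(false)}"
```

In this model the first five restarts go through and every later one inside the same window is refused, which corresponds to the "Start request repeated too quickly" message above.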

Root cause: the service resource declared inside datadog_monitor is not the same one as in the main chef run. This is a chef limitation: custom resources have their own resource collection. As a result, the restarts triggered there happen immediately instead of being queued up for the end of the run.

The Right Fix would be to remove the service definition from datadog_monitor and make all invocations of datadog_monitor notify a restart on the global service resource (see #323).
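
The difference between immediate restarts and a restart queued on the global service resource can be sketched with a toy model in plain Ruby. This is not Chef's code: `ToyRun` and its methods are hypothetical names, and the integration list is made up for illustration. In a real recipe the queued behaviour would correspond to a delayed notification on `service[datadog-agent]`.

```ruby
# Toy model of a run that collects notifications; all names are hypothetical.
class ToyRun
  attr_reader :restarts

  def initialize
    @restarts = 0
    @delayed = []
  end

  # Immediate notification: the action fires right away, every time.
  def notify_immediately(target)
    run_action(target)
  end

  # Delayed notification: remember each target once, run it at the end.
  def notify_delayed(target)
    @delayed << target unless @delayed.include?(target)
  end

  # End of the run: fire every queued action exactly once.
  def finish
    @delayed.each { |target| run_action(target) }
    @delayed.clear
  end

  private

  def run_action(_target)
    @restarts += 1
  end
end

# Example integration names, chosen only for illustration.
integrations = %w[redisdb nginx postgres mysql haproxy elastic]

immediate = ToyRun.new
integrations.each { |_name| immediate.notify_immediately('service[datadog-agent]') }
immediate.finish

delayed = ToyRun.new
integrations.each { |_name| delayed.notify_delayed('service[datadog-agent]') }
delayed.finish

puts "immediate: #{immediate.restarts} restarts, delayed: #{delayed.restarts} restart"
```

With six integrations, the immediate variant restarts the service six times during the run, while the delayed variant restarts it once at the end, which stays under the start rate limit regardless of how many integrations are added.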


olivielpeau added the bug label Sep 19, 2017

olivielpeau added this to the 2.11.0 milestone Sep 21, 2017

olivielpeau mentioned this issue Sep 21, 2017

[service] Avoid failures of service resource with frequent restarts #469

Merged

olivielpeau closed this as completed in #469 Sep 21, 2017

olivielpeau mentioned this issue Sep 28, 2017

Chef 12.7+ Monitor Resource #450

Merged
