Checks with large check_intervals scheduled outside of short time periods #647

sawolf · 2019-05-29T18:23:07Z

In short, when a service has a check_interval much longer than the total length of the check_period, the check may be scheduled after the end of the time period. When this happens, the check isn't run and the service remains PENDING.
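To make the failure mode concrete, here is a minimal sketch in Python rather than Nagios's own C. The `Window` class and `naive_next_check()` are hypothetical names made up for illustration, and the model assumes a single window measured in seconds; it is not how Nagios represents a check_period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Hypothetical stand-in for a check_period: one window [start, end), in seconds."""
    start: int
    end: int

    def contains(self, t: int) -> bool:
        return self.start <= t < self.end


def naive_next_check(window_start: int, check_interval: int) -> int:
    """Schedule one full interval after the window opens, ignoring where it ends."""
    return window_start + check_interval


window = Window(start=0, end=10 * 60)                   # 10-minute check_period
next_check = naive_next_check(window.start, 60 * 60)    # 60-minute check_interval
print(window.contains(next_check))  # False: the check lands after the window closes
```

With a 5-minute interval the same call would land inside the window, which is why the problem only shows up when the interval is large relative to the time period.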

sawolf · 2019-06-06T21:02:02Z

The issue seems to stem from 62af867, where the previous maintainer was trying to reduce the load caused by checks being scheduled outside their timeperiods. He did this by 'rescheduling' the checks to some random amount of time after the start of the timeperiod. However, this code doesn't ensure that the new time is still within the timeperiod; it only uses the check_interval or retry_interval of the host/service to bound the random offset.

* Create reschedule_within_timeperiod(), which handles the ranged_urand() rescheduling more correctly
* fix error in reschedule_within_timeperiod, replace ranged_urand() calls where applicable
* Fix initial scheduling of service checks
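The idea behind rescheduling within the timeperiod can be sketched as follows. This is not the actual reschedule_within_timeperiod() from the commits above: `pick_time_in_window()` is a hypothetical Python helper, and it assumes the window is a single known [start, end) range in seconds. The only point it shows is capping the random spread by the window length instead of by the interval alone.

```python
import random


def pick_time_in_window(window_start: int, window_end: int,
                        interval: int, rng: random.Random) -> int:
    """Pick a random time at or after window_start that is still inside the window."""
    if window_end <= window_start:
        raise ValueError("window must have positive length")
    # Cap the random spread by the window length, not just by the interval.
    span = max(1, min(interval, window_end - window_start))
    return window_start + rng.randrange(span)


rng = random.Random(647)  # fixed seed so the sketch is repeatable
# 24-hour check_interval, 10-minute window starting at t=1000.
t = pick_time_in_window(1000, 1600, 24 * 3600, rng)
print(1000 <= t < 1600)  # True
```

When the interval is shorter than the window, the cap has no effect and the spread stays within one interval of the window start, so checks are still staggered without ever falling outside the period.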

LoZini · 2019-08-28T13:28:13Z

Hi @Madlohe, I see in change log:

Partially reverted changes for Checks with large check_intervals scheduled outside of short time periods #647 due to CPU load issues

We're still on 4.4.3, so we haven't tried the previous fix yet. Should we expect only a partial solution?

Thanks,
Antonio

sawolf · 2019-08-28T14:24:55Z

Originally (for 4.4.4) I took all instances of check scheduling and ensured that they checked the scheduled time against the timeperiod to make sure it would run. The issue was that this is a somewhat expensive operation, so on startup we'd see CPU load issues due to scheduling hundreds or thousands of checks like this simultaneously.

For 4.4.5, the timeperiod logic is only used when rescheduling. Rescheduling occurs after each check, and also during the several minutes before each check. So, in your specific case, I think the changes should still work. If you upgrade and still have issues with this, do let me know.
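The trade-off described here can be illustrated with a rough Python sketch. Everything in it is hypothetical: `CountingPeriod` models a timeperiod as one daily window and counts lookups as a stand-in for the expensive operation, and `initial_schedule()` / `reschedule()` are made-up names, not Nagios functions.

```python
DAY = 24 * 3600


class CountingPeriod:
    """Hypothetical daily window [start, end) in seconds of day; counts lookups."""

    def __init__(self, start: int, end: int):
        self.start, self.end = start, end
        self.lookups = 0

    def next_valid_time(self, t: int) -> int:
        """Return t if it is inside the window, else the next window start."""
        self.lookups += 1  # stand-in for the costly timeperiod calculation
        second_of_day = t % DAY
        midnight = t - second_of_day
        if self.start <= second_of_day < self.end:
            return t
        if second_of_day < self.start:
            return midnight + self.start
        return midnight + DAY + self.start


def initial_schedule(now: int, interval: int) -> int:
    # Startup path: no timeperiod lookup, so the result may fall outside the window.
    return now + interval


def reschedule(period: CountingPeriod, proposed: int) -> int:
    # Reschedule path: the proposed time is validated against the timeperiod.
    return period.next_valid_time(proposed)


period = CountingPeriod(start=1000, end=1600)
pending = [initial_schedule(0, 3600) for _ in range(1000)]
print(period.lookups)              # 0: startup did no timeperiod work
fixed = reschedule(period, pending[0])
print(period.lookups, fixed % DAY)  # 1 1000: one lookup moved the check to the window start
```

In this model, startup stays cheap because a thousand checks are queued without any lookups, and each check pays for a lookup only when it is rescheduled.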


sawolf self-assigned this May 29, 2019

sawolf added the Bug label May 29, 2019

sawolf added this to the 4.4.4 milestone May 29, 2019

sawolf mentioned this issue Jun 11, 2019

Fix #647 - multiple issues related to short timeperiods #649

Merged

sawolf closed this as completed Jul 24, 2019

sawolf mentioned this issue Jan 16, 2023

Fix Schedule Service Check #887

Merged
