Checks with large check_intervals scheduled outside of short time periods #647

Closed
sawolf opened this issue May 29, 2019 · 3 comments

sawolf commented May 29, 2019

See here for context

In short, when a service's check_interval is much longer than the length of its check_period, the check may be scheduled after the end of the time period. When this happens, the check isn't run and the service remains PENDING.

@sawolf sawolf self-assigned this May 29, 2019
@sawolf sawolf added the Bug label May 29, 2019
@sawolf sawolf added this to the 4.4.4 milestone May 29, 2019
sawolf commented Jun 6, 2019

The issue appears to come from commit 62af867, in which the previous maintainer tried to reduce the load caused by scheduling checks from outside their timeperiods. He did this by 'rescheduling' the checks to a random time after the start of the timeperiod. However, that code doesn't ensure the new time still falls within the timeperiod; it only considers the check_interval or retry_interval of the host/service.

sawolf added a commit that referenced this issue Jul 24, 2019
* Create reschedule_within_timeperiod(), which handles the ranged_urand() rescheduling more correctly

* fix error in reschedule_within_timeperiod, replace ranged_urand() calls where applicable

* Fix initial scheduling of service checks
@sawolf sawolf closed this as completed Jul 24, 2019
LoZini commented Aug 28, 2019

Hi @Madlohe, I see this in the changelog:

We're still on 4.4.3, so we haven't tried the previous fix yet. Should we expect only a partial solution?

Thanks,
Antonio

sawolf commented Aug 28, 2019

Originally (for 4.4.4) I took all instances of check scheduling and ensured that they checked the scheduled time against the timeperiod to make sure it would run. The issue was that this is a somewhat expensive operation, so on startup we'd see CPU load issues due to scheduling hundreds or thousands of checks like this simultaneously.

For 4.4.5, the timeperiod logic is only used when rescheduling. Rescheduling happens after each check, and also in the several minutes leading up to each check. So, in your specific case, I think the changes should still work. If you upgrade and still have issues with this, do let me know.

msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 24, 2023
…ods (NagiosEnterprises#649)

* Create reschedule_within_timeperiod(), which handles the ranged_urand() rescheduling more correctly

* fix error in reschedule_within_timeperiod, replace ranged_urand() calls where applicable

* Fix initial scheduling of service checks
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Feb 28, 2023
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Mar 1, 2023
msdiamanti pushed a commit to gwos/nagioscore that referenced this issue Mar 1, 2023