Nagios freezing when using Query Handler #635

Closed
ToToL opened this issue Apr 25, 2019 · 9 comments

@ToToL

ToToL commented Apr 25, 2019

Hello

I'm using Nagios 4.4.3 under CentOS 7.

I enabled the query handler in Nagios with the default configuration.

On the other side, I have a Perl program that opens a socket, registers with the query handler, and then exits.
Register command: wproc register name=col-;pid=;max_jobs=1;plugin=Collector
The PID placeholder is replaced with the PID of my Perl process.

After a few hours, Nagios freezes. Running strace on it shows an endless stream of this message:
write(7170, "job_id=306\0type=0\0command=/usr/l"..., 104) = -1 EAGAIN (Resource temporarily unavailable)
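
For context, EAGAIN on a write generally means a non-blocking descriptor cannot accept more data right now. The sketch below shows that general behaviour with a local socket pair whose reading end never drains its buffer. It is an illustration only: the thread does not say what descriptor 7170 is connected to, and the function name and chunk size are assumptions made for this example.

```python
import errno
import os
import socket


def fill_until_eagain(chunk: bytes = b"x" * 4096) -> int:
    """Write to a non-blocking socket whose peer never reads.

    Returns the errno raised once the kernel buffer is full.
    """
    # `reader` stands in for a peer that has stopped consuming data.
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    try:
        while True:
            writer.send(chunk)  # succeeds until the socket buffer fills up
    except BlockingIOError as exc:
        return exc.errno  # the same error strace reports as EAGAIN
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(os.strerror(fill_until_eagain()))
```

A caller that retries such a write in a tight loop without waiting for the descriptor to become writable would produce a repeating strace pattern like the one above.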

lsof tells me that Nagios has over 7000 open files, many of them nagios.qh entries. I tried raising the number of files Nagios is allowed to open, but that only delays the problem.

Is this a known issue? Can you help me investigate it?

Thanks

@ToToL
Author

ToToL commented Apr 28, 2019

Hello

With this small Python script, I can reproduce Nagios not releasing nagios.qh:

https://pastebin.com/PjSm2mi9
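
The linked script is not reproduced in this thread. As a rough idea of what a register-then-exit client could look like, here is a minimal sketch. It is not the reporter's script: the function names, the `@` prefix, and the trailing NUL byte are assumptions based on how query handler messages are commonly framed, and the socket path is left as a parameter because it depends on the installation.

```python
import os
import socket


def build_register_command(name: str, pid: int) -> bytes:
    """Build the register message using the fields quoted in this issue.

    The leading "@" and the NUL terminator are assumptions, not taken
    from the thread.
    """
    text = f"@wproc register name={name};pid={pid};max_jobs=1;plugin=Collector"
    return text.encode("ascii") + b"\0"


def register_and_exit(qh_path: str, name: str, pid=None) -> None:
    """Connect to the query handler socket, register once, then disconnect."""
    if pid is None:
        pid = os.getpid()
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(qh_path)
        sock.sendall(build_register_command(name, pid))
    # Leaving the `with` block closes the client side of the connection,
    # which mirrors a worker that registers and then stops.
```

Calling `register_and_exit` with the path to nagios.qh and a hypothetical name such as `"col-test"` would leave one registration behind on each run, which is the pattern the lsof counts below measure.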

Before executing the script:
lsof | grep nagios.qh | wc -l
=> 10

After executing the script:
lsof | grep nagios.qh | wc -l
=> 11

This number never goes down, so after a while it reaches Nagios's open file limit.

Can you help me?

@ToToL
Author

ToToL commented Apr 28, 2019

It seems that the issue is around
force = !!(flags & WPROC_FORCE);
in the wproc_destroy function in base/workers.c.

If force is true, the issue goes away.

@ToToL
Author

ToToL commented May 7, 2019

Up ?

@ToToL
Author

ToToL commented May 17, 2019

up ?

@sawolf
Member

sawolf commented May 22, 2019

Hi @ToToL, thanks for reporting this issue. I'll need to do some more testing before using your suggested changes (most likely, there's just one flag that's not getting set properly, rather than that whole variable being incorrectly assigned).

@sawolf
Member

sawolf commented May 23, 2019

I've had a little time to look into this. For now, I've decided to make a slightly different change: in workers.c, I changed line 752 from wproc_destroy(wp, 0); to wproc_destroy(wp, WPROC_FORCE);. That line seems to be called when the 'worker' (or in your case, your perl/python script) dies.

My sense of this, however, is that the force variable can probably just be removed from wproc_destroy() entirely. @hedenface what do you think about this?

sawolf added a commit to sawolf/nagioscore that referenced this issue May 24, 2019
@ToToL
Author

ToToL commented May 24, 2019

Hello

Isn't WPROC_FORCE 0 by default?

Thanks for digging into my issue :)

Regards

@sawolf
Member

sawolf commented May 24, 2019

WPROC_FORCE itself is set to 1 in include/workers.h - the force = !!(flags & WPROC_FORCE); is using a mask to determine whether the specific flag was set when the function was called. It seems like WPROC_FORCE was used by the workers, but not by the nagios server itself. It's not clear to me why this was done, but my server hasn't had any issues yet by changing the flag.
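
The mask check described above can be illustrated with a small Python rendering of the C expression. This is only a sketch: the value 1 for WPROC_FORCE comes from this thread, while the helper name `is_forced` is made up for the example.

```python
WPROC_FORCE = 1  # value stated in this thread for include/workers.h


def is_forced(flags: int) -> bool:
    """Python equivalent of the C expression `!!(flags & WPROC_FORCE)`."""
    # The bitwise AND isolates the WPROC_FORCE bit; bool() collapses the
    # result to True/False, as the double negation does in C.
    return bool(flags & WPROC_FORCE)


if __name__ == "__main__":
    print(is_forced(0))            # old call site, wproc_destroy(wp, 0): False
    print(is_forced(WPROC_FORCE))  # new call site passes WPROC_FORCE: True
```

Passing 0 therefore leaves the force path disabled regardless of what the constant itself is defined as, which is why the call site had to change.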

Either way, thank you for reporting the issue! It looks like we should have a fix for this in 4.4.4

sawolf added this to the 4.4.4 milestone May 24, 2019
sawolf added a commit that referenced this issue May 29, 2019
Resolve #635: When there is no more data to read from a socket, release it
@sawolf
Member

sawolf commented May 29, 2019

Patch has been merged into the maint branch and will be released with 4.4.4
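
The idea in the commit message ("when there is no more data to read from a socket, release it") can be sketched in a few lines of Python. This is not the Nagios patch itself, which is C code in workers.c; the function name and the `registry` dictionary are hypothetical and only show the general pattern of closing a descriptor once a read reports end-of-file.

```python
import socket


def handle_readable(sock: socket.socket, registry: dict) -> bool:
    """Read once from `sock` and release it when the peer has gone away.

    Returns True if the socket is still registered afterwards.
    """
    fd = sock.fileno()
    data = sock.recv(4096)
    if data:
        return True  # real input; a full handler would parse it here
    # recv() returning b"" means end-of-file: the peer closed its end.
    registry.pop(fd, None)  # forget the connection
    sock.close()            # hand the descriptor back to the OS
    return False
```

Without the close step, every client that connects and exits leaves one descriptor open in the server, which matches the steadily growing lsof count reported earlier in the thread.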
