Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set close_fds to False when spawning collector #367

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

synical
Copy link

@synical synical commented Apr 22, 2017

I have noticed that on systems with a high value for SC_OPEN_MAX, tcollector
can cause spikes in CPU utilization when it runs a collector process.

The reason for this is that close_fds calls close() on every integer
between 3 and SC_OPEN_MAX before executing the child process, regardless
of if those file descriptors are currently open or not. On a
system with say a value of 1 million for SC_OPEN_MAX this is a pretty
expensive operation, even more so if that system is running something
like auditd/kaudit.

It seems that tcollector will typically spawn a short lived collector
that it will kill after a certain amount of time passes, so I don't
think the normal concerns of setting close_fds to False are applicable.

Some strace histogram output to demonstrate (note the error count):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.76    4.463530        3351      1332         5 select
  4.70    0.221257           2    127189    127114 close
...

A python bug report that discusses the same issue:
https://bugs.python.org/issue1663329

I am not clear on all of the potential downsides here so if anyone has
strong opinions on why it is a bad idea to disable close_fds here, I am
all ears.

I have noticed that on systems with a high value for SC_OPEN_MAX, tcollector
can cause spikes in CPU utilization when it runs a collector process.

The reason for this is that close_fds calls close() on every integer
between 3 and SC_OPEN_MAX before executing the child process, regardless
of if those file descriptors are currently open or not. On a
system with say a value of 1 million for SC_OPEN_MAX this is a pretty
expensive operation, even more so if that system is running something
like auditd/kaudit.

It seems that tcollector will typically spawn a short lived collector
that it will kill after a certain amount of time passes, so I don't
think the normal concerns of setting close_fds to False are applicable.

Some strace histogram output to demonstrate (note the error count):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.76    4.463530        3351      1332         5 select
  4.70    0.221257           2    127189    127114 close
...

A python bug report that discusses the same issue:
https://bugs.python.org/issue1663329

I am not clear on all of the potential downsides here so if anyone has
strong opinions on why it is a bad idea to disable close_fds here, I am
all ears.
@johann8384
Copy link
Member

@synical I know this is three years old. Any chance you can rebase and make this a config option? It can default to "False"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants