Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in which "dead" subscribers would not be removed right away leading to bspwm running out of file descriptors #1126

Closed
wants to merge 1 commit into from

Conversation

JBouron
Copy link
Contributor

@JBouron JBouron commented May 25, 2020

Fix bug in which "dead" subscribers would not be removed right away and would
consume resources and file descriptors which could lead to the bspwm process
running out of available file descriptors and not being able to accept new
incoming connections.

When a process subscribes to bspwm events (i.e. bspc subscribe <event>), bspwm
keeps some state about this subscriber such as the file descriptor in which to
write the events and which events are requested by the subscriber.

When a subscriber dies (i.e. the bspc subscribe ... process is killed) bspwm
does not know it right away. Instead, on the next event being logged, if bspwm
encounters an error when trying to write() into the file descriptor associated
to a subscriber bspwm will consider this subscriber dead and remove it from the
linked list (which will also close() the file descriptor of the subscriber).
This is done in the put_status() function defined in subscribe.c.

This means that bspwm keeps file descriptors on potentially closed streams for
some time.
Normally this is ok, because events are generated often (when focusing nodes,
switching desktop, etc ...), therefore those "dead" file descriptors are not
kept around for too long.
However, during a period of inactivity (i.e. the user is not actively using the
computer or never switching nodes or desktops) those file descriptors can be
kept for a long time.
This is especially problematic in the following scenario:
1. User leaves the machine idle.
2. A "churn" of subscribers happens, where subscribers come and go.
3. Since the machine is idle, no events are happening and the file
descriptor of all dead subscribers are not closed.
4. bspwm consumes more and more file descriptors as new subscribers are
created.
5. At some point bspwm has too many opened file descriptors and cannot
accepts new incoming connections. The accept() syscall in the mainloop
fails (returning EMFILE) and bspwm is stuck in an infinite loop, not
being able to serve new request. At this point bspwm needs to be killed
and restarted.

While this could be deemed as unlikely (the maximum number of opened files being
1024 AFAIK (Arch Linux)), it turns out that this exact scenario can happen with
polybar (https://github.com/polybar/polybar) and the bspwm module. When a
laptop connected to an external monitor becomes idle, and the screen saver kicks
in, polybar periodically reloads itself (because of xrandr events) and therefore
periodically kill and re-create a subscriber. Since there is no activity on the
bspwm process, all dead instances of the polybar subscriber are not removed and
bspwm eventually runs out of file descriptors and gets stuck in the infinite
loop while consuming 100% cpu.
(Note: This is not a polybar issue, polybar does the right thing here).

The fix is as follows: In the main loop of bspwm, check for any dead subscribers
and remove them (if any). This is accomplished by a new function declared in
subscribe.h: prune_dead_subscribers().
prune_dead_subscribers() will iterate through the linked list of subscribers and
for each of them will try to call write() on their file descriptor with an empty
buffer (i.e. size == 0). If the write() fails (returns -1) then the subscriber
is declared dead and this function can remove it from the list and close the
associated file descriptor.

…nd would

consume resources *and* file descriptors which could lead to the bspwm process
running out of available file descriptors and not being able to accept new
incoming connections.

When a process subscribes to bspwm events (i.e. `bspc subscribe <event>`), bspwm
keeps some state about this subscriber such as the file descriptor in which to
write the events and which events are requested by the subscriber.

When a subscriber dies (i.e. the `bspc subscribe ...` process is killed) bspwm
does not know it right away. Instead, on the next event being logged, if bspwm
encounters an error when trying to write() into the file descriptor associated
to a subscriber bspwm will consider this subscriber dead and remove it from the
linked list (which will also close() the file descriptor of the subscriber).
This is done in the put_status() function defined in subscribe.c.

This means that bspwm keeps file descriptors on potentially closed streams for
some time.
Normally this is ok, because events are generated often (when focusing nodes,
switching desktop, etc ...), therefore those "dead" file descriptors are not
kept around for too long.
However, during a period of inactivity (i.e. the user is not actively using the
computer or never switching nodes or desktops) those file descriptors can be
kept for a long time.
This is especially problematic in the following scenario:
    1. User leaves the machine idle.
    2. A "churn" of subscribers happens, where subscribers come and go.
    3. Since the machine is idle, no events are happening and the file
       descriptor of all dead subscribers are not closed.
    4. bspwm consumes more and more file descriptors as new subscribers are
       created.
    5. At some point bspwm has too many opened file descriptors and cannot
       accepts new incoming connections. The accept() syscall in the mainloop
       fails (returning EMFILE) and bspwm is stuck in an infinite loop, not
       being able to serve new request. At this point bspwm needs to be killed
       and restarted.

While this could be deemed as unlikely (the maximum number of opened files being
1024 AFAIK (Arch Linux)), it turns out that this exact scenario can happen with
polybar (https://github.com/polybar/polybar) and the bspwm module. When a
laptop connected to an external monitor becomes idle, and the screen saver kicks
in, polybar periodically reloads itself (because of xrandr events) and therefore
periodically kill and re-create a subscriber. Since there is no activity on the
bspwm process, all dead instances of the polybar subscriber are not removed and
bspwm eventually runs out of file descriptors and gets stuck in the infinite
loop while consuming 100% cpu.
(Note: This is not a polybar issue, polybar does the right thing here).

The fix is as follows: In the main loop of bspwm, check for any dead subscribers
and remove them (if any). This is accomplished by a new function declared in
subscribe.h: prune_dead_subscribers().
prune_dead_subscribers() will iterate through the linked list of subscribers and
for each of them will try to call write() on their file descriptor with an empty
buffer (i.e. size == 0). If the write() fails (returns -1) then the subscriber
is declared dead and this function can remove it from the list and close the
associated file descriptor.
Copy link

@RBOrtmann RBOrtmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change "as been closed" to "has been closed" in the comment in subscribe.h, otherwise looks good to me.

@baskerville
Copy link
Owner

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants