Stuck at 100% CPU in mainloop (w/ generated coredump) #22

Open
mikljohansson opened this issue Dec 22, 2015 · 1 comment


During a performance test I managed to get pen-0.29.0 stuck in an infinite loop consuming 100% CPU. The process seems stuck in mainloop() and neither accepts new connections nor responds to penctl commands via the control socket. The behavior persists even after the load generator tool (gatling) is shut down. I'd be most happy to assist with troubleshooting or testing.

The proxy was started in a Docker container like this, and SIGHUP'ed a few times to change the list of backends:

pen -r -u pen -x 8192 -c 8192 -S 64 -F /tmp/pen-1234-tcp.cfg -p /tmp/pen-1234-tcp.pid -C /tmp/pen-1234.ctl

This is the contents of the config file

I used gdb's generate-core-file command to create a dump of the process

Pen was installed from this Fedora 23 x86_64 package

The Gatling scenario uses 200 concurrent clients, and is attached at

An easy way to get into gdb with debug symbols is, for example:

# unpack coredump to current directory
docker run -it -v "$(pwd):/work" --workdir=/work fedora:23 bash
dnf -y install pen gdb yum-utils
debuginfo-install -y pen
gdb /usr/bin/pen core.32358

UlricE commented Mar 21, 2016

This was tricky to reproduce because it only triggers when a connection to a backend server times out and then only sometimes.

The culprit is this code in pending_and_closing:

    if (pending_list != -1) {
            p = start = pending_list;
            do {
                    int conn = dlist_value(p);
                    if (conns[conn].state == CS_IN_PROGRESS) {
                            check_if_timeout(conn);
                    }
                    p = dlist_next(p);
            } while (p != start);
    }

This loops over a linked list of pending connections to see if they are connected yet. The bug is that check_if_timeout is called from inside the loop. If the connection has timed out, check_if_timeout calls failover_server; if there is no server to fail over to, failover_server calls close_conn, which modifies the list we're looping over. If the `start` node happens to be removed by close_conn, p never equals start again and the loop never ends.

The fix, now in git, is to not modify the list while looping over it.
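
One way to avoid modifying a circular list while looping over it is to split the work into two passes: first walk the list and only record which nodes need to go, then remove them once the traversal is over. The following is a minimal sketch of that idea and not the actual change committed to pen. The index-based list and the names nodes, list_insert, list_remove, reap_expired, list_length and value_is_negative are all hypothetical and exist only for illustration.

```c
#define MAX_NODES 8

/* Hypothetical index-based circular list; this is NOT pen's dlist. */
struct node { int value; int prev; int next; int used; };

static struct node nodes[MAX_NODES];
static int list_head = -1;              /* -1 means the list is empty */

/* Append a value at the tail; returns the node index, or -1 if full. */
static int list_insert(int value)
{
    int i;
    for (i = 0; i < MAX_NODES; i++) if (!nodes[i].used) break;
    if (i == MAX_NODES) return -1;
    nodes[i].used = 1;
    nodes[i].value = value;
    if (list_head == -1) {
        nodes[i].prev = nodes[i].next = i;
        list_head = i;
    } else {
        int tail = nodes[list_head].prev;
        nodes[i].prev = tail;
        nodes[i].next = list_head;
        nodes[tail].next = i;
        nodes[list_head].prev = i;
    }
    return i;
}

/* Unlink node i; the head moves on if the head itself is removed. */
static void list_remove(int i)
{
    if (nodes[i].next == i) {
        list_head = -1;                 /* i was the only node */
    } else {
        nodes[nodes[i].prev].next = nodes[i].next;
        nodes[nodes[i].next].prev = nodes[i].prev;
        if (list_head == i) list_head = nodes[i].next;
    }
    nodes[i].used = 0;
}

/* Pass 1 only reads the list and records expired nodes, so the start
   node is guaranteed to still be there when the walk comes back around.
   Pass 2 removes the recorded nodes after the traversal has finished. */
static int reap_expired(int (*expired)(int value))
{
    int doomed[MAX_NODES];
    int n = 0, i;

    if (list_head != -1) {
        int p, start;
        p = start = list_head;
        do {
            if (expired(nodes[p].value)) doomed[n++] = p;
            p = nodes[p].next;
        } while (p != start);
    }
    for (i = 0; i < n; i++) list_remove(doomed[i]);
    return n;
}

/* Count the nodes currently in the list. */
static int list_length(void)
{
    int n = 0, p;
    if (list_head == -1) return 0;
    p = list_head;
    do { n++; p = nodes[p].next; } while (p != list_head);
    return n;
}

/* Sample predicate standing in for a timeout check. */
static int value_is_negative(int value) { return value < 0; }
```

Calling reap_expired(value_is_negative) on a list whose head is itself expired still terminates, because nothing is unlinked until the do/while loop has returned to its start node.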
