Skip to content

Commit

Permalink
Fix FATAL in function client_proto(): bad client state: 6/7
Browse files Browse the repository at this point in the history
The cancel request protocol is very simple:
1. Clients send a cancel key
2. Server cancels the query running on the matching backend
3. The server closes the socket to indicate the cancel request was
   handled

At no point after sending the cancel key the client is required to send
any data, and if it does it won't get a response anyway because the
server would close the connection. So most clients don't, specifically
all libpq based client don't.

The Go client however, sends a Terminate ('X') packet before closing the
cancel socket from its own side (usually due to a 10 second timeout).
This would cause a crash in PgBouncer, because PgBouncer would still
listen on the client socket, but on receiving any data we wouldreport a
fatal error. This PR fixes that fatal error by stopping to listen on the
client socket once we've received the cancel key (i.e. when we change
the client state to CL_WAITING_CANCEL).

I was able to reproduce both of the errors reported by @raymasson and
confirmed that after this change the errors would not occur again.

Reproducing this with libpq based clients is not possible, except when
manually attaching to the client using `gdb` and manually running `send`
on the cancel request its socket. So no test is added, since our test
suite uses psycopg which uses libpq under the hood.

Fixes pgbouncer#904

Related to pgbouncer#717
  • Loading branch information
JelteF committed Aug 17, 2023
1 parent 15a9d1e commit 4757046
Show file tree
Hide file tree
Showing 2 changed files with 23 additions and 2 deletions.
3 changes: 3 additions & 0 deletions src/client.c
Original file line number Diff line number Diff line change
Expand Up @@ -1190,6 +1190,9 @@ bool client_proto(SBuf *sbuf, SBufEvent evtype, struct MBuf *data)
break;
case CL_WAITING:
fatal("why waiting client in client_proto()");
case CL_WAITING_CANCEL:
case CL_ACTIVE_CANCEL:
fatal("why canceling client in client_proto()");
default:
fatal("bad client state: %d", client->state);
}
Expand Down
22 changes: 20 additions & 2 deletions src/objects.c
Original file line number Diff line number Diff line change
Expand Up @@ -686,6 +686,24 @@ static void pause_client(PgSocket *client)
disconnect_client(client, true, "pause failed");
}

/*
* Deactivate the client socket and put it into the cancel request wait queue.
* We're not expecting any data from the client anymore at this point at all.
* But some clients might send some anyway (specifically the Go client). Since
* we don't care about any of that extra data we just stop reading from the
* socket.
*/
static void pause_cancel_request(PgSocket *client)
{
Assert(client->state == CL_LOGIN);

slog_debug(client, "pause_cancel_request");
change_client_state(client, CL_WAITING_CANCEL);
if (!sbuf_pause(&client->sbuf))
disconnect_client(client, true, "pause cancel request failed");
}


/* wake client from wait */
void activate_client(PgSocket *client)
{
Expand Down Expand Up @@ -1695,7 +1713,7 @@ static void accept_cancel_request_for_peer(int peer_id, PgSocket *req)
* request.
*/
req->pool = pool;
change_client_state(req, CL_WAITING_CANCEL);
pause_cancel_request(req);

/*
* Open a new connection over which the cancel request is forwarded to the
Expand Down Expand Up @@ -1803,7 +1821,7 @@ void accept_cancel_request(PgSocket *req)
* request.
*/
req->pool = pool;
change_client_state(req, CL_WAITING_CANCEL);
pause_cancel_request(req);

/*
* Open a new connection over which the cancel request is forwarded to the
Expand Down

0 comments on commit 4757046

Please sign in to comment.