
waitingClients grows w/o bounds #108

Closed
timothykimball opened this Issue Dec 12, 2015 · 4 comments

timothykimball commented Dec 12, 2015

Scenario:

  • Steady load introduced to server. Server handles requests fine
  • Something happens causing DB to slow down (e.g. backup of DB)
  • This causes service response times to increase, which in turn creates a backlog of waiting clients
  • DB pool becomes maxed out
  • Client network connections start to timeout waiting for a DB connection
  • But generic-pool does not know this. Instead, the waitingClients list continues to grow with clients that have long since gone.
  • This is effectively a memory leak for as long as the load is present.
  • The leak grows in proportion to the request volume - more requests create a bigger impact.
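The failure mode above can be sketched in a few lines. This is illustrative code, not generic-pool's source: a pool with a single resource whose waiting list has no bound and no timeout, so waiters pile up even after the HTTP clients that wanted them have disconnected.

```javascript
// Minimal sketch of the problem (not generic-pool code): a pool with max=1
// whose waiting list grows without bound when the resource is never released.
class TinyPool {
  constructor(max) {
    this.max = max;
    this.inUse = 0;
    this.waitingClients = []; // unbounded, as the issue describes
  }
  acquire() {
    return new Promise((resolve) => {
      if (this.inUse < this.max) {
        this.inUse++;
        resolve({ id: this.inUse });
      } else {
        // No timeout, no bound: the waiter stays queued even if the
        // HTTP client that wanted it has long since timed out.
        this.waitingClients.push(resolve);
      }
    });
  }
  release(resource) {
    const next = this.waitingClients.shift();
    if (next) next(resource);
    else this.inUse--;
  }
}

const pool = new TinyPool(1);
pool.acquire();                                // eats the single resource (never released)
for (let i = 0; i < 100; i++) pool.acquire();  // 100 "requests" time out upstream...
console.log(pool.waitingClients.length);       // ...but all 100 waiters are still queued (prints 100)
```

Under steady load, `waitingClients` keeps growing for as long as the single resource is held, which is exactly the memory-leak behaviour in the log below.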

Log of behaviour:

[2015-12-12T19:37:33.267Z] TRACE: express/178 on eda75d89cf72:  REQUEST: 1 NUM: 2 (uuid=033ffdc7-a1c5-4a3c-94c2-51bf2f7e526b, num_reqs=1, req.uuid=033ffdc7-a1c5-4a3c-94c2-51bf2f7e526b)
    GET /passport/kata5hq06i7bebi0y467yqnp5o5wwvqp HTTP/1.1
[2015-12-12T19:37:33.268Z] TRACE: express/178 on eda75d89cf72: route: USE /:suite_ref/:auth (uuid=033ffdc7-a1c5-4a3c-94c2-51bf2f7e526b)
[2015-12-12T19:37:33.269Z] TRACE: express/178 on eda75d89cf72: middleware: suite: called (uuid=033ffdc7-a1c5-4a3c-94c2-51bf2f7e526b)
[2015-12-12T19:37:33.269Z] TRACE: express/178 on eda75d89cf72: connectToDb: open (uuid=033ffdc7-a1c5-4a3c-94c2-51bf2f7e526b)
trace (0a9fdebd-4822-4a66-bfe9-c8682d2e40d2): pg: connection: postgres://postgres:aire@192.168.99.100/aire
[2015-12-12T19:37:33.271Z]  INFO: express/178 on eda75d89cf72: postgres pool: dispense() clients=1 available=0
Number of connections: 0
[2015-12-12T19:38:03.270Z]  WARN: express/178 on eda75d89cf72: TERMINAT: 0 NUM: 2 (uuid=033ffdc7-a1c5-4a3c-94c2-51bf2f7e526b, num_reqs=0, res.method=GET, res.url=/, res.uuid=033ffdc7-a1c5-4a3c-94c2-51bf2f7e526b, res.status_code=200)
Number of connections: 1
GET /passport/kata5hq06i7bebi0y467yqnp5o5wwvqp - - ms - -
[2015-12-12T19:38:16.566Z] TRACE: express/178 on eda75d89cf72:  REQUEST: 1 NUM: 3 (uuid=08c453f4-5401-41e7-893a-a50271de9780, num_reqs=1, req.uuid=08c453f4-5401-41e7-893a-a50271de9780)
    GET /passport/kata5hq06i7bebi0y467yqnp5o5wwvqp HTTP/1.1
[2015-12-12T19:38:16.567Z] TRACE: express/178 on eda75d89cf72: route: USE /:suite_ref/:auth (uuid=08c453f4-5401-41e7-893a-a50271de9780)
[2015-12-12T19:38:16.568Z] TRACE: express/178 on eda75d89cf72: middleware: suite: called (uuid=08c453f4-5401-41e7-893a-a50271de9780)
[2015-12-12T19:38:16.568Z] TRACE: express/178 on eda75d89cf72: connectToDb: open (uuid=08c453f4-5401-41e7-893a-a50271de9780)
trace (15ed9c8b-178b-48c8-ac82-18b3827272bb): pg: connection: postgres://postgres:aire@192.168.99.100/aire
[2015-12-12T19:38:16.570Z]  INFO: express/178 on eda75d89cf72: postgres pool: dispense() clients=2 available=0
Number of connections: 0
[2015-12-12T19:38:46.570Z]  WARN: express/178 on eda75d89cf72: TERMINAT: 0 NUM: 3 (uuid=08c453f4-5401-41e7-893a-a50271de9780, num_reqs=0, res.method=GET, res.url=/, res.uuid=08c453f4-5401-41e7-893a-a50271de9780, res.status_code=200)
Number of connections: 1
GET /passport/kata5hq06i7bebi0y467yqnp5o5wwvqp - - ms - -
[2015-12-12T19:39:23.145Z] TRACE: express/178 on eda75d89cf72:  REQUEST: 1 NUM: 4 (uuid=c06041e8-58ef-4fc5-abe1-77d6264677e7, num_reqs=1, req.uuid=c06041e8-58ef-4fc5-abe1-77d6264677e7)
    GET /passport/kata5hq06i7bebi0y467yqnp5o5wwvqp HTTP/1.1
[2015-12-12T19:39:23.146Z] TRACE: express/178 on eda75d89cf72: route: USE /:suite_ref/:auth (uuid=c06041e8-58ef-4fc5-abe1-77d6264677e7)
[2015-12-12T19:39:23.147Z] TRACE: express/178 on eda75d89cf72: middleware: suite: called (uuid=c06041e8-58ef-4fc5-abe1-77d6264677e7)
[2015-12-12T19:39:23.147Z] TRACE: express/178 on eda75d89cf72: connectToDb: open (uuid=c06041e8-58ef-4fc5-abe1-77d6264677e7)
trace (e3a4223a-ad78-4a86-b82b-caffcbccbd49): pg: connection: postgres://postgres:aire@192.168.99.100/aire
[2015-12-12T19:39:23.149Z]  INFO: express/178 on eda75d89cf72: postgres pool: dispense() clients=3 available=0
Number of connections: 0
[2015-12-12T19:39:53.148Z]  WARN: express/178 on eda75d89cf72: TERMINAT: 0 NUM: 4 (uuid=c06041e8-58ef-4fc5-abe1-77d6264677e7, num_reqs=0, res.method=GET, res.url=/, res.uuid=c06041e8-58ef-4fc5-abe1-77d6264677e7, res.status_code=200)
Number of connections: 1
GET /passport/kata5hq06i7bebi0y467yqnp5o5wwvqp - - ms - -
[2015-12-12T19:40:17.986Z] TRACE: express/178 on eda75d89cf72:  REQUEST: 1 NUM: 5 (uuid=104895a3-1602-4f1c-ad63-f3f022333302, num_reqs=1, req.uuid=104895a3-1602-4f1c-ad63-f3f022333302)
    GET /passport/kata5hq06i7bebi0y467yqnp5o5wwvqp HTTP/1.1
[2015-12-12T19:40:17.988Z] TRACE: express/178 on eda75d89cf72: route: USE /:suite_ref/:auth (uuid=104895a3-1602-4f1c-ad63-f3f022333302)
[2015-12-12T19:40:17.989Z] TRACE: express/178 on eda75d89cf72: middleware: suite: called (uuid=104895a3-1602-4f1c-ad63-f3f022333302)
[2015-12-12T19:40:17.991Z] TRACE: express/178 on eda75d89cf72: connectToDb: open (uuid=104895a3-1602-4f1c-ad63-f3f022333302)
trace (2c513426-e427-43a2-8eca-ec807ef03c1a): pg: connection: postgres://postgres:aire@192.168.99.100/aire
[2015-12-12T19:40:17.997Z]  INFO: express/178 on eda75d89cf72: postgres pool: dispense() clients=4 available=0

Notes:

  • Each 'REQUEST' log message is a new request
  • Each 'TERMINAT' log message is a timed out request
  • The pool size is one
  • Before the log starts, I deliberately consumed the single DB connection by making a special 'buggy' request that never closed its DB connection.
Collaborator

sandfox commented Dec 13, 2015

Indeed, at the moment the "waiting list" for resources is unbounded, and it's up to the user to check the size of the "waiting list" and act on that information. I don't have strong feelings either way about allowing the user to supply an optional bound for the "waiting list", but I don't think it would provide much value over what is already achievable.

In terms of solutions to this problem I can think of a couple of things:

  1. Returning some kind of handle when pool.acquire is called, so that "resource requests" can be cancelled by the user before they are fulfilled (and possibly some other things too).
  2. Allowing an acquire timeout value as one of the arguments taken by pool.acquire, so that "resource requests" are removed from the waiting list once the timeout expires.

Both seem useful, and there are already several requests for option 2 (#85, #94), but I've not had time to work on it, and the pull request #82 still needed some work to pass the tests last time I looked.
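Option 1 could look something like the following sketch. The `cancel()` handle is a hypothetical API for illustration, not something generic-pool exposed at the time; an Express handler would call it from its timeout or socket-close event so that abandoned requests no longer pin memory in the queue.

```javascript
// Sketch of option 1 (hypothetical API, not generic-pool's): acquire()
// returns a handle whose cancel() removes the pending "resource request"
// from the waiting list before it is fulfilled.
class SketchPool {
  constructor(max) {
    this.max = max;
    this.inUse = 0;
    this.waitingClients = [];
  }
  acquire(callback) {
    if (this.inUse < this.max) {
      this.inUse++;
      callback(null, {});
      return { cancel: () => {} }; // already fulfilled; nothing to cancel
    }
    const waiter = { callback };
    this.waitingClients.push(waiter);
    return {
      // Called when the upstream client gives up (e.g. HTTP timeout):
      // drop the waiter so it can be garbage-collected.
      cancel: () => {
        const i = this.waitingClients.indexOf(waiter);
        if (i !== -1) this.waitingClients.splice(i, 1);
      }
    };
  }
}

const pool = new SketchPool(1);
pool.acquire(() => {});                 // takes the only resource
const handle = pool.acquire(() => {});  // queued behind it
handle.cancel();                        // client gave up: drop the waiter
console.log(pool.waitingClients.length); // prints 0
```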

@sandfox sandfox self-assigned this Dec 13, 2015

DanielYWoo commented May 26, 2016

+1
If we reach maxWaitClients, acquire() should fail fast and throw an error.
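The proposed fail-fast behaviour might look like this sketch. It is illustrative only, not generic-pool code, and `maxWaitClients` here is the option name suggested in the comment above, not an option that existed at the time.

```javascript
// Sketch of fail-fast acquire (illustrative; `maxWaitClients` is the
// commenter's proposed name): once the waiting list is full, reject
// immediately instead of letting the backlog grow without bound.
class BoundedPool {
  constructor({ max, maxWaitClients }) {
    this.max = max;
    this.maxWaitClients = maxWaitClients;
    this.inUse = 0;
    this.waitingClients = [];
  }
  acquire() {
    if (this.inUse < this.max) {
      this.inUse++;
      return Promise.resolve({});
    }
    if (this.waitingClients.length >= this.maxWaitClients) {
      // Fail fast: the caller learns immediately instead of queueing forever.
      return Promise.reject(new Error('too many waiting clients'));
    }
    return new Promise((resolve) => this.waitingClients.push(resolve));
  }
}

const pool = new BoundedPool({ max: 1, maxWaitClients: 2 });
pool.acquire();                 // takes the single resource
pool.acquire(); pool.acquire(); // two waiters: the queue is now full
pool.acquire().catch((err) => console.log(err.message)); // rejects immediately
```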

catblade commented Jun 9, 2016

Can we add a cleanup option for such clients that calls something like "forceDestroy", settable on the client object?

Collaborator

sandfox commented Oct 30, 2016

There are now maxWaitingClients and acquireTimeoutMillis options, which should help here.

@sandfox sandfox closed this Oct 30, 2016
