Getting a connection from pool without actually running a query against it will leak it. #137

Closed
NathanaelA opened this Issue Jun 21, 2012 · 28 comments


10 participants

If I grab a connection and never send a query to the server, it will leak the connection out of the pool. Once you hit the max (the default is 10), you are hosed. I'm not sure there is a lot that can be done about this, since the library doesn't know why you got the connection or why it is still unused. Maybe add a debug-mode timer that checks how old all unused connections are and spits out an error in the event they are more than 60 seconds old. But at a minimum it should probably be documented somewhere.

Consideration for debugging purposes:
I added a "this.id = parseInt(Math.random()*10000, 10);" to the client.js p.connect routine, then printed this id on every pool log output. This allowed me to see the entire flow of a pooled connection (create, reuse, return, pause, drain, destruction) and find the missing connection. Then in my code I was able to use client.id to track it through my code. This was very useful for debugging. You might also consider adding to your pool logging how many connections are out (count - available), including in the automated destruction routine how many are still "out".

troyk commented Jun 27, 2012

What version are you using? If you look in lib/defaults.js, poolIdleTimeout should be set to 30 seconds by default, meaning that after 30 seconds your client will be destroyed and thus removed from the pool, which should cause a new client to be created on the next call to connect.

Granted, this is probably not what you want either as you'll have a bunch of reconnects going on.

In a web app, I used to grab a client via connect in the middleware and attach it to the request object as a convenience to save a callback in the main "controller" function and ran into this same problem. My fix was to emit "drain" on the response.end, but ultimately I ended up just installing plv8 on the server as dealing with transactions in an async environment turns out to be a PITA. plv8 is very cool, has a synchronous workflow yet is very performant because no round trips to the server.

+1 for better logging; I've added logging of queries to the console, etc. generic-pool, used by node-postgres, already has some debug logging, but it's not exposed via the node-postgres API. Pull request forthcoming when I get my project in a deployable state!

No, I'm aware of the poolIdleTimeout -- the issue is what happens if you grab a connection from the pool and then don't do anything with it, i.e.:

pg.connect(..., function (err, client) {
  // some corner case bypasses sending any sql to the server
  if (x && y && z) return;

  // use my client to run a query
  client.query( ... );
});
In the case where x, y & z are all true, this "client" is leaked from the pool: because it is never used, it is never put back into the pool, so it is never even affected by poolIdleTimeout or any other parameter.

However, like I said above, I'm not sure there is a fix for this: I asked for the connection, and until I use it the pg library makes the "proper" assumption that I still need it, so it leaves it alone. The only solution that seems worthwhile is adding some additional warning logging (on by default) for when the pool has 0 available connections and the number of waiting clients is greater than 10% of the pool size.

Emitting drain would work to return it to the pool. It might be better to have a helper function that does this (i.e. client.finished(), client.returnToPool(), or client.unused()) -- just something that documents this issue, because at least no one on my team had a clue this could happen, and so it was a lengthy debug session to track it through our code back to this issue.

On the logging:
I'm willing to submit my additional logging (enabled in debug mode), which basically shows the entire pool state at every connect so you can easily see if something is wrong in the pool. For example, my output is:

PGSQL dispensing connection, waiting clients=2, available in pool: 7, currently in use: 5

and

PGSQL acquiring Connection, Current Pool Count: 5 of 10, waiting in queue: 3

From that I can see immediately that I have 5 in use and 3 waiting in the queue. A few seconds later I would expect the numbers to be somewhere around the following (since those three waiting in the queue will create 3 new connections):

PGSQL Acquiring Connection, Current Count: 8 of 10, waiting in queue: 0

But if I frequently see:

PGSQL Acquiring Connection, Current Count: 10 of 10, waiting in queue: 7

then I realize that I have a couple of issues: the pool is too small, and there is a potential connection leak. If the 10 of 10 never drops back down to 0 or 1 of 10, then I know I have a leak and need to turn on additional logging to find it. :)

In addition, the debug logging has a random "id" associated with every connection, so that I can see which connection leaked and who acquired it (this.id = parseInt(Math.random()*10000, 10); at client connection creation). All the logging includes the "id", so I can easily trace an id through the log: when it is returned to the pool, any re-use, and finally its cleanup.
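A sketch of this debug-id tagging, with an illustrative log format. tagClient, poolLog, and the stats shape are assumptions for the example, not node-postgres internals:

```javascript
// Tag each client with a random id at creation and include it in every
// pool log line, so one connection can be traced through the whole
// dispense/reuse/return/destroy flow.
function tagClient(client) {
  // Same trick as described above: a random integer in [0, 10000).
  client.id = parseInt(Math.random() * 10000, 10);
  return client;
}

function poolLog(client, event, stats) {
  // stats is a hypothetical snapshot of the pool's counters.
  console.log(
    'PGSQL ' + event + ' [client ' + client.id + ']' +
    ', in use: ' + (stats.count - stats.available) +
    ', available: ' + stats.available +
    ', waiting: ' + stats.waiting
  );
}

var c = tagClient({});
poolLog(c, 'dispensing connection', { count: 12, available: 7, waiting: 2 });
```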

As for plv8 -- that is very cool, I'm going to have to look into that. I hadn't seen that project, thx! :)

Owner

brianc commented Jun 27, 2012

+1 for exposing the built-in pool logging from node-pool since it's already there. As for adding debug logging into node-postgres in other places, I'm not a fan of that. If you really want to add logging, you can monkey-patch around the methods you need to log, or log when you call into the node-postgres library. I find having different modules that all log in a different way annoying (like the way express and socket.io log for you by default). Logging should be handled in the application layer, since it is infinitely varied depending on the needs and likes of the developer reading the log.

I view requesting a connection from the pool and not using the connection as a code bug, actually. If you take a connection from the pool, you had better use it. If you don't use it, you can manually emit a drain event from the connection, I suppose, to give the connection back to the pool. In your example, the

// some corner case bypass sending any sql to server
if (x && y && z) return;

should be moved outside of the pg.connect call. If it absolutely can't be, you can always call client.emit('drain') to signal you're done. Any automatic check-in of "idle" connections is going to lead to indeterminate behavior in the case where someone actually wanted to keep one of the connections checked out from the pool for a "very long time" for whatever reason (i.e. using client.query('LISTEN event_name') on one of the connections).

Oh, I don't disagree, Brian -- that is the best solution. I am now verifying, outside the connect, whether the variables that typically get built and sent to the server are null, and bypassing the connect entirely in the event that all of them are null. :-) And I would also consider it a bug in our code. However, this is a nasty gotcha if you don't know about it. I fully understand the reasoning why it works the way it does, and I'm not advocating any sweeping changes; all I'm advocating is something like the following.

Adding a simple:

(pseudo-code) if (count == factory.max && waiting > factory.max/4) console.warn("Potential connection leak in pgpool, or your pool size should be increased.");

which would be a quick piece of debug code to make sure this is caught early, rather than much, much later after a very long debug session. :-)
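The pseudo-code above can be written as a runnable sketch. The function name is hypothetical; the factory object mirrors generic-pool's terminology, and the max/4 threshold is the illustrative value from the comment, not a tuned constant:

```javascript
// Warn when the pool is fully checked out and a sizable queue has
// built up: either a connection leak or an undersized pool.
function checkForPoolLeak(count, waiting, factory) {
  if (count === factory.max && waiting > factory.max / 4) {
    console.warn('Potential connection leak in pgpool, ' +
                 'or your pool size should be increased.');
    return true;
  }
  return false;
}

// 10 of 10 checked out and 7 callers queued: likely a leak.
var leaky = checkForPoolLeak(10, 7, { max: 10 });   // true
// 5 of 10 checked out and nothing queued: healthy.
var healthy = checkForPoolLeak(5, 0, { max: 10 }); // false
```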

My team probably won't get caught by this again; I want to make sure tomorrow someone else doesn't get caught by it.

We were also caught by this. In our previous code base (before we added caching) we fetched a database connection for each user request. In the cases where one wasn't used because the user made an invalid request, the connection was leaked. Until I saw this issue, we never actually took the time to find this problem. Now that we ask for connections lazily, it all works.

However, I think the way this pool functions is fundamentally broken. Having the pool code guess when to free the connection is all right only if it is 100% correct. Since it isn't, it should not accept that responsibility. If it were me, I'd make the connection objects proxies that only actually connect when they are first used. That way, you wouldn't leak them.
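A minimal sketch of that proxy idea, with hypothetical names (makeLazyClient, acquireReal); a real integration with the pool would differ:

```javascript
// The pool hands out a lightweight shell, and the real connection is
// only checked out on the first query. A shell that is never queried
// never takes anything from the pool, so it cannot leak.
function makeLazyClient(acquireReal) {
  var real = null;
  return {
    query: function (sql, cb) {
      if (real === null) real = acquireReal(); // connect on first use only
      return real.query(sql, cb);
    },
    isConnected: function () { return real !== null; }
  };
}

// Demo with a stub: taking a client but never querying costs nothing.
var checkouts = 0;
var stub = { query: function (sql, cb) { if (cb) cb(null, { rows: [] }); } };
var lazy = makeLazyClient(function () { checkouts += 1; return stub; });

console.log(lazy.isConnected(), checkouts); // false 0 -- nothing checked out
lazy.query('SELECT 1');
console.log(lazy.isConnected(), checkouts); // true 1
```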

Also, putting a warning somewhere might also be a good idea.

Owner

brianc commented Jul 6, 2012

I like the idea of returning a proxy which only connects if it receives a query.

Contributor

strk commented Jul 13, 2012

Did the debug code ever enter the codebase? I have code starving while waiting for a callback after exactly 5 connection failures, and I'm wondering if it's due to this problem.

Owner

brianc commented Jul 13, 2012

It hasn't entered the codebase. Do you have a failing test case for this?

Contributor

strk commented Jul 13, 2012

On Fri, Jul 13, 2012 at 08:51:02AM -0700, Brian Carlson wrote:

> It hasn't entered the codebase. Do you have a failing test case for this?

Not really. I'm chasing a heisenbug but can't really tell yet if it's rooted
in node-postgres: Vizzuality/CartoDB-SQL-API#38

--strk;

Contributor

strk commented Jul 13, 2012

On Fri, Jul 13, 2012 at 06:04:56PM +0200, strk wrote:

> > It hasn't entered the codebase. Do you have a failing test case for this?
>
> Not really. I'm chasing a heisenbug but can't really tell yet if it's rooted
> in node-postgres: Vizzuality/CartoDB-SQL-API#38

Oh, by the way, the tests do run fine for me, but I guess none of the tests
actually communicates with a PostgreSQL server, right? (I'm talking about make test.)

--strk;

Owner

brianc commented Jul 13, 2012

try make test-all connectionString=pg://username:password@host:5432/database

make test only runs the unit tests, not the integration tests

Contributor

strk commented Jul 16, 2012

On Fri, Jul 13, 2012 at 01:06:59PM -0700, Brian Carlson wrote:

> try make test-all connectionString=pg://username:password@host:5432/database

Ah, great!

May I suggest having the required tables created as part of it?
To avoid this:

It is possible you have not yet run the table create script under script/create-test-tables

--strk;

Owner

brianc commented Jul 25, 2012

I'm going to talk to felixge sometime soon about connection pooling, and hopefully we'll have a slightly more elegant solution once we put our heads together.

Collaborator

booo commented Jul 25, 2012

@NathanaelA asked for a client.release function. If we can just fake a drain event this should be easy to implement. What do you think?

Owner

brianc commented Jul 25, 2012

Yeah, I think the methods on the pooled clients dealing with checking in/checking out should semantically match the actual operations. I've been thinking a lot about this recently and it will probably require a new-ish API for using the pool. I'm thinking actually of implementing the pool in a stand-alone module, but I'm still not sure about that. What do you propose the client.release function would do?

mattly commented Jul 26, 2012

FYI we were just bit by this pretty hard, eventually causing our app to hang. I've removed all the paths through our code where we would grab a client and not do anything with it, solving the immediate problem.

I'm considering doing something with a process.nextTick check against the queue size to fire drain, especially since we can't (immediately) upgrade to node 0.8 and use newer stuff. Would such a patch be useful?

Collaborator

booo commented Jul 26, 2012

client.release would release an unused (maybe even a used) client back to the pool. An unused client is a client on which we never called the query function. At some point in the current code base we execute pool.release(client) when we receive a drain event. We should add a function (client.release) that calls this function (pool.release) if we did not use the client. Otherwise I don't see a way to return an unused client to the pool at the moment. Of course you can emit a drain event, but that's a hack...

@mattly I don't get your process.nextTick idea. If you don't mind, could you provide us with a pull request so we can discuss the code?

Contributor

cosbynator commented Jul 30, 2012

For what it is worth, I seem to be hitting this issue too. The way I structure my code is to apply a "decorator" to all my data functions that checks out the postgres client and returns it whenever the callback is executed (this is similar to the .pooled method on node-pool: https://github.com/coopernurse/node-pool#pooled-function-decoration).

Since the pool handling is done outside my function, it is tough to ensure that I don't have a return path that discards the client without use. client.release would solve my problem, and a proxy would make this structure impossible.

I'm thinking about just using node-pool manually since my use-case might be too hard to bake in.

Contributor

mjackson commented Aug 30, 2012

It seems like the API in this case is backwards. We're instantiating and obtaining connections and then we're stuck with the problem of when to drain/release them back into the pool.

Just thinking out loud here, but wouldn't it be interesting to have an API like this:

var query = new Query('select * from users');
var config = 'postgres://localhost/my_database'; // could be an object of params as well

query.run(config, function (err, results) {
  // do something with the results
});

In this API the Query object is king. You can build a complex query in memory and it won't actually connect to the database until you run it. This would free the user from needing to worry about holding on to client and connection objects as the underlying code could determine all of that for you. It could even be smart about reusing existing connections to the same host.

The run method could even return an instance of the client it's using in case you really need it for something. For example, @brianc's LISTEN query could look something like this:

var query = new Query('LISTEN my_channel');
var config = 'postgres://localhost/my_database';

var client = query.run(config, function (err, results) {
  // do something with the results
});

client.pauseDrain();

This way you could explicitly tell the client that you want it to hang around for a while and not be automatically collected. Just some thoughts, but it would help to resolve this issue and some of the headaches around managing the client pool.

How are you envisioning transactions?

Contributor

mjackson commented Aug 30, 2012

@nick-apperson Off the top of my head, I think we could just handle them in a similar fashion.

var trans = new Transaction();

trans.add(query1);
trans.add(query2);
trans.add(query3);

trans.run(config, function (err, results) {
  // do something with the results
});

If you needed results on a per-query basis, you could aggregate them in the returned results object.

@mjijackson That behavior would be broken. Often, when you use transactions, you do not know when you start how many queries you need. Perhaps it would be better to do:

var trans = new Transaction();

trans.run(query1, callback1);
trans.run(query2, callback2);

trans.commit(final_callback);

Contributor

mjackson commented Aug 30, 2012

That looks good to me.

About bringing transactions into the library interface: IMHO, it will only complicate the library without a good return on investment. In its current use, a connection/client stands at the perfect point as a container for transactions or any other grouping of queries. Once you start bringing SQL into the library, the need for adding more and more containers will never end. Fortunately this is not a Java library.

In one application, I switch between schemas, run a few queries, and then switch to another. In this application, multiple jobs are done concurrently, so I need each job to keep things separate. These groups of queries could of course be contained in separate transactions, but that would complicate things that are already really simple.

I believe the library is a good example of Node's philosophy of keeping things simple and playable so people can play with them as they need. Suggested implementations are trivial to implement so anybody can do it if they wish that convenience in their application.

Bringing these into the library itself will break the beauty of simplicity it currently has.

For the original issue, I believe keeping the solution in the implementation of the client object is the correct way to go. Making the client object separate from an actual connection, and acquiring the actual connection only when the first query arrives, looks like a very reasonable solution. It is simple and beautiful, and it solves the problem without any change to the API. On the other hand, returning a client object from a query exec call is not a nice way of carrying implementation details up to the API level.

These are just my humble opinions as I stated at the beginning.

Owner

brianc commented Aug 31, 2012

ugg

I just typed a huge response and accidentally deleted it. If you have it in your email, would you be so kind as to repost it here or forward it to me?

The tl;dr version is this -

  1. I stand in full accord with everything @hasanyasin has said. He said it better than I could have.

  2. What we want is for the following code to not block the event loop and to allow for clean program exit, because nothing was checked out from the pool, correct?

var pg = require('pg');
var connectionString = 'pg://blabla@blabla:5432/bla'

for(var i = 0; i < 100; i++) {
  pg.connect(connectionString, function(err, result) {
    //do nothing
  });
}

We want connections to be lazily initialized only on the first call to Client#query. This should hopefully fix the large and original gotcha to which this issue relates.

From brianc:

"I absolutely agree with @hasanyasin -- honestly he said it better than I could have with regards to a transactions API

I believe the library is a good example of Node's philosophy of keeping things simple and playable so people can play with them as they need. Suggested implementations are trivial to implement so anybody can do it if they wish that convenience in their application.

It might be good to come up with a wrapper module you enjoy using with transaction support, sql support and so on for a higher-level API. I definitely think the pooling logic within node-postgres needs some iterations or tweaks to get rid of the large gotcha!'s it currently contains, but I aim for "the simplest thing that could possibly work" valuing correctness, explicitness, and error-free over succinctness or high-level API.

This is, after all, a database client and not a fully implemented data access layer. The less we can assume for our consumers of the API here the less our consumers, present and future, will have to force their code to work around our assumptions or monkey-patch their way back to their version of code sanity.

Now, to the implementation of the lazy client, what we want is something like this?

var pg = require('pg');
var config = 'some completely invalid connection information';

for(var i = 0; i < 100; i++) {
  pg.connect(config, function(err, client) {
    //do nothing what-so-ever with the client
  });
}
Executing this should not cause the app to hang. In fact, almost no code within node-postgres should execute, because no queries were issued to clients. Right? I think this is totally doable. Would it fix the initial concern and make the library more usable and less error-prone? I'd like to think so."
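The scenario in that quoted snippet can be sketched with a stand-in for such a lazy connect. lazyConnect here is hypothetical, not the pg API; it just demonstrates the desired behavior:

```javascript
// 100 calls to connect that never issue a query should check nothing
// out of the pool, because the shell only connects on first query().
var realConnections = 0;

function lazyConnect(config, cb) {
  // Hand back a shell; only query() would open a real connection.
  cb(null, {
    query: function (sql) {
      realConnections += 1;
      return { sql: sql };
    }
  });
}

for (var i = 0; i < 100; i++) {
  lazyConnect('pg://blabla@blabla:5432/bla', function (err, client) {
    // do nothing what-so-ever with the client
  });
}

console.log(realConnections); // 0 -- no real connections were made
```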

I agree that bringing transactions into the system would be unnecessary and undesired. However, there is connection pooling built in that is fundamentally broken. I wrote a connection pool for our redis connections in about an hour. Connection pooling is easy to do. What is not easy is working with connection pooling that is broken. I'd honestly prefer it just wasn't in the node pg module at all.

Owner

brianc commented Aug 31, 2012

Thanks @nick-apperson...I accidentally clicked "delete" and then, like always, clicked the confirmation box without reading it. Impatience FTL

Owner

brianc commented Feb 22, 2013

I've changed the pool substantially so that it no longer relies on the drain event at all. No pauseDrain, no resumeDrain. Documentation is a work in progress.

https://github.com/brianc/node-postgres/wiki/pg
#274

@brianc brianc closed this Feb 22, 2013
