blocking node with large result sets #181

houmanb opened this Issue Sep 2, 2012 · 7 comments

4 participants


We have a table with more than 1x10ˆ6 entries. We use the evented API for data retrieval.
Although ignoring the row event (for debugging purposes) the node server is blocked during data processing.
Using the non-native mode the node server is not blocked but responding extremely slow (up to 5 secs instead of 100 mis).
In the native mode the node server is completely blocked and allocates more than 2Gig of RAM.

In order to dive into the problem we tried to use your own "large-datatset-bench.js" with slightly different number of rows (1x10ˆ6).
The execution times with respect to number of items to be inserted does not increase in a linear manner (as expected).

What we have noticed during both test cases was the fact that the Postgres backend processes did not use more than 1.5 % of CPU time.


Yikes. The only thing I can think of is postgres is pushing you rows as fast as is possible and they're taking all the memory available. Though the API is evented, it can still get overwhelmed by a back-end system sending you 1x10^6 rows. Are you using a query with parameters or one without?


Hi, thanks for your response.
We use only non parametric statements (i.e. not prepared statements). Our statements look like this

SELECT T.col_1, T.col_2, T.col_3, ... FROM T;

We actually do not load all the data into the memory.

Maybe the issue is somehow correlated with the following behavoir.
In the file "large-datatset-bench.js" located in the "./pg/benchmark" directory (after adding the missing value conString)

var conString = 'postgres://postgres@';

I was able to run the large-dataset-bench.js. The "pure-javascript benchmark." works fine and the native test ends with a segfault and terminates (which is not the issue here).
But if I increase the number of rows by multiplies of 10 (i.e. starting from your value 10000 to 100000 and then to 1000000) I get the non linear behavior I mentioned in my last post.

Here my results:

loading all rows into memory:

average(10000) = 110.9 ms ms
average(100000) = 1200.9 ms
average(1000000) not as expected around 12000 ms

The first encounter with this issue was the following situation: Given a database (DB1) containing a table with 10^6 rows and another database (DB2) with an empty copy of the same table. We created two connections with both databases (evented API) and tried to read form DB1 and stream the data into DB2 on each 'row' event. Did we abuse the driver? Or did we use it in a manner it was never meant to?

thanks in advance


I have an issue with Node blocking as well. Ive structured my query method using both the callback and event listener methods and while I get results back from both, the following scenario blocks.

Client requests query of about 180,000 rows. an instance pg client is created and the query is sent to a remote server on the same network. The query takes about 10 seconds to run. All my console.log debugging message fire immediately alerting me that all my methods have finished save the query, which is how node is supposed to work.

The client sends a second request 1 second later for a dataset of 500 rows. The client uses $.post in both instances and I know it isn't blocking for the request. My server will not take my second request until the dataset from the first request has been returned. Is there a problem with multiple async requests from the same client? Am I missing something? Here is my code.
pgClient = require 'pg'
@dbquery = new pgClient.client {'host': host, 'user': user, 'password': pass, 'port': port, 'database': database}
@dbQuery.query qString, (err, result) =>
if err
console.log 'Error in getData'
response.send result.rows.

Thanks in advance.


Does this help?

Single row mode in libpq is only available for 9.2+ though. Regardless it seems like it would be worth implementing. I have been wondering how to handle very large result sets and this seems like the solution as long as you are using 9.2+.

You can also use FETCH.


There is another use case for supporting single row mode which I just ran into. Let's say you want to do a COPY but want the output of a particular column reformatted. For example, "true" instead of "t" for boolean columns. As far as I am aware, you can't do this with COPY or at least not easily. But you could do this with a SELECT * FROM T and use single row mode to avoid exceeding RAM.

@rpedela rpedela added a commit to rpedela/node-postgres that referenced this issue Aug 22, 2013
@rpedela rpedela #181 #366 Add support for single row mode in Postgres 9.2+. This will…
… enable single row mode only when the user wants to stream rows.
@rpedela rpedela added a commit to rpedela/node-postgres that referenced this issue Sep 4, 2013
@rpedela rpedela #181 #366 Add support for single row mode in Postgres 9.2+. This will…
… enable single row mode only when the user wants to stream rows.

Single row mode would indeed help with this but still wont properly take into account back-pressure when you are reading rows faster than you can process them. The problem is libpq by default and always until 9.2 selected everything into client memory before making any rows available to the application. If you really want to do this -- and I do this all the time -- the best way is to use pg-cursor and pg-query-stream. These will efficiently stream an unlimited data and handle back-pressure properly. I use it to daily pipe a few gigs of database results into elastic search without every going above 200 megs of node memory.

@brianc brianc closed this Oct 20, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment