
Pass several queries to Postgres at the same time and have the results returned together #1190

Closed
rightaway opened this issue Jan 6, 2017 · 13 comments

Comments

@rightaway

rightaway commented Jan 6, 2017

It would be useful if we could combine multiple queries into a single query and send them all to the server at once (for performance reasons), rather than as multiple separate queries, with the results returned together.

select firstfunction($1, $2); select secondfunction($3, $4, $5, $6); select thirdfunction($7);

In this example there would be only one round trip from the application to the database rather than three, and something like an array of 3 results, one per query.

I could write a Postgres function for things like this, but that's less useful when different parts of the application send different combinations of queries to the server; you wouldn't want to create a separate Postgres function for every combination of queries you might send.

I see https://github.com/brianc/node-postgres/wiki/Parameterized-queries-and-Prepared-Statements#multi-statement-parameterized-queries but I mean batching the queries on the client (node-postgres) rather than in Postgres itself.

@abenhamdine
Contributor

It's not currently implemented in node-postgres; there's a pending PR for this: #1110
It would indeed be a useful feature.

However, I don't know if a real performance improvement can be expected from this.

@rightaway
Author

I don't know if a real performance improvement can be expected from this.

Wouldn't it increase performance by saving round trips to the database? Why do you think it might not have a real improvement?

@abenhamdine
Contributor

Because I expect the main bottlenecks to be the query execution time and the result parsing time.
However, I could be wrong.
It depends on your infrastructure too.

@manueltimita

manueltimita commented Mar 2, 2017

Perhaps this issue should be closed, since #1110 is in the works. However, solving this problem is more complex than one thinks (transactions, anyone?), so I wouldn't bet my money on that PR.

Here is a potential approach:

Firstly, database connections have significant overhead, and if you know you'll be opening many of them, the best approach is to use pgbouncer alongside node-postgres. See #975 for how this could work, and this blog post to understand why it matters (pay attention to the difference between session pooling and transaction pooling, otherwise you might get inconsistent results, as in this example: http://stackoverflow.com/questions/13707977/how-do-transactions-work-with-node-postgres)

Secondly, you need a way to run parallel queries from your Node.js app. There is a brilliant library called async which lets you push multiple functions into an array and then execute them in parallel or in series. Once all the results are collected, it runs the final callback.
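The same pattern can be written today with built-in Promises instead of the async library. A minimal sketch, assuming `pool` is a pg.Pool instance whose `query` method returns a Promise; the SQL texts and the `runTogether` helper are illustrative, not part of any API:

```javascript
// Run several independent queries in parallel on pooled connections and
// collect the results together once all of them have resolved.
// `pool` is assumed to be a pg.Pool; the query texts are illustrative.
async function runTogether(pool) {
  const [first, second, third] = await Promise.all([
    pool.query('select firstfunction($1, $2)', [1, 2]),
    pool.query('select secondfunction($1, $2, $3, $4)', [3, 4, 5, 6]),
    pool.query('select thirdfunction($1)', [7]),
  ]);
  // One entry per query, in the order the queries were listed.
  return [first.rows, second.rows, third.rows];
}
```

Note this still costs one round trip per query; it only overlaps the waits rather than eliminating them.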

Wouldn't it increase performance by saving round trips to the database? Why do you think it might not have a real improvement?

Unless you connect to a database in a different datacenter (which is not always recommended anyway), latency should not be a big problem.

@matt212

matt212 commented May 12, 2017

hi,
I agree with @rightaway. Even if we exclude inserts/updates, the common paging scenario of one query plus a separate count(*) query, returning two result sets {[result set of query 1][result set of query 2]}, would have a significant impact on overall application performance.

Consider the factors below (correct me if I've misunderstood the driver):

  1. time setting up the TCP connection
  2. handshaking with the pg server
  3. performing authentication and session setup
  4. allocating memory in Node.js
  5. constructing the packet buffers to send to the pg server
  6. the network transfer time of sending the packet
  7. the query execution time, then the time to transfer the results back
  8. more memory allocation in Node.js
  9. parsing the received packets
  10. casting the types into native JavaScript types

Hence, if we discard multi-statement result set segregation and instead use async/await, all ten steps above are repeated per query: 10 × (number of queries).
That becomes an enormous load as the number of users increases.
I agree async/await is the current titan of parallel processing, but even in parallel, repeating those ten steps two or three times over has a huge impact on overall performance.
Hence, we need multi-statement result set segregation.
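The round-trip arithmetic in the discussion above can be put in rough numbers. A minimal sketch; the latency and execution figures are illustrative assumptions, not measurements:

```javascript
// Rough wall-clock cost of n sequential queries vs. one batched round trip.
// The network round trip dominates when the server is far away; the figures
// used below are illustrative assumptions, not measurements.
function totalLatencyMs(queryCount, roundTripMs, executionMs, batched) {
  const roundTrips = batched ? 1 : queryCount;
  return roundTrips * roundTripMs + queryCount * executionMs;
}

// 3 queries, 50 ms round trip, 2 ms execution each:
totalLatencyMs(3, 50, 2, false); // 156 ms issued one at a time
totalLatencyMs(3, 50, 2, true);  // 56 ms in a single round trip
```

With a local database (sub-millisecond round trips) the two numbers converge, which is the core of the counter-argument made later in this thread.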

@brianc
Owner

brianc commented May 12, 2017

Sorry, this isn't going to have the effect you expect. Postgres can only execute one query at a time - if you try to execute a second query on a connection while the first is still in progress, postgres will reply with an error message saying the second query cannot be executed. Even if it were possible, batching them and sending multiple will yield very, very little performance improvement in your application, other than potentially allocating a few fewer buffers by sending all three query texts out at once. The backend traffic you receive will again be only a few packets smaller - I'm talking like 64 bytes * number of queries - and if you are that resource constrained you should probably be using a systems programming language. Furthermore, Postgres itself does not support having multiple parameterized statements executed in a 'batch' - they each must be sent separately and each can only contain one statement.

The existing pull request, which I'd unfortunately overlooked until now 😬, is for when you send several non-parameterized queries in a single call - which postgres does support. In this circumstance node-postgres behaves incorrectly, smashing all results into a single array instead of one array per query response. That's definitely something that should be fixed. Unfortunately the PR went stale, so I might have to fix it up myself. Supporting multiple parameterized queries in one call is never gonna happen unless the internals of postgres substantially change.

As far as @matt212's analysis - it's not exactly how node-postgres works under the covers. Connection & auth handshake happens very rarely in production as they're pooled and reused. The process overhead introduced into node w/ allocations & network packets by executing a single query with node-postgres is so incredibly small it's really not worth worrying about. 99% of the time your database is gonna be the bottleneck in production. I've definitely been wrong about perf in the past & I'd love to see low hanging fruit being found & fixed, but I'd suggest you do some benchmarks and profiling locally - you'd be surprised at what does and does not cause node / v8 to slow down.

Another thing you could do is write an extension or additional module which consumes the node-postgres library - particularly the connection which handles parsing and connection at a very low level & see if you can come up w/ any cleverness there.

@brianc
Owner

brianc commented May 12, 2017

I'm going to close this but keep that old stale PR open so I'll have a reminder to fix it. Thanks for drawing it to my attention! ❤️

@brianc brianc closed this as completed May 12, 2017
@matt212

matt212 commented May 13, 2017

hi @brianc,
thanks for the much-needed clarification! I opted for and implemented async to achieve the desired objective.
many thanks!

@snailmail123

Did this ever get implemented?

@charmander
Collaborator

@leafygreens88 Sort of. You can get results back from multiple queries, but not ones with parameters.

@rightaway
Author

Furthermore, Postgres itself does not support having multiple parametrized statements executed in a 'batch' - they each must be sent separately and each can only contain one statement.

@brianc Does this mean these queries can't be batched since they take parameters?

client.query("select * from function1($1, $2)", [value1, value2])
client.query("select * from function2($1, $2)", [value1, value2])

But these can? It will be one round trip to the database?

client.query("select * from function1('value1', 'value2'); select * from function2('value1', 'value2');")

How will the results come back in the second one - is it an array of result sets, each transformed by the pg.types.setTypeParser parsers we've added?

@abenhamdine
Contributor

abenhamdine commented Apr 26, 2020

Furthermore, Postgres itself does not support having multiple parametrized statements executed in a 'batch' - they each must be sent separately and each can only contain one statement.

@brianc Does this mean these queries can't be batched since they take parameters?

client.query("select * from function1($1, $2)", [value1, value2])
client.query("select * from function2($1, $2)", [value1, value2])

right, can't be batched

But these can? It will be one round trip to the database?

client.query("select * from function1('value1', 'value2'); select * from function2('value1', 'value2');")

yes, it should work

How will the results come back in the second one, is it an array of result sets that are each transformed based on pg.types.setTypeParser we've added?

Yes, IIRC each result is parsed as if it came from a single-statement query.
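A minimal sketch of consuming such a result, assuming the fixed behaviour brianc describes (one result object per statement when a single query string contains several statements). The `resultsAsArray` helper is a hypothetical convenience, not part of the pg API:

```javascript
// A multi-statement simple query yields one result per statement, while a
// single-statement query yields a single result object, so normalise both
// shapes into an array before iterating. Hypothetical helper, not pg API.
function resultsAsArray(queryResult) {
  return Array.isArray(queryResult) ? queryResult : [queryResult];
}

// e.g. const res = await client.query(
//   "select * from function1('value1', 'value2'); select * from function2('value1', 'value2');");
// for (const result of resultsAsArray(res)) {
//   console.log(result.rows); // rows already run through pg.types parsers
// }
```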

@bfelbo

bfelbo commented Oct 27, 2020

99% of the time your database is gonna be the bottleneck in production. I've definitely been wrong about perf in the past & I'd love to see low hanging fruit being found & fixed, but I'd suggest you do some benchmarks and profiling locally - you'd be surprised at what does and does not cause node / v8 to slow down.

@brianc, is this also the case when the DB is located far from the node.js server?

We need to have a single server communicating with DBs in different regions, so network latency is a big factor. Here it seems like allowing multiple parameterized queries to be sent in a single network request would speed things up substantially, even though they run one-by-one on the Postgres DB.

Sadly, I can't find any posts on the Postgres mailing lists about support for sending multiple parameterized queries.
