[watermill-sql] Allow batch reading #218
Following @breml, I've got a similar question about the MySQL implementation. https://github.com/ThreeDotsLabs/watermill-sql/blob/a2768559a9c416c1d8b5fe506401dc51639abb63/pkg/sql/subscriber.go#L249 Is it possible to optimise the reads?
Today I did some initial tests with batch reading, and it is not that easy (I am also no longer sure it will bring a performance gain). The workaround to the problem described above would be to first read all rows (e.g. 100) into memory, close the sql.Rows, and only then update the offset_consumed field in the DB. I did not implement or test this, because I am no longer convinced that it will bring the performance gain I was hoping for.
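The workaround described above can be sketched roughly as follows. This is a minimal illustration, not watermill-sql code: the names (`rowIterator`, `message`, `drainBatch`) are hypothetical, and `rowIterator` stands in for `sql.Rows` so the draining logic can be shown without a live database. The key point is that the cursor is fully drained and closed before offset_consumed would be touched.

```go
package main

import "fmt"

// message is a placeholder for a row read from the messages table.
type message struct {
	offset  int64
	payload []byte
}

// rowIterator abstracts the parts of sql.Rows that matter here.
type rowIterator interface {
	Next() bool
	Scan(m *message) error
	Close() error
}

// drainBatch reads up to batchSize messages into memory and closes the
// iterator. Only after it returns would offset_consumed be advanced (to
// the offset of the last message in the batch) in a separate statement.
func drainBatch(rows rowIterator, batchSize int) ([]message, error) {
	defer rows.Close()
	batch := make([]message, 0, batchSize)
	for len(batch) < batchSize && rows.Next() {
		var m message
		if err := rows.Scan(&m); err != nil {
			return nil, err
		}
		batch = append(batch, m)
	}
	return batch, nil
}

// sliceRows is a trivial in-memory rowIterator used only for this demo.
type sliceRows struct {
	msgs []message
	i    int
}

func (s *sliceRows) Next() bool {
	if s.i >= len(s.msgs) {
		return false
	}
	s.i++
	return true
}

func (s *sliceRows) Scan(m *message) error {
	*m = s.msgs[s.i-1]
	return nil
}

func (s *sliceRows) Close() error { return nil }

func main() {
	rows := &sliceRows{msgs: []message{{offset: 1}, {offset: 2}, {offset: 3}}}
	batch, _ := drainBatch(rows, 2)
	fmt.Println(len(batch), batch[len(batch)-1].offset) // 2 2
}
```

Because the offset update happens after the batch is in memory, a crash between the read and the update re-delivers the whole batch, which is why this trades exactly-once for at-least-once delivery.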
I think that in theory it should help to achieve some performance gains. The downside is that we need to accept losing exactly-once delivery. But I'd not expect Kafka-like performance - that was not a goal of this Pub/Sub 😉 What I can recommend is to do some TDD and experiment with reading more rows at once and committing in a separate transaction. By making small changes and trusting the tests, you should be able to develop something that works :) Please keep in mind that you will need to change the test features a bit: https://github.com/ThreeDotsLabs/watermill-sql/blob/master/pkg/sql/pubsub_test.go#L175 (due to the loss of exactly-once delivery, for example).
@roblaszczak @m110 Some thoughts based on my current experience with pub/subs :) One of the arguments for using batch reading is that it enables batch processing. Of course, one-by-one processing provides delivery order by design, but in practice you may need to guarantee message order only in certain cases; most use cases don't bother with message ordering. In cases where we really do need message ordering in SQL, it could be offered as an opt-in feature, similar to Google Pub/Sub message ordering. For example, we read a batch of messages and, grouping them by ordering key, send the groups concurrently: [1], [2], [7], [3,5], [4,6]. Batch processing in general has downsides, like partial success and atomicity (when only part of a batch was processed), but I think that with "at-least-once" semantics (which are fundamental for pub/subs) this should not be a problem.
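The per-key grouping idea above can be sketched in a few lines. This is a hypothetical illustration, not watermill API: messages sharing an ordering key stay in one ordered group, and distinct groups can be dispatched concurrently. With keys a,b,c,d,c,d,e assigned to messages 1-7, the groups come out as in the comment's example.

```go
package main

import "fmt"

// msg pairs a message id with its ordering key (illustrative only).
type msg struct {
	id  int
	key string
}

// groupByKey partitions messages by ordering key, preserving the
// original order inside each group. Each resulting group must be
// processed sequentially; different groups may run concurrently.
func groupByKey(msgs []msg) map[string][]int {
	groups := make(map[string][]int)
	for _, m := range msgs {
		groups[m.key] = append(groups[m.key], m.id)
	}
	return groups
}

func main() {
	batch := []msg{
		{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"},
		{5, "c"}, {6, "d"}, {7, "e"},
	}
	groups := groupByKey(batch)
	// Prints the groups [1] [2] [3 5] [4 6] [7] from the example.
	for _, k := range []string{"a", "b", "c", "d", "e"} {
		fmt.Println(k, groups[k])
	}
}
```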
As part of fixing #311, I added experimental support for batch reading for Postgres here: ThreeDotsLabs/watermill-sql#22 (warning: this is a proof of concept so far).
The current implementation of watermill-sql reads exactly one message at a time (PostgreSQL implementation: https://github.com/ThreeDotsLabs/watermill-sql/blob/master/pkg/sql/schema_adapter_postgresql.go#L66).
I expect a significant subscriber performance improvement if the subscriber could read messages in batches of configurable size, e.g. 10 or 100 items per SELECT.
My question is whether there are conceptual arguments against this idea.
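The change being proposed amounts to making the hard-coded `LIMIT 1` in the subscriber's SELECT configurable. The sketch below is hypothetical: the table and column names are illustrative, not the actual watermill-sql schema, and `selectBatchQuery` is not a real function in the library.

```go
package main

import "fmt"

// selectBatchQuery builds a batched variant of the subscriber's SELECT:
// instead of LIMIT 1, it reads up to batchSize rows past the consumed
// offset. Column and table names are assumptions for illustration.
func selectBatchQuery(table string, batchSize int) string {
	return fmt.Sprintf(
		`SELECT "offset", uuid, payload, metadata FROM %s WHERE "offset" > $1 ORDER BY "offset" LIMIT %d FOR UPDATE`,
		table, batchSize,
	)
}

func main() {
	fmt.Println(selectBatchQuery("watermill_example_topic", 100))
}
```

The batch size would only interpolate a caller-supplied integer here; the offset itself stays a bound parameter (`$1`), so no user data enters the query string.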