It is possible for subscriptions to miss messages when chasing the tail under high concurrent load #121
Comments
In SQL Server at least, you may be able to handle the uncommitted case (the 2nd case) by taking an sp_getapplock per sequence number. Since it is a reader/writer lock, you can block readers while the position is being written; if a reader isn't blocked, then it was a rollback. Unfortunately the other DBs may not have similar capabilities. |
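A minimal T-SQL sketch of that idea. sp_getapplock is the real procedure; the resource naming and surrounding flow are illustrative assumptions, not SQLStreamStore code:
```
-- Writer: inside the append transaction, take an exclusive app lock named
-- after the position being written; it is released automatically when the
-- transaction commits or rolls back.
BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = N'position:42', @LockMode = 'Exclusive',
                   @LockOwner = 'Transaction';
-- ... insert the message row for position 42 here ...
COMMIT;

-- Reader: on seeing a gap at position 42, probe it with a shared lock.
BEGIN TRANSACTION;
DECLARE @result int;
EXEC @result = sp_getapplock @Resource = N'position:42', @LockMode = 'Shared',
                             @LockOwner = 'Transaction', @LockTimeout = 0;
IF @result >= 0
    PRINT 'Acquired immediately: no writer in flight, the gap was a rollback.';
ELSE
    PRINT 'Blocked/timed out: an append for this position is still in flight.';
ROLLBACK;
```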
How is the thinking about this going? As we're doing some acceptance tests trying to incorporate SQLStreamStore into a new app, we're finding we can easily replicate the case under load where many events are missed by the subscriptions (~5-10 of 500,000). We haven't tried the proposed PR, but we have a very hacky workaround just to prove that this is really the problem. While at a gut level I don't like the delay approach any more than you appear to, this is a really serious problem that results in data integrity and trust issues, so I'm resigned to the delay. If we don't want to incorporate that into the base library, could we at least expose an IHandleEventStoreGaps or something we can provide a custom implementation for (so we don't have to fork the library)?
Also, we've found that there's another hole in the base implementation in addition to what's described here. When there's only a single message on a page, and the gap that's returned is just before it, that message is missed too. Our wrapper outputs these messages:
00:23:22 Returning contiguous page from base 20813..20815 <= page right before the problem
Aside from this issue - this is a really great library - very happy with its improvements over NEventStore |
Hi, @areicher, we feel your pain, and I'm personally setting aside some time this weekend to perform a deep dive into this very problem. Trust me when I say we want this fixed sooner rather than later. I hope to come back to you with good news. |
@areicher |
Thanks @yreynhout - that would be great! In case it's helpful: today we copied enough of the classes into our library to apply a temporary fix for the very short term (still based on the idea of delaying, but then returning the contiguous subset of the page). If you're interested in looking at that for your implementation for some reason, the file's attached (only the ReadAllForwards has relevant changes). In addition to what I mentioned earlier, we think there's still another risk in only looking at the last page for this, since the boundary is (for this purpose) arbitrary and the gap could appear across it. The attached impl has a time-based filter instead of a page-based one, with the idea that for our case the ES is both the reader and writer, so time synchronization shouldn't be a problem. |
Before we start it's useful to explain the actual problem with our SqlStreamStore implementation on top of SQL Server. The position of a message in the virtual all stream is based on an IDENTITY column. Each message that is appended to a stream gets its own unique position within the all stream. Under normal circumstances, you can think of the position as being part of a monotonically incrementing sequence. However ... each time a stream-append-transaction is rolled back, whatever may be the reason (e.g. wrong expected version, a crash or fail-over, etc...), there's a chance that those positions are lost (this is by design - blame Microsoft SQL Server). In essence, what this means is that positions are NOT monotonic. Phrased differently, you can expect gaps. For example, [0,1,2,3,4,5,6,7,1000,1001,1002,...] is a perfectly acceptable sequence of positions. If only that was the end of our troubles. When stream-append-transactions are in-flight, they each allocate one or more positions. What we do not have control over is the order in which these transactions commit, and thus the order in which other - mostly implicit read - transactions observe the committed positions. This is due to the transaction isolation level being used and the ORDER BY [Position] clause. In other words, this is yet another source of gaps, but a temporary one. Once the transaction does commit, any later read transactions will observe the positions in the correct order. This is what the title of this issue alludes to - this only happens under higher concurrent load. So to summarize, there are two issues that may cause gaps: one is legit, the other only of a temporary nature, without many hints as to which one we're dealing with. Why is this a problem? Mostly because a lot of consumers, especially subscribers, will expect to observe all positions in ascending order.
The only thing I've found that comes close to giving us a certain degree of order are Log Sequence Numbers (LSN), but to be honest I know too little about them, how they behave over time and space, let alone how to reconcile them with what we have today. Change data capture builds on top of LSNs, so that might be worth exploring in the future, but for now, again, I think it'd take us too far away from what we already have in place.
Thus I set out to build a mental model that attempts to overcome this issue as best as I can, building on top of what some of you have already covered.
(diagram: https://user-images.githubusercontent.com/142834/52533143-dadadc80-2d2f-11e9-8f28-218a9acc5fde.png)
When a subscription starts from a position that is "far away" from the head (I keep thinking tail), I think there's a window, either in time or number of positions between where the subscription is at and where the head is at, in which it's safe to say that we don't need to be detecting gaps, and can safely ignore them since, if they are there, it's probably because of rollbacks. Given that SQLStreamStore is in charge of the transaction life cycle, I'm not envisioning transactions that have a very long lifespan. Nonetheless, I think our best option is to allow people to inject what window they feel comfortable with (think "it's a setting" with a reasonable default). As a subscription "enters" that window, the behavior it exhibits should switch from ignoring gaps to detecting gaps. If the positions being read are increasing monotonically, we're in what I'd call a gapless state. As soon as we detect a gap we enter a gapped state where we may want to wait for that gap to be filled, albeit in a reasonable amount of time (think "another setting" people can leverage). When positions increase monotonically again, we can switch back to the gapless state. If we "exit" the window between the subscription position and the head, we can go back to ignoring gaps (e.g. a subscriber that is slowing down).
I'll be the first to acknowledge that doing it this way does not provide absolute guarantees, nor is it a fundamental shift from what has been proposed or implemented thus far. There are certainly edge cases to be covered, especially with regard to the position as of which one subscribes and/or gaps at the edges of a page. Because most of the subscription logic is written on top of the IReadonlyStreamStore abstraction, it's not that hard to emulate an implementation which exhibits gaps. It'll be my next order of business to spike these ideas, refine them and evaluate the result. |
Hmm. That seems like a reasonable idea. Mine bombed because it treated gaps due to rollback the same as gaps due to in-flight transactions.
One interesting thing is that there will generally be a subscription that is ahead. That will have to deal with gaps by re-polling, I think, but it could always report back to a DB table where it got to.
I haven't worked with them for a long time, but SQL Server 2005+ supports push notifications to clients, which could be used to notify other active subscriptions that a gap is legitimate. All they'd need to know is where the absolute correct head is: any gaps before that have already been handled.
It doesn't help the first consumer, but it will stop churn subsequently.
…On Sun, 10 Feb 2019, 13:26 Yves Reynhout wrote: (quoting the comment above)
|
I doubt that's worth the effort and complexity, but true, that's also an assumption on my part, and all paths are open at this point. |
Something I've been pondering... Isn't Read Committed isolation supposed to prevent the in-flight reading anyway? Essentially, if you read over a section where a tx is in flight, it would block. My only thought here is that perhaps it is RCSI (Read Committed Snapshot Isolation) that is "helping" us here by using a lock-free copy-on-read strategy. My suspicion is that turning that off would make this go away.
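For reference, a quick way to test that suspicion on SQL Server (the database name is a placeholder):
```
-- Is Read Committed Snapshot Isolation enabled for this database?
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = DB_NAME();

-- Turning it off makes plain READ COMMITTED readers block on rows touched
-- by in-flight writers, instead of being handed an older row version.
ALTER DATABASE [YourDatabase]
SET READ_COMMITTED_SNAPSHOT OFF
WITH ROLLBACK IMMEDIATE;
```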
Apologies if this is already being done; I just don't want to lose these thoughts. Regarding the gaps from rollbacks, can't we use an offset/fetch? You hold your position in another table, and each position change on the stream changes the offset in the query, so essentially you do
```
SELECT *
FROM StreamStore
ORDER BY Id
OFFSET @position ROWS FETCH NEXT 100 ROWS ONLY
```
And then save your position as you checkpoint.
|
I'm a bit puzzled as to how offset / fetch is going to improve matters, especially given the sorting is happening on Position anyway. |
Our local workaround (I work with @areicher) essentially does what I think @yreynhout outlined above. We've updated ReadOnlyStreamStoreBase.ReadAllForwards a little:
[edited to update formatting a little] |
I am happy to submit this as a PR (or Aaron would, I'm sure), but given the hesitation around the previous PR, I think we wanted to avoid that unless it was likely to get accepted. Would you like us to make a PR? To be honest, we would very much rather have this in a patched 1.1.x or in a trunk version than continue using a local workaround. |
What's the current consensus on this? |
What is the status on this issue? It seems like a high-priority issue. SQLStreamStore is not reliable while this is unsolved. Will this be fixed, or should you not use SQLStreamStore when you want a reliable store? |
@bertzor I think one workaround that came up is running your database connection in serializable transaction mode. This will definitely hurt performance, but should give guarantees. |
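A sketch of what that workaround means concretely in T-SQL; dbo.Messages and the column names are placeholders for the real read query:
```
-- Serializable range reads take key-range locks, so a read that covers
-- positions allocated by an in-flight append blocks until that append
-- commits or rolls back, rather than silently skipping over the gap.
DECLARE @fromPosition bigint = 0;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT Position, StreamId, StreamVersion
FROM dbo.Messages
WHERE Position >= @fromPosition
ORDER BY Position
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY;
COMMIT;
```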
We had good luck with the changes I posted above, but ymmv. The missing piece from my snippet above is the backing field:
Keep in mind that my change above was applied to 1.1.3, and I haven't looked at updated sources. I am not actively using event sourcing at the moment, and unfortunately I don't really have time to do an up-to-date PR for this against the latest sources with tests, but I will try to respond if there are specific questions. It would be great if this made it into the official sources, though, because I expect to be back to using SQLStreamStore soon.
I want to add that there were certain activities in our system that could create huge bursts of events (in some cases millions of them). With the patch above, we haven't lost events, but we definitely see the logged messages showing the fixes operating. We also don't see a significant performance hit with this fix vs. the original 'lossy' implementation. |
An alternative idea: instead of guessing, ask the database. First, check if there are gaps by comparing the row count returned vs. the span of positions (last position minus first position, plus one) in the result set. If there are gaps, issue a READ UNCOMMITTED (e.g. WITH (NOLOCK)) COUNT over the same query you're using to retrieve the messages. The result of this query will (?) accurately account for rolled back transactions vs. uncommitted transactions. If the result matches the number of rows you previously received, you know you can process the page. If it does not match, you repeat the above process until it does. You could probably do this with a single Execute command to eliminate an additional roundtrip. Thoughts? |
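A sketch of that two-step check in T-SQL, assuming an illustrative dbo.Messages table with a Position column (not the actual SQLStreamStore read procedure):
```
DECLARE @from bigint = 0, @pageSize int = 100;
DECLARE @first bigint, @last bigint, @rows int;

-- Step 1: the normal committed read for the page.
SELECT @rows = COUNT(*), @first = MIN(Position), @last = MAX(Position)
FROM (
    SELECT TOP (@pageSize) Position
    FROM dbo.Messages
    WHERE Position >= @from
    ORDER BY Position
) AS page;

-- Step 2: if the span exceeds the row count, there are gaps; count the
-- same range dirty to see whether anything is still in flight.
IF (@last - @first + 1) > @rows
BEGIN
    DECLARE @dirty int;
    SELECT @dirty = COUNT(*)
    FROM dbo.Messages WITH (NOLOCK)
    WHERE Position BETWEEN @first AND @last;
    -- @dirty = @rows -> the gaps are rollbacks: safe to process the page.
    -- @dirty > @rows -> an append is still uncommitted: retry until equal.
END;
```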
@ryanjshaw That's a fine idea. I'm not sure it would work, because the reads don't necessarily hit the serialization issues (at least not from my preliminary testing). I did have success with just trying to append to the table with the missing position:
```
-- transaction 1
BEGIN;
-- this will be position 13 in the test
SELECT * from dbo.append_to_stream(
    'hashedid'::char(42),
    'Test-123'::varchar(1000),
    'hashedmetadataid'::char(42),
    -2::integer,
    now()::timestamp without time zone,
    Array[(uuid_generate_v4(), 'Test', '{"test": 123}', NULL)]::dbo.new_stream_message[]);

-- transaction 2
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
insert into dbo.messages(stream_id_internal, stream_version, position, message_id, created_utc, type, json_data, json_metadata)
values (
    -- we insert position 13, triggering a wait for the 1st transaction
    1, 13, 13, uuid_generate_v4(), now(), '$$gapcheck', '{"position": 13}', null
) on conflict do nothing;
-- transaction 2 is now blocked

-- transaction 1
COMMIT;

-- transaction 2 is now unblocked
select * from dbo.read_all(10, 10, true, true);
commit;
```
So some pseudocode
|
@nordfjord I'm not sure I follow. My understanding of the issue, based on comments from @spadger and @yreynhout, is as follows: we want to retrieve a page of messages where the range of rows is determined by an IDENTITY-based position, so the page can contain gaps caused either by rolled back transactions (permanent) or by in-flight transactions (temporary), with no direct way to tell the two apart.
Scenario 1 - no rolled back or uncommitted transactions in the range
Query results: 3 rows returned, with no gaps between the positions, so the page can be processed as-is.
Scenario 2 - rolled back/uncommitted exclusively following committed
Query results: 1 row returned; the missing positions all trail the last returned row, so nothing inside the page signals a gap. Position 3, once committed, will be selected for in a subsequent query, which will then map to one of these scenarios depending on exactly how things play out.
Scenario 3 - a rolled back transaction between committed transactions
Query results: 2 rows returned against a span of 3 positions, so we have a gap. So we issue a READ UNCOMMITTED COUNT over the same range: it also returns 2, matching what we received, so the gap is a rollback and the page is safe to process.
Scenario 4 - an uncommitted transaction between committed transactions
Query results: 2 rows returned against a span of 3 positions. As you can see, the analysis is the same as for scenario 3 -- we have a gap. But in this case we also have a problem, because if we process the page, we will start looking for positions after the last returned row and permanently miss the in-flight message once it commits. So we issue the same READ UNCOMMITTED COUNT: it returns 3 against our 2, so a transaction is still in flight, and we repeat the read until we reach a state which is covered by one of the previous scenarios.
Scenario 5 - a mix of rolled back and uncommitted transactions between committed transactions
Query results: 2 rows returned, with gaps from both causes in the range. So we issue the same READ UNCOMMITTED COUNT; it will not match until the in-flight transactions resolve, at which point we are in a state which is covered by one of the previous scenarios.
Additional Notes
As I mentioned previously, you could include the COUNT in the same Execute command as the page query to eliminate an additional roundtrip. It also seems as though there is an optimisation desired to only retrieve the 'gap' rows rather than refresh the entire page -- I imagine you could achieve this using a TVP of gap positions supplied to the read procedure. @jschuetze does your solution cater for both rolled back and uncommitted transactions? |
@ryanjshaw Are you sure about that? I get the same result with three transactions.
Note: I tested this on Postgres |
@nordfjord I'm not sure I follow why you are using an insert for this. You should execute the COUNT under READ UNCOMMITTED instead. |
I was testing this on Postgres, where READ UNCOMMITTED behaves exactly like READ COMMITTED, so uncommitted rows are never visible to a reader. |
Ah, that's really interesting. This seems like an oversight on the part of the PostgreSQL designers: they support something (the READ UNCOMMITTED isolation level) in syntax only, while silently giving you READ COMMITTED behaviour. I understand now why you are trying to insert into those positions - it's the only way you can probe the state of an in-flight transaction there. Here is what happens in SQL Server -- works perfectly (requires READ UNCOMMITTED). EDIT: Both MySql and Sqlite support READ UNCOMMITTED |
@nordfjord I've been thinking about this, and maybe those PostgreSQL guys are smart after all. I think we could implement a new procedure that, given the gap positions, takes locks on them and blocks until any transactions holding them complete. It seems that PostgreSQL has something similar. I tested this with SQL Server and it works perfectly in snapshot isolation mode (the default mode for SQLStreamStore). Aside from the same pattern being applicable to all databases, what's really nice about this approach is that you execute this SQL when you detect a gap, and the DBMS will return when your page is ready -- there is no repeated polling, no arbitrary delays, etc. To avoid a 2nd roundtrip to fetch the now committed rows, you can actually just have the probe return the rows themselves. In fact it might be possible to resolve this issue just by putting the locking hint directly into the existing read procedure. |
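A sketch of what that blocking probe could look like in T-SQL (illustrative table name; not the actual SQLStreamStore procedure, and the deadlock concern raised below still applies):
```
-- Executed only when a gap has been detected. HOLDLOCK gives this statement
-- serializable key-range semantics, so the SELECT blocks on any in-flight
-- append that has allocated a position inside the gap range, and returns
-- once those transactions commit or roll back.
DECLARE @firstGap bigint = 13, @lastGap bigint = 13;
BEGIN TRANSACTION;
SELECT Position
FROM dbo.Messages WITH (UPDLOCK, HOLDLOCK)
WHERE Position BETWEEN @firstGap AND @lastGap
ORDER BY Position;
COMMIT;
-- Any position still missing afterwards was a rollback and can be ignored.
```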
Hey, I've tried that locking approach on Postgres without luck. But it sounds like there's a decent way forward for the rest of the SQL databases. I think you might run into deadlock scenarios if you introduce locking into the read path, though. |
@nordfjord According to the PostgreSQL manual link I shared, |
Yet another idea for this. What if we used a separate table for the all stream that was updated via a deferrable constraint trigger?
```
CREATE TABLE all_messages
(
    position   bigserial not null,
    stream_id  text not null,
    message_id uuid not null,
    primary key (position)
);

create or replace function append_message_to_all()
    returns trigger
    language plpgsql
as
$$
begin
    insert into all_messages(stream_id, message_id)
    VALUES (NEW.stream_id_internal, NEW.id);
    return NEW;
end;
$$;

create constraint trigger append_to_all
    after insert
    on messages
    deferrable initially deferred
    for each row
execute procedure append_message_to_all();
```
The deferred trigger only fires when the transaction commits, so positions in all_messages are allocated in (roughly) commit order, which shrinks the window in which readers can observe out-of-order positions. Alternatively, making the all stream an eventually consistent projection is starting to sound like a better and better idea 😂
edit: This definitely wouldn't work because it would screw up the "append result" of append_to_stream |
Think I found a solution: https://github.com/adamfur/Chaser. It runs 4 concurrent writers and one worker that chases the auto-increment value. A lot of gaps occur, but they don't cause any problems. I accumulated all the items' auto-increment values in the loop, then compared them to an aggregation of the actual content of the database. I always get the correct result. |
For anybody still interested in this topic, I stumbled across @evgeniy-khist and their PostgreSQL event sourcing reference implementation today. They have an interesting solution that uses transaction IDs to solve this problem, and a thorough discussion of their solution with beautiful sequence diagrams: https://github.com/evgeniy-khist/postgresql-event-sourcing#4-7-1. You should also note an important caveat that is not discussed in that section, but later in the document. |
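A condensed sketch of the transaction-ID technique referenced above, using the PostgreSQL 13+ functions pg_current_xact_id, pg_current_snapshot and pg_snapshot_xmin; the events table is illustrative, not the referenced implementation's exact schema:
```
-- Each event records the id of the transaction that wrote it.
CREATE TABLE events (
    position       bigserial PRIMARY KEY,
    transaction_id xid8 NOT NULL DEFAULT pg_current_xact_id(),
    payload        jsonb NOT NULL
);

-- Readers exclude rows written by transactions that may still be in flight:
-- anything at or above the snapshot's xmin could yet commit at an earlier
-- position, so those rows are simply not served until it is safe.
SELECT position, payload
FROM events
WHERE position > :last_seen_position  -- the subscriber's checkpoint
  AND transaction_id < pg_snapshot_xmin(pg_current_snapshot())
ORDER BY position
LIMIT 100;
```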
Why
When inserting messages into SqlStreamStore, it is possible to get gaps in the stream, e.g. from a transaction rolling back, or from a transaction that has not yet committed.
In the first case, the gap is legitimate, but in the second case, any subscribers to the messages should wait until the gap has been filled. It is not possible, upon seeing a gap, to canonically determine which scenario produced it, so SqlStreamStore uses the following algorithm:
1. Read a batch of messages
2. Check the positions in the batch for gaps
3. If a gap is found, wait DefaultReloadInterval ms, and reread the batch
4. If the gap is still present, ignore it and process the batch
The reason for ignoring gaps in step 4 is that if the gap is legitimate (e.g. a tx was rolled back), not ignoring it would cause the subscriber to continuously reload the page.
The underlying issue is that during the DefaultReloadInterval, before the batch is reloaded, subsequent messages may be written to the messages table. These new messages may also contain gaps, but they will never be checked for inconsistencies. E.g., consider a batch size of 100:
First read: 5 messages in the messages table, batch size is 100, and message 3 has not yet committed (but will be). The page comes back as [1,2,4,5], the gap at 3 is detected, and a reload is scheduled.
Second read: now 10 messages in the messages table, but message 7 has not been committed (but will be). The reloaded page is [1,2,3,4,5,6,8,9,10]: the original gap at 3 is now filled, so the page is accepted, but the new gap at 7 is never re-validated and message 7 is skipped once it commits.
What
Alter the reloading strategy such that, upon reloading a page due to gaps, the reloaded page is re-validated, taking into account that a message may never exist.