New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of appending events under normal and degraded network conditions #230
Improve performance of appending events under normal and degraded network conditions #230
Conversation
19d0592
to
36a9b82
Compare
503a561
to
58e4104
Compare
e93e2d2
to
e150529
Compare
VALUES | ||
<%= for i <- 0..(number_of_events - 1) do %> | ||
<%= unless i == 0 do %>,<% end %> | ||
($<%= i*9+3 %>, $<%= i*9+4 %>, $<%= i*9+5 %>, $<%= i*9+6 %>, $<%= i*9+7 %>, $<%= i*9+8 %>, $<%= i*9+9 %>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readers, can you explain these magic numbers? I would have guessed it was the positional argument, but there's 7?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The magic numbers relate to the positions of the fields from the events to be inserted. In addition $1
is the stream id or stream uuid and $2
is the event count. The 9 is used as the offset for each event's fields (7 fields per event + $1
and $2
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meanings of the variables is found on the lines above, INSERT INTO (a, b, c) VALUES ($1, $2, $3), ($4, $5, $6)
is essentially what's happening here, where a
corresponds to $1
, and $4
, and so on. This is the same thing but a much longer list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation
lib/event_store/storage/appender.ex
Outdated
end | ||
|
||
stream_id_or_uuid = stream_id || stream_uuid | ||
parameters = [stream_id_or_uuid | [event_count | build_insert_parameters(events)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[stream_id_or_uuid, event_count] ++ build_insert_parameters(events)
is probably equivalent in terms of performance and a bit more legible, IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be benchmarked to see whether there's a difference. The length of the insert parameters list is seven times the number of events, so a micro optimisation was used here over readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will make no difference, since lists are singly linked, only a
from a ++ b
has to be relinked. b
remains the same. I did benchmark it just now to be sure, doing [1, 2 | 500_items]
takes 2.55ns and [1, 2] ++ 500_items
takes 2.48ns, basically indistinguishable and a million times faster than our calls to the DB. So I'll edit for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran a small benchmark as well
##### With input large #####
Name ips average deviation median 99th %
concat 7.02 M 142.42 ns ±5708.31% 0 ns 1000 ns
prepend 6.32 M 158.19 ns ±6303.88% 0 ns 1000 ns
Comparison:
concat 7.02 M
prepend 6.32 M - 1.11x slower +15.77 ns
##### With input medium #####
Name ips average deviation median 99th %
concat 6.79 M 147.32 ns ±11783.25% 0 ns 1000 ns
prepend 6.54 M 152.97 ns ±11337.75% 0 ns 1000 ns
Comparison:
concat 6.79 M
prepend 6.54 M - 1.04x slower +5.65 ns
##### With input small #####
Name ips average deviation median 99th %
prepend 7.16 M 139.67 ns ±1950.54% 0 ns 1000 ns
concat 6.60 M 151.42 ns ±1906.08% 0 ns 1000 ns
Comparison:
prepend 7.16 M
concat 6.60 M - 1.08x slower +11.75 ns
Awesome work @derekkraan. Really impressive isolation of this problem and solution. Out of curiosity, how long did you spend on this? |
case any_version do | ||
true -> Statements.insert_events_any_version(schema, stream_id, event_count) | ||
false -> Statements.insert_events(schema, stream_id, event_count) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these two SQL statements be combined into one, possibly by passing the any_version
as an additional argument to the function?
Otherwise needing to keep both scripts updated and exhibiting identical behaviour for current and future changes is a concern. Having separate implementations for appending events to a stream using :any_version
vs other cases also means that the test coverage for the former won't be exhaustive enough. Using a single implementation means that more variations had been covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried that having a bunch of if
statements in these queries will lead to them being more difficult to decipher. Your SQL-fu has to already be pretty good to make sense of these queries. The structure is similar in some ways, but in other ways it is not.
I'm not sure if combining these two SQL queries with if statements will change the fundamentals of the test situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously appending events to a stream used the same SQL statements regardless of which concurrency checking was used (:any_version
, :no_stream
, :stream_exists
, or expected version). Most of the tests are using the expected version (e.g. appending events to a stream ensures they can be read from "all events").
By splitting the two into separate SQL statements we'd need to increase the test coverage for the :any_version
to cover all the expected behaviour. Otherwise we have no guarantee that there are no breaking changes. At a minimum we want to test that using :any_version
also links the events into the global $all
stream.
I can look into the refactoring if you don't have any more time to spare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have time, but I am just not sure how to make one SQL statement that does both. We don't fundamentally change the testing situation by leaving the SQL the same but adding if
statements.
Perhaps it makes sense to add at least a test to check that :any_version
links events into the global $all
stream.
I'll hit you up on slack to discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't fundamentally change the testing situation by leaving the SQL the same but adding if statements.
If the only difference between the two approaches is the SQL to link the events into the source stream, and the SQL to link events to the global $all
stream is shared, then tests for one approach should also be valid for the others when verifying the all stream insertion and stream is correctly updated. Using entriely separate SQL statements means that any change in one has to be made and tested in the other.
@davydog187 thanks! Start to finish it took me about 2 weeks to find the issue and have the fix ready. Part of that 2 weeks was spent fixing other performance issues (unrelated to EventStore) that we found along the way, and I had some other unrelated client work as well. I'd say once I started looking at EventStore with a hairy eyeball, it was probably 4 days of searching and fixing. |
@derekkraan this has merge conflicts. @slashdotdash what would it take to move this PR forward? |
I'll be able to put the necessary work into this PR in March.
…On Fri, Feb 26, 2021, at 17:15, Dave Lucia wrote:
@derekkraan <https://github.com/derekkraan> this has merge conflicts.
@slashdotdash <https://github.com/slashdotdash> what would it take to move this PR forward?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#230 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAD7PLMQ33Z4NINWG3RMJPDTA7CLRANCNFSM4UA6GFDA>.
|
I've merged this pull request as-is and it will be released in the future as v1.3.0 |
No description provided.