
Improve performance of appending events under normal and degraded network conditions #230

Merged

Conversation

derekkraan (Contributor)

No description provided.

derekkraan force-pushed the feature/append-events-statement branch from 19d0592 to 36a9b82 on November 24, 2020 14:55
derekkraan force-pushed the feature/append-events-statement branch from 503a561 to 58e4104 on November 25, 2020 09:50
derekkraan force-pushed the feature/append-events-statement branch from e93e2d2 to e150529 on December 1, 2020 11:22
derekkraan changed the base branch from feature/append-events-statement to master on December 3, 2020 14:07
derekkraan changed the title from "Prepare stream when necessary in insert_event_batch" to "Improve performance of appending events under normal and degraded network conditions" on Dec 4, 2020
VALUES
<%= for i <- 0..(number_of_events - 1) do %>
<%= unless i == 0 do %>,<% end %>
($<%= i*9+3 %>, $<%= i*9+4 %>, $<%= i*9+5 %>, $<%= i*9+6 %>, $<%= i*9+7 %>, $<%= i*9+8 %>, $<%= i*9+9 %>)


For readers, can you explain these magic numbers? I would have guessed they were positional arguments, but there are seven of them?

Member
The magic numbers relate to the positions of the fields from the events to be inserted. In addition, $1 is the stream id or stream uuid and $2 is the event count. The 9 is used as the offset for each event's fields (7 fields per event, plus $1 and $2).

derekkraan (Contributor, author) commented Dec 9, 2020

The meanings of the variables are found on the lines above. INSERT INTO (a, b, c) VALUES ($1, $2, $3), ($4, $5, $6) is essentially what's happening here, where a corresponds to $1 and $4, and so on. This is the same thing, just with a much longer list.
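
For illustration, here is a minimal sketch of how placeholder numbers with that arithmetic could be generated; the module and function names are made up for this example and are not the PR's actual code.

```elixir
# Illustrative only: build the VALUES placeholders for N events using the
# same arithmetic as the template above. $1 is the stream id or uuid,
# $2 is the event count, and event i's fields occupy $<i*9+3>..$<i*9+9>.
defmodule PlaceholderSketch do
  def values_clause(number_of_events) do
    0..(number_of_events - 1)
    |> Enum.map(fn i ->
      "(" <> Enum.map_join(3..9, ", ", fn n -> "$#{i * 9 + n}" end) <> ")"
    end)
    |> Enum.join(",\n")
  end
end

IO.puts(PlaceholderSketch.values_clause(2))
# ($3, $4, $5, $6, $7, $8, $9),
# ($12, $13, $14, $15, $16, $17, $18)
```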


Thanks for the explanation

end

stream_id_or_uuid = stream_id || stream_uuid
parameters = [stream_id_or_uuid | [event_count | build_insert_parameters(events)]]


[stream_id_or_uuid, event_count] ++ build_insert_parameters(events) is probably equivalent in terms of performance and a bit more legible, IMO

Member

It could be benchmarked to see whether there's a difference. The length of the insert parameters list is seven times the number of events, so a micro optimisation was used here over readability.

derekkraan (Contributor, author)

It will make no difference: since lists are singly linked, only a in a ++ b has to be relinked; b remains the same. I benchmarked it just now to be sure: [1, 2 | 500_items] takes 2.55 ns and [1, 2] ++ 500_items takes 2.48 ns, which is basically indistinguishable and a million times faster than our calls to the DB. So I'll edit for readability.
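
A quick way to see the sharing claim above for yourself; this snippet is an editor's sketch, not part of the PR, and it relies on :erts_debug.same/2 to compare physical term identity.

```elixir
# a ++ b copies only the cells of `a`; the tail of the result is the very
# same list object as `b`, so `b` is never rebuilt.
a = [1, 2]
b = Enum.to_list(1..500)
result = a ++ b

# Enum.drop/2 on a list returns the actual tail, so this compares the
# object identity of the shared suffix with `b`.
true = :erts_debug.same(Enum.drop(result, length(a)), b)
```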


I ran a small benchmark as well


##### With input large #####
Name              ips        average  deviation         median         99th %
concat         7.02 M      142.42 ns  ±5708.31%           0 ns        1000 ns
prepend        6.32 M      158.19 ns  ±6303.88%           0 ns        1000 ns

Comparison:
concat         7.02 M
prepend        6.32 M - 1.11x slower +15.77 ns

##### With input medium #####
Name              ips        average  deviation         median         99th %
concat         6.79 M      147.32 ns ±11783.25%           0 ns        1000 ns
prepend        6.54 M      152.97 ns ±11337.75%           0 ns        1000 ns

Comparison:
concat         6.79 M
prepend        6.54 M - 1.04x slower +5.65 ns

##### With input small #####
Name              ips        average  deviation         median         99th %
prepend        7.16 M      139.67 ns  ±1950.54%           0 ns        1000 ns
concat         6.60 M      151.42 ns  ±1906.08%           0 ns        1000 ns

Comparison:
prepend        7.16 M
concat         6.60 M - 1.08x slower +11.75 ns
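
For reference, a minimal Benchee script that would produce output of this general shape; the input sizes and list contents here are guesses, not the script that was actually run.

```elixir
# Standalone script (Elixir 1.12+ for Mix.install). Compares prepending two
# elements onto an existing list against concatenating a two-element list.
Mix.install([{:benchee, "~> 1.0"}])

inputs = %{
  "small" => Enum.to_list(1..10),
  "medium" => Enum.to_list(1..100),
  "large" => Enum.to_list(1..1_000)
}

Benchee.run(
  %{
    "prepend" => fn list -> [:stream_id, :event_count | list] end,
    "concat" => fn list -> [:stream_id, :event_count] ++ list end
  },
  inputs: inputs
)
```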

@davydog187

Awesome work @derekkraan. Really impressive isolation of this problem and solution. Out of curiosity, how long did you spend on this?

Comment on lines +122 to +125
case any_version do
true -> Statements.insert_events_any_version(schema, stream_id, event_count)
false -> Statements.insert_events(schema, stream_id, event_count)
end
Member

Can these two SQL statements be combined into one, possibly by passing any_version as an additional argument to the function?

Otherwise, keeping both scripts updated with identical behaviour for current and future changes is a concern. Having separate implementations for appending events to a stream with :any_version vs the other cases also means that the test coverage for the former won't be exhaustive enough. Using a single implementation means that more variations are covered.

derekkraan (Contributor, author)

I'm worried that having a bunch of if statements in these queries will make them more difficult to decipher. Your SQL-fu already has to be pretty good to make sense of these queries. The structure is similar in some ways, but in other ways it is not.

I'm not sure that combining these two SQL queries with if statements changes the fundamentals of the test situation.
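
To make the idea concrete, template-level branching of the kind being debated might look roughly like this; the SQL fragments are placeholders invented for this sketch, not the project's real statements.

```elixir
# Sketch: one EEx template that branches on `any_version` instead of two
# separate statements. The SQL is placeholder text only.
defmodule SingleStatementSketch do
  @template """
  -- shared INSERT prologue would go here
  <%= if any_version do %>
  -- :any_version variant: no expected-version concurrency check
  <% else %>
  -- expected-version variant: concurrency check goes here
  <% end %>
  -- shared epilogue, e.g. linking events into the $all stream
  """

  def build(any_version) do
    EEx.eval_string(@template, any_version: any_version)
  end
end

IO.puts(SingleStatementSketch.build(true))
```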

Member

Previously, appending events to a stream used the same SQL statements regardless of which concurrency check was used (:any_version, :no_stream, :stream_exists, or an expected version). Most of the tests use the expected version (e.g. appending events to a stream ensures they can be read from "all events").

By splitting the two into separate SQL statements we'd need to increase the test coverage for :any_version to cover all the expected behaviour; otherwise we have no guarantee that there are no breaking changes. At a minimum we want to test that using :any_version also links the events into the global $all stream.

I can look into the refactoring if you don't have any more time to spare.

derekkraan (Contributor, author)

I have time, but I am just not sure how to make one SQL statement that does both. We don't fundamentally change the testing situation by leaving the SQL the same but adding if statements.

Perhaps it makes sense to add at least a test to check that :any_version links events into the global $all stream.

I'll hit you up on slack to discuss.
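
As a rough sketch of the kind of test being proposed here; TestEventStore, the event payloads, and the read batch size are assumptions about the test setup rather than the project's actual helpers.

```elixir
# Sketch of a test asserting that events appended with :any_version are
# linked into the global $all stream. Assumes a store module (TestEventStore)
# generated with `use EventStore` and a serializer that round-trips maps.
defmodule AnyVersionLinksToAllTest do
  use ExUnit.Case

  alias EventStore.EventData

  test "events appended with :any_version appear in the $all stream" do
    stream_uuid = "stream-#{System.unique_integer([:positive])}"

    events =
      for i <- 1..3 do
        %EventData{event_type: "example_event", data: %{index: i}, metadata: %{}}
      end

    :ok = TestEventStore.append_to_stream(stream_uuid, :any_version, events)

    {:ok, recorded} = TestEventStore.read_all_streams_forward(0, 1_000)

    recorded_for_stream = Enum.filter(recorded, &(&1.stream_uuid == stream_uuid))

    assert length(recorded_for_stream) == 3
    assert Enum.all?(recorded_for_stream, &(&1.event_type == "example_event"))
  end
end
```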

Member

> We don't fundamentally change the testing situation by leaving the SQL the same but adding if statements.

If the only difference between the two approaches is the SQL to link the events into the source stream, and the SQL to link events into the global $all stream is shared, then tests for one approach should also be valid for the other when verifying that the $all stream insertion and the stream itself are correctly updated. Using entirely separate SQL statements means that any change in one has to be made and tested in the other.

@derekkraan (Contributor, author)

@davydog187 thanks! Start to finish it took me about 2 weeks to find the issue and have the fix ready. Part of that 2 weeks was spent fixing other performance issues (unrelated to EventStore) that we found along the way, and I had some other unrelated client work as well. I'd say once I started looking at EventStore with a hairy eyeball, it was probably 4 days of searching and fixing.

@davydog187

@derekkraan this has merge conflicts.

@slashdotdash what would it take to move this PR forward?

@derekkraan (Contributor, author)

derekkraan commented Feb 26, 2021 via email

slashdotdash added a commit that referenced this pull request Mar 11, 2021
slashdotdash added a commit that referenced this pull request Mar 11, 2021
slashdotdash merged commit 4b7f3fd into commanded:master Mar 11, 2021
@slashdotdash (Member)

slashdotdash commented Mar 11, 2021

I've merged this pull request as-is; it will be released in a future version as v1.3.0.
