benchmark: investigate interleaved table join performance #20586
cc: @RaduBerinde @arjunravinarayan @petermattis
@richardwu This is a fantastic write-up. One nit: you use the future tense (I wonder why interleaved tables do not have the same performance as non-interleaved tables for
```diff
@@ -242,9 +242,10 @@ func (irj *interleavedReaderJoiner) Run(ctx context.Context, wg *sync.WaitGroup)
 			break
 		}
-		var helper postProcHelper
+		var helper *postProcHelper
 		helperIdx := -1
-		for i, h := range irj.tablePosts {
+		for i := range irj.tablePosts {
+			h := &irj.tablePosts[i]
 			if desc.ID == h.tableID && index.ID == h.indexID {
 				helper = h
 				helperIdx = i
```
Note that
Looks like you were right: performance in
@richardwu Not sure if you have the time in your last week, but it would be useful to add Go benchmarks for
This is an excellent write-up! Far more thorough than I expected from our discussion! 🎉 Just one thing: please merge the interleave benchmark! It's an open PR now, and if you need anything unblocked that's currently blocking the merge (nothing AFAIK from our discussion on Thursday) let me know ASAP.
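The diff earlier in this thread replaces a by-value `range` loop, where the range variable `h` is a copy of each `postProcHelper` element, with indexing into the slice so that `h` is a pointer and nothing is copied. A minimal sketch of a Go benchmark comparing the two loop shapes could look like the following. `postProcHelper` here is a hypothetical stand-in (its `padding` field is invented only to make copies measurable), `findByValue`/`findByPointer` are illustrative helpers, and `testing.Benchmark` is used so the sketch runs as a standalone program.

```go
package main

import (
	"fmt"
	"testing"
)

// postProcHelper is a hypothetical stand-in for the real type; the padding
// field is invented so that copying the struct has a measurable cost.
type postProcHelper struct {
	tableID uint32
	indexID uint32
	padding [64]uint64
}

// Package-level sinks keep the compiler from discarding benchmarked calls.
var (
	sinkHelper postProcHelper
	sinkPtr    *postProcHelper
	sinkIdx    int
)

// findByValue mirrors the old loop: the range variable h is a copy of each
// element, and helper = h copies the matching element once more.
func findByValue(posts []postProcHelper, tableID, indexID uint32) (postProcHelper, int) {
	var helper postProcHelper
	helperIdx := -1
	for i, h := range posts {
		if tableID == h.tableID && indexID == h.indexID {
			helper = h
			helperIdx = i
		}
	}
	return helper, helperIdx
}

// findByPointer mirrors the patched loop: h points into the slice, so no
// struct is copied.
func findByPointer(posts []postProcHelper, tableID, indexID uint32) (*postProcHelper, int) {
	var helper *postProcHelper
	helperIdx := -1
	for i := range posts {
		h := &posts[i]
		if tableID == h.tableID && indexID == h.indexID {
			helper = h
			helperIdx = i
		}
	}
	return helper, helperIdx
}

func main() {
	posts := make([]postProcHelper, 4)
	for i := range posts {
		posts[i].tableID = uint32(50 + i) // hypothetical table IDs
		posts[i].indexID = 1
	}
	byValue := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sinkHelper, sinkIdx = findByValue(posts, 53, 1)
		}
	})
	byPointer := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sinkPtr, sinkIdx = findByPointer(posts, 53, 1)
		}
	})
	fmt.Println("by value:  ", byValue)
	fmt.Println("by pointer:", byPointer)
}
```

Inside the repository, the two closures would more idiomatically become `BenchmarkXxx(b *testing.B)` functions in a `_test.go` file run with `go test -bench`.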
Outline
In #19853, full planning support for interleaved table joins was introduced. We would like to quantify the performance improvement from the new InterleavedReaderJoiner in the distributed execution engine. There are four benchmark scenarios:
(1) non-interleaved (master)
(2) interleaved (master)
(3) non-interleaved (patch)
(4) interleaved (patch), using the InterleavedReaderJoiner
Our expectations are:
Configuration
The most recent HEAD on master (d4f3b14) was used as the baseline. The patch was rebased onto this HEAD.
Four identical nodes were deployed using roachprod. The machine specs were:
n1-standard-4
The cluster was launched on nodes 1-3 with each binary (master and patch) and with the default replication factor of 3. The following notable flags were used to start Cockroach (using the roachperf utility):
The interleave benchmark ran on the 4th node and targeted node 1 (gateway). This benchmark configured a hierarchy of tables with a specified number of rows each:
Concurrency was set to 2 * runtime.NumCPU() = 8 (i.e. 8 workers concurrently executed the same query). The random seed used to generate the data was left at its default value of 42. Every sub-case was run at least twice for 60s (only results from two trials are shown below), and the data directories were wiped between sub-cases.
Cases
We benchmarked the four scenarios with different queries and table sizes.
Case 1 (Simple)
(1) non-interleaved (master)
(2) interleaved (master)
(3) non-interleaved (patch)
(4) interleaved (patch)
Summary of Case 1
Since all the data fit on one node (although the gateway, node 1, was not the leaseholder, node 3), the performance gain can be attributed to the single-scan optimization for interleaved tables: (4) reads the interleaved tables in one scan, whereas (2) needs two scans.
We also see comparable performance between joins on interleaved tables (4) and joins on non-interleaved tables (3), although (4) had slightly higher tail latencies, which could be noise.
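The single-scan versus two-scan distinction in this summary can be illustrated with a toy model. This is a sketch under simplifying assumptions, not CockroachDB's InterleavedReaderJoiner: rows are plain structs rather than encoded KV pairs, and the `row`/`joined` types, the helper functions, and the merchant/store parent-child relationship are hypothetical. The only property relied on is that in interleaved key order each parent row precedes all of its child rows, and those children come before the next parent.

```go
package main

import "fmt"

// row is a simplified, hypothetical stand-in for a decoded KV row. merchant is
// the interleave prefix shared by a merchant row and its child store rows.
type row struct {
	table    string // "merchant" or "store"
	merchant int    // merchant ID
	id       int    // the row's own ID within its table
}

type joined struct{ merchant, store int }

// joinInterleaved joins parents with children in ONE pass over rows that are
// already in interleaved key order. Only the most recent parent needs to be
// remembered.
func joinInterleaved(rows []row) []joined {
	var out []joined
	haveParent, parent := false, 0
	for _, r := range rows {
		switch r.table {
		case "merchant":
			haveParent, parent = true, r.merchant
		case "store":
			// Guard against child rows whose parent was not seen.
			if haveParent && r.merchant == parent {
				out = append(out, joined{merchant: parent, store: r.id})
			}
		}
	}
	return out
}

// joinTwoScans is the two-scan shape: one scan per table (both sorted by
// merchant ID) combined with a merge join.
func joinTwoScans(merchants []int, stores []row) []joined {
	var out []joined
	i := 0
	for _, s := range stores {
		for i < len(merchants) && merchants[i] < s.merchant {
			i++
		}
		if i < len(merchants) && merchants[i] == s.merchant {
			out = append(out, joined{merchant: s.merchant, store: s.id})
		}
	}
	return out
}

func main() {
	// Interleaved order: merchant 2 has no stores.
	rows := []row{
		{"merchant", 1, 1}, {"store", 1, 10}, {"store", 1, 11},
		{"merchant", 2, 2},
		{"merchant", 3, 3}, {"store", 3, 30},
	}
	fmt.Println(joinInterleaved(rows)) // [{1 10} {1 11} {3 30}]
	fmt.Println(joinTwoScans([]int{1, 2, 3},
		[]row{{"store", 1, 10}, {"store", 1, 11}, {"store", 3, 30}})) // [{1 10} {1 11} {3 30}]
}
```

Both functions produce the same result; the difference is that the interleaved version reads one ordered stream instead of two.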
Case 2 (Multiple ranges)
We tried to structure the query so that it touched all three nodes and spanned range splits in both the non-interleaved and interleaved scenarios.
(1) non-interleaved (master)
(2) interleaved (master)
(3) non-interleaved (patch)
(4) interleaved (patch)
Summary of Case 2
The patch yielded large performance improvements (+97% throughput, -48% latency) for joins on interleaved tables. This can be attributed to reading the tables in one scan instead of two and to fewer node-to-node RPCs, since joins are localized. Comparing this improvement with the results from Case 1 approximates the improvement from the localized joins within the InterleavedReaderJoiner.
Interleaved joins also performed much better than non-interleaved joins (+74% throughput, -38% latency), which is expected and desired because of data locality. In fact, this setup was rather pessimistic for interleaved joins, since the entire merchant table was on the gateway node: interleaved joins should perform even better in the average and optimal cases.
Case 3 (Multiple ranges + grandchildren rows)
Case 2 tested the most optimistic scenario, in which the query joins all tables in the interleaved hierarchy.
Case 3 tested a pessimistic case and an "average case", in which the joined tables make up a minority and a majority, respectively, of the data in the interleaved hierarchy.
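The minority/majority distinction can be made concrete by counting how many of the rows read from the interleaved span are actually used by the join. The sketch below assumes, as the Case 3 summary describes, that the whole interleaved hierarchy is scanned and rows from tables outside the join are read and then discarded. `scanEfficiency` is a hypothetical helper, and the row counts in `main` are invented for illustration, not the benchmark's actual table sizes.

```go
package main

import "fmt"

// scanEfficiency returns the fraction of rows read from an interleaved span
// that belong to the tables being joined. Rows of every other table in the
// hierarchy are read and then discarded, which is the overhead of scanning
// the whole hierarchy. This counts rows, not bytes.
func scanEfficiency(rowsPerTable map[string]int, joinedTables ...string) float64 {
	want := make(map[string]bool, len(joinedTables))
	for _, t := range joinedTables {
		want[t] = true // duplicates collapse into one entry
	}
	used, total := 0, 0
	for table, n := range rowsPerTable {
		total += n
		if want[table] {
			used += n
		}
	}
	if total == 0 {
		return 0
	}
	return float64(used) / float64(total)
}

func main() {
	// Hypothetical row counts, not the benchmark's actual table sizes.
	counts := map[string]int{"merchant": 100, "store": 1000, "variant": 50000}
	fmt.Printf("pessimistic (merchant, store): %.3f\n", scanEfficiency(counts, "merchant", "store"))
	fmt.Printf("average (merchant, variant):   %.3f\n", scanEfficiency(counts, "merchant", "variant"))
}
```

A low ratio means most of the scan is wasted work, which is why the pessimistic case is the one most likely to favor non-interleaved tables.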
(1) non-interleaved (master)
(2) interleaved (master)
(3) non-interleaved (patch)
(4) interleaved (patch)
Summary of Case 3
Intuitively, interleaved tables should not be preferred over non-interleaved tables when the commonly joined tables (merchant and store in the pessimistic case) both fit on one range or one node and make up a small proportion of the interleaved hierarchy. Conversely, interleaved tables work well when the join happens on tables that make up a majority of the hierarchy (merchant and variant). That said, joining interleaved tables was still more efficient than joining non-interleaved tables even when the join occurred on a very small minority of the two tables (average case). This implies that the overhead of scanning the entire interleaved hierarchy was small compared to the gains from improved locality and reduced RPC traffic that came with the InterleavedReaderJoiner.
Case 4 (Multiple ranges + 70ms latency)
Used the comcast utility to simulate 70ms trans-Atlantic latencies.
(1) non-interleaved (master)
(2) interleaved (master)
(3) non-interleaved (patch)
(4) interleaved (patch)
Summary of Case 4
Perhaps shorter or smaller queries would show more interesting results.
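A simple back-of-the-envelope model suggests why query size matters here. This model is an assumption, not something measured in these runs. Suppose a query makes $R$ sequential cross-node round trips of duration $\mathrm{RTT}$ and spends $W$ on local work (scanning, joining, returning results). Removing $\Delta R$ round trips, as localized joins aim to do, then saves the fraction:

$$
L \approx R \cdot \mathrm{RTT} + W,
\qquad
\frac{L_{\text{before}} - L_{\text{after}}}{L_{\text{before}}} \approx \frac{\Delta R \cdot \mathrm{RTT}}{R \cdot \mathrm{RTT} + W}
$$

For large queries $W$ dominates the denominator, so even a 70ms RTT changes the relative result little; for small queries with $W \to 0$ the fraction approaches $\Delta R / R$.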
Case 5 (Multiple ranges + 160ms latency)
Simulated 160ms Southeast Asia to US latencies.
(1) non-interleaved (master)
(2) interleaved (master)
(3) non-interleaved (patch)
(4) interleaved (patch)
Summary of Case 5
Future Work