Update perf best practices with interleave caveats #4273
@@ -48,6 +48,8 @@ When a table is created, all columns are stored as a single column family. This

[Interleaving tables](interleave-in-parent.html) improves query performance by optimizing the key-value structure of closely related tables, attempting to keep data on the same key-value range if it's likely to be read and written together. This is particularly helpful if the tables are frequently joined on the columns that consist of the interleaving relationship.

However, the above is only true for tables where all operations (i.e., [`SELECT`](selection-queries.html), [`INSERT`](insert.html), [`UPDATE`](update.html), and [`DELETE`](delete.html)) are performed on a single value shared between both tables. After interleaving, operations that span multiple values, or that do not specify the interleaved parent ID, will be significantly slower than they were prior to interleaving the tables.
I would write (e.g., `SELECT` or `INSERT`) instead of trying to list all of them. As it is now you're forgetting half a dozen already.
Fixing now, thanks.
Follow-up question: Does this information also apply to the following versions?
- 2.0.*
- 2.2
If so I'll add this information there as well.
Ok, updated this PR so this info is added to the 2.1 and 2.2 docs. Raphael, if you confirm it applies to 2.0, I'll add it there as well.
Yes I like this better.
But what of this phrasing "significantly slower"?
What does the word "significantly" mean? Significant according to which standard? Compared to what?
I think it may be useful to explain a little more: when the data is interleaved, queries that work on the parent table(s) will need to "skip over" the data in the interleaved children. This increases the read and write latencies to the parent in proportion to the number of interleaved values.
(Conversely: if there are only a few interleaved values, the performance will not be "significantly" slower.)
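To make the "skip over" cost concrete, here is a rough sketch in Python. It is a toy model of an ordered key space, not CockroachDB's actual storage or scan code; the function names `build_keyspace` and `scan_parents` and the key layout are made up for illustration. It only counts how many keys a full scan of the parent table has to walk past.

```python
def build_keyspace(num_parents, children_per_parent, interleaved):
    """Return keys in storage order for a toy ordered key-value store.

    Each key is (table, parent_id, child_id). This layout is an assumption
    made for illustration only.
    """
    if not interleaved:
        # Separate tables: all parent keys are contiguous, children follow.
        parents = [("parent", p, None) for p in range(num_parents)]
        children = [("child", p, c)
                    for p in range(num_parents)
                    for c in range(children_per_parent)]
        return parents + children
    # Interleaved: each parent key is followed directly by its children.
    keys = []
    for p in range(num_parents):
        keys.append(("parent", p, None))
        keys.extend(("child", p, c) for c in range(children_per_parent))
    return keys


def scan_parents(keys):
    """Scan from the first to the last parent key.

    Returns (parent_rows, keys_visited). Child keys inside the span are
    visited and discarded, which models the "skip over" work.
    """
    positions = [i for i, k in enumerate(keys) if k[0] == "parent"]
    span = keys[positions[0]:positions[-1] + 1]
    rows = [k for k in span if k[0] == "parent"]
    return rows, len(span)


for children in (0, 1, 10):
    _, plain = scan_parents(build_keyspace(100, children, interleaved=False))
    _, inter = scan_parents(build_keyspace(100, children, interleaved=True))
    print(f"children per parent={children}: "
          f"visited {plain} keys separately, {inter} keys interleaved")
```

In this model the keys visited by the parent scan grow with the number of interleaved child rows, while the non-interleaved scan stays at one key per parent. With zero children the two layouts cost the same, which matches the "only a few interleaved values" case.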
ca689c8 to 3e522c8
3e522c8 to 20a3bb0
thank you!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jseldess, @knz, and @rmloveland)
v2.1/performance-best-practices-overview.md, line 51 at r2 (raw file):
[Interleaving tables](interleave-in-parent.html) improves query performance by optimizing the key-value structure of closely related tables, attempting to keep data on the same key-value range if it's likely to be read and written together. This is particularly helpful if the tables are frequently joined on the columns that consist of the interleaving relationship. However, the above is only true for tables where all operations (e.g., [`SELECT`](selection-queries.html) or [`INSERT`](insert.html)) are performed on a single value shared between both tables. The following types of operations may actually become slower after interleaving:
nit: Use only one space after periods.
20a3bb0 to fe267ac
Fixes #3397.