-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(813): Avoid materialization of product values in subscriptions #959
Conversation
e963a3b
to
faef813
Compare
Here are the numbers I get compared to latest master:
I haven't reviewed this patch yet, so I don't know where the regression in |
faef813
to
aa04dbe
Compare
e4d04bf
to
c693c13
Compare
aa04dbe
to
62f5534
Compare
Master benchmark on 14900k:
crntril/borrow-eq:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The performance numbers are good, so I would like to merge this as soon as possible.
However I'm particularly concerned about the extra lifetime that was added to the datastore traits. I'm totally fine with it, if the perf justifies it. But I am not aware of it being an issue, and so I really think we should revert those changes.
crates/core/src/db/cursor.rs
Outdated
pub struct IndexCursor<'a, R: RangeBounds<AlgebraicValue>> { | ||
pub table: DbTable, | ||
pub iter: IterByColRange<'a, R>, | ||
pub struct IndexCursor<'a, 'c, R: RangeBounds<AlgebraicValue>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know by how much this improves performance?
@@ -71,13 +71,13 @@ impl StateView for CommittedState { | |||
/// Returns an iterator, | |||
/// yielding every row in the table identified by `table_id`, | |||
/// where the values of `cols` are contained in `range`. | |||
fn iter_by_col_range<'a, R: RangeBounds<AlgebraicValue>>( | |||
fn iter_by_col_range<'a, 'c, R: RangeBounds<AlgebraicValue>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here?
lhs: DatabaseTableUpdate, | ||
) -> Result<impl Iterator<Item = ProductValue>, DBError> { | ||
) -> Result<(usize, impl 'a + Iterator<Item = ResPV>), DBError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the usize for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usize
is the size estimate that RelOps
has. It doesn't exactly fit the size_hint
method of the Iterator
trait, so it's propagated here to be used in with_capacity
.
join_2 | ||
.into_iter() | ||
.chain(join_4) | ||
.chain(join_6) | ||
.map(TableOp::delete) | ||
.chain(join_1.into_iter().chain(join_5).map(TableOp::insert)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I find this easier to rectify with the above formula.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean; the reason this now has to use .push
and why we cannot .chain
is because we don't store temporary Vec
s. It isn't pretty, but temporary allocations seem wasteful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What temporary allocations exactly? These are iterators so we should be able to chain them still right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are iterators over Result<PV, ErrorVM>
whereas the other ones are iterators over PV
.
let mut updates = | ||
Vec::with_capacity(join_2.len() + join_4_estimate + join_6.len() + join_1_estimate + join_5.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get it, but I personally think it makes the code less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the two helper functions make the code slightly more readable but I don't disagree that this with_capacity
makes things less readable. Yet, we have these size estimates, so let's use them or lose them.
crates/core/src/vm.rs
Outdated
/// The column id for which the index is defined. | ||
pub index_col: ColId, | ||
/// The column ids for which the index is defined. | ||
pub index_cols: &'c ColList, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we support multi-column index joins, we need a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't; the field is always a singleton. This is just because a borrowed ColList
is needed now so we need to propagate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crates/vm/src/expr.rs
Outdated
@@ -602,7 +602,9 @@ impl IndexJoin { | |||
// In other words, when an index join has two delta tables. | |||
pub fn to_inner_join(self) -> QueryExpr { | |||
if self.return_index_rows { | |||
let col_lhs = self.index_side.head().fields[usize::from(self.index_col)].field.clone(); | |||
let col_lhs = self.index_side.head().fields[self.index_cols.head().idx()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a potential correctness issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field is a singleton list so it's semantically equivalent.
I, too, am a bit concerned about modifying the datastore trait in this way, if only because it may have implications on MVCC which I need to be aware of. Could you please make a post here (or better yet a comment in the codebase) which explains the purpose of this lifetime and why it is necessary? |
I'd really like for these two tickets to get separate PRs so we can measure the performance impact of the solutions separately. |
c693c13
to
45a3893
Compare
1772567
to
d7ad345
Compare
45a3893
to
b91049e
Compare
0d3f1c8
to
9154d83
Compare
9154d83
to
7f4cb81
Compare
|
Closes #813. A subscription will no longer materialize product values, for queries with read-only row operations. but instead it will serialize from bflatn straight to bsatn. Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>
7f4cb81
to
8f4058c
Compare
Description of Changes
API and ABI breaking changes
None
Expected complexity level and risk
2