Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Optimize query plan in iter_filtered_chunks #939

Merged
merged 1 commit into from
Mar 16, 2024

Conversation

Centril
Copy link
Contributor

@Centril Centril commented Mar 7, 2024

Description of Changes

There were two bugs that prevented us from using multi-col indexes in query!.
None of these were in find_sargable, rather, they were in:

  1. InstanceEnv::iter_filtered_chunks: it didn't call q.optimize(...) (so we may potentially see other improvements now).
  2. Header::column would not find the Column due to FieldName::Pos being passed.

I've verified that the query ends up being optimized to a Query::IndexScan and that this actually reaches the index with the right name in BTreeIndex::seek.

API and ABI breaking changes

None

Expected complexity level and risk

2 - There are other users of Header::column_pos_idx.

spacetimedb_vm::dsl::query(schema.as_ref()).with_select(filter_to_column_op(&schema.table_name, filter));
let query = spacetimedb_vm::dsl::query(schema.as_ref())
.with_select(filter_to_column_op(&schema.table_name, filter))
.optimize(&|table_id, table_name| stdb.row_count(table_id, table_name));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent.

Comment on lines 274 to 296
.or_else(|| {
self.fields
.get(*field)
.filter(|c| c.field.table() == table)
.map(|_| *field)
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix 2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could trigger the or_else scenario? It should not happened

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of this method hasn't actually changed other than checking against the table name (which is why one test case had to change). If you look at the old pub fn column(...) implementation below, it was just scanning self.fields for an equal FieldName, but this came up surprisingly empty. So the || *field == pos bit was needed to make it work. The reason it was required is that the header stores field names but the query stores positions.

2024-03-07T18:52:35.195880Z ERROR crates/vm/src/expr.rs:891: select_best_index, header
= Header { table_name: "LocationState", fields: [
    Column { field: Name { table: "LocationState", field: "entity_id" }, algebraic_type: Builtin(U64), col_id: ColId(0) }, 
    Column { field: Name { table: "LocationState", field: "chunk_index" }, algebraic_type: Builtin(U64), col_id: ColId(1) },
    Column { field: Name { table: "LocationState", field: "x" }, algebraic_type: Builtin(I32), col_id: ColId(2) },
    Column { field: Name { table: "LocationState", field: "z" }, algebraic_type: Builtin(I32), col_id: ColId(3) },
    Column { field: Name { table: "LocationState", field: "dimension" }, algebraic_type: Builtin(U32), col_id: ColId(4) }], 
constraints: [([ColId(0)], Constraints { attr: ColumnAttribute(INDEXED | UNIQUE | PRIMARY_KEY) }), ([ColId(1)], Constraints { attr: ColumnAttribute(INDEXED) }), ([ColId(2), ColId(3), ColId(4)], Constraints { attr: ColumnAttribute(INDEXED) })] }

2024-03-07T18:52:35.195907Z ERROR crates/vm/src/expr.rs:892: select_best_index, op
= Cmp { op: Logic(And), lhs: Cmp { op: Logic(And), lhs: Cmp { op: Cmp(Eq), lhs: Field(Name(Pos { table: "LocationState", field: 2 })), rhs: Field(Value(I32(80))) }, rhs: Cmp { op: Cmp(Eq), lhs: Field(Name(Pos { table: "LocationState", field: 3 })), rhs: Field(Value(I32(80))) } }, rhs: Cmp { op: Cmp(Eq), lhs: Field(Name(Pos { table: "LocationState", field: 4 })), rhs: Field(Value(U32(80))) } }

Comment on lines 149 to 226
/// Flattens a nested conjunction of AND expressions.
///
/// For example, `a = 1 AND b = 2 AND c = 3` becomes `[a = 1, b = 2, c = 3]`.
///
/// This helps with splitting the kinds of `queries`,
/// that *could* be answered by a `index`,
/// from the ones that need to be executed with a `scan`.
pub fn flatten_ands_ref(&self) -> ColumnOpRefFlat<'_> {
fn fill_vec<'a>(buf: &mut ColumnOpRefFlat<'a>, op: &'a ColumnOp) {
match op {
ColumnOp::Cmp {
op: OpQuery::Logic(OpLogic::And),
lhs,
rhs,
} => {
fill_vec(buf, lhs);
fill_vec(buf, rhs);
}
op => buf.push(op),
}
}
let mut buf = SmallVec::new();
fill_vec(&mut buf, self);
buf
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined into extract_fields, thereby avoiding a potential allocation.

@Centril Centril force-pushed the centril/fix-multi-col-index-query branch from fb0be32 to 95808f1 Compare March 7, 2024 12:18
@Centril Centril requested a review from mamcx March 7, 2024 12:41
@bfops bfops added include-in-perf-test test-with-bots Recommend to test under higher CCU and removed include-in-perf-test labels Mar 7, 2024
Copy link
Collaborator

@joshua-spacetime joshua-spacetime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I'd just prefer if we didn't include the module since it's not a proper test just yet.

@Centril Centril force-pushed the centril/fix-multi-col-index-query branch 3 times, most recently from 211be03 to 8e33bfb Compare March 15, 2024 01:08
@bfops bfops added release-any To be landed in any release window bugfix Fixes something that was expected to work differently include-in-perf-test labels Mar 15, 2024
Copy link
Collaborator

@jdetter jdetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've posted the results in discord - this is a substantial improvement against the master baseline 👍

@joshua-spacetime joshua-spacetime force-pushed the centril/fix-multi-col-index-query branch from 8e33bfb to ea76fb8 Compare March 15, 2024 23:44
@joshua-spacetime joshua-spacetime changed the title query!(...) : Fix use of multi-col indices fix: Optimize query plan in iter_filtered_chunks Mar 16, 2024
@joshua-spacetime joshua-spacetime force-pushed the centril/fix-multi-col-index-query branch from ea76fb8 to c23dfcb Compare March 16, 2024 07:01
@joshua-spacetime joshua-spacetime added this pull request to the merge queue Mar 16, 2024
Merged via the queue into master with commit 3f425a9 Mar 16, 2024
6 checks passed
@Centril Centril deleted the centril/fix-multi-col-index-query branch March 16, 2024 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes something that was expected to work differently release-any To be landed in any release window test-with-bots Recommend to test under higher CCU
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants