-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Optimize query plan in iter_filtered_chunks #939
Conversation
spacetimedb_vm::dsl::query(schema.as_ref()).with_select(filter_to_column_op(&schema.table_name, filter)); | ||
let query = spacetimedb_vm::dsl::query(schema.as_ref()) | ||
.with_select(filter_to_column_op(&schema.table_name, filter)) | ||
.optimize(&|table_id, table_name| stdb.row_count(table_id, table_name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent.
crates/sats/src/relation.rs
Outdated
.or_else(|| { | ||
self.fields | ||
.get(*field) | ||
.filter(|c| c.field.table() == table) | ||
.map(|_| *field) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What could trigger the or_else
scenario? It should not happened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of this method hasn't actually changed other than checking against the table name (which is why one test case had to change). If you look at the old pub fn column(...)
implementation below, it was just scanning self.fields
for an equal FieldName
, but this came up surprisingly empty. So the || *field == pos
bit was needed to make it work. The reason it was required is that the header stores field names but the query stores positions.
2024-03-07T18:52:35.195880Z ERROR crates/vm/src/expr.rs:891: select_best_index, header
= Header { table_name: "LocationState", fields: [
Column { field: Name { table: "LocationState", field: "entity_id" }, algebraic_type: Builtin(U64), col_id: ColId(0) },
Column { field: Name { table: "LocationState", field: "chunk_index" }, algebraic_type: Builtin(U64), col_id: ColId(1) },
Column { field: Name { table: "LocationState", field: "x" }, algebraic_type: Builtin(I32), col_id: ColId(2) },
Column { field: Name { table: "LocationState", field: "z" }, algebraic_type: Builtin(I32), col_id: ColId(3) },
Column { field: Name { table: "LocationState", field: "dimension" }, algebraic_type: Builtin(U32), col_id: ColId(4) }],
constraints: [([ColId(0)], Constraints { attr: ColumnAttribute(INDEXED | UNIQUE | PRIMARY_KEY) }), ([ColId(1)], Constraints { attr: ColumnAttribute(INDEXED) }), ([ColId(2), ColId(3), ColId(4)], Constraints { attr: ColumnAttribute(INDEXED) })] }
2024-03-07T18:52:35.195907Z ERROR crates/vm/src/expr.rs:892: select_best_index, op
= Cmp { op: Logic(And), lhs: Cmp { op: Logic(And), lhs: Cmp { op: Cmp(Eq), lhs: Field(Name(Pos { table: "LocationState", field: 2 })), rhs: Field(Value(I32(80))) }, rhs: Cmp { op: Cmp(Eq), lhs: Field(Name(Pos { table: "LocationState", field: 3 })), rhs: Field(Value(I32(80))) } }, rhs: Cmp { op: Cmp(Eq), lhs: Field(Name(Pos { table: "LocationState", field: 4 })), rhs: Field(Value(U32(80))) } }
crates/vm/src/expr.rs
Outdated
/// Flattens a nested conjunction of AND expressions. | ||
/// | ||
/// For example, `a = 1 AND b = 2 AND c = 3` becomes `[a = 1, b = 2, c = 3]`. | ||
/// | ||
/// This helps with splitting the kinds of `queries`, | ||
/// that *could* be answered by a `index`, | ||
/// from the ones that need to be executed with a `scan`. | ||
pub fn flatten_ands_ref(&self) -> ColumnOpRefFlat<'_> { | ||
fn fill_vec<'a>(buf: &mut ColumnOpRefFlat<'a>, op: &'a ColumnOp) { | ||
match op { | ||
ColumnOp::Cmp { | ||
op: OpQuery::Logic(OpLogic::And), | ||
lhs, | ||
rhs, | ||
} => { | ||
fill_vec(buf, lhs); | ||
fill_vec(buf, rhs); | ||
} | ||
op => buf.push(op), | ||
} | ||
} | ||
let mut buf = SmallVec::new(); | ||
fill_vec(&mut buf, self); | ||
buf | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inlined into extract_fields
, thereby avoiding a potential allocation.
fb0be32
to
95808f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, I'd just prefer if we didn't include the module since it's not a proper test just yet.
211be03
to
8e33bfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've posted the results in discord - this is a substantial improvement against the master baseline 👍
8e33bfb
to
ea76fb8
Compare
query!(...)
: Fix use of multi-col indicesea76fb8
to
c23dfcb
Compare
Description of Changes
There were two bugs that prevented us from using multi-col indexes in
query!
.None of these were in
find_sargable
, rather, they were in:InstanceEnv::iter_filtered_chunks
: it didn't callq.optimize(...)
(so we may potentially see other improvements now).Header::column
would not find theColumn
due toFieldName::Pos
being passed.I've verified that the query ends up being optimized to a
Query::IndexScan
and that this actually reaches the index with the right name inBTreeIndex::seek
.API and ABI breaking changes
None
Expected complexity level and risk
2 - There are other users of
Header::column_pos_idx
.