perf(813): Avoid materialization of product values in subscriptions #959

Centril · 2024-03-12T20:35:16Z

Description of Changes

Fixes #813 (Query evaluators should return iterators).

Benchmarking full-scan: Collecting 100 samples in estimated 
full-scan               time:   [162.84 ms 164.37 ms 166.88 ms]
                        change: [-47.765% -47.089% -46.260%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking query-indexes-multi: Collecting 100 samples in 
query-indexes-multi     time:   [1.1162 µs 1.1202 µs 1.1274 µs]
                        change: [-6.4601% -5.3782% -4.0002%] (p = 0.00 < 0.05)
                        Performance has improved.

API and ABI breaking changes

None

Expected complexity level and risk

2

joshua-spacetime · 2024-03-12T23:56:30Z

Here are the numbers I get compared to latest master:

full-scan               time:   [82.200 ms 82.378 ms 82.581 ms]
                        change: [-16.491% -16.056% -15.598%] (p = 0.00 < 0.05)
                        Performance has improved.

full-join               time:   [236.58 µs 236.81 µs 237.07 µs]
                        change: [-8.2742% -8.0187% -7.7901%] (p = 0.00 < 0.05)
                        Performance has improved.

incr-select             time:   [666.93 ns 667.28 ns 667.64 ns]
                        change: [+53.585% +53.879% +54.170%] (p = 0.00 < 0.05)
                        Performance has regressed.

incr-join               time:   [3.6968 µs 3.7051 µs 3.7123 µs]
                        change: [-18.685% -18.431% -18.173%] (p = 0.00 < 0.05)
                        Performance has improved.

I haven't reviewed this patch yet, so I don't know where the regression in incr-select is coming from. I'll get back to you once I have more information. But pretty good numbers otherwise.

jdetter · 2024-03-13T23:45:20Z

Master benchmark on 14900k:

    Finished bench [optimized + debuginfo] target(s) in 0.15s
     Running benches/subscription.rs (/home/ubuntu/SpacetimeDB/target/release/deps/subscription-7c24f5921892496c)
Benchmarking full-scan: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 29.7s, or reduce sample count to 10.
full-scan               time:   [243.00 ms 243.98 ms 244.95 ms]

full-join               time:   [686.00 µs 686.16 µs 686.33 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

incr-select             time:   [481.07 ns 481.26 ns 481.45 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

incr-join               time:   [6.4075 µs 6.4135 µs 6.4199 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

query-indexes-multi     time:   [925.65 ns 925.95 ns 926.25 ns]

centril/borrow-qe:

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.0s, or reduce sample count to 40.
full-scan               time:   [113.10 ms 113.30 ms 113.50 ms]
                        change: [-53.768% -53.563% -53.363%] (p = 0.00 < 0.05)
                        Performance has improved.

full-join               time:   [573.18 µs 573.49 µs 573.83 µs]
                        change: [-16.468% -16.422% -16.372%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

incr-select             time:   [455.01 ns 455.35 ns 455.72 ns]
                        change: [-5.4337% -5.3479% -5.2623%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

incr-join               time:   [5.0606 µs 5.0649 µs 5.0692 µs]
                        change: [-21.097% -21.008% -20.923%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

query-indexes-multi     time:   [843.18 ns 843.48 ns 843.79 ns]
                        change: [-8.9223% -8.8640% -8.8035%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

joshua-spacetime

The performance numbers are good, so I would like to merge this as soon as possible.

However, I'm particularly concerned about the extra lifetime that was added to the datastore traits. I'm totally fine with it if the perf justifies it, but I am not aware of it being an issue, so I really think we should revert those changes.

joshua-spacetime · 2024-03-14T01:29:35Z

crates/core/src/db/cursor.rs

-pub struct IndexCursor<'a, R: RangeBounds<AlgebraicValue>> {
-    pub table: DbTable,
-    pub iter: IterByColRange<'a, R>,
+pub struct IndexCursor<'a, 'c, R: RangeBounds<AlgebraicValue>> {


Do you know by how much this improves performance?

joshua-spacetime · 2024-03-14T01:30:06Z

crates/core/src/db/datastore/locking_tx_datastore/committed_state.rs

@@ -71,13 +71,13 @@ impl StateView for CommittedState {
    /// Returns an iterator,
    /// yielding every row in the table identified by `table_id`,
    /// where the values of `cols` are contained in `range`.
-    fn iter_by_col_range<'a, R: RangeBounds<AlgebraicValue>>(
+    fn iter_by_col_range<'a, 'c, R: RangeBounds<AlgebraicValue>>(


Similarly here?

crates/core/src/host/instance_env.rs

joshua-spacetime · 2024-03-14T01:42:28Z

crates/core/src/subscription/subscription.rs

        lhs: DatabaseTableUpdate,
-    ) -> Result<impl Iterator<Item = ProductValue>, DBError> {
+    ) -> Result<(usize, impl 'a + Iterator<Item = ResPV>), DBError> {


What is the usize for?

The usize is the size estimate that RelOps has. It doesn't exactly fit the size_hint method of the Iterator trait, so it's propagated here to be used in with_capacity.
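A minimal sketch of the pattern described in this reply, not the code from the patch. `eval_with_estimate` and `collect_updates` are hypothetical names, and plain `u32` rows stand in for the real row type. The estimate travels next to the iterator as a plain usize because a filtering iterator reports a lower bound of 0 from size_hint, which is not useful for pre-sizing.

```rust
/// Hypothetical stand-in for a query evaluator: returns a row-count
/// estimate together with a lazy iterator over the matching rows.
fn eval_with_estimate(rows: &[u32]) -> (usize, impl Iterator<Item = u32> + '_) {
    // The estimate is an upper guess: every input row might match.
    let estimate = rows.len();
    // `Filter` cannot know how many rows survive, so its `size_hint`
    // lower bound is 0.
    (estimate, rows.iter().copied().filter(|r| r % 2 == 0))
}

/// Uses the estimate to allocate once instead of growing the Vec repeatedly.
fn collect_updates(rows: &[u32]) -> Vec<u32> {
    let (estimate, iter) = eval_with_estimate(rows);
    let mut updates = Vec::with_capacity(estimate);
    updates.extend(iter);
    updates
}

fn main() {
    let rows = [1, 2, 3, 4, 5, 6];
    let updates = collect_updates(&rows);
    assert_eq!(updates, vec![2, 4, 6]);
    assert!(updates.capacity() >= 6);
}
```

The trade-off in this sketch is over-allocation when the estimate is loose, in exchange for avoiding reallocation while the rows are pushed.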

joshua-spacetime · 2024-03-14T01:45:56Z

crates/core/src/subscription/subscription.rs

-            join_2
-                .into_iter()
-                .chain(join_4)
-                .chain(join_6)
-                .map(TableOp::delete)
-                .chain(join_1.into_iter().chain(join_5).map(TableOp::insert)),


Personally I find this easier to reconcile with the above formula.

Not sure what you mean. This now has to use .push rather than .chain because we no longer store temporary Vecs. It isn't pretty, but temporary allocations seem wasteful.

What temporary allocations exactly? These are iterators, so we should still be able to chain them, right?

They are iterators over Result<PV, ErrorVM> whereas the other ones are iterators over PV.
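To illustrate the mismatch, here is a hedged sketch with hypothetical types (`TableOp`, `EvalError`, and `build_updates` are stand-ins, not the definitions from the patch). An iterator over plain rows and an iterator over `Result` rows have different item types, so they cannot be chained directly. One option is to push each fallible row after unwrapping it with `?`.

```rust
#[derive(Debug, PartialEq)]
enum TableOp {
    Insert(u32),
    Delete(u32),
}

/// Hypothetical error type standing in for the evaluator's error.
#[derive(Debug, PartialEq)]
struct EvalError(String);

/// Combines already-materialized deletes with lazily evaluated, fallible
/// inserts into a single Vec, stopping at the first error.
fn build_updates(
    deletes: Vec<u32>,
    fallible_inserts: impl Iterator<Item = Result<u32, EvalError>>,
    insert_estimate: usize,
) -> Result<Vec<TableOp>, EvalError> {
    let mut updates = Vec::with_capacity(deletes.len() + insert_estimate);
    // Infallible rows can be mapped and appended in one go.
    updates.extend(deletes.into_iter().map(TableOp::Delete));
    // Fallible rows are unwrapped one at a time; `?` returns early on error.
    for row in fallible_inserts {
        updates.push(TableOp::Insert(row?));
    }
    Ok(updates)
}

fn main() {
    let rows: Vec<Result<u32, EvalError>> = vec![Ok(2), Ok(3)];
    let ok = build_updates(vec![1], rows.into_iter(), 2);
    assert_eq!(
        ok,
        Ok(vec![TableOp::Delete(1), TableOp::Insert(2), TableOp::Insert(3)])
    );

    let rows: Vec<Result<u32, EvalError>> = vec![Ok(2), Err(EvalError("boom".into()))];
    let err = build_updates(vec![1], rows.into_iter(), 2);
    assert_eq!(err, Err(EvalError("boom".into())));
}
```

Chaining would also be possible by wrapping the infallible rows in `Ok` and collecting into a `Result<Vec<_>, _>`; which form reads better is the judgment call being discussed here.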

joshua-spacetime · 2024-03-14T01:48:21Z

crates/core/src/subscription/subscription.rs

+        let mut updates =
+            Vec::with_capacity(join_2.len() + join_4_estimate + join_6.len() + join_1_estimate + join_5.len());


I get it, but I personally think it makes the code less readable.

So I think the two helper functions make the code slightly more readable but I don't disagree that this with_capacity makes things less readable. Yet, we have these size estimates, so let's use them or lose them.

joshua-spacetime · 2024-03-14T01:53:18Z

crates/core/src/vm.rs

-    /// The column id for which the index is defined.
-    pub index_col: ColId,
+    /// The column ids for which the index is defined.
+    pub index_cols: &'c ColList,


If we support multi-column index joins, we need a test for it.

We don't; the field is always a singleton. This is just because a borrowed ColList is needed now so we need to propagate it.

This is probably something I would revert. I think it's fine to keep changes related to #813 and #833 in this patch, but this change, that is, passing a borrowed ColList, if we are going to make it, probably should be in a separate patch.
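For context on why a borrowed column list adds a lifetime parameter, here is a rough sketch. It assumes that `'c` is the lifetime of the borrowed column list, as the `&'c ColList` field suggests; the `Table` type, the `&[usize]` column list, and the `u32` values are hypothetical simplifications and not the datastore's real types.

```rust
use std::ops::RangeBounds;

/// Hypothetical table: each row is a list of u32 column values.
struct Table {
    rows: Vec<Vec<u32>>,
}

/// Borrows the table for `'a` and the column list for `'c`.
struct IterByColRange<'a, 'c, R: RangeBounds<u32>> {
    rows: std::slice::Iter<'a, Vec<u32>>,
    cols: &'c [usize],
    range: R,
}

impl<'a, 'c, R: RangeBounds<u32>> Iterator for IterByColRange<'a, 'c, R> {
    // Yielded rows borrow only from the table, not from the column list.
    type Item = &'a Vec<u32>;

    fn next(&mut self) -> Option<Self::Item> {
        let (cols, range) = (self.cols, &self.range);
        self.rows
            .find(|row| cols.iter().all(|&c| range.contains(&row[c])))
    }
}

impl Table {
    fn iter_by_col_range<'a, 'c, R: RangeBounds<u32>>(
        &'a self,
        cols: &'c [usize],
        range: R,
    ) -> IterByColRange<'a, 'c, R> {
        IterByColRange { rows: self.rows.iter(), cols, range }
    }
}

fn main() {
    let table = Table { rows: vec![vec![1, 10], vec![5, 20], vec![9, 30]] };
    // The column list lives in a shorter scope than the table, yet the
    // returned row stays valid because it is tied to `'a` only.
    let first = {
        let cols = vec![0];
        let mut it = table.iter_by_col_range(&cols, 2..=9);
        let row = it.next();
        row
    };
    assert_eq!(first, Some(&vec![5, 20]));
}
```

With a single shared lifetime, the rows in this sketch would be tied to the shorter-lived column list as well, which is one plausible reason for keeping the two lifetimes separate.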

joshua-spacetime · 2024-03-14T01:56:49Z

crates/vm/src/expr.rs

@@ -602,7 +602,9 @@ impl IndexJoin {
    // In other words, when an index join has two delta tables.
    pub fn to_inner_join(self) -> QueryExpr {
        if self.return_index_rows {
-            let col_lhs = self.index_side.head().fields[usize::from(self.index_col)].field.clone();
+            let col_lhs = self.index_side.head().fields[self.index_cols.head().idx()]


This seems like a potential correctness issue.

The field is a singleton list so it's semantically equivalent.
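A small sketch of the equivalence claimed here, using hypothetical `ColId` and `ColList` stand-ins rather than the real types: when the list holds exactly one column, indexing by the head of the list selects the same field as indexing by the single column id.

```rust
/// Hypothetical column id.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ColId(u32);

impl ColId {
    fn idx(self) -> usize {
        self.0 as usize
    }
}

/// Hypothetical non-empty column list: a head plus any further columns.
struct ColList {
    head: ColId,
    tail: Vec<ColId>,
}

impl ColList {
    fn singleton(col: ColId) -> Self {
        Self { head: col, tail: Vec::new() }
    }

    fn head(&self) -> ColId {
        self.head
    }

    fn len(&self) -> usize {
        1 + self.tail.len()
    }
}

fn main() {
    let fields = ["id", "name", "age"];
    let index_col = ColId(1);
    let index_cols = ColList::singleton(index_col);

    // For a singleton list, both forms select the same field.
    assert_eq!(index_cols.len(), 1);
    assert_eq!(fields[index_col.idx()], fields[index_cols.head().idx()]);
}
```

The equivalence only holds while the list stays a singleton; a multi-column list would silently ignore every column after the head, which is the concern raised in this thread.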

cloutiertyler · 2024-03-14T05:21:35Z

I, too, am a bit concerned about modifying the datastore trait in this way, if only because it may have implications on MVCC which I need to be aware of. Could you please make a post here (or better yet a comment in the codebase) which explains the purpose of this lifetime and why it is necessary?

gefjon · 2024-03-14T14:18:21Z

I'd really like for these two tickets to get separate PRs so we can measure the performance impact of the solutions separately.

joshua-spacetime · 2024-03-15T22:13:18Z

full-scan               time:   [84.373 ms 84.635 ms 84.909 ms]
                        change: [-16.825% -16.324% -15.837%] (p = 0.00 < 0.05)
                        Performance has improved.

full-join               time:   [241.59 µs 242.01 µs 242.56 µs]
                        change: [-13.306% -12.923% -12.537%] (p = 0.00 < 0.05)
                        Performance has improved.

Closes #813. For queries with read-only row operations, a subscription no longer materializes product values; instead, it serializes from bflatn straight to bsatn.

Co-authored-by: Mazdak Farrokhzad <twingoow@gmail.com>

Centril force-pushed the centril/borrow-qe branch from e963a3b to faef813 Compare March 12, 2024 21:00

Centril requested a review from joshua-spacetime March 12, 2024 22:25

Centril changed the title ~~[WIP] Splitting this up~~ Don't clone QueryExpr; Make eval + iter_filtered_chunks avoid PVs; Related cleanup Mar 12, 2024

Centril changed the title ~~Don't clone QueryExpr; Make eval + iter_filtered_chunks avoid PVs; Related cleanup~~ Don't clone QueryExpr; Make eval + iter_filtered_chunks avoid PVs Mar 12, 2024

This was referenced Mar 13, 2024

perf(828): Compile selections ahead of time for incremental eval #963

Closed

perf(816): Compile inner joins ahead of time for incremental evaluation #964

Merged

Centril force-pushed the centril/borrow-qe branch from faef813 to aa04dbe Compare March 13, 2024 22:47

Centril force-pushed the centril/fix-933 branch from e4d04bf to c693c13 Compare March 13, 2024 22:47

Centril force-pushed the centril/borrow-qe branch from aa04dbe to 62f5534 Compare March 13, 2024 22:57

Centril requested a review from gefjon March 13, 2024 23:47

joshua-spacetime requested changes Mar 14, 2024

joshua-spacetime mentioned this pull request Mar 14, 2024

build_query, IndexScan, mem table: fix bug, avoid ColumnOp #960

Closed

Centril force-pushed the centril/fix-933 branch from c693c13 to 45a3893 Compare March 14, 2024 19:39

Centril force-pushed the centril/borrow-qe branch from 1772567 to d7ad345 Compare March 14, 2024 19:44

Centril added the abi-break A PR that makes an ABI breaking change label Mar 14, 2024

Centril force-pushed the centril/fix-933 branch from 45a3893 to b91049e Compare March 14, 2024 22:58

Centril removed the abi-break A PR that makes an ABI breaking change label Mar 14, 2024

Centril force-pushed the centril/borrow-qe branch 2 times, most recently from 0d3f1c8 to 9154d83 Compare March 14, 2024 23:45

Centril changed the title ~~Don't clone QueryExpr; Make eval + iter_filtered_chunks avoid PVs~~ Don't clone QueryExpr; Make eval avoid PVs Mar 14, 2024

joshua-spacetime force-pushed the centril/borrow-qe branch from 9154d83 to 7f4cb81 Compare March 15, 2024 22:05

joshua-spacetime changed the base branch from centril/fix-933 to master March 15, 2024 22:06

joshua-spacetime changed the title ~~Don't clone QueryExpr; Make eval avoid PVs~~ perf(813): Avoid materialization of product values in subscriptions Mar 15, 2024

joshua-spacetime approved these changes Mar 15, 2024

joshua-spacetime force-pushed the centril/borrow-qe branch from 7f4cb81 to 8f4058c Compare March 15, 2024 22:18

joshua-spacetime enabled auto-merge March 15, 2024 22:25

joshua-spacetime added this pull request to the merge queue Mar 15, 2024

Merged via the queue into master with commit 755457a Mar 15, 2024
6 checks passed
