-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/colexec/colbuilder: consider pooling render operators #88525
Comments
Here's a fresh profile. I'm worried I let the server keep running without load for too long such that actually these allocations are bogus 😓 |
The first profile does look suspicious, the second one not as much but still. Marking as a GA-blocker since I believe there are some regressions in allocations for simple cases due to #83689. |
@ajwerner do you remember what tpcc parameters you used? |
88530: colbuilder: avoid some allocations r=yuzefovich a=yuzefovich This commit removes some of the allocations for errors that are returned when we need to fallback to wrapping row-by-row processors. ``` name old time/op new time/op delta RenderPlanning/rows=1/renders=1-24 386µs ± 2% 391µs ± 2% ~ (p=0.143 n=10+10) RenderPlanning/rows=1/renders=16-24 624µs ± 1% 604µs ± 1% -3.24% (p=0.000 n=10+10) RenderPlanning/rows=1/renders=256-24 3.10ms ± 1% 3.06ms ± 1% -1.22% (p=0.000 n=10+10) RenderPlanning/rows=1/renders=4096-24 61.1ms ± 1% 60.8ms ± 1% -0.60% (p=0.011 n=9+9) RenderPlanning/rows=8/renders=1-24 437µs ± 1% 435µs ± 2% ~ (p=0.549 n=10+9) RenderPlanning/rows=8/renders=16-24 927µs ± 1% 903µs ± 1% -2.59% (p=0.000 n=10+10) RenderPlanning/rows=8/renders=256-24 7.14ms ± 1% 7.11ms ± 1% -0.48% (p=0.043 n=9+10) RenderPlanning/rows=8/renders=4096-24 127ms ± 2% 127ms ± 2% ~ (p=0.436 n=10+10) RenderPlanning/rows=64/renders=1-24 641µs ± 2% 636µs ± 1% ~ (p=0.165 n=10+10) RenderPlanning/rows=64/renders=16-24 3.01ms ± 1% 3.02ms ± 1% ~ (p=0.143 n=10+10) RenderPlanning/rows=64/renders=256-24 39.1ms ± 1% 39.1ms ± 1% ~ (p=0.796 n=10+10) RenderPlanning/rows=64/renders=4096-24 667ms ± 2% 667ms ± 2% ~ (p=0.912 n=10+10) RenderPlanning/rows=512/renders=1-24 1.94ms ± 2% 1.97ms ± 1% +1.70% (p=0.002 n=10+9) RenderPlanning/rows=512/renders=16-24 19.7ms ± 1% 19.7ms ± 1% ~ (p=0.631 n=10+10) RenderPlanning/rows=512/renders=256-24 319ms ± 1% 318ms ± 1% ~ (p=0.393 n=10+10) RenderPlanning/rows=512/renders=4096-24 5.46s ± 1% 5.43s ± 1% -0.58% (p=0.035 n=10+9) ``` Informs: #88525. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Alright, I'm not sure how the first profile was obtained - I tried running TPCC and then also left the cluster idle, but I don't see anything similar. In particular, large allocations for I did find multiple improvements though, so thanks for filing the issue! |
Feel free to close this whenever you feel done with it. Thanks for chasing! |
I'm going to remove the release-blocker label for now. Feel free to close if you see fit @yuzefovich. |
There is a perf / allocations regression due to #83689: namely that if we wrap a row-by-row processor into the vectorized flow but try to handle the render expressions through the columnar operators, yet we fail to do that (e.g. due to an unsupported expression), we will create an additional row-by-row processor for that rendering. Before #83689 we would create a single row-by-row processor, but now we will have two, which can increase the number of allocations. Ideally we would check whether native vectorized rendering is possible before splitting out the rendering as done in #83689, but it's not that simple. So instead I have some improvements in #88608 that should alleviate (at least partially) the increase in allocations. Many queries could be affected (for example, mutations with RETURNING clause with some rendering in that clause). What's worse is that due to #85822 even for the supported render expressions we can actually fallback to handling them via the row-by-row processor due to #85822, so I still consider it a GA-blocker (not a beta blocker). |
88582: vendor: bump Pebble to 829c25fa5db3 r=nicktrav a=jbowens ``` 829c25fa db: fix RangeKeyChanged and -WithLimit interaction 0f464e9c sstable: fix interaction between bpf and monotonic bounds optimization 46a6a539 sstable: remove commented code 9a552270 db: fix TestRangeKeyMaskingRandomized 8418ebc2 build: bump min go version to 1.17 5f8eb821 *: remove references to `ioutil` 05a4b29c db: expand iter_histories test coverage 2855ba7c db: refactor TestRangeKeys into TestIterHistories ``` Release note: None 88615: bench: add a benchmark for update with assigment casts r=yuzefovich a=yuzefovich Informs: #88525. Release note: None 88673: stats: fix missing autostats on cluster startup r=msirek a=msirek Fixes #87247 Previously, automatic statistics may fail to be collected on tables with no stats at cluster startup when the `sql.stats.automatic_collection.enabled` cluster setting is false but table storage parameter `sql_stats_automatic_collection_enabled` is true. Method `ensureAllTables` does not put entries into the settingOverrides map, and since storage parameter info is not normally looked up in the table header in auto stats processing, we fall back on the cluster setting to determine if auto stats should be collected. To address this, this patch modifies auto stats to flag whether the current batch of tables saved in the mutationCounts map comes from cluster startup processing, and if so, ensures that storage parameters controlling auto stats are always looked up in the table header during that processing. Release note (bug fix): This patch fixes missing automatic statistics collection at cluster startup when the `sql.stats.automatic_collection.enabled` cluster setting is false, but there are tables with storage parameter `sql_stats_automatic_collection_enabled` set to true. 88718: storage: add log scoping to MVCC benchmarks r=erikgrinaker a=jbowens This prevents logs from being interleaved within benchmark output when using `--stream-output` with `dev bench`. Since the Pebble logs are chatty, this might have a non-neglible impact on the benchmark performance. I'll backport this to prior releases so that comparisons between the HEADs of the various release branches are comparing apples to apples. Release note: None 88724: ui: fix ui on dbconsole table detail page with very long index name r=amyyq2 a=amyyq2 Previously, when there was a very long index name, it would run off the edge of the summary card and cause the Index Stats table to scroll for a long time. This change wraps the index name text so the user can see the entirety of the name on the summary card and in the table. This change also fixed the width of the Index Stats table to be aligned with the other summary cards on the page. <img width="1203" alt="cluster-ui" src="https://user-images.githubusercontent.com/54999459/192319099-71c7734d-3ed1-4ee0-b6ee-568bd46f0dba.png"> Partially Fixes #86559 Release justification: Category 2: Bug fixes and low-risk updates to new functionality Release note: None 88730: cli: deprecate `debug unsafe-remove-dead-replicas` r=aliher1911 a=erikgrinaker This patch marks `unsafe-remove-dead-replicas` as deprecated, and refers users to the new `recover` commands. It will be removed in v23.1. Resolves #86543. Release note (cli change): The `debug unsafe-remove-dead-replicas` CLI command has been deprecated, and will be removed in v23.1. Users should use the new `debug recover` set of commands instead. Co-authored-by: Jackson Owens <jackson@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Mark Sirek <sirek@cockroachlabs.com> Co-authored-by: Amy Qian <amy.qian@cockroachlabs.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Alright, I'm removing the blocker label since my concern about the impact of #85822 on mutation statements was incorrect. In particular, the logic from #85822 would only kick in if we have an "estimated row count" during the vectorized planning, and we only propagate that information from the optimizer for the table readers. |
88076: admission: Split stats for WorkQueueMetrics r=irfansharif a=andrewbaptist This commit splits the WorkQueueMetric stats into priority. Because there are a lot of new stats that might be generated as a result of this splitting, only the "important" priorities are collected and considered for each queue. Release note: None 88289: cdc: avoid deadlock on error in pubsub sink r=HonoreDB a=HonoreDB #88130 introduced a deadlock when an attempt to create a topic fails -- the goroutine tries to acquire a lock in order to record the error, but it already has it in order to write to the map. This PR releases the lock while creating the topic, which should also help with performance a bit on startup. Fixes #85374 Release note (bug fix): Fixed a bug preventing pubsub changefeeds from retrying. 88491: opt: do not plan unnecessary paired semi- and anti- lookup joins r=mgartner a=mgartner This commit fixes an issue where the optimizer would plan a paired semi or anti lookup join in cases when a single lookup join would suffice. This only occurred in rare cases when the join filter contained a tautology or contradiction that could not be normalized to true or false in the canonical query plan, but could be eliminated from the filters when building a lookup join. If the tautology or contradiction referenced a column not covered by the lookup index, the optimizer mistakenly assumed that the index was not covering and planned a paired join. Now the optimizer can recognize that the index is actually covering, because the referenced column is not needed to evaluate the filters, and a single lookup join is planned. Fixes #87306 Release note (performance improvement): The optimizer now explores plans with a single lookup join expressions in rare cases where it previously planned two lookup join expressions. 88714: ttl: test multiple TTL storage param changes r=rafiss,ajwerner a=ecwall fixes #88560 Add tests to cover multiple changes to TTL storage params. Release note: None 88756: sql: only record index reads on non-internal sessions r=THardy98 a=THardy98 Resolves: #77598 Previously, explicit `CREATE INDEX` queries were causing unintended index reads in our `index_usage_statistics` subsystem. The unintended index reads were caused by an index validation query from an internal executor that did not use an internal planner. However, the internal executor does set its session data as internal. This change adds a check to only record index reads when the planner is not internal AND the session is not internal. Release note (bug fix): Fixed unintended recordings of index reads caused by internal executor/queries. 88773: sql: support `SHOW GRANTS` for functions r=chengxiong-ruan a=chengxiong-ruan Backport resolves #88495 Release note (sql change): Previously `SHOW GRANTS` only supports db, schema, table and types. This commit add supports for UDFs, so that `SHOW GRANTS` returns UDFs privileges infos, and statements like `SHOW GRANTS ON FUNCTION <udf name/signatures>` are now supported Full function signature must be provided if the function name is not unique. Release justification: low risk GA blocker. 88831: bench: add a benchmark for INSERT RETURNING r=yuzefovich a=yuzefovich Informs #88525. Release note: None Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com> Co-authored-by: Aaron Zinger <zinger@cockroachlabs.com> Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Evan Wall <wall@cockroachlabs.com> Co-authored-by: Thomas Hardy <thardy@cockroachlabs.com> Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Is your feature request related to a problem? Please describe.
Likely fallout from #83689, we now see 5+% of bytes allocated in tpcc due to these operators:
Describe the solution you'd like
We pool other operators, we could pool these.
Jira issue: CRDB-19943
The text was updated successfully, but these errors were encountered: