adapter: Refactor how we store query plans in the catalog#35834
adapter: Refactor how we store query plans in the catalog#35834ggevay merged 3 commits intoMaterializeInc:mainfrom
Conversation
|
Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone. PR title guidelines
Pre-merge checklist
|
ebb0a8f to
b55299a
Compare
| /// reason we end up with two identical notices being dropped by the same | ||
| /// call, the result will contain only one instance of that notice. | ||
| #[mz_ore::instrument(level = "trace")] | ||
| pub fn drop_plans_and_metainfos( |
There was a problem hiding this comment.
(This is getting replaced by drop_optimizer_notices. Plans don't need separate dropping anymore.)
d2db776 to
374b9f8
Compare
mtabebe
left a comment
There was a problem hiding this comment.
This change makes sense to me, a few thoughts on testing:
- is there a test that would test behaviour of a drop with cascade?
- is there any test for drop after a restart, or migration (I care less about this, but I'm just curious)
| CatalogItem::Index(idx) => idx.optimized_plan = Some(Arc::new(plan)), | ||
| CatalogItem::MaterializedView(mv) => mv.optimized_plan = Some(Arc::new(plan)), | ||
| CatalogItem::ContinualTask(ct) => ct.optimized_plan = Some(Arc::new(plan)), | ||
| other => panic!("set_optimized_plan called on {:?}", other.typ()), |
There was a problem hiding this comment.
Would it be useful to also panic with the id?
There was a problem hiding this comment.
(Same comment for other places with the panic)
There was a problem hiding this comment.
Yes, thx, changed the 3 places
| /// Set the optimized plan for the item identified by `id`. | ||
| /// | ||
| /// # Panics | ||
| /// If the item is not an `Index`, `MaterializedView`, or |
There was a problem hiding this comment.
I'm curious if there is a way to enforce this outside of the match? Maybe it isn't necessary since you have follow up work planned?
There was a problem hiding this comment.
Well, I don't really see a way to enforce it outside at the moment. But yeah, some time later I'd like to make these plan fields non-optional, and just populate them already when we create the catalog item (mentioned also here), and then we'd just delete these setters.
|
|
||
| // Clean up plans and optimizer notices for items that | ||
| // were retracted but not replaced (i.e., truly dropped). | ||
| let dropped_entries: Vec<CatalogEntry> = retractions |
There was a problem hiding this comment.
Does this make dropping happen atomically with the catalog operation now?
There was a problem hiding this comment.
Yes, in the sense that it now happens inside the transact_inner call in transact, whereas before the PR it was happening after the tx.commit that is after the transact_inner call.
f846ee4 to
d22addd
Compare
Move optimized_plan, physical_plan, and dataflow_metainfo from the CatalogPlans side-car maps onto the actual catalog objects (Index, MaterializedView, ContinualTask), eliminating the CatalogPlans struct entirely. Key changes: - Add transient plan fields (Option, #[serde(skip)]) to Index, MaterializedView, and ContinualTask. These are recomputed during bootstrap and must be skipped to avoid Testdrive catalog consistency mismatches. - Add accessor methods on CatalogItem and item_mut() on CatalogEntry. - Move notices_by_dep_id and all plan setter/getter/drop methods into CatalogState, improving encapsulation and snapshot consistency. - Automate optimizer notice cleanup: apply_updates now collects truly-dropped entries after each timestamp group, cleans up notices_by_dep_id, and generates notice retraction builtin table updates inline. This eliminates the manual cleanup calls from Catalog::transact. - Introduce pack_optimizer_notice_updates (unresolved form for apply_updates); pack_optimizer_notices now delegates to it. - Update all construction sites to initialize plan fields as None; update apply_replacement to copy plan fields. Co-authored-by: Junie <junie@jetbrains.com>
d22addd to
5b2aabd
Compare
|
TFTR! I've added some tests, see last commit. |
5b2aabd to
d65857f
Compare
Since MaterializeInc#35834 the `optimized_plan`, `physical_plan`, and `dataflow_metainfo` fields live on the `CatalogItem` itself rather than in the separate `CatalogPlans` side table. However, `parse_item_inner` hardcoded `None` for those fields when reconstructing a `CatalogItem` from `create_sql`. A `RENAME` on a materialized view (or index / continual task) goes through the retract+add path in `apply_item_update`, which calls `deserialize_item` → `parse_item` → `parse_item_inner` with the previous `CatalogItem` as `previous_item`. The plan fields would be silently dropped, so a subsequent `EXPLAIN OPTIMIZED PLAN FOR MATERIALIZED VIEW ...` would fail with "cannot find dataflow metainformation for materialized view ... in catalog". Fix: carry the three plan fields from `previous_item` through to the newly-reconstructed item. This is done as a post-match stamp to avoid duplicating the logic into each of the three per-variant construction sites and to keep the change local. Also add a regression test to `test/sqllogictest/rename.slt` that exercises `ALTER MATERIALIZED VIEW ... RENAME TO ...` and `ALTER INDEX ... RENAME TO ...` followed by `EXPLAIN OPTIMIZED / PHYSICAL PLAN`. Co-authored-by: Junie <junie@jetbrains.com>
Since MaterializeInc#35834 the `optimized_plan`, `physical_plan`, and `dataflow_metainfo` fields live on the `CatalogItem` itself rather than in the separate `CatalogPlans` side table. However, `parse_item_inner` hardcoded `None` for those fields when reconstructing a `CatalogItem` from `create_sql`. A `RENAME` on a materialized view (or index / continual task) goes through the retract+add path in `apply_item_update`, which calls `deserialize_item` → `parse_item` → `parse_item_inner` with the previous `CatalogItem` as `previous_item`. The plan fields would be silently dropped, so a subsequent `EXPLAIN OPTIMIZED PLAN FOR MATERIALIZED VIEW ...` would fail with "cannot find dataflow metainformation for materialized view ... in catalog". Fix: carry the three plan fields from `previous_item` through to the newly-reconstructed item. This is done as a post-match stamp to avoid duplicating the logic into each of the three per-variant construction sites and to keep the change local. Also add a regression test to `test/sqllogictest/rename.slt` that exercises `ALTER MATERIALIZED VIEW ... RENAME TO ...` and `ALTER INDEX ... RENAME TO ...` followed by `EXPLAIN OPTIMIZED / PHYSICAL PLAN`. Co-authored-by: Junie <junie@jetbrains.com>
Since MaterializeInc#35834 the `optimized_plan`, `physical_plan`, and `dataflow_metainfo` fields live on the `CatalogItem` itself rather than in the separate `CatalogPlans` side table. However, `parse_item_inner` hardcoded `None` for those fields when reconstructing a `CatalogItem` from `create_sql`. A `RENAME` on a materialized view (or index / continual task) goes through the retract+add path in `apply_item_update`, which calls `deserialize_item` → `parse_item` → `parse_item_inner` with the previous `CatalogItem` as `previous_item`. The plan fields would be silently dropped, so a subsequent `EXPLAIN OPTIMIZED PLAN FOR MATERIALIZED VIEW ...` would fail with "cannot find dataflow metainformation for materialized view ... in catalog". Fix: carry the three plan fields from `previous_item` through to the newly-reconstructed item. This is done as a post-match stamp to avoid duplicating the logic into each of the three per-variant construction sites and to keep the change local. Also add a regression test to `test/sqllogictest/rename.slt` that exercises `ALTER MATERIALIZED VIEW ... RENAME TO ...` and `ALTER INDEX ... RENAME TO ...` followed by `EXPLAIN OPTIMIZED / PHYSICAL PLAN`. Co-authored-by: Junie <junie@jetbrains.com>
Fixes MaterializeInc/database-issues#11316 Since #35834 the `optimized_plan`, `physical_plan`, and `dataflow_metainfo` fields live on the `CatalogItem` itself rather than in the separate `CatalogPlans` side table. However, `parse_item_inner` hardcoded `None` for those fields when reconstructing a `CatalogItem` from `create_sql`. A `RENAME` on a materialized view (or index / continual task) goes through the retract+add path in `apply_item_update`, which calls `deserialize_item` → `parse_item` → `parse_item_inner` with the previous `CatalogItem` as `previous_item`. The plan fields would be silently dropped, so a subsequent `EXPLAIN OPTIMIZED PLAN FOR MATERIALIZED VIEW ...` would fail with "cannot find dataflow metainformation for materialized view ... in catalog". Fix: carry the three plan fields from `previous_item` through to the newly-reconstructed item. This is done as a post-match stamp to avoid duplicating the logic into each of the three per-variant construction sites and to keep the change local. (Note: I'm still planning to make the plans inside catalog items non-optional, as discussed [here](https://materializeinc.slack.com/archives/C08A62E0751/p1774627385747809?thread_ts=1774623679.171309&cid=C08A62E0751), so this is a temporary fix.) Co-authored-by: Junie <junie@jetbrains.com>
This is preparation for moving those catalog operations into the catalog implications framework that involve query plans, such as
CREATE MATERIALIZED VIEW.This PR solves a long-standing weirdness in our catalog: We used to have query plans and other metainfo in a separate part of the catalog (
CatalogPlans), instead of inside the catalog items. This happened only due to historical reasons. This PR fixes this.I recommend reviewing commit by commit.
The 1st is just some renaming to avoid confusion between locally optimized plans (with just
optimize_mir_localwithout view inlining) and fully (globally) optimized plans (that is, after view inlining, withoptimize_dataflow).The 2nd commit is the main thing, see commit msg for details.
(Note that even after this PR, we are still adding the plans in the side effect closure of the catalog transactions that adds the catalog items, e.g., in
create_materialized_view_finish. #35837 will be a follow-up PR, which will instead add the plans to thecatalog::Ops that comprise the catalog transactions.)Nightly: https://buildkite.com/materialize/nightly/builds/15973, but there is a staggering amount of unrelated redness, so it's hard to read.