sql+storage: subsource dependency inversion v2 #26556

Merged

Conversation

sploiselle
Contributor

@sploiselle sploiselle commented Apr 10, 2024

Motivation

This PR adds a known-desirable feature. Progresses #24773
This PR fixes a known bug. Fixes #26465

Tips for reviewer

This PR un-inverts subsource dependencies (for data-bearing subsources; progress subsources remain untouched). The gist is that whenever we add a subsource to a source, we:

  1. Modify the IngestionDescription's SourceDesc to contain a reference to the subsource (e.g. adding it to the PG publication tables and generating its table casts)
  2. Modify the ingestion's source_exports to contain the association between the GlobalId and the output_index.

This is essentially the same process we underwent before, but we now do this in response to a create_collection whose DataSource is DataSource::IngestionExport (a new enum which describes subsources).
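
The two steps above can be sketched roughly as follows. This is a simplified model rather than the controller's actual code: the struct fields, the `ingestion_id` and `table` field names, the free-standing `create_collection` function, and the use of `u64` and `String` in place of the real `GlobalId` and table references are all assumptions made for illustration. It also assumes output index 0 is reserved for the primary source.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real `GlobalId` (assumption: a plain integer suffices here).
type GlobalId = u64;

/// Simplified stand-in for the source description: only the upstream tables it references.
#[derive(Debug, Default)]
struct SourceDesc {
    tables: Vec<String>,
}

/// Simplified stand-in for `IngestionDescription`.
#[derive(Debug, Default)]
struct IngestionDescription {
    desc: SourceDesc,
    /// Maps each subsource's id to its output index.
    source_exports: BTreeMap<GlobalId, usize>,
}

/// Simplified stand-in for `DataSource`; only the new variant is modeled in detail.
#[derive(Debug)]
enum DataSource {
    /// A subsource of the ingestion `ingestion_id` (hypothetical field names).
    IngestionExport { ingestion_id: GlobalId, table: String },
    /// Every other kind of collection, which this sketch ignores.
    Other,
}

/// Hypothetical handler: returns the new output index if an ingestion was updated.
fn create_collection(
    ingestions: &mut BTreeMap<GlobalId, IngestionDescription>,
    id: GlobalId,
    data_source: DataSource,
) -> Option<usize> {
    match data_source {
        DataSource::IngestionExport { ingestion_id, table } => {
            let ingestion = ingestions.get_mut(&ingestion_id)?;
            // Step 1: make the source description reference the subsource's table.
            ingestion.desc.tables.push(table);
            // Step 2: record the GlobalId -> output_index association.
            // Assumption: index 0 is the primary source, so subsources start at 1.
            let output_index = ingestion.desc.tables.len();
            ingestion.source_exports.insert(id, output_index);
            Some(output_index)
        }
        DataSource::Other => None,
    }
}
```

In this model, creating a subsource with id 2 on ingestion 1 appends its table to the description and maps id 2 to output index 1.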

If we are adding a subsource via ALTER SOURCE...ADD SUBSOURCE, we will also reconcile the source's state such that it contains references only to tables referenced by existing subsources.
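
As a rough illustration of that reconciliation, assuming the source's referenced tables and the subsources' tables can both be modeled as plain strings (the function name and types here are hypothetical, not the PR's):

```rust
use std::collections::BTreeSet;

/// Hypothetical reconciliation step: keep only the upstream tables that
/// some existing subsource still references, preserving their order.
fn reconcile_tables(source_tables: &mut Vec<String>, subsource_tables: &BTreeSet<String>) {
    source_tables.retain(|table| subsource_tables.contains(table));
}
```

Tables that no subsource references are simply dropped from the source's state; nothing is added by this step.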

When dropping subsources, we will now just drop the source via DROP SOURCE (i.e. ALTER SOURCE...DROP SUBSOURCE goes away). When dropping sources, we detect if they're DataSource::IngestionExport and if so, we remove the source_export and re-render the ingestion.
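
A minimal sketch of that drop path, under the same kind of simplifying assumptions: `GlobalId` is a plain integer, `DropAction` and `drop_source` are hypothetical names, and "re-render" is represented only by a returned value rather than any real scheduling.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real `GlobalId` (assumption: a plain integer).
type GlobalId = u64;

/// Simplified stand-in for `DataSource`; only two variants are modeled.
enum DataSource {
    /// A primary source that owns an ingestion.
    Ingestion,
    /// A subsource exported by the ingestion `ingestion_id` (hypothetical field name).
    IngestionExport { ingestion_id: GlobalId },
}

/// What the caller should do after the drop (hypothetical type).
#[derive(Debug, PartialEq)]
enum DropAction {
    /// A subsource was dropped; its parent ingestion must be re-rendered.
    RerenderIngestion(GlobalId),
    /// A primary source was dropped along with its ingestion.
    DropIngestion(GlobalId),
}

/// `exports` maps each ingestion id to its source_exports (subsource id -> output index).
fn drop_source(
    exports: &mut BTreeMap<GlobalId, BTreeMap<GlobalId, usize>>,
    id: GlobalId,
    data_source: &DataSource,
) -> DropAction {
    match data_source {
        DataSource::IngestionExport { ingestion_id } => {
            // Remove only this subsource's export; the ingestion itself survives.
            if let Some(source_exports) = exports.get_mut(ingestion_id) {
                source_exports.remove(&id);
            }
            DropAction::RerenderIngestion(*ingestion_id)
        }
        DataSource::Ingestion => {
            exports.remove(&id);
            DropAction::DropIngestion(id)
        }
    }
}
```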

This has the slightly annoying side effect that we have to do a bunch of work to produce roundtrippable SHOW CREATE SOURCE statements, but all of that work is less code than ALTER SOURCE...DROP SUBSOURCE was, so it's a win on balance.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • Let users remove subsources from PostgreSQL and MySQL sources using DROP SOURCE. We previously required the command ALTER SOURCE...DROP SUBSOURCE or ALTER SOURCE...DROP TABLE; however, those commands have been deprecated in favor of DROP SOURCE.
    • Require users to specify CASCADE to drop PostgreSQL and MySQL sources with active subsources (assuming they want to drop the subsources, as well). Previously, dropping a PostgreSQL or MySQL source automatically dropped all of its subsources.
    • Allow ALTER...OWNER on a primary source's subsources, i.e. a source and its subsources may now have different owners.
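
The CASCADE requirement in the second behavior change could be modeled along these lines. This is only a sketch: `plan_drop_source` and `DropError` are hypothetical names, `GlobalId` is reduced to an integer, and `subsources` is assumed to hold only the data-bearing subsources that count as "active".

```rust
/// Stand-in for the real `GlobalId` (assumption: a plain integer).
type GlobalId = u64;

/// Hypothetical error for a drop that is blocked by dependent subsources.
#[derive(Debug, PartialEq)]
enum DropError {
    /// The source still has this many active subsources and CASCADE was not given.
    HasSubsources(usize),
}

/// Returns the ids to drop (subsources first, then the source itself),
/// or an error if the source has active subsources and `cascade` is false.
fn plan_drop_source(
    source: GlobalId,
    subsources: &[GlobalId],
    cascade: bool,
) -> Result<Vec<GlobalId>, DropError> {
    if !subsources.is_empty() && !cascade {
        return Err(DropError::HasSubsources(subsources.len()));
    }
    let mut to_drop = subsources.to_vec();
    to_drop.push(source);
    Ok(to_drop)
}
```

A source with no active subsources drops without CASCADE in this model; one with subsources is rejected unless `cascade` is set.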

@sploiselle sploiselle requested a review from a team April 10, 2024 06:37
@sploiselle sploiselle requested a review from a team as a code owner April 10, 2024 06:37
@sploiselle sploiselle requested a review from a team April 10, 2024 06:37
@sploiselle sploiselle requested review from a team and morsapaes as code owners April 10, 2024 06:37
@sploiselle sploiselle marked this pull request as draft April 10, 2024 06:37

shepherdlybot bot commented Apr 10, 2024

Risk Score: 83 / 100   Bug Hotspots: 11   Resilience Coverage: 50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability 🔍 Detected
  • (Required) QA Review
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The risk score for this pull request is 83, which falls into the high-risk category. This is due, in part, to the average line count and the number of executable lines within files modified. There are also 11 files that have been recently prone to bugs. While the repository's observed bug trend is decreasing, the predicted trend is on the rise. It's important to note that, historically, pull requests with these characteristics are 122% more likely to introduce bugs compared to the repository's baseline.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../src/catalog.rs 98
../coord/message_handler.rs 95
../src/rbac.rs 93
../src/pure.rs 98
../statement/ddl.rs 95
../src/parser.rs 98
../catalog/state.rs 92
../src/coord.rs 100
../coord/ddl.rs 94
../src/lib.rs 99
../plan/error.rs 92

@def- def- self-requested a review April 10, 2024 06:59
@sploiselle sploiselle marked this pull request as ready for review April 10, 2024 12:19
@sploiselle sploiselle changed the title from "Source export unresolveitemname" to "sql+storage: subsource dependency inversion v2" Apr 10, 2024
Contributor

@rjobanp rjobanp left a comment

I just got through the first bits of the change (SourceExport structure, purification of create / alter source) - I like how much simpler a lot of the purification logic is!

@jkosh44 jkosh44 mentioned this pull request Apr 10, 2024
5 tasks
Member

@ParkMyCar ParkMyCar left a comment

Didn't get through the entire PR yet but wanted to leave some feedback. It would be super useful if we could chat, just tossed some time on your calendar!

@sploiselle
Contributor Author

@rjobanp @ParkMyCar tftr––I gauchely left the comments replying to your feedback without pushing the changes yet. Was afraid of my browser losing the feedback before I could rebase and push everything up. Apologies! Working on getting everything addressed today.

@sploiselle sploiselle force-pushed the source-export-unresolveitemname branch 2 times, most recently from b75f323 to 43f0f20 Compare April 12, 2024 00:06
@sploiselle
Contributor Author

@rjobanp @ParkMyCar Implemented everything y'all suggested.

@ParkMyCar I amended the migration to update comment GlobalIds, as well. If I can get an explicit approval from you on everything that would be great. I won't merge until the storage team also concludes they're happy with the work.

@MaterializeInc/storage ready for another review. I am OOO for the next week, though, so no rush but I need the feedback in by April 19 so I can address and merge when I return.

Contributor

@petrosagg petrosagg left a comment

Here is a summary of my review since I will be out for two weeks while this is being worked on.

I went over all the code and feel like I have a good grasp of what's going on for most of it. That said, the big changes in lib.rs in the controller and the new way sources with subsources are planned are a little less clear to me. This is a very big PR and changes the code in fundamental ways, so I would want to see some long text somewhere (PR description, a markdown doc, a comment) that explains how everything works. Something that goes over what happens to a source statement as it is typed by the user, goes through purification, gets planned, sequenced, etc. There are a lot of transformations along the way that are only evident through intense study of the code. The same request applies to an ALTER SOURCE statement. We now have a SET ADD SUBSOURCE internal statement whose purpose is still unclear to me.

Beyond that, we definitely want to create a tracking issue that contains all the todo items that are to be performed after this gets released. I tried to mark the ones I saw with comments but I may have missed some. If we don't create a tracking issue we will probably forget/miss them.

Unfortunately I can't give an approving review in the current state of the PR, as some of the comments need to be worked on. So if merging this while I'm away becomes a pressing priority, it will have to be at Sean's discretion.

Member

@ParkMyCar ParkMyCar left a comment

Can you add an SLT or TD test that creates a source with subsources, then selects from mz_objects and asserts their GlobalIds? The goal is to assert that the GlobalIds for subsources are indeed in sorted order.

Sean Loiselle and others added 14 commits April 26, 2024 10:46
The adapter previously called create_collections once per source
when creating sources. However, the new subsource structure is
more amenable to just creating all sources at once.

For instance, each call to create_collection that creates a
subsource modifies the source's definition and requests that it
be rescheduled. Doing this once per subsource is wasteful, because
we know we will repeatedly interrupt the source before letting it
run at a steady state.

Note that this is only a performance optimization and does not
have any semantic implications.
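
A toy model of why batching helps, assuming each call that adds subsources triggers exactly one reschedule of the ingestion. The `Ingestion` struct, its fields, and this `create_collections` method are hypothetical stand-ins, not the storage controller's API.

```rust
/// Stand-in for the real `GlobalId` (assumption: a plain integer).
type GlobalId = u64;

/// Toy ingestion that counts how often it has been asked to reschedule.
#[derive(Debug, Default)]
struct Ingestion {
    source_exports: Vec<GlobalId>,
    reschedules: usize,
}

impl Ingestion {
    /// Hypothetical batched entry point: add every subsource in `subsources`,
    /// then request a single reschedule for the whole batch.
    fn create_collections(&mut self, subsources: &[GlobalId]) {
        if subsources.is_empty() {
            return;
        }
        self.source_exports.extend_from_slice(subsources);
        self.reschedules += 1;
    }
}
```

Calling this once per subsource and calling it once with all subsources leave `source_exports` identical; only the `reschedules` count differs, which matches the claim that the change is a performance optimization only.
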
sploiselle pushed a commit to sploiselle/materialize that referenced this pull request Apr 26, 2024
@sploiselle sploiselle force-pushed the source-export-unresolveitemname branch from 9cf3daa to 2894d02 Compare April 26, 2024 14:49
@sploiselle
Contributor Author

@RobinClowers This change is ready to merge––any extra coordination we need on the frontend?

@RobinClowers
Contributor

@sploiselle Great! I have a draft PR that handles it, so I'll just need to update it to target the correct release. My understanding is that we will cut v0.98.0 next week, and presumably this will be in it? Feel free to merge, I'll make sure to get my PR merged early next week so we are ready for it.

@sploiselle sploiselle force-pushed the source-export-unresolveitemname branch from 2894d02 to 1818e3a Compare April 26, 2024 18:12
Sean Loiselle and others added 6 commits April 26, 2024 15:52
@sploiselle sploiselle force-pushed the source-export-unresolveitemname branch from 1818e3a to 9b3ed03 Compare April 26, 2024 19:52
@sploiselle sploiselle merged commit 65c8685 into MaterializeInc:main Apr 26, 2024
83 checks passed
@sploiselle sploiselle deleted the source-export-unresolveitemname branch April 26, 2024 20:38
@sploiselle sploiselle mentioned this pull request May 10, 2024
5 tasks
Successfully merging this pull request may close these issues.

sql/sources/pg: allow uncastable types in publication if they are not referenced
8 participants