-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql+storage: subsource dependency inversion v2 #26556
sql+storage: subsource dependency inversion v2 #26556
Conversation
MitigationsCompleting required mitigations increases Resilience Coverage.
Risk Summary:The risk score for this pull request is 83, which falls into the high-risk category. This is due, in part, to the average line count and the number of executable lines within files modified. There are also 11 files that have been recently prone to bugs. While the repository's observed bug trend is decreasing, the predicted trend is on the rise. It's important to note that, historically, pull requests with these characteristics are 122% more likely to introduce bugs compared to the repository's baseline. Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity. Bug Hotspots:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just got through the first bits of the change (SourceExport structure, purification of create / alter source) - I like how much simpler a lot of the purification logic is!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't get through the entire PR yet but wanted to leave some feedback. It would be super useful if we could chat, just tossed some time on your calendar!
@rjobanp @ParkMyCar tftr––I gauchely left the comments replying to your feedback without pushing the changes yet. Was afraid of my browser losing the feedback before I could rebase and push everything up. Apologies! Working on getting everything addressed today. |
b75f323
to
43f0f20
Compare
@rjobanp @ParkMyCar Implemented everything y'all suggested.
@ParkMyCar I amended the migration to update comment @MaterializeInc/storage ready for another review. I am OOO for the next week, though, so no rush but I need the feedback in by April 19 so I can address and merge when I return. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a summary of my review since I will be out for two weeks while this is being worked on.
I went over all the code and feel like I have a good grasp of what's going on for most of it. That said, the big changes in lib.rs
in the controller and the the new way sources with subsources are planned is a little less clear to me. This is a very big PR and changes the code in fundamental ways so I would want to see some long text somewhere (PR description, a markdown doc, a comment) that explains how everything works. Something that goes over what happens to a source statement as it is typed by the user, goes through purification, gets planned, sequenced etc. There are a lot of transformations along the way that are only evident by intense studying of the code. The same request applies for an ALTER SOURCE
statement. We now have a SET ADD SUBSOURCE
internal statement that it is still unclear to me what it does.
Beyond that, we definitely want to create a tracking issue that contains all the todo items that are to be performed after this gets released. I tried to mark the ones I saw with comments but I may have missed some. If we don't create a tracking issue we will probably forget/miss them.
Unfortunately I can't give an approved review at the current state of the PR as some of the comments need to be worked on. So if merging this while I'm away becomes a pressing priority it will have to be at Sean's discretion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a SLT or TD test that creates a source with subsources and then selects from mz_objects
and asserts their global ids? The goal being we can assert GlobalIds for subsources are indeed in sorted order
43f0f20
to
176b9ac
Compare
176b9ac
to
acbac79
Compare
The adapter previously called create_collections once per source when creating sources. However, the new subsource structure is more amenable to just creating all sources at once. For instance, each call to create_collection that creates a subsource modifies the source's definition and requests that it's rescheduled. Doing this once for each subsource is wasteful when we know that before we will let the source run at a steady state we plan to repeatedly interrupt it. Note that this is only a performance optimization and does not have any semantic implications.
9cf3daa
to
2894d02
Compare
@RobinClowers This change is ready to merge––any extra coordination we need on the frontend? |
@sploiselle Great! I have a draft PR that handles it, so I'll just need to update it to target the correct release. My understanding is that we will cut v0.98.0 next week, and presumably this will be in it? Feel free to merge, I'll make sure to get my PR merged early next week so we are ready for it. |
2894d02
to
1818e3a
Compare
The PgCdc check's dependency structure did not introduce any cross-source dependencies. Introducing cross-source dependencies, though, adds another dimension to the upgrade check, which would have caught some bugs in the subsource dependency inversion tests.
1818e3a
to
9b3ed03
Compare
Motivation
This PR adds a known-desirable feature. Progresses #24773
This PR fixes a known bug. Fixes #26465
Tips for reviewer
This PR un-inverts subsource dependencies (for data-bearing subsources; progress subsources remain untouched). The gist of how we do this is whenever we add a subsource to a source:
IngestionDescription
'sSourceDesc
to contain a reference to the subsource (e.g. adding it to the PG publication tables and generating its table casts)source_exports
to contain the association betwen withGlobalId
and theoutput_index
.This is essentially the same process we underwent before, but we now do this in response to a
create_collection
whoseDataSource
isDataSource::IngestionExport
(a new enum which describes subsources).If we are adding a subsource via
ALTER SOURCE...ADD SUBSOURCE
, we will also reconcile the source's state such that it contains references only to tables referenced by existing subsources.When dropping subsources, we will now just drop the source via
DROP SOURCE
(i.e.ALTER SOURCE...DROP SUBSOURCE
goes away). When dropping sources, we detect if they'reDataSource::IngestionExport
and if so, we remove thesource_export
and re-render the ingestion.This has the slightly annoying side effect that we have to do a bunch of work to produce roundtrippable
SHOW CREATE SOURCE
statements, but all of that work is less code thanALTER SOURCE...DROP SUBSOURCE
was, so it's a win on the balance.Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.DROP SOURCE
. We previously required the commandALTER SOURCE...DROP SUBSOURCE
orALTER SOURCE...DROP TABLE
; however, those commands have been deprecated in favor ofDROP SOURCE
.CASCADE
to drop PostgreSQL and MySQL sources with active subsources (assuming they want to drop the subsources, as well). Previously, dropping a PostgreSQL or MySQL source automatically dropped all of its subsources.ALTER...OWNER
on a primary source's subsources, i.e. a source and its subsources may now have different owners.