New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coord,persist: teach coord about persisted streams that are involved in sources #9403
coord,persist: teach coord about persisted streams that are involved in sources #9403
Conversation
They are table-specific and we want to introduce a SourcePersistDetails.
…in sources Before, the coordinator would only know the "prefix" that was used to form the names of persistent streams that are involved in, say, the persistent Kafka source. Now, we keep a mapping from a _logical name_ to the physical stream name in the coordinator, for all persisted streams that are involved in a source. This also changes what persist information is sent to workers. Before, we would send a single persist name. Now, we send a new `SourcePersistDesc`, which contains the mapping of streams. In future commits, we will add more information to this. We need to do this to allow the coordinator to: - determine a common seal timestamp when restarting - negotiate/figure out if a requested as_of matches the compaction frontier/since of a persisted source - compact persisted streams when necessary We will do these additional changes in follow-up commits.
I think this breaks the builtin table, because we now render the |
Also: Happy to change all the mentions of persisted stream to persisted collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, I don't love this, but I don't have enough background to have a specific counter-proposal. can we schedule a time to chat and whiteboard some other approaches?
@@ -197,14 +197,18 @@ impl PersisterWithConfig { | |||
} | |||
} | |||
|
|||
pub fn details(&self, id: GlobalId, pretty: &str) -> Result<Option<PersistDetails>, Error> { | |||
pub fn details( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's call this table_details now (ditto for table_details_from_name). maybe for the stuff in catalog.rs too?
Closing in favour of #9449 |
Before, the coordinator would only know the "prefix" that was used to
form the names of persistent streams that are involved in, say, the
persistent Kafka source.
Now, we keep a mapping from a logical name to the physical stream name
in the coordinator, for all persisted streams that are involved in a
source.
This also changes what persist information is sent to workers. Before,
we would send a single persist name. Now, we send a new
SourcePersistDesc
, which contains the mapping of streams. In futurecommits, we will add more information to this.
We need to do this to allow the coordinator to:
frontier/since of a persisted source
We will do these additional changes in follow-up commits.
Tips for reviewer
The contentious change is how I manually serialize (and deserialize) a mapping of names to the one
persist_name
field that we store in the catalog. I'm doing this in order to not break that, but could be convinced to do a more elaborate thing.Also, there are at least two
TODO:...
s in the changes that are questions, please take a look at them.Checklist
user-facing behavior changes.