Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coord,persist: teach coord about persisted streams that are involved in sources #9403

Closed

Conversation

aljoscha
Copy link
Contributor

@aljoscha aljoscha commented Dec 2, 2021

Before, the coordinator would only know the "prefix" that was used to
form the names of persistent streams that are involved in, say, the
persistent Kafka source.

Now, we keep a mapping from a logical name to the physical stream name
in the coordinator, for all persisted streams that are involved in a
source.

This also changes what persist information is sent to workers. Before,
we would send a single persist name. Now, we send a new
SourcePersistDesc, which contains the mapping of streams. In future
commits, we will add more information to this.

We need to do this to allow the coordinator to:

  • determine a common seal timestamp when restarting
  • negotiate/figure out if a requested as_of matches the compaction
    frontier/since of a persisted source
  • compact persisted streams when necessary

We will do these additional changes in follow-up commits.

Tips for reviewer

The contentious change is how I manually serialize (and deserialize) a mapping of names to the one persist_name field that we store in the catalog. I'm doing this in order to not break that, but could be convinced to do a more elaborate thing.

Also, there are at least two TODO:...s in the changes that are questions, please take a look at them.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any
    user-facing behavior changes.

They are table-specific and we want to introduce a SourcePersistDetails.
…in sources

Before, the coordinator would only know the "prefix" that was used to
form the names of persistent streams that are involved in, say, the
persistent Kafka source.

Now, we keep a mapping from a _logical name_ to the physical stream name
in the coordinator, for all persisted streams that are involved in a
source.

This also changes what persist information is sent to workers. Before,
we would send a single persist name. Now, we send a new
`SourcePersistDesc`, which contains the mapping of streams. In future
commits, we will add more information to this.

We need to do this to allow the coordinator to:
 - determine a common seal timestamp when restarting
 - negotiate/figure out if a requested as_of matches the compaction
   frontier/since of a persisted source
 - compact persisted streams when necessary

We will do these additional changes in follow-up commits.
@aljoscha
Copy link
Contributor Author

aljoscha commented Dec 2, 2021

I think this breaks the builtin table, because we now render the persist_name differently, that is we print the mappings instead of a single name. I'll look into fixing that.

@aljoscha
Copy link
Contributor Author

aljoscha commented Dec 2, 2021

Also: Happy to change all the mentions of persisted stream to persisted collection.

Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, I don't love this, but I don't have enough background to have a specific counter-proposal. can we schedule a time to chat and whiteboard some other approaches?

@@ -197,14 +197,18 @@ impl PersisterWithConfig {
}
}

pub fn details(&self, id: GlobalId, pretty: &str) -> Result<Option<PersistDetails>, Error> {
pub fn details(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call this table_details now (ditto for table_details_from_name). maybe for the stuff in catalog.rs too?

@aljoscha
Copy link
Contributor Author

aljoscha commented Dec 6, 2021

Closing in favour of #9449

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants