Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support precreated Kafka topics on locked-down brokers #8378

Closed
elindsey opened this issue Sep 21, 2021 · 5 comments
Closed

Support precreated Kafka topics on locked-down brokers #8378

elindsey opened this issue Sep 21, 2021 · 5 comments
Labels
A-STORAGE Topics related to the Storage layer C-feature Category: new feature or request

Comments

@elindsey
Copy link
Contributor

elindsey commented Sep 21, 2021

Some customers would like to run materialized in a configuration where topics are precreated in Kafka, the broker disallows materialized from creating new topics, and materialize's sinks are created with reuse_topic.

This is currently unsupported.
Our logic today is:

  1. Attempt to create topic
  2. If we receive a TopicAlreadyExists error, consider creation successful
  3. Do a metadata query to confirm that the created topic matches what we expect

In a scenario where the Kafka broker has disallowed topic creation, step 1 will result in a denied error - even if the topic exists.
Example:

ERROR: error registering kafka topic for sink: error creating topic foo-sink for sink: TopicAuthorizationFailed (Broker: Topic authorization failed)

Logic will need to be adjusted to something that both handles the topic creation race condition and supports locked-down brokers, such as:

  1. Metadata query to check for topic existence
  2. If doesn't exist, attempt to create topic
  3. If we receive a TopicAlreadyExists error, consider creation successful
  4. Do a metadata query to confirm that the created topic matches what we expect

It will be important to test this in a plausible user configuration to make sure that whatever we put in as that initial metadata query isn't also going to be ACL denied.

@elindsey elindsey added the C-feature Category: new feature or request label Sep 21, 2021
@elindsey
Copy link
Contributor Author

Note that there are two places in code where creation happens: register_kafka_topic in the sink_connector and create_topic in the kafka-util admin.rs.

As part of this task, we should refactor it a bit so this logic can live in one place. One option might be having the sink_connector use our kafka-utils.

@quodlibetor
Copy link
Contributor

we should probably have the same semantics for that as we end up implementing the sqs part of s3 sources with sqs notifications.

That is: similar syntax for using existing vs creating new, similar defaults, similar error and cleanup behavior.

@nmeagan11 nmeagan11 added this to Icebox in Storage (Old) Dec 13, 2021
cjubb39 added a commit that referenced this issue Jan 7, 2022
Simplify the places we create kafka topics to just one. I think this might potentially fix a latent bug because the way we created sink topics (and their corresponding consistency topics) may in some cases create the topic with the wrong number of partitions (see comment in ::kafka_util::admin::create_topic_helper explaining automatic topic creation)

Motivation
Seemed like a decent first step for #8378.
@ruf-io
Copy link
Contributor

ruf-io commented Mar 8, 2022

Heroku Kafka also works this way.

So, even when topics are pre-created by the user, attempting to create SINKs to Heroku Kafka fails at step (1) with the error:

ERROR:  error registering kafka topic for sink: Error creating topic highvalusers for sink: Admin operation error: TopicAuthorizationFailed (Broker: Topic authorization failed): Admin operation error: TopicAuthorizationFailed (Broker: Topic authorization failed)

Not too sure about the popularity of Kafka on Heroku, but just flagging it here.

@elindsey
Copy link
Contributor Author

Note - it’s possible that the metadata check in precreated topics should differ from the metadata query when mz creates them (for example, does the sink actually have a dependency on knowing the partition count for data topics or can it just use whatever topic is present however it’s configured).

@cjubb39 cjubb39 added the A-sink label Mar 30, 2022
@nmeagan11 nmeagan11 removed this from Icebox in Storage (Old) Aug 11, 2022
@nmeagan11 nmeagan11 added the A-STORAGE Topics related to the Storage layer label Aug 11, 2022
@sploiselle
Copy link
Contributor

Retiring old issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-STORAGE Topics related to the Storage layer C-feature Category: new feature or request
Projects
None yet
Development

No branches or pull requests

6 participants