design: Add a design doc for exposing source keys to dataflow #6661

Merged

Contributor

@quodlibetor quodlibetor commented May 5, 2021

Rendered

Kafka is unique among our supported sources (although not among all possible
sources, e.g. [DynamoDB Streams][ddb]) in that it splits each message into a Key
part and a Value part. The Key is intended to be the equivalent of a primary key
in a database, and Kafka users commonly use it that way.

An important aspect of the Key is that it is a separate data section -- it never
shares backing storage with the Value.

Materialize does not support accessing the Key part of messages from our SQL
layer, which has been mentioned as a pain point on several occasions. This is a
design to resolve that pain point.
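
To make the Key/Value split concrete, here is a minimal Python sketch. It is not Materialize code and does not use any Kafka client library: `KafkaMessage`, `value_only_row`, `row_with_key`, and the column names `text` and `key` are all hypothetical names chosen for illustration, and both parts are assumed to be UTF-8 text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KafkaMessage:
    """Simplified model of a Kafka record: key and value are separate byte strings."""
    key: Optional[bytes]  # a message may have no key
    value: bytes


def value_only_row(msg: KafkaMessage) -> dict:
    """Decode only the Value part; the Key is dropped entirely."""
    return {"text": msg.value.decode("utf-8")}


def row_with_key(msg: KafkaMessage, key_column: str = "key") -> dict:
    """Hypothetical key-aware decoding: the Key becomes its own column."""
    row = value_only_row(msg)
    row[key_column] = msg.key.decode("utf-8") if msg.key is not None else None
    return row


msg = KafkaMessage(key=b"user-42", value=b"logged in")
print(value_only_row(msg))  # the key is not recoverable from this row
print(row_with_key(msg))    # the key is available as a separate column
```

Because the Key never shares storage with the Value, a decoder that only reads `value` cannot reconstruct the Key afterwards; it has to be carried through explicitly.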

Contributor

@cirego cirego left a comment

I'm very much in favor of this!

doc/developer/design/20210505_kafka_keys_in_sql.md (outdated, resolved)
@quodlibetor
Contributor Author

I've updated the proposed syntax/semantics so that things are no longer always nested in a record when the upstream type is not already a record (only relevant for the text/bytes formats). I also added an open question at the end about whether we should default to using the name key for those same formats (text/bytes).

Contributor

@umanwizard umanwizard left a comment

Seems fine overall. Thanks for putting this together!

Comment on lines +76 to +78
CREATE SOURCE <source-name> FROM KAFKA BROKER '<>'
<format>
[INCLUDE KEY [AS <key-column-name>]]
Contributor

I think the INCLUDE KEY should be syntactically part of the "Kafka" section, since it is a property of Kafka brokers specifically.

Contributor Author

So do you mean moving INCLUDE KEY to before the <format> specification, like so:

CREATE SOURCE <source-name> FROM KAFKA BROKER '<>'
[INCLUDE KEY [AS <key-column-name>]]
<format>

Technically the key depends on the combination of the fact that it's Kafka (or a future key-supporting source type) and the format defining how to handle Keys. Conceptually I think that this change seems fine, but I want to think on it for a bit and write out some examples.


```sql
CREATE SOURCE avro_avro FROM KAFKA BROKER '...' TOPIC '...'
KEY FORMAT AVRO USING SCHEMA '{"type": "record", "name": "boring", "fields": [
```
Contributor

Did you intentionally not write INCLUDE KEY in this example?

Contributor Author

nope, don't know how that happened.
