design: Add a design doc for exposing source keys to dataflow #6661
742a455 to b0d0953
I'm very much in favor of this!
I've updated the proposed syntax/semantics to not always nest things in a record if the upstream type is not already a record (only relevant for the text and bytes formats). I also added an open question at the end about whether we should default to using the name
c309674 to 194e831
Seems fine overall. Thanks for putting this together!
CREATE SOURCE <source-name> FROM KAFKA BROKER '<>'
<format>
[INCLUDE KEY [AS <key-column-name>]]
I think the `INCLUDE KEY` should be syntactically part of the "Kafka" section, since it is a property of Kafka brokers specifically.
So do you mean moving the `INCLUDE KEY` to before the `<format>` specification? Like so:
CREATE SOURCE <source-name> FROM KAFKA BROKER '<>'
[INCLUDE KEY [AS <key-column-name>]]
<format>
Technically the key depends on the combination of the fact that it's Kafka (or a future key-supporting source type) and the format defining how to handle Keys. Conceptually I think that this change seems fine, but I want to think on it for a bit and write out some examples.
```sql
CREATE SOURCE avro_avro FROM KAFKA BROKER '...' TOPIC '...'
KEY FORMAT AVRO USING SCHEMA '{"type": "record", "name": "boring", "fields": [
```
Did you intentionally not write `INCLUDE KEY` in this example?
nope, don't know how that happened.
In writing out some examples it became clearer that the guaranteed-record semantics for the "provide a name" branch were somewhat unergonomic.
194e831 to b2742e8
Kafka is unique among our supported sources (although not among all possible
sources, e.g. [DynamoDB Streams][ddb]) in that it splits each message into a Key
part and a Value part. The Key is intended to be the equivalent of a primary key
in a database, and it is commonly used that way by Kafka users.
An important aspect of the Key is that it is a separate data section -- it never
shares backing storage with the Value.
Materialize does not support accessing the Key part of messages from our SQL
layer, which has been mentioned as a pain point on several occasions. This is a
design to resolve that pain point.
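
To make the Key/Value split concrete, here is a minimal Python sketch of the difference between dropping the Key and exposing it as a named column. It is not Materialize's implementation. The `KafkaMessage` class, both decode functions, the `"value"` column name, and the assumption of UTF-8 text payloads are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KafkaMessage:
    """Hypothetical model of one Kafka message.

    The key and the value are separate byte payloads; neither is
    embedded in the other.
    """
    key: Optional[bytes]
    value: Optional[bytes]


def _decode(payload: Optional[bytes]) -> Optional[str]:
    # Assumption: payloads are UTF-8 text; a missing payload stays None.
    return payload.decode("utf-8") if payload is not None else None


def decode_value_only(msg: KafkaMessage) -> dict:
    """Models the situation described above: only the Value is visible."""
    return {"value": _decode(msg.value)}


def decode_with_key(msg: KafkaMessage, key_column: str) -> dict:
    """Models exposing the Key as an extra column named by the caller."""
    return {key_column: _decode(msg.key), "value": _decode(msg.value)}


msg = KafkaMessage(key=b"user-1", value=b"hello")
value_only_row = decode_value_only(msg)
with_key_row = decode_with_key(msg, "k")
```

The key column name is a required argument here because the sketch takes no position on what a default name should be.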