Support STREAMs without key column defined #4678

big-andy-coates · 2020-03-02T15:51:58Z

Currently, if the user does not provide a key column in their stream definition, e.g.

CREATE STREAM FOO (COL0 INT, COL1 STRING) WITH (...);

then ksqlDB adds a default `ROWKEY STRING KEY` column to the stream's schema.
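Put differently, the `FOO` statement above is treated roughly as if the user had written the key column out themselves. The sketch below assumes the `<name> <type> KEY` column syntax implied by the `ROWKEY STRING KEY` notation. The `WITH` properties are left elided as in the original, and the two statements are alternatives, not a script to run in sequence.

```sql
-- What the user writes: no key column declared.
CREATE STREAM FOO (COL0 INT, COL1 STRING) WITH (...);

-- Roughly what ksqlDB currently registers instead:
-- an implicit key column named ROWKEY, typed STRING.
CREATE STREAM FOO (ROWKEY STRING KEY, COL0 INT, COL1 STRING) WITH (...);
```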

However, this is not ideal. The user may have omitted the key column because the data truly has no key, or because ksqlDB is not capable of deserializing the key. In either case, having ksqlDB assume the key is a string can lead to incorrect results. For example, it will let the user join this stream on `ROWKEY` to anything with a `STRING` column.
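For instance, a query like the one below would be accepted, even though `FOO`'s implicit `ROWKEY` may not hold string data at all. This is an illustrative sketch: the `USERS` table and its `NAME` column are hypothetical, its `WITH` properties are elided, and the syntax assumes the `ROWKEY ... KEY` convention of the time.

```sql
-- Hypothetical table whose key is genuinely a STRING.
CREATE TABLE USERS (ROWKEY STRING KEY, NAME STRING) WITH (...);

-- Type-checks because FOO's implicit ROWKEY is declared STRING,
-- even if FOO's Kafka keys are null or in a format ksqlDB cannot read.
SELECT FOO.COL0, USERS.NAME
  FROM FOO
  JOIN USERS ON FOO.ROWKEY = USERS.ROWKEY
  EMIT CHANGES;
```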

We should allow users to not define a key column for streams!

To aid migration, we should first implement #4679, which removes the default and instead fails any command that is missing a key column. Once that is released, forcing users to update their existing statements to define a key column, we can look to implement this change a release or two later.


implements: [KLIP-29](confluentinc#5530)
fixes: confluentinc#5303
fixes: confluentinc#4678

This change sees ksqlDB no longer adding an implicit `ROWKEY STRING` key column to created streams, or an implicit primary key column to created tables, when no key column is explicitly provided in the `CREATE` statement.

BREAKING CHANGE

`CREATE TABLE` statements will now fail if no `PRIMARY KEY` column is provided. For example, a statement such as:

```sql
CREATE TABLE FOO (name STRING) WITH (kafka_topic='foo', value_format='json');
```

will need to be updated to include the definition of the PRIMARY KEY, e.g.

```sql
CREATE TABLE FOO (ID STRING PRIMARY KEY, name STRING) WITH (kafka_topic='foo', value_format='json');
```

If using schema inference, i.e. loading the value columns of the topic from the Schema Registry, the primary key can be provided as a partial schema, e.g.

```sql
-- FOO will have value columns loaded from the Schema Registry
CREATE TABLE FOO (ID INT PRIMARY KEY) WITH (kafka_topic='foo', value_format='avro');
```

`CREATE STREAM` statements that do not define a `KEY` column will no longer have an implicit `ROWKEY` key column. For example:

```sql
CREATE STREAM BAR (NAME STRING) WITH (...);
```

The above statement would previously have resulted in a stream with two columns: `ROWKEY STRING KEY` and `NAME STRING`. With this change, it results in a stream with only the `NAME STRING` column. Streams with no KEY column will be serialized to Kafka topics with a `null` key.
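A minimal sketch of what this means in practice for a keyless stream. The topic names `bar` and `bar_keyed`, the `partitions=1` setting, the `BAR_KEYED` stream and its `ID` key column are all hypothetical, chosen only for illustration.

```sql
-- Keyless stream: only a value column is declared.
-- partitions=1 lets ksqlDB create the (hypothetical) topic if it does not exist.
CREATE STREAM BAR (NAME STRING) WITH (kafka_topic='bar', value_format='json', partitions=1);

-- BAR has no KEY column, so only value columns can be supplied here;
-- per the change above, the record lands in the 'bar' topic with a null Kafka key.
INSERT INTO BAR (NAME) VALUES ('alice');

-- If the data does have a key, declare it explicitly instead (hypothetical ID column).
CREATE STREAM BAR_KEYED (ID STRING KEY, NAME STRING) WITH (kafka_topic='bar_keyed', value_format='json', partitions=1);
```

The explicit `KEY` column in the last statement is what now replaces the old implicit `ROWKEY STRING KEY` default whenever the data genuinely carries a key.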

* feat: explicit keys implements: [KLIP-29](#5530) fixes: #5303 fixes: #4678 Co-authored-by: Andy Coates <big-andy-coates@users.noreply.github.com>

big-andy-coates added this to Backlog in Primitive Keys via automation Mar 2, 2020

big-andy-coates moved this from Backlog to To do in Primitive Keys Apr 24, 2020

big-andy-coates added this to the 0.10.0 milestone Apr 24, 2020

big-andy-coates modified the milestones: 0.10.0, 0.11.0 May 14, 2020

big-andy-coates moved this from To do to Backlog in Primitive Keys May 14, 2020

big-andy-coates moved this from Backlog to In progress in Primitive Keys Jun 3, 2020

big-andy-coates mentioned this issue Jun 3, 2020

Require KEY column in C* statements #4679

Closed

big-andy-coates mentioned this issue Jun 3, 2020

Explicit keys #5533

Merged


big-andy-coates closed this as completed in #5533 Jun 3, 2020

Primitive Keys automation moved this from In progress to Done Jun 3, 2020

big-andy-coates self-assigned this Jul 8, 2020

big-andy-coates modified the milestones: 0.11.0, 0.10.0 Jul 8, 2020
