Support STREAMs without key column defined #4678

Closed
big-andy-coates opened this issue Mar 2, 2020 · 0 comments · Fixed by #5533

big-andy-coates commented Mar 2, 2020

Currently, if the user does not provide a key column in their stream definition, e.g.

CREATE STREAM FOO (COL0 INT, COL1 STRING) WITH (...);

Then ksqlDB adds a default ROWKEY STRING KEY column to the schema of the stream.

However, this is not ideal. If the user omitted the key column because the data truly has no key, or because ksqlDB is not capable of deserializing the key, then assuming the key is a string is unsafe. For example, ksqlDB will let the user join this stream on ROWKEY to anything with a STRING column.
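
A minimal sketch of the problem described above, under stated assumptions: the topic name, value format, and the `USERS` table with a STRING key column `ID` are hypothetical and do not come from this issue.

```sql
-- Behaviour before this change: FOO silently gains an implicit ROWKEY STRING KEY column.
-- kafka_topic and value_format are assumed values for illustration only.
CREATE STREAM FOO (COL0 INT, COL1 STRING) WITH (kafka_topic='foo', value_format='json');

-- USERS is a hypothetical table whose key column ID is a STRING.
-- Because ROWKEY is assumed to be a STRING, the types line up and the join is accepted,
-- even if the topic's keys are absent or are not actually strings.
SELECT F.COL0, U.NAME
FROM FOO F
  JOIN USERS U ON F.ROWKEY = U.ID
EMIT CHANGES;
```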

We should allow users to not define a key column for streams!

To aid migration, we should first implement #4679, which removes the default and instead fails any command missing a key column. Once that is released, forcing users to update their existing statements to define a key column, we can look to implement this change in a release or two's time.

@big-andy-coates big-andy-coates added this to Backlog in Primitive Keys via automation Mar 2, 2020
@big-andy-coates big-andy-coates moved this from Backlog to To do in Primitive Keys Apr 24, 2020
@big-andy-coates big-andy-coates added this to the 0.10.0 milestone Apr 24, 2020
@big-andy-coates big-andy-coates modified the milestones: 0.10.0, 0.11.0 May 14, 2020
@big-andy-coates big-andy-coates moved this from To do to Backlog in Primitive Keys May 14, 2020
@big-andy-coates big-andy-coates moved this from Backlog to In progress in Primitive Keys Jun 3, 2020
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue Jun 3, 2020
implements: [KLIP-29](confluentinc#5530)

fixes: confluentinc#5303
fixes: confluentinc#4678

This change sees ksqlDB no longer adding an implicit `ROWKEY STRING` key column to created streams or primary key column to created tables when no key column is explicitly provided in the `CREATE` statement.

BREAKING CHANGE

`CREATE TABLE` statements will now fail if no `PRIMARY KEY` column is provided.

For example, a statement such as:

```sql
CREATE TABLE FOO (name STRING) WITH (kafka_topic='foo', value_format='json');
```

Will need to be updated to include the definition of the PRIMARY KEY, e.g.

```sql
CREATE TABLE FOO (ID STRING PRIMARY KEY, name STRING) WITH (kafka_topic='foo', value_format='json');
```

If using schema inference, i.e. loading the value columns of the topic from the Schema Registry, the primary key can be provided as a partial schema, e.g.

```sql
-- FOO will have value columns loaded from the Schema Registry
CREATE TABLE FOO (ID INT PRIMARY KEY) WITH (kafka_topic='foo', value_format='avro');
```

`CREATE STREAM` statements that do not define a `KEY` column will no longer have an implicit `ROWKEY` key column.

For example:

```sql
CREATE STREAM BAR (NAME STRING) WITH (...);
```

The above statement would previously have resulted in a stream with two columns: `ROWKEY STRING KEY` and `NAME STRING`.
With this change the above statement will result in a stream with only the `NAME STRING` column.

Streams with no `KEY` column will be serialized to Kafka topics with a `null` key.
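
For contrast, a stream that should keep a key would presumably declare it explicitly. A minimal sketch, assuming a hypothetical stream name `BAR_KEYED`, key column `ID`, topic name, and value format, none of which appear in the commit message:

```sql
-- Hypothetical example: the key column is declared explicitly, so no implicit ROWKEY is needed.
-- BAR_KEYED, ID, kafka_topic and value_format are assumed names and values for illustration.
CREATE STREAM BAR_KEYED (ID STRING KEY, NAME STRING) WITH (kafka_topic='bar', value_format='json');
```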
@big-andy-coates big-andy-coates mentioned this issue Jun 3, 2020
2 tasks
Primitive Keys automation moved this from In progress to Done Jun 3, 2020
big-andy-coates added a commit that referenced this issue Jun 3, 2020
* feat: explicit keys

implements: [KLIP-29](#5530)

fixes: #5303
fixes: #4678

Co-authored-by: Andy Coates <big-andy-coates@users.noreply.github.com>
stevenpyzhang pushed a commit that referenced this issue Jun 5, 2020
* feat: explicit keys

implements: [KLIP-29](#5530)

fixes: #5303
fixes: #4678

Co-authored-by: Andy Coates <big-andy-coates@users.noreply.github.com>
@big-andy-coates big-andy-coates self-assigned this Jul 8, 2020
@big-andy-coates big-andy-coates modified the milestones: 0.11.0, 0.10.0 Jul 8, 2020