chore: partition-by primitive key support #4098

Merged


@big-andy-coates big-andy-coates commented Dec 10, 2019

Description

NOTE: This was stacked on top of #4096

Fixes: #4092

WIP: This commit gets PARTITION BY clauses working with primitive key types. However, it disables a couple of joins until #4094 has been completed.

BREAKING CHANGE: A PARTITION BY now changes the SQL type of ROWKEY in the output schema of a query.

For example, consider:

CREATE STREAM INPUT (ROWKEY STRING KEY, ID INT) WITH (...);
CREATE STREAM OUTPUT AS SELECT ROWKEY AS NAME FROM INPUT PARTITION BY ID;

Previously, the above would have resulted in an output schema of ROWKEY STRING KEY, NAME STRING, where ROWKEY would have stored the string representation of the integer from the ID column. With this commit the output schema will be ROWKEY INT KEY, NAME STRING.
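
To make the type change concrete, here is a minimal sketch of the two behaviours. It is not ksqlDB code: `PartitionByKeySketch`, `legacyKeyType` and `keyType` are hypothetical names, and a plain map of column name to SQL type name stands in for the real schema classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in ksqlDB.
final class PartitionByKeySketch {

  /** Old behaviour: ROWKEY was always STRING and held the stringified column value. */
  static String legacyKeyType(Map<String, String> valueColumns, String partitionByColumn) {
    requireColumn(valueColumns, partitionByColumn);
    return "STRING";
  }

  /** New behaviour: ROWKEY takes the SQL type of the PARTITION BY column. */
  static String keyType(Map<String, String> valueColumns, String partitionByColumn) {
    return requireColumn(valueColumns, partitionByColumn);
  }

  /** Looks up the column's type, failing loudly if the column is unknown. */
  private static String requireColumn(Map<String, String> columns, String name) {
    String type = columns.get(name);
    if (type == null) {
      throw new IllegalArgumentException("Unknown column: " + name);
    }
    return type;
  }

  public static void main(String[] args) {
    // Value columns of INPUT from the example above: ID INT.
    Map<String, String> input = new LinkedHashMap<>();
    input.put("ID", "INT");

    System.out.println("before: ROWKEY " + legacyKeyType(input, "ID") + " KEY");
    System.out.println("after:  ROWKEY " + keyType(input, "ID") + " KEY");
  }
}
```

Any downstream consumer that assumed a string key after a `PARTITION BY` on a non-string column would need to be checked against the new key type.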

Testing done

Suitable QTT tests added / updated.

Reviewer checklist

  • Ensure docs are updated if necessary (e.g. if a user-visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

First of a few commits to start introducing support for primitive keys in different query types.

This commit opens the door for `CREATE TABLE`/`CREATE STREAM` (CT/CS) statements with primitive keys (`STRING`, `INT`, `BIGINT`, `BOOLEAN` and `DOUBLE`), and for using those sources in non-join, non-aggregate and non-partition-by queries.
@big-andy-coates big-andy-coates requested a review from a team as a code owner December 10, 2019 14:30

@purplefox purplefox left a comment

LGTM

@big-andy-coates big-andy-coates merged commit 7addf88 into confluentinc:master Dec 10, 2019
@big-andy-coates big-andy-coates deleted the partition_by_primitives branch December 10, 2019 22:48

@rodesai rodesai left a comment

LGTM

final LogicalSchema sourceSchema,
final StreamSelectKey step
) {
final ExpressionTypeManager expressionTypeManager =

Can we move this out of this class? Ideally this class should just be routing to something else that owns the schema transformation. We can move it to the step builder for now.
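
One possible shape for that refactoring is sketched below. Every name here (`SchemaRoutingSketch`, `Schema`, `SelectKeyStep`, `SelectKeyStepBuilder`, `StepSchemaRouter`) is a hypothetical stand-in rather than the actual ksqlDB class, and the schema is reduced to just the key's SQL type. The point is only that the routing class delegates and holds no type logic of its own.

```java
// Hypothetical stand-ins; these are not ksqlDB's real classes or signatures.
final class SchemaRoutingSketch {

  /** Minimal stand-in for a schema: only the key's SQL type is modelled. */
  static final class Schema {
    final String keyType;

    Schema(String keyType) {
      this.keyType = keyType;
    }
  }

  /** Stand-in for a re-keying step that already knows its key expression's type. */
  static final class SelectKeyStep {
    final String keyExpressionType;

    SelectKeyStep(String keyExpressionType) {
      this.keyExpressionType = keyExpressionType;
    }
  }

  /** Owns the schema transformation, as the review comment suggests. */
  static final class SelectKeyStepBuilder {
    static Schema resolveSchema(Schema source, SelectKeyStep step) {
      // The output key type follows the key expression, not the source key.
      return new Schema(step.keyExpressionType);
    }
  }

  /** Only routes to the owner of the transformation; no type logic lives here. */
  static final class StepSchemaRouter {
    Schema resolve(Schema source, SelectKeyStep step) {
      return SelectKeyStepBuilder.resolveSchema(source, step);
    }
  }
}
```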

Successfully merging this pull request may close these issues.

Primitive keys: Support INT, BIGINT, DOUBLE and STRING in PARTITION BY
3 participants