Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for using primitive types in joins. #4132

Merged
merged 4 commits into from Dec 17, 2019

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Dec 13, 2019

Description

Fixes: #4094

BREAKING CHANGE: Some existing joins may now fail and the type of ROWKEY in the result schema of joins may have changed.

When ROWKEY was always a STRING it was possible to join an INTEGER column with a BIGINT column. This is no longer the case. A JOIN requires the join columns to be of the same type. (See #4130 which tracks adding support for being able to CAST join criteria).

Where joining on two INT columns would previously have resulted in a schema containing ROWKEY STRING KEY, it would not result in ROWKEY INT KEY.

NOTE: a fair few tests needed to be fixed as they were now doing invalid joins or expecting a STRING key. I also removed some tests that were failing as they duplicated existing QTT tests.

Testing done

usual

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

Adds support for using primitive types in joins.

BREAKING CHANGE: Some existing joins may now fail and the type of `ROWKEY` in the result schema of joins may have changed.

When `ROWKEY` was always a `STRING` it was possible to join an `INTEGER` column with a `BIGINT` column.  This is no longer the case. A `JOIN` requires the join columns to be of the same type. (See confluentinc#4130 which tracks adding support for being able to `CAST` join criteria).

Where joining on two `INT` columns would previously have resulted in a schema containing `ROWKEY STRING KEY`, it would not result in `ROWKEY INT KEY`.
@big-andy-coates big-andy-coates requested a review from a team as a code owner December 13, 2019 16:00
Comment on lines -255 to -285
final Optional<Column> keyColumn = schemaKStream
.getKeyField()
.resolve(schemaKStream.getSchema());

final ColumnRef rowKey = ColumnRef.of(
tableName,
SchemaUtil.ROWKEY_NAME
);

final boolean namesMatch = keyColumn
.map(field -> field.ref().equals(joinFieldName))
.orElse(false);

if (namesMatch || joinFieldName.equals(rowKey)) {
return (SchemaKTable) schemaKStream;
}

if (!keyColumn.isPresent()) {
throw new KsqlException(
"Source table (" + tableName.name() + ") has no key column defined. "
+ "Only 'ROWKEY' is supported in the join criteria."
);
}

throw new KsqlException(
"Source table (" + tableName.toString(FormatOptions.noEscape()) + ") key column ("
+ keyColumn.get().ref().toString(FormatOptions.noEscape()) + ") "
+ "is not the column used in the join criteria ("
+ joinFieldName.toString(FormatOptions.noEscape()) + "). "
+ "Only the table's key column or 'ROWKEY' is supported in the join criteria."
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks are now done from the constructor.

Copy link
Member

@vpapavas vpapavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @big-andy-coates ! I think it LGTM.
A couple of questions:

  1. Do we want to test with other data types as well besides BIGINT?
  2. What is the decision regarding casting? Currently, it fails if the columns are not of the same type.
    I will leave it up to you if we need another +1 from someone more experienced.

@@ -421,33 +413,6 @@ public void testSelectWithExpression() {

}

@Test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why did you remove these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there are QTT tests that cover the same thing.

@vpapavas vpapavas self-assigned this Dec 16, 2019
Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agavra
Copy link
Contributor

agavra commented Dec 16, 2019

Currently, it fails if the columns are not of the same type.

@vpapavas - I think this is OK for now, I think it makes sense to require explicit casting for non-equivalent types of fields.

@big-andy-coates
Copy link
Contributor Author

@vpapavas

Hey @big-andy-coates ! I think it LGTM.
A couple of questions:

  1. Do we want to test with other data types as well besides BIGINT?

I'll ensure that's done.

  1. What is the decision regarding casting? Currently, it fails if the columns are not of the same type.

There's an issue tracking that work: #4130

Thanks for the review!

@big-andy-coates big-andy-coates merged commit d595985 into confluentinc:master Dec 17, 2019
@big-andy-coates big-andy-coates deleted the primitive_joins branch December 17, 2019 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Primitive keys: Support INT, BIGINT, DOUBLE and STRING in JOIN
3 participants