fix: properly set key when partition by ROWKEY and join on non-ROWKEY #4090

Merged: 1 commit, Dec 10, 2019

@agavra commented Dec 10, 2019

fixes #4053

Description

Previously, if a query joined on a non-ROWKEY field in the source schema and then partitioned by the original ROWKEY, the PARTITION BY was ignored. This fixes the issue by making sure that the alias is correct. This is a temporary fix and will need to be revisited when we change how we handle aliasing.

Testing done

  • new QTT tests
  • checked for unnecessary repartitions

Reviewer checklist

  • Ensure docs are updated if necessary (e.g. if a user-visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@agavra requested a review from rodesai December 10, 2019 00:20
@agavra requested a review from a team as a code owner December 10, 2019 00:20

@purplefox left a comment

LGTM


@big-andy-coates left a comment

If this is fixing #4053, then it should mean we're applying the PARTITION BY to the output of the JOIN, so something like this should be possible, right?

-- Given:
CREATE STREAM L (A STRING, B STRING) WITH (kafka_topic='LEFT', value_format='JSON', KEY='A');
CREATE STREAM R (C STRING, D STRING) WITH (kafka_topic='RIGHT', value_format='JSON', KEY='C');

-- Then can partition by an alias used in the projection:
CREATE STREAM OUTPUT AS SELECT L.ROWKEY AS RK, R.ROWKEY FROM L JOIN R WITHIN 10 SECONDS ON L.A = R.C PARTITION BY RK;

Can you add a test for this if one doesn't already exist, please?

Otherwise, LGTM.

agavra commented Dec 10, 2019

@big-andy-coates - what you are suggesting is not possible. The source schema is the join schema, not the projection schema; the join schema is the union of all of the sources' fields. In your example, RK is not in the source join schema; it exists only in the projection schema, which is applied after the PARTITION BY step.
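
To make the ordering concrete, here is a rough toy model in Python. It is not KSQL code: the `join_schema` and `can_partition_by` helpers and the `L.`/`R.` qualified naming are assumptions made up for illustration. It only shows why a projection alias such as RK cannot be resolved when PARTITION BY is checked against the join schema.

```python
# Toy model only; the helper names and the "ALIAS.COLUMN" naming are
# hypothetical and do not reflect how KSQL represents schemas internally.

def join_schema(sources):
    """Union of all sources' fields, each qualified by its source alias."""
    fields = []
    for alias, columns in sources.items():
        # Assume every source exposes an implicit ROWKEY plus its own columns.
        fields.extend(f"{alias}.{col}" for col in ["ROWKEY", *columns])
    return fields


def can_partition_by(column, sources):
    """PARTITION BY is resolved against the join schema, before projection."""
    return column in join_schema(sources)


# The two streams from the example above.
sources = {"L": ["A", "B"], "R": ["C", "D"]}

# The projection (alias -> source column) is applied only after PARTITION BY,
# so its aliases are not visible to can_partition_by.
projection = {"RK": "L.ROWKEY"}

print(can_partition_by("L.ROWKEY", sources))  # True
print(can_partition_by("RK", sources))        # False
```

In this sketch, partitioning by the source column that RK refers to succeeds, while partitioning by the alias itself does not.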

@agavra merged commit 6c80941 into confluentinc:master Dec 10, 2019
@agavra deleted the partition_fix branch December 10, 2019 21:54
Development

Successfully merging this pull request may close these issues.

JOIN on non-ROWKEY and PARTITION BY ROWKEY is broken
3 participants