Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't join on a nested field value #4051

Closed
rmoff opened this issue Dec 5, 2019 · 4 comments
Closed

Can't join on a nested field value #4051

rmoff opened this issue Dec 5, 2019 · 4 comments
Labels
Milestone

Comments

@rmoff
Copy link
Contributor

rmoff commented Dec 5, 2019

ksqlDB 0.6

Stream:

ksql> DESCRIBE TWEETS;

Name                 : TWEETS
 Field                | Type
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ROWTIME              | BIGINT           (system)
 ROWKEY               | VARCHAR(STRING)  (system)
 CREATEDAT            | BIGINT
 ID                   | BIGINT
 TEXT                 | VARCHAR(STRING)
 USER                 | STRUCT<ID BIGINT, NAME VARCHAR(STRING), SCREENNAME VARCHAR(STRING), LOCATION VARCHAR(STRING), DESCRIPTION VARCHAR(STRING), CONTRIBUTORSENABLED BOOLEAN, PROFILEIMAGEURL VARCHAR(STRING), BIGGERPROFILEIMAGEURL VARCHAR(STRING), MINIPROFILEIMAGEURL VARCHAR(STRING), ORIGINALPROFILEIMAGEURL VARCHAR(STRING), PROFILEIMAGEURLHTTPS VARCHAR(STRING), BIGGERPROFILEIMAGEURLHTTPS VARCHAR(STRING), MINIPROFILEIMAGEURLHTTPS VARCHAR(STRING), ORIGINALPROFILEIMAGEURLHTTPS VARCHAR(STRING), DEFAULTPROFILEIMAGE BOOLEAN, URL VARCHAR(STRING), PROTECTED BOOLEAN, FOLLOWERSCOUNT INTEGER, PROFILEBACKGROUNDCOLOR VARCHAR(STRING), PROFILETEXTCOLOR VARCHAR(STRING), PROFILELINKCOLOR VARCHAR(STRING), PROFILESIDEBARFILLCOLOR VARCHAR(STRING), PROFILESIDEBARBORDERCOLOR VARCHAR(STRING), PROFILEUSEBACKGROUNDIMAGE BOOLEAN, DEFAULTPROFILE BOOLEAN, SHOWALLINLINEMEDIA BOOLEAN, FRIENDSCOUNT INTEGER, CREATEDAT BIGINT, FAVOURITESCOUNT INTEGER, UTCOFFSET INTEGER, TIMEZONE VARCHAR(STRING), PROFILEBACKGROUNDIMAGEURL VARCHAR(STRING), PROFILEBACKGROUNDIMAGEURLHTTPS VARCHAR(STRING), PROFILEBANNERURL VARCHAR(STRING), PROFILEBANNERRETINAURL VARCHAR(STRING), PROFILEBANNERIPADURL VARCHAR(STRING), PROFILEBANNERIPADRETINAURL VARCHAR(STRING), PROFILEBANNERMOBILEURL VARCHAR(STRING), PROFILEBANNERMOBILERETINAURL VARCHAR(STRING), PROFILEBACKGROUNDTILED BOOLEAN, LANG VARCHAR(STRING), STATUSESCOUNT INTEGER, GEOENABLED BOOLEAN, VERIFIED BOOLEAN, TRANSLATOR BOOLEAN, LISTEDCOUNT INTEGER, FOLLOWREQUESTSENT BOOLEAN, WITHHELDINCOUNTRIES ARRAY<VARCHAR(STRING)>>
 RETWEET              | BOOLEAN
[…]

Table:

ksql> describe alert_criteria;

Name                 : ALERT_CRITERIA
 Field          | Type
--------------------------------------------
 ROWTIME        | BIGINT           (system)
 ROWKEY         | VARCHAR(STRING)  (system)
 SCREENNAME     | VARCHAR(STRING)
 CAPTURE_TWEETS | BOOLEAN
--------------------------------------------

Try to join:

ksql> SELECT T.*
>      FROM TWEETS T
>           LEFT OUTER JOIN ALERT_CRITERIA A
>           ON T.USER->SCREENNAME = A.SCREENNAME
>     WHERE A.CAPTURE_TWEETS = TRUE
>EMIT CHANGES;
Line: 4, Col: 34 : Invalid comparison expression 'T.USER->SCREENNAME' in join '(T.USER->SCREENNAME = A.SCREENNAME)'. Joins must only contain a field comparison.

Workaround for now is to materialise the nested field to the top level:

ksql> CREATE STREAM TWEETS2 AS SELECT USER->SCREENNAME AS USER_SCREENNAME, * FROM TWEETS;

 Message
------------------------------------------------------------------------------------
 Stream TWEETS2 created and running. Created by query with query ID: CSAS_TWEETS2_9
------------------------------------------------------------------------------------

and join on that instead. Not great because we have to duplicate the input data just for this.

@rmoff rmoff added the bug label Dec 5, 2019
@apurvam
Copy link
Contributor

apurvam commented Dec 6, 2019

The reason for this is that the struct dereference is actually implemented as evaluating an expression, and the join criteria doesn't support expression evaluation.

@derekjn derekjn added this to the 0.10.0 milestone May 13, 2020
@derekjn
Copy link
Contributor

derekjn commented Jun 10, 2020

This should be fixed now that joins support expressions.

@dttouchdata
Copy link

Is this fixed only for stream-table join? I am getting similar error when i do table-table join on struct column of the right table. We are on confluent cloud (i.e., we must be on latest ksqldb). Error message: Invalid join condition: table-table joins require to join on the primary key of the right input table. Got HD.TEL_ID = H.ROWKEY->ID. Statement: select hd.* from H_BUSINESS_T hd inner join TELS_T h on hd.tel_id = h.rowkey->id emit changes;

@ccachor
Copy link

ccachor commented Jun 9, 2023

Is this fixed only for stream-table join? I am getting similar error when i do table-table join on struct column of the right table. We are on confluent cloud (i.e., we must be on latest ksqldb). Error message: Invalid join condition: table-table joins require to join on the primary key of the right input table. Got HD.TEL_ID = H.ROWKEY->ID. Statement: select hd.* from H_BUSINESS_T hd inner join TELS_T h on hd.tel_id = h.rowkey->id emit changes;

Have you found a work around for this? I'm using a table setup as well using AVRO and I'm seeing the same issue. I even tried creating a struct for the foreign key to match but I'm not seeing any results in the query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants