
CREATE TABLE hangs with null key in the source topic #266

Closed
math-g opened this issue Aug 30, 2017 · 8 comments

math-g commented Aug 30, 2017

I am trying to create tables from my own data via Connect / JDBC (JSON, converted with the default /etc/kafka/connect-standalone.properties), and it seems the Connect output topic that I use as the source for the KSQL CREATE TABLE statement needs a key for this to work (I had to use a ValueToKey transform in Connect so that the topic's key is not null). Otherwise, the table is created but SELECT * FROM table hangs.

I have also tried the WITH KEY option at table creation, without success. If I use CREATE STREAM I don't need this key.

Can you confirm or deny that a key is required in the source topic for table creation?


math-g commented Aug 30, 2017

If messages with a null key are indeed dropped, as for KTable, could you suggest a way to select the key from the values when creating the table? Maybe that's the purpose of WITH KEY, but I could not make it work.
Otherwise, when this is used with the Kafka Connect JDBC connector and a ValueToKey transformation is not possible (because the tables don't all use the same name for the id column, and ValueToKey doesn't allow such a per-table id-column mapping), you would have to write a Kafka Streams application that creates a KStream for each Connect output topic (one per JDBC table) and selects the key for each, before being able to use CREATE TABLE in KSQL.

hjafarpour self-assigned this Aug 30, 2017

hjafarpour commented Aug 30, 2017

@math-g Yes, you are right. The reason SELECT * FROM table shows no results over a topic with null keys is that KSQL, like KTable in the Kafka Streams API, needs keys in the underlying topic for a TABLE and will drop messages with a null key.
To define one of the columns as the key, you can use the PARTITION BY clause on a stream.
So in your example, for the topic generated by Connect, you can first define a stream over it with a CREATE STREAM DDL statement, then create a new stream with a CREATE STREAM AS SELECT statement and add a PARTITION BY clause at the end. This will create a new stream keyed on the selected column. Please refer here for a PARTITION BY example: https://github.com/confluentinc/ksql/blob/0.1.x/docs/examples.md#examples
Now you can define your table over the topic created by the new stream, since the key in that topic won't be null.
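
For illustration, a minimal sketch of that sequence. The names (my_jdbc_topic, source_stream, keyed_stream, my_table) are placeholders rather than names from this thread, and it assumes the CREATE STREAM AS SELECT sink topic defaults to the stream name:

-- Stream over the raw Connect output topic (keys may be null)
CREATE STREAM source_stream (id INTEGER, name VARCHAR)
  WITH (kafka_topic='my_jdbc_topic', value_format='JSON');

-- Repartition so the id column becomes the message key
CREATE STREAM keyed_stream AS SELECT * FROM source_stream PARTITION BY id;

-- Define the table over the repartitioned stream's topic, whose keys are no longer null
CREATE TABLE my_table (id INTEGER, name VARCHAR)
  WITH (kafka_topic='KEYED_STREAM', value_format='JSON', key='id');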


math-g commented Aug 30, 2017

OK, thanks, I was able to make the SELECT FROM TABLE work. Interesting to see that the 'WITH kafka_topic' property works as input for CREATE and as output for CREATE AS SELECT.
But the partitioned stream was empty, so when I create the table I can only see the new messages and I don't have a table with all the data from the beginning. Is there a way to avoid that?

Do you also know whether, with Kafka Connect, you can apply a per-table ValueToKey transformation? That would be simpler overall.


hjafarpour commented Aug 30, 2017

@math-g By default, KSQL reads topics from the current offset. One exception is the stream-table join, where the table is read from the beginning.
You can use the following statement to change this so that KSQL reads topics from the beginning:

SET 'auto.offset.reset'='earliest';

For more details please refer to https://github.com/confluentinc/ksql/blob/0.1.x/docs/examples.md#examples
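
As a follow-up sketch, reusing the hypothetical names from the earlier example: the SET only affects queries started after it, so it should be issued before the CREATE STREAM ... AS SELECT that repartitions the data:

SET 'auto.offset.reset'='earliest';

-- Queries created after the SET consume their input topics from the beginning,
-- so the repartitioned stream picks up the existing data as well as new messages
CREATE STREAM keyed_stream AS SELECT * FROM source_stream PARTITION BY id;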
I'll get back to you on the connect question.


math-g commented Aug 30, 2017

Thanks, I had already set this variable; actually, it doesn't seem to work for the intermediary stream, whose output topic only seems to receive the new messages.

OK, I will await your reply about Kafka Connect.

hjafarpour commented

@math-g About Connect: it is possible to apply an SMT to populate or override the key in a message, but ValueToKey will duplicate the value into the key. Another SMT might work for a single table and specific fields in the value, but handling more than one table probably requires a custom SMT.
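
For reference, a sketch of the standard Kafka Connect SMT configuration being discussed here, in connector properties form (the id field name is hypothetical). As noted above, the field list is fixed per connector, so tables with differently named id columns would need separate connector instances or a custom SMT:

# Copy the 'id' field from the record value into the key, then unwrap it
# so the key is the bare id rather than a single-field struct
transforms=createKey,extractKey
transforms.createKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.createKey.fields=id
transforms.extractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractKey.field=id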


makgroup commented Jan 8, 2018

@hjafarpour I am facing the same issue, where SELECT * FROM table hangs.
I loaded simple data, account(id, name), using /etc/kafka/connect-standalone.properties.
The topic test-oracle-jdbc-ACCOUNTS was created, and I am able to pull the data from the topic.

But with the following:

CREATE STREAM ACCOUNT_INFO (id INTEGER, name varchar) WITH (kafka_topic='test-oracle-jdbc-ACCOUNTS', value_format='JSON');
CREATE TABLE ACCOUNT_DATA (id INTEGER, name varchar) WITH (kafka_topic='test-oracle-jdbc-ACCOUNTS', KEY='ID', value_format='JSON');

selecting from both the stream and the table hangs. Can you please help me with this?
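
Applying the workaround described earlier in this thread to the statements above, a sketch (ACCOUNTS_BY_ID and ACCOUNT_DATA_BY_ID are hypothetical names, and the sink topic is assumed to default to the stream name):

SET 'auto.offset.reset'='earliest';

-- Repartition the Connect topic so the messages carry a non-null key
CREATE STREAM ACCOUNTS_BY_ID AS SELECT * FROM ACCOUNT_INFO PARTITION BY ID;

-- Build the table over the repartitioned topic instead of the raw Connect topic
CREATE TABLE ACCOUNT_DATA_BY_ID (id INTEGER, name VARCHAR)
  WITH (kafka_topic='ACCOUNTS_BY_ID', KEY='ID', value_format='JSON');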


rmoff commented Jun 21, 2018

See #1405 - closing.

rmoff closed this as completed Jun 21, 2018