
[CALCITE-2913] Adapter for Apache Kafka #1127

Merged: 1 commit into apache:master on May 24, 2019

Conversation

mingmxu

@mingmxu mingmxu commented Mar 22, 2019

Add an adapter to expose Kafka topics as STREAM tables.

KafkaTableFactory is used here, so end users need to specify the table-topic mapping one by one.

JIRA: https://issues.apache.org/jira/browse/CALCITE-2913

CC: @danny0405

@mingmxu mingmxu force-pushed the kafkaadapter branch 3 times, most recently from 4e9bf04 to 9c0bf4e on March 22, 2019
@zinking
Contributor

zinking commented Apr 3, 2019

I was hoping to see some SQL tests in this new adapter, though; could that be mocked?

@mingmxu
Author

mingmxu commented Apr 3, 2019

There are some test cases in KafkaAdapterTest using org.apache.kafka.clients.consumer.MockConsumer. Is that what you're referring to?
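For readers unfamiliar with the pattern: the test seeds canned records into a mock consumer and asserts on what comes out. The stub below is a tiny local stand-in for Kafka's MockConsumer so the sketch runs without kafka-clients on the classpath; the class, record keys, and values are all illustrative, not the adapter's actual test code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Sketch of testing against a mocked consumer: buffer canned records, then
 * drain them and assert on the output, the same shape as a MockConsumer test.
 */
public class MockConsumerTestSketch {
  /** Minimal stand-in for org.apache.kafka.clients.consumer.MockConsumer. */
  static class MockConsumerStub {
    private final Queue<String[]> records = new ArrayDeque<>();

    void addRecord(String key, String value) {
      records.add(new String[] {key, value});
    }

    /** Drains everything buffered so far, like a single poll() call. */
    List<String[]> poll() {
      List<String[]> batch = new ArrayList<>(records);
      records.clear();
      return batch;
    }
  }

  public static void main(String[] args) {
    MockConsumerStub consumer = new MockConsumerStub();
    for (int idx = 0; idx < 10; ++idx) {
      consumer.addRecord("mykey" + idx, "myvalue" + idx);  // canned input
    }
    List<String[]> rows = consumer.poll();
    System.out.println(rows.size());      // 10
    System.out.println(rows.get(0)[0]);   // mykey0
  }
}
```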

@mingmxu mingmxu force-pushed the kafkaadapter branch 2 times, most recently from 93cd668 to 58c0ea8 on April 23, 2019
@asereda-gs asereda-gs self-requested a review May 10, 2019 23:11
kafka/pom.xml Outdated
</dependency>
<dependency>
<groupId>com.google.auto.value</groupId>
<artifactId>auto-value</artifactId>
Member

I realize this is a compile-time dependency, but I'm not sure we want to depend on AutoValue.

Generally we try to minimize dependencies.

If you really need it, we should discuss on the dev list what an appropriate codegen library for Calcite would be.

Author

auto-value is not necessary; removed.

consumer.subscribe(Collections.singletonList(tableOptions.topicName()));

return new KafkaMessageEnumerator(consumer, tableOptions.rowConverter());
} catch (Exception e) {
Member

Do we need to catch the coarse-grained Exception? Perhaps just KafkaException.

Author

No exception (besides RuntimeException) is actually thrown here; removed.

public abstract class KafkaBaseTable implements ScannableTable {
final KafkaTableOptions tableOptions;

public KafkaBaseTable(final KafkaTableOptions tableOptions) {
Member

Please make all constructors package-private for now. Until the API is stable, people should use KafkaTableFactory only.

Author

updated

private LinkedList<ConsumerRecord<K, V>> bufferedRecords = new LinkedList<>();
private ConsumerRecord<K, V> curRecord;

public KafkaMessageEnumerator(final Consumer consumer,
Member

package-private constructor

Author

updated

* @param <V>: type for Kafka message value,
* refer to {@link ConsumerConfig#VALUE_DESERIALIZER_CLASS_CONFIG};
*/
public class KafkaMessageEnumerator<K, V> implements Enumerator {
Member

What is the type T for Enumerator&lt;T&gt;?

Author

Updated with Object[], which refers to an array of column values in a row.
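The Enumerator&lt;Object[]&gt; contract discussed here (moveNext() advances, current() returns one Object[] per row) can be sketched with a small self-contained example. The Enumerator interface below is a local simplification of Calcite's linq4j interface (the real one also has reset() and close()), and the buffered rows are illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Minimal sketch of iterating rows as Object[] in the style of Calcite's
 * linq4j Enumerator: moveNext() advances the cursor, current() returns the
 * row, with one array element per column.
 */
public class RowEnumeratorSketch {
  /** Local stand-in for org.apache.calcite.linq4j.Enumerator. */
  interface Enumerator<T> {
    boolean moveNext();
    T current();
  }

  /** Buffers rows (e.g. decoded Kafka records) and replays them one by one. */
  static class BufferedRowEnumerator implements Enumerator<Object[]> {
    private final Deque<Object[]> buffered = new ArrayDeque<>();
    private Object[] curRow;

    BufferedRowEnumerator(List<Object[]> rows) {
      buffered.addAll(rows);
    }

    @Override public boolean moveNext() {
      if (buffered.isEmpty()) {
        return false;           // a streaming impl would poll for more here
      }
      curRow = buffered.removeFirst();
      return true;
    }

    @Override public Object[] current() {
      return curRow;            // one array element per column of the row
    }
  }

  public static void main(String[] args) {
    List<Object[]> rows = new ArrayList<>();
    rows.add(new Object[] {0, "mykey0", "myvalue0"});
    rows.add(new Object[] {1, "mykey1", "myvalue1"});
    Enumerator<Object[]> e = new BufferedRowEnumerator(rows);
    int count = 0;
    while (e.moveNext()) {
      count++;
    }
    System.out.println(count);  // 2
  }
}
```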


The Kafka adapter exposes a Kafka topic as a STREAM table, so it can be queried using
[Calcite Stream SQL]({{ site.baseurl }}/docs/stream.html). Note that the adapter will not attempt to scan all topics;
instead, users need to configure tables one by one.
Member

Can you mention that one topic is one table (a one-to-one mapping)?

Author

Added as:

Note that the adapter will not attempt to scan all topics;
instead, users need to configure tables manually, and each Kafka stream table maps to one Kafka topic.

"name": "TABLE_NAME",
"type": "custom",
"factory": "org.apache.calcite.adapter.kafka.KafkaTableFactory",
"row.converter": "com.example.CustKafkaRowConverter",
Member

Note to myself: check whether attributes with a dot (.) are a good idea.
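For context, here is a sketch of how such a table definition might sit inside a complete Calcite model file. The schema name, bootstrap address, and topic name are illustrative, the converter class com.example.CustKafkaRowConverter comes from the fragment above, and the operand keys (bootstrap.servers, topic.name) are assumptions based on the parameter names discussed in this PR:

```json
{
  "version": "1.0",
  "defaultSchema": "KAFKA",
  "schemas": [
    {
      "name": "KAFKA",
      "tables": [
        {
          "name": "TABLE_NAME",
          "type": "custom",
          "factory": "org.apache.calcite.adapter.kafka.KafkaTableFactory",
          "row.converter": "com.example.CustKafkaRowConverter",
          "operand": {
            "bootstrap.servers": "localhost:9092",
            "topic.name": "testtopic"
          }
        }
      ]
    }
  ]
}
```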

/**
* Parameter constants used to define a Kafka table.
*/
public class KafkaTableConstants {
Member

Make all constants package-private.
Use an interface instead of a class for defining constants.

Author

updated

* @param topicName, Kafka topic name;
* @return row type
*/
RelDataType rowDataType(String topicName);
Member

From the configuration I see that you define one converter per topic. If so, why do you need this method (rowDataType(String))?

Author

I don't put the row schema in the constructor as KafkaRowConverter is defined as an interface. Usually we use an external metadata system to manage Kafka message schemas, so calling rowDataType(String topic) would be clearer IMO.

Member

I feel like the current interface is both one-to-one (between topic and row) and one-to-many (between topic and RelDataType). It seems more logical to have the following:

interface KafkaRowConverter<K, V> {
   RelDataType rowDataType();
   Object[] toRow(ConsumerRecord<K, V> message);
}

I.e., have a separate KafkaRowConverter per topic.

Another option is to provide RelDataType as constructor argument to KafkaTable.

Pls let me know if you agree with such API.
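The proposed per-topic converter can be sketched in a self-contained way. ConsumerRecord below is a simplified local stand-in for Kafka's class, and the RelDataType return is replaced by a plain String schema description so the example runs without kafka-clients or Calcite on the classpath; the column layout (offset, key, value) is illustrative:

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the per-topic converter API proposed in the review: one converter
 * instance is bound to one topic, so neither method takes a topic argument.
 */
public class RowConverterSketch {
  /** Simplified stand-in for org.apache.kafka.clients.consumer.ConsumerRecord. */
  static class ConsumerRecord<K, V> {
    final K key;
    final V value;
    final long offset;

    ConsumerRecord(K key, V value, long offset) {
      this.key = key;
      this.value = value;
      this.offset = offset;
    }
  }

  interface KafkaRowConverter<K, V> {
    String rowDataType();                          // RelDataType in the real API
    Object[] toRow(ConsumerRecord<K, V> message);
  }

  /** Decodes byte[] key/value records into three-column rows. */
  static class BytesRowConverter implements KafkaRowConverter<byte[], byte[]> {
    @Override public String rowDataType() {
      return "(MSG_OFFSET BIGINT, KEY VARCHAR, VALUE VARCHAR)";
    }

    @Override public Object[] toRow(ConsumerRecord<byte[], byte[]> message) {
      return new Object[] {
          message.offset,
          new String(message.key, StandardCharsets.UTF_8),
          new String(message.value, StandardCharsets.UTF_8),
      };
    }
  }

  public static void main(String[] args) {
    BytesRowConverter converter = new BytesRowConverter();
    Object[] row = converter.toRow(new ConsumerRecord<>(
        "mykey0".getBytes(StandardCharsets.UTF_8),
        "myvalue0".getBytes(StandardCharsets.UTF_8), 0L));
    System.out.println(row.length);  // 3 columns: offset, key, value
  }
}
```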

Member

Traditionally we have been using a single Function1<Consumer<K,V>, Object[]> for such transformations.

Author

@asereda-gs sorry that I was lost in the thread; the proposed interface looks good to me. If it's not a blocking issue, I would like to close this PR and update it in a new task (CALCITE-3080), to handle user-specified columns together. In this way, there would be two provided implementations: 1) to handle user-specified columns, 2) to support external row metadata.

Member

OK if CALCITE-3080 addresses it separately.

(String) operand.get(KafkaTableConstants.SCHEMA_ROW_CONVERTER))
.newInstance();
} catch (InstantiationException | IllegalAccessException | ClassNotFoundException e) {
throw new RuntimeException(e);
Member

Can you give more details in the exception message (e.g. Failed to create table $T in schema $S)? In the current context you have the table name and config.

Author

updated with detailed error message

}

curRecord = bufferedRecords.removeFirst();
return true;
Member

Probably it is also a good idea to check DataContext.Variable.CANCEL_FLAG in case the user explicitly unsubscribes.

See CsvStreamScannableTable for examples.
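The suggested cancel-flag check can be sketched self-contained. The AtomicBoolean below stands in for the flag that Calcite places in the DataContext under Variable.CANCEL_FLAG; the class name and buffered rows are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of honoring a cancellation flag inside moveNext(): once the flag is
 * set, the enumerator stops producing rows even if records remain buffered.
 */
public class CancelFlagSketch {
  private final AtomicBoolean cancelFlag;
  private final Deque<Object[]> bufferedRecords = new ArrayDeque<>();
  private Object[] curRecord;

  CancelFlagSketch(AtomicBoolean cancelFlag) {
    this.cancelFlag = cancelFlag;
  }

  void buffer(Object[] row) {
    bufferedRecords.add(row);
  }

  boolean moveNext() {
    if (cancelFlag.get()) {
      return false;            // stop promptly when the query is cancelled
    }
    if (bufferedRecords.isEmpty()) {
      return false;            // a real stream would poll the consumer here
    }
    curRecord = bufferedRecords.removeFirst();
    return true;
  }

  Object[] current() {
    return curRecord;
  }

  public static void main(String[] args) {
    AtomicBoolean cancel = new AtomicBoolean(false);
    CancelFlagSketch e = new CancelFlagSketch(cancel);
    e.buffer(new Object[] {"row0"});
    e.buffer(new Object[] {"row1"});
    System.out.println(e.moveNext());  // true: first row is consumed
    cancel.set(true);                  // user cancels the query
    System.out.println(e.moveNext());  // false: remaining rows are skipped
  }
}
```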

@asereda-gs
Member

Please squash and rebase all commits.
I will make one (or two) more code review passes.

@mingmxu
Author

mingmxu commented May 13, 2019

Sure; rebased from github/master and squashed into one commit.

);
} catch (ClassNotFoundException e) {
throw new RuntimeException(
String.format(Locale.getDefault(), "Class '%s' is not found.",
Member

What I meant here is more like

catch (CheckedException e) {
  final String details = String.format("Failed to create table %s (topic %s) in schema %s",
      name, topicName, schema.getName());
  throw new RuntimeException(details, e); // instead of just: throw new RuntimeException(e);
}

You don't need to catch each exception individually.
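A runnable version of this pattern, assuming illustrative table/topic names: ReflectiveOperationException is the single supertype covering ClassNotFoundException, NoSuchMethodException, InstantiationException, IllegalAccessException, and InvocationTargetException, so one catch clause suffices, and Locale.ROOT matches the formatting advice elsewhere in this review:

```java
import java.util.Locale;

/**
 * Self-contained demo of wrapping reflective instantiation failures with
 * table/topic context instead of rethrowing the bare cause.
 */
public class WrapExceptionSketch {
  static Object instantiateConverter(String className, String tableName,
      String topicName) {
    try {
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // One supertype covers all the reflective checked exceptions, so there
      // is no need to list them individually in a multi-catch.
      final String details = String.format(Locale.ROOT,
          "Failed to create table %s (topic %s): cannot instantiate %s",
          tableName, topicName, className);
      throw new RuntimeException(details, e);  // keep the cause attached
    }
  }

  public static void main(String[] args) {
    try {
      instantiateConverter("com.example.MissingConverter", "T1", "testtopic");
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
      System.out.println(e.getCause().getClass().getSimpleName());
    }
  }
}
```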

Author

ah I got you, let me update it

@mingmxu
Author

mingmxu commented May 14, 2019

Reformatted; please take another look. I will squash into one commit after review.

} catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException
| InstantiationException | InvocationTargetException e) {
final String details = String.format(
Locale.getDefault(),
Member

use Locale.ROOT

@asereda-gs
Member

Please mention (in the docs) that, currently, the Kafka Adapter is in beta (preview phase) and we might change the public API or even remove it. I would like to stay flexible for the first couple of versions.

@asereda-gs
Member

It would be nice to have examples with some JSON messages (using native Calcite JSON functions).

@mingmxu
Author

mingmxu commented May 14, 2019

I've updated the note in the document.

For JSON examples, I would defer to my next task, mostly to leverage RelDataType rowType. Examples showing JSON functions (over a STRING Kafka message field) are not necessary here IMO.

*/

/**
* Query provider that reads from files and web pages in various formats.
Member

This doesn't look like Kafka package description


for (int idx = 0; idx < 10; ++idx) {
addRecord(new ConsumerRecord<byte[], byte[]>("testtopic",
0, idx, ("mykey" + idx).getBytes(Charset.forName("UTF-8")),

@mingmxu
Author

mingmxu commented May 14, 2019

Back to your comment here, do you think a vote is required to add a new adapter? Ideally I don't want to add it today and remove it days later.

Kafka Adapter is in beta (preview phase) and we might change public API or even remove it.

@julianhyde
Contributor

I don't think a vote is required for a new adapter - it is just a code change, albeit a significant one. That said, I think we should get consensus on the dev list that it's a good idea. It will add to the work load of every future release manager.

I've added some high-level review comments to https://issues.apache.org/jira/browse/CALCITE-2913. I will encourage other committers to make high-level comments in JIRA also. @asereda-gs is doing an excellent job of reviewing line-by-line (thank you Andrei!)

sqlline Outdated
@@ -37,7 +37,7 @@ if [ ! -f target/fullclasspath.txt ]; then
fi

CP=
for module in core cassandra druid elasticsearch2 elasticsearch5 file mongodb server spark splunk geode example/csv example/function; do
for module in core cassandra druid elasticsearch2 elasticsearch5 file mongodb server spark splunk geode example/csv example/function kafka; do
Member

Julian's comment:

you've put kafka at the end, but it should be in alphabetical order

Contributor

Putting it at the end is, of course, the modest thing to do.

I learned a while ago that being modest (especially when editing files with lots of unit test methods) creates what DBMS folks call a hot-spot. Everyone adds at the end, so we get merge conflicts. Make your change in the most logical place (which is alphabetical if there is no other organizing principle).

kafka/pom.xml Outdated
<artifactId>calcite-kafka</artifactId>
<packaging>jar</packaging>
<name>calcite kafka</name>
<description>Calcite provider that reads from kafka topics</description>
Member

Please change to:

Kafka Adapter. Exposes kafka topic(s) as stream table(s).

if (tableOptions.getConsumerParams() != null) {
consumerConfig.putAll(tableOptions.getConsumerParams());
}
Consumer consumer = new KafkaConsumer<>(consumerConfig);
Member

Can we subscribe as late as possible (inside the Enumerator, on first next)?

Author

prepare would be the best place; next looks odd to me.

Member

Probably it is fine as is for now. It can be improved later on.

@mingmxu
Author

mingmxu commented May 14, 2019

There are so many small commits now. @asereda-gs, can you take a quick look? I will do a squash then.

@asereda-gs
Member

asereda-gs commented May 14, 2019

@xumingmin you can force-push a squashed commit (no need to do it separately). Just rebase locally.

@mingmxu
Author

mingmxu commented May 14, 2019

Is there a template for how to write a commit message? I can't find it at http://calcite.apache.org/develop/#contributing.

@asereda-gs
Member

asereda-gs commented May 14, 2019

Commit your change to your branch, and use a comment that starts with the JIRA case number, like this:
...
If you are not a committer, add your name in parentheses at the end of the message.

Also check history of existing commits

@mingmxu
Author

mingmxu commented May 14, 2019

Got you; I made a minor change:

[CALCITE-2913] Add Kafka Adapter (Mingmin Xu)
Expose an Apache Kafka topic as a stream table.

kafka consumer is not well-known.

@asereda-gs asereda-gs changed the title [CALCITE-2913] add a KafkaAdapter for Stream [CALCITE-2913] Add a Kafka Adapter May 14, 2019
@asereda-gs asereda-gs changed the title [CALCITE-2913] Add a Kafka Adapter [CALCITE-2913] Add Kafka Adapter May 14, 2019
@asereda-gs asereda-gs added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label May 14, 2019
@asereda-gs
Member

@xumingmin can you please address Julian's comments in JIRA about proper attribution (especially in documentation):

Make sure to respect Kafka's branding. Each page in the documentation where Kafka is mentioned, the first mention should call it "Apache Kafka". I'd change the name of this case/commit to "Adapter for Apache Kafka".

@asereda-gs asereda-gs changed the title [CALCITE-2913] Add Kafka Adapter [CALCITE-2913] Add Apache Kafka Adapter May 14, 2019
return this;
}

public Map getConsumerParams() {
Member

What are the key/value types? Map&lt;String, String&gt;?

Author

Yes, it should be.

@xumingmin can you please address Julian's comments in JIRA about proper attribution (especially in documentation) :

Make sure to respect Kafka's branding. Each page in the documentation where Kafka is mentioned, the first mention should call it "Apache Kafka". I'd change the name of this case/commit to "Adapter for Apache Kafka".

I forgot some places; let me go through it again.

Author

What are key/value types ? Map<String, String> ?

Should be; let me specify it explicitly.

@mingmxu mingmxu changed the title [CALCITE-2913] Add Apache Kafka Adapter [CALCITE-2913] Adapter for Apache Kafka May 15, 2019
how to decode Kafka message to Calcite row. [KafkaRowConverterImpl]({{ site.apiRoot }}/org/apache/calcite/adapter/kafka/KafkaRowConverterImpl.html)
is used if not provided;

2. More consumer settings can be added in parameter `consumer.paras`;
Contributor

it should be consumer.params

Author

yes, let me update it.

bufferedRecords.add(record);
}

consumer.commitSync();
Contributor

If you use the commitSync() API, you should set enable.auto.commit to false before creating the KafkaConsumer instance.

Contributor

+1

Author

I would remove this line directly. If users need auto-commit, they can set it in consumer.params.
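The configuration point under discussion can be sketched with plain Properties: disable auto-commit up front (since the enumerator commits manually via commitSync()), while still letting the user's consumer.params override it. The keys are standard Kafka consumer property names; the bootstrap address and method names are illustrative:

```java
import java.util.Properties;

/**
 * Sketch of building the consumer configuration: manual commits via
 * commitSync() conflict with auto-commit, so auto-commit is disabled by
 * default, and user-supplied params are layered on top.
 */
public class ConsumerConfigSketch {
  static Properties baseConsumerConfig(Properties userParams) {
    Properties config = new Properties();
    config.setProperty("bootstrap.servers", "localhost:9092");
    // Turn auto-commit off by default; a user who wants it back can still
    // override it through consumer.params below.
    config.setProperty("enable.auto.commit", "false");
    if (userParams != null) {
      config.putAll(userParams);  // consumer.params from the table definition
    }
    return config;
  }

  public static void main(String[] args) {
    Properties overrides = new Properties();
    overrides.setProperty("enable.auto.commit", "true");  // user opts back in
    System.out.println(
        baseConsumerConfig(null).getProperty("enable.auto.commit"));      // false
    System.out.println(
        baseConsumerConfig(overrides).getProperty("enable.auto.commit")); // true
  }
}
```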

@mingmxu mingmxu force-pushed the kafkaadapter branch 2 times, most recently from bee7541 to 12d6d8c on May 16, 2019
@wangzzu
Contributor

wangzzu commented May 16, 2019

@xumingmin Is it necessary to support users consuming from a specific timestamp?

@mingmxu
Author

mingmxu commented May 16, 2019

@wangzzu I would like to add this feature in a follow-up task, to support both from/to timestamp/offset.

Created https://issues.apache.org/jira/browse/CALCITE-3073 for tracking.

@wangzzu
Contributor

wangzzu commented May 17, 2019

@xumingmin 👍

* @return fields in the row
*/
@Override public Object[] toRow(final ConsumerRecord<String, String> message) {
Object[] fields = new Object[4];


Why do you take an array of size 4, not 3?

Author

True; thanks for pointing that out.

*/
@Override public RelDataType rowDataType(final String topicName) {
final RelDataTypeFactory typeFactory =
new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);


topicName is unused; is there any reason to pass it to this method as a parameter?

Author

We may update the way to define the row schema; refer to the thread https://github.com/apache/calcite/pull/1127/files#r283079680. Please let me know if it's a blocking issue.

@mingmxu
Author

mingmxu commented May 23, 2019

@asereda-gs any idea when we can close this PR? I notice the Calcite 1.20.0 release is coming.

Btw, I doubt there's enough time to merge CALCITE-3080; it may be a better idea to make the change below now. Any thoughts?

interface KafkaRowConverter<K, V> {
  RelDataType rowDataType();
  Object[] toRow(ConsumerRecord<K, V> message);
}

@asereda-gs
Member

@xumingmin it is fine if we redefine the KafkaRowConverter interface after 1.20 (i.e. address CALCITE-3080).

That is the reason why I prefer to mention in the 1.20 release notes that the Kafka Adapter is currently in "preview mode" (some API might change).

Please rebase this PR (there are some conflicts).

@julianhyde
Contributor

I see high-level discussions about scheduling and dependency on CALCITE-3080 going on here. As I said previously, please have these discussions in JIRA not git.

Expose an Apache Kafka topic as a stream table.
@mingmxu
Author

mingmxu commented May 24, 2019

@asereda-gs rebased from apache/master.
@julianhyde , thanks for reminding, let me copy the thread to JIRA.

@asereda-gs asereda-gs merged commit ca6dc99 into apache:master May 24, 2019
@mingmxu mingmxu deleted the kafkaadapter branch September 28, 2020 08:18
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.