
pinot_kafka_0.9.0.1 module #4252

Merged 5 commits into apache:master from kafka_0_9_0_1_module on Jun 1, 2019

Conversation

npawar
Contributor

@npawar npawar commented May 28, 2019

Step 3 of #3998
As discussed in the issue, we are planning to move the Kafka stream implementation classes out into their own module, pinot-kafka-0.9.0.1. This should enable folks to plug in implementations for other Kafka versions.
TODO: pinot-integration-tests, pinot-tools and pinot-perf still depend on pinot-kafka-0.9.0.1 for the tests and dummy setup. We need to figure out a way to avoid specifying the dependency explicitly, and instead inject the classes for whichever version we wish to use.
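One possible shape for the injection the TODO describes is to load the consumer factory by class name from configuration via reflection, so consuming modules carry no compile-time dependency on a specific pinot-kafka-* module. This is an illustrative sketch only; the interface and class names below are hypothetical, not Pinot's actual API.

```java
// Illustrative sketch, not Pinot's actual API: consuming modules reference
// a factory by class name (e.g. from stream config) instead of depending
// on a specific pinot-kafka-* module at compile time.
public class FactoryLoaderSketch {

    // Minimal stand-in for a stream consumer factory SPI.
    public interface StreamConsumerFactory {
        String streamType();
    }

    // A stand-in implementation, as a kafka-version module might provide.
    public static class Kafka09ConsumerFactory implements StreamConsumerFactory {
        @Override
        public String streamType() {
            return "kafka-0.9";
        }
    }

    // Load the factory class named in configuration via reflection.
    public static StreamConsumerFactory load(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        return (StreamConsumerFactory) clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In practice the class name would come from the table/stream config.
        StreamConsumerFactory factory =
                load("FactoryLoaderSketch$Kafka09ConsumerFactory");
        System.out.println(factory.streamType());
    }
}
```

With this pattern, switching Kafka versions at deploy time only requires putting a different module's jar on the classpath and changing the configured class name.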

@npawar npawar changed the title Kafka 0 9 0 1 module pinot_kafka_0.9.0.1 module May 28, 2019
@codecov-io

codecov-io commented May 28, 2019

Codecov Report

Merging #4252 into master will increase coverage by 0.06%.
The diff coverage is n/a.


```diff
@@             Coverage Diff              @@
##             master    #4252      +/-   ##
============================================
+ Coverage     67.24%   67.31%   +0.06%     
  Complexity       20       20              
============================================
  Files          1041     1039       -2     
  Lines         51564    51535      -29     
  Branches       7229     7228       -1     
============================================
+ Hits          34675    34689      +14     
+ Misses        14531    14475      -56     
- Partials       2358     2371      +13
```
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...ore/realtime/impl/kafka/SimpleConsumerFactory.java | 0% <ø> (ø) | 0 <0> (?) |
| ...altime/impl/kafka/KafkaStreamConfigProperties.java | 25% <ø> (ø) | 0 <0> (?) |
| ...ealtime/impl/kafka/SimpleConsumerMessageBatch.java | 100% <ø> (ø) | 0 <0> (?) |
| ...altime/impl/kafka/KafkaStreamMetadataProvider.java | 44.7% <ø> (ø) | 0 <0> (?) |
| ...altime/impl/kafka/KafkaPartitionLevelConsumer.java | 68.57% <ø> (ø) | 0 <0> (?) |
| ...ealtime/impl/kafka/KafkaHighLevelStreamConfig.java | 74.57% <ø> (ø) | 0 <0> (?) |
| ...re/realtime/impl/kafka/KafkaConnectionHandler.java | 76.04% <ø> (ø) | 0 <0> (?) |
| .../core/realtime/impl/kafka/ConsumerAndIterator.java | 90% <ø> (ø) | 0 <0> (?) |
| ...core/realtime/impl/kafka/KafkaConsumerManager.java | 58.82% <ø> (ø) | 0 <0> (?) |
| ...t/core/realtime/impl/kafka/KafkaBrokerWrapper.java | 47.05% <ø> (ø) | 0 <0> (?) |

... and 41 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d540d95...0d2645f.

@npawar npawar requested a review from mcvsubbu May 29, 2019 00:00
@snleee
Contributor

snleee commented May 29, 2019

What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

@mcvsubbu
Contributor

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

@snleee
Contributor

snleee commented May 29, 2019

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.

For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

@mcvsubbu
Contributor

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

> I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.

> For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

Each time they change the API, we create a new version. If they never change the API starting 2.1, we are good forever. How does that sound?

@snleee
Contributor

snleee commented May 29, 2019

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

> I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.
> For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

> Each time they change the API, we create a new version. If they never change the API starting 2.1, we are good forever. How does that sound?

I see. As long as we don't add too many modules and have a clear guideline, I'm fine with it. I just thought that associating a specific version number with the module name is not a flexible approach if we want to bump Kafka versions frequently. Thanks for the quick comment!

@mcvsubbu
Contributor

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

> I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.
> For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

> Each time they change the API, we create a new version. If they never change the API starting 2.1, we are good forever. How does that sound?

> I see. As long as we don't add too many modules and have a clear guideline, I'm fine with it. I just thought that associating a specific version number with the module name is not a flexible approach if we want to bump Kafka versions frequently. Thanks for the quick comment!

When the new Kafka module is added (hopefully by @ananthdurai), we will have a README in the new directory saying that attempts should be made to bump Kafka revisions as needed until the Kafka interface changes. We can even name the directory kafka-2.1-or-higher.

@mcvsubbu
Contributor

I am fine with this. @ananthdurai, @kishoreg, anyone have any opinions? @ananthdurai, are you good in terms of implementing a new adapter for newer versions of Kafka? Please follow the conversation in this PR for naming conventions and decide accordingly.

@kishoreg
Member

I missed the comment from @snleee on naming. I agree with him and prefer 0.9.x, and I think Kafka guarantees that APIs won't change in minor versions. The problem with naming it 0.9.0.1 is that it's not a common convention, and users might think it works only for that version of Kafka, which is not the case. See the Flink naming convention for example: https://github.com/apache/flink/tree/master/flink-connectors.

Another suggestion is to create a subfolder for all implementations if possible. We have too many modules at the root: pinot-orc, pinot-parquet, pinot-hadoop-filesystem. If it's not too hard, we can move the Kafka module into a subfolder.

@npawar
Contributor Author

npawar commented May 30, 2019

FYI @chenboat @jamesyfshao: this directly affects the Kafka implementation you currently use. There should be no impact, but please take a look.

@jamesyfshao
Contributor

Thanks for the update. We have some plans to upgrade to a later version of Kafka (> 1.0). I think we should open a new module named something like pinot-kafka-1.0 when we work on the new version?

We also have plans to migrate away from the Scala-based Kafka simple consumer to the new consumer model, as it has been deprecated since 0.10 and the new consumer has a compatible API as well as more features. Do you guys have any work planned for it, or can we investigate more into this effort? @chenboat

@npawar
Contributor Author

npawar commented May 30, 2019

> I missed the comment from @snleee on naming. I agree with him and prefer 0.9.x, and I think Kafka guarantees that APIs won't change in minor versions. The problem with naming it 0.9.0.1 is that it's not a common convention, and users might think it works only for that version of Kafka, which is not the case. See the Flink naming convention for example: https://github.com/apache/flink/tree/master/flink-connectors.

> Another suggestion is to create a subfolder for all implementations if possible. We have too many modules at the root: pinot-orc, pinot-parquet, pinot-hadoop-filesystem. If it's not too hard, we can move the Kafka module into a subfolder.

Sure, we can consider keeping it just 0.9, 1.0, etc. I will check regarding the Kafka guarantee about no changes in minor versions.

Regarding a subfolder for all implementations, are you suggesting we put all implementations in one module (pinot-streams?)

@npawar
Contributor Author

npawar commented May 30, 2019

> Thanks for the update. We have some plans to upgrade to a later version of Kafka (> 1.0). I think we should open a new module named something like pinot-kafka-1.0 when we work on the new version?

> We also have plans to migrate away from the Scala-based Kafka simple consumer to the new consumer model, as it has been deprecated since 0.10 and the new consumer has a compatible API as well as more features. Do you guys have any work planned for it, or can we investigate more into this effort? @chenboat

Yes, you can create a new pinot-kafka-1.0 when you decide to work with the newer version. We don't have any immediate plans of investigating it; it would be great if you could help out with that.

@kishoreg
Member

> I missed the comment from @snleee on naming. I agree with him and prefer 0.9.x, and I think Kafka guarantees that APIs won't change in minor versions. The problem with naming it 0.9.0.1 is that it's not a common convention, and users might think it works only for that version of Kafka, which is not the case. See the Flink naming convention for example: https://github.com/apache/flink/tree/master/flink-connectors.
> Another suggestion is to create a subfolder for all implementations if possible. We have too many modules at the root: pinot-orc, pinot-parquet, pinot-hadoop-filesystem. If it's not too hard, we can move the Kafka module into a subfolder.

> Sure, we can consider keeping it just 0.9, 1.0, etc. I will check regarding the Kafka guarantee about no changes in minor versions.

> Regarding a subfolder for all implementations, are you suggesting we put all implementations in one module (pinot-streams?)

@npawar Thanks. I am suggesting one module per implementation, but just group them under one directory, similar to the way Flink has organized its connectors: https://github.com/apache/flink/tree/master/flink-connectors. pinot-connectors is actually a pretty good name for the subfolder, and we can have a pinot-connector-kafka-0.9 module.
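For concreteness, the Flink-style layout being proposed could look like the following pom sketch. These are illustrative fragments under the naming discussed in this thread, not the exact files in this PR:

```xml
<!-- Illustrative sketch, not the exact Pinot poms. -->
<!-- Root pom.xml: add a single aggregator module for all connectors. -->
<modules>
  <module>pinot-connectors</module>
</modules>

<!-- pinot-connectors/pom.xml: one child module per connector
     implementation, named by the Kafka major.minor it targets. -->
<modules>
  <module>pinot-connector-kafka-0.9</module>
  <!-- future, added on demand after an API change:
       <module>pinot-connector-kafka-2.0</module> -->
</modules>
```

This keeps the root module list short while still giving each Kafka API generation its own compile-time unit.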

@mcvsubbu
Contributor

> Thanks for the update. We have some plans to upgrade to a later version of Kafka (> 1.0). I think we should open a new module named something like pinot-kafka-1.0 when we work on the new version?

> We also have plans to migrate away from the Scala-based Kafka simple consumer to the new consumer model, as it has been deprecated since 0.10 and the new consumer has a compatible API as well as more features. Do you guys have any work planned for it, or can we investigate more into this effort? @chenboat

+1 to what @npawar said, but the naming of the folders is still up in the air.

For example, if we name the directory 1.0.10 and it so happens that Kafka does not change any APIs, and it is also valid for 2.0, then what happens?

@snleee
Contributor

snleee commented May 31, 2019

@mcvsubbu The kafka-1.0 module will import the 1.0.x library. It may work with Kafka 2.0 if there is no API change, but we don't guarantee it. For explicit 2.0 support, we can create another module, kafka-2.0, that imports 2.0.x, and we can guarantee that it will work with Kafka 2.0.

@npawar +1 for following Flink's approach to organizing our modules. It's very clean. I also vote for pinot-kafka-0.9 instead of pinot-kafka-0.9.x.

@mcvsubbu
Contributor

> @mcvsubbu The kafka-1.0 module will import the 1.0.x library. It may work with Kafka 2.0 if there is no API change, but we don't guarantee it. For explicit 2.0 support, we can create another module, kafka-2.0, that imports 2.0.x, and we can guarantee that it will work with Kafka 2.0.

> @npawar +1 for following Flink's approach to organizing our modules. It's very clean. I also vote for pinot-kafka-0.9 instead of pinot-kafka-0.9.x.

In other words, you are suggesting that we copy over all the files even if nothing changes, and keep doing this for every (major) Kafka release.

@snleee
Contributor

snleee commented May 31, 2019

@mcvsubbu Instead of copying code, we can document that the 1.0 module is compatible with 2.0 once we have verified that they are compatible. I didn't want to guarantee that 1.0 would work with 2.0, because we might not want to test and track compatibility for every single Kafka version. We can add a module on demand when we need a new one after an API change.

My suggestion was more about naming, since I didn't like putting a specific version in the name; it seemed not to be the convention.

@npawar
Contributor Author

npawar commented May 31, 2019

The Flink model looks good. I've changed the PR to follow that convention. Please take a look @mcvsubbu @kishoreg @snleee.

```xml
    </dependency>
  </dependencies>

</project>
```
Contributor

new line

```diff
@@ -55,6 +55,7 @@
     <module>pinot-azure-filesystem</module>
     <module>pinot-orc</module>
     <module>pinot-parquet</module>
+    <module>pinot-connectors</module>
```
Contributor

Not related to this part, but I have a question about the main pom.xml file. In my understanding, we currently specify kafka.version = 0.9.0.1 in the main pom.xml. After we add the new connector, each connector module will need to pull a different Kafka version. It would be good to start thinking about how we will manage that.

Contributor

> Not related to this part, but I have a question about the main pom.xml file. In my understanding, we currently specify kafka.version = 0.9.0.1 in the main pom.xml. After we add the new connector, each connector module will need to pull a different Kafka version. It would be good to start thinking about how we will manage that.

I added some notes in Issue #3998

Contributor Author

For this iteration itself, I can move it to the pinot-connector-kafka-0.9's pom.xml, something like this: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-0.9/pom.xml
What say? @snleee

Contributor

@npawar Does that mean that we can pass kafka.version as a property to mvn compilation command and pick default one (0.9 in our case) when it's not given?

If that's the case, it looks good to me 👍

Contributor Author

This is one step in that direction. For now, I've just removed it from the main pom and kept it only in the connector pom as a property. The next step will be to figure out a way to inject it from the main pom.
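A sketch of what moving the property into the connector pom could look like. This fragment is illustrative (the artifact coordinates are the standard kafka_2.10:0.9.0.1 ones, but the exact dependencies in the PR may differ):

```xml
<!-- Illustrative fragment for pinot-connector-kafka-0.9/pom.xml.
     kafka.version lives in the connector pom as a property, so each
     connector module can pin its own Kafka version independently. -->
<properties>
  <kafka.version>0.9.0.1</kafka.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>${kafka.version}</version>
  </dependency>
</dependencies>
```

Because the property is scoped to the connector module, bumping Kafka for one connector no longer forces a version decision on every other module in the build.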

Contributor

@mcvsubbu mcvsubbu left a comment

lgtm

@kishoreg
Member

Thanks. This looks good. Please write up this convention when you get a chance. It will be good to follow this convention for readers and deep storage as well.

Contributor

@snleee snleee left a comment

LGTM. Thanks for working on this and addressing all the comments!

@npawar npawar merged commit 00c2204 into apache:master Jun 1, 2019
@npawar npawar deleted the kafka_0_9_0_1_module branch June 1, 2019 00:38
6 participants