
pinot_kafka_0.9.0.1 module #4252

Merged 5 commits into apache:master from kafka_0_9_0_1_module on Jun 1, 2019

Conversation

npawar
Contributor

@npawar npawar commented May 28, 2019

Step 3 of #3998
As discussed in the issue, we are planning to move the Kafka stream implementation classes out into their own module, pinot-kafka-0.9.0.1. This should enable folks to plug in implementations for other Kafka versions.
TODO: pinot-integration-tests, pinot-tools and pinot-perf still depend on pinot-kafka-0.9.0.1 for the tests and dummy setup. We need to figure out a way to avoid specifying the dependency explicitly, and instead inject the classes for whichever version we wish to use.
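One possible shape for the injection the TODO describes is to load the consumer factory by class name from configuration via reflection, so consuming modules carry no compile-time dependency on a specific pinot-kafka-* module. This is an illustrative sketch only; the interface and class names below are hypothetical, not Pinot's actual API.

```java
// Illustrative sketch, not Pinot's actual API: consuming modules reference
// a factory by class name (e.g. from stream config) instead of depending
// on a specific pinot-kafka-* module at compile time.
public class FactoryLoaderSketch {

    // Minimal stand-in for a stream consumer factory SPI.
    public interface StreamConsumerFactory {
        String streamType();
    }

    // A stand-in implementation, as a kafka-version module might provide.
    public static class Kafka09ConsumerFactory implements StreamConsumerFactory {
        @Override
        public String streamType() {
            return "kafka-0.9";
        }
    }

    // Load the factory class named in configuration via reflection.
    public static StreamConsumerFactory load(String className) throws Exception {
        Class<?> clazz = Class.forName(className);
        return (StreamConsumerFactory) clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In practice the class name would come from the table/stream config.
        StreamConsumerFactory factory =
                load("FactoryLoaderSketch$Kafka09ConsumerFactory");
        System.out.println(factory.streamType());
    }
}
```

With this pattern, switching Kafka versions at deploy time only requires putting a different module's jar on the classpath and changing the configured class name.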

@npawar npawar changed the title Kafka 0 9 0 1 module pinot_kafka_0.9.0.1 module May 28, 2019
@codecov-io

codecov-io commented May 28, 2019

Codecov Report

Merging #4252 into master will increase coverage by 0.06%.
The diff coverage is n/a.


```diff
@@             Coverage Diff              @@
##             master    #4252      +/-   ##
============================================
+ Coverage     67.24%   67.31%   +0.06%     
  Complexity       20       20              
============================================
  Files          1041     1039       -2     
  Lines         51564    51535      -29     
  Branches       7229     7228       -1     
============================================
+ Hits          34675    34689      +14     
+ Misses        14531    14475      -56     
- Partials       2358     2371      +13
```
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...ore/realtime/impl/kafka/SimpleConsumerFactory.java | 0% <ø> (ø) | 0 <0> (?) |
| ...altime/impl/kafka/KafkaStreamConfigProperties.java | 25% <ø> (ø) | 0 <0> (?) |
| ...ealtime/impl/kafka/SimpleConsumerMessageBatch.java | 100% <ø> (ø) | 0 <0> (?) |
| ...altime/impl/kafka/KafkaStreamMetadataProvider.java | 44.7% <ø> (ø) | 0 <0> (?) |
| ...altime/impl/kafka/KafkaPartitionLevelConsumer.java | 68.57% <ø> (ø) | 0 <0> (?) |
| ...ealtime/impl/kafka/KafkaHighLevelStreamConfig.java | 74.57% <ø> (ø) | 0 <0> (?) |
| ...re/realtime/impl/kafka/KafkaConnectionHandler.java | 76.04% <ø> (ø) | 0 <0> (?) |
| .../core/realtime/impl/kafka/ConsumerAndIterator.java | 90% <ø> (ø) | 0 <0> (?) |
| ...core/realtime/impl/kafka/KafkaConsumerManager.java | 58.82% <ø> (ø) | 0 <0> (?) |
| ...t/core/realtime/impl/kafka/KafkaBrokerWrapper.java | 47.05% <ø> (ø) | 0 <0> (?) |

... and 41 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d540d95...0d2645f.

@npawar npawar requested a review from mcvsubbu May 29, 2019 00:00
@snleee
Contributor

snleee commented May 29, 2019

What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

@mcvsubbu
Contributor

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

@snleee
Contributor

snleee commented May 29, 2019

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.

For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

@mcvsubbu
Contributor

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

> I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.

> For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

Each time they change the API, we create a new version. If they never change the API starting 2.1, we are good forever. How does that sound?

@snleee
Contributor

snleee commented May 29, 2019

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

> I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.
> For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

> Each time they change the API, we create a new version. If they never change the API starting 2.1, we are good forever. How does that sound?

I see. As long as we don't add too many modules and have a clear guideline, I'm fine with it. I just thought that associating a specific version number with the module name is not a flexible approach if we want to bump Kafka versions frequently. Thanks for the quick comment!

@mcvsubbu
Contributor

> What is the plan for naming the modules for the different Kafka version implementations? I think we should keep one implementation per major version (e.g. 0.9.x, 1.0.x, 2.0.x, ...) instead of having a module for each specific Kafka version.

> In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.

> I would still prefer to keep one package per major version. If we need to bump the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version.
> For example, let's say we want to bump the version from 2.1.1 to 2.1.2, and they are backward compatible (or not). Are we going to create another module for 2.1.2 and copy more or less the same code? Or are we going to change the module name?

> Each time they change the API, we create a new version. If they never change the API starting 2.1, we are good forever. How does that sound?

> I see. As long as we don't add too many modules and have a clear guideline, I'm fine with it. I just thought that associating a specific version number with the module name is not a flexible approach if we want to bump Kafka versions frequently. Thanks for the quick comment!

When the new Kafka module is added (hopefully by @ananthdurai), we will have a README in the new directory saying that attempts should be made to bump Kafka revisions as needed until the Kafka interface changes. We can even name the directory kafka-2.1-or-higher.

@mcvsubbu
Contributor

I am fine with this. @ananthdurai, @kishoreg, anyone have any opinions? @ananthdurai, are you good in terms of implementing a new adapter for newer versions of Kafka? Please follow the conversation in this PR for naming conventions and decide accordingly.

@kishoreg
Member

I missed the comment from @snleee on naming. I agree with him and prefer 0.9.x, and I think Kafka guarantees that APIs won't change in minor versions. The problem with naming it 0.9.0.1 is that it's not a common convention, and users might think it works only for that version of Kafka, which is not the case. See the Flink naming convention for example: https://github.com/apache/flink/tree/master/flink-connectors.

Another suggestion is to create a subfolder for all implementations if possible. We have too many modules at the root: pinot-orc, pinot-parquet, pinot-hadoop-filesystem. If it's not too hard, we can move the Kafka module into a subfolder.

@npawar
Contributor Author

npawar commented May 30, 2019

FYI @chenboat @jamesyfshao: this directly affects the Kafka implementation you currently use. There should be no impact, but please take a look.

@jamesyfshao
Contributor

Thanks for the update. We have some plans to upgrade to a later version of Kafka (> 1.0). I think we should open a new module named something like pinot-kafka-1.0 when we work on the new version?

We also have plans to migrate away from the Scala-based Kafka simple consumer to the new consumer model, as it has been deprecated since 0.10 and the new consumer has a compatible API as well as more features. Do you guys have any work planned for it, or can we investigate more into this effort? @chenboat

@npawar
Contributor Author

npawar commented May 30, 2019

> I missed the comment from @snleee on naming. I agree with him and prefer 0.9.x, and I think Kafka guarantees that APIs won't change in minor versions. The problem with naming it 0.9.0.1 is that it's not a common convention, and users might think it works only for that version of Kafka, which is not the case. See the Flink naming convention for example: https://github.com/apache/flink/tree/master/flink-connectors.

> Another suggestion is to create a subfolder for all implementations if possible. We have too many modules at the root: pinot-orc, pinot-parquet, pinot-hadoop-filesystem. If it's not too hard, we can move the Kafka module into a subfolder.

Sure, we can consider keeping it just 0.9, 1.0, etc. I will check regarding the Kafka guarantee about no changes in minor versions.

Regarding a subfolder for all implementations, are you suggesting we put all implementations in one module (pinot-streams?)

@npawar
Contributor Author

npawar commented May 30, 2019

> Thanks for the update. We have some plans to upgrade to a later version of Kafka (> 1.0). I think we should open a new module named something like pinot-kafka-1.0 when we work on the new version?

> We also have plans to migrate away from the Scala-based Kafka simple consumer to the new consumer model, as it has been deprecated since 0.10 and the new consumer has a compatible API as well as more features. Do you guys have any work planned for it, or can we investigate more into this effort? @chenboat

Yes, you can create a new pinot-kafka-1.0 when you decide to work with the newer version. We don't have any immediate plans of investigating it; it would be great if you could help out with that.

@kishoreg
Member

> I missed the comment from @snleee on naming. I agree with him and prefer 0.9.x, and I think Kafka guarantees that APIs won't change in minor versions. The problem with naming it 0.9.0.1 is that it's not a common convention, and users might think it works only for that version of Kafka, which is not the case. See the Flink naming convention for example: https://github.com/apache/flink/tree/master/flink-connectors.
> Another suggestion is to create a subfolder for all implementations if possible. We have too many modules at the root: pinot-orc, pinot-parquet, pinot-hadoop-filesystem. If it's not too hard, we can move the Kafka module into a subfolder.

> Sure, we can consider keeping it just 0.9, 1.0, etc. I will check regarding the Kafka guarantee about no changes in minor versions.

> Regarding a subfolder for all implementations, are you suggesting we put all implementations in one module (pinot-streams?)

@npawar Thanks. I am suggesting one module per implementation, but just group them under one directory, similar to the way Flink has organized its connectors: https://github.com/apache/flink/tree/master/flink-connectors. pinot-connectors is actually a pretty good name for the subfolder, and we can have a pinot-connector-kafka-0.9 module.
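For concreteness, the Flink-style layout being proposed could look like the following pom sketch. These are illustrative fragments under the naming discussed in this thread, not the exact files in this PR:

```xml
<!-- Illustrative sketch, not the exact Pinot poms. -->
<!-- Root pom.xml: add a single aggregator module for all connectors. -->
<modules>
  <module>pinot-connectors</module>
</modules>

<!-- pinot-connectors/pom.xml: one child module per connector
     implementation, named by the Kafka major.minor it targets. -->
<modules>
  <module>pinot-connector-kafka-0.9</module>
  <!-- future, added on demand after an API change:
       <module>pinot-connector-kafka-2.0</module> -->
</modules>
```

This keeps the root module list short while still giving each Kafka API generation its own compile-time unit.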

@mcvsubbu
Contributor

> Thanks for the update. We have some plans to upgrade to a later version of Kafka (> 1.0). I think we should open a new module named something like pinot-kafka-1.0 when we work on the new version?

> We also have plans to migrate away from the Scala-based Kafka simple consumer to the new consumer model, as it has been deprecated since 0.10 and the new consumer has a compatible API as well as more features. Do you guys have any work planned for it, or can we investigate more into this effort? @chenboat

+1 to what @npawar said, but the naming of the folders is still up in the air.

For example, if we name the directory 1.0.10 and it so happens that Kafka does not change any APIs, and it is also valid for 2.0, then what happens?

@snleee
Contributor

snleee commented May 31, 2019

@mcvsubbu The kafka-1.0 module will import the 1.0.x library. It may work with Kafka 2.0 if there is no API change, but we don't guarantee it. For explicit 2.0 support, we can create another module, kafka-2.0, that imports 2.0.x, and we can guarantee that it will work with Kafka 2.0.

@npawar +1 for following Flink's approach to organizing our modules. It's very clean. I also vote for pinot-kafka-0.9 instead of pinot-kafka-0.9.x.

@mcvsubbu
Contributor

> @mcvsubbu The kafka-1.0 module will import the 1.0.x library. It may work with Kafka 2.0 if there is no API change, but we don't guarantee it. For explicit 2.0 support, we can create another module, kafka-2.0, that imports 2.0.x, and we can guarantee that it will work with Kafka 2.0.

> @npawar +1 for following Flink's approach to organizing our modules. It's very clean. I also vote for pinot-kafka-0.9 instead of pinot-kafka-0.9.x.

In other words, you are suggesting that we copy over all the files even if nothing changes, and keep doing this for every (major) Kafka release.

@snleee
Contributor

snleee commented May 31, 2019

@mcvsubbu Instead of copying code, we can document that the 1.0 module is compatible with 2.0 once we have verified that they are compatible. I didn't want to guarantee that 1.0 would work with 2.0, because we might not want to test and track compatibility for every single Kafka version. We can add a module on demand when we need a new one after an API change.

My suggestion was more about naming, since I didn't like putting a specific version in the name; it seemed not to be the convention.

@npawar
Contributor Author

npawar commented May 31, 2019

The Flink model looks good. I've changed the PR to follow that convention. Please take a look @mcvsubbu @kishoreg @snleee.

```xml
    </dependency>
  </dependencies>

</project>
```
Contributor

new line

```diff
@@ -55,6 +55,7 @@
     <module>pinot-azure-filesystem</module>
     <module>pinot-orc</module>
     <module>pinot-parquet</module>
+    <module>pinot-connectors</module>
```
Contributor

Not related to this part, but I have a question about the main pom.xml file. In my understanding, we currently specify kafka.version = 0.9.0.1 in the main pom.xml. After we add the new connector, each connector module will need to pull a different Kafka version. It would be good to start thinking about how we will manage that.

Contributor

> Not related to this part, but I have a question about the main pom.xml file. In my understanding, we currently specify kafka.version = 0.9.0.1 in the main pom.xml. After we add the new connector, each connector module will need to pull a different Kafka version. It would be good to start thinking about how we will manage that.

I added some notes in Issue #3998

Contributor Author

For this iteration itself, I can move it to the pinot-connector-kafka-0.9's pom.xml, something like this: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-0.9/pom.xml
What say? @snleee

Contributor

@npawar Does that mean that we can pass kafka.version as a property to mvn compilation command and pick default one (0.9 in our case) when it's not given?

If that's the case, it looks good to me 👍

Contributor Author

This is one step in that direction. For now, I've just removed it from the main pom and kept it only in the connector pom as a property. The next step will be to figure out a way to inject it from the main pom.
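A sketch of what moving the property into the connector pom could look like. This fragment is illustrative (the artifact coordinates are the standard kafka_2.10:0.9.0.1 ones, but the exact dependencies in the PR may differ):

```xml
<!-- Illustrative fragment for pinot-connector-kafka-0.9/pom.xml.
     kafka.version lives in the connector pom as a property, so each
     connector module can pin its own Kafka version independently. -->
<properties>
  <kafka.version>0.9.0.1</kafka.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>${kafka.version}</version>
  </dependency>
</dependencies>
```

Because the property is scoped to the connector module, bumping Kafka for one connector no longer forces a version decision on every other module in the build.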

Contributor

@mcvsubbu mcvsubbu left a comment

lgtm

@kishoreg
Member

Thanks. This looks good. Please write up this convention when you get a chance. It will be good to follow this convention for readers and deep storage as well.

Contributor

@snleee snleee left a comment

LGTM. Thanks for working on this and addressing all the comments!

@npawar npawar merged commit 00c2204 into apache:master Jun 1, 2019
@npawar npawar deleted the kafka_0_9_0_1_module branch June 1, 2019 00:38
6 participants