pinot_kafka_0.9.0.1 module #4252
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4252 +/- ##
============================================
+ Coverage 67.24% 67.31% +0.06%
Complexity 20 20
============================================
Files 1041 1039 -2
Lines 51564 51535 -29
Branches 7229 7228 -1
============================================
+ Hits 34675 34689 +14
+ Misses 14531 14475 -56
- Partials 2358 2371 +13
Continue to review full report at Codecov.
What is the plan on naming the modules for the different Kafka version implementations? I think that we should keep one implementation per major version (e.g. 0.9.x).
In principle yes, but are there guarantees that Kafka does not change the API within these smaller releases? I am not sure.
I would still prefer to keep one package per major version. If we need to bump up the minor version and Kafka is not backward compatible, we can change our code to use the new API. I don't like the approach of creating a module for each version. For example, let's say that we want to bump up the version from 2.1.1 to 2.1.2 and they are backward compatible (or not). Are we going to create another module for 2.1.2?
Each time they change the API, we create a new version. If they never change the API starting from 2.1, we are good forever. How does that sound?
I see. As long as we don't add too many modules and have a clear guideline, I'm fine with it. I just thought that associating a specific version number with the module name is not a flexible approach if we want to bump up Kafka versions frequently. Thanks for the quick comment!
When the new kafka module is added (hopefully by @ananthdurai), we will have a README in the new dir saying that attempts should be made to bump kafka revisions as needed until the kafka interface changes. We can even name the directory kafka-2.1-or-higher.
I am fine with this. @ananthdurai, @kishoreg, any opinions? @ananthdurai, are you good with implementing a new adapter for newer versions of kafka? Please follow the conversation in this PR for naming conventions and decide accordingly.
I missed the comment from @snleee about naming. I agree with him and prefer 0.9.x, and I think Kafka guarantees that APIs won't change in minor versions. The problem with naming it 0.9.0.1 is that it's not a common convention, and users might think it works only for that version of Kafka, which is not the case. See the Flink naming convention for example: https://github.com/apache/flink/tree/master/flink-connectors. Another suggestion is to create a subfolder for all implementations if possible. We have too many modules at the root: pinot-orc, pinot-parquet, pinot-hadoop-filesystem. If it's not too hard, we can move the kafka module into a subfolder.
FYI @chenboat @jamesyfshao: this directly affects the kafka implementation you currently use. There should be no impact, but please take a look.
Thanks for the update. We have some plans to upgrade to a later version of Kafka (> 1.0). I think we should open a new module named something like pinot-kafka-1.0 when we work on the new version? We also have plans to migrate away from the scala-based kafka simple consumer to the new consumer model, as it has been deprecated since 0.10 and the new consumer has a compatible API as well as more features. Do you have any work planned for it, or should we investigate more into this effort? @chenboat
Sure, we can consider keeping it just 0.9, 1.0, etc. Will check regarding the kafka guarantee about no changes in minor versions. Regarding the subfolder for all implementations, are you suggesting we put all implementations in one module (pinot-streams?)
Yes, you can create a new pinot-kafka-1.0 when you decide to work with a newer version. We don't have any immediate plans of investigating it. It would be great if you could help out with that.
@npawar Thanks. I am suggesting one module per implementation, but grouping them under one directory, similar to the way Flink has organized its connectors: https://github.com/apache/flink/tree/master/flink-connectors. pinot-connectors is actually a pretty good name for the subfolder, and we can have a pinot-connector-kafka-0.9 module.
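Under that proposal, the directory tree could look something like this (a sketch following the Flink layout linked above; the module names are illustrative, not final):

```
pinot-connectors/
├── pom.xml                        <- aggregator, <packaging>pom</packaging>
├── pinot-connector-kafka-0.9/
│   └── pom.xml
└── pinot-connector-kafka-1.0/     <- future module, added when needed
    └── pom.xml
```

The aggregator pom would list each connector in its `<modules>` section, and the root pom would only need to reference `pinot-connectors`.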
+1 to what @npawar said, but the naming of the folders is still up in the air. For example, if we name the directory 1.0.10 and it so happens that kafka does not change any APIs and it is also valid for 2.0, then what happens?
@mcvsubbu @npawar I vote +1 for following flink's approach to organize our modules. It's very clean. I also vote for the pinot-connector-kafka-0.9 naming.
In other words, you are suggesting that we copy over all files even if nothing changes, and keep doing this for every (major) kafka release. |
@mcvsubbu Instead of copying code, we can document the compatibility. My suggestion was more about naming, since I didn't like putting a specific version in the module name because it didn't seem to be the convention.
</dependency>
</dependencies>

</project>
Add a newline at the end of the file.
@@ -55,6 +55,7 @@
<module>pinot-azure-filesystem</module>
<module>pinot-orc</module>
<module>pinot-parquet</module>
<module>pinot-connectors</module>
Not related to this part, but I have a question about the main pom.xml file. In my understanding, we currently specify kafka.version = 0.9.0.1 in the main pom.xml. After we add the new connector, each connector module will need to pull a different kafka version. It would be good to start thinking about how we will manage that.
I added some notes in Issue #3998
For this iteration itself, I can move it to the pinot-connector-kafka-0.9's pom.xml, something like this: https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-0.9/pom.xml
What say? @snleee
@npawar Does that mean that we can pass kafka.version as a property to the mvn build command, and pick the default one (0.9 in our case) when it's not given?
If that's the case, it looks good to me 👍
This is one step in that direction. For now, I've just removed it from the main pom and kept it only in the connector's pom as a property. The next step will be to figure out a way to inject it from the main pom.
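Because Maven user properties supplied with -D on the command line take precedence over properties declared in a pom, a later step could allow overriding the connector's default without touching any pom (hypothetical usage; the default shown matches the current connector):

```
<!-- connector pom (sketch): declares the default, which a build can override
     from the command line, e.g.  mvn install -Dkafka.version=0.9.0.1  -->
<properties>
  <kafka.version>0.9.0.1</kafka.version>
</properties>
```

This keeps the root pom version-agnostic while still letting CI builds pin a specific client version per connector.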
lgtm
Thanks. This looks good. Please write up this convention when you get a chance. It will be good to follow this convention for the readers and deep storage as well.
LGTM. Thanks for working on this and addressing all the comments!
Step 3 of #3998
As discussed in the issue, we are planning to move the kafka stream implementation classes into their own module, pinot-kafka-0.9.0.1. This should enable folks to plug in implementations for other kafka versions.

TODO: pinot-integration-tests, pinot-tools and pinot-perf still depend on pinot-kafka-0.9.0.1 for the tests and dummy setup. We need to figure out a way to avoid specifying this dependency explicitly, and instead inject the classes for the kafka version we wish to use.