-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-11482] Thrift support for KafkaTableProvider #13572
Conversation
a42657c
to
73397c9
Compare
@TheNeuralBit Is installing the |
5eae861
to
a203a09
Compare
R: @piotr-szuberski |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I've left few comments, mostly cosmetic.
I don't have any knowledge about Thrift though so I'd suggest adding someone who took part in it.
...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaThriftTable.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaThriftTable.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaThriftTable.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaThriftTable.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaThriftTable.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProvider.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProvider.java
Outdated
Show resolved
Hide resolved
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #13572 +/- ##
==========================================
- Coverage 82.74% 82.73% -0.02%
==========================================
Files 466 466
Lines 57518 57518
==========================================
- Hits 47594 47586 -8
- Misses 9924 9932 +8
Continue to review full report at Codecov.
|
.../java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderThriftIT.java
Outdated
Show resolved
Hide resolved
bd1aded
to
c0680a4
Compare
@piotr-szuberski I pushed a fixup to address all your comments. I'll double check if I missed anything, but please have another look. |
3f24ab1
to
5ab68f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've left only one comment.
...test/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderTest.java
Show resolved
Hide resolved
...c/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/BeamKafkaThriftTable.java
Show resolved
Hide resolved
Squashed it all in a single commit and rebased on top of master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more requests. This is looking good overall
.../java/org/apache/beam/sdk/extensions/sql/meta/provider/kafka/KafkaTableProviderThriftIT.java
Outdated
Show resolved
Hide resolved
import org.apache.beam.sdk.values.Row; | ||
import org.apache.beam.sdk.values.TypeDescriptor; | ||
|
||
public final class SqlRows { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting close to the abstraction that I was discussing in your other PR. A couple suggestions for this class:
- Let's make it
@Internal
, we don't want Beam users to use this and expect backwards compatibility. It's only public so other Beam modules can use it. - I think it belongs in the schemas package
- Drop SQL from the name. Right now this is only used in SQL, but we will want to use this infra outside of SQL at some point (which is why I don't think it belongs in the sql package). Maybe something like
RowMessages
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a tiny step towards whatever abstractions we want to end up with, as it doesn't even cover Avro yet.
I'm still trying to wrap my head around how would a proper data format abstraction look like in Beam.
I mean, instead of FileIO
, AvroIO
, KafkaIO
, ThriftIO
plus Kafka tables for each data format, what if we took the format out of the IO as an entirely different concern and provide the ability to mix those freely. If IOs are just [File, Kafka, PubSub, Socket, whatever]
, and data formats are [Avro, Protobuf, CSV, Thrift, whatever]
, we can (in theory) allow any combination like PubSub+Thrift or File+Proto with no extra cost (except for incompatible candidates, like Kafka+parquet, perhaps).
Do you think this is worth considering (for 3.0, of course)? I am aware that such an abstraction is really difficult to get right (if at all feasible), but I still think it may be worth trying, as we should at least end up reducing the number of data_source+data_format combinations that are now modelled as new datasource types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this sounds ideal. As far as 3.0 - I think it would make sense to start working on it in 2.x versions as an @Experimental
API. We shouldn't remove the existing IOs prior to a major version change, but it would be fine to start work on an alternative. This would be similar to (and related to) the SchemaIOProvider
concept we had a Google intern work on last summer:
Would you have any interest in working on an effort like this?
CC: @robinyqiu a design like this would be relevant for your work on IOs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be more than happy to get involved. Most likely, I'll need to join your slack channel, and for that I need an apache.org email, apparently. I'll see how I can sort that out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you can get an invite at https://s.apache.org/slack-invite (more info here: https://infra.apache.org/slack.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds easy. Not sure who's supposed to invite me, but I'll ask around. If you can help with that, I'd be grateful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can invite you as a guest. I think https://s.apache.org/slack-invite is supposed to be self-service but the page indicates it breaks it often. Can you send me a message at bhulette@apache.org so I know what email to use?
The branch has already been cut so we'd need to cherrypick it to get it into 2.27.0. Usually that's reserved for bugfixes, but it's up to the discretion of the release manager |
3e452e0
to
a73e3c2
Compare
Run SQL Postcommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @ccciudatu!
@chrlarsen would you have time to take a look? Specifically it would be good to check over the changes to |
@TheNeuralBit definitely I'll have a look. |
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java
Show resolved
Hide resolved
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java
Show resolved
Hide resolved
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java
Show resolved
Hide resolved
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java
Outdated
Show resolved
Hide resolved
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java
Show resolved
Hide resolved
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftSchema.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The ThriftSchema
changes look to be in line with what the Thrift IDL specifies, thanks @ccciudatu!
@piotr-szuberski @TheNeuralBit @chrlarsen Thanks for looking into this! I resolved merge conflicts for
|
1ef342d
to
7051745
Compare
Run SQL PostCommit |
@ccciudatu Thanks for contribution! For the future PRs, please, don't use your local master (or main) branch, create a dedicated feature branch for changes. Perhaps, it might help: |
@aromanenko-dev whoops sorry about that, I didn't notice this PR was opened from a master branch. Just curious, did it cause any problems in this case? |
@TheNeuralBit Afaict, it should not cause big problems after it was merged (only it won't be possible to delete a "feature" branch in fork and it will rewrite a commit history I guess). Though, until merge, all pushes to fork master branch will appear in the PR which likely won't be acceptable and better avoid this. |
This is a follow-up for #13428 to add Thrift support in
KafkaTableProvider
, since we now have all the building blocks for aBeamKafkaThriftTable
.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.