[BEAM-11065] Apache Beam pipeline example to ingest from Apache Kafka to Google Pub/Sub #13112
Conversation
…ecret storage with url and token
…ctored Kafka configuration
Also pinging @azurezyq
I feel strongly that this example should drop the "Beam template" terminology. Templates (both Flex and Classic) are a Dataflow feature, and the concept does not have meaning in the Beam context.
A Dataflow template is just a Beam pipeline that has been packaged into a container. This allows us (Dataflow) to provide a convenient way for users to run multiple instances of this pipeline (the template) just by specifying a few parameters, which are mapped to the pipeline options. Beam doesn't provide any features that can do that mapping, which is why I take issue with the "Beam template" concept.
That doesn't mean that something like this isn't useful in Apache Beam, it absolutely is! But for Beam the analogue is just an example pipeline, like the other pipelines in `examples/`. The language in the README suggesting that users modify the pipeline to try out different behavior is exactly the sort of thing we encourage people to do with the other example pipelines - for example in the wordcount walkthrough.
Fortunately I don't think there's much to do to make this into a Beam example:
- Move the code to `examples/`, in the `org.apache.beam.examples` package.
- Remove Dataflow-template specific files (I think this is just `kafka_to_pubsub_metadata.json`).
- Remove directions for running this as a Dataflow template, and consider replacing them with directions for running on Dataflow (and other runners) directly.
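The parameter-to-PipelineOptions mapping described in the comment above is what Beam's `PipelineOptionsFactory` performs at launch time. As a rough, dependency-free sketch of that idea (the option names and classes below are hypothetical illustrations, not the example's actual options):

```java
import java.util.HashMap;
import java.util.Map;

public class OptionsSketch {
    // Hypothetical options object for a Kafka-to-Pub/Sub pipeline.
    // In real Beam code this would be an interface extending
    // org.apache.beam.sdk.options.PipelineOptions.
    static class KafkaToPubsubOptions {
        String bootstrapServers;
        String inputTopic;
        String outputTopic;
    }

    // Simplified stand-in for PipelineOptionsFactory.fromArgs(args):
    // parse --key=value flags and map them onto typed option fields.
    static KafkaToPubsubOptions fromArgs(String[] args) {
        Map<String, String> parsed = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                parsed.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        KafkaToPubsubOptions opts = new KafkaToPubsubOptions();
        opts.bootstrapServers = parsed.get("bootstrapServers");
        opts.inputTopic = parsed.get("inputTopic");
        opts.outputTopic = parsed.get("outputTopic");
        return opts;
    }

    public static void main(String[] args) {
        KafkaToPubsubOptions opts = fromArgs(new String[] {
            "--bootstrapServers=localhost:9092",
            "--inputTopic=messages",
            "--outputTopic=projects/p/topics/t"
        });
        System.out.println(opts.bootstrapServers); // localhost:9092
        System.out.println(opts.inputTopic);       // messages
        System.out.println(opts.outputTopic);      // projects/p/topics/t
    }
}
```

A Dataflow template's metadata file exposes essentially these same keys as template parameters, which is why a Beam example pipeline with well-defined options can be packaged as a template without code changes.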
Thanks for the review @TheNeuralBit @ilya-kozyrev, these suggestions make sense. What we should aim to do is provide an example of how a Beam pipeline is constructed. In the README file, we can note that this Beam pipeline was also created as a Dataflow Flex Template, and that people can learn how to write something like this in the Dataflow templates repo; we can then provide a link to your template hosted there.
This is a good idea. There could be a pointer in the other direction as well (from the Dataflow template to the Beam example), in case anyone discovers the flex template and is interested in learning how to run a similar pipeline on other runners.
retest this please
I think actually the right place to put this would be in `examples/java/complete/kafka-to-pubsub`. Then we should link to it from `apache/beam/examples/complete/README.md` (sorry I wasn't more clear on this before).
Sorry I accidentally hit submit early on the other review. I have a few more comments.
examples/kafka-to-pubsub/src/main/java/org/apache/beam/examples/KafkaToPubsub.java
```java
.apply("createValues", Values.create())
.apply("writeAvrosToPubSub", PubsubIO.writeAvros(AvroDataClass.class));
} else {
```
Is it worth having this PUBSUB path? The README and javadoc only discuss the AVRO path. I think we should just have that one and remove the enum
I see why it may seem unnecessary for the example, thank you for noticing this!
I suppose it is worth keeping the PUBSUB path because the example works out of the box with it. For the AVRO path, the user has to add some code to make it work, and also has to understand what should be changed and how - the PUBSUB path doesn't require that.
I also updated the README file to highlight the value of it.
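The trade-off described above (a pass-through PUBSUB path that works out of the box versus an AVRO path that needs a user-supplied class) can be sketched without any Beam dependencies. Everything below is an illustrative stand-in; in the real pipeline the branches would apply Beam transforms such as `PubsubIO.writeAvros(AvroDataClass.class)` from the snippet in the review:

```java
import java.util.function.Function;

public class FormatSketch {
    // The output path the example selects via an enum option.
    enum OutputFormat { PUBSUB, AVRO }

    // Hypothetical stand-in for the user's Avro data class; in the
    // real AVRO path the user must define this themselves.
    static class AvroDataClass {
        final String payload;
        AvroDataClass(String payload) { this.payload = payload; }
    }

    // Picks the value mapping for the chosen format.
    static Function<String, Object> transformFor(OutputFormat format) {
        switch (format) {
            case AVRO:
                // Requires user code: map the raw Kafka value onto
                // the user's Avro class before writing.
                return AvroDataClass::new;
            case PUBSUB:
            default:
                // Pass-through: raw values go straight to Pub/Sub,
                // so this path works with no extra code.
                return value -> value;
        }
    }

    public static void main(String[] args) {
        Object plain = transformFor(OutputFormat.PUBSUB).apply("hello");
        Object avro = transformFor(OutputFormat.AVRO).apply("hello");
        System.out.println(plain.getClass().getSimpleName()); // String
        System.out.println(avro.getClass().getSimpleName());  // AvroDataClass
    }
}
```

This is the design point the author is defending: removing the enum would leave only the branch that demands user modification before the example runs at all.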
Co-authored-by: Brian Hulette <hulettbh@gmail.com>
LGTM, thanks for all your work on this!
… to Google Pub/Sub (apache#13112)

* add initial template and dependencies
* Added flex template creation with metadata support and instructions
* added new gradle modules for templates
* moved metadata to template location, reverted examples build.gradle
* Moved KafkaToPubsub to template, implemented options in separate package
* Added package-info.java to new packages
* Reverted build.gradle to master branch state
* fixed JAVADOC and metadata
* Added the Read Me section with a step-by-step guide
* Update README.md
* Readme fixes regarding comments
* Update README.md
* Update README.md
* Fixed typos in README.md
* refactored README.md added case to run template locally
* Update README.md
* fix build script for dataflow in README.md
* Added unit test and fixed metadata file
* Added Licenses and style fixes
* Added support for retrieving Kafka credentials from HashiCorp Vault secret storage with url and token
* Updated README.md and metadata with parameters for Vault access; refactored Kafka configuration
* Style fix
* Added description for Vault parameters in metadata
* FIX trailing whitespaces in README.md
* FIX. Blank line contains whitespace README.md
* Update README.md
* Refactored to examples folder
* Added conversion from JSON into PubsubMessage and extracted all transformations from the pipeline class into the separate class
* Whitespacelint fix
* Updated README.md and output formats
* Update README.md
* Update README.md
* Added support for SSL and removed outputFormat option
* Added avro usage example
* Added ssl to AVRO reader
* FIX whitespaces.
* added readme/docs regarding of Avro
* README.md and javadoc fixes
* Added Vault's response JSON schema description
* Style fix
* Refactoring.
* Fixed ssl parameters
* Fixed style
* optimize build.gradle
* Resolve conversations
* Updated regarding comments and added unit tests
* README.md update
* made Avro class more abstract
* fix style
* fixed review conversation items
* fix getting ssl credentials from Vault
* FIX add empty && null map validation to sslConfig
* FIX. remove vault ssl certs parameters
* metadata fix
* Local paths fix for SSL from GCS
* add new log message to avoid wrong local files usage
* fix style
* Moved kafka-to-pubsub to examples/ directory and updated README.md (#6)
* Stylefix
* Removed unused file
* add tbd section for e-2-e tests
* fix styles
* specifying kafka-clients version
* fix readme
* template -> exmples
* Update examples/kafka-to-pubsub/README.md
* Fixed outdated import
* Moved template to examples/complete
* Updated paths in readme file
* Updated README.md and javadoc regarding comments
* README.md stylefix
* Added link to KafkaToPubsub example into complete/README.md
* Stylefix

Co-authored-by: Artur Khanin <artur.khanin@kzn.akvelon.com>
Co-authored-by: Artur Khanin <artur.khanin@akvelon.com>
Co-authored-by: AKosolapov <AKosolapov@users.noreply.github.com>
Co-authored-by: ramazan-yapparov <75415515+ramazan-yapparov@users.noreply.github.com>
Co-authored-by: Ramazan Yapparov <ramazan.yapparov@akvelon.com>
Co-authored-by: Brian Hulette <hulettbh@gmail.com>
[Proposal] Apache Beam pipeline example to ingest data from Apache Kafka to Google Cloud Pub/Sub. It can also be used as a Dataflow Flex Template on Google Cloud Platform.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- Choose reviewer(s) and mention them in a comment (R: @username).
- Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- Update CHANGES.md with noteworthy changes.

See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.