Add Error Handling to Kafka IO #29546
Conversation
# Conflicts: # CHANGES.md
Create gradle task and github actions config for GCS using this.
…tests Feature/automate performance tests
…tests fix call to gradle
…tests run on hosted runner for testing
…tests add additional checkout
…tests add destination for triggered tests
…tests move env variables to correct location
…tests try uploading against separate dataset
…tests try without a user
…tests update branch checkout, try to view the failure log
…tests run on failure
…tests update to use correct BigQuery instance
…tests convert to matrix
update error handler to be serializable to support using it as a member of an auto-value based PTransform
…er-queue-core # Conflicts: # CHANGES.md
…d-dlq # Conflicts: # sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/errorhandling/ErrorHandler.java
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
@@ -18,6 +18,7 @@
project.ext {
    delimited="0.11.0.3"
    undelimited="01103"
    sdfCompatable=false
is "Compatable" a typo here?
@@ -739,6 +754,9 @@ abstract Builder<K, V> setValueDeserializerProvider(

abstract Builder<K, V> setCheckStopReadingFn(@Nullable CheckStopReadingFn checkStopReadingFn);

abstract Builder<K, V> setBadRecordErrorHandler(
    @Nullable ErrorHandler<BadRecord, ?> badRecordErrorHandler);
When will we ever put null in this parameter?
We will never explicitly set it to null. The @Nullable annotation indicates that the property may be unset, and that is valid.
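This "null means unset, but callers never pass null" builder convention can be shown with a stand-alone sketch. Everything here (`ReadConfig`, `setBadRecordErrorHandler` taking a `String`) is a hypothetical stand-in for illustration, not the Beam classes from the diff:

```java
import java.util.Objects;

// Sketch of an AutoValue-style builder where an optional property is
// internally nullable (null = "not configured"), yet the setter itself
// rejects null, so users can never configure it to null explicitly.
final class ReadConfig {
    // May be null, meaning "no bad-record error handler was configured".
    private final String badRecordErrorHandler;

    private ReadConfig(String handler) {
        this.badRecordErrorHandler = handler;
    }

    static Builder builder() {
        return new Builder();
    }

    boolean hasErrorHandler() {
        return badRecordErrorHandler != null;
    }

    static final class Builder {
        private String handler; // defaults to null, i.e. unset

        Builder setBadRecordErrorHandler(String handler) {
            // Callers must pass a real handler; null is never accepted here.
            this.handler = Objects.requireNonNull(handler);
            return this;
        }

        ReadConfig build() {
            return new ReadConfig(handler);
        }
    }

    public static void main(String[] args) {
        ReadConfig unset = ReadConfig.builder().build();
        ReadConfig configured =
            ReadConfig.builder().setBadRecordErrorHandler("dlq-sink").build();
        System.out.println(unset.hasErrorHandler());      // prints false
        System.out.println(configured.hasErrorHandler()); // prints true
    }
}
```

The nullable field is an internal representation detail; the public builder surface never trades in nulls.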
} catch (SerializationException e) {
    // This exception should only occur during the key and value deserialization when
    // creating the Kafka Record
    badRecordRouter.route(
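The catch-and-route idea in this hunk can be sketched without Beam. All names below (`BadRecordRoutingSketch`, `deadLetterQueue`, the `Integer.parseInt`-based deserializer standing in for a Kafka deserializer throwing `SerializationException`) are hypothetical, chosen only to make the pattern runnable:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: a deserialization failure routes the raw record to a
// dead-letter sink instead of failing the whole read.
final class BadRecordRoutingSketch {
    static final List<String> deadLetterQueue = new ArrayList<>();
    static final List<Integer> goodRecords = new ArrayList<>();

    // Mimics a Kafka value deserializer that throws on malformed input.
    static int deserialize(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
    }

    static void process(byte[] raw) {
        try {
            goodRecords.add(deserialize(raw));
        } catch (NumberFormatException e) {
            // Route the undecodable record rather than crashing the reader.
            deadLetterQueue.add(new String(raw, StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        process("42".getBytes(StandardCharsets.UTF_8));
        process("not-a-number".getBytes(StandardCharsets.UTF_8));
        // prints: [42] / dead-lettered: [not-a-number]
        System.out.println(goodRecords + " / dead-lettered: " + deadLetterQueue);
    }
}
```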
QQ: do we have timestamp in DLQ?
We do not
Just a few comments and questions. Overall LGTM.
The GCP IO Direct test is a flake; the whitespace failure is due to files this PR doesn't touch.
* Update 2.50 release notes to include new Kafka topicPattern feature
* Create groovy class for io performance tests Create gradle task and github actions config for GCS using this.
* delete unnecessary class
* fix env call
* fix call to gradle
* run on hosted runner for testing
* add additional checkout
* add destination for triggered tests
* move env variables to correct location
* try uploading against separate dataset
* try without a user
* update branch checkout, try to view the failure log
* run on failure
* update to use correct BigQuery instance
* convert to matrix
* add result reporting
* add failure clause
* remove failure clause, update to run on self-hosted
* address comments, clean up build
* clarify branching
* Add error handling base implementation & test DLQ enabled class
* Add test cases
* apply spotless
* Fix Checkstyles
* Fix Checkstyles
* make DLH serializable
* rename dead letter to bad record
* make DLH serializable
* Change bad record router name, and use multioutputreceiver instead of process context
* Refactor BadRecord to be nested
* clean up checkstyle
* Update error handler test
* Add metrics for counting error records, and for measuring feature usage
* apply spotless
* fix checkstyle
* make metric reporting static
* spotless
* Rework annotations to be an explicit label on a PTransform, instead of using java annotations
* fix checkstyle
* Address comments
* Address comments
* Fix test cases, spotless
* remove flatting without error collections
* fix nullness
* spotless + encoding issues
* spotless
* throw error when error handler isn't used
* add concrete bad record error handler class
* spotless, fix test category
* fix checkstyle
* clean up comments
* fix test case
* initial wiring of error handler into KafkaIO Read
* remove "failing transform" field on bad record, add note to CHANGES.md
* fix failing test cases
* fix failing test cases
* apply spotless
* Add tests
* Add tests
* fix test case
* add documentation
* wire error handler into kafka write
* fix failing test case
* Add tests for writing to kafka with exception handling
* fix sdf testing
* fix sdf testing
* spotless
* deflake tests
* add error handling to kafka streaming example update error handler to be serializable to support using it as a member of an auto-value based PTransform
* apply final comments
* apply final comments
* apply final comments
* add line to CHANGES.md
* fix spotless
* fix checkstyle
* make sink transform static for serialization
* spotless
* fix typo
* fix typo
* fix spotbugs
Add .withBadRecordHandler() functionality to both KafkaIO.read() and .write()
This also adds usage to the Kafka streaming example
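The fluent configuration style this PR describes can be illustrated with a self-contained miniature. `MiniRead`, its `read`/`apply` methods, and a plain `List` standing in for the error handler are all hypothetical stand-ins, not the actual KafkaIO API surface:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: without a handler, a bad record fails the read (previous
// behavior); with withBadRecordErrorHandler(...), it is routed and
// processing continues.
final class MiniRead {
    private final Function<String, Integer> deserializer;
    private final List<String> badRecordSink; // null = no handler configured

    private MiniRead(Function<String, Integer> d, List<String> sink) {
        this.deserializer = d;
        this.badRecordSink = sink;
    }

    static MiniRead read(Function<String, Integer> deserializer) {
        return new MiniRead(deserializer, null);
    }

    MiniRead withBadRecordErrorHandler(List<String> sink) {
        return new MiniRead(deserializer, sink);
    }

    List<Integer> apply(List<String> input) {
        List<Integer> out = new ArrayList<>();
        for (String raw : input) {
            try {
                out.add(deserializer.apply(raw));
            } catch (RuntimeException e) {
                if (badRecordSink == null) {
                    throw e; // no handler: the read fails as before
                }
                badRecordSink.add(raw); // handler set: route and continue
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dlq = new ArrayList<>();
        List<Integer> parsed = MiniRead.read(Integer::parseInt)
            .withBadRecordErrorHandler(dlq)
            .apply(List.of("1", "oops", "3"));
        // prints: [1, 3] dlq=[oops]
        System.out.println(parsed + " dlq=" + dlq);
    }
}
```

The design point is that the error handler is opt-in: omitting it preserves the old fail-fast semantics, so existing pipelines are unaffected.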
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

* Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
* Update CHANGES.md with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.