-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-11111 add option to output Elasticsearch error responses as FlowFile to PutElasticsearchJson and PutElasticsearchRecord #6903
Conversation
653d4b3
to
7976f05
Compare
...cessors/src/main/java/org/apache/nifi/processors/elasticsearch/AbstractPutElasticsearch.java
Outdated
Show resolved
Hide resolved
...cessors/src/main/java/org/apache/nifi/processors/elasticsearch/AbstractPutElasticsearch.java
Outdated
Show resolved
Hide resolved
862428e
to
cc9622f
Compare
try (final OutputStream errorsOutputStream = session.write(errorResponsesFF)) { | ||
errorMapper.writeValue(errorsOutputStream, errorResponses); | ||
|
||
errorResponsesFF = session.putAttribute(errorResponsesFF, "elasticsearch.put.error.count", String.valueOf(errorResponses.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested this live? I've had problems with this in the past where the ProcessSession didn't want to update the attributes while the OutputStream was open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seemed to work OK when I tried it, but should be able to move this outside of the try-with-resources if you've seen it cause problems in the past
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll test it out locally to verify. I think I had that problem on 1.13, so it may be fixed by now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair, looking at the code it would make more sense for the transfer
and attributes
calls to be outside of the try-with-resources, so I've made the change and will push it once all the tests have passed locally 👍
...src/test/groovy/org/apache/nifi/processors/elasticsearch/AbstractPutElasticsearchTest.groovy
Show resolved
Hide resolved
cc9622f
to
a73f82d
Compare
@mattyb149 I believe I addressed your comments @MikeThomsen do the updates help with clarity around the transfer of FlowFiles/assignment of attributes? |
Sorry I lost track of this, @MikeThomsen I'm good if you are |
I've been struggling with the various PutElasticsearch processors and their different handling of responses/errors for a while now, and nothing really does what I would expect it to do. The only one that kind of works is PutElasticsearchHttp because at least that one places the "reason" attribute on the flowfile, but it doesn't handle all types of errors and it makes a bunch of assumptions as to what is retryable. What I really need is for the original documents that fail on a bulk index/create to be routed to failure or error and have the error message and status code added as flowfile attributes. This would enable custom routing logic based on status code and/or error message. I can't tell if that's what this PR is attempting to do. Is that what this PR does or should I post a new feature request for this? If it's just sending the error response documents as separate flowfiles I can see how some developers might find that useful, but it doesn't really help with rerouting the original document and providing error information. |
…File to PutElasticsearchJson and PutElasticsearchRecord
…son/Record processors
@davis-anthony this PR if/when Approved (@MikeThomsen / @mattyb149 ) was not going to introduce such behaviour, but it does make sense for It's not as simple for A flow I've used before for handling things like errors from a record processor such as Notes:
|
ac9454c
to
8aafcfb
Compare
…_bulk response for error documents in PutElasticsearchJson
8aafcfb
to
037d50d
Compare
@ChrisSamo632 Thanks for the explanation and for helping me understand the issues with memory. Yes that makes a lot of sense and after reading the code I see that the processor makes every effort to just pass the documents to the various relationships. So with that in mind I propose that we make these processors behave like the RouteOnAttribute processor where the developer can define additional relationships based on expected error types. So that the processor can simply compare the error type for each item in the collection and route to the various relationships. For example, I might define the following: conflict -> version_conflict_engine_exception Then the following relationships could be routed in the flow: success, error, retry, failure, conflict, schema, blocked. The result would be that errors that match the 3 types are routed to 3 custom routes. The remaining errors and other error routes would operate as they do now. The only flaw I can think of here is that there is no definitive list of error types according to the following post, and so it takes quite a bit of sleuthing to find the possible errors and they are likely subject to change since there is no list. |
@davis-anthony that would definitely warrant a new Jira ticket to make such a change I think. Some sort of "half-way house" approach may be more appropriate, e.g. like PartitionRecord where Feel free to raise a ticket for further discussion. |
NIFI-11480 for the above ☝️ discussion, FYI @davis-anthony |
static final Relationship REL_SUCCESS = new Relationship.Builder() | ||
.name("success") | ||
.description("All flowfiles that succeed in being transferred into Elasticsearch go here. " + | ||
"Documents received by the Elasticsearch _bulk API may still result in errors on the Elasticsearch side. " + | ||
"The Elasticsearch response will need to be examined to determine whether any Document(s)/Record(s) resulted in errors.") | ||
.build(); | ||
|
||
static final Relationship REL_ERROR_RESPONSES = new Relationship.Builder() | ||
.name("error_responses") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be autoterminated because it changes the behavior.
…File to PutElasticsearchJson and PutElasticsearchRecord NIFI-11111 clarify error_responses relationships in PutElasticsearchJson/Record processors NIFI-11111 Refactor exception handling for error response flowfile transfer NIFI-11111 Add elasticsearch.bulk.error attributes containing the Elasticsearch _bulk response for error documents in PutElasticsearchJson This closes #6903 Signed-off-by: Mike Thomsen <mthomsen@apache.org>
Summary
NIFI-11111 add option to output Elasticsearch error responses as FlowFile to PutElasticsearchJson and PutElasticsearchRecord
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main
branchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
[ ] New dependencies are compatible with the Apache License 2.0 according to the License Policy[ ] New dependencies are documented in applicableLICENSE
andNOTICE
filesDocumentation
[ ] Documentation formatting appears as expected in rendered files