Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-7505 Add InvokeHTTPRecord processor #8127

Closed
wants to merge 3 commits into from

Conversation

Aerilym
Copy link
Contributor

@Aerilym Aerilym commented Dec 6, 2023

Summary

NIFI-7505
Continuation of PR #7532

This PR isn't complete yet, but I've created it in case anybody has any feedback on the approach I'm taking. I've kept the commit from the previous PR for tracking purposes, but they will be merged before the PR merge to conform to the PR rules.

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

pspiller and others added 3 commits July 28, 2023 18:05
Add a record-based processor that executes an HTTP request for each
record in a flowfile. Many of the options are similar to the InvokeHTTP
processor's, but adapted for record-based operation.

Request attributes like the URL, headers and request body can vary per
record, using RecordPath queries to extract data from each input record.
The response body and various metadata (headers, status code, etc) can
be written to each record.

The processor can be configured to route the entire input flowfile to
a failure relationship if an error is encountered, or failing records
can be split off into a separate flowfile. HTTP requests can be retried
a configurable number of times.

The response body can optionally be parsed using a record reader
service. As well as allowed the parsed body to be included in the output
record, in this mode the response body can also be used to determine
whether the request is treated as having succeed or failed. This can be
useful for APIs that always return a 200 status code, with error
information in the response body.

This PR isn't complete yet (it still needs tests), but I've created it
in case anybody has any feedback on the approach I'm taking.
Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for providing a work-in-progress proposal for consideration @Aerilym. It is helpful to consider the implementation at this stage before considering additional work.

As I noted in several comments on the previous pull request #7532, this presents a number of implementation concerns and raises questions about whether such a processor provides a reasonable level of abstraction for general use.

InvokeHTTP has many options, which is somewhat understandable given the nature of the HTTP protocol. Bringing a similar level of flexibility to a record-oriented Processor seems less useful due to the nature of record-oriented handling. Most HTTP services have very specific contracts for communication, which may make it difficult to adapt when considering both record-oriented and HTTP protocol abstractions together. The Jira issue itself highlight different types of general use cases, whether bulk requests or requests per record.

Using the WebClientServiceProvider is helpful, but unfortunately InvokeHTTP does not provide a good general starting point. From that perspective, there are a large number of implementation concerns at the code level that would need to be addressed, beyond the concept itself.

For these reasons, I am not in favor of moving forward with an InvokeHTTPRecord Processor as currently implemented. In order to make the best of your effort, I recommend continuing the discussion on the associated Jira issue, describing some concrete use cases, which would provide opportunity for discussion. After working through some of those details, then it would be better to come back to consider implementation.

@Aerilym
Copy link
Contributor Author

Aerilym commented Dec 13, 2023

Thank you for your thorough review and the time you've taken to give feedback and comment again, your tact is appreciated :) I understand the work in its current form being rejected and the desire to return the requirements and design. Though I would have liked to be able to try and initiate and participate in that process, I've been in a position where I just needed to "scratch my own itch" as quickly as possible and won't be able to devote more time to it.

Therefore, I wanted to share what I have implemented in case it is of any benefit or in case anyone else can use it in its current form with 1.x. Would still love to see an InvokeHTTPRecord processor at some point in future because I believe there is a very clear need, but understand it evidently needs some more experienced sponsors than myself to make that case and design it so that it has wider acceptance.

@Aerilym Aerilym closed this Dec 13, 2023
@Aerilym Aerilym deleted the invokehttprecord branch December 13, 2023 00:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants