@EndzeitBegins EndzeitBegins commented Dec 25, 2024

Because the FlowFiles are now retrieved through a different API, the behaviour when working with multiple queues is no longer unspecified.

I ran into a circular dependency problem when depending on nifi-mock from nifi-utils, which is why the tests use an anonymous implementation of FlowFile instead of MockFlowFile.
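
For illustration, the workaround looks roughly like the sketch below. `SizedFlowFile` is a hypothetical, trimmed-down stand-in for the real `FlowFile` interface (which declares more methods), used here so the snippet compiles without NiFi on the classpath. `AnonymousFlowFileSketch` and `flowFileOfSize` are likewise made-up names. This is not the code from this pull request.

```java
import java.util.Map;

/** Hypothetical stand-in for the few FlowFile methods a size-based test needs. */
interface SizedFlowFile {
    long getSize();

    Map<String, String> getAttributes();
}

final class AnonymousFlowFileSketch {

    /** Builds a test double as an anonymous class, so no mock library is required. */
    static SizedFlowFile flowFileOfSize(final long sizeInBytes) {
        return new SizedFlowFile() {
            @Override
            public long getSize() {
                return sizeInBytes;
            }

            @Override
            public Map<String, String> getAttributes() {
                // The sketch carries no attributes; a real test may need some.
                return Map.of();
            }
        };
    }
}
```

The trade-off is a few extra lines in the test in exchange for not pulling the mock module into the utility module's dependencies.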

Summary

NIFI-14110

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

mosermw commented Dec 30, 2024

I recommend modifying a MultiProcessorUseCase or creating another UseCase in order to make the documentation for combining the two Batch Size properties clear. It's very important to be clear that flowfiles will not be delayed in the input queue waiting for a batch size to be reached. It's also very important to support packaging exactly 1 flowfile.

While this improvement appears to be worthwhile, we should be very careful with configuration creep on PackageFlowFile. Its only justification for existence is to be easier to use than MergeContent for a specific use case. Too many features would ruin that justification.

@EndzeitBegins

Thank you for the useful feedback @mosermw. I've adjusted the documentation of the UseCases to clarify the batching behaviour of the processor.

PackageFlowFile in combination with UnpackContent is a useful pair of processors to transfer FlowFiles between NiFi clusters where the more robust approach using remote process groups is not applicable, e.g. due to network restrictions.
Packaging more than one FlowFile can improve efficiency both in storage and transmission.
In my opinion, when the content size of the FlowFiles to transfer can vary widely, being able to apply a soft constraint on the package size is helpful.
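
To make the intended batching behaviour concrete, here is a minimal sketch of how a count limit and a soft size limit could combine. It is an illustration under stated assumptions, not the processor's code: `BatchSelectionSketch`, `selectBatch`, and its parameters are hypothetical names, and I am assuming that "soft" means the first FlowFile is always accepted, so a single oversized FlowFile can still be packaged on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of batch selection; not the PackageFlowFile implementation. */
final class BatchSelectionSketch {

    /**
     * Picks a prefix of the currently queued FlowFile sizes (in bytes) for one package.
     * It only looks at what is queued right now and never waits for a batch to fill up.
     * Assumes maxCount is at least 1.
     */
    static List<Long> selectBatch(List<Long> queuedSizes, int maxCount, long maxBytes) {
        List<Long> batch = new ArrayList<>();
        long total = 0L;
        for (long size : queuedSizes) {
            boolean first = batch.isEmpty();
            // Assumption: the first item is always accepted, which makes the size limit "soft".
            if (!first && (batch.size() >= maxCount || total + size > maxBytes)) {
                break; // reject this item and stop looking at the queue
            }
            batch.add(size);
            total += size;
        }
        return batch;
    }
}
```

With queued sizes of 10, 20 and 30 bytes, a count limit of 10 and a size limit of 35 bytes, the sketch selects the first two and leaves the third for the next package. A single queued FlowFile is always packaged immediately, whatever its size.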

Personally, I do not intend to add other properties to the processor at the moment.

@mosermw mosermw left a comment

I reviewed and tested this in various scenarios in running NiFi and just have a few minor comments.

@EndzeitBegins

Thank you for your review @mosermw.

Apologies for the delayed response. I've now addressed the feedback.

The pull request has been rebased onto the latest main commit, resolving all conflicts.
Each PropertyDescriptor is now on its own line.
The redundant batching benefits description has been removed, and a newline added to the test file's end.

I'd appreciate a review of the updated changes.

Due to using a different API to retrieve the FlowFiles
the behaviour when working with multiple queues is no longer unspecified.

Enhance tests to ensure FlowFiles are rejected once size limit was reached

Explain batching behaviour in UseCases

mosermw commented Mar 27, 2025

Sorry it took so long to review this @EndzeitBegins. I tested this again and the functionality and documentation look good. Code looks good.
+1

@asfgit asfgit closed this in 20afcba Mar 27, 2025
@EndzeitBegins EndzeitBegins deleted the NIFI-14110 branch March 27, 2025 19:29
TomaszK-stack pushed a commit to TomaszK-stack/nifi that referenced this pull request May 5, 2025
Signed-off-by: Mike Moser <mosermw@apache.org>

Closes apache#9595