NIFI-13779: Missing Some Data Provenance Events from Python #9292

Merged
exceptionfactory merged 1 commit into apache:main from bobpaulin:NIFI-13779
Sep 26, 2024

Conversation

@bobpaulin
Contributor

  • Use cloned flow file as original
  • Transform inbound flow file to ensure it gets picked up in Provenance Events

Summary

NIFI-13779

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for proposing this change @bobpaulin. This seems to be a better way to handle the lineage of input FlowFiles, reflecting CONTENT_MODIFIED instead of cloning. However, tagging @markap14 for additional review and consideration.

Contributor

@markap14 markap14 left a comment

Hey @bobpaulin thanks for updating! I do think this makes a lot of sense. I had 2 thoughts about the PR though. I commented inline about the name of the transformed and originalCloned variables. Normally I try not to quibble over variable names, but given how frequently they're used in the code I think it's helpful to use clear names.

The other thought is that the Provenance is now going to always show a CLONE event followed by a DROP event if original is auto-terminated, which is the default and very common. But we just recently added a new method to ProcessContext: boolean isAutoTerminated(Relationship relationship);

Perhaps we should not even clone the FlowFile at all if the original FlowFile is auto-terminated. That way, there's no CLONE or DROP event, and the lineage is much clearer.
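As a rough illustration of the suggestion above, the decision can be sketched as follows. This is not the PR's code and not the NiFi API: `AutoTerminationLookup` is a hypothetical stand-in for the one `ProcessContext` method mentioned (the real method takes a `Relationship`, not a name), and the returned strings are only labels for the provenance events discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the NiFi API: models only the lookup discussed above.
interface AutoTerminationLookup {
    boolean isAutoTerminated(String relationshipName);
}

final class CloneDecisionSketch {

    // Lists the provenance event labels we would expect for one input FlowFile.
    // Assumption: "original" is the name of the relationship that receives the untouched copy.
    static List<String> expectedEvents(AutoTerminationLookup context) {
        List<String> events = new ArrayList<>();
        if (!context.isAutoTerminated("original")) {
            // Something downstream consumes "original", so keep an unmodified clone for it.
            events.add("CLONE");
        }
        // The input FlowFile itself is transformed in both cases.
        events.add("CONTENT_MODIFIED");
        return events;
    }

    public static void main(String[] args) {
        // "original" auto-terminated: no clone, so no CLONE/DROP pair in the lineage.
        System.out.println(expectedEvents(name -> true));
        // "original" routed somewhere: a clone is still needed.
        System.out.println(expectedEvents(name -> false));
    }
}
```

With the default auto-terminated configuration, the sketch yields only the content modification, which is the clearer lineage described above.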

  public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
-     FlowFile original = session.get();
-     if (original == null) {
+     FlowFile transformed = session.get();
Contributor

It feels weird to me to call the FlowFile that we are pulling from an input queue transformed. Perhaps we should name the variable simply flowFile? And then we can call the clone just simply clone rather than originalCloned? I think that would make the code a little easier to read personally.

Contributor

I agree with you @markap14, I think just calling this flowFile would be simpler and avoid some confusion.

@markap14
Contributor

Thanks for pinging @exceptionfactory. I do agree with the approach but think we should clean it up a bit more, as noted above.

@bobpaulin
Contributor Author

Thanks @markap14 and @exceptionfactory, as always, for the feedback. This cleaned up nicely with the usage of isAutoTerminated; I look forward to using that more!

* Use cloned flow file as original
* Transform inbound flow file to ensure it gets picked up in Provenance Events
Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for the input @markap14, and thanks for making the updates @bobpaulin, the latest version looks good! +1 merging

@exceptionfactory exceptionfactory merged commit 34aa764 into apache:main Sep 26, 2024
ravinarayansingh pushed a commit to ravinarayansingh/nifi that referenced this pull request Oct 1, 2024
…pache#9292)

- Transform input FlowFile instead of cloned FlowFile

Signed-off-by: David Handermann <exceptionfactory@apache.org>