NIFI-12855: Add more information to provenance events to facilitate full graph traversal #8498
Thanks for adjusting the approach in this pull request @mattyb149.
Two quick notes that will make this more straightforward to review:
- There are a large number of white-space formatting changes that make this more difficult to evaluate. It would be very helpful to revert the formatting changes so the scope of the substantive changes is clearer.
- The Graph Client component changes should be extracted to a separate pull request. Although it is helpful to see the relationship, it would be much better to isolate framework changes from component changes that introduce features not directly related.
…ull graph traversal Co-authored-by: Timea Barna <timeabarna@apache.org>
The graph client stuff has been removed. I'll do a separate PR, but not yet, in case the reviews here affect the clients. Once this is in, it will facilitate more clients, such as perhaps an RDF/SPARQL client. Thanks for the inputs!
Thanks for scoping down the changes in this pull request @mattyb149, this makes the core changes much easier to follow.
Reviewing the addition of the previousEventIds raises several questions. The ProvenanceEventRecord already contains Parent and Child FlowFile identifiers, used for linking, and it seems like this should be sufficient to support graph traversal use cases. The Provenance Event ID is more of an internal database identifier, so it seems less than optimal to promote additional usage of this field.
Querying the Provenance Repository for previous identifiers also seems like a potential performance problem.
If you can provide more background on why this is necessary, that would be helpful, but otherwise, the proposed approach does not look like a good candidate to go forward.
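To make the point about Parent and Child FlowFile identifiers concrete, here is a minimal sketch of walking lineage backward using only those links. The `Event` class is a hypothetical, simplified stand-in and not the NiFi `ProvenanceEventRecord`; the field names, the in-memory event list, and the `ancestors` helper are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FlowFileLineageSketch {

    // Hypothetical, simplified stand-in for a provenance event (not the NiFi class).
    static final class Event {
        final long eventId;
        final List<String> parentUuids;
        final List<String> childUuids;

        Event(long eventId, List<String> parentUuids, List<String> childUuids) {
            this.eventId = eventId;
            this.parentUuids = parentUuids;
            this.childUuids = childUuids;
        }
    }

    // Collects every ancestor FlowFile UUID of startUuid by following
    // child -> parent links; event IDs are never consulted.
    static Set<String> ancestors(List<Event> events, String startUuid) {
        // Index: child FlowFile UUID -> parent FlowFile UUIDs.
        Map<String, List<String>> parentsOf = new HashMap<>();
        for (Event e : events) {
            for (String child : e.childUuids) {
                parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).addAll(e.parentUuids);
            }
        }
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUuid);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String parent : parentsOf.getOrDefault(current, List.of())) {
                if (seen.add(parent)) {
                    queue.add(parent);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1L, List.of("a"), List.of("b", "c")),   // fork-like: a -> b, c
                new Event(2L, List.of("b", "c"), List.of("d")));  // join-like: b, c -> d
        System.out.println(ancestors(events, "d"));
    }
}
```

This sketch only covers events that carry parent/child links (fork- and join-like events); it does not attempt to order the events that a single FlowFile passes through, which is the gap the previousEventIds proposal appears to target.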
@mattyb149 Given the current merge conflicts and the unresolved discussion on the need for these changes, I recommend closing this pull request and revisiting it when ready. We could continue the discussion on the Jira issue if that would be helpful to scope out the types of changes needed.
@Override
public void initialize(EventReporter eventReporter, Authorizer authorizer,
ProvenanceAuthorizableFactory factory, IdentifierLookup identifierLookup)
throws IOException {
ProvenanceAuthorizableFactory factory, IdentifierLookup identifierLookup)
There are a number of entries here that look like whitespace additions. Please double-check and, if so, remove them.
 * @param previousEventIds The previous event IDs (usually one except for JOIN events and such)
 * @return the builder
 */
ProvenanceEventBuilder setPreviousEventIds(List<Long> previousEventIds);
Does this imply we're storing event ids for the before and after event on both the before and after event?
Ah, maybe it is that we had parent FlowFile identifiers and now we'll have previous Event Identifiers.
@mattyb149 It is important to undo all the whitespace changes, both to improve reviewer efficiency and to avoid whitespace-only changes in general.
Summary
NIFI-12855 This PR augments the provenance capabilities to include the following features:
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000
NIFI-00000
Pull Request Formatting
main branch
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-check
Licensing
LICENSE and NOTICE files
Documentation