Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-7355: add bytecode submission to tinkerpop service #7677

Closed
wants to merge 4 commits into from

Conversation

levilentz
Copy link
Contributor

@levilentz levilentz commented Sep 10, 2023

Summary

This brings back #4204 which went stale. In my changes I have been able to address:

The new code consolidates the script submissions and gremlin bytecode submission functionality into one service. I am still working to make a unit test to test this with in-memory janusgraph, but I have tested the following extensively in the nifi UI:

  • Script Submission with service settings
  • Bytecode Submission with yaml file settings

Using a localized janusgarph server. I am in the process of testing the other two options (script submission with yaml, and bytecode with service settings), but expect them to work as expected.

I was unable to replicate:

This code is ready for review, and I do not anticipate additional functionality at this time save for the addition of unit tests.

NIFI-7355

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, as such NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Can you please advise if you want me to squash this commit. I wasnt sure if that was necessary given the fact that I am building off a stale PR.

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • [N/A] Build completed using mvn clean install -P contrib-check
    • JDK 17

I am unable to complete the contrib-check because the neo4j bundle is complaining about in-complete licenses. Should I update those?

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@levilentz
Copy link
Contributor Author

@MikeThomsen -- bought back an OLD pr from the grave to complete out the tinkerpop client service to allow interaction with tinkerpop utilizing janusgraph. Appears to work on a my personal machine, and will be testing it on our corporate project shortly.

@mattyb149 -- you have one comment on the previous PR about GStrings. Can you elaborate on how you are using GStrings within your script submission?

@MikeThomsen
Copy link
Contributor

@levilentz no problem. I'll hit you up this week if I have any concerns.

@mattyb149
Copy link
Contributor

I'll have to find the flow I used to test this, I assume if I was using GStrings it was because I had an expression I wanted to evaluate like "${fieldName}" or something, I'll try to take a look soon.

Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @levilentz. Without going into the code details for now, there are two items worth highlighting for adjustment:

  1. There are several Checkstyle warnings shown in the Static Analysis job that need to be corrected:
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[21,43] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - org.apache.nifi.annotation.behavior.*.
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[26,34] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - org.apache.nifi.components.*.
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[45,20] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - javax.script.*.
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[47,17] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - java.util.*.
Warning:  src/test/java/org/apache/nifi/graph/TestableGremlinClientService.java:[20,8] (imports) UnusedImports: Unused import - org.apache.tinkerpop.gremlin.driver.Client.
  1. New unit tests should use Java instead of Groovy, so the GremlinByteCodeClientIT class should be refactored.

@levilentz
Copy link
Contributor Author

Thanks for the contribution @levilentz. Without going into the code details for now, there are two items worth highlighting for adjustment:

1. There are several Checkstyle warnings shown in the Static Analysis job that need to be corrected:
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[21,43] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - org.apache.nifi.annotation.behavior.*.
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[26,34] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - org.apache.nifi.components.*.
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[45,20] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - javax.script.*.
Warning:  src/main/java/org/apache/nifi/graph/TinkerpopClientService.java:[47,17] (imports) AvoidStarImport: Using the '.*' form of import should be avoided - java.util.*.
Warning:  src/test/java/org/apache/nifi/graph/TestableGremlinClientService.java:[20,8] (imports) UnusedImports: Unused import - org.apache.tinkerpop.gremlin.driver.Client.
2. New unit tests should use Java instead of Groovy, so the `GremlinByteCodeClientIT` class should be refactored.

Thank you for the quick turn @exceptionfactory -- I have just simply removed the groovy IT tests in the most recent commit and have fixed the check style.

@levilentz
Copy link
Contributor Author

One other thing I have done is I am not re-using the traversal source per this documentation:

https://docs.aws.amazon.com/neptune/latest/userguide/best-practices-gremlin-java-reuse.html

In testing, if we recreate the traversal source for each query, it is time-intensive and uses a lot of overhead inside of the JVM. I will be testing this again locally here shortly, but this has worked on a custom build of nifi for quite some time. Let me know your thoughts!

@exceptionfactory exceptionfactory changed the title Nifi-7355: add bytecode submission to tinkerpop service NIFI-7355: add bytecode submission to tinkerpop service Sep 12, 2023
Copy link
Contributor

@exceptionfactory exceptionfactory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the adjustments @levilentz. I noted a few other general recommendations regarding dependency versioning and other issues.

The custom JAR and class loading warrants some further evaluation to make sure it behaves as expected, but I see the RequiresInstanceClassLoading annotation, so that is good.

I will defer to others for more detailed implementation evaluation.

Comment on lines 350 to 354
.keyStore(service.getKeyStoreFile())
.keyStorePassword(service.getKeyStorePassword())
.keyStoreType(service.getKeyStoreType())
.trustStore(service.getTrustStoreFile())
.trustStorePassword(service.getTrustStorePassword());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing individual properties, the Builder supports an sslContext() method that accepts a Netty SslContext. The Netty JdkSslContext wraps the Java SSLContext returned from the SSLContextService.createContext() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exceptionfactory -- I have updated this to utilize the .createContext() functionality. I am not super familiar with netty, so let me know if you see any problems with JdkSslContext that I have created.

groovy-2.4.16-indy
groovy-json-2.4.16-indy
groovy-sql-2.4.16-indy
(ASLv2) Groovy 2.5.10 (http://www.groovy-lang.org)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to nifi.groovy.version should push this to 3.X, so we need to do a L&N check here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikeThomsen -- L&N check?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

License and notice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikeThomsen do you have any guidance on how to do that? I looked at other processors/controllers that used groovy and they seem similarly out of compliance with your comment. So just not sure how best to generate this.

@@ -1,107 +0,0 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattyb149 @exceptionfactory @joewitt normally this would violate our major version policy, but I think this is an acceptable edge case because the graph space is a "move fast and break things" market.


if (jarsIsSet && clzIsSet) {
try {
final ClassLoader loader = this.getClass().getClassLoader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works on 2.x and Java 17. These are the changes I had to make to another project.

image

@exceptionfactory thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tested this code using JDK17, and it loaded the custom classes just fine, however if this is the more acceptable use case, I can switch it over and test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and this works as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one odd thing, in all my testing, it doesnt matter if I set EXTENSION_CLASSES to be any specific classes I need, they seem to be available as long as I have the classes available inside of the Jar(s) present in EXTENSION_CLASSES. I have included unit test that proves this. Any thoughts on just removing the EXTENSION_CLASSES? I could imaging keeping it as we would need functionality such as this if we are to move away from Groovy entirely within NiFi.

Copy link
Contributor

@MikeThomsen MikeThomsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another minor change needed.

@joewitt
Copy link
Contributor

joewitt commented Sep 14, 2023

@MikeThomsen @exceptionfactory I have been messing with this very module for the past 24 hours or so because I'm trying to move all of our Groovy work to be more Java 21 compliant and also our Groovy 3x stuff is past support/progress. Nothing in the Groovy 3x line appears to work with Java 21. So on main i'm trying for now to move all things to Groovy 4. The Tinkerpop/graph stuff has an even older relationship with Groovy and seems to be on entirely unsupported lines. I am finding information gathering on Groovy super tough frankly.

Anyway, I suggest we remove the groovy support from the graph module entirely. We should in my view move to a mode whereby the only Groovy bits in our codebase are to allow people to write scripted components in groovy. No unit tests should be in Groovy and we should eliminate those that are. I did not realize what a complicated story this gave our build process.

@joewitt
Copy link
Contributor

joewitt commented Sep 14, 2023

@levilentz please take a look here #7677 (comment). cc @mattyb149

@joewitt
Copy link
Contributor

joewitt commented Sep 14, 2023

@levilentz @mattyb149 @exceptionfactory @MikeThomsen Alternatively I propose I put up this PR which moves all the things to Groovy 4.x in a few mins here. And dumps Groovy support from the graph processors. Then you rebase to that once merged and attempt to restore Groovy support but on the ${nifi.groovy.version} basis which will be Groovy 4.0.15. I was getting an error about Java 17 which itself is a problem since we're at the very least Java 17 (though trying to get to Java 21).

How does that sound?

@levilentz
Copy link
Contributor Author

@levilentz @mattyb149 @exceptionfactory @MikeThomsen Alternatively I propose I put up this PR which moves all the things to Groovy 4.x in a few mins here. And dumps Groovy support from the graph processors. Then you rebase to that once merged and attempt to restore Groovy support but on the ${nifi.groovy.version} basis which will be Groovy 4.0.15. I was getting an error about Java 17 which itself is a problem since we're at the very least Java 17 (though trying to get to Java 21).

How does that sound?

@joewitt obviously I have not been as plugged into this ongoing conversation, but a couple of thoughts for this specific component:

So to get to your proposal, I dont have any issues with that as long as I have some guidance on how the ExecuteGraphQueryRecordProcessor will work after that refactor. If you have a PR for me to review that would be immensely helpful.

@joewitt
Copy link
Contributor

joewitt commented Sep 14, 2023

@levilentz Thanks that is helpful. My goal is getting NiFi to a good build basis for Java 21. The Groovy stuff in its various forms is causing problems in doing that as these old Groovy versions neither build nor run on Java 21 (even Java 17 seems dubious). We can avoid pinning them as it has been however where things get problematic is when a service depends on very old versions.

What is the newest/most recent version of Groovy that Tinkerpop/Gremlin can work with?

@joewitt
Copy link
Contributor

joewitt commented Sep 14, 2023

@levilentz here is the PR i just put up. https://github.com/apache/nifi/pull/7692/commits

Assuming this works we can merge my PR. Then you could rebase and try to do your changes on top of that. You can put Groovy back in and do your update. The problem is going to be that we can't really stick with Groovy versions that are too old. Even when trying to run this stuff on Java 17 it failed to build due to not being able to understand Java class version 61 (which is 17) so something is going to have to give/break here.

@levilentz
Copy link
Contributor Author

@levilentz Thanks that is helpful. My goal is getting NiFi to a good build basis for Java 21. The Groovy stuff in its various forms is causing problems in doing that as these old Groovy versions neither build nor run on Java 21 (even Java 17 seems dubious). We can avoid pinning them as it has been however where things get problematic is when a service depends on very old versions.

What is the newest/most recent version of Groovy that Tinkerpop/Gremlin can work with?

@joewitt -- oh boy, that does not sound fun! We might need to think on that a bit as tinkerpop's development cycle is much slower.

Per their release notes, gremlin 3.7.0 is on Groovy 4.0.9 -- https://github.com/apache/tinkerpop/blob/master/CHANGELOG.asciidoc

@joewitt
Copy link
Contributor

joewitt commented Sep 14, 2023

@levilentz Ok that is a great news thing then. If my PR gets merged can you then rebase and start again adding the needed Groovy libs back in? Tests that use groovy will be a no go I think as the compiler for it appears non functional but you could see. But since they're on 4.0.9 we'll be on 4.0.15 that should be fine.... You agree?

@levilentz
Copy link
Contributor Author

@joewitt -- just reviewed that PR, and it does not contain anything that is giving me heartburn. I will need to address your same concern as @exceptionfactory is keying off the same here:

#7677 (comment)

So yeah, lets plan on me working on this PR (need to address comments, documentation, and then unit tests), and will keep an eye out on #7692, rebasing after it is merged.

@joewitt
Copy link
Contributor

joewitt commented Sep 14, 2023

awesome thank you @levilentz

@mattyb149
Copy link
Contributor

+1 for going to Groovy 4, I'll review Joe's PR too. Are you going to switch from gremlin-groovy to gremlin-java? What is the impact for things like Janusgraph that only appear to support Gremlin as the query language?

@levilentz
Copy link
Contributor Author

+1 for going to Groovy 4, I'll review Joe's PR too. Are you going to switch from gremlin-groovy to gremlin-java? What is the impact for things like Janusgraph that only appear to support Gremlin as the query language?

@mattyb149 to answer this it really depends on what is going to happen to the other scripted processors that use groovy. Do you have any guidance on what that will look like for nifi as a whole?

In my opinion, depending on the guidance, moving away from groovy will have minimal impact on this. For the scripted version of this processor, there is no groovy dependency. For the bytecode version, I have been running a custom build of this processor for ~ 3 years, which leverages java-gremlin to load data into janusgraph. As long as I have a way to inject the graph traversal source, it shouldent really matter if the underlying code is compiled groovy or a set of standardized methods on a custom jar.

How we do that will just depend on what nifi is doing with the other groovy-injected processors such as: https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-groovyx-bundle/nifi-groovyx-processors/src/main/java/org/apache/nifi/processors/groovyx/ExecuteGroovyScript.java

@joewitt
Copy link
Contributor

joewitt commented Sep 15, 2023

The move to Groovy 4.0.15 is in main now so you can probably rebase and try again

@mattyb149
Copy link
Contributor

There's a Checkstyle error failing the tests

@levilentz
Copy link
Contributor Author

There's a Checkstyle error failing the tests

@mattyb149 -- sorry still working on cleaning this up, should have something in by thursday addressing all PR comments.

@levilentz
Copy link
Contributor Author

@mattyb149 @joewitt @exceptionfactory @MikeThomsen -- ready for another review. I have done the following:

  1. Updated documentation to better show how to use this component
  2. Addressed all outstanding PR concerns on the code (need some guidance on the licensing stance)
  3. Build out a unit test showing the gremlin bytecode functionality

I was not able to get test containers working with JG, so I might still tinker with that to see if I can get the whole test suite working.

For that integration test utilizing gremlin bytecode that I have built out, it needs to load a gremlin serializer, which requires setting the custom jar. I have one that does that, but it is ~15MB in size, so I dont think it is wise to add it to the git history. Do you have any guidance on how best to create that jar on the fly for the test? Or should I just put some documentation on how to download the jar to run the test? It just simply is the standard serializers available in the gremlin driver package, but they are not loaded by default in the controller.

@joewitt
Copy link
Contributor

joewitt commented Sep 22, 2023

@levilentz I'd just clean up the test you mention to be an Integration test that someone can run with instructions or just remove it. Can you please squash these down to a single commit to make merging easier. I prefer to avoid using 'squash and merge' as I'd like to validate in a local build . I started grabbing a commit at a time and it was creating conflicts. THanks!

@levilentz
Copy link
Contributor Author

@joewitt -- I have cleanup the integration test. There is technically a way to use it with janusgraph-core in memory and convert it to an actual unit test, but it will require this to have a lot of O&M to continue working as new version of groovy/gremlin/java come out. I would recommend leaving it as a simple IT. I have added documentation on how to run it, and someone knowledgeable about this process would be able to run it.

I have also squashed all the commits. Let me know if you have any other changes you would like to add to this!

@joewitt
Copy link
Contributor

joewitt commented Sep 24, 2023

Ok. The workflow needed approval to run again so just clicked that. Meanwhile if we can get @mattyb149 @MikeThomsen to weight in or merge that would be great. Given all the discussion it seems all are good so with that understanding I'll plan to merge tomorrow if things build fine and nobody has alternative input.

Copy link
Contributor

@mattyb149 mattyb149 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, mostly cosmetic/doc stuff but was curious if we can get rid of groovy-jsr223 and use the Grovy-specific classes for evaluating and compiling scripts.

@joewitt
Copy link
Contributor

joewitt commented Sep 26, 2023

@levilentz Do you think you'll be able to knock out the rebase and @mattyb149 commentary? This thing looks close and want to ensure we can get it across the finish line. You've definitely hung in there quite well through a lot on this one! Thanks

@levilentz
Copy link
Contributor Author

@levilentz Do you think you'll be able to knock out the rebase and @mattyb149 commentary? This thing looks close and want to ensure we can get it across the finish line. You've definitely hung in there quite well through a lot on this one! Thanks

@joewitt -- I am planning on getting this completed this evening. I just want to test removing jsr223 to make sure it works. If we can remove it, it will be a huge help for this (and my) project and should make O&M of this easier as groovy changes.

@joewitt
Copy link
Contributor

joewitt commented Sep 26, 2023

Awesome thanks! Dont mean to rush you - just dont want to let this linger given how much effort you put in plus how much harder i made it with the java 21/groovy updates :). Thanks

@levilentz levilentz force-pushed the NIFI-7355 branch 3 times, most recently from ae02fdc to 7889369 Compare September 27, 2023 03:42
@levilentz
Copy link
Contributor Author

@joewitt was able to get to all of @mattyb149, and my integration tests work as expected. However, because I have had to remove the JSR223 per:

#7677 (comment)

I would like an additional review if possible. A lot of this groovy compilation and injection is new to me. In the mean time, I will be doing a complete test a running nifi instance.

Copy link
Contributor

@MikeThomsen MikeThomsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a few questions.

@levilentz
Copy link
Contributor Author

@joewitt @MikeThomsen @mattyb149 I have updated the code with your suggested changes, and they are working great! Let me know if you have any other changes.

Copy link
Contributor

@mattyb149 mattyb149 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just want to get rid of that dependency on the graph processors. Thanks for sticking with this! I'm looking forward to the capability

@joewitt
Copy link
Contributor

joewitt commented Oct 1, 2023

Seems like the various rounds of commentary are sorted and the build checks out. Went ahead and merged to main. Thanks! Anything to follow can happen in a new JIRA/PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants