NIFI-12959: Support loading python processors from NARs #8573

Closed
wants to merge 4 commits into from

markap14
Contributor

Summary

NIFI-00000

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@pvillard31
Contributor

Nice improvement Mark! Would this approach help with a processor like ParseDocument where Tesseract has to be installed on the node first?

@markap14
Contributor Author

@pvillard31 no, it wouldn't really help there. The issue is that a native library like Tesseract would need to be compiled for the correct OS and architecture, so in order to include it we'd need to include many copies of the library. Also, we currently do not set any environment variables telling it to search for native libraries in the NAR's unpacked directory. It might be worth exploring as a future improvement, though. Even though it may not help for something like ParseDocument, it might at least be helpful for custom NARs, where perhaps you know that you're only going to run in containers or on a specific architecture/OS, so you can include the appropriate library in the custom NAR?
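
As a rough illustration of the future improvement mentioned above, the sketch below shows how a launcher could add a directory to the native library search path of a child process. This is not what NiFi does today. The class name, the method name, and the choice of `LD_LIBRARY_PATH` (which only applies on Linux-like systems) are assumptions made for the example, and the directory passed in is hypothetical.

```java
import java.io.File;
import java.util.Map;

public class NativeLibraryPathSketch {

    // Prepends a (hypothetical) native library directory to LD_LIBRARY_PATH
    // in the environment of the process that the builder will start.
    public static void addNativeLibraryDir(final ProcessBuilder builder, final File nativeLibDir) {
        final Map<String, String> env = builder.environment();
        final String dir = nativeLibDir.getAbsolutePath();
        final String existing = env.get("LD_LIBRARY_PATH");

        if (existing == null || existing.isEmpty()) {
            env.put("LD_LIBRARY_PATH", dir);
        } else {
            env.put("LD_LIBRARY_PATH", dir + File.pathSeparator + existing);
        }
    }

    public static void main(final String[] args) {
        // "python3" and the directory below are placeholders for illustration only.
        final ProcessBuilder builder = new ProcessBuilder("python3");
        addNativeLibraryDir(builder, new File("work/nar/native"));
        System.out.println(builder.environment().get("LD_LIBRARY_PATH"));
    }
}
```

Even with something like this in place, the library in that directory would still have to match the OS and architecture of the node, which is the limitation described above.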

@pvillard31
Contributor

Yeah, I must admit that I tried to figure out a way to use ParseDocument with NiFi in a container and I didn't manage to find a clean way to bring Tesseract into it. I feel like it would probably be easier to have it as a side container or something, but I didn't find a way to piece everything together, so I was naively (without too much hope) thinking that this improvement might help :)

…d nar and instead switched to a dependency that can be easily loaded using download-maven-plugin; also fixed missing new-lines in unit test
@markap14
Contributor Author

@dan-s1 thanks, great catch. I removed some carriage-return-newline combos in that file and my IDE did something I didn't expect :) Fixed that.

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for the improvement @markap14, this will be helpful for Python Processor deployment! I just noted a name change to reflect the Python library used for testing.

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for the updates @markap14, I noted a couple more minor questions, otherwise this looks ready to go.

// Environment creation is only necessary if using PIP. Otherwise, the Process requires no outside dependencies, other than those
// provided in the package and thus we can simply include those packages in the PYTHON_PATH.
if (!isUseVirtualEnv()) {
    logger.debug("Will not create Python Virtual Environment because PIP is disabled in nifi.properties");
Contributor

Very minor, but general usage of pip is all lowercase as opposed to all uppercase. More importantly though, the message does not reflect the behavior, as it is only based on the presence of packaged dependencies.

Suggested change
- logger.debug("Will not create Python Virtual Environment because PIP is disabled in nifi.properties");
+ logger.debug("Python Virtual Environment not created: Python Processor packaged with dependencies");

@@ -224,6 +226,10 @@ private String generateAuthToken() {
        return Base64.getEncoder().encodeToString(bytes);
    }

    private boolean isUseVirtualEnv() {
Contributor

It looks like the only use of this method is a negated reference. Can this be removed and replaced with if (packagedWithDependencies)?
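
A minimal sketch of the simplification suggested here, assuming a boolean field named `packagedWithDependencies` as in the comment. The class name, the method name, and the second message are hypothetical and only illustrate the control flow; this is not the code in the pull request.

```java
public class VirtualEnvDecision {

    // Assumed field from the review comment: true when the Processor ships its own dependencies.
    private final boolean packagedWithDependencies;

    public VirtualEnvDecision(final boolean packagedWithDependencies) {
        this.packagedWithDependencies = packagedWithDependencies;
    }

    // Checks the field directly instead of negating a single-use helper method.
    public String describeEnvironmentSetup() {
        if (packagedWithDependencies) {
            return "Python Virtual Environment not created: Python Processor packaged with dependencies";
        }
        return "Python Virtual Environment will be created";
    }

    public static void main(final String[] args) {
        System.out.println(new VirtualEnvDecision(true).describeEnvironmentSetup());
        System.out.println(new VirtualEnvDecision(false).describeEnvironmentSetup());
    }
}
```

Reading the positive condition directly avoids the double negative of `!isUseVirtualEnv()` and keeps the log message next to the condition it describes.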


String pythonPath = pythonApiDirectory.getAbsolutePath();
final String absolutePath = virtualEnvHome.getAbsolutePath();
final File dependenciesDir = new File(new File(absolutePath), "NAR-INF/bundled-dependencies");
Contributor

Is the intent to add NAR-INF/bundled-dependencies in all cases? It seems like it should be conditional on whether the Processor is packaged with dependencies.

Contributor Author

Good catch.
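
One way the conditional handling discussed in this thread could look, as a sketch only. The variable names `pythonApiDirectory`, `virtualEnvHome`, and the `NAR-INF/bundled-dependencies` path come from the snippet under review; the class name, the method signature, and the `packagedWithDependencies` parameter are assumptions, and the actual fix in the pull request may differ.

```java
import java.io.File;

public class PythonPathSketch {

    // Builds a PYTHONPATH-style string. The bundled-dependencies directory is
    // appended only when the Processor is packaged with its dependencies.
    public static String buildPythonPath(final File pythonApiDirectory,
                                         final File virtualEnvHome,
                                         final boolean packagedWithDependencies) {
        String pythonPath = pythonApiDirectory.getAbsolutePath();

        if (packagedWithDependencies) {
            final File dependenciesDir = new File(virtualEnvHome, "NAR-INF/bundled-dependencies");
            pythonPath = pythonPath + File.pathSeparator + dependenciesDir.getAbsolutePath();
        }

        return pythonPath;
    }

    public static void main(final String[] args) {
        // Placeholder directories for illustration only.
        final File api = new File("python/api");
        final File home = new File("work/python/extensions/example");
        System.out.println(buildPythonPath(api, home, false));
        System.out.println(buildPythonPath(api, home, true));
    }
}
```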

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for making the adjustments @markap14, the latest version looks good! +1 merging
