Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NIFI-6884 - Native library loading fixed/improved #3894

Closed
wants to merge 9 commits into from

Conversation

tpalfy
Copy link
Contributor

@tpalfy tpalfy commented Nov 19, 2019

NarClassLoader and InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory.

They also scan directories defined via java.library.path system property.
InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)".

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically master)?

  • Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not squash or use --force when pushing to allow for clean monitoring of changes.

For code changes:

  • Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
  • Have you written or updated unit tests to verify your changes?
  • Have you verified that the full build is successful on both JDK 8 and JDK 11?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
  • If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

… InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory.

They also scan directories defined via java.library.path system property.
InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)".
Copy link
Contributor

@markap14 markap14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @tpalfy! I left several comments inline, mostly around "code hygiene" / avoiding Functional style programming when Imperative is more applicable.

It looks like there are a lot of unit tests that are meant to verify that the class properly finds the desired library. However, the main idea behind the Jira, I believe, is to ensure that classloader isolation is working for the different nars. It's hard to tell from this whether that isolation is fully provided. I think it makes most sense to have an Integration Test that is responsible for loading two nar's, each with a different version of the same native library, and ensure that each nar is properly loading its version of the library.

super(combineURLs(instanceUrls, additionalResourceUrls), parent);
this.identifier = identifier;
this.instanceType = type;
this.instanceUrls = Collections.unmodifiableSet(
instanceUrls == null ? Collections.emptySet() : new LinkedHashSet<>(instanceUrls));
this.additionalResourceUrls = Collections.unmodifiableSet(
additionalResourceUrls == null ? Collections.emptySet() : new LinkedHashSet<>(additionalResourceUrls));

List<File> allNativeLibDirs = new ArrayList<>(nativeLibDirs);
Set<File> additionalNativeLibDirs = this.additionalResourceUrls.stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code gets a bit confusing. I think it would be much clearer if written in the Imperative style rather than the Functional style. In general, we should avoid functional style programming any time the function exceeds around 3-5 lines. The Guava JavaDocs do a good job of explaining the rational here: https://github.com/google/guava/wiki/FunctionalExplained as well as Joshua Bloch's Effective Java.

return System.getProperty("java.library.path", "");
}

default Function<File, File> toDir() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably not be defined as a Function but rather just a method that takes a File as an argument and returns a File. It can still be referenced as a function when appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, good catch.

}

default String findLibrary(String libname) {
Path libLocation = getNativeLibNameToPath().compute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case where I think the code becomes very difficult to follow and would be much more clear if written in Imperative style instead of functional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact I have started with the imperative approach and switched to this style for the very reason that I think it's more understandable and more robust. I guess I got used to this style by now. And although I admit this can be abused (and I'm probably guilty of that sometimes), in my opinion, even considering code quality - readability and robustness-, the functional approach has objective advantage over the imperative one:

Functional style forces the divide and conquer strategy on the programmer (which is very helpful when dealing with more complex problems). It's about breaking down a process into sequential phases where each phase is separated from the rest (not unlike how NiFi works).
One can understand the logic step by step which helps readability and since the steps are completely isolated from each other and the general flow is more declarative the overall code is less prone to bugs.

Also it's easier to write a cleaner code imo as the declarative nature imposes boundaries that guides the flow. As I said I started with an imperative approach which had 3 private methods and about twice as much code as this. Now I can write it almost the same way as the current functional style, but only because now I see the near-optimal solution.

Nevertheless I'll commit a change with a more traditional code style and leave the current one as an unused method for all to be able to compare.

After taking a look at both we can have a final decision which to use.

@@ -205,26 +216,41 @@ private void updateClasspath(File root) throws IOException {
}

@Override
protected String findLibrary(final String libname) {
public String findLibrary(String libname) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason here to override the method just to call the superclass definition, instead of just removing the method from NarClassLoader?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the abstract ClassLoader has an implementation returning null that takes precedent over the interface's default implementation.

But I guess this ties back to the bigger question of wether is it even a good idea to use an interface as a trait instead of a member or an abstract superclass. (I give my opinion on that on that specific comment.)

(Actually ClassLoader defines findLibrary as protected, which left alone wouldn't even compile because it would mean it would try to override the visibility of the public definition of the interface. This actually points out a weakness of the trait interface approach. Had the ClassLoader defined the findLibrary as public, it would have been easy to simply leave out this call to the interface - as it would have compiled but wouldn't have worked properly as it would have returned null.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for clarifying. I know there are sometimes some oddities like this in Java that do require overriding a method and calling the superclass. But 80% of the time when I do it, it's because I refactored and it just didn't even occur to me to delete it. Just figured I'd check.


import static java.nio.file.StandardCopyOption.REPLACE_EXISTING;

public interface NativeLibFinder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels to me like something that belongs more in a util and allow the NarClassLoader to use Composition over Inheritance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about the concept of this interface. In my opinion, the inheritance would be fine, but I would rather use an abstract class (between URLClassLoader and NarClassLoader/InstanceClassLoader) instead of a separate interface.

This interface contains a lot of implementation (that should be private / protected instead of public). Furthermore, it does not add a new feature to NarClassLoader/InstanceClassLoader, actually just implements findLibrary() which also exists in URLClassLoader. That's why an abstract class seems more natural to me.

This way the following method could be avoided too:

    public String findLibrary(String libname) {
        return NativeLibFinder.super.findLibrary(libname);
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all 3 solutions (member, abstract superclass, trait interface) have their pros and cons.
In my opinion the trait interface is the best usually, but it's hard to tell.
In this case probably the abstract superclass wins out.

Member (composition)
Pros

  • Can be reused
  • Further abstraction and extension not limited
  • Proper isolation
  • State code can be written once

Cons

  • More glue code (plus one field, instantiation, data transfer)
  • Need to make sure states are properly set up in order. (Must not reference the member before it's set up properly. Can be mitigated if immutable/stateless.)

Abstract superclass (Regular inheritance)
Pros

  • Least amount of glue code
  • Proper isolation
  • State setup is usually not an issue
  • State code can be written once
  • State can be accessed anywhere

Cons

  • Reusability it very restricted (probably not an issue in this case)
  • Restricts further abstraction and extension

Trait interface (Behavioural inheritance)
Pros

  • Minimal/Least amount of glue code
  • Can be reused
  • Further abstraction and extension not limited
  • No issues with state

Cons

  • Isolation is very poor (everything is public)
  • Can't have state (although can be mitigated by defining and referencing accessors)

All in all, abstract superclasses have their well-known issues. Trait interfaces are basically composition 2.0 except Java doesn't really implement this concept very well. The thing is, no matter the use-case, these issues with the trait interfaces will be there. So we basically either accept them with their flaws as a viable approach or never use them at all. I think the cons are not that bad.

However in this case adding another classloader superclass fits in the existing picture well so I'll go with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes perfect sense as well. It just gave me a bit of pause seeing an interface that was almost fully implemented with default methods. But of course, you can only work with what you have :)


String getTmpLibFilePrefix();

default Set<File> getUsrLibDirs() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having a default function for this that is evaluated each time it's needed, these values, I don't believe, will change over the lifetime of the JVM. Would recommend instead making it into a private static final Set<File> and computing only once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how can it stay testable without increasing the complexity an uncomfortable amount but I'll try to come with something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Is a minor quibble. I would say that it's okay to keep it as-is, if you added just a single-line comment along the lines of // Computed at runtime instead of statically to help facilitate testing. Not called often so the expense is worth the tradeoff or something like that. Just to show that it has been thought about and a decision intentionally made....



default boolean isOsWindows() {
return (OS.indexOf("win") >= 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use OS.contains(...) instead of OS.indexOf(...) >= 0, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, yes.

nativeLibDirs = Arrays.asList(nativeLibDir);

Path cachedLibPath = createTempFile("cached", "mocked").toAbsolutePath();
nativeLibNameToPath = new HashMap<String, Path>() {{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should avoid creating new subclass of HashMap and instead use Collections.singletonMap

Path file21 = createTempFile(dir2, "usrLib", "file21");
Path file31 = createTempFile(dir3, "usrLib", "file31");

javaLibraryPath = new HashSet<Path>() {{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should avoid subclassing HashSet and instead create the HashSet and then add to it. Or, otherwise, use Arrays.asList(...).stream() or Stream.of(...)

.map(File::getAbsolutePath)
.collect(Collectors.joining(File.pathSeparator));

HashSet<File> expected = new HashSet<File>() {{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should create Set<File> expected and then add to it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand the previous cases but why here?
Any performance impact (be it memory or cpu related) is negligible and the code is easier on the eyes with this - otherwise not uncommon - way of initialisation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that this is fairly common. I personally believe that it's a bad practice because it creates a new Class unnecessarily. The cost, though, is certainly negligible. It is especially less important in a test class.

Copy link
Contributor

@turcsanyip turcsanyip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @tpalfy!

I (manually) tested it with a real flow: PutHDFS using Hadoop native libraries for Snappy compression. I was able to reproduce the error with this flow (using 2 PutHDSF in parallel) on the original version.

I tried several configs and they worked fine:

  • load libhadoop.so from the default java.library.path (/usr/lib)
  • define -Djava.library.path in bootstrap.conf and use a custom path for libhadoop.so
  • specify the native lib folder via the Additional Classpath Resources property of PutHDFS
  • build the native lib into nifi-hadoop-libraries-nar
  • remove @RequiresInstanceClassLoading from PutHDFS (in order to use the NarClassLoader instead of the InstanceClassLoader)

I also left some comment regarding documentation and the concept of the NativeLibFinder interface vs abstract class.


import static java.nio.file.StandardCopyOption.REPLACE_EXISTING;

public interface NativeLibFinder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about the concept of this interface. In my opinion, the inheritance would be fine, but I would rather use an abstract class (between URLClassLoader and NarClassLoader/InstanceClassLoader) instead of a separate interface.

This interface contains a lot of implementation (that should be private / protected instead of public). Furthermore, it does not add a new feature to NarClassLoader/InstanceClassLoader, actually just implements findLibrary() which also exists in URLClassLoader. That's why an abstract class seems more natural to me.

This way the following method could be avoided too:

    public String findLibrary(String libname) {
        return NativeLibFinder.super.findLibrary(libname);
    }

@@ -116,10 +122,13 @@
* Maven NAR plugin will fail to build the NAR.
* </p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the javadoc of this class (which describes the NAR structure / java classloading in detail) should be updated with the concept of the native lib handling too.

Usage of -Djava.library.path system property also needs to be documented (the Admin Guide may be the right place for it).
Updating the description of the properties having dynamicallyModifiesClasspath(true) also needed I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the docs.

@markap14
Copy link
Contributor

Thanks for the responses @tpalfy! I realize, as I reviewed again, that my comments may have come off in a bit of a "This has to change, this has to change, this has to change!" tone. My apologies, if that's the case. I just started a review and commented on things that I noticed as I read through, a bit of a stream of thoughts. This resulted in some comments being rather nit-picky, was just mentioning anything that stuck out to me.

And thanks @turcsanyip for reviewing as well! And for explaining how you tested. My biggest concern was that I didn't have a good way to easily test this, which is why I was suggesting the integration test. I do think that an integration test for this would be ideal and would be helpful to avoid breaking things going forward. However, I realize that providing an integration test is a tall order, and it's not something that necessarily has to be done before merging this PR - as long as we have a clear way of testing this. So thanks for outlining that!

@tpalfy
Copy link
Contributor Author

tpalfy commented Nov 22, 2019

No problem @markap14, I appreciate the update.
As for the integration test I think it would be definitely worth it to have one, as even after my second commit I feel like it would be great to have a test that could prove it's still working as before.
I think I'll be able to do one for NarClassLoaders loading libs from the bundled-dependencies/native dir (I can use the TestNarLoader as a starting point).
Not sure about the InstanceClassLoader yet. TestStandardProcessorNode might help me out with that.

@markap14
Copy link
Contributor

markap14 commented Dec 3, 2019

Awesome, thanks, @tpalfy! Looking at the Travis output, I see there is a test failure:

[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.009 s - in org.apache.nifi.nar.TestLoadNativeLibFromNar
[INFO] Running org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty
[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.046 s <<< FAILURE! - in org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty
[ERROR] org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty  Time elapsed: 0.046 s  <<< ERROR!
org.junit.AssumptionViolatedException: got: <false>, expected: is <true>
	at org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty.setUpClass(TestLoadNativeLibViaSystemProperty.java:52)

[ERROR] org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty  Time elapsed: 0.046 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty.tearDownSuite(TestLoadNativeLibViaSystemProperty.java:60)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR] org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty.org.apache.nifi.nar.TestLoadNativeLibViaSystemProperty
[ERROR]   Run 1: TestLoadNativeLibViaSystemProperty.setUpClass:52 » AssumptionViolated got: <fa...
[ERROR]   Run 2: TestLoadNativeLibViaSystemProperty.tearDownSuite:60 » NullPointer
[INFO] 
[INFO] 
[ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 1

I think it's because the #tearDownSuite method is calling System.setProperty with a null value because the assumeTrue in setupClass resulted in skipping the rest of the setup, so the value is null. If so, then it's likely a trivial "if not null" check, so I'll go ahead and continue reviewing. Thanks again for updating

@joewitt
Copy link
Contributor

joewitt commented Jan 9, 2020

@markap14 @tpalfy @turcsanyip This is a pretty great PR and discussion and this solves a long standing and irritating issue. Ready to merge?

@turcsanyip
Copy link
Contributor

@joewitt @markap14 @tpalfy The 2nd round review (refactorings, integration tests) is in progress from my side.

Copy link
Contributor

@turcsanyip turcsanyip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retested the refactored version using multiple versions of the same native library, configured via different ways (system property, processor's classpath property, built into the nar) and everything worked fine.

Only found some minor issues, please fix them.

@@ -2479,6 +2479,9 @@ attempts to resolve filesystem resources from the value of the property. The val
comma-separated list of one or more directories or files, where any paths that do not exist are
skipped. If the resource represents a directory, the directory is listed, and all of the files
in that directory are added to the classpath individually.
These directories also will be scanned for native libraries. If a library is found in one of these
directories, an OS-handled temporary copy is created and cached before loading it to maintain consistenvy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: consistency

});
}

private void deleteDir(String path) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileUtils.deleteFile() could be used.

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import.

Copy link
Contributor

@turcsanyip turcsanyip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpalfy Thanks for the fixes and also for the whole contribution.

LGTM +1

@asfgit asfgit closed this in b7fb947 Jan 13, 2020
@markap14
Copy link
Contributor

Thanks @tpalfy for the updates! I know there was a lot of back & forth here. The code looks good, though, and I did some testing and ran the battery of system tests. I'm a +1 as well. Merged to master. Thanks for sticking with this and providing a fix for a pretty complicated problem!

patricker pushed a commit to patricker/nifi that referenced this pull request Jan 22, 2020
… InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory.

They also scan directories defined via java.library.path system property.
InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)".
Added tests for loading native libraries. Supports mac only.
Added support for loading native libs from additional resources in AbstractHadoopProcessor.
Updated javadoc for PropertyDescriptor.dynamicallyModifiesClasspath.

This closes apache#3894.

Signed-off-by: Mark Payne <markap14@hotmail.com>
natural pushed a commit to natural/nifi that referenced this pull request Feb 1, 2020
… InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory.

They also scan directories defined via java.library.path system property.
InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)".
Added tests for loading native libraries. Supports mac only.
Added support for loading native libs from additional resources in AbstractHadoopProcessor.
Updated javadoc for PropertyDescriptor.dynamicallyModifiesClasspath.

This closes apache#3894.

Signed-off-by: Mark Payne <markap14@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants