New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIFI-6884 - Native library loading fixed/improved #3894
Conversation
… InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory. They also scan directories defined via java.library.path system property. InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @tpalfy! I left several comments inline, mostly around "code hygiene" / avoiding Functional style programming when Imperative is more applicable.
It looks like there are a lot of unit tests that are meant to verify that the class properly finds the desired library. However, the main idea behind the Jira, I believe, is to ensure that classloader isolation is working for the different nars. It's hard to tell from this whether that isolation is fully provided. I think it makes most sense to have an Integration Test that is responsible for loading two nar's, each with a different version of the same native library, and ensure that each nar is properly loading its version of the library.
super(combineURLs(instanceUrls, additionalResourceUrls), parent); | ||
this.identifier = identifier; | ||
this.instanceType = type; | ||
this.instanceUrls = Collections.unmodifiableSet( | ||
instanceUrls == null ? Collections.emptySet() : new LinkedHashSet<>(instanceUrls)); | ||
this.additionalResourceUrls = Collections.unmodifiableSet( | ||
additionalResourceUrls == null ? Collections.emptySet() : new LinkedHashSet<>(additionalResourceUrls)); | ||
|
||
List<File> allNativeLibDirs = new ArrayList<>(nativeLibDirs); | ||
Set<File> additionalNativeLibDirs = this.additionalResourceUrls.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code gets a bit confusing. I think it would be much clearer if written in the Imperative style rather than the Functional style. In general, we should avoid functional style programming any time the function exceeds around 3-5 lines. The Guava JavaDocs do a good job of explaining the rational here: https://github.com/google/guava/wiki/FunctionalExplained as well as Joshua Bloch's Effective Java.
return System.getProperty("java.library.path", ""); | ||
} | ||
|
||
default Function<File, File> toDir() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably not be defined as a Function but rather just a method that takes a File as an argument and returns a File. It can still be referenced as a function when appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, good catch.
} | ||
|
||
default String findLibrary(String libname) { | ||
Path libLocation = getNativeLibNameToPath().compute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another case where I think the code becomes very difficult to follow and would be much more clear if written in Imperative style instead of functional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact I have started with the imperative approach and switched to this style for the very reason that I think it's more understandable and more robust. I guess I got used to this style by now. And although I admit this can be abused (and I'm probably guilty of that sometimes), in my opinion, even considering code quality - readability and robustness-, the functional approach has objective advantage over the imperative one:
Functional style forces the divide and conquer strategy on the programmer (which is very helpful when dealing with more complex problems). It's about breaking down a process into sequential phases where each phase is separated from the rest (not unlike how NiFi works).
One can understand the logic step by step which helps readability and since the steps are completely isolated from each other and the general flow is more declarative the overall code is less prone to bugs.
Also it's easier to write a cleaner code imo as the declarative nature imposes boundaries that guides the flow. As I said I started with an imperative approach which had 3 private methods and about twice as much code as this. Now I can write it almost the same way as the current functional style, but only because now I see the near-optimal solution.
Nevertheless I'll commit a change with a more traditional code style and leave the current one as an unused method for all to be able to compare.
After taking a look at both we can have a final decision which to use.
@@ -205,26 +216,41 @@ private void updateClasspath(File root) throws IOException { | |||
} | |||
|
|||
@Override | |||
protected String findLibrary(final String libname) { | |||
public String findLibrary(String libname) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason here to override the method just to call the superclass definition, instead of just removing the method from NarClassLoader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the abstract ClassLoader
has an implementation returning null that takes precedent over the interface's default implementation.
But I guess this ties back to the bigger question of wether is it even a good idea to use an interface as a trait instead of a member or an abstract superclass. (I give my opinion on that on that specific comment.)
(Actually ClassLoader
defines findLibrary
as protected, which left alone wouldn't even compile because it would mean it would try to override the visibility of the public definition of the interface. This actually points out a weakness of the trait interface approach. Had the ClassLoader defined the findLibrary
as public, it would have been easy to simply leave out this call to the interface - as it would have compiled but wouldn't have worked properly as it would have returned null.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Thanks for clarifying. I know there are sometimes some oddities like this in Java that do require overriding a method and calling the superclass. But 80% of the time when I do it, it's because I refactored and it just didn't even occur to me to delete it. Just figured I'd check.
|
||
import static java.nio.file.StandardCopyOption.REPLACE_EXISTING; | ||
|
||
public interface NativeLibFinder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels to me like something that belongs more in a util and allow the NarClassLoader to use Composition over Inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking about the concept of this interface. In my opinion, the inheritance would be fine, but I would rather use an abstract class (between URLClassLoader and NarClassLoader/InstanceClassLoader) instead of a separate interface.
This interface contains a lot of implementation (that should be private / protected instead of public). Furthermore, it does not add a new feature to NarClassLoader/InstanceClassLoader, actually just implements findLibrary()
which also exists in URLClassLoader. That's why an abstract class seems more natural to me.
This way the following method could be avoided too:
public String findLibrary(String libname) {
return NativeLibFinder.super.findLibrary(libname);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all 3 solutions (member, abstract superclass, trait interface) have their pros and cons.
In my opinion the trait interface is the best usually, but it's hard to tell.
In this case probably the abstract superclass wins out.
Member (composition)
Pros
- Can be reused
- Further abstraction and extension not limited
- Proper isolation
- State code can be written once
Cons
- More glue code (plus one field, instantiation, data transfer)
- Need to make sure states are properly set up in order. (Must not reference the member before it's set up properly. Can be mitigated if immutable/stateless.)
Abstract superclass (Regular inheritance)
Pros
- Least amount of glue code
- Proper isolation
- State setup is usually not an issue
- State code can be written once
- State can be accessed anywhere
Cons
- Reusability it very restricted (probably not an issue in this case)
- Restricts further abstraction and extension
Trait interface (Behavioural inheritance)
Pros
- Minimal/Least amount of glue code
- Can be reused
- Further abstraction and extension not limited
- No issues with state
Cons
- Isolation is very poor (everything is public)
- Can't have state (although can be mitigated by defining and referencing accessors)
All in all, abstract superclasses have their well-known issues. Trait interfaces are basically composition 2.0 except Java doesn't really implement this concept very well. The thing is, no matter the use-case, these issues with the trait interfaces will be there. So we basically either accept them with their flaws as a viable approach or never use them at all. I think the cons are not that bad.
However in this case adding another classloader superclass fits in the existing picture well so I'll go with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes perfect sense as well. It just gave me a bit of pause seeing an interface that was almost fully implemented with default methods. But of course, you can only work with what you have :)
|
||
String getTmpLibFilePrefix(); | ||
|
||
default Set<File> getUsrLibDirs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than having a default function for this that is evaluated each time it's needed, these values, I don't believe, will change over the lifetime of the JVM. Would recommend instead making it into a private static final Set<File>
and computing only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how can it stay testable without increasing the complexity an uncomfortable amount but I'll try to come with something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Is a minor quibble. I would say that it's okay to keep it as-is, if you added just a single-line comment along the lines of // Computed at runtime instead of statically to help facilitate testing. Not called often so the expense is worth the tradeoff
or something like that. Just to show that it has been thought about and a decision intentionally made....
|
||
|
||
default boolean isOsWindows() { | ||
return (OS.indexOf("win") >= 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use OS.contains(...)
instead of OS.indexOf(...) >= 0
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, yes.
nativeLibDirs = Arrays.asList(nativeLibDir); | ||
|
||
Path cachedLibPath = createTempFile("cached", "mocked").toAbsolutePath(); | ||
nativeLibNameToPath = new HashMap<String, Path>() {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should avoid creating new subclass of HashMap and instead use Collections.singletonMap
Path file21 = createTempFile(dir2, "usrLib", "file21"); | ||
Path file31 = createTempFile(dir3, "usrLib", "file31"); | ||
|
||
javaLibraryPath = new HashSet<Path>() {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should avoid subclassing HashSet and instead create the HashSet and then add to it. Or, otherwise, use Arrays.asList(...).stream()
or Stream.of(...)
.map(File::getAbsolutePath) | ||
.collect(Collectors.joining(File.pathSeparator)); | ||
|
||
HashSet<File> expected = new HashSet<File>() {{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, should create Set<File> expected
and then add to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand the previous cases but why here?
Any performance impact (be it memory or cpu related) is negligible and the code is easier on the eyes with this - otherwise not uncommon - way of initialisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that this is fairly common. I personally believe that it's a bad practice because it creates a new Class unnecessarily. The cost, though, is certainly negligible. It is especially less important in a test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @tpalfy!
I (manually) tested it with a real flow: PutHDFS using Hadoop native libraries for Snappy compression. I was able to reproduce the error with this flow (using 2 PutHDSF in parallel) on the original version.
I tried several configs and they worked fine:
- load libhadoop.so from the default
java.library.path
(/usr/lib) - define
-Djava.library.path
in bootstrap.conf and use a custom path for libhadoop.so - specify the native lib folder via the Additional Classpath Resources property of PutHDFS
- build the native lib into nifi-hadoop-libraries-nar
- remove
@RequiresInstanceClassLoading
from PutHDFS (in order to use the NarClassLoader instead of the InstanceClassLoader)
I also left some comment regarding documentation and the concept of the NativeLibFinder interface vs abstract class.
|
||
import static java.nio.file.StandardCopyOption.REPLACE_EXISTING; | ||
|
||
public interface NativeLibFinder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking about the concept of this interface. In my opinion, the inheritance would be fine, but I would rather use an abstract class (between URLClassLoader and NarClassLoader/InstanceClassLoader) instead of a separate interface.
This interface contains a lot of implementation (that should be private / protected instead of public). Furthermore, it does not add a new feature to NarClassLoader/InstanceClassLoader, actually just implements findLibrary()
which also exists in URLClassLoader. That's why an abstract class seems more natural to me.
This way the following method could be avoided too:
public String findLibrary(String libname) {
return NativeLibFinder.super.findLibrary(libname);
}
@@ -116,10 +122,13 @@ | |||
* Maven NAR plugin will fail to build the NAR. | |||
* </p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the javadoc of this class (which describes the NAR structure / java classloading in detail) should be updated with the concept of the native lib handling too.
Usage of -Djava.library.path
system property also needs to be documented (the Admin Guide may be the right place for it).
Updating the description of the properties having dynamicallyModifiesClasspath(true)
also needed I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the docs.
Thanks for the responses @tpalfy! I realize, as I reviewed again, that my comments may have come off in a bit of a "This has to change, this has to change, this has to change!" tone. My apologies, if that's the case. I just started a review and commented on things that I noticed as I read through, a bit of a stream of thoughts. This resulted in some comments being rather nit-picky, was just mentioning anything that stuck out to me. And thanks @turcsanyip for reviewing as well! And for explaining how you tested. My biggest concern was that I didn't have a good way to easily test this, which is why I was suggesting the integration test. I do think that an integration test for this would be ideal and would be helpful to avoid breaking things going forward. However, I realize that providing an integration test is a tall order, and it's not something that necessarily has to be done before merging this PR - as long as we have a clear way of testing this. So thanks for outlining that! |
…tract superclass AbstractNativeLibHandlingClassLoader. Added some documentation. Minor refactor.
No problem @markap14, I appreciate the update. |
Awesome, thanks, @tpalfy! Looking at the Travis output, I see there is a test failure:
I think it's because the #tearDownSuite method is calling |
…arDownSuite. Added notifications for skipping tests when not running on Mac OS.
…ources in AbstractHadoopProcessor.
@markap14 @tpalfy @turcsanyip This is a pretty great PR and discussion and this solves a long standing and irritating issue. Ready to merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retested the refactored version using multiple versions of the same native library, configured via different ways (system property, processor's classpath property, built into the nar) and everything worked fine.
Only found some minor issues, please fix them.
@@ -2479,6 +2479,9 @@ attempts to resolve filesystem resources from the value of the property. The val | |||
comma-separated list of one or more directories or files, where any paths that do not exist are | |||
skipped. If the resource represents a directory, the directory is listed, and all of the files | |||
in that directory are added to the classpath individually. | |||
These directories also will be scanned for native libraries. If a library is found in one of these | |||
directories, an OS-handled temporary copy is created and cached before loading it to maintain consistenvy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: consistency
}); | ||
} | ||
|
||
private void deleteDir(String path) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileUtils.deleteFile() could be used.
import java.util.LinkedHashSet; | ||
import java.util.List; | ||
import java.util.Objects; | ||
import java.util.Optional; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tpalfy Thanks for the fixes and also for the whole contribution.
LGTM +1
Thanks @tpalfy for the updates! I know there was a lot of back & forth here. The code looks good, though, and I did some testing and ran the battery of system tests. I'm a +1 as well. Merged to master. Thanks for sticking with this and providing a fix for a pretty complicated problem! |
… InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory. They also scan directories defined via java.library.path system property. InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)". Added tests for loading native libraries. Supports mac only. Added support for loading native libs from additional resources in AbstractHadoopProcessor. Updated javadoc for PropertyDescriptor.dynamicallyModifiesClasspath. This closes apache#3894. Signed-off-by: Mark Payne <markap14@hotmail.com>
… InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory. They also scan directories defined via java.library.path system property. InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)". Added tests for loading native libraries. Supports mac only. Added support for loading native libs from additional resources in AbstractHadoopProcessor. Updated javadoc for PropertyDescriptor.dynamicallyModifiesClasspath. This closes apache#3894. Signed-off-by: Mark Payne <markap14@hotmail.com>
NarClassLoader and InstanceClassLoader can load libraries from their own or their ancestors' NAR-INF/bundled-dependencies/native directory.
They also scan directories defined via java.library.path system property.
InstanceClassLoader also checks additional classpath resources defined by PropertyDescriptors with "dynamicallyModifiesClasspath(true)".
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically
master
)?Is your initial contribution a single, squashed commit? Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not
squash
or use--force
when pushing to allow for clean monitoring of changes.For code changes:
mvn -Pcontrib-check clean install
at the rootnifi
folder?LICENSE
file, including the mainLICENSE
file undernifi-assembly
?NOTICE
file, including the mainNOTICE
file found undernifi-assembly
?.displayName
in addition to .name (programmatic access) for each of the new properties?For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.