Loading nd4j multiple times in the same JVM results in UnsatisfiedLinkError #6166

Closed
CedricReichenbach opened this issue Aug 15, 2018 · 13 comments
Labels
Enhancement New features and other enhancements

Comments

@CedricReichenbach

CedricReichenbach commented Aug 15, 2018

We're running into a problem similar to one others have reported before, e.g. this one: https://github.com/deeplearning4j/nd4j/issues/1507

In our case, we're running a Tomcat instance with two webapps, each of which contains the nd4j-native jars (brought in by a Maven dependency on nd4j-native-platform).

When calling build() on a network configuration (likely in the second webapp), the following exception is thrown:

java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: http://nd4j.org/getstarted.html
        at org.nd4j.nativeblas.NativeOpsHolder.<init>(NativeOpsHolder.java:51)
        at org.nd4j.nativeblas.NativeOpsHolder.<clinit>(NativeOpsHolder.java:19)
        ... 83 more
Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcpu in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1220)
        at org.bytedeco.javacpp.Loader.load(Loader.java:980)
        at org.bytedeco.javacpp.Loader.load(Loader.java:879)
        at org.nd4j.nativeblas.Nd4jCpu.<clinit>(Nd4jCpu.java:10)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.bytedeco.javacpp.Loader.load(Loader.java:938)
        at org.bytedeco.javacpp.Loader.load(Loader.java:879)
        at org.nd4j.nativeblas.Nd4jCpu$NativeOps.<clinit>(Nd4jCpu.java:1310)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.nativeblas.NativeOpsHolder.<init>(NativeOpsHolder.java:29)
        ... 84 more
Caused by: java.lang.UnsatisfiedLinkError: Native Library /root/.javacpp/cache/nd4j-native-1.0.0-beta-linux-x86_64.jar/org/nd4j/nativeblas/linux-x86_64/libjnind4jcpu.so already loaded in another classloader
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1907)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
        at java.lang.Runtime.load0(Runtime.java:809)
        at java.lang.System.load(System.java:1086)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1200)
        ... 95 more

When running just one of the two webapps alone, everything works flawlessly.
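
The top-level message in the trace ("ND4J is probably missing dependencies") hides the real cause, which only shows up two levels down. A minimal sketch for telling the two situations apart follows. It assumes only the message text shown in the trace; the class and method names are hypothetical and not part of ND4J or JavaCPP.

```java
final class LinkErrorDiagnosis {
    // Fragment copied from the trace above; the JDK's wording may differ between versions.
    private static final String MARKER = "already loaded in another classloader";

    /** Returns true if any throwable in the cause chain is an UnsatisfiedLinkError carrying the marker. */
    static boolean causedByDuplicateNativeLoad(Throwable t) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        for (Throwable cur = t; cur != null && depth < 100; cur = cur.getCause(), depth++) {
            if (cur instanceof UnsatisfiedLinkError) {
                String msg = cur.getMessage();
                if (msg != null && msg.contains(MARKER)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A webapp could call this from a catch block around network initialization and log a more specific hint than the generic "missing dependencies" message.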

@saudet
Contributor

saudet commented Aug 16, 2018

If you're not launching multiple instances of the same app, you could create native libraries with unique names for each app...
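
One runtime variation on the unique-names idea is to copy the library to a path that is distinct per class loader before loading it. The sketch below is an assumption, not what JavaCPP or ND4J does: the class and method names are hypothetical, and whether a given native library tolerates being mapped into the same process twice is a separate question that this code does not answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UniqueNativeCopy {
    /**
     * Copies a native library into a fresh temporary directory tagged with the
     * given class loader, so each loader gets its own path for System.load().
     */
    static Path copyFor(Path library, ClassLoader loader) throws IOException {
        String tag = Integer.toHexString(System.identityHashCode(loader));
        Path dir = Files.createTempDirectory("native-" + tag + "-");
        // Keep the original file name; only the parent directory differs.
        Path target = dir.resolve(library.getFileName().toString());
        Files.copy(library, target, StandardCopyOption.REPLACE_EXISTING);
        // deleteOnExit runs in reverse registration order: the file goes first, then the directory.
        dir.toFile().deleteOnExit();
        target.toFile().deleteOnExit();
        return target;
    }
}
```

A caller would then pass `copyFor(lib, loader).toString()` to `System.load()` from a class that belongs to that loader.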

@saudet saudet self-assigned this Aug 16, 2018
@saudet saudet added Enhancement New features and other enhancements Documentation labels Aug 16, 2018
@saudet
Contributor

saudet commented Aug 16, 2018

BTW, this is something that needs to be fixed in the JDK. If you have any "ammunition" (I really don't like this language, but this is what it is, it's a "fight" against people who do not wish for things to change) we can use to convince the OpenJDK community (think Oracle...) that this is something we need urgently, please do let us know your arguments. This is an issue that comes up with any other libraries using JNI too, such as TensorFlow: tensorflow/tensorflow#19298

/cc @johanvos

@CedricReichenbach
Author

@saudet In our particular case, we're running the same app (library-wise) several times; the instances differ only in configuration.

We've tried to have JavaCPP cache into a different directory per webapp, but that can't be done reliably because the cache directory is only configurable globally (system property org.bytedeco.javacpp.cachedir).
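
To illustrate why a system property cannot carry a per-webapp setting: system properties live in one JVM-wide table, so the last writer wins. The sketch below uses the property name quoted above; the class name, method name, and directory paths are hypothetical.

```java
final class CacheDirProperty {
    // Property name as quoted in the comment above.
    static final String KEY = "org.bytedeco.javacpp.cachedir";

    /** Simulates a webapp setting its preferred cache directory; returns the value the JVM now holds. */
    static String configure(String dir) {
        System.setProperty(KEY, dir);
        return System.getProperty(KEY);
    }

    public static void main(String[] args) {
        configure("/tmp/cache-webapp-a"); // first "webapp" starts up
        configure("/tmp/cache-webapp-b"); // second "webapp" overwrites the value
        // Every class in the JVM, regardless of class loader, now reads the second value.
        System.out.println(System.getProperty(KEY)); // prints /tmp/cache-webapp-b
    }
}
```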

@johanvos

The only "ammunition" (I don't like that word either) you need is time to do the work, not to fight. Nobody in OpenJDK will stop you from doing the work, but you should be prepared to do it yourself, and it takes a lot of work and commitment.
The procedures for doing it are unfortunately not always very clear.
The first step is discussing it, creating a JEP, and finding someone to endorse it. I plan to do that, but it takes time. We are talking about the kernel of a 25-year-old platform; it's not trivial to add functionality to it.

@saudet
Contributor

saudet commented Aug 17, 2018

@CedricReichenbach The issue isn't with the native libraries, it's with the JDK. Basically, the class that calls System.load(), that is JavaCPP in this case, needs to be loaded by the system class loader. This is not something we can typically control from a container application for security reasons.
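
To check where a class sits relative to the system class loader, one can walk the loader's parent chain. This is a minimal sketch with hypothetical names; in a typical Tomcat setup, classes shipped in a webapp's WEB-INF/lib are loaded by a per-webapp loader below the system loader, so the first method would be expected to return false for them.

```java
final class LoaderInspector {
    /** True if the loader is the system class loader or one of its ancestors (null means bootstrap). */
    static boolean isSystemOrAncestor(ClassLoader loader) {
        if (loader == null) {
            return true; // the bootstrap loader sits above everything
        }
        for (ClassLoader cur = ClassLoader.getSystemClassLoader(); cur != null; cur = cur.getParent()) {
            if (cur == loader) {
                return true;
            }
        }
        return false;
    }

    /** Renders the parent chain of a class's loader, e.g. for logging from inside a webapp. */
    static String describeChain(Class<?> type) {
        StringBuilder sb = new StringBuilder(type.getName());
        for (ClassLoader cur = type.getClassLoader(); cur != null; cur = cur.getParent()) {
            sb.append(" <- ").append(cur.getClass().getName());
        }
        return sb.append(" <- bootstrap").toString();
    }
}
```

Calling `LoaderInspector.describeChain(SomeClass.class)` from each webapp shows which loader would end up owning the native library.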

@johanvos It's more than that, it's always more than just getting the work done, it's always about politics, always, and I'm very glad you're here, because I'm unable to deal with it. In this case the issue is with a built-in limitation of JNI. If you read https://docs.oracle.com/javase/7/docs/technotes/guides/jni/jni-12.html very carefully you'll notice this paragraph buried in there:

In the Java 2 SDK, each class loader manages its own set of native libraries. The same JNI native library cannot be loaded into more than one class loader. Doing so causes UnsatisfiedLinkError to be thrown. For example, System.loadLibrary throws an UnsatisfiedLinkError when used to load a native library into two class loaders. The benefits of the new approach are:

  • Name space separation based on class loaders is preserved in native libraries. A native library cannot easily mix classes from different class loaders.
  • In addition, native libraries can be unloaded when their corresponding class loaders are garbage collected.

In other words, this is a feature, not a bug. And what they plan to do about this is to have everyone switch to what they are working on in Panama, and because they promise that their replacement for JNI is going to fix all this, they are not considering fixing this for JNI:
http://mail.openjdk.java.net/pipermail/panama-dev/2018-May/001951.html

But they have been promising to come up with this for over 4 years now, with no concrete plans or roadmap about what is going to become available when, and even if they do eventually come out with something, we'll still need to use JNI for Android anyway, but we're talking about OpenJDK here. They don't care about what we need for Android, so here we are: politics!

@saudet
Contributor

saudet commented Aug 17, 2018

@CedricReichenbach It might be possible to hack around this if we disable the SecurityManager. Would this be something that you could live with?

@CedricReichenbach
Author

@saudet As a workaround, we now make sure to use dl4j/nd4j (and thus the whole native shebang) in only one of the webapps and avoid it in the others. That works for us because neural networks are crucial in only one of them.

Our product is a CMS we're shipping to customers who might run them in any kind of way, so we cannot give a canonical answer on whether the system classloader workaround is acceptable. But it could certainly be helpful at least for dev and testing setups if it's easily configurable.

@saudet
Contributor

saudet commented Aug 17, 2018

@CedricReichenbach
According to the following document, it sounds like permission to use JNI is not granted by default:
https://docs.oracle.com/javase/9/security/permissions-java-development-kit.htm
So if you're able to use JNI without anything special, this means security is probably not enabled and hacking something with the system class loader might work to further automate the process. Can you check if this is the case, that you don't need the security features of the JDK?
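
A quick way to run that check from inside a webapp might look like the sketch below. It assumes the JDK 8/9-era SecurityManager API discussed here, which is deprecated for removal on recent JDKs; the class and method names are hypothetical.

```java
final class JniSecurityCheck {
    /** True if no SecurityManager would block System.loadLibrary() for the named library. */
    @SuppressWarnings("removal") // SecurityManager is deprecated for removal on recent JDKs
    static boolean mayLoadLibrary(String libName) {
        SecurityManager sm = System.getSecurityManager();
        if (sm == null) {
            return true; // security is not enabled, so JNI loading is unrestricted
        }
        try {
            // checkLink tests RuntimePermission("loadLibrary." + libName).
            sm.checkLink(libName);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }
}
```

For the library in this issue, the call would be `JniSecurityCheck.mayLoadLibrary("jnind4jcpu")`.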

@CedricReichenbach
Author

@saudet As said above, we don't control our customers' setup; it might be anything. But if JNI is already tied to either disabled or customized security, then I guess it's sensible that users enabling it would also have to configure exceptions for the system class loader. If this is not an option for them, we could give them the option to opt out of features requiring dl4j.

@saudet
Contributor

saudet commented Aug 20, 2018

@CedricReichenbach Yes, it sounds like they would need to enable permissions for JNI anyway, so I think we can assume that we can access the system class loader. However, we still can't have it load arbitrary classes because its class path is set by the container, so I'm not yet sure how we could hack this...

saudet added a commit to bytedeco/javacpp that referenced this issue Aug 27, 2018
@saudet
Contributor

saudet commented Aug 27, 2018

I think I was able to work around this successfully in commit bytedeco/javacpp@dd57c2c. I've tested a bit on Tomcat and it appears to work well. There might be a few things to adjust in ND4J, so more testing is welcome! Please give it a try with JavaCPP 1.4.3-SNAPSHOT: http://bytedeco.org/builds/

@saudet
Contributor

saudet commented Aug 28, 2018

@CedricReichenbach I'll close this for now, but please let me know if you still encounter issues with JavaCPP 1.4.3-SNAPSHOT. Thanks for the help!

@saudet saudet closed this as completed Aug 28, 2018
@lock

lock bot commented Sep 27, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Sep 27, 2018