tensorflow-1.0.0-1.3 requires CUDA #396
Thanks for the feedback! The previous build on Maven Central also links with CUDA, this is not new. Is this a new requirement for TensorFrames? |
I think linking statically with Anyway, here are SNAPSHOT binaries for |
* Link TensorFlow statically with `cudart` to avoid dependency on CUDA (issue #396)
Thank you Samuel. The previous version of TensorFrames was using 0.8.0, which was not compiled with GPU acceleration. Speaking about ancient versions, we are getting there. I am just hitting an ABI compatibility issue with travis [1]:
Would you mind trying to compile tensorflow with GCC <= 4.9 and libstdc++.so.6.0.19 or older? From a quick look at the java tensorflow experimental bindings, this is the compatibility level they are also targeting. Thanks! |
Yes, I'll do the release with CentOS 7, so that won't be a problem. Anything else? |
This is the only issue I can think of (but then I was not expecting the ABI compatibility issue either...) |
Ok, just to make sure that everything is alright, I've redeployed the SNAPSHOT binaries from CentOS 7. Let me know how those do! Thanks |
@saudet I confirm that the latest artifact works as expected in a CPU-only environment. You can close the ticket, I am looking forward to the next release. On a personal note, if by any chance you have the time to build the macosx binaries as well, I will be very grateful. |
Sure thing, I've deployed SNAPSHOT binaries for Mac as well. Let's make sure those work properly before a release! Thanks |
@saudet It looks like the mac build requires cuda, which I have not tried to install on my machine. It currently crashes the JVM. Here is the relevant error:
|
Interesting. I thought I had some problem with my old machine, so I tried on a newer one with 10.12, but I get exactly the same thing. Seems to be a known regression: tensorflow/tensorflow#2980 (comment) So, what do you think we should do? |
Actually, the workaround mentioned in the issue above appears to work for me. If I set the LD_LIBRARY_PATH environment variable to something like "/usr/lib", then it magically runs! Would that be satisfactory? |
@saudet thanks for looking into it, and for the pointers. It looks like it is a known issue with recent versions of TensorFlow, and it depends on particular combinations of TensorFlow + Java + hardware. I still experience the issue on macOS, but I can use Docker to run the tests locally. Not running the macOS build is a slight inconvenience, and unless other people are experiencing the same issue, I am happy with the current artifacts. Feel free to close the ticket. By the way, it looks like you have published a new pom file on Saturday ( |
@saudet many thanks for releasing new binaries. I agree with @thunterdb that not being able to run on a Mac is a little inconvenient. I wanted to know your opinion on releasing separate no-CUDA versions for Mac and Linux, similar to what the Google team may also be planning to do: TFlow-JAVA-Readme |
@akdeoras Yes that's fine, would you be willing to make a contribution? |
@saudet let me give it a shot. I did build locally with CUDA and all worked as expected on mac and linux. |
I mean, making the modifications to have both versions.
|
Yes, that's what I mean too :) |
Awesome! I'll be waiting for your pull request. In the mean time, I've released new binaries:
|
@saudet I tried your release above, but it did not work for me (I think @thunterdb faced similar exceptions too). About the code changes, I realized that it won't suffice to change just the tensorflow project, i.e. to have separate CUDA and no-CUDA releases. You have a bunch of CPP projects in there. Do you have any advice on how to go about it? |
TensorFlow builds all by itself. It does not and can hardly depend on the
other presets, even if we wanted to. Or am I missing something?
|
You are right. What I meant is that if we change Tensorflow preset to now publish 2 jars, one for CUDA and another for no CUDA, then we will have to change the names of the jar to something like: |
It sounds reasonable, yes, but not very useful if someone wants to use CUDA when CUDA is available... |
So, instead of having different architecture names, the names of the libraries should be different. A bit like we have to do with FFTW to get it working for both |
Thanks @saudet. I looked at the fftw project and understood what you are suggesting. So if I understand correctly, the process will be to build two separate .so libraries of TensorFlow: one with CUDA and one without. What we are not sure about is how to specify which C++ library to load at runtime in our Java application. Can you point to an example if you have one? |
Yes, two .so files. The user will decide which one to use, that's the point, no? |
How to do this? We can simply call |
@akdeoras One more thing, to make sure that JavaCPP doesn't try to load libraries on its own, we'll need to suffix them with "#" in the class properties. This in effect marks that library as "system" or "provided". Specifically, something like this: @Platform(... , link = "tensorflow_cc#", library = "tensorflow#") |
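To make the "user decides which one to use" idea above concrete, here is a minimal sketch in plain Java. It is not JavaCPP's mechanism and not the actual preset code: the library names `tensorflow` and `tensorflow_gpu`, the system property `example.tensorflow.variant`, and the class `NativeVariantChooser` are all hypothetical, chosen only for illustration. The sketch only resolves which library name would be loaded; it does not load anything.

```java
import java.util.Locale;

// Illustrative sketch only: picks between two hypothetical native library
// names based on a user-supplied setting. Nothing here is JavaCPP API.
final class NativeVariantChooser {
    // Hypothetical library names, assumed for this example.
    static final String CPU_LIBRARY = "tensorflow";
    static final String GPU_LIBRARY = "tensorflow_gpu";

    // Returns the GPU library name only when the user explicitly asks
    // for "gpu" (case-insensitive); anything else falls back to CPU.
    static String libraryNameFor(String variant) {
        if (variant != null
                && variant.trim().toLowerCase(Locale.ROOT).equals("gpu")) {
            return GPU_LIBRARY;
        }
        return CPU_LIBRARY;
    }

    public static void main(String[] args) {
        // Hypothetical property name; defaults to the CPU-only variant.
        String variant = System.getProperty("example.tensorflow.variant", "cpu");
        String name = libraryNameFor(variant);

        // mapLibraryName yields the platform-specific file name
        // (for example a "lib" prefix and ".so" suffix on Linux).
        System.out.println("Would load: " + System.mapLibraryName(name));

        // A real application would call System.loadLibrary(name) here,
        // before any class that depends on the native code is used.
    }
}
```

Run it with something like `java -Dexample.tensorflow.variant=gpu NativeVariantChooser` to select the GPU name. Defaulting to the CPU variant keeps a CPU-only machine from attempting to load a CUDA-linked library unless the user opts in.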
@akdeoras Hi, have you made any progress on this? It looks like we're going to have the same problem with OpenCV (pull #416), so let's see what we can do together about this. There isn't any complete example for something like this. There are bits and pieces that need to be put together, and I will help. But please let me know where you stumble and I will help with those places in priority so we can even out the effort. Thanks for your interest! /cc @SamCarlberg |
Hi all. Thank you for your great work :) .
I went through all the libraries, but I am stuck on this error
I also exported the env var. I ran the example on Mac OS X 10.12.6. I saw a very similar error above, so I thought I would post my issue here. Thanks all, Andrea |
@spi-x-i Please try again with 1.3.0-1.3.4-SNAPSHOT and let me know if that doesn't work, thanks! |
I get an error saying the version is not found, please help me |
Make sure to follow the instructions here: http://bytedeco.org/builds/ |
There don't seem to be any newer snapshots for tensorflow-platform at sonatype, despite attempting to follow your instructions above. [ERROR] Failed to execute goal on project exampletrainer: Could not resolve dependencies for project org.bytedeco.javacpp-presets.tensorflow:exampletrainer:jar:1.3.4: Could not find artifact org.bytedeco.javacpp-presets:tensorflow-platform:jar:1.4.0-1.3-SNAPSHOT in sonatype-nexus-snapshots (https://oss.sonatype.org/content/repositories/snapshots) |
@andyguest The version is 1.4.0-1.3.4-SNAPSHOT, you can check the list to make sure: |
Fantastic. That works perfectly now! :-) |
@akdeoras FYI, separate CPU-only and GPU-enabled builds are now a reality! In addition to <dependency>
<groupId>org.bytedeco.javacpp-presets</groupId>
<artifactId>tensorflow</artifactId>
<version>1.4.0-1.3.4-SNAPSHOT</version>
<classifier>linux-x86_64-gpu</classifier>
</dependency> |
It seems that the official version of JavaCPP's TensorFlow in Maven Central has a linking dependency on `libcudart`. This is problematic for upstream packages that may need to run in a CPU-only environment. Do you have some plans to publish a CPU-only version?
Also, I am not a legal expert, but if the `libcudart` were to be statically linked, I wonder if the NVIDIA license would allow the publication of the final artifact on a public repository like Maven Central.
Thank you in advance. This will unblock the next release of TensorFrames:
databricks/tensorframes#74
cc @saudet