Add support for CUDA 10.2 #89

Merged
merged 1 commit into from Nov 29, 2019

Member

saudet commented Nov 28, 2019

What changes were proposed in this pull request?

Update scripts, source code, and documentation to build and run with CUDA 10.2, as well as adjust presets to load new dependencies from libnd4j.

How was this patch tested?

Builds and runs CUDA examples fine on Fedora 30 from redist artifact.

saudet requested review from AlexDBlack and raver119 on Nov 28, 2019

Member

AlexDBlack left a comment

LGTM, thanks 🎉
I'll report back after testing/building locally and running some tests, just to be sure...

Member

AlexDBlack commented Nov 28, 2019

So, I can build fine using this:

libnd4j/buildnativeoperations.sh -c cpu -a avx2 -h mkldnn -j 8 && libnd4j/buildnativeoperations.sh -c cuda -cc 61 -j 8 && mvn clean install -Dmaven.test.skip -Dmaven.javadoc.skip=true -Dlibnd4j.cuda=10.2 -Dlibnd4j.compute=61 -pl '!libnd4j'

until I hit this:
https://gist.github.com/AlexDBlack/471dff473d3042c6a03a4fb006f4332f
(downloading snapshots when using shell script build also happens on master)

Whereas if I use maven:

mvn install -pl libnd4j -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -Dlibnd4j.cuda=10.2 -Dlibnd4j.compute=61 && mvn clean install -Dmaven.test.skip -Dmaven.javadoc.skip=true -Dlibnd4j.cuda=10.2 -Dlibnd4j.compute=61 -pl '!libnd4j'

I get this:
https://gist.github.com/AlexDBlack/6f82449ae9c74eb5c86d4e3aef693e01

Note that the latter command works on master (with CUDA 10.1, of course).

Member Author

saudet commented Nov 28, 2019

Looks like the build is assuming something about the state of the artifacts on the CI server...?

Member

AlexDBlack commented Nov 28, 2019

For the first issue, I think it's more that the buildnativeoperations.sh script (when not run through maven) doesn't install to the local maven repository, but we still have a dependency on that artifact. So maven fetches it from sonatype instead.
We don't actually need the maven copy during the build, though, as we should be (and, afaik, are) using the output of the manually run build script.
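
One way to check this theory is to look for the artifact in the local Maven repository after a manual script run. A minimal sketch, assuming the default `~/.m2/repository` location (settings.xml can override it) and assuming the native library is published as `org.nd4j:libnd4j`. Neither is confirmed in this thread, and both function names are hypothetical:

```shell
# Print the directory where Maven's default local repository would keep
# an artifact. Assumes the default ~/.m2/repository location.
m2_artifact_dir() {
  group_id=$1
  artifact_id=$2
  # A groupId such as org.nd4j maps to the directory path org/nd4j.
  group_path=$(printf '%s' "$group_id" | tr '.' '/')
  printf '%s/.m2/repository/%s/%s\n' "$HOME" "$group_path" "$artifact_id"
}

# Report whether any version of the artifact has been installed locally.
is_installed_locally() {
  if [ -d "$(m2_artifact_dir "$1" "$2")" ]; then
    echo "installed"
  else
    echo "missing"
  fi
}
```

If `is_installed_locally org.nd4j libnd4j` prints `missing` after running the script by hand, Maven has nothing local to resolve and would fall back to the remote snapshot repository, which matches the downloads described above.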

Member

AlexDBlack commented Nov 28, 2019

So if I cut out most of the args in the pom.xml for CPU build, I get this (using mvn clean install):
https://gist.github.com/AlexDBlack/e9101d0326fc35bf2d01a443dc62ecab

[INFO] bash C:\DL4J\Git\deeplearning4j\libnd4j/buildnativeoperations.sh --chip cpu
/bin/bash: C:\DL4J\Git\deeplearning4j\libnd4j/buildnativeoperations.sh: No such file or directory

Which doesn't make sense. I've also double-checked the PR, and everything looks fine to me.

It would be good to know if this is isolated to my machine somehow, or whether it's reproducible elsewhere.

raver119 commented Nov 28, 2019

You probably have WSL installed. I ran into this recently: the bash command ends up executing /bin/bash from WSL rather than the one from mingw.
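
To illustrate why the script path in the log cannot be resolved: WSL's bash does not understand `C:\...` paths, and by default it exposes the C: drive under `/mnt/c`. A rough sketch of that mapping, assuming the default WSL mount point. `win_to_wsl_path` is a hypothetical helper that only handles paths starting with a drive letter; inside WSL, the `wslpath` utility does this conversion properly:

```shell
# Convert a Windows-style path (e.g. C:\foo\bar) to the form WSL uses
# by default. Assumes the input starts with a drive letter and a colon.
win_to_wsl_path() {
  # First character is the drive letter; WSL mounts use it in lowercase.
  drive=$(printf '%s' "$1" | cut -c1 | tr '[:upper:]' '[:lower:]')
  # Drop the "C:" prefix and turn backslashes into forward slashes.
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/mnt/%s%s\n' "$drive" "$rest"
}
```

For example, `win_to_wsl_path 'C:\DL4J\Git\deeplearning4j\libnd4j/buildnativeoperations.sh'` prints `/mnt/c/DL4J/Git/deeplearning4j/libnd4j/buildnativeoperations.sh`, which is the kind of path WSL's bash would have needed.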

Member

AlexDBlack commented Nov 28, 2019

Yep, I definitely do have WSL (WSL 2 in fact) installed, added fairly recently...

raver119 commented Nov 28, 2019

In that case, a temporary workaround is trivial: in pom.xml, replace bash with the absolute path to the bash you actually want.
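
To find that absolute path, it can help to list every `bash` on the PATH in lookup order. A minimal POSIX shell sketch; `list_on_path` is a hypothetical helper, and it assumes it is run from an MSYS-based shell such as Git Bash, where `bash` generally resolves to `bash.exe` transparently:

```shell
# List every executable called "$1" found on PATH, in lookup order.
list_on_path() {
  name=$1
  old_ifs=$IFS
  IFS=:
  # With IFS set to ":", the unquoted $PATH splits into its directories.
  for dir in $PATH; do
    if [ -n "$dir" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  IFS=$old_ifs
}
```

Running `list_on_path bash` prints one candidate per line; the first entry is the one a plain `bash` resolves to in that shell. From cmd.exe, the built-in `where bash` gives a comparable listing. Whichever entry belongs to the mingw install is the one to put in pom.xml.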

Member

AlexDBlack left a comment

Bash path issues aside, it seems to build and run OK.
I'm seeing a small number of ND4J test failures (TF import tests) but they could be failing on master also (would need to rebuild and check).

I say we merge this and fix those after.

image

raver119 commented Nov 29, 2019

These failures are on master atm.

  • check_numerics/strictly_increasing are order-of-execution issues we've discussed before
  • svd is a sign issue: the graph should be modified with an abs op following svd
  • and only mobilenet is something new.
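
The svd item is presumably the usual sign ambiguity of the decomposition (an assumption; the thread does not spell it out). Flipping the sign of a matching pair of left and right singular vectors leaves the product unchanged, so two correct implementations can disagree on signs. For a thin SVD with square diagonal \Sigma and any diagonal matrix D whose entries are \pm 1:

```latex
(U D)\,\Sigma\,(V D)^{\top}
  = U D \Sigma D V^{\top}
  = U \Sigma D^{2} V^{\top}
  = U \Sigma V^{\top},
\qquad \text{since diagonal matrices commute and } D^{2} = I .
```

Elementwise absolute values satisfy |UD| = |U| and |VD| = |V|, so comparing outputs after an abs op removes this ambiguity. This covers the case of distinct singular values; with repeated singular values the freedom is larger than a sign flip.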

AlexDBlack merged commit 5e07998 into master Nov 29, 2019
1 check failed
continuous-integration/jenkins/pr-head: The build of this commit was aborted
AlexDBlack deleted the sa_cuda branch Nov 29, 2019