[SYSTEMDS-3020] Initial GPU junit tests #1317
Thanks, @corepointer, for the GPU JUnit test infrastructure.
All tests add -gpu. It is up to your test to add more. So -lineage -gpu is definitely possible.
At the moment it is no different, other than the test checking that the corresponding GPU instruction appears in the heavy hitter output. The content of a new test is up to its author; anything's possible ;-)
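The heavy hitter check described above can be approximated from the command line. This is a minimal sketch, not the test suite's own implementation: `has_gpu_instruction` is a hypothetical helper, and it assumes that GPU instructions are listed in the `-stats` output with a `gpu_` prefix in front of the opcode.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given opcode appears as a GPU
# instruction in SystemDS statistics output read from stdin.
# ASSUMPTION: GPU instructions are listed with a "gpu_" prefix.
has_gpu_instruction() {
    opcode=$1
    # -F treats the pattern as a fixed string, since opcodes such as
    # "ba+*" contain characters that are special in regular expressions.
    grep -F -q -- "gpu_${opcode}"
}

# Example use (not executed here; the script name is a placeholder):
#   java ... org.apache.sysds.api.DMLScript -f script.dml -stats 10 -gpu \
#       | has_gpu_instruction 'ba+*' && echo "GPU instruction found"
```

Matching on a fixed string keeps the helper independent of the exact column layout of the statistics table.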
Hi @corepointer, the tests do not work on this runner. Any pointers on how to resolve this? CUDA and cuDNN are available from the command line. Run results are here: https://github.com/j143/systemds/runs/4137852455
This one runs fine:
java -Xmx4g -Xms4g -Xmn400m -cp target/SystemDS.jar:target/lib/*:target/SystemDS-*.jar org.apache.sysds.api.DMLScript -f ../main.dml -exec singlenode -gpu
This commit is part of the GPU test suite epic [SYSTEMDS-3019] and introduces:
* the gpu test java package
* tests for cellwise/rowwise codegen
* test for unary builtin functions (incomplete)
* Move some maven surefire plugin settings to the properties section (with same defaults as before) to make them settable from command line (need to reduce thread count for GPU tests)
* provide an integer when appending "-stats" to a test run (used to crash some tests without it)
* Conv2DTest requests "recompile_runtime" explain mode without adding "-explain" so output would not print
You're not rebuilding the binaries with cmake yet, are you? If you were, the problem might be that Jitify is missing: you need to clone with --recursive to fetch that external dependency. But that only becomes an issue once the current binaries run the test.
Two things you could check:
* Is CUDA version 10.2 installed? That is currently the latest version we support (I'm working on CUDA 11.x support, and it's almost there).
* Is a CUDA-capable device visible to your VM? You could add the command "nvidia-smi" to your runner. It should print the available CUDA devices.
This one isn't using any GPU instructions though ;-) PS: I've rebased onto the current main branch and cherry-picked the commits you added.
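The version check suggested above can be scripted in a runner. This is a minimal sketch: `extract_cuda_version` is a hypothetical helper that assumes the `nvidia-smi` banner contains a `CUDA Version: X.Y` field. Note that this field reports the highest CUDA version the driver supports, not necessarily the installed toolkit; `nvcc --version` reports the toolkit.

```shell
#!/bin/sh
# Hypothetical helper: print the CUDA version found in the nvidia-smi
# banner read from stdin (prints nothing if the field is absent).
extract_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Example use on a machine with an NVIDIA driver (not executed here):
#   nvidia-smi | extract_cuda_version
#
# For an existing clone that was made without --recursive, the external
# dependencies can usually be fetched afterwards (assuming they are
# tracked as git submodules):
#   git submodule update --init --recursive
```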
ubuntu@ip-10-0-0-4:~/repo/systemds$ nvidia-smi
Tue Nov 9 06:50:01 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02 Driver Version: 440.118.02 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 26C P0 70W / 149W | 243MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 9955 C .../lib/jvm/java-11-openjdk-amd64/bin/java 232MiB |
+-----------------------------------------------------------------------------+
Sorry, my bad. I didn't realize that I raised the bar to compute capability 6.0 and above when I introduced atomicAdd() for double values (that's exactly the function called in the line referenced by the error messages). The Tesla K80 has a compute capability of 3.7 [1]. So for the time being, codegen requires compute capability 6.0+ unless we find a workaround.
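A runner could guard the codegen tests with a compute capability comparison. This is a minimal sketch: `cc_at_least` is a hypothetical helper, and both arguments are assumed to be in "major.minor" form. The `compute_cap` query field mentioned in the comment is only exposed by recent drivers and may not be available on older ones such as the 440 series; in that case the value has to be looked up in NVIDIA's device table.

```shell
#!/bin/sh
# Hypothetical helper: succeed if compute capability $1 (e.g. "3.7")
# is at least $2 (e.g. "6.0"). Both must be in "major.minor" form.
cc_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}
    need_major=${2%%.*}; need_minor=${2#*.}
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] &&
          [ "$have_minor" -ge "$need_minor" ]; }
}

# Example use (not executed here; requires a driver that supports the
# compute_cap query field):
#   cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   cc_at_least "$cc" 6.0 || echo "skipping codegen GPU tests (cc $cc)"
```

With the values from this thread, `cc_at_least 3.7 6.0` fails, which matches the Tesla K80 being below the threshold.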
This LGTM. 👍 The last failing test can be worked around later. :)
Thank you! I'll fix it and merge it in after my next paper deadline on Dec 10. |
Or should I comment out the failing test and merge the rest? That would avoid rebasing on my GPU runner test fork, if that is okay with you.
More tests will be added as we go. For the tests to run, it is advisable not to start multiple test cases simultaneously in the same JVM. To run from the command line, use something like this:
mvn -ntp test -DenableGPU=true -Dmaven.test.skip=false -Dtest-parallel=suites -Dtest-threadCount=1 -Dtest-forkCount=1 -D automatedtestbase.outputbuffering=false -Dtest=org.apache.sysds.test.gpu.**