[SYSTEMDS-3020] Initial GPU junit tests #1317

corepointer · 2021-06-14T12:02:14Z

More tests will be added as we go. For the tests to run reliably, it is advisable not to start multiple test cases simultaneously in the same JVM. To run them from the command line, use something like this:

mvn -ntp test -DenableGPU=true -Dmaven.test.skip=false -Dtest-parallel=suites -Dtest-threadCount=1 -Dtest-forkCount=1 -Dautomatedtestbase.outputbuffering=false -Dtest='org.apache.sysds.test.gpu.**'

This test suite should not be included in the automated testing for the time being, as we don't have any GPU testing infrastructure set up yet.
Two of the tests are failing at the moment; I'm working on that ;-)

phaniarnab · 2021-06-14T13:29:22Z

Thanks, @corepointer, for the GPU JUnit test infrastructure.
I haven't started running the tests yet, but I have some questions about the approach.

1. Does this setup allow writing feature tests on the GPU where the baseline also runs with `-gpu` (e.g., comparing `-gpu -lineage` with `-gpu`)?
2. I think this is an easy enough way to run regression tests on a GPU. But effect-wise, how is it different from adding `-gpu` to the existing test classes?

corepointer · 2021-06-14T14:07:36Z

> 1. Does this setup allow writing feature tests on the GPU where the baseline also runs with `-gpu` (e.g., comparing `-gpu -lineage` with `-gpu`)?

All tests add `-gpu`. Any further flags are up to your test, so `-lineage -gpu` is definitely possible.
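A minimal sketch of how such a feature test could handle its arguments, assuming the test assembles its own DML program arguments: both runs get `-gpu`, only one adds `-lineage`, and the two results are then compared within a tolerance. `GpuFeatureArgs`, `buildArgs`, and `nearlyEqual` are hypothetical names, not part of the SystemDS test base classes; the `-f` and `-exec singlenode` flags are borrowed from the command quoted later in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for a GPU feature test (not part of the SystemDS test API).
final class GpuFeatureArgs {
    private GpuFeatureArgs() {}

    // Every run gets "-gpu"; any further flags (e.g. "-lineage") are added per test.
    static List<String> buildArgs(String script, String... extraFlags) {
        List<String> args = new ArrayList<>(
                Arrays.asList("-f", script, "-exec", "singlenode", "-gpu"));
        args.addAll(Arrays.asList(extraFlags));
        return args;
    }

    // Element-wise comparison of baseline and feature results within an absolute tolerance.
    static boolean nearlyEqual(double[] expected, double[] actual, double eps) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > eps) {
                return false;
            }
        }
        return true;
    }
}
```

A baseline run would use `buildArgs("test.dml")` and the feature run `buildArgs("test.dml", "-lineage")`; keeping `-gpu` in both means the comparison isolates the effect of lineage rather than of the backend.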

> 2. I think this is an easy enough way to run regression tests on a GPU. But effect-wise, how is it different from adding `-gpu` to the existing test classes?

At the moment it is no different (other than the test checking that the corresponding GPU instruction appears in the heavy-hitter output). What new tests contain is up to their authors; anything's possible ;-)
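The heavy-hitter check can be sketched as a scan of the `-stats` output. This sketch assumes that each heavy-hitter row starts with a numeric rank followed by the instruction name, and that GPU instructions carry a `gpu_` prefix (e.g. `gpu_ba+*`); `HeavyHitterCheck` and its methods are hypothetical, and the actual test base may implement this differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes GPU opcodes appear with a "gpu_" prefix in the stats output.
final class HeavyHitterCheck {
    private HeavyHitterCheck() {}

    // Collects instruction names from rows shaped like "<rank>  <instruction>  <time> ...".
    static List<String> instructionNames(String statsOutput) {
        List<String> names = new ArrayList<>();
        for (String line : statsOutput.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            // Heavy-hitter rows start with a numeric rank; header and other lines are skipped.
            if (tokens.length >= 2 && tokens[0].matches("\\d+")) {
                names.add(tokens[1]);
            }
        }
        return names;
    }

    // True if the GPU variant of the given opcode was executed, e.g. "ba+*" -> "gpu_ba+*".
    static boolean containsGpuInstruction(String statsOutput, String opcode) {
        return instructionNames(statsOutput).contains("gpu_" + opcode);
    }
}
```

With this, `containsGpuInstruction(stats, "ba+*")` passes only if the GPU matrix multiply, not its CPU counterpart, shows up among the heavy hitters.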

j143 · 2021-11-08T11:02:11Z

Hi @corepointer,

The tests do not work on this runner. Any pointers on how to resolve this?

CUDA and cuDNN are available from the command line:

Run results: https://github.com/j143/systemds/runs/4137852455

This one runs fine:

java -Xmx4g -Xms4g -Xmn400m -cp target/SystemDS.jar:target/lib/*:target/SystemDS-*.jar org.apache.sysds.api.DMLScript -f ../main.dml -exec singlenode -gpu


* Move some Maven Surefire plugin settings to the properties section (with the same defaults as before) to make them settable from the command line (the thread count needs to be reduced for GPU tests)
* Provide an integer when appending "-stats" to a test run (some tests used to crash without it)
* Conv2DTest requests the "recompile_runtime" explain mode without adding "-explain", so the output would not print
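The `-stats` item above boils down to always passing an explicit count after the flag, which SystemDS reads as the number of heavy-hitter instructions to report. A minimal sketch with a hypothetical helper name (`StatsArgs.withStats`); the count of 10 in the usage note is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: appends "-stats" followed by an explicit integer
// instead of a bare "-stats" flag.
final class StatsArgs {
    private StatsArgs() {}

    static List<String> withStats(List<String> args, int heavyHitters) {
        List<String> result = new ArrayList<>(args); // leave the caller's list untouched
        result.add("-stats");
        result.add(Integer.toString(heavyHitters));
        return result;
    }
}
```

For example, `withStats(List.of("-f", "t.dml", "-gpu"), 10)` yields `[-f, t.dml, -gpu, -stats, 10]`.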

corepointer · 2021-11-08T23:47:55Z

> Hi @corepointer,
>
> The tests do not work on this runner. Any pointers on how to resolve this?
>
> CUDA and cuDNN are available from the command line:
>
> Run results: https://github.com/j143/systemds/runs/4137852455

You're not rebuilding the binaries with CMake yet, are you? If you were, the problem might be that Jitify is missing: you need to clone with `--recursive` to fetch that external dependency. But that only becomes relevant once the tests run with the current binaries.

Two things you could check. First, is CUDA 10.2 installed? That is currently the latest version we support (I'm working on CUDA 11.x support; it's almost there). Second, is a CUDA-capable device visible to your VM? You could add the command `nvidia-smi` to your runner, which should print the available CUDA devices.
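The version check can be scripted by parsing the `nvidia-smi` header, assuming it has the layout shown in the next comment (`... CUDA Version: 10.2 ...`). Note that this header reports the highest CUDA version the driver supports, not necessarily the installed toolkit. `NvidiaSmiCheck` and its methods are hypothetical names; the caller is expected to pass in the captured `nvidia-smi` output.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that inspects captured nvidia-smi output.
final class NvidiaSmiCheck {
    private NvidiaSmiCheck() {}

    private static final Pattern CUDA_VERSION =
            Pattern.compile("CUDA Version:\\s*(\\d+)\\.(\\d+)");

    // Extracts {major, minor} from the header line, if present.
    static Optional<int[]> cudaVersion(String nvidiaSmiOutput) {
        Matcher m = CUDA_VERSION.matcher(nvidiaSmiOutput);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new int[] {
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))});
    }

    // True if the driver reports exactly the expected CUDA version, e.g. 10.2.
    static boolean reportsCuda(String nvidiaSmiOutput, int major, int minor) {
        return cudaVersion(nvidiaSmiOutput)
                .map(v -> v[0] == major && v[1] == minor)
                .orElse(false);
    }
}
```

An empty result from `cudaVersion` is itself a useful signal: it usually means `nvidia-smi` could not see a driver or device at all.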

> This one runs fine:
>
> java -Xmx4g -Xms4g -Xmn400m -cp target/SystemDS.jar:target/lib/*:target/SystemDS-*.jar org.apache.sysds.api.DMLScript -f ../main.dml -exec singlenode -gpu

This one isn't using any GPU instructions though ;-)

PS: I've rebased onto the current main branch and cherry-picked the commits you added.

j143 · 2021-11-09T06:55:42Z

1. Checking the CUDA installation:

ubuntu@ip-10-0-0-4:~/repo/systemds$ nvidia-smi
Tue Nov  9 06:50:01 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P0    70W / 149W |    243MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      9955      C   .../lib/jvm/java-11-openjdk-amd64/bin/java   232MiB |
+-----------------------------------------------------------------------------+

cuDNN is at /usr/include/cudnn.h, version 7.6.5, as per our docs.
I was able to run the CUDA samples successfully.

2. Checking that the tests run: only one test seems to fail (https://github.com/j143/systemds/runs/4149318024).

corepointer · 2021-11-09T15:15:50Z

> | 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |

Sorry, my bad. I didn't realize that I raised the bar to compute capability 6.0 and above when I introduced atomicAdd() for double values (that's exactly the function called in the line referenced by the error messages). The Tesla K80 has a compute capability of 3.7 [1]. So, unless we find a workaround, codegen is limited to devices with compute capability 6.0+ for the time being.

[1] https://developer.nvidia.com/cuda-gpus
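Until a workaround exists, the codegen tests could be guarded by comparing the device's compute capability against the 6.0 minimum that native double-precision atomicAdd() requires. A minimal sketch with hypothetical names: it does not query the device, so the capability string (e.g. "3.7" for the Tesla K80, per [1]) must be supplied by the caller.

```java
// Hypothetical guard; the device's compute capability has to be obtained elsewhere.
final class ComputeCapability implements Comparable<ComputeCapability> {
    final int major;
    final int minor;

    ComputeCapability(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Parses strings such as "3.7" or "6.1".
    static ComputeCapability parse(String s) {
        String[] parts = s.trim().split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected major.minor: " + s);
        }
        return new ComputeCapability(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    // Orders by major version first, then minor version.
    @Override
    public int compareTo(ComputeCapability o) {
        return major != o.major ? Integer.compare(major, o.major) : Integer.compare(minor, o.minor);
    }

    // Native atomicAdd() on double operands requires compute capability 6.0 or higher.
    boolean supportsDoubleAtomicAdd() {
        return compareTo(new ComputeCapability(6, 0)) >= 0;
    }
}
```

Comparing major and minor as integers, rather than the capability as a floating-point number, keeps versions such as "10.0" ordered correctly.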

j143 · 2021-11-17T03:54:06Z

This LGTM. 👍

The workaround for the last failing test can be resolved later. :)

corepointer · 2021-11-25T12:14:51Z

> This LGTM. +1
>
> The workaround for the last failing test can be resolved later. :)

Thank you! I'll fix it and merge it in after my next paper deadline on Dec 10.

j143 · 2021-11-25T12:20:23Z

Or should I comment out the failing test and merge the rest? That would spare me from rebasing my GPU runner test fork.

If that is okay with you.

corepointer · 2021-11-25T17:45:21Z

> Or should I comment out the failing test and merge the rest? That would spare me from rebasing my GPU runner test fork.

You can use the `@Ignore` annotation, which we already use in the row template test case (test #18, I think).

> If that is okay with you.

Yes, please go ahead.

corepointer and others added 4 commits November 8, 2021 23:37

[SYSTEMDS-3020] Initial GPU junit tests

4c79519

This commit is part of the GPU test suite epic [SYSTEMDS-3019] and introduces:
* the gpu test java package
* tests for cellwise/rowwise codegen
* test for unary builtin functions (incomplete)

add basic action to test gpu hosted on runner

b4aea22

add flow trigger for push and pr event

1ae56c7

corepointer force-pushed the gpu_testsuite branch from a2a1e56 to 1ae56c7 November 8, 2021 23:40

j143 self-requested a review November 28, 2021 18:45

j143 approved these changes Nov 28, 2021


asfgit closed this in d8dd694 Nov 28, 2021
