Updates for Volta architecture #414

Closed
tmcdonell opened this issue Jan 19, 2018 · 12 comments
Labels
llvm-ptx accelerate-llvm-ptx

Comments

@tmcdonell
Member

The Volta architecture (compute capability 7.0) introduces some changes that we'll need to update for; in particular, NVIDIA has given up on the warp-synchronous programming model.

https://devblogs.nvidia.com/inside-volta/
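
For illustration, here is a minimal CUDA sketch (not Accelerate's generated code, just the classic last-warp reduction idiom) of what that change means in practice. Before Volta the threads of a warp were assumed to run in lockstep, so this loop was typically written with a volatile pointer and no synchronisation at all; under Volta's independent thread scheduling, each step needs explicit __syncwarp() calls so that shared-memory reads and writes cannot race:

// Reduce the 32 values in shmem[0..31] using one full warp (lane = 0..31).
// Each step reads first, synchronises, then writes, so no lane's read can
// race with another lane's write.
__device__ int warpReduceSum(int *shmem, unsigned int lane)
{
    for (int offset = 16; offset > 0; offset /= 2)
    {
        int other = (lane + offset < 32) ? shmem[lane + offset] : 0;
        __syncwarp();              // all reads complete before anyone writes
        shmem[lane] += other;
        __syncwarp();              // writes visible before the next round of reads
    }
    return shmem[0];               // every lane sees the warp total
}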

@tmcdonell tmcdonell added the llvm-ptx accelerate-llvm-ptx label Jan 19, 2018
@tmcdonell
Member Author

@JonathanFraser

It appears that it also has trouble even running in a standard configuration on the Volta series. Attempting to run on a V100 produces these errors:

'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
prof: 
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/AccelerateHS/accelerate/issues
./Data/Array/Accelerate/LLVM/PTX/Compile.hs:195:24: (compile): ptxas - -o /home/ubuntu/.accelerate/accelerate-llvm-1.1.0.0/accelerate-llvm-ptx-1.1.0.1/nvptx64-nvidia-cuda/ptx60/rel/morp4D4F5193F6A9991E.sass -arch=sm_70 (exit 255)
ptxas /tmp/tmpxft_00003c7a_00000000-0_stdin, line 6; error   : PTX .version 3.2 does not support .target sm_70
ptxas fatal   : Ptx assembly aborted due to errors

CallStack (from HasCallStack):
  error, called at ./Data/Array/Accelerate/LLVM/PTX/Compile.hs:195:24 in accelerate-llvm-ptx-1.1.0.1-KvigJtjTalJbmKZlivRJv:Data.Array.Accelerate.LLVM.PTX.Compile

@tmcdonell
Member Author

AWS has Volta GPUs: https://aws.amazon.com/ec2/instance-types/p3/

@TravisWhitaker

I'm interested in understanding exactly what has to change here. Would merely moving to the synchronizing warp intrinsics do the job (with some sort of performance impact?), or does something more fundamental have to happen? What, if any, changes to ptxas would assist?

@tmcdonell
Member Author

I think it is just a matter of adding warp synchronisation at the right points. I have done this in the obvious places already (e.g. here) but haven't found a Volta machine to actually test it on yet.

Performance-wise, it could be beneficial to look at other implementations which don't require the extra synchronisation points, but, again, I'm not exactly sure yet...
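
For reference, the register-shuffle style of warp reduction is one such implementation; this is only a rough sketch (again, not Accelerate's code), but it shows how the synchronisation is folded into the *_sync intrinsic itself, so no separate __syncwarp() points are needed and no shared memory is involved:

// Sum across a warp using __shfl_down_sync; the full mask names the lanes
// that must participate, which provides the synchronisation implicitly.
__device__ int warpReduceSumShfl(int v)
{
    const unsigned int mask = 0xffffffffu;      // all 32 lanes take part
    for (int offset = 16; offset > 0; offset /= 2)
        v += __shfl_down_sync(mask, v, offset);
    return v;                                   // lane 0 holds the warp total
}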

@TravisWhitaker

IIRC we've got some Titan Vs around at work somewhere; I'll see if I can grab one. If so I'd be happy to run whatever tests/benchmarks you think are relevant.

@tmcdonell
Member Author

If you could run just the standard test suite on it that would be amazing; if it works (🤞) that is a decent sanity check.

stack test accelerate-llvm-ptx

@TravisWhitaker

Managed to get my hands on one. The test suite has been running for almost 24 hours... I see that there's an issue for speeding up the test suite.

@tmcdonell
Member Author

That seems like it might have hung?

If I set it to run just one test case per property, it completes in under 2 minutes on my machine:

$ stack test accelerate-llvm-ptx --test-arguments='--hedgehog-tests 1'
...
All 595 tests passed (87.23s)

@TravisWhitaker

Hah, for some reason nix-shell was eating a massive stream of errors about compute cap 7 being unrecognized. I'll fix up the cuda package and then try again when I'm back at that machine.

@tmcdonell
Member Author

tmcdonell/cuda@9351c1f includes device information for compute 7.x, so you shouldn't get those errors anymore.
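
As a quick sanity check that the device really reports compute capability 7.x, you can query the plain CUDA runtime API directly; a small host-side sketch (independent of the Haskell bindings, shown only for illustration):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the properties of device 0 and print its compute capability.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}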

@tmcdonell
Copy link
Member Author

Fixed in v1.3
