Updates for Volta architecture #414

Closed
tmcdonell opened this issue Jan 19, 2018 · 12 comments
Labels
llvm-ptx accelerate-llvm-ptx

Comments

@tmcdonell
Member

The Volta architecture (compute capability 7.0) introduces some changes that we'll need to update for; in particular, NVIDIA has given up on the warp-synchronous programming model.

https://devblogs.nvidia.com/inside-volta/
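
For illustration, here is a minimal CUDA sketch (not Accelerate's generated code, just the classic last-warp reduction idiom) of what that change means in practice. Before Volta the threads of a warp were assumed to run in lockstep, so this loop was typically written with a volatile pointer and no synchronisation at all; under Volta's independent thread scheduling, each step needs explicit __syncwarp() calls so that shared-memory reads and writes cannot race:

// Reduce the 32 values in shmem[0..31] using one full warp (lane = 0..31).
// Each step reads first, synchronises, then writes, so no lane's read can
// race with another lane's write.
__device__ int warpReduceSum(int *shmem, unsigned int lane)
{
    for (int offset = 16; offset > 0; offset /= 2)
    {
        int other = (lane + offset < 32) ? shmem[lane + offset] : 0;
        __syncwarp();              // all reads complete before anyone writes
        shmem[lane] += other;
        __syncwarp();              // writes visible before the next round of reads
    }
    return shmem[0];               // every lane sees the warp total
}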

@tmcdonell tmcdonell added the llvm-ptx accelerate-llvm-ptx label Jan 19, 2018
@tmcdonell
Member Author

@JonathanFraser

It appears that it also has trouble even running in a standard configuration on the Volta series. Attempting to run on a V100 produces these errors:

'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
'sm_70' is not a recognized processor for this target (ignoring processor)
'+ptx60' is not a recognized feature for this target (ignoring feature)
'sm_70' is not a recognized processor for this target (ignoring processor)
prof: 
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/AccelerateHS/accelerate/issues
./Data/Array/Accelerate/LLVM/PTX/Compile.hs:195:24: (compile): ptxas - -o /home/ubuntu/.accelerate/accelerate-llvm-1.1.0.0/accelerate-llvm-ptx-1.1.0.1/nvptx64-nvidia-cuda/ptx60/rel/morp4D4F5193F6A9991E.sass -arch=sm_70 (exit 255)
ptxas /tmp/tmpxft_00003c7a_00000000-0_stdin, line 6; error   : PTX .version 3.2 does not support .target sm_70
ptxas fatal   : Ptx assembly aborted due to errors

CallStack (from HasCallStack):
  error, called at ./Data/Array/Accelerate/LLVM/PTX/Compile.hs:195:24 in accelerate-llvm-ptx-1.1.0.1-KvigJtjTalJbmKZlivRJv:Data.Array.Accelerate.LLVM.PTX.Compile

@tmcdonell
Member Author

AWS has Volta GPUs: https://aws.amazon.com/ec2/instance-types/p3/

@TravisWhitaker

I'm interested in understanding exactly what has to change here. Would merely moving to the synchronizing warp intrinsics do the job (with some sort of performance impact?), or does something more fundamental have to happen? What, if any, changes to ptxas would assist?

@tmcdonell
Member Author

I think it is just a matter of adding warp synchronisation at the right points. I have done this in the obvious places already (e.g. here) but haven't found a Volta machine to actually test it on yet.

Performance-wise, it could be beneficial to look at other implementations which don't require the extra synchronisation points, but, again, I'm not exactly sure yet...
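
For reference, the register-shuffle style of warp reduction is one such implementation; this is only a rough sketch (again, not Accelerate's code), but it shows how the synchronisation is folded into the *_sync intrinsic itself, so no separate __syncwarp() points are needed and no shared memory is involved:

// Sum across a warp using __shfl_down_sync; the full mask names the lanes
// that must participate, which provides the synchronisation implicitly.
__device__ int warpReduceSumShfl(int v)
{
    const unsigned int mask = 0xffffffffu;      // all 32 lanes take part
    for (int offset = 16; offset > 0; offset /= 2)
        v += __shfl_down_sync(mask, v, offset);
    return v;                                   // lane 0 holds the warp total
}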

@TravisWhitaker

IIRC we've got some Titan Vs around at work somewhere; I'll see if I can grab one. If so I'd be happy to run whatever tests/benchmarks you think are relevant.

@tmcdonell
Member Author

If you could run just the standard test suite on it that would be amazing; if it works (🤞) that is a decent sanity check.

stack test accelerate-llvm-ptx

@TravisWhitaker

Managed to get my hands on one. The test suite has been running for almost 24 hours... I see that there's an issue for speeding up the test suite.

@tmcdonell
Member Author

That seems like it might have hung?

If I set it to run just one test case per property, it completes in under 2 minutes on my machine:

$ stack test accelerate-llvm-ptx --test-arguments='--hedgehog-tests 1'
...
All 595 tests passed (87.23s)

@TravisWhitaker

Hah, for some reason nix-shell was eating a massive stream of errors about compute cap 7 being unrecognized. I'll fix up the cuda package and then try again when I'm back at that machine.

@tmcdonell
Member Author

tmcdonell/cuda@9351c1f includes device information for compute 7.x, so you shouldn't get those errors anymore.
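
As a quick sanity check that the device really reports compute capability 7.x, you can query the plain CUDA runtime API directly; a small host-side sketch (independent of the Haskell bindings, shown only for illustration):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the properties of device 0 and print its compute capability.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}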

@tmcdonell
Copy link
Member Author

Fixed in v1.3
