@rrnewton notes in #48 that the current (driver default) behaviour is to spin while waiting for GPU operations to complete, which is unfriendly towards other Haskell threads that want to do useful work. We should change this to a scheduling mode that is gentler on CPU resources, such as blocking on a synchronisation primitive (`CU_CTX_SCHED_BLOCKING_SYNC`).
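As a rough sketch of what this could look like, assuming the `cuda` bindings package and its `Foreign.CUDA.Driver` modules, the scheduling flag is passed when the context is created. Device `0` and the overall program layout are illustrative assumptions, not how accelerate currently sets up its context, and constructor names may differ between versions of the bindings.

```haskell
-- Minimal sketch: create a context that blocks the calling OS thread on a
-- synchronisation primitive instead of spinning while the GPU is busy.
-- Assumes the `cuda` package is installed and a CUDA device is present.
import qualified Foreign.CUDA.Driver.Context as Ctx
import qualified Foreign.CUDA.Driver.Device  as Dev

main :: IO ()
main = do
  Dev.initialise []                          -- cuInit(0)
  dev <- Dev.device 0                        -- assumption: use the first device
  -- SchedBlockingSync corresponds to CU_CTX_SCHED_BLOCKING_SYNC
  ctx <- Ctx.create dev [Ctx.SchedBlockingSync]

  -- ... launch kernels / copy data here ...

  Ctx.sync                                   -- waits without busy-spinning
  Ctx.destroy ctx
```

Note that a blocking wait made through a `safe` foreign call still occupies an OS thread, so this is most useful together with the threaded runtime (`-threaded`), where other Haskell threads can keep running on other capabilities.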
Tangentially related to #13.
Asynchronous execution entails using non-default stream(s) and waiting on events to express dependencies between operations.
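A hedged sketch of the stream-and-event pattern, again assuming the `cuda` bindings package: work queued on one stream is made to wait for an event recorded on another, without blocking the host until the final result is needed. The stream names `producer` and `consumer` are hypothetical, and the kernel launches themselves are elided.

```haskell
-- Minimal sketch: express a cross-stream dependency with an event.
-- Assumes the `cuda` package is installed and a CUDA device is present.
import qualified Foreign.CUDA.Driver.Context as Ctx
import qualified Foreign.CUDA.Driver.Device  as Dev
import qualified Foreign.CUDA.Driver.Event   as Event
import qualified Foreign.CUDA.Driver.Stream  as Stream

main :: IO ()
main = do
  Dev.initialise []
  dev <- Dev.device 0                        -- assumption: use the first device
  ctx <- Ctx.create dev [Ctx.SchedBlockingSync]

  producer <- Stream.create []               -- two non-default streams
  consumer <- Stream.create []
  done     <- Event.create []

  -- ... enqueue the producing operation on `producer` here ...
  Event.record done (Just producer)          -- marks completion of the work queued so far

  -- Anything enqueued on `consumer` after this point runs only once `done`
  -- has fired; the host thread itself does not wait here.
  Event.wait done (Just consumer) []
  -- ... enqueue the dependent operation on `consumer` here ...

  Stream.block consumer                      -- host waits only when it needs the result

  Event.destroy done
  Stream.destroy producer
  Stream.destroy consumer
  Ctx.destroy ctx
```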
With support for streams and events, we should also (correctly) support asynchronous memory transfer, which additionally requires:
Note: this issue is further discussed in June/July 2014 on the accelerate mailing list here.