
Asynchronous execution #53

Open
tmcdonell opened this Issue · 3 comments

2 participants

@tmcdonell
Collaborator

@rrnewton notes in #48 that the current behaviour (the driver default, CU_CTX_SCHED_AUTO) is to spin while waiting for GPU operations to complete, which is unfriendly towards other Haskell threads that want to do useful work. We should change this to a policy that is gentler on CPU resources, such as CU_CTX_SCHED_BLOCKING_SYNC, which makes the waiting thread block instead of busy-waiting.

Tangentially related to #13.
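For illustration, here is a minimal sketch of requesting the blocking scheduling policy. It is written in CUDA C against the driver API, not taken from Accelerate, which goes through Haskell bindings. It assumes the process uses the device's primary context and sets the flag before that context is first made active. The function name `use_blocking_sync` is hypothetical. Under the default `CU_CTX_SCHED_AUTO`, the driver spins whenever the process has no more active contexts than logical CPU cores, which matches the behaviour reported in #48.

```cuda
#include <cuda.h>

// Hypothetical helper: configure device `ordinal`'s primary context so that
// host threads block on an OS synchronization primitive while waiting for the
// GPU, instead of spinning. Call this before the primary context is first used.
CUresult use_blocking_sync(int ordinal, CUcontext *ctx)
{
    CUdevice dev;
    CUresult rc;

    if ((rc = cuInit(0)) != CUDA_SUCCESS) return rc;
    if ((rc = cuDeviceGet(&dev, ordinal)) != CUDA_SUCCESS) return rc;

    // Replace the default CU_CTX_SCHED_AUTO policy with blocking sync.
    rc = cuDevicePrimaryCtxSetFlags(dev, CU_CTX_SCHED_BLOCKING_SYNC);
    if (rc != CUDA_SUCCESS) return rc;

    // Retain the primary context (now carrying the new flag) and bind it
    // to the calling host thread.
    if ((rc = cuDevicePrimaryCtxRetain(ctx, dev)) != CUDA_SUCCESS) return rc;
    return cuCtxSetCurrent(*ctx);
}
```

The flag can be read back with `cuCtxGetFlags` and masked with `CU_CTX_SCHED_MASK` to confirm which policy is in effect.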

@tmcdonell
Collaborator

Asynchronous execution entails using non-default streams, with events to express the dependencies between them.

With support for streams and events, we should also (correctly) support asynchronous memory transfer, which additionally requires:

  • The host memory must be page-locked (pinned) so that the CUDA driver can transfer it by DMA. Currently Accelerate (base) allocates pageable memory that is pinned only with respect to the Haskell RTS's garbage collector, so the CUDA driver must internally copy the data into a pinned staging region before performing the DMA.
  • If data transfers and kernels are issued to distinct non-default streams, they can also overlap on all devices that support concurrent copy and execution (almost all devices of compute capability 1.1 and later).
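A minimal sketch of this pattern, in CUDA C with the runtime API rather than Accelerate's Haskell bindings. It assumes the host buffer was allocated pinned with `cudaMallocHost`; the names `scale`, `scale_pipelined` and `NCHUNKS` are hypothetical. The input is split into chunks. Each chunk is uploaded in one non-default stream, an event marks its arrival, and a second stream waits on that event before running the kernel and copying the chunk back. On hardware that supports it, the upload of chunk c+1 can then overlap the kernel for chunk c.

```cuda
#include <cuda_runtime.h>

#define NCHUNKS 4  // hypothetical number of pipeline chunks

// Hypothetical example kernel: multiply each element by k.
__global__ void scale(float *xs, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) xs[i] *= k;
}

// `h` must be pinned (cudaMallocHost) for the copies to be truly asynchronous.
// For simplicity, n must be a positive multiple of NCHUNKS.
cudaError_t scale_pipelined(float *h, int n, float k)
{
    if (n <= 0 || n % NCHUNKS != 0) return cudaErrorInvalidValue;

    int chunk = n / NCHUNKS;
    size_t chunk_bytes = (size_t)chunk * sizeof(float);
    float *d = NULL;
    cudaStream_t upload, compute;
    cudaEvent_t ready[NCHUNKS], done;

    cudaError_t rc = cudaMalloc((void **)&d, (size_t)n * sizeof(float));
    if (rc != cudaSuccess) return rc;

    // Most intermediate return codes are not checked individually;
    // cudaGetLastError at the end reports the last error from any runtime call.
    cudaStreamCreateWithFlags(&upload, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&compute, cudaStreamNonBlocking);
    for (int c = 0; c < NCHUNKS; ++c)
        cudaEventCreateWithFlags(&ready[c], cudaEventDisableTiming);
    // BlockingSync: the host thread sleeps in cudaEventSynchronize instead of spinning.
    cudaEventCreateWithFlags(&done, cudaEventDisableTiming | cudaEventBlockingSync);

    for (int c = 0; c < NCHUNKS; ++c) {
        float *hc = h + c * chunk, *dc = d + c * chunk;
        // Upload chunk c, then mark it as ready.
        cudaMemcpyAsync(dc, hc, chunk_bytes, cudaMemcpyHostToDevice, upload);
        cudaEventRecord(ready[c], upload);
        // The compute stream waits only for chunk c, not for later uploads.
        cudaStreamWaitEvent(compute, ready[c], 0);
        scale<<<(chunk + 255) / 256, 256, 0, compute>>>(dc, chunk, k);
        cudaMemcpyAsync(hc, dc, chunk_bytes, cudaMemcpyDeviceToHost, compute);
    }

    // Wait for all work in the compute stream (which depends on every upload).
    cudaEventRecord(done, compute);
    rc = cudaEventSynchronize(done);
    if (rc == cudaSuccess) rc = cudaGetLastError();

    for (int c = 0; c < NCHUNKS; ++c) cudaEventDestroy(ready[c]);
    cudaEventDestroy(done);
    cudaStreamDestroy(upload);
    cudaStreamDestroy(compute);
    cudaFree(d);
    return rc;
}
```

The final wait uses an event created with `cudaEventBlockingSync`, so the host thread also avoids busy-waiting here, in line with the scheduling concern raised in the first comment.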
@tmcdonell tmcdonell closed this
@tmcdonell tmcdonell reopened this
@robstewart57

Note: this issue was discussed further on the accelerate mailing list in June/July 2014.
