When is it safe to free external DevicePtrs? #435

Closed
TravisWhitaker opened this issue Feb 7, 2019 · 3 comments

@TravisWhitaker

In this Stack Overflow answer, there is code similar to the following for making an Array out of a DevicePtr:

...
ptx <- PTX.createTargetFromContext ctx
...
fp  <- c_generate_gpu_data
...
arr@(Array _ ad) <- Sugar.allocateArray (Z :. 32) :: IO (Vector Float)
LRU.insertUnmanaged (ptxMemoryTable ptx) ad fp
let r = PTX.runWith ptx $ A.fold (+) 0 (use arr)
print r
free fp

What is not quite clear to me is precisely when it is safe to free the device array backing arr. There are a few candidate answers:

  • After r is in WHNF, which means the kernel has launched. This is probably not sufficient, because arrays passed to `use` are allowed to be copied to the remote asynchronously.
  • After r (or part of it) is in normal form, which means the computation has finished. This probably works, but isn't ideal, since I might want to use the same arr again later.
  • After arr (or its ArrayData) is GC'd, which means the host is done with it but it might still be present on the device. Empirically, this seems to work, but I want to double-check that I understand why.
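
The first two options hinge on the difference between WHNF and normal form. A minimal plain-Haskell sketch of that difference follows. It does not involve Accelerate at all: the pair stands in for a result whose contents are not ready yet, the helper names (succeeds, reachesWHNF, reachesNF) are made up for illustration, and it assumes the deepseq package is available.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (SomeException, evaluate, try)

-- Run an action and report whether it finished without throwing.
succeeds :: IO a -> IO Bool
succeeds act = do
  r <- try act
  pure (case r of
          Left e  -> const False (e :: SomeException)
          Right _ -> True)

-- Evaluate only as far as the outermost constructor (WHNF).
reachesWHNF :: a -> IO Bool
reachesWHNF x = succeeds (evaluate x)

-- Evaluate every component of the value (normal form).
reachesNF :: NFData a => a -> IO Bool
reachesNF x = succeeds (evaluate (force x))

main :: IO ()
main = do
  -- Hypothetical stand-in for a result whose payload is not available yet.
  let r = (1 :: Int, error "payload not ready" :: Int)
  print =<< reachesWHNF r  -- True: the pair constructor is enough for WHNF
  print =<< reachesNF r    -- False: normal form demands the payload
```

Forcing to WHNF says nothing about the payload, which is why WHNF alone may not be a safe point to free the backing memory.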

Initially the assumption 'host array is unreachable' -> 'device array is reclaimable' seemed insufficient to me, but from what I can tell this is essentially how Data.Array.Accelerate.Array.Remote.Table works. From my understanding, arrays are considered reclaimable once their UniqueArray goes out of scope.

Is it safe to do something like this?

...
myFree :: DevicePtr a -> IO ()
...
-- Like the 'mw' helper in `makeWeakArrayData`, but this probably isn't well-typed.
getUniqueArray :: (ArrayPtrs e' ~ Ptr a') => ArrayEltR e' -> ArrayData e' -> UniqueArray a'
...
ptx <- PTX.createTargetFromContext ctx
...
fp  <- c_generate_gpu_data
...
arr@(Array _ ad) <- Sugar.allocateArray (Z :. 32) :: IO (Vector Float)
let !uad = uniqueArrayData (getUniqueArray arrayElt ad)
addFinalizer uad (myFree fp)
LRU.insertUnmanaged (ptxMemoryTable ptx) ad fp
let r = PTX.runWith ptx $ A.fold (+) 0 (use arr)
print r
-- myFree gets called later, if I'm lucky.
@tmcdonell
Member

(oops, sorry I forgot to reply to this... feel free to ping me when I lose track of things...)

Hm, that is a good question...

  1. I think we want that when r is in WHNF, the final kernel has completed executing (so run should be synchronous, whereas runAsync can return immediately). But poking around the code right now, I don't think that is actually the case; we only have that all the kernels have been scheduled at this point, and they may not necessarily have completed (since kernels execute asynchronously w.r.t. each other). Under most circumstances I don't think you can observe this distinction, but see more below...

  2. Certainly after r is in normal form it will be safe; it definitely needs to synchronise the kernel execution before it can copy the data back.

  3. After arr is GC'd is safe because that can only happen after r gets GC'd as well, so either it is done evaluating with fp or it is not going to do so.
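
The "scheduled" versus "completed" distinction in (1) and (2) can be sketched in plain Haskell, with a forked thread standing in for an asynchronously executing kernel and a malloc'd buffer standing in for the external device allocation. None of this is Accelerate's API; sumThenFree is a hypothetical helper, and the only point is that the buffer is freed after an explicit wait for completion, not after the launch.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff, sizeOf)

-- Sum n floats from an externally owned buffer on another thread,
-- freeing the buffer only once the worker has signalled completion.
sumThenFree :: Int -> IO Float
sumThenFree n = do
  -- Stand-in for memory produced by external code (fp in the question).
  buf <- mallocBytes (n * sizeOf (undefined :: Float)) :: IO (Ptr Float)
  mapM_ (\i -> pokeElemOff buf i 1) [0 .. n - 1]

  done <- newEmptyMVar
  -- "Launch": forkIO returns immediately, like scheduling a kernel.
  _ <- forkIO $ do
         xs <- mapM (peekElemOff buf) [0 .. n - 1]
         putMVar done $! sum xs

  -- Synchronise: block until the worker has finished reading the buffer.
  total <- takeMVar done
  -- Only now is it safe to release the buffer.
  free buf
  pure total

main :: IO ()
main = sumThenFree 32 >>= print  -- prints 32.0
```

Moving the free above the takeMVar would reproduce the hazard in (1): the work has been launched, but it may still be reading the buffer.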

Yes, arrays are considered reclaimable once the UniqueArray goes out of scope. Thinking about this more... there is perhaps an awkward condition here where you force r to WHNF to initiate the computation, but if that computation takes a long time, and you discard r without ever forcing it to normal form, and a GC runs, then I think you can fall into a case where the arrays are reclaimed even though the computation may still be executing... I guess that motivates that (1) should be synchronous w.r.t. kernel execution.

Yes, I think the code you posted at the end should be safe; r is the lynchpin which keeps everything it depends on alive.
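
The idea of tying an external free to the lifetime of a host-side object can be sketched with base's ForeignPtr machinery. This is only an analogy and makes several assumptions: host stands in for the host array data, external for fp, the finalizer body for myFree, and finalizeForeignPtr is used in place of waiting for a real GC so that the outcome is deterministic. Accelerate's own addFinalizer and UniqueArray are not used here.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import qualified Foreign.Concurrent as FC
import Foreign.ForeignPtr
  (ForeignPtr, finalizeForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (poke)

-- Returns whether the external buffer had been freed
-- (before, after) the host object was finalized.
demo :: IO (Bool, Bool)
demo = do
  external <- mallocBytes 128 :: IO (Ptr Word8)            -- stand-in for fp
  freed    <- newIORef False
  host     <- mallocForeignPtrBytes 128 :: IO (ForeignPtr Word8)

  -- Stand-in for myFree: release the external buffer when host dies.
  FC.addForeignPtrFinalizer host (free external >> writeIORef freed True)

  -- While host is reachable, the external buffer remains valid.
  withForeignPtr host $ \p -> poke p 0
  poke external 42
  before <- readIORef freed

  -- Run the finalizers now instead of waiting for the collector.
  finalizeForeignPtr host
  after <- readIORef freed
  pure (before, after)

main :: IO ()
main = demo >>= print  -- prints (False,True)
```

With a real GC the finalizer runs at some unspecified point after the host object becomes unreachable, so this pattern only bounds when the free cannot happen, not when it will.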

@tmcdonell
Member

A further comment... I think v1.2 does implement (1) above as being synchronous w.r.t. kernel execution. I think this problem exists only in the development version, where a lot of this code got rewritten (mostly for the benefit of the CPU backend).

@tmcdonell
Member

I think this can be closed... feel free to reopen if you have further questions
