Issue:
Currently, fixed-size blocks returned by the memory resource are not stream-aware. When a block is released, it is added back to the pool immediately, even while operations that use the block may still be queued on a stream. The released block could then be handed to a different allocation on a different stream, which can lead to stream-ordering issues.
Expected behavior:
When a set of blocks is allocated, the allocating CUDA stream needs to be tracked, and released blocks should only be added back to the pool once the work on those blocks has completed. We could perhaps use a CUDA event to mark the last write.
Ex: rmm's fixed-size MR follows this pattern. I have a draft PR that enables block allocations there: rapidsai/rmm#2280
Also, the MR doesn't necessarily have to be "host": any stream-aware upstream resource could be used with the fixed-size MR.
Some of the work is certainly overlapping. Maybe we should converge? @bdice @felipeblazing