Analyze overlapped P2P memory transfer and computing #1

andreaskoepf · 2024-02-19T05:22:03Z

Create a Jupyter notebook (.ipynb) that uses PyTorch to analyze running peer-to-peer memory transfers (between two GPUs) in parallel with computation. The dummy computation could be, for example, some larger matmuls in a loop. Create a notebooks folder and place the file there.

The goal is to demonstrate that memory transfer and computation can overlap, at least to some degree.
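As a starting point for the notebook, here is a minimal sketch of such a measurement. It assumes a machine with at least two CUDA GPUs; the function name `time_matmul_with_copy`, the matrix size, the iteration count, and the buffer size are all illustrative choices, not values taken from this issue. The copy is issued either on the default stream (serialized with the matmuls) or on a separate stream of the source device (free to overlap).

```python
import time

import torch


def time_matmul_with_copy(overlap: bool, n: int = 4096, iters: int = 20, mib: int = 512) -> float:
    """Wall-clock seconds for `iters` matmuls on cuda:0 plus one cuda:0 -> cuda:1 copy.

    Hypothetical helper for illustration; all sizes are arbitrary assumptions.
    """
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
    a = torch.randn(n, n, device=dev0)
    b = torch.randn(n, n, device=dev0)
    c = torch.empty(n, n, device=dev0)
    src = torch.empty(mib * 1024 * 1024, dtype=torch.uint8, device=dev0)
    dst = torch.empty_like(src, device=dev1)

    compute_stream = torch.cuda.current_stream(dev0)
    # A separate stream on the source device lets the copy run next to the matmuls.
    copy_stream = torch.cuda.Stream(device=dev0) if overlap else compute_stream

    # Finish all setup work before the timed region starts.
    torch.cuda.synchronize(dev0)
    torch.cuda.synchronize(dev1)

    start = time.perf_counter()
    with torch.cuda.stream(copy_stream):
        dst.copy_(src, non_blocking=True)  # device-to-device (P2P) transfer
    for _ in range(iters):
        torch.matmul(a, b, out=c)  # dummy computation on the compute stream
    torch.cuda.synchronize(dev0)
    torch.cuda.synchronize(dev1)
    return time.perf_counter() - start


if __name__ == "__main__":
    if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
        print("peer access 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))
        time_matmul_with_copy(overlap=True)  # warm-up run
        print("serialized:", time_matmul_with_copy(overlap=False))
        print("overlapped:", time_matmul_with_copy(overlap=True))
    else:
        print("This sketch needs at least two CUDA GPUs.")
```

Whether the overlapped run is actually faster depends on the hardware (peer access, interconnect, copy engines) and on the chosen sizes, so the numbers should be measured rather than assumed.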

Old NVIDIA blog post: How to Overlap Data Transfers in CUDA C/C++
CUDA streams in PyTorch: see CUDA streams
Slides: Data Transfer and CUDA Streams
Slides: CUDA STREAMS BEST PRACTICES AND COMMON PITFALLS

Quote from the ring-attention paper:
"If the computation time exceeds the time required for transferring key-value blocks, this results in no additional communication cost. This overlapping mechanism applies to both forward and backward passes of our approach since the same operations and techniques can be used"
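The condition in the quote can be written as a small idealized cost model. The sketch below assumes perfect overlap (a step costs the maximum of the two times) versus no overlap (a step costs their sum); the function names are hypothetical and real hardware will land somewhere in between.

```python
def step_time(compute_s: float, transfer_s: float, overlapped: bool = True) -> float:
    """Idealized time of one step: max() with perfect overlap, sum without."""
    if overlapped:
        return max(compute_s, transfer_s)
    return compute_s + transfer_s


def communication_overhead(compute_s: float, transfer_s: float, overlapped: bool = True) -> float:
    """Extra time a step costs compared with computation alone."""
    return step_time(compute_s, transfer_s, overlapped) - compute_s


# Compute-bound step: the transfer is fully hidden behind the computation.
print(communication_overhead(3.0, 2.0))                    # 0.0
# Transfer-bound step: only the part exceeding the compute time is exposed.
print(communication_overhead(2.0, 3.0))                    # 1.0
# Without overlap the full transfer time is always added.
print(communication_overhead(3.0, 2.0, overlapped=False))  # 2.0
```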

eternalops · 2024-02-19T05:53:59Z

Link for the first one: https://developer.nvidia.com/blog/how-overlap-data-transfers-cuda-cc/

andreaskoepf · 2024-02-28T21:31:40Z

zhuzilin/ring-flash-attention#9

andreaskoepf closed this as completed Feb 28, 2024
