custom allreduce cuda kernel #20703

wangyems · 2024-05-16T19:58:02Z

Description

Conditionally route to a custom AllReduce kernel when the buffer size and the number of GPUs meet certain requirements. Otherwise, keep using NCCL's AllReduce.
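
To make the routing rule concrete, here is a minimal sketch of such a dispatch check. The threshold values and every name in it (`kMaxCustomAllReduceBytes`, `kMaxCustomAllReduceRanks`, `SelectAllReduceBackend`) are assumptions made for illustration; the description does not state the actual limits or how the PR implements the check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical limits: the PR description does not give the real thresholds.
constexpr std::size_t kMaxCustomAllReduceBytes = 8 * 1024 * 1024;  // assumed
constexpr int kMaxCustomAllReduceRanks = 8;                        // assumed

enum class AllReduceBackend { kCustomKernel, kNccl };

// Choose a backend from the buffer size and the number of participating GPUs.
// Any case that does not satisfy both conditions falls back to NCCL.
inline AllReduceBackend SelectAllReduceBackend(std::size_t buffer_bytes, int world_size) {
  assert(world_size >= 1);
  const bool size_ok = buffer_bytes > 0 && buffer_bytes <= kMaxCustomAllReduceBytes;
  const bool ranks_ok = world_size >= 2 && world_size <= kMaxCustomAllReduceRanks;
  return (size_ok && ranks_ok) ? AllReduceBackend::kCustomKernel
                               : AllReduceBackend::kNccl;
}
```

Keeping NCCL as the default branch means an unexpected size or GPU count never reaches the custom path.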

Motivation and Context

tianleiwu · 2024-06-13T17:54:55Z

If possible, try to use the allocator in the CUDA EP instead of cudaMalloc.

wangyems · 2024-06-13T18:10:13Z

If possible, try to use the allocator in the CUDA EP instead of cudaMalloc.

Will give it a try.
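
As a rough illustration of the suggestion above, the sketch below obtains a buffer through an allocator interface and ties its release to a smart pointer, instead of pairing raw allocation and free calls by hand. `DeviceAllocator`, `HostBackedAllocator`, `BufferPtr`, and `MakeBuffer` are all hypothetical names; the host-memory allocator is a stand-in so the example runs without a GPU, and this is not the CUDA EP's actual allocator API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <memory>

// Hypothetical stand-in for an execution-provider allocator interface.
struct DeviceAllocator {
  virtual ~DeviceAllocator() = default;
  virtual void* Alloc(std::size_t bytes) = 0;
  virtual void Free(void* p) = 0;
};

// Host-memory implementation (assumption) so the sketch runs without CUDA.
struct HostBackedAllocator final : DeviceAllocator {
  int live_allocations = 0;  // tracks outstanding buffers for demonstration

  void* Alloc(std::size_t bytes) override {
    void* p = std::malloc(bytes);
    if (p != nullptr) ++live_allocations;
    return p;
  }

  void Free(void* p) override {
    if (p != nullptr) {
      std::free(p);
      --live_allocations;
    }
  }
};

// Owning pointer whose deleter returns the memory to the allocator.
using BufferPtr = std::unique_ptr<void, std::function<void(void*)>>;

// The allocator must outlive the returned buffer, since the deleter refers to it.
inline BufferPtr MakeBuffer(DeviceAllocator& alloc, std::size_t bytes) {
  assert(bytes > 0);
  return BufferPtr(alloc.Alloc(bytes), [&alloc](void* p) { alloc.Free(p); });
}
```

With this shape, the buffer goes back to the allocator when the `BufferPtr` leaves scope, so early returns and error paths do not leak.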

wangyems added 6 commits May 16, 2024 19:57

checkin custom reduce

be8e676

suppress Windows nvcc warnings

49461b1

fix misspell

bace947

update

e7d37cb

rocm

c7471b6

rocm

435f1d3

wangyems marked this pull request as ready for review May 17, 2024 04:22

github-advanced-security bot found potential problems May 17, 2024

onnxruntime/test/python/onnxruntime_test_collective.py (fixed)

wangyems requested a review from a team May 17, 2024 21:32

update

9ba3637

wangyems force-pushed the wangye/custom_reduce branch from ead5e90 to 9ba3637 May 21, 2024 04:20

yuslepukhin reviewed May 22, 2024

onnxruntime/core/providers/js/operators/conv.h (outdated, resolved)

yuslepukhin reviewed May 22, 2024

orttraining/orttraining/core/optimizer/compute_optimizer/padding_elimination.cc (outdated, resolved)

yuslepukhin reviewed May 22, 2024

onnxruntime/contrib_ops/cuda/collective/sharded_moe.h (resolved)

yuslepukhin reviewed May 22, 2024

onnxruntime/contrib_ops/cuda/collective/custom_reduce_impl.h (outdated, resolved)

yuslepukhin requested changes May 22, 2024

yuslepukhin reviewed May 22, 2024

onnxruntime/contrib_ops/cuda/collective/custom_reduce_impl.h (outdated, resolved)

wangyems added 2 commits May 22, 2024 21:33

update

46a1eb7

Merge branch 'main' of github.com:microsoft/onnxruntime into wangye/custom_reduce

ec1c605

tianleiwu reviewed May 23, 2024

onnxruntime/contrib_ops/cuda/collective/custom_reduce_impl.cu (outdated, resolved)

tianleiwu reviewed May 23, 2024

onnxruntime/contrib_ops/cuda/collective/custom_reduce_impl.cu (outdated, resolved)

tianleiwu reviewed May 23, 2024

onnxruntime/contrib_ops/cuda/collective/custom_reduce_impl.cu (outdated, resolved)

review comments

a478025

wangyems requested review from tianleiwu and yuslepukhin May 28, 2024 16:21

wangyems added the release:1.18.1 label May 28, 2024

tianleiwu previously approved these changes May 28, 2024

protect rank_to_experts_start_index_

799f58e

wangyems dismissed tianleiwu’s stale review via 799f58e May 28, 2024 19:13

Merge branch 'main' of github.com:microsoft/onnxruntime into wangye/custom_reduce

7eabb61

review comments

c135d98

wangyems dismissed tianleiwu’s stale review via c135d98 June 4, 2024 20:03

yuslepukhin reviewed Jun 4, 2024

onnxruntime/contrib_ops/cuda/collective/ipc_utils.cc (outdated, resolved)

Your Name added 6 commits June 4, 2024 22:33

move expert start idx sync to ctor

f561263

raii ipc ptrs

14a4827

update

db61ff9

Revert "update"

1660b5a

This reverts commit db61ff9.

Merge branch 'main' of github.com:microsoft/onnxruntime into wangye/custom_reduce

8f773d3

update

24b7769

wangyems requested a review from yuslepukhin June 6, 2024 02:36