# This file was modified for portability to AMDGPU
# Copyright (c) 2025-2026, Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2022-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Does this file share a lot of code with examples/pytorch/comm_gemm_overlap/te_layer_with_overlap.py? Is it possible to consolidate the two files?
allgather_handle, barrier_handle, tp_size, num_max_streams, comm_cga_size,
gemm_priority, comm_priority, num_comm_sm, set_sm_margin, use_ce,
atomic_gemm) {
initialize(buffer_shape, buffer_dtype, comm_type, aggregate);
Same question here: what is the motivation for calling this initialize function in the constructor?
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp (outdated, resolved)
transformer_engine/common/comm_gemm_overlap/userbuffers/userbuffers-host.cpp (outdated, resolved)
transformer_engine/common/comm_gemm_overlap/userbuffers/userbuffers.cu (outdated, resolved)
L3 CI: missing the distributed/test_cast_master_weights_to_fp8.py hotfix, which is now in dev.
transformer_engine/common/comm_gemm_overlap/userbuffers/userbuffers.h (outdated, resolved)
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp (outdated, resolved)
transformer_engine/common/include/transformer_engine/comm_gemm_overlap.h (resolved)
examples/pytorch/comm_gemm_overlap/rocm_te_layer_with_overlap.py (outdated, resolved)
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp (outdated, resolved)
transformer_engine/common/include/transformer_engine/comm_gemm_overlap.h (2 threads, outdated, resolved)
ipanfilo left a comment:
Please also review the newly enabled code for FP8 FNUZ/OCP data type selection: uses of torch.float8_e4m3fn should be replaced with get_torch_float8_e4m3_type(), and likewise for e5m2. run_comm_gemm_overlap.py is one such module.
transformer_engine/common/comm_gemm_overlap/userbuffers/userbuffers.cu (4 threads, outdated, resolved)
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp (outdated, resolved)
alextmagro left a comment:
Merge conflicts addressed; rerunning L3 just in case.
transformer_engine/common/comm_gemm_overlap/userbuffers/userbuffers.cu (outdated, resolved)
transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp (outdated, resolved)
Remove TODO regarding userbuffers
This is the userbuffer_epic branch, to be merged only once all epic tasks have been completed. PRs for epic tasks will target this branch.