Userbuffer epic #367

Merged
alextmagro merged 18 commits into dev from userbuffer_epic
Mar 31, 2026

Conversation

@alextmagro (Contributor)

This is the userbuffer_epic branch, to be merged only once all epic tasks have been completed. PRs for epic tasks will be onto this branch.


# This file was modified for portability to AMDGPU
# Copyright (c) 2025-2026, Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2022-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Collaborator
Was this file sharing a lot of code with examples/pytorch/comm_gemm_overlap/te_layer_with_overlap.py? Is it possible to consolidate those two files?

allgather_handle, barrier_handle, tp_size, num_max_streams, comm_cga_size,
gemm_priority, comm_priority, num_comm_sm, set_sm_margin, use_ce,
atomic_gemm) {
initialize(buffer_shape, buffer_dtype, comm_type, aggregate);
Collaborator

Same question here about the motivation for calling this initialize function in the constructor.

@alextmagro force-pushed the userbuffer_epic branch 2 times, most recently from a81c29f to 2ef5743 on March 17, 2026 03:17
@alextmagro (Contributor, Author)

L3 CI
https://github.com/ROCm/TransformerEngine/actions/runs/23252049008

-- missing the distributed/test_cast_master_weights_to_fp8.py hotfix that is now in dev.

@alextmagro requested a review from ipanfilo March 26, 2026 22:01
@ipanfilo (Collaborator) left a comment

Please also review the newly enabled code for FP8 FNUZ/OCP data type selection: uses of torch.float8_e4m3fn should be replaced with get_torch_float8_e4m3_type(), and likewise for e5m2.
run_comm_gemm_overlap.py is one such module.
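For context on the review comment above: the FNUZ/OCP distinction exists because some AMD GPUs implement the FNUZ FP8 encodings while other hardware uses the OCP encodings, so a helper like get_torch_float8_e4m3_type() must pick the right torch dtype at runtime. The sketch below illustrates arch-based selection only; the helper names, the arch set, and the selection rule are assumptions for illustration, not Transformer Engine's actual implementation:

```python
# Hedged sketch of FNUZ-vs-OCP FP8 dtype selection by GPU architecture.
# Assumption (illustrative): gfx94x-class GPUs use the FNUZ encodings,
# everything else uses the OCP ones. Strings mirror torch dtype attribute
# names so real code could resolve them via getattr(torch, name).
FNUZ_ARCHS = {"gfx940", "gfx941", "gfx942"}  # assumed arch list

def fp8_e4m3_dtype_name(arch: str) -> str:
    """Return the torch FP8 e4m3 dtype attribute name for this arch."""
    return "float8_e4m3fnuz" if arch in FNUZ_ARCHS else "float8_e4m3fn"

def fp8_e5m2_dtype_name(arch: str) -> str:
    """Return the torch FP8 e5m2 dtype attribute name for this arch."""
    return "float8_e5m2fnuz" if arch in FNUZ_ARCHS else "float8_e5m2"
```

Call sites would then use the helper instead of hard-coding torch.float8_e4m3fn, e.g. `getattr(torch, fp8_e4m3_dtype_name(arch))`, which is the kind of substitution the review asks for.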

@alextmagro (Contributor, Author) left a comment

Merge conflicts addressed, rerunning L3 just in case.

@alextmagro requested a review from ipanfilo March 27, 2026 14:36
@alextmagro added the ci-level 3 (CI test level 3) label Mar 27, 2026
Remove TODO regarding userbuffers
@alextmagro requested a review from ipanfilo March 30, 2026 19:04
@alextmagro merged commit 3dd8af9 into dev Mar 31, 2026
3 checks passed
@alextmagro deleted the userbuffer_epic branch March 31, 2026 05:19
@wangye805 restored the userbuffer_epic branch April 1, 2026 19:16