Refactor hip kl #2624

Merged
amd-ruitang3 merged 10 commits into main from refactor_hip_kl on Apr 8, 2026

@amd-ruitang3
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@amd-ruitang3 amd-ruitang3 requested review from a team and Copilot April 6, 2026 05:55
@amd-ruitang3 amd-ruitang3 added the ci:atom and ci:multi-gpu (Trigger multi-GPU op tests on PR) labels Apr 6, 2026
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-355 | Run Triton tests on MI355 in addition to MI325 |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or gh pr edit 2624 --add-label <label>

Contributor

Copilot AI left a comment

Pull request overview

Refactors parts of the HIP tensor utility layer by improving device correctness around memory ops and updating AiterTensor::empty_like() to preserve the source tensor’s strided layout when allocating storage. It also removes a Python-side HIP stream sync helper previously called during Torch→pybind tensor conversion.

Changes:

  • Update AiterTensor::empty_like() to preserve shape/strides and allocate storage sized for the full positive-stride span.
  • Add HipDeviceGuard around device-sensitive operations (hipMemset*, hipFree) and add a max-dims check in init_shape().
  • Remove _sync_hip_stream() and its invocation from torch_to_aiter_pybind().
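To make the first two bullets concrete, here is a minimal sketch of how a stride-preserving allocation size could be computed, together with a rank bound check. It is not the code from csrc/include/aiter_tensor.h: the names `storage_span_elems` and `kMaxDims`, the limit of 8 dimensions, and the choice to reject negative strides are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical rank limit; the actual bound checked by init_shape() is not
// shown on this page.
constexpr std::size_t kMaxDims = 8;

// Returns how many elements the backing storage must hold so that every index
// reachable through (shape, strides) stays in bounds. This is the "span" of a
// strided layout: 1 + sum over dims of (size - 1) * stride.
// Assumption: strides are non-negative; negative strides are rejected here.
inline std::int64_t storage_span_elems(const std::vector<std::int64_t>& shape,
                                       const std::vector<std::int64_t>& strides) {
    if (shape.size() != strides.size()) {
        throw std::invalid_argument("shape and strides must have the same rank");
    }
    if (shape.size() > kMaxDims) {
        throw std::invalid_argument("tensor rank exceeds kMaxDims");
    }
    std::int64_t span = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (shape[i] < 0 || strides[i] < 0) {
            throw std::invalid_argument("negative size or stride");
        }
        if (shape[i] == 0) {
            return 0;  // an empty tensor needs no storage
        }
        // A stride of 0 (broadcast dim) contributes nothing to the span.
        span += (shape[i] - 1) * strides[i];
    }
    return span;
}
```

For a contiguous 2x3 tensor (strides 3, 1) the span equals the element count, 6. For the same shape with rows padded to a stride of 4, the span is 7, so an allocation sized by element count alone would be one element short if the strides were kept. That gap is the reason a stride-preserving empty_like() has to size storage from the span rather than from the number of elements.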

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| csrc/include/aiter_tensor.h | Adjusts allocation semantics for empty_like(), adds device guarding for memset/free, and adds a dims bound check. |
| aiter/utility/dtypes.py | Removes implicit HIP stream synchronization from Torch→pybind tensor conversion. |
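The device guarding mentioned for csrc/include/aiter_tensor.h follows the usual RAII pattern: switch to the tensor's device, run the device-sensitive call, and restore the previous device on scope exit. The sketch below illustrates that pattern only and is not the PR's HipDeviceGuard. So that it compiles without ROCm, `fake_hip::get_device` and `fake_hip::set_device` are hypothetical stand-ins for `hipGetDevice` and `hipSetDevice`, and `DeviceGuardSketch` and `free_on_device` are likewise invented names.

```cpp
#include <cassert>

// Stand-ins for the HIP runtime so the sketch builds without ROCm.
// In real code these would be hipGetDevice(&dev) and hipSetDevice(dev).
namespace fake_hip {
inline int& current_device() {
    static int dev = 0;  // pretend "current device" for the calling thread
    return dev;
}
inline int get_device() { return current_device(); }
inline void set_device(int dev) { current_device() = dev; }
}  // namespace fake_hip

// RAII guard: makes `target` the current device for the guard's lifetime and
// restores the previously active device in the destructor.
class DeviceGuardSketch {
public:
    explicit DeviceGuardSketch(int target) : prev_(fake_hip::get_device()) {
        if (prev_ != target) {
            fake_hip::set_device(target);
        }
    }
    ~DeviceGuardSketch() {
        if (fake_hip::get_device() != prev_) {
            fake_hip::set_device(prev_);
        }
    }
    DeviceGuardSketch(const DeviceGuardSketch&) = delete;
    DeviceGuardSketch& operator=(const DeviceGuardSketch&) = delete;

private:
    int prev_;  // device that was current when the guard was created
};

// Example of a device-sensitive operation: returns the device that is active
// at the point where a call such as hipFree or hipMemset would be issued.
inline int free_on_device(int tensor_device) {
    DeviceGuardSketch guard(tensor_device);
    return fake_hip::get_device();
}
```

With the current device set to 0, `free_on_device(2)` runs its body on device 2 and leaves device 0 active afterwards, so a tensor owned by another GPU can be released without disturbing the caller's device selection.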

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread csrc/include/aiter_tensor.h Outdated
Comment thread csrc/include/aiter_tensor.h
@amd-ruitang3 amd-ruitang3 merged commit 795c281 into main Apr 8, 2026
25 checks passed
@amd-ruitang3 amd-ruitang3 deleted the refactor_hip_kl branch April 8, 2026 02:48
yzhou103 pushed a commit that referenced this pull request Apr 8, 2026
* refactor hip kernel

* optimize aiter tensor

* update

* update

* update

* update

* update

* update

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Labels

ci:atom, ci:multi-gpu (Trigger multi-GPU op tests on PR)

3 participants