
PyTorch Kernels CI Tests #265

Merged
sreeram-11 merged 12 commits into main from sreeram/pytorch-kernels-tests on May 16, 2026

@sreeram-11
Collaborator

sreeram-11 commented May 13, 2026

User Actions

Windows

  • Visual Studio 2022 C++ Build Tools
  • AMD Software: Adrenalin Edition / ROCm-compatible AMD GPU driver stack

Linux

  • The runner user must have GPU access, i.e. belong to the render and video groups:
    • sudo usermod -aG render,video $USER
    • sudo reboot
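
The group requirement above can be checked before any GPU test runs. The following is a minimal sketch, assuming the two groups named in the usermod command are the only ones needed; the helper names `missing_gpu_groups` and `current_group_names` are hypothetical and not part of this PR.

```python
import os

# Groups taken from the usermod command above.
REQUIRED_GROUPS = ("render", "video")


def missing_gpu_groups(current_groups, required=REQUIRED_GROUPS):
    """Return the required groups that are absent from current_groups."""
    present = set(current_groups)
    return [name for name in required if name not in present]


def current_group_names():
    """Names of the groups the running process belongs to (Linux only)."""
    import grp  # POSIX-only module, imported lazily so the file loads elsewhere

    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid without a group database entry is skipped.
            pass
    return names
```

On a Linux runner, `missing_gpu_groups(current_group_names())` returning an empty list suggests the group setup took effect for the current session.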

What the tests validate

  1. ROCm + PyTorch sanity test
    • A virtual environment can be created and activated
    • Required dependencies can be installed (ROCm nightly build, PyTorch, Python packages)
    • rocm-sdk init succeeds
    • ROCm compiler/runtime paths can be configured
    • ROCm tools are available:
      • Windows: hipcc, hipinfo
      • Linux: hipcc, rocminfo
    • PyTorch is a ROCm/HIP build: torch.version.hip is not None
    • PyTorch can see the AMD GPU: torch.cuda.is_available() returns True
    • PyTorch can identify the GPU: torch.cuda.get_device_name(0) returns the device name
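
The tool and PyTorch checks in this list can be sketched roughly as follows. This is an illustration, not the test code from this PR: the function names are hypothetical, and the torch module is passed in as an argument so the logic can be exercised without a GPU.

```python
import shutil
import sys


def required_tools(platform=sys.platform):
    """ROCm command-line tools expected on each platform, per the list above."""
    if platform.startswith("win"):
        return ["hipcc", "hipinfo"]
    return ["hipcc", "rocminfo"]


def missing_tools(platform=sys.platform, which=shutil.which):
    """Return the expected tools that cannot be found on PATH."""
    return [tool for tool in required_tools(platform) if which(tool) is None]


def check_torch(torch):
    """Run the three PyTorch checks and return the reported GPU name."""
    if torch.version.hip is None:
        raise RuntimeError("PyTorch is not a ROCm/HIP build")
    if not torch.cuda.is_available():
        raise RuntimeError("PyTorch cannot see a GPU")
    return torch.cuda.get_device_name(0)
```

On a configured runner this would be called as `import torch; print(check_torch(torch))`.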

On Windows, the test also handles a HIPRTC DLL naming mismatch seen with current ROCm/PyTorch nightlies. Some PyTorch builds request hiprtc0701.dll, while the installed ROCm SDK may provide hiprtc07013.dll. The test creates a compatibility copy when needed so that the JIT path can load HIPRTC correctly.
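
A compatibility copy of this kind could look like the sketch below. The two DLL names come from the description above; the function name `ensure_hiprtc_alias` and the `bin_dir` parameter are hypothetical, and the directory that actually holds the DLL depends on how the ROCm SDK is installed.

```python
import shutil
from pathlib import Path


def ensure_hiprtc_alias(bin_dir, requested="hiprtc0701.dll",
                        provided="hiprtc07013.dll"):
    """Copy the provided DLL to the requested name if only the former exists.

    Returns True when a copy was made, False when nothing needed doing
    (the requested DLL is already there, or the provided one is missing).
    """
    bin_dir = Path(bin_dir)
    target = bin_dir / requested
    source = bin_dir / provided
    if target.exists() or not source.exists():
        return False
    shutil.copy2(source, target)
    return True
```

Copying rather than renaming leaves the original file in place for anything that loads it by its real name.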

  2. Vector Addition JIT test (lightweight version of the vector addition example)

    • A small CUDA/HIP kernel can be written as a Python string
    • A tensor of ones can be created on the GPU
    • PyTorch can compile the kernel at runtime using torch.cuda._compile_kernel
    • The compiled kernel (a custom add_one kernel) can be launched on the AMD GPU
    • The output is numerically correct (all values become 2.0)
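
A minimal sketch of such a JIT test is shown below. It is not the code from this PR: the kernel body, the helper names and the block size of 256 are assumptions. torch.cuda._compile_kernel is a private PyTorch API, and the call shape used here (source string and kernel name in, a callable taking grid, block and args out) reflects recent PyTorch builds and may change.

```python
# Kernel source held as a Python string; the body is an assumed add_one.
KERNEL_SOURCE = r'''
extern "C" __global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}
'''


def launch_config(n, threads=256):
    """1D grid and block sizes that cover n elements."""
    blocks = (n + threads - 1) // threads
    return (blocks, 1, 1), (threads, 1, 1)


def run_add_one(n=1024):
    """Compile and launch add_one on the GPU; True if every value is 2.0."""
    import torch  # imported lazily: needs a ROCm build of PyTorch and a GPU

    kernel = torch.cuda._compile_kernel(KERNEL_SOURCE, "add_one")
    x = torch.ones(n, device="cuda", dtype=torch.float32)
    grid, block = launch_config(n)
    kernel(grid=grid, block=block, args=[x, n])
    torch.cuda.synchronize()
    return bool(torch.all(x == 2.0))
```

The bounds check inside the kernel matters because the grid is rounded up, so the last block may contain threads past the end of the tensor.
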
  3. Vector Addition C++ Extension test

    • The existing assets/Vector_Addition/setup.py can build the extension
    • The existing add_one_kernel.cu can be compiled through PyTorch’s extension build path
    • The generated Python extension module can be imported
    • A GPU tensor of ones can be created
    • The extension can launch the GPU kernel
    • The output is numerically correct (all values become 2.0)

On Windows, the test activates the Visual Studio C++ build environment using vcvars64.bat before building the extension.
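
The build step, with and without the Visual Studio environment, might be driven as in the sketch below. The `build_ext --inplace` invocation is an assumption about how setup.py is called, the location of vcvars64.bat depends on the Visual Studio installation and is passed in by the caller, and both function names are hypothetical.

```python
import subprocess
import sys


def build_command(python_exe=sys.executable, vcvars_path=None):
    """Shell command line that builds the extension in place.

    When vcvars_path is given (Windows), vcvars64.bat is called first so
    that the MSVC toolchain is on PATH for the build that follows.
    """
    base = f'"{python_exe}" setup.py build_ext --inplace'
    if vcvars_path is None:
        return base
    return f'call "{vcvars_path}" && {base}'


def build_extension(source_dir, vcvars_path=None):
    """Run the build inside source_dir, raising if the build fails."""
    subprocess.run(build_command(vcvars_path=vcvars_path),
                   shell=True, cwd=source_dir, check=True)
```

For example, `build_extension("assets/Vector_Addition")` would be the Linux form, with `vcvars_path` supplied only on Windows.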

  4. Matrix Multiplication JIT test

    • A 2D matrix multiplication kernel can be written as a Python string
    • PyTorch can compile the kernel at runtime using torch.cuda._compile_kernel
    • The compiled kernel can be launched with a 2D grid and 2D block
    • The output matrix is correct compared to torch.mm
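
The 2D launch and the comparison against torch.mm can be sketched as follows. The kernel body, the 16x16 tile size, the matrix shapes and the tolerances are assumptions made for illustration, and torch.cuda._compile_kernel is a private API whose call shape may differ between PyTorch builds.

```python
# Naive row-major matmul: c[m x n] = a[m x k] @ b[k x n]; assumed kernel body.
MATMUL_SOURCE = r'''
extern "C" __global__ void matmul(const float* a, const float* b, float* c,
                                  int m, int n, int k) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m && col < n) {
        float acc = 0.0f;
        for (int i = 0; i < k; ++i) {
            acc += a[row * k + i] * b[i * n + col];
        }
        c[row * n + col] = acc;
    }
}
'''


def launch_config_2d(m, n, tile=16):
    """2D grid and block: x covers the n columns, y covers the m rows."""
    grid = ((n + tile - 1) // tile, (m + tile - 1) // tile, 1)
    return grid, (tile, tile, 1)


def run_matmul(m=64, n=48, k=32):
    """Compile, launch and compare against torch.mm; True on a match."""
    import torch  # imported lazily: needs a ROCm build of PyTorch and a GPU

    kernel = torch.cuda._compile_kernel(MATMUL_SOURCE, "matmul")
    a = torch.randn(m, k, device="cuda", dtype=torch.float32)
    b = torch.randn(k, n, device="cuda", dtype=torch.float32)
    c = torch.empty(m, n, device="cuda", dtype=torch.float32)
    grid, block = launch_config_2d(m, n)
    kernel(grid=grid, block=block, args=[a, b, c, m, n, k])
    torch.cuda.synchronize()
    return torch.allclose(c, torch.mm(a, b), rtol=1e-4, atol=1e-4)
```

A tolerance-based comparison is used because the kernel and torch.mm may accumulate the products in a different order.
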
  5. Matrix Multiplication C++ Extension test

    • The existing assets/Matrix_Multiplication/setup.py can build the extension
    • The existing matmul_kernel.cu can be compiled through PyTorch’s extension build path
    • The generated Python extension module can be imported
    • Small GPU matrices can be created
    • The extension can launch the GPU matrix multiplication kernel
    • The output is correct compared to torch.mm

On Windows, the test activates the Visual Studio C++ build environment using vcvars64.bat before building the extension.

@adamlam2-amd
Collaborator

adamlam2-amd commented May 14, 2026

Please note that I made changes to the pytorch kernels playbook. Nothing changed much functionally, though.

Collaborator

@danielholanda danielholanda left a comment

Looks great. This is definitely a power user playbook. Great tests.

@sreeram-11
Collaborator Author

sreeram-11 commented May 15, 2026

@adamlam2-amd

Please note that I made changes to the pytorch kernels playbook.

Can you please point me to the PR that has these changes?

sreeram-11 merged commit ff6fb7f into main on May 16, 2026
7 checks passed
3 participants