@Ziminli Ziminli commented Apr 8, 2025

  • Refactor the general elementwise framework first proposed in "issue/48: rope operator - CPU" (#49).
  • Add the CUDA implementation of the general elementwise framework.
  • Refactor SwiGLU to use the general elementwise framework.
  • Add broadcasting test cases for swiglu.
  • Update the Python test for swiglu so that it handles the broadcasting test cases correctly.
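
To illustrate what a general elementwise framework with broadcasting support can look like, here is a minimal CPU-only sketch. It is not the code from this PR: `TensorView`, `offsetOf`, `SwiGLUOp`, and `elementwise` are hypothetical names, the output is assumed to be contiguous, only `float` is handled, and the SwiGLU formula `up * gate * sigmoid(gate)` and its argument order are assumptions. Broadcasting is expressed by giving an input a stride of 0 along the broadcast dimension.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical input view: a data pointer plus per-dimension strides in elements.
// A stride of 0 marks a dimension along which the input is broadcast.
struct TensorView {
    const float *data;
    std::vector<std::size_t> strides;
};

// Unravel a flat output index against the output shape and apply the
// given strides to obtain the element offset inside one input.
inline std::size_t offsetOf(std::size_t flat,
                            const std::vector<std::size_t> &shape,
                            const std::vector<std::size_t> &strides) {
    std::size_t offset = 0;
    for (std::size_t d = shape.size(); d-- > 0;) {
        offset += (flat % shape[d]) * strides[d];
        flat /= shape[d];
    }
    return offset;
}

// Assumed SwiGLU definition: up * gate * sigmoid(gate).
struct SwiGLUOp {
    static constexpr std::size_t num_inputs = 2;
    float operator()(float up, float gate) const {
        return up * gate / (1.0f + std::exp(-gate));
    }
};

// Expand the N inputs into the N arguments of the operator.
template <typename Op, std::size_t N, std::size_t... Is>
void elementwiseImpl(Op op, float *out,
                     const std::vector<std::size_t> &shape,
                     const std::array<TensorView, N> &inputs,
                     std::index_sequence<Is...>) {
    std::size_t total = 1;
    for (std::size_t extent : shape) {
        total *= extent;
    }
    for (std::size_t i = 0; i < total; ++i) {
        out[i] = op(inputs[Is].data[offsetOf(i, shape, inputs[Is].strides)]...);
    }
}

// Apply `op` to every output element; `out` is assumed contiguous.
template <typename Op, std::size_t N>
void elementwise(Op op, float *out,
                 const std::vector<std::size_t> &shape,
                 const std::array<TensorView, N> &inputs) {
    static_assert(N == Op::num_inputs, "operator arity must match input count");
    for (const TensorView &in : inputs) {
        assert(in.strides.size() == shape.size());
    }
    elementwiseImpl(op, out, shape, inputs, std::make_index_sequence<N>{});
}
```

With an output of shape 2x3, an `up` input with strides `{3, 1}`, and a `gate` input of shape 3 passed with strides `{0, 1}`, the same three gate values are reused for both rows. A new operator only needs to supply `num_inputs` and `operator()`; the indexing loop is shared.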

Screenshot of the swiglu CPU test cases passing:
image

Screenshot of the swiglu CUDA test cases passing:
image

Ziminli added 8 commits April 14, 2025 15:09
…on, refactor swiglu using the generic elementwise framework
…, add two broadcast testcases, correct elementwise cpu mix-precision implementation
…entwise calculate and calculateImpl to return infiniStatus_t, add CHECK_CUDA to cuda function calls
enable_if, remove std::move() in elementwise_cpu.h, add <array> inclusion
…adjust comment structure and template variable order
@Ziminli Ziminli force-pushed the issue/127_add_general_cuda_elementwise branch from a9ffc27 to 9cc0c41 Compare April 14, 2025 07:47
…nge/correct kernel logic when all inputs have the same dtype
Ziminli commented Apr 14, 2025

Refactored the framework to use a workspace.
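
The comment does not say what the workspace holds, so the following is only a sketch of the common caller-provided scratch-buffer pattern. It assumes the workspace stores a table of input pointers. `Status`, `ElementwiseDescriptor`, `createDescriptor`, and `calculate` are hypothetical stand-ins (the real framework returns `infiniStatus_t`), and the elementwise operation here is a plain sum chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical status codes standing in for infiniStatus_t.
enum class Status { Success, InsufficientWorkspace };

// Hypothetical descriptor: records how much scratch memory one call needs.
struct ElementwiseDescriptor {
    std::size_t num_inputs;
    std::size_t workspace_size;
};

// Assumption: the scratch space holds one data pointer per input.
inline ElementwiseDescriptor createDescriptor(std::size_t num_inputs) {
    return {num_inputs, num_inputs * sizeof(const float *)};
}

// Sum all inputs elementwise, staging the pointer table in the caller's buffer.
// A GPU backend would copy such a table to device memory instead.
inline Status calculate(const ElementwiseDescriptor &desc,
                        void *workspace, std::size_t workspace_size,
                        float *out,
                        const std::vector<const float *> &inputs,
                        std::size_t n) {
    assert(inputs.size() == desc.num_inputs);
    if (workspace_size < desc.workspace_size) {
        return Status::InsufficientWorkspace;
    }
    if (desc.workspace_size > 0) {
        std::memcpy(workspace, inputs.data(), desc.workspace_size);
    }
    const float *const *table = static_cast<const float *const *>(workspace);
    for (std::size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < desc.num_inputs; ++k) {
            acc += table[k][i];
        }
        out[i] = acc;
    }
    return Status::Success;
}
```

The caller queries `workspace_size` from the descriptor, allocates the buffer once, and passes it to every `calculate` call, so the operator itself never allocates memory.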

All CPU test cases pass (only a portion is shown):
image

All CUDA test cases pass (only a portion is shown):
image

Ziminli added 2 commits April 15, 2025 14:54
…cros for indirecting variable names, change DeviceImpl to use Result for the return type of the create function, change CEIL_DIV
…for correct alignment and change the reference name of the Opaque struct to Opaque instead of struct Opaque
@PanZezhong1725 PanZezhong1725 merged commit 95fd5c1 into main Apr 15, 2025
8 checks passed
@PanZezhong1725 PanZezhong1725 deleted the issue/127_add_general_cuda_elementwise branch April 15, 2025 10:05
Development

Successfully merging this pull request may close these issues.

  • [DEV] Refactor SwiGLU CUDA Implementation
  • [DEV] General CUDA Elementwise Infrastructure
  • [DEV] General CPU Elementwise Infrastructure