fixing device only code that get called in the host side #25
LGTM!
#if defined(CUTLASS_ENABLE_SYCL)
#if defined(__CUDA_ARCH__)
return threadIdx.x;
#elif defined(__SYCL_Device_ONLY__)
I believe DEVICE is miscapitalized in __SYCL_Device_ONLY__; it should be __SYCL_DEVICE_ONLY__. I will open a PR to fix this 👍🏻
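To illustrate why the capitalization matters: preprocessor macro names are case-sensitive, so a guard spelled with the wrong case is silently never taken. The sketch below uses a hypothetical macro, DEMO_DEVICE_ONLY, as a stand-in for the compiler-provided __SYCL_DEVICE_ONLY__ so that it builds with any host C++ compiler. The function names are illustrative and not part of the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for a compiler-provided device macro
// such as __SYCL_DEVICE_ONLY__ (an assumption for this demo only).
#define DEMO_DEVICE_ONLY 1

// Guard spelled with the wrong case: the preprocessor treats
// DEMO_Device_ONLY as a different, undefined name.
constexpr int guard_with_typo() {
#if defined(DEMO_Device_ONLY)
  return 1;  // never compiled
#else
  return 0;  // silently falls through to this branch
#endif
}

// Guard spelled correctly: the branch is compiled as intended.
constexpr int guard_correct() {
#if defined(DEMO_DEVICE_ONLY)
  return 1;
#else
  return 0;
#endif
}

// Both facts hold at compile time.
static_assert(guard_with_typo() == 0, "misspelled guard is never taken");
static_assert(guard_correct() == 1, "correctly spelled guard is taken");
```

No diagnostic is emitted for the misspelled guard, which is presumably why the typo went unnoticed until review.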
#if defined(CUTLASS_ENABLE_SYCL)
#if defined(__CUDA_ARCH__)
return hfma2(a, b, c);
#elif defined(__SYCL_Device_ONLY__)
// TODO: Add SYCL equivalent function
@aacostadiaz, was there a reason sycl::fma was not used here? According to the spec, it also accepts sycl::vec of float, double, and half.
PS: I am assuming that "Add SYCL equivalent function" means there is currently no way to support this via SYCL.
We can use sycl::fma here. The TODO comment means we should work out the equivalent later, once we can test it properly.
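A minimal sketch of what such a guarded dispatch might look like, assuming scalar float operands rather than the half2 values in the snippet under review. The helper name fused_multiply_add is hypothetical, and the SYCL branch is untested here, in line with the comment above. Without CUTLASS_ENABLE_SYCL defined, only the host fallback is compiled.

```cpp
#include <cassert>
#include <cmath>

#if defined(CUTLASS_ENABLE_SYCL)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper (not from the PR): computes a * b + c using the
// fused multiply-add available on whichever target is being compiled.
inline float fused_multiply_add(float a, float b, float c) {
#if defined(__CUDA_ARCH__)
  // CUDA device pass: single-precision FMA from the CUDA math library.
  return fmaf(a, b, c);
#elif defined(CUTLASS_ENABLE_SYCL) && defined(__SYCL_DEVICE_ONLY__)
  // SYCL device pass: sycl::fma as suggested in the review thread.
  return sycl::fma(a, b, c);
#else
  // Host pass: standard library fallback, so host-side callers still build.
  return std::fma(a, b, c);
#endif
}
```

On the host, fused_multiply_add(2.0f, 3.0f, 4.0f) takes the std::fma branch. The explicit host fallback is the point of the sketch: device-only intrinsics stay behind device macros, and host code that reaches the function still compiles.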
* Migrate cute components to SYCL (#19) * Migrate Cute components to SYCL * Add CMake configuration (#20) * Add cmake configuration * Update examples/cute/tutorial/CMakeLists.txt Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> --------- Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Update README-sycl.md (#22) * Update README-sycl.md Fixing CUDA version * Add XE MMA/copy atom * Update to 3.5 API * fixing device only code that get called in the host side (#25) * Fix GPU clock (#21) * Apply suggestions from code review Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Fix typo in Macro (#28) Fix typo in Macro Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Cosmetic --------- Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Applying the comments --------- Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com> * Revert "Updating README-sycl.md to capture the 3.5 modifications (#16)" (#17) This reverts commit a726bd3. * fix typo in macro --------- Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com> --------- Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com> Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com> Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
…playsoftware#29) * Migrate cute components to SYCL (codeplaysoftware#19) * Migrate Cute components to SYCL * Add CMake configuration (codeplaysoftware#20) * Add cmake configuration * Update examples/cute/tutorial/CMakeLists.txt Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> --------- Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Update README-sycl.md (codeplaysoftware#22) * Update README-sycl.md Fixing CUDA version * Add XE MMA/copy atom * Update to 3.5 API * fixing device only code that get called in the host side (codeplaysoftware#25) * Fix GPU clock (codeplaysoftware#21) * Apply suggestions from code review Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Fix typo in Macro (codeplaysoftware#28) Fix typo in Macro Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Cosmetic --------- Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Applying the comments --------- Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com> * Revert "Updating README-sycl.md to capture the 3.5 modifications (codeplaysoftware#16)" (codeplaysoftware#17) This reverts commit a726bd3. 
* fix typo in macro --------- Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com> --------- Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com> Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com> Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> Add pvc example (codeplaysoftware#26) * Migrate cute components to SYCL (codeplaysoftware#19) * Migrate Cute components to SYCL * Add CMake configuration (codeplaysoftware#20) * Add cmake configuration * Update examples/cute/tutorial/CMakeLists.txt Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> --------- Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Update README-sycl.md (codeplaysoftware#22) * Update README-sycl.md Fixing CUDA version * Add XE MMA/copy atom * Update to 3.5 API * Add example * Update include/cute/util/sycl_vec.hpp Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Update include/cute/atom/mma_traits_xe.hpp Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Update include/cute/atom/copy_traits_xe.hpp Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Update include/cute/atom/mma_atom.hpp Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> * Update include/cute/arch/mma_xe.hpp Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> --------- Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com> Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com> Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com> Co-authored-by: Roland Schulz <roland.schulz@intel.com> add prefetch, mkl validation, and group partition misc refine Make atom type a make_2d_copy argument Use cute::bfloat16_t add KK=2 enable btile prefetch, got 250Tflops (codeplaysoftware#4) direct big tile, got 280Tflops remove unused code and add more print (codeplaysoftware#7) enable unaligned shape like 4098 (codeplaysoftware#9) add barrier and wait enable big tile modify some datatype
No description provided.