fixing device only code that get called in the host side #25

Merged

Conversation

mehdi-goli
Collaborator

No description provided.

Collaborator

aacostadiaz left a comment

LGTM!

mehdi-goli merged commit d30f750 into codeplaysoftware:sycl-develop on Apr 11, 2024
AD2605 mentioned this pull request on Apr 16, 2024
#if defined(CUTLASS_ENABLE_SYCL)
#if defined(__CUDA_ARCH__)
return threadIdx.x;
#elif defined(__SYCL_Device_ONLY__)
Collaborator

I believe the capitalization of DEVICE in __SYCL_Device_ONLY__ is wrong; it should be __SYCL_DEVICE_ONLY__.

I will open a PR to fix this 👍🏻
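
To illustrate why this typo matters, here is a minimal sketch in plain C++. The macro DEMO_DEVICE_ONLY and both function names are hypothetical stand-ins (not from this PR); the stand-in macro plays the role of a compiler-provided macro such as __SYCL_DEVICE_ONLY__. Because macro names are case-sensitive, a misspelled guard is never an error: the preprocessor treats it as an undefined name and silently takes the other branch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a compiler-provided macro such as
// __SYCL_DEVICE_ONLY__; defined here only so the demo has a "device" pass.
#define DEMO_DEVICE_ONLY 1

// Guard with the wrong capitalization: DEMO_Device_ONLY is a different,
// undefined name, so the preprocessor selects the fallback branch.
inline const char* branch_with_typo() {
#if defined(DEMO_Device_ONLY)
  return "device";
#else
  return "fallback";
#endif
}

// Guard with the correct spelling: the device branch is selected.
inline const char* branch_with_fix() {
#if defined(DEMO_DEVICE_ONLY)
  return "device";
#else
  return "fallback";
#endif
}
```

With the stand-in macro defined, branch_with_typo() yields "fallback" and branch_with_fix() yields "device", with no diagnostic from the compiler in either case.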

#if defined(CUTLASS_ENABLE_SYCL)
#if defined(__CUDA_ARCH__)
return hfma2(a, b, c);
#elif defined(__SYCL_Device_ONLY__)
// TODO: Add SYCL equivalent function
Collaborator

@aacostadiaz, was there a reason sycl::fma was not used here?
According to the spec, it also accepts sycl::vec of float, double, and half.

PS: I am assuming that "Add SYCL equivalent function" means there is currently no way to support this via SYCL.

Collaborator

We can use sycl::fma here. The TODO comment means we should work out the equivalent later, once we can test it properly.
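
As a rough sketch of what such a replacement could look like (not the code that was merged): the Pair struct and fma2 function below are hypothetical names, and float lanes are used instead of half so the fallback path builds with any C++ compiler. The SYCL branch assumes a SYCL 2020 compiler that defines SYCL_LANGUAGE_VERSION and provides sycl::fma for sycl::float2; it has not been tested against this repository.

```cpp
#include <cassert>
#include <cmath>
#if defined(SYCL_LANGUAGE_VERSION)
#include <sycl/sycl.hpp>
#endif

// Hypothetical two-lane value type standing in for a half2-style vector.
struct Pair {
  float x;
  float y;
};

// Element-wise fused multiply-add: returns a * b + c for each lane.
inline Pair fma2(Pair a, Pair b, Pair c) {
#if defined(SYCL_LANGUAGE_VERSION)
  // Assumption: built with a SYCL 2020 compiler; sycl::fma accepts vectors.
  sycl::float2 r = sycl::fma(sycl::float2(a.x, a.y),
                             sycl::float2(b.x, b.y),
                             sycl::float2(c.x, c.y));
  return Pair{r[0], r[1]};
#else
  // Plain C++ fallback so the sketch also builds without SYCL.
  return Pair{std::fma(a.x, b.x, c.x), std::fma(a.y, b.y, c.y)};
#endif
}
```

For example, fma2(Pair{1, 2}, Pair{3, 4}, Pair{5, 6}) gives lanes 8 and 14. A half-precision version would presumably swap sycl::float2 for the corresponding half vector type, which is the part that still needs proper testing.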

taozha2 pushed a commit that referenced this pull request Apr 17, 2024
* Migrate cute components to SYCL (#19)

* Migrate Cute components to SYCL

* Add CMake configuration (#20)

* Add cmake configuration

* Update examples/cute/tutorial/CMakeLists.txt

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Update README-sycl.md (#22)

* Update README-sycl.md

Fixing CUDA version

* Add XE MMA/copy atom

* Update to 3.5 API

* fixing device only code that get called in the host side (#25)

* Fix GPU clock (#21)

* Apply suggestions from code review

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Fix typo in Macro (#28)

Fix typo in Macro
Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Cosmetic

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Applying the comments

---------

Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>

* Revert "Updating README-sycl.md to capture the 3.5 modifications (#16)" (#17)

This reverts commit a726bd3.

* fix typo in macro

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>

---------

Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>
Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>
Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
jiyang1011 pushed a commit to jiyang1011/cutlass-fork that referenced this pull request Apr 24, 2024
…playsoftware#29)

* Migrate cute components to SYCL (codeplaysoftware#19)

* Migrate Cute components to SYCL

* Add CMake configuration (codeplaysoftware#20)

* Add cmake configuration

* Update examples/cute/tutorial/CMakeLists.txt

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Update README-sycl.md (codeplaysoftware#22)

* Update README-sycl.md

Fixing CUDA version

* Add XE MMA/copy atom

* Update to 3.5 API

* fixing device only code that get called in the host side (codeplaysoftware#25)

* Fix GPU clock (codeplaysoftware#21)

* Apply suggestions from code review

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Fix typo in Macro (codeplaysoftware#28)

Fix typo in Macro
Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Cosmetic

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Applying the comments

---------

Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>

* Revert "Updating README-sycl.md to capture the 3.5 modifications (codeplaysoftware#16)" (codeplaysoftware#17)

This reverts commit a726bd3.

* fix typo in macro

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>

---------

Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>
Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>
Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

Add pvc example (codeplaysoftware#26)

* Migrate cute components to SYCL (codeplaysoftware#19)

* Migrate Cute components to SYCL

* Add CMake configuration (codeplaysoftware#20)

* Add cmake configuration

* Update examples/cute/tutorial/CMakeLists.txt

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Update README-sycl.md (codeplaysoftware#22)

* Update README-sycl.md

Fixing CUDA version

* Add XE MMA/copy atom

* Update to 3.5 API

* Add example

* Update include/cute/util/sycl_vec.hpp

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Update include/cute/atom/mma_traits_xe.hpp

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Update include/cute/atom/copy_traits_xe.hpp

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Update include/cute/atom/mma_atom.hpp

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Update include/cute/arch/mma_xe.hpp

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

---------

Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>
Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>
Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
Co-authored-by: Roland Schulz <roland.schulz@intel.com>

add prefetch, mkl validation, and group partition

misc refine

Make atom type a make_2d_copy argument

Use cute::bfloat16_t

add KK=2

enable btile prefetch, got 250Tflops (codeplaysoftware#4)

direct big tile, got 280Tflops

remove unused code and add more print (codeplaysoftware#7)

enable unaligned shape like 4098 (codeplaysoftware#9)

add barrier and wait

enable big tile

modify some datatype
jiyang1011 pushed a commit to jiyang1011/cutlass-fork that referenced this pull request Apr 29, 2024
jiyang1011 pushed a commit to jiyang1011/cutlass-fork that referenced this pull request Apr 29, 2024
jiyang1011 pushed a commit to jiyang1011/cutlass-fork that referenced this pull request Apr 29, 2024
AD2605 pushed a commit to AD2605/cutlass-fork that referenced this pull request May 24, 2024
aacostadiaz pushed a commit to aacostadiaz/cutlass-fork that referenced this pull request Aug 6, 2024
4 participants