From a726bd30735fd50956edf9c8cfc59bb6ac398b02 Mon Sep 17 00:00:00 2001
From: Mehdi Goli <mehdi.goli@codeplay.com>
Date: Thu, 4 Apr 2024 18:05:18 +0100
Subject: [PATCH] Updating README-sycl.md to capture the 3.5 modifications
 (#16)

* Updating README-sycl.md to capture the 3.5 modifications

* Update README-sycl.md

Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>

* Remove the sgemm_nt_1_sycl PoC (#15)

* Remove sgemm_nt_1 PoC

* Fix build issues

* Fix code style format

* Remove ENABLE_NVPTX flag

* Update include/cute/util/debug.hpp

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Cosmetic

---------

Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>

* Applying the comments

---------

Co-authored-by: aacostadiaz <alejandro.acosta@codeplay.com>
---
 README-sycl.md | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/README-sycl.md b/README-sycl.md
index 36467e58f7..ea41236438 100644
--- a/README-sycl.md
+++ b/README-sycl.md
@@ -17,14 +17,24 @@ resources for GPUs.
 
 Currently, only one example works on NVIDIA SM 80.
 
-## Building with SYCL support
-
-To build CUTLASS SYCL support you need the latest version of DPC++ compiler, 
-you can either use a recent [nighly build](https://github.com/intel/llvm/releases)
+## Requirements
+ 
+To build CUTLASS with SYCL support, you need the latest version of the DPC++ compiler. You can either use a recent [nightly build](https://github.com/intel/llvm/releases)
 or build the compiler from source.
-In either case, make sure to enable the NVIDIA plugin so you can build applications
+For the latter, make sure to enable the NVIDIA plugin so you can build applications
 for NVIDIA GPUs.
 
+
+I see, in that case let's not call it plugins, which confuses with the Plugins available on the codeplay's website to people who are completely new to SYCL,
+
+we can phrase it as -
+
+Suggested change
+In either case, make sure to enable the NVIDIA plugin so you can build applications
+To build CUTLASS with SYCL support, install the latest DPC++ compiler with the CUDA backend enabled, either by building from source as described [here](https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md#build-dpc-toolchain-with-support-for-nvidia-cuda) ,  or by downloading the [nightly releases](https://github.com/intel/llvm/releases)
+
+
+## Building with SYCL support
 Once you have your compiler installed, you need to point the
 `CMAKE_CUDA_HOST_COMPILER` flag to the clang++ provided by it.
 This enables the compilation of SYCL sources without altering the current NVCC path.
@@ -44,18 +54,29 @@ make -G Ninja  \
 
 # Running the example
 
+## CuTe 
 Currently, you can build the CuTe Tutorial using the following command: 
 
 ```
-ninja sgemm_nt_1_sycl
+ninja [EXAMPLE_NAME]_sycl
 ```
 
 You can run it like this from your build directory
 
 ```
-LD_LIBRARY_PATH=/path/to/sycl/install/lib ./examples/cute/tutorial/sgemm_nt_1_sycl
+LD_LIBRARY_PATH=/path/to/sycl/install/lib ./examples/cute/tutorial/[EXAMPLE_NAME]_sycl
 ```
 
+## CUTLASS Example
+ Currently, the example `14_ampere_tf32_tensorop_gemm` has been implemented for SYCL on the NVIDIA Ampere architecture. You can build it from your build directory by running:
+ ```
+  ninja 14_ampere_tf32_tensorop_gemm_cute
+ ```
+ You can run it like this from your build directory
+ ```
+  NVIDIA_TF32_OVERRIDE=1 LD_LIBRARY_PATH=/path/to/sycl/install/lib ./examples/14_ampere_tf32_tensorop_gemm/14_ampere_tf32_tensorop_gemm_cute
+ ```
+
 # References
 
 [1] https://www.khronos.org/sycl/