[BLAS] SYCL-Graph integration for native-command #669

EwanC · 2025-05-07T08:07:27Z

To support applications that call the library with a SYCL queue that is recording to a SYCL-Graph, check whether the `ext_codeplay_enqueue_native_command` command-group is being recorded to a graph object. If so, use the native stream recording APIs to add the BLAS calls as nodes in the graph.

In particular this fixes the llama.cpp MUL_MAT unit tests on CUDA with SYCL-Graph enabled. Previously this would throw an error:

$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.

With this patch, the BLAS native-command command-groups used by the llama.cpp oneMath calls can be recorded successfully, in particular USM `gemm` and `gemm_batch`. I've extended the tests to verify the correctness of both.
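As an illustration of the control flow described at the top of this description, here is a minimal model in plain C++. It is a sketch only: `FakeInterop`, `run_native_blas`, and the trace strings are hypothetical stand-ins and are not the SYCL, CUDA, or oneMath types. The begin/end steps represent whatever native stream recording API the backend provides.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the interop handle passed to a native command.
// It is NOT the SYCL type; it only carries what this model needs.
struct FakeInterop {
    bool recording_to_graph;          // is the command-group being recorded to a graph?
    std::vector<std::string>& trace;  // ordered log of the native calls issued
};

// Issues `blas_call` on the native stream. When the command-group is being
// recorded, the call is bracketed by begin/end capture steps so that it
// becomes a node in the graph instead of executing eagerly.
inline void run_native_blas(FakeInterop& ih, const std::function<void()>& blas_call) {
    if (ih.recording_to_graph) {
        ih.trace.push_back("begin_capture_to_graph");
        blas_call();
        ih.trace.push_back("end_capture");
    } else {
        blas_call();  // eager path: no capture needed
    }
}
```

Calling `run_native_blas` with `recording_to_graph` set to `true` brackets the library call with the two capture steps, while the eager path leaves the call untouched.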

Currently on a CUDA backend to SYCL when running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` I see crashes from 3 operations:

1) `-o MUL_MAT`: Issue arising from recording of oneMath `ext_codeplay_enqueue_native_command`.
2) `-o CONCAT`: Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187, can these wait calls just be removed?
3) `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074, host work could be wrapped in a host-task?

For 1) I have come up with a oneMath fix in uxlfoundation/oneMath#669, I've put a provisional git tag to pull in this PR for testing, but will update to the upstream commit once merged.

For 2 & 3) we've noticed that `ggml-cuda.cu` has the [check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458) method for checking if a graph can be used, even if enabled. I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking if a graph can be used for the operations even if a user has asked for it to be enabled.
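The compatibility check mentioned for 2) and 3) could look roughly like the sketch below. The `Op` enum and `graph_compatible` are hypothetical names for illustration and are not the actual ggml types or the method added to `ggml-sycl.cpp`; the only assumption taken from the text is that `CONCAT` and `MUL_MAT_ID` rely on blocking waits and so cannot be recorded.

```cpp
#include <vector>

// Hypothetical op tags; the real ggml op enum has many more entries.
enum class Op { MulMat, Concat, MulMatId, Add };

// Returns true when every op in the list can be recorded into a graph.
// Ops that depend on blocking waits force a fallback to eager execution,
// even if the user asked for graphs to be enabled.
inline bool graph_compatible(const std::vector<Op>& ops) {
    for (Op op : ops) {
        switch (op) {
            case Op::Concat:    // blocking wait on a queue that is recording
            case Op::MulMatId:  // blocking wait for a copy to host memory
                return false;
            default:
                break;
        }
    }
    return true;
}
```

A caller would then record to a graph only when the user has enabled graphs and `graph_compatible` returns `true` for the ops about to be submitted.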

Rbiessy

To answer your questions:

Yes, we need to ensure this can build with the latest public oneAPI release.
Yes, it would be best to have a test with SYCL-Graph if this is something we want to support. I'm thinking it could be a separate test file. I don't think we would need to test every operation; some sort of example using a single oneMath operation could be enough?

Let me know if you think you will still need this oneMath change. The llama PR using oneDNN looks promising and if it works well we could remove the dependency on oneMath.

src/blas/backends/cublas/cublas_batch.cpp

tests/unit_tests/blas/sycl-graph/gemm_batch_usm.cpp

tests/unit_tests/blas/sycl-graph/gemm_usm.cpp

tests/unit_tests/blas/sycl-graph/gemm_batch_usm.cpp

In order to support applications calling the library with a sycl queue recording to a SYCL-Graph, check if the `ext_codeplay_enqueue_native_command` command-group is being recorded to a graph object. If so use the native stream recording APIs to add the blas calls as nodes in the graph.

In particular this fixes the llama.cpp unit test `MUL_MAT(type_a=f16,type_b=f32,m=16,n=1,k=256,bs=[1,1],nr=[2,1],per=[0,1,2,3],v=0)` on CUDA with SYCL-Graph enabled. Previously this would throw an error:

```sh
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```

Create SYCL-Graph extension-specific tests for BLAS in `tests/unit_tests/blas/sycl-graph`, currently covering only `gemm_usm` and `gemm_batch_usm`. These are stubbed out for the CT test variants and when the SYCL compiler doesn't support the `sycl_ext_oneapi_graph` extension.
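One way to stub out a test when the compiler lacks the extension is to guard on its feature-test macro, `SYCL_EXT_ONEAPI_GRAPH`. The sketch below is an assumption about the general shape, not the code in this PR: `graph_test_mode` is a hypothetical helper that returns a status string instead of using the project's test framework.

```cpp
#include <string>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>  // defines SYCL_EXT_ONEAPI_GRAPH when the extension is available
#endif

// Hypothetical helper: reports which variant of the test was compiled in.
inline std::string graph_test_mode() {
#ifdef SYCL_EXT_ONEAPI_GRAPH
    // The real test body would record a gemm call into a graph here
    // and compare the result against a reference.
    return "enabled";
#else
    // The compiler lacks sycl_ext_oneapi_graph: compile a stub that skips
    // instead of failing the build or the test run.
    return "skipped";
#endif
}
```

With a compiler that has no SYCL headers at all, the stub branch is selected and the file still builds.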

EwanC · 2025-06-10T14:20:17Z

@andrewtbarker Would you be able to do the other review on this PR? Picking on you since you've reviewed my other PRs, but if there are other codeowners that could do the review then feel free to tag them instead.

andrewtbarker · 2025-06-10T19:29:33Z

/intelci: run

andrewtbarker

This generally looks good, thanks especially for the nice comments. I have one question for my own understanding but it doesn't block merging.

src/blas/backends/cublas/cublas_scope_handle.cpp

Rbiessy · 2025-06-11T12:14:38Z

/intelci: run

I think the internal CI is down for some time. @EwanC would you be able to attach a log of the tests running on Nvidia HW? That's the best alternative.

EwanC · 2025-06-12T10:30:43Z

> I think the internal CI is down for some time. @EwanC would you be able to attach a log of the tests running on Nvidia HW? That's the best alternative.

Unfortunately, the results I see locally are non-deterministic: I see failures (which vary in number on each run) on both the develop branch and this feature branch. I've included the logs below regardless.

The tip of DPC++ with CUDA 12.8 was used for the testing below:

$ git log | head
commit 4b3ed9b13632bc31eb35c34a3ef37e17c80ffe05
Author: Wu Yingcong <yingcong.wu@intel.com>
Date:   Thu Jun 12 16:21:08 2025 +0800

    [DeviceSanitizer] Remove not needed test fix. (#18946)

    The fix is no longer need and not working at the moment since it does
    not come with `%{run}`.
	
$ ./bin/sycl-ls
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce GT 1030 6.1 [CUDA 12.8]

Logs
develop_test_main_blas_ct.txt - 19 FAILED TESTS
develop_test_main_blas_rt.txt - 12 FAILED TESTS
native-command_test_main_blas_ct.txt - 12 FAILED TESTS
native-command_test_main_blas_rt.txt - 8 FAILED TESTS

Rbiessy · 2025-06-12T13:29:25Z

Thanks for sharing. Yeah, I'm not sure why you see inconsistent test failures. It may have to do with this device, which we don't usually use for tests.
I think this is enough to show this PR should not introduce more failures, so merging now.

Update oneMath commit to merged PR uxlfoundation/oneMath#669 which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```

EwanC mentioned this pull request May 7, 2025

SYCL: Fix test-backend-ops crashes with SYCL-Graph ggml-org/llama.cpp#13357

Closed

EwanC force-pushed the sycl-graph_native-command branch from 671d9bc to 3c06934 Compare May 9, 2025 09:40

Rbiessy reviewed May 12, 2025

src/blas/backends/cublas/cublas_batch.cpp (outdated)

EwanC mentioned this pull request May 13, 2025

[SYCL] Bump native enqueue extension version intel/llvm#18321

Merged

EwanC force-pushed the sycl-graph_native-command branch 3 times, most recently from a332a53 to 8d153ca Compare May 15, 2025 11:24

EwanC force-pushed the sycl-graph_native-command branch 4 times, most recently from 32d9344 to 6e3c97c Compare May 30, 2025 09:55

EwanC force-pushed the sycl-graph_native-command branch from 6e3c97c to 5b9dbfc Compare May 30, 2025 10:35

EwanC marked this pull request as ready for review May 30, 2025 13:51

EwanC requested review from a team as code owners May 30, 2025 13:51

EwanC requested a review from Rbiessy June 2, 2025 12:39

EwanC force-pushed the sycl-graph_native-command branch from 5b9dbfc to 7a12cf4 Compare June 5, 2025 12:35

EwanC commented Jun 5, 2025

tests/unit_tests/blas/sycl-graph/gemm_batch_usm.cpp (outdated)

Rbiessy reviewed Jun 6, 2025

tests/unit_tests/blas/sycl-graph/gemm_usm.cpp (outdated)

tests/unit_tests/blas/sycl-graph/gemm_batch_usm.cpp (outdated)

EwanC added 2 commits June 6, 2025 12:24

EwanC force-pushed the sycl-graph_native-command branch from 7a12cf4 to aec2ab1 Compare June 6, 2025 11:30

Rbiessy approved these changes Jun 6, 2025

Rbiessy requested a review from a team June 6, 2025 13:00

andrewtbarker approved these changes Jun 10, 2025

src/blas/backends/cublas/cublas_scope_handle.cpp

Rbiessy merged commit 8efe85f into uxlfoundation:develop Jun 12, 2025
10 checks passed

EwanC mentioned this pull request Jun 12, 2025

sycl: Bump oneMath commit ggml-org/llama.cpp#14152

Merged
