Skip to content

Commit 2e8955f

Browse files
committed
SYCL: Fix test-backend-ops crashes with SYCL-Graph
Currently on a CUDA backend to SYCL when running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` I see crashes from 3 operations: 1) `-o MUL_MAT`: Issue arising from recording of oneMath `ext_codeplay_enqueue_native_command`. 2) `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187, can these wait calls just be removed? 3) `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074 , host work could be wrapped in a host-task? For 1) I have come up with a oneMath fix in uxlfoundation/oneMath#669, I've put a provisional git tag to pull in this PR for testing, but will update to the upstream commit once merged. For 2 & 3) we've noticed that `ggml-cuda.cu` has the [check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458) method for checking if a graph can be used, even if enabled. I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking if a graph can be used for the operations even if a user has asked for it to be enabled.
1 parent 27aa259 commit 2e8955f

File tree

2 files changed

+30
-3
lines changed

2 files changed

+30
-3
lines changed

ggml/src/ggml-sycl/CMakeLists.txt

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -135,8 +135,8 @@ else()
135135
endif()
136136
FetchContent_Declare(
137137
ONEMATH
138-
GIT_REPOSITORY https://github.com/uxlfoundation/oneMath.git
139-
GIT_TAG c255b1b4c41e2ee3059455c1f96a965d6a62568a
138+
GIT_REPOSITORY https://github.com/EwanC/oneMath.git
139+
GIT_TAG 671d9bcc5aa6cc52cce6b6518c9b8f126c0352f4
140140
)
141141
FetchContent_MakeAvailable(ONEMATH)
142142
# Create alias to match with find_package targets name

ggml/src/ggml-sycl/ggml-sycl.cpp

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3639,11 +3639,38 @@ static void ggml_backend_sycl_graph_compute_impl(ggml_backend_sycl_context * syc
36393639
}
36403640
}
36413641

3642+
#ifdef GGML_SYCL_GRAPH
3643+
static bool check_node_graph_compatibility(ggml_cgraph * cgraph) {
3644+
for (int i = 0; i < cgraph->n_nodes; i++) {
3645+
ggml_tensor * node = cgraph->nodes[i];
3646+
switch(node->op) {
3647+
default: break;
3648+
case GGML_OP_CONCAT:
3649+
// ggml_sycl_op_concat() does a blocking host wait after memcpy operations,
3650+
// but wait() can't be called on the events returned by a queue recording
3651+
// to a graph.
3652+
[[fallthrough]];
3653+
case GGML_OP_MUL_MAT_ID:
3654+
// ggml_sycl_op_concat() does a blocking host wait on the sycl queue after
3655+
// submitting a memcpy operation, but wait() can't be called on a queue that
3656+
// is recording to a graph.
3657+
#ifndef NDEBUG
3658+
GGML_LOG_DEBUG("%s: disabling SYCL graphs due to unsupported node type\n", __func__);
3659+
#endif
3660+
return false;
3661+
}
3662+
}
3663+
return true;
3664+
3665+
}
3666+
#endif
3667+
36423668
static ggml_status ggml_backend_sycl_graph_compute(ggml_backend_t backend, ggml_cgraph * cgraph) {
36433669
auto * sycl_ctx = static_cast<ggml_backend_sycl_context *>(backend->context);
36443670

36453671
#ifdef GGML_SYCL_GRAPH
3646-
if (!g_ggml_sycl_disable_graph) {
3672+
bool use_sycl_graph = !g_ggml_sycl_disable_graph && check_node_graph_compatibility(cgraph);
3673+
if (use_sycl_graph) {
36473674
const bool graph_support = dpct::get_device(sycl_ctx->device).has(sycl::aspect::ext_oneapi_limited_graph);
36483675
if (!graph_support) {
36493676
GGML_SYCL_DEBUG("[SYCL-GRAPH] can not use graphs on device:%d\n", sycl_ctx->device);

0 commit comments

Comments
 (0)