diff --git a/interop_task/interop_task.md b/interop_task/interop_task.md index 5a47d6b..16695c2 100644 --- a/interop_task/interop_task.md +++ b/interop_task/interop_task.md @@ -2,11 +2,12 @@ |-------------|--------| | Name | Interop Task | | Date of Creation | 16 January 2019 | +| Revision | 0.2 | | Target | Vendor extension | -| Current Status | _Availalable since CE 1.0.5_ | +| Current Status | 0.1 _Availalable since CE 1.0.5_, 0.2 TBD | | Reply-to | Victor Lomüller | | Original author | Victor Lomüller , Gordon Brown , Peter Zuzek | -| Contributors | Victor Lomüller , Gordon Brown , Peter Zuzek | +| Contributors | Victor Lomüller , Gordon Brown , Peter Zuzek , Ruyman Reyes | # interop_task: Improving SYCL-OpenCL Interoperability @@ -16,6 +17,18 @@ SYCL does not allow a user to access cl_mem object out of an cl::sycl::accessor, This proposal introduces a way for a user to retrieve the low-level objects associated with SYCL buffers and enqueue a host task that can execute an arbitrary portion of host code within the SYCL runtime, therefore taking advantage of SYCL dependency analysis and scheduling. +## Revisions + +### 0.2 + +* `get_buffer` renamed to `get_mem` +* Clarified wording on `get_queue` and `get_mem` +* `interop_handle` is passed by value to the lambda instead of reference + +### 0.1 + +Initial proposal + ## Accessing low-level API functionality on SYCL queues We introduce a new type of handler, the **codeplay::handler**, which includes a new @@ -28,7 +41,7 @@ thread to continue submitting command groups). Other command groups enqueued in the same or different queues can be executed following the sequential consistency by guaranteeing the satisfaction of the requisites of this command group. -It is the user's responsibility to ensure the lambda submitted via interop_task does not create race conditions with other command groups or with the host. +It is the user's responsibility to ensure the lambda submitted via `interop_task` does not create race conditions with other command groups or with the host. The possibility of enqueuing host tasks on SYCL queues also enables the runtime to perform further optimizations when available. @@ -61,18 +74,33 @@ class handler : public cl::sycl::handler { The `interop_task` allows users to submit tasks containing C++ statements with low-level API calls (e.g. OpenCL Host API entries). The command group that encapsulates the task will execute following the usual SYCL dataflow execution rules. -The functor passed to the `interop_task` takes as input a const reference to a `cl::sycl::codeplay::interop_handle`. The handle can be used to retrieve underlying OpenCL objects relative to the execution of the task. -It is not allowed to allocate new SYCL object inside an `interop_task`. +The SYCL event returned by the command group will be completed when the `interop_task` +functor is completed. Note the SYCL event is completed regardless of the completion +status of any OpenCL operation enqueued or performed inside the `interop_task` +scope. In particular, dispatching of asynchronous OpenCL operations inside +of the `interop_task` requires manual synchronization. + +The functor passed to the `interop_task` takes as input a `cl::sycl::codeplay::interop_handle`. The handle can be used to retrieve underlying OpenCL objects relative to the execution of the task. + +It is not allowed to allocate new SYCL objects inside a `interop_task` scope. It is the user's responsibility to ensure that all operations performed inside the `interop_task` are finished before returning from it. +Since SYCL queues are out of order, and any underlying OpenCL queue can be as well, +there is no guarantee that OpenCL commands enqueued inside the `interop_task` +functor will execute on a particular order w.r.t other SYCL commands or +`interop_task` once dispatched to the OpenCL queue, unless this is is +explicitly handled by using OpenCL events or barriers. -Although the statements inside the lambda submitted to the `interop_task` are executed on the host, the requirements and actions for the command group are satisied for the device. -This is the opposite of the `host_handler` vendor extension, where requisites are satisfied for the host since the statements on the lambda submited to the single task are meant to have side effects on the host only. -The interop task lambda can have side effects on the host, but it is the programmer responsability to ensure requirements dont need to be satisfied for the host. +Although the statements inside the lambda submitted to the `interop_task` are executed on the host, the requirements and actions for the command group are satisfied for the device. +This is the opposite of the `host_handler` [vendor extension](https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/sycl-2.2/03_interacting_with_data_on_the_host.md), where requisites are satisfied for the host since the statements on the lambda submitted to the single task are meant to have side effects on the host only. +The `interop-task` lambda can have side effects on the host, but it is the programmer responsibility to ensure requirements don't need to be satisfied for the host. + +Executing a `interop_task` in a host device is invalid, and the asynchronous +exception `cl::sycl::feature_not_supported` is thrown. ## Accessing low-level API objects -We introduce the `interop_handle` class which provide access to underlying OpenCL objects during the execution of the `interop_task`. +We introduce the `interop_handle` class which provides access to underlying OpenCL objects during the execution of the `interop_task`. `interop_handle` objects are immutable objects whose purpose is to enable users access to low-level API functionality. The interface of the `interop_handle` is defined as follow: @@ -88,13 +116,13 @@ class interop_handle { public: /* Return the context */ - cl_context get_context() const; + cl_context get_context() const noexcept; /* Return the device id */ - cl_device_id get_device() const; + cl_device_id get_device() const noexcept; /* Return the command queue associated with this task */ - cl_command_queue get_queue() const; + cl_command_queue get_queue() const noexcept; /* Returns the underlying cl_mem object associated with a given accessor @@ -102,13 +130,36 @@ class interop_handle { template - cl_mem get_buffer(const accessor&) const; + cl_mem get_mem(const accessor&) const; }; } // namespace codeplay } // namespace sycl } // namespace cl ``` +### Obtaining the underlying OpenCL queue + +The `get_queue` method returns an underlying OpenCL queue for the +SYCL queue used to submit the command group, or the fallback queue +if this command-group is re-trying execution on an OpenCL queue. +The OpenCL command queue returned is implementation-defined in cases +where the SYCL queue maps to multiple underlying OpenCL objects. + +It is responsibility of the SYCL runtime to ensure the OpenCL queue +returned is in a state that can be used to dispatch work, +and that other potential OpenCL command queues associated with the same +SYCL command queues are not executing commands while the `interop_task` +is being executed. + +### Obtaining memory objects for interoperability + +The `get_mem` method receives a SYCL accessor that has been defined as a +requirement for the command group, and returns the underlying OpenCL +memory object that is used by the SYCL runtime. +If the accessor passed as parameter is not part of the command group +requirements (e.g. it is an unregistered placeholder accessor), +the exception `cl::sycl::invalid_object` is thrown asynchronously. + ## Example using regular accessor ```cpp @@ -167,7 +218,7 @@ int main( void ) /* Execute the plan. */ cl_command_queue queue = handle.get_queue(); - cl_mem X_mem = handle.get_buffer(X_accessor); + cl_mem X_mem = handle.get_mem(X_accessor); err = clfftEnqueueTransform(planHandle, CLFFT_FORWARD, 1, &queue, 0, NULL, NULL, &X_mem, NULL, NULL);