Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 65 additions & 14 deletions interop_task/interop_task.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,12 @@
|-------------|--------|
| Name | Interop Task |
| Date of Creation | 16 January 2019 |
| Revision | 0.2 |
| Target | Vendor extension |
| Current Status | _Availalable since CE 1.0.5_ |
| Current Status | 0.1 _Availalable since CE 1.0.5_, 0.2 TBD |
| Reply-to | Victor Lomüller <victor@codeplay.com> |
| Original author | Victor Lomüller <victor@codeplay.com>, Gordon Brown <gordon@codeplay.com>, Peter Zuzek <peter@codeplay.com> |
| Contributors | Victor Lomüller <victor@codeplay.com>, Gordon Brown <gordon@codeplay.com>, Peter Zuzek <peter@codeplay.com> |
| Contributors | Victor Lomüller <victor@codeplay.com>, Gordon Brown <gordon@codeplay.com>, Peter Zuzek <peter@codeplay.com>, Ruyman Reyes <ruyman@codeplay.com> |

# interop_task: Improving SYCL-OpenCL Interoperability

Expand All @@ -16,6 +17,18 @@ SYCL does not allow a user to access cl_mem object out of an cl::sycl::accessor,

This proposal introduces a way for a user to retrieve the low-level objects associated with SYCL buffers and enqueue a host task that can execute an arbitrary portion of host code within the SYCL runtime, therefore taking advantage of SYCL dependency analysis and scheduling.

## Revisions

### 0.2

* `get_buffer` renamed to `get_mem`
* Clarified wording on `get_queue` and `get_mem`
* `interop_handle` is passed by value to the lambda instead of reference

### 0.1

Initial proposal

## Accessing low-level API functionality on SYCL queues

We introduce a new type of handler, the **codeplay::handler**, which includes a new
Expand All @@ -28,7 +41,7 @@ thread to continue submitting command groups).
Other command groups enqueued in the same or different queues
can be executed following the sequential consistency by guaranteeing the
satisfaction of the requisites of this command group.
It is the user's responsibility to ensure the lambda submitted via interop_task does not create race conditions with other command groups or with the host.
It is the user's responsibility to ensure the lambda submitted via `interop_task` does not create race conditions with other command groups or with the host.

The possibility of enqueuing host tasks on SYCL queues also enables the
runtime to perform further optimizations when available.
Expand Down Expand Up @@ -61,18 +74,33 @@ class handler : public cl::sycl::handler {

The `interop_task` allows users to submit tasks containing C++ statements with low-level API calls (e.g. OpenCL Host API entries).
The command group that encapsulates the task will execute following the usual SYCL dataflow execution rules.
The functor passed to the `interop_task` takes as input a const reference to a `cl::sycl::codeplay::interop_handle`. The handle can be used to retrieve underlying OpenCL objects relative to the execution of the task.

It is not allowed to allocate new SYCL object inside an `interop_task`.
The SYCL event returned by the command group will be completed when the `interop_task`
functor is completed. Note the SYCL event is completed regardless of the completion
status of any OpenCL operation enqueued or performed inside the `interop_task`
scope. In particular, dispatching of asynchronous OpenCL operations inside
of the `interop_task` requires manual synchronization.

The functor passed to the `interop_task` takes as input a `cl::sycl::codeplay::interop_handle`. The handle can be used to retrieve underlying OpenCL objects relative to the execution of the task.

It is not allowed to allocate new SYCL objects inside a `interop_task` scope.
It is the user's responsibility to ensure that all operations performed inside the `interop_task` are finished before returning from it.
Since SYCL queues are out of order, and any underlying OpenCL queue can be as well,
there is no guarantee that OpenCL commands enqueued inside the `interop_task`
functor will execute on a particular order w.r.t other SYCL commands or
`interop_task` once dispatched to the OpenCL queue, unless this is is
explicitly handled by using OpenCL events or barriers.

Although the statements inside the lambda submitted to the `interop_task` are executed on the host, the requirements and actions for the command group are satisied for the device.
This is the opposite of the `host_handler` vendor extension, where requisites are satisfied for the host since the statements on the lambda submited to the single task are meant to have side effects on the host only.
The interop task lambda can have side effects on the host, but it is the programmer responsability to ensure requirements dont need to be satisfied for the host.
Although the statements inside the lambda submitted to the `interop_task` are executed on the host, the requirements and actions for the command group are satisfied for the device.
This is the opposite of the `host_handler` [vendor extension](https://github.com/codeplaysoftware/standards-proposals/blob/master/asynchronous-data-flow/sycl-2.2/03_interacting_with_data_on_the_host.md), where requisites are satisfied for the host since the statements on the lambda submitted to the single task are meant to have side effects on the host only.
The `interop-task` lambda can have side effects on the host, but it is the programmer responsibility to ensure requirements don't need to be satisfied for the host.

Executing a `interop_task` in a host device is invalid, and the asynchronous
exception `cl::sycl::feature_not_supported` is thrown.

## Accessing low-level API objects

We introduce the `interop_handle` class which provide access to underlying OpenCL objects during the execution of the `interop_task`.
We introduce the `interop_handle` class which provides access to underlying OpenCL objects during the execution of the `interop_task`.
`interop_handle` objects are immutable objects whose purpose is to enable users access to low-level API functionality.

The interface of the `interop_handle` is defined as follow:
Expand All @@ -88,27 +116,50 @@ class interop_handle {

public:
/* Return the context */
cl_context get_context() const;
cl_context get_context() const noexcept;

/* Return the device id */
cl_device_id get_device() const;
cl_device_id get_device() const noexcept;

/* Return the command queue associated with this task */
cl_command_queue get_queue() const;
cl_command_queue get_queue() const noexcept;

/*
Returns the underlying cl_mem object associated with a given accessor
*/
template <typename dataT, int dimensions, access::mode accessmode,
access::target accessTarget,
access::placeholder isPlaceholder>
cl_mem get_buffer(const accessor<dataT, dimensions, accessmode, access::target accessTarget, access::placeholder isPlaceholder>&) const;
cl_mem get_mem(const accessor<dataT, dimensions, accessmode, access::target accessTarget, access::placeholder isPlaceholder>&) const;
};
} // namespace codeplay
} // namespace sycl
} // namespace cl
```

### Obtaining the underlying OpenCL queue

The `get_queue` method returns an underlying OpenCL queue for the
SYCL queue used to submit the command group, or the fallback queue
if this command-group is re-trying execution on an OpenCL queue.
The OpenCL command queue returned is implementation-defined in cases
where the SYCL queue maps to multiple underlying OpenCL objects.

It is responsibility of the SYCL runtime to ensure the OpenCL queue
returned is in a state that can be used to dispatch work,
and that other potential OpenCL command queues associated with the same
SYCL command queues are not executing commands while the `interop_task`
is being executed.

### Obtaining memory objects for interoperability

The `get_mem` method receives a SYCL accessor that has been defined as a
requirement for the command group, and returns the underlying OpenCL
memory object that is used by the SYCL runtime.
If the accessor passed as parameter is not part of the command group
requirements (e.g. it is an unregistered placeholder accessor),
the exception `cl::sycl::invalid_object` is thrown asynchronously.

## Example using regular accessor

```cpp
Expand Down Expand Up @@ -167,7 +218,7 @@ int main( void )

/* Execute the plan. */
cl_command_queue queue = handle.get_queue();
cl_mem X_mem = handle.get_buffer(X_accessor);
cl_mem X_mem = handle.get_mem(X_accessor);
err = clfftEnqueueTransform(planHandle, CLFFT_FORWARD,
1, &queue, 0, NULL, NULL,
&X_mem, NULL, NULL);
Expand Down