Currently, the only possible way of directly interacting with device data from the host is via a host accessor. However, host accessors provide synchronous access to the data: The constructor of a host accessor blocks access to the memory object, and no further command groups can be submitted until the host accessor is destroyed. This blocks the SYCL runtime to continue scheduling command groups, creating a bottleneck that disables some optimizations, such as concurrent copy and kernel execution. It also limits the possibility of users to asynchronously retrieve data from the device.
By using the requirements and actions definitions, we can define additional operations on the command group that enable accessing SYCL memory objects from the host.
Listing [hostcg:req] shows a command group (CGh{rread(bufA),rread_write(bufB)}) that is enqueued on a device queue but performs operations on the host.
We introduce a new type of handler, the host_handler, which includes a new
single_task method that executes a single task on the host. The
host_handler
can be passed to any function that expects a handler
parameter
(e.g. calling get_access(handler)
on a buffer).
By submitting this command group to the SYCL device queue, we guarantee it is
executed in-order w.r.t the other command groups on the same queue.
Simultaneously, we guarantee that this operation is performed
asynchronously w.r.t to the user-thread (therefore, enabling the user
thread to continue submitting command groups).
Other command groups enqueued in the same or different queues
can be executed following the sequential consistency by guaranteeing the
satisfaction of the requisites of this command group.
The possibility of enqueuing host tasks on SYCL queues also enables the runtime to perform further optimizations when available. For example, a SYCL runtime may decide to map / unmap instead of copy operations, or performing asynchronous transfers while data is being computed.
It is possible to run multiple host tasks in the same command group - the host
tasks are executed in-order within the command group. It is not possible to use
host tasks and kernel invocations (e.g. parallel_for
) in the same command
group. This restriction is enforced by the use of host_handler
, which does not
include kernel invocation methods.
auto cgH = [=] (host_handler& h) {
auto accA = bufA.get_access<access::mode::read>(h);
auto accB = bufB.get_access<access::mode::read_write>(h);
h.single_task([=]() {
accB[0] = accA[0] * std::rand();
}
auto accC = bufC.get_access<access::mode::read>(h);
h.single_task([=]() {
accC[0] += accA[0] * accB[0];
}
};
qA.submit(cgH);
Note that in the code above accC
was created after the first host task has
been scheduled, but the actual data may already be available before the first
host task starts executing. In the second host task, the value of accB
is
consistent with the value set in the first host task.
namespace cl {
namespace sycl {
class host_handler {
private:
// implementation defined constructor
host_handler(__unspecified__);
public:
template <typename FunctorT>
void single_task(FunctorT hostFunction);
};
} // namespace sycl
} // namespace cl
Method | Description |
---|---|
template <typename FunctorT> void single_task(FunctorT hostFunction) |
The user-provided hostFunction will be executed once all requirements for the command group are met. The hostFunction must be a Callable object. |
A specific use-case for a host task is to simply update some data on the host after it has been modified on the device. Currently in SYCL, the only opportunity of accessing data on the host is using host accessors or destroying the buffer (to update the final pointer). However, in some cases developers want to update some pointer on the host without blocking the execution.
auto cgH = [=] (handler& h) {
auto accA = bufA.get_access<access::mode::read>(h);
h.parallel_for<class kernel>(range, SomeKernel(accA));
h.copy(accA, hostPtr);
};
qA.submit(cgH);
Note that the users can query the event returned by the submit to check if the command group has finished (and, therefore, the host pointer has been updated).
Additionally, users may want to manually update the device data with host data.
auto cgH = [=] (handler& h) {
auto accA = bufA.get_access<access::mode::read_write>(h);
h.copy(hostPtr, accA);
h.parallel_for<class kernel>(range, SomeKernel(accA));
};
qA.submit(cgH);
In the current SYCL specification there is no functionality to copy the contents of one SYCL buffer to another. Using the copy method, the user can specify also two accessors as origin and destination, thus enabling the runtime to perform the copy in the most efficient way giving the current location of the data in the system.
auto cgH = [=] (handler& h) {
auto accA = bufA.get_access<access::mode::read_write>(h);
auto accB = bufB.get_access<access::mode::read_write>(h);
h.copy(accA, accB);
};
qA.submit(cgH);
A special case of updating the data directly on the device is when a certain pattern is to be replicated across a range of values in the accessor. In C++, this is implemented via the std::fill function in the algorithm headers.
The example shows how to fill buffer A contents with an scalar. Note that at least write access is required.
int pattern = 10;
auto cgH = [=] (handler& h) {
auto accA = bufA.get_access<access::mode::write>(h);
h.fill(accA, pattern);
};
qA.submit(cgH);
Because of restrictions in OpenCL implementations on the size of the pattern, the size in bytes of the data type used for the pattern must be one of 1, 2, 4, 8, 16, 32, 64, or 128.
The following restrictions apply to access mode and target of the accessor
provided to copy
.
When a call to copy
triggers a read from the device data, the
only valid accessor modes are the following:
access::mode::read
access::mode::read_write
access::mode::discard_read_write
Similarly, when a call to copy
triggers a write to the device
data, the only valid accessor modes are the following:
access::mode::write
access::mode::read_write
access::mode::discard_write
access::mode::discard_read_write
Method | Description |
---|---|
template <typename T, int dims, access::mode accessMode, access::target accessTarget> void copy(accessor<T, dims, accessMode, accessTarget> acc, shared_ptr<T> hostPtr) |
Update the contents of the host pointer with the data in accessor acc . hostPtr must have enough space allocated to hold the data. The accessor T can also be const qualified and the hostPtr can also be of type shared_ptr<void> . |
template <typename T, int dims, access::mode accessMode, access::target accessTarget> void copy(shared_ptr<T> hostPtr, accessor<T, dims, accessMode, accessTarget> acc) |
Update the the data in accessor acc with the contents of the host pointer. hostPtr must have enough space allocated to hold the data. The underlying type of the hostPtr shared pointer can also be const qualified or be void . |
template <typename AccessorO, typename AccessorD> void copy(AccessorO accO, AccessorD accD) |
Update the data in accessor accD with the contents of the buffer pointed by accO . The underlying type of accO can also be const qualified. |
template <typename T, int dims, access::mode accessMode, access::target accessTarget> void fill(accessor<T, dims, accessMode, accessTarget> acc, T val) |
Special case of copy from host to device where the origin is a scalar value that will be replicated across the range of the accessor. The type of val can also be const qualified. |