Skip to content

Commit e1b8657

Browse files
committed
Add buffer-usm interop blog post
1 parent d899f8e commit e1b8657

File tree

3 files changed

+209
-13
lines changed

3 files changed

+209
-13
lines changed

_config.yml

Lines changed: 3 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -54,19 +54,8 @@ plugins:
5454
- jekyll-feed
5555
- jemoji
5656
- jekyll-include-cache
57-
58-
author:
59-
name : "Aksel Alpay"
60-
avatar : "/assets/images/avatar-aksel.png"
61-
bio : "hipSYCL and HPC @ Heidelberg University"
62-
links:
63-
- label: "Twitter"
64-
icon: "fab fa-fw fa-twitter-square"
65-
url: "https://twitter.com/illuhad"
66-
- label: "GitHub"
67-
icon: "fab fa-fw fa-github"
68-
url: "https://github.com/illuhad"
69-
57+
# default author (can be overriden in individual posts)
58+
author: Aksel Alpay
7059
footer:
7160
links:
7261
- label: "Twitter"
@@ -83,6 +72,7 @@ defaults:
8372
type: posts
8473
values:
8574
layout: single
75+
classes: wide
8676
author_profile: true
8777
read_time: true
8878
comments: true

_data/authors.yml

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
Aksel Alpay:
2+
name : "Aksel Alpay"
3+
avatar : "/assets/images/avatar-aksel.png"
4+
bio : "hipSYCL and HPC @ Heidelberg University"
5+
links:
6+
- label: "Twitter"
7+
icon: "fab fa-fw fa-twitter-square"
8+
url: "https://twitter.com/illuhad"
9+
- label: "GitHub"
10+
icon: "fab fa-fw fa-github"
11+
url: "https://github.com/illuhad"
Lines changed: 195 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,195 @@
1+
---
2+
layout: single
3+
title: "hipSYCL 0.9.1 features: buffer-USM interoperability"
4+
date: 2021-05-26 18:30:00 +0100
5+
categories: hipsycl extension
6+
---
7+
8+
This post is part of a series where we discuss some features of hipSYCL 0.9.1. Today's topic is interoperability between buffers and USM pointers.
9+
10+
# Why it matters
11+
12+
SYCL 2020 features two major memory management models, both of which are supported by hipSYCL:
13+
1. The traditional buffer-accessor model that has already been available in the old SYCL 1.2.1. In this model, a task graph is constructed automatically based on access conflicts between access specifications described by `accessor` objects. These `accessor` objects are also used to access data in kernels. The buffer-accessor model provides the SYCL runtime with a lot of information about how much data is used and how it is used. This can help scheduling, and enables automatic optimizations such as overlap of data transfers and kernels.
14+
2. The pointer-based USM model that was introduced in SYCL 2020. Here, allocations are managed explicitly and (unless shared allocations are used) data must be copied explicitly between host and device. The USM model provides more control to the user compared to the buffer-USM model, at the cost of requiring the user to do work that the runtime can do automatically in the buffer-accessor model. It also forces the programmer to think in a model of a host-device dichotomy, which may not be an ideal fit when CPUs are targeted. On the other hand, it is usually considerably easier to port existing pointer-based code to SYCL using the USM model compared to the buffer-accessor model.
15+
16+
It is apparent that both models have valid use cases and are complementary. However, in SYCL 2020, there is barely any interoperability between the two. Accessing data that is stored in a buffer using a USM pointer requires launching a custom kernel that explicitly copies all data elements from the buffer into a USM pointer. This is both cumbersome and comes at a performance cost.
17+
18+
Consequently, once a codebase has started using one particular model, it is effectively locked into it. This is problematic for several reasons:
19+
20+
1. As the SYCL software ecosystem grows, there is a **real danger of ecosystem bifurcation** if no mechanisms are provided to cross from USM-land to buffer-land and vice versa. A SYCL library with a USM pointer API will be of little use for a SYCL application that is written using buffers and accessors.
21+
2. SYCL is all about taking control when you want it, and letting SYCL do what it thinks is best otherwise. This allows to combine the best of two worlds: Low-level kernel optimizations for critical code paths, and the convenience of a high-level programming model for the remaining program. Consequently, **it should be possible to use USM pointers whenever we want detailed low-level control, and move to a more high-level model for other parts of the program**. Not having interoperability between them **can block potential incremental optimization paths during software development**.
22+
3. Which model will be better in terms of performance or clarity is not always apparent, and might be different for different parts of the program. As outlined above, both have strengths and weaknesses, and are complementary. **We should therefore be able to mix buffers and USM pointers.**
23+
24+
# buffer-USM interoperability
25+
26+
To address these issues, hipSYCL 0.9.1 has introduced a comprehensive API for interoperability between USM pointers and buffers. In hipSYCL, you can always construct a buffer on top of existing USM pointers, or extract a USM pointer from a buffer -- completely without additional data copies.
27+
28+
hipSYCL is the first SYCL implementation to expose such a feature, and the reason is found easily: Buffer-USM interoperability in a meaningful, convenient and efficient way requires guarantees about the internal buffer behavior and SYCL implementation design that far exceed anything the SYCL specification guarantees.
29+
30+
We have therefore introduced an additional [hipSYCL runtime specification](https://github.com/illuhad/hipSYCL/blob/develop/doc/runtime-spec.md) that more rigorously defines buffer behavior. In particular hipSYCL makes the following guarantees that are crucial for buffer-USM interoperability:
31+
* Buffers use USM pointers internally. All allocations a buffer performs are USM allocations, and buffers are entirely implemented on top of USM pointers.
32+
* Allocations are persistent. Buffers guarantee that allocations, once they have been made, will remain valid at least until the end of buffer lifetime. Buffers will manage exactly one allocation per (physical) device.
33+
* Buffers allocate lazily. When the buffer is used for the first time on a particular device, it will allocate memory large enough for all of the data such that no reallocations are needed for the lifetime of the buffer.
34+
35+
There are two cases to distinguish for buffer-USM interoperability:
36+
1. Temporal composition: Here we just move memory allocations from USM pointers into a buffer or vice versa; at each point in time only either a USM pointer or a buffer exists for a given allocation.
37+
2. The more complex case: Simultaneously accessing the same allocation as USM pointer and buffer. This is more complicated as it requires some correctness considerations by the programmer.
38+
39+
## Temporal composition
40+
41+
Let's focus on the simple case first: Assume we only want to turn an existing buffer into a USM pointer (or vice versa), but don't want to use them simultaneously. hipSYCL has a fairly intuitive API for that: `buffer::get_pointer()` to extract USM pointers and a special buffer constructor that accepts USM pointers.
42+
43+
{% highlight cpp %}
44+
45+
sycl::queue q;
46+
std::size_t s = 1024;
47+
int* mem = sycl::malloc_device<int>(s, q);
48+
49+
// Use mem as USM pointer
50+
q.parallel_for(sycl::range{s},
51+
[=](sycl::id<1> idx){ mem[idx[0]] = idx[0]; });
52+
// Make sure that USM operations terminate before
53+
// using mem as buffer
54+
q.wait();
55+
56+
// Construct buffer on top of existing USM pointer
57+
{
58+
sycl::device dev = q.get_device();
59+
// Use mem for all operations for device dev. view() assumes
60+
// that the pointer holds valid data. If it should be considered empty,
61+
// use empty_view() instead.
62+
// Note the {} around the view: This is because we are actually passing
63+
// an std::vector. You can feed multiple USM pointers (one for each device)
64+
// into a buffer! Here, we only use one device.
65+
sycl::buffer<int> buff{
66+
{sycl::buffer_allocation::view(mem, dev)}, sycl::range{s}};
67+
68+
q.submit([&](sycl::handler& cgh){
69+
sycl::accessor acc{buff, cgh};
70+
cgh.parallel_for(sycl::range{s}, [=](sycl::id<1> idx){
71+
acc[idx] += 1;
72+
});
73+
});
74+
75+
// Turn buffer into USM pointer again.
76+
// Note: get_pointer() returns nullptr if no allocation is available on a device,
77+
// e.g. if a buffer hasn't yet been used on a device (remember: lazy allocation!)
78+
// or was not initialized with an appropriate view() object.
79+
// In this example, we know that the buffer has an allocation for this
80+
// device because we have given one in the constructor.
81+
int* mem_extracted = buff.get_pointer(dev);
82+
assert(mem_extracted == mem);
83+
84+
// This makes sure that the buffer won't delete the allocation when
85+
// it goes out of scope, so we can use it afterwards.
86+
// By default, view() is non-owning, so in this example it's
87+
// not strictly necessary.
88+
buff.disown_allocation(dev);
89+
} // Closing scope synchronizes all tasks operating on the buffer.
90+
91+
// Use USM pointer again
92+
q.parallel_for(sycl::range{s}, ...).wait();
93+
94+
sycl::free(mem, q);
95+
{% endhighlight %}
96+
97+
## Simultaneous USM pointers and buffers
98+
99+
If we want to have both USM pointers and buffers accessing the same allocation simultaneously, things get more complicated. In this scenario, it is crucial to understand that
100+
1. Buffers automatically calculate dependencies to other operations by detecting conflicting accessors. If operations use the same allocation but without going through accessors, buffers cannot know about these additional dependencies -- the programmer must insert them manually.
101+
2. Buffers automatically calculate necessary data transfers by tracking whether data is valid or outdated on a particular device. If data is modified through USM pointers without the buffer knowing of it, the internal data tracking of the buffer is off and no longer reflects reality. This can cause the buffer to emit data transfers that shouldn't take place, or omit data transfers when they might actually be required. To avoid this, we need to manually update the buffer's data tracking.
102+
103+
Here's an example that shows how it's done.
104+
{% highlight cpp %}
105+
106+
sycl::queue q;
107+
// Queue on a different device for later use
108+
sycl::device other_dev = ...;
109+
sycl::queue q2{other_dev};
110+
111+
std::size_t s = 1024;
112+
sycl::buffer<int> buff{sycl::range{s}};
113+
114+
// Extract USM pointer - at this point we are not yet guaranteed
115+
// that an allocation exists because memory is allocated lazily.
116+
// We can however force preallocation of memory using the hipSYCL
117+
// handler::update extension (Not yet in hipSYCL 0.9.1, but in
118+
// current develop branch on github).
119+
q.submit([&](sycl::handler& cgh){
120+
sycl::accessor acc{buff, cgh};
121+
cgh.update(acc);
122+
});
123+
// Also preallocate on another device for later use.
124+
q2.submit([&](sycl::handler& cgh){
125+
sycl::accessor acc{buff, cgh};
126+
cgh.update(acc);
127+
});
128+
q.wait(); q2.wait();
129+
130+
// Since memory has now been allocated by the buffer, we can now extract
131+
// an USM pointer.
132+
int* usm_ptr = buff.get_pointer(q.get_device());
133+
134+
// Submit a kernel operating on buff
135+
sycl::event evt = q.submit([&](sycl::handler& cgh){
136+
sycl::accessor acc{buff, cgh};
137+
cgh.parallel_for(sycl::range{s}, [=](sycl::id<1> idx){
138+
// Use acc here
139+
});
140+
});
141+
// Submit a USM kernel
142+
sycl::event evt2 = q.submit([&](sycl::handler& cgh){
143+
// Important: Add dependency to the other kernel!
144+
cgh.depends_on(evt);
145+
cgh.parallel_for(sycl::range{s}, [=](sycl::id<1> idx){
146+
// Use usm_ptr here
147+
});
148+
});
149+
{% endhighlight %}
150+
So far no surprises -- we just had to insert dependencies manually as expected. Let's now look at submitting work to a different device. When submitting USM operations to another device, we need to inform the buffer that there are writes taking place on that device, and that it should consider allocations on other devices as outdated after this point. We again use `handler::update()` for this.
151+
{% highlight cpp %}
152+
153+
// This is necessary to allow the buffer to infer necessary data transfers correctly.
154+
sycl::event evt3 = q2.submit([&](sycl::handler& cgh){
155+
// Depend on previous USM operation
156+
cgh.depends_on(evt2);
157+
// This is a read-write accessor - it's important that there's
158+
// a write in the access mode if we want to write to usm_ptr
159+
// in the next kernel.
160+
sycl::accessor acc{buff, cgh};
161+
cgh.update(acc);
162+
})
163+
int* usm_ptr2 = buff.get_pointer(q2.get_device());
164+
sycl::event evt4 = q2.submit([&](sycl::handler& cgh){
165+
cgh.depends_on(evt3);
166+
cgh.parallel_for(sycl::range{s}, [=](sycl::id<1> idx){
167+
// Use usm_ptr2 here
168+
});
169+
});
170+
// End with operation on first device
171+
q.submit([&](sycl::handler& cgh){
172+
// Buffer cannot know that USM kernel operates on same data,
173+
// so we need to manually insert a dependency.
174+
cgh.depends_on(evt4);
175+
// This accessor will trigger data migration back to
176+
// the first device because we are submitting to q
177+
// instead of q2
178+
sycl::accessor acc{buff, cgh};
179+
cgh.parallel_for(sycl::range{s}, [=](sycl::id<1> idx){
180+
// Use acc here
181+
});
182+
});
183+
{% endhighlight %}
184+
185+
In summary, even using buffers and USM pointers simultaneously for the same data is possible, but requires a solid understanding of SYCL and the guarantees that hipSYCL makes specifically.
186+
187+
Remember that buffers cannot know about USM kernels that utilize the same allocations, so always, always make sure to insert correct dependencies. Also, make sure to inform the buffer that an allocation has been *modified* so that it can correctly emit data transfers when an accessor is used for the buffer on a different device (including the host device). This can be done by constructing a accessor with a suitable access mode -- either by using `handler::update()`, or by submitting a kernel that uses accessors.
188+
189+
In practice, this might be much simpler. If you are not working with complex task graphs, you could just use a SYCL 2020 in-order queue to avoid having to insert all those dependencies manually. And if you are only working on a single device, your `handler::update()` calls might not be required anymore.
190+
191+
192+
## API reference
193+
194+
For the full API reference, see the [hipSYCL documentation](https://github.com/illuhad/hipSYCL/blob/develop/doc/buffer-usm-interop.md).
195+

0 commit comments

Comments
 (0)