[C++] Add device-specific synchronization API to Buffer #36103

pitrou · 2023-06-15T15:59:21Z

Describe the enhancement requested

For the C device data interface, and other applications, we'll need to add optional synchronization information to buffers.

Here is, for example, a possible API. It tries to avoid or minimize additional footprint, especially for the CPU case:

// in device.h
class DeviceSyncEvent {
 public:
  virtual ~DeviceSyncEvent() = default;
  /// \brief Block until synchronization event is ready
  virtual Status Wait() = 0;
};

// in buffer.h
class Buffer {
 public:
  ...
  /// \brief Device-specific synchronization event
  ///
  /// If nullptr is returned (which is always the case for CPU buffers),
  /// no synchronization is required.
  virtual DeviceSyncEvent* sync_event() { return nullptr; }
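  // XXX or perhaps (alternative signature with shared ownership; only one
  // of the two declarations can exist, since they differ only in return type):
  //   virtual std::shared_ptr<DeviceSyncEvent> sync_event() { return nullptr; }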
  ...
};

// in arrow/gpu
class CudaSyncEvent : public DeviceSyncEvent {
 public:
  void* cuda_event();  // actually a CUevent
  Status Wait() override;
};

class CudaBuffer : public Buffer {
 public:
  DeviceSyncEvent* sync_event() override { return &sync_event_; }
  ...
 protected:
  CudaSyncEvent sync_event_;
  ...
};

Or, as an alternative, define a DeviceSyncStream instead of a DeviceSyncEvent.

TODO: define lifetime semantics.
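
A minimal sketch of how a consumer might use the proposed `sync_event()` accessor, assuming the raw-pointer variant above. `Status` is a simplified stand-in for `arrow::Status`, and `FakeDeviceEvent`, `FakeDeviceBuffer` and `EnsureReady` are hypothetical names used only for illustration; none of this is the eventual Arrow API.

```cpp
#include <cassert>

// Simplified stand-in for arrow::Status (assumption, for self-containment).
struct Status {
  bool ok_ = true;
  static Status OK() { return Status{}; }
  bool ok() const { return ok_; }
};

// Interface as proposed in the issue.
class DeviceSyncEvent {
 public:
  virtual ~DeviceSyncEvent() = default;
  virtual Status Wait() = 0;
};

// Reduced Buffer: only the proposed accessor is modeled here.
class Buffer {
 public:
  virtual ~Buffer() = default;
  // nullptr means "no synchronization required" (always so for CPU buffers).
  virtual DeviceSyncEvent* sync_event() { return nullptr; }
};

// Hypothetical device event that only records whether Wait() was called.
class FakeDeviceEvent : public DeviceSyncEvent {
 public:
  Status Wait() override {
    waited_ = true;
    return Status::OK();
  }
  bool waited() const { return waited_; }

 private:
  bool waited_ = false;
};

// Hypothetical device buffer owning its event, like CudaBuffer in the proposal.
class FakeDeviceBuffer : public Buffer {
 public:
  DeviceSyncEvent* sync_event() override { return &event_; }
  const FakeDeviceEvent& event() const { return event_; }

 private:
  FakeDeviceEvent event_;
};

// Hypothetical helper: block on the buffer's event only if it has one.
Status EnsureReady(Buffer& buffer) {
  if (DeviceSyncEvent* ev = buffer.sync_event()) {
    return ev->Wait();
  }
  return Status::OK();
}
```

With this shape, CPU buffers pay nothing beyond a virtual call that returns nullptr, while device buffers are waited on before their contents are read.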

Component(s)

C++

pitrou · 2023-06-15T15:59:32Z

@kkraus14 @zeroshade

zeroshade · 2023-06-15T16:16:30Z

@pitrou Is the idea that Wait() actually has the CPU wait for synchronization? Or that it puts a wait on the GPU stream?

pitrou · 2023-06-15T16:17:46Z

Wait is a CPU wait call for convenience. You don't have to use it; you can instead use the CUDA event directly. Perhaps it's not a very useful method, in which case it can be removed.

pitrou · 2023-06-15T16:20:29Z

Also cc @felipecrv, if you know a bit about GPU APIs

zeroshade · 2023-06-15T16:38:59Z

We should probably put a void* raw_event() in the base interface, while the derived impls would have properly typed methods, e.g. CudaSyncEvent::cuda_event would return a cudaEvent_t* and CudaSyncEvent::raw_event would return a void*. But I guess nitpicks like this can be discussed on the eventual PR rather than hashed out here. I think we can agree that this is definitely needed.
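
A rough sketch of that split, assuming an untyped accessor on the base class and a typed one on the derived class. To stay self-contained it uses a made-up `FakeVendorEvent` handle rather than any real CUDA type, and `FakeVendorSyncEvent` and `vendor_event()` are hypothetical names.

```cpp
#include <cassert>

// Stand-in for an opaque vendor handle type (assumption; not a CUDA type).
struct FakeVendorEvent {
  int id;
};

class DeviceSyncEvent {
 public:
  virtual ~DeviceSyncEvent() = default;
  /// Untyped handle, usable by generic code that knows the device type.
  virtual void* raw_event() = 0;
};

// Hypothetical derived event exposing both the untyped and the typed handle.
class FakeVendorSyncEvent : public DeviceSyncEvent {
 public:
  explicit FakeVendorSyncEvent(int id) : event_{id} {}

  void* raw_event() override { return &event_; }
  /// Typed accessor for callers that already depend on the vendor API.
  FakeVendorEvent* vendor_event() { return &event_; }

 private:
  FakeVendorEvent event_;
};
```

Both accessors return the same underlying handle; only the typed one would require the vendor's headers to be visible to the caller.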

pitrou · 2023-06-15T16:45:08Z

AFAICT, we don't expose cuda.h in our own headers. We might want to change this if really necessary, otherwise I think it's better to keep this policy.

pitrou · 2023-06-15T16:45:53Z

In any case, the right lifetime semantics will have to be decided, which may entail putting a shared_ptr somewhere in the API.

kkraus14 · 2023-06-15T17:03:16Z

I'm not sure if an event is the right thing for Buffers. In addition to being able to synchronize, device buffers are often managed in a stream-ordered fashion, i.e. allocated and, more importantly, freed asynchronously from a CPU perspective, with the ordering being handled via CUDA streams.

For example, RMM (libcudf's memory manager) has a couple of stream APIs associated with its device_buffer class, as well as a private stream member which gets used in its destructor for deallocating. https://github.com/rapidsai/rmm/blob/0c08dd585031f58e2a9dcfbba5608cce10c423b2/include/rmm/device_buffer.hpp#L374-L400
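
To illustrate what stream-ordered deallocation means, here is a toy sketch that involves no GPU at all. `FakeStream` is a plain FIFO queue standing in for a device stream, and `StreamOrderedBuffer` is a hypothetical class; neither reflects RMM's or Arrow's actual code, and only the free (not the allocation) is modeled as stream-ordered.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a device stream: tasks run in FIFO order when
// Synchronize() is called, not when they are enqueued.
class FakeStream {
 public:
  void Enqueue(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void Synchronize() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

  std::size_t pending() const { return tasks_.size(); }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical buffer that remembers its stream and frees itself on it.
class StreamOrderedBuffer {
 public:
  StreamOrderedBuffer(FakeStream* stream, std::size_t size, std::size_t* live_bytes)
      : stream_(stream), size_(size), live_bytes_(live_bytes) {
    *live_bytes_ += size_;  // the allocation is immediate in this toy model
  }

  ~StreamOrderedBuffer() {
    // Copy what the task needs: this object is gone by the time it runs.
    std::size_t* live = live_bytes_;
    std::size_t size = size_;
    stream_->Enqueue([live, size] { *live -= size; });
  }

  StreamOrderedBuffer(const StreamOrderedBuffer&) = delete;
  StreamOrderedBuffer& operator=(const StreamOrderedBuffer&) = delete;

  FakeStream* stream() const { return stream_; }

 private:
  FakeStream* stream_;
  std::size_t size_;
  std::size_t* live_bytes_;
};
```

Destroying the buffer only enqueues the free; the memory counts as released once the stream reaches that task. This is why a buffer in such a scheme needs to carry a stream rather than just a one-shot event.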

pitrou · 2023-06-15T17:08:15Z

Ok, so instead of exposing an event-like API, we could expose a stream-like API.

### Rationale for this change

Building on the `ArrowDeviceArray` we need to expand the abstractions for handling events and stream synchronization for devices.

### What changes are included in this PR?

Initial Abstract implementations for the new DeviceSync API and a CPU implementation. This will be followed up by a CUDA implementation in a subsequent PR.

### Are these changes tested?

Yes, tests are added for Import/Export DeviceArrays using the DeviceSync handling.

* Closes: #36103

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>

pitrou added the Type: enhancement label Jun 15, 2023

github-actions bot added the Component: C++ label Jun 15, 2023

pitrou added the Component: GPU label Jun 15, 2023

zeroshade mentioned this issue Jul 12, 2023

GH-36488: [C++] Import/Export ArrowDeviceArray #36489

Merged

zeroshade added a commit to zeroshade/arrow that referenced this issue Aug 7, 2023

apacheGH-36103: [C++] Initial device-specific synchronization API

84b533c

github-actions bot mentioned this issue Aug 7, 2023

GH-36103: [C++] Initial device sync API #37040

Merged

github-actions bot assigned zeroshade Aug 7, 2023

zeroshade added a commit to zeroshade/arrow that referenced this issue Aug 7, 2023

apacheGH-36103: [C++] Initial device-specific synchronization API

22c0c98

zeroshade closed this as completed in #37040 Aug 22, 2023

zeroshade added this to the 14.0.0 milestone Aug 22, 2023
