ARROW-7879: [C++][Doc] Add doc for the Device API
Closes #6454 from pitrou/ARROW-7879-device-api-doc and squashes the following commits:

9c51a4c <Antoine Pitrou> Improve docstring for Buffer::View
7f7302d <Antoine Pitrou> Some rewrites
9d34ea3 <Antoine Pitrou> More changes
690ee03 <Antoine Pitrou> ARROW-7879:  Add doc for the Device API

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
pitrou committed Feb 25, 2020
1 parent ac5aa71 commit f3ac832
Showing 10 changed files with 168 additions and 30 deletions.
3 changes: 3 additions & 0 deletions cpp/src/arrow/buffer.cc
@@ -93,6 +93,9 @@ Result<std::shared_ptr<io::RandomAccessFile>> Buffer::GetReader(
}

Result<std::shared_ptr<io::OutputStream>> Buffer::GetWriter(std::shared_ptr<Buffer> buf) {
if (!buf->is_mutable()) {
return Status::Invalid("Expected mutable buffer");
}
return buf->memory_manager_->GetBufferWriter(buf);
}

38 changes: 35 additions & 3 deletions cpp/src/arrow/buffer.h
@@ -110,8 +110,6 @@ class ARROW_EXPORT Buffer {

uint8_t operator[](std::size_t i) const { return data_[i]; }

-bool is_mutable() const { return is_mutable_; }
-
/// \brief Construct a new std::string with a hexadecimal representation of the buffer.
/// \return std::string
std::string ToHexString();
@@ -254,20 +252,54 @@ class ARROW_EXPORT Buffer {
/// `data()` pointer. Otherwise, you'll have to `View()` or `Copy()` it.
bool is_cpu() const { return is_cpu_; }

/// \brief Whether the buffer is mutable
///
/// If this function returns true, you are allowed to modify buffer contents
/// using the pointer returned by `mutable_data()` or `mutable_address()`.
bool is_mutable() const { return is_mutable_; }

const std::shared_ptr<Device>& device() const { return memory_manager_->device(); }

const std::shared_ptr<MemoryManager>& memory_manager() const { return memory_manager_; }

std::shared_ptr<Buffer> parent() const { return parent_; }

-// Convenience functions
/// \brief Get a RandomAccessFile for reading a buffer
///
/// The returned file object reads from this buffer's underlying memory.
static Result<std::shared_ptr<io::RandomAccessFile>> GetReader(std::shared_ptr<Buffer>);

/// \brief Get an OutputStream for writing to a buffer
///
/// The buffer must be mutable. The returned stream object writes into the buffer's
/// underlying memory (but it won't resize it).
static Result<std::shared_ptr<io::OutputStream>> GetWriter(std::shared_ptr<Buffer>);

/// \brief Copy buffer
///
/// The buffer contents will be copied into a new buffer allocated by the
/// given MemoryManager. This function supports cross-device copies.
static Result<std::shared_ptr<Buffer>> Copy(std::shared_ptr<Buffer> source,
const std::shared_ptr<MemoryManager>& to);

/// \brief View buffer
///
/// Return a Buffer that reflects this buffer, seen potentially from another
/// device, without making an explicit copy of the contents. The underlying
/// mechanism is typically implemented by the kernel or device driver, and may
/// involve lazy caching of parts of the buffer contents on the destination
/// device's memory.
///
/// If a no-copy view is unsupported for the buffer on the given device,
/// nullptr is returned. An error can be returned if some low-level
/// operation fails (such as an out-of-memory condition).
static Result<std::shared_ptr<Buffer>> View(std::shared_ptr<Buffer> source,
const std::shared_ptr<MemoryManager>& to);

/// \brief View or copy buffer
///
/// Try to view buffer contents on the given MemoryManager's device, but
/// fall back to copying if a no-copy view isn't supported.
static Result<std::shared_ptr<Buffer>> ViewOrCopy(
std::shared_ptr<Buffer> source, const std::shared_ptr<MemoryManager>& to);

24 changes: 24 additions & 0 deletions cpp/src/arrow/buffer_test.cc
@@ -29,6 +29,7 @@
#include "arrow/buffer.h"
#include "arrow/buffer_builder.h"
#include "arrow/device.h"
#include "arrow/io/interfaces.h"
#include "arrow/memory_pool.h"
#include "arrow/status.h"
#include "arrow/testing/gtest_util.h"
@@ -512,6 +513,29 @@ TEST(TestBuffer, SliceMutableBuffer) {
ASSERT_TRUE(slice->Equals(expected));
}

TEST(TestBuffer, GetReader) {
const std::string data_str = "some data to read";
auto data = reinterpret_cast<const uint8_t*>(data_str.c_str());

auto buf = std::make_shared<Buffer>(data, data_str.size());
ASSERT_OK_AND_ASSIGN(auto reader, Buffer::GetReader(buf));
ASSERT_OK_AND_EQ(static_cast<int64_t>(data_str.size()), reader->GetSize());
ASSERT_OK_AND_ASSIGN(auto read_buf, reader->ReadAt(5, 4));
AssertBufferEqual(*read_buf, "data");
}

TEST(TestBuffer, GetWriter) {
std::shared_ptr<Buffer> buf;
ASSERT_OK(AllocateBuffer(9, &buf));
ASSERT_OK_AND_ASSIGN(auto writer, Buffer::GetWriter(buf));
ASSERT_OK(writer->Write(reinterpret_cast<const uint8_t*>("some data"), 9));
AssertBufferEqual(*buf, "some data");

// Non-mutable buffer
buf = std::make_shared<Buffer>(reinterpret_cast<const uint8_t*>("xxx"), 3);
ASSERT_RAISES(Invalid, Buffer::GetWriter(buf));
}

template <typename AllocateFunction>
void TestZeroSizeAllocateBuffer(MemoryPool* pool, AllocateFunction&& allocate_func) {
auto allocated_bytes = pool->bytes_allocated();
2 changes: 1 addition & 1 deletion cpp/src/arrow/device.cc
@@ -76,7 +76,7 @@ Result<std::shared_ptr<Buffer>> MemoryManager::CopyBuffer(

Result<std::shared_ptr<Buffer>> MemoryManager::ViewBuffer(
const std::shared_ptr<Buffer>& buf, const std::shared_ptr<MemoryManager>& to) {
-if (buf->is_cpu() && to->is_cpu()) {
if (buf->memory_manager() == to) {
return buf;
}
const auto& from = buf->memory_manager();
2 changes: 2 additions & 0 deletions cpp/src/arrow/device.h
@@ -107,6 +107,8 @@ class ARROW_EXPORT MemoryManager : public std::enable_shared_from_this<MemoryMan
/// \brief Create an OutputStream to write to a particular buffer.
///
/// The given buffer must be mutable and tied to this MemoryManager.
/// The returned stream object writes into the buffer's underlying memory
/// (but it won't resize it).
///
/// See also the Buffer::GetWriter shorthand.
virtual Result<std::shared_ptr<io::OutputStream>> GetBufferWriter(
26 changes: 16 additions & 10 deletions cpp/src/arrow/gpu/cuda_arrow_ipc.h
@@ -41,6 +41,10 @@ class Message;

namespace cuda {

/// \defgroup cuda-ipc-functions Functions for CUDA IPC
///
/// @{

/// \brief Write record batch message to GPU device memory
/// \param[in] batch record batch to write
/// \param[in] ctx CudaContext to allocate device memory from
@@ -49,6 +53,18 @@ ARROW_EXPORT
Result<std::shared_ptr<CudaBuffer>> SerializeRecordBatch(const RecordBatch& batch,
CudaContext* ctx);

/// \brief ReadRecordBatch specialized to handle metadata on CUDA device
/// \param[in] schema the Schema for the record batch
/// \param[in] buffer a CudaBuffer containing the complete IPC message
/// \param[in] pool a MemoryPool to use for allocating space for the metadata
/// \return RecordBatch or Status
ARROW_EXPORT
Result<std::shared_ptr<RecordBatch>> ReadRecordBatch(
const std::shared_ptr<Schema>& schema, const std::shared_ptr<CudaBuffer>& buffer,
MemoryPool* pool = default_memory_pool());

/// @}

/// \brief Write record batch message to GPU device memory
/// \param[in] batch record batch to write
/// \param[in] ctx CudaContext to allocate device memory from
@@ -71,16 +87,6 @@ ARROW_EXPORT
Status ReadMessage(CudaBufferReader* reader, MemoryPool* pool,
std::unique_ptr<ipc::Message>* message);

-/// \brief ReadRecordBatch specialized to handle metadata on CUDA device
-/// \param[in] schema the Schema for the record batch
-/// \param[in] buffer a CudaBuffer containing the complete IPC message
-/// \param[in] pool a MemoryPool to use for allocating space for the metadata
-/// \return RecordBatch or Status
-ARROW_EXPORT
-Result<std::shared_ptr<RecordBatch>> ReadRecordBatch(
-const std::shared_ptr<Schema>& schema, const std::shared_ptr<CudaBuffer>& buffer,
-MemoryPool* pool = default_memory_pool());
-
/// \brief ReadRecordBatch specialized to handle metadata on CUDA device
/// \param[in] schema the Schema for the record batch
/// \param[in] buffer a CudaBuffer containing the complete IPC message
2 changes: 1 addition & 1 deletion cpp/src/arrow/memory_pool.h
@@ -58,7 +58,7 @@ class MemoryPoolStats {

} // namespace internal

-/// Base class for memory allocation.
/// Base class for memory allocation on the CPU.
///
/// Besides tracking the number of allocated bytes, the allocator also should
/// take care of the required 64-byte alignment.
35 changes: 20 additions & 15 deletions docs/source/cpp/api/cuda.rst
@@ -19,8 +19,8 @@
CUDA support
============

-CUDA Contexts
-=============
Contexts
========

.. doxygenclass:: arrow::cuda::CudaDeviceManager
:project: arrow_cpp
@@ -30,22 +30,30 @@ CUDA Contexts
:project: arrow_cpp
:members:

-Device and Host Buffers
-=======================
Devices
=======

-.. doxygenclass:: arrow::cuda::CudaBuffer
.. doxygenclass:: arrow::cuda::CudaDevice
:project: arrow_cpp
:members:

.. doxygenclass:: arrow::cuda::CudaMemoryManager
:project: arrow_cpp
:members:

-.. doxygenfunction:: arrow::cuda::AllocateCudaHostBuffer
Buffers
=======

.. doxygenclass:: arrow::cuda::CudaBuffer
:project: arrow_cpp
:members:

.. doxygenclass:: arrow::cuda::CudaHostBuffer
:project: arrow_cpp
:members:

-Device Memory Input / Output
-============================
Memory Input / Output
=====================

.. doxygenclass:: arrow::cuda::CudaBufferReader
:project: arrow_cpp
@@ -55,15 +63,12 @@ Device Memory Input / Output
:project: arrow_cpp
:members:

-CUDA IPC
-========
IPC
===

.. doxygenclass:: arrow::cuda::CudaIpcMemHandle
:project: arrow_cpp
:members:

-.. doxygenfunction:: arrow::cuda::SerializeRecordBatch
-:project: arrow_cpp
-
-.. doxygenfunction:: arrow::cuda::ReadRecordBatch
-:project: arrow_cpp
.. doxygengroup:: cuda-ipc-functions
:content-only:
25 changes: 25 additions & 0 deletions docs/source/cpp/api/memory.rst
@@ -18,6 +18,31 @@
Memory (management)
===================

Devices
-------

.. doxygenclass:: arrow::Device
:project: arrow_cpp
:members:

.. doxygenclass:: arrow::CPUDevice
:project: arrow_cpp
:members:

.. doxygenfunction:: arrow::default_cpu_memory_manager
:project: arrow_cpp

Memory Managers
---------------

.. doxygenclass:: arrow::MemoryManager
:project: arrow_cpp
:members:

.. doxygenclass:: arrow::CPUMemoryManager
:project: arrow_cpp
:members:

Buffers
-------

41 changes: 41 additions & 0 deletions docs/source/cpp/memory.rst
@@ -125,3 +125,44 @@ you can do so using the :class:`arrow::stl::allocator` wrapper.
Conversely, you can also use a STL allocator to allocate Arrow memory,
using the :class:`arrow::stl::STLMemoryPool` class. However, this may be less
performant, as STL allocators don't provide a resizing operation.

Devices
=======

Many Arrow applications only access host (CPU) memory. However, in some cases
it is desirable to handle on-device memory (such as on-board memory on a GPU)
as well as host memory.

Arrow represents the CPU and other devices using the
:class:`arrow::Device` abstraction. The associated class :class:`arrow::MemoryManager`
specifies how to allocate memory on a given device. Each device has a default
memory manager, but additional instances may be constructed (for example,
wrapping a custom :class:`arrow::MemoryPool` on the CPU).

Device-Agnostic Programming
---------------------------

If you receive a Buffer from third-party code, you can query whether it is
CPU-readable by calling its :func:`~arrow::Buffer::is_cpu` method.

You can also view the Buffer on a given device, in a generic way, by calling
:func:`arrow::Buffer::View` or :func:`arrow::Buffer::ViewOrCopy`. This will
be a no-operation if the source and destination devices are identical.
Otherwise, a device-dependent mechanism will attempt to construct a memory
address for the destination device that gives access to the buffer contents.
Actual device-to-device transfer may happen lazily, when reading the buffer
contents.

Similarly, if you want to do I/O on a buffer without assuming a CPU-readable
buffer, you can call :func:`arrow::Buffer::GetReader` and
:func:`arrow::Buffer::GetWriter`.
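
As a rough illustration of the writer contract described by the docstrings in this commit (the buffer must be mutable, and writing never resizes it), here is a toy model in standalone C++. `ToyWritableBuffer`, `ToyWriter` and `ToyGetWriter` are hypothetical names invented for this sketch, not Arrow classes. Rejecting a write that would run past the end is a choice made here; the text does not say how Arrow handles that case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Arrow's classes.
struct ToyWritableBuffer {
  std::vector<uint8_t> bytes;
  bool is_mutable;
};

class ToyWriter {
 public:
  explicit ToyWriter(std::shared_ptr<ToyWritableBuffer> buf) : buf_(std::move(buf)) {}

  // Write at the current position; the buffer is never grown.
  // Returning false on overflow is this sketch's own assumption.
  bool Write(const void* data, std::size_t nbytes) {
    if (nbytes > buf_->bytes.size() - pos_) {
      return false;
    }
    std::memcpy(buf_->bytes.data() + pos_, data, nbytes);
    pos_ += nbytes;
    return true;
  }

 private:
  std::shared_ptr<ToyWritableBuffer> buf_;
  std::size_t pos_ = 0;
};

// Mirrors the mutability check: no writer is handed out for a read-only buffer.
std::unique_ptr<ToyWriter> ToyGetWriter(std::shared_ptr<ToyWritableBuffer> buf) {
  if (!buf->is_mutable) {
    return nullptr;
  }
  return std::unique_ptr<ToyWriter>(new ToyWriter(std::move(buf)));
}
```

In the real API the failure case is reported through `Status::Invalid` inside a `Result`, as the `buffer.cc` hunk shows; the null pointer above only stands in for that.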

For example, to get an on-CPU view or copy of an arbitrary buffer, you can
simply do::

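   std::shared_ptr<arrow::Buffer> arbitrary_buffer = ... ;
   // ViewOrCopy() returns a Result; this macro propagates any error to the caller
   ARROW_ASSIGN_OR_RAISE(
       std::shared_ptr<arrow::Buffer> cpu_buffer,
       arrow::Buffer::ViewOrCopy(arbitrary_buffer, arrow::default_cpu_memory_manager()));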
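
The relationship between viewing, copying and the view-or-copy fallback can be sketched with a small standalone model. This is only an illustration of the documented contract, not Arrow's implementation: `ToyManager`, `ToyBuffer`, `ToyView`, `ToyCopy` and `ToyViewOrCopy` are hypothetical names. The model assumes that no cross-device view mechanism exists, so a view succeeds only when the buffer already belongs to the target memory manager.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Arrow's classes.
struct ToyManager {
  int device_id;  // hypothetical device identifier
};

struct ToyBuffer {
  std::vector<uint8_t> bytes;
  std::shared_ptr<ToyManager> manager;
};

using BufferPtr = std::shared_ptr<ToyBuffer>;
using ManagerPtr = std::shared_ptr<ToyManager>;

// View: a no-op when the buffer already belongs to `to`; otherwise this toy
// model reports "no-copy view unsupported" by returning nullptr.
BufferPtr ToyView(const BufferPtr& source, const ManagerPtr& to) {
  if (source->manager == to) {
    return source;
  }
  return nullptr;
}

// Copy: always produce a new buffer owned by `to`.
BufferPtr ToyCopy(const BufferPtr& source, const ManagerPtr& to) {
  return std::make_shared<ToyBuffer>(ToyBuffer{source->bytes, to});
}

// ViewOrCopy: try the cheap view first, then fall back to a copy.
BufferPtr ToyViewOrCopy(const BufferPtr& source, const ManagerPtr& to) {
  BufferPtr viewed = ToyView(source, to);
  return viewed != nullptr ? viewed : ToyCopy(source, to);
}
```

The same-manager shortcut in `ToyView` corresponds to the `buf->memory_manager() == to` check in the `device.cc` hunk. A real device pair may support a true no-copy view, which this model leaves out.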
