Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-33984: [C++][Python] DLPack implementation for Arrow Arrays (producer) #38472

Merged
merged 73 commits into from
Dec 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
73 commits
Select commit Hold shift + click to select a range
f8f430f
Initial commit - add _dlpack.pxi
AlenkaF Oct 17, 2023
cf143cd
Initial test
AlenkaF Oct 17, 2023
d43367f
Add to test
AlenkaF Oct 18, 2023
5778a33
New update
AlenkaF Oct 26, 2023
1f0e100
Correct data pointer cast and add numpy test
AlenkaF Oct 26, 2023
e23249e
Add buffer check and tests
AlenkaF Oct 26, 2023
3d8c581
Fix linter errors and try to update code for Cython 3
AlenkaF Oct 26, 2023
82a270f
Add implementation example for fixed shape tensor before moving the c…
AlenkaF Nov 8, 2023
00ac266
Fix linter error
AlenkaF Nov 8, 2023
aeb20a6
Initial move to the Arrow C++
AlenkaF Nov 15, 2023
fadf1f9
Cython lint fix
AlenkaF Nov 16, 2023
72ffcf3
Run linter on dlpack header file
AlenkaF Nov 16, 2023
c1ec84e
Update pycapsule_deleter
AlenkaF Nov 16, 2023
eabc58f
Add dlpack.h to dlpack.cc
AlenkaF Nov 16, 2023
f834d27
Add checks for zero length array and unsupported data types
AlenkaF Nov 21, 2023
3556b07
Update toDLPack signature and remove FixedShapeTensorArray implementa…
AlenkaF Nov 22, 2023
57ceee0
Rename toDLPack to ExportToDLPack
AlenkaF Nov 22, 2023
78363df
Update getDLDataType
AlenkaF Nov 22, 2023
22739e8
Move _dlpack.pxi include to lib.pyx
AlenkaF Nov 22, 2023
1bcf161
Move dlpack files to arrow/c and add dlpack namespace
AlenkaF Nov 23, 2023
c8d8799
Update test parametrisation and numpy assert
AlenkaF Nov 23, 2023
2a5bf42
Update C++ method and expand the docs note
AlenkaF Nov 27, 2023
21b95d8
Rename dlpack_structure.h to dlpack_abi.h and add commit info
AlenkaF Nov 27, 2023
1762544
Empty line
AlenkaF Nov 27, 2023
fedd464
Add comments to pycapsule_deleter
AlenkaF Nov 27, 2023
01997a6
Update includes
AlenkaF Nov 27, 2023
f8dbb0b
Skip test on numpy < 1.24.0
AlenkaF Nov 27, 2023
c6ee1bb
Skip the tests for real
AlenkaF Nov 27, 2023
bcd05ea
Remove leftover from the CMakeLists
AlenkaF Nov 28, 2023
6626a91
Add a CPU device check and pyarrow test for cuda (trial)
AlenkaF Nov 28, 2023
d169d4c
Handle offsets
AlenkaF Nov 28, 2023
11d48af
Linter
AlenkaF Nov 28, 2023
5742e1d
Relax null_bitmap check to null_count check
AlenkaF Nov 28, 2023
024f535
Update python tests
AlenkaF Nov 29, 2023
e6d927c
Update dlpack.cc
AlenkaF Nov 29, 2023
da9baf2
Update libarrow.pxd
AlenkaF Nov 29, 2023
5e2bb80
Fix typo in cuda test
AlenkaF Nov 29, 2023
3af44e1
Add C++ tests
AlenkaF Nov 29, 2023
1f81277
Update C++ tests
AlenkaF Nov 30, 2023
cb0a942
Linter
AlenkaF Nov 30, 2023
46206ec
Remove redundant code in libarrow.pxd
AlenkaF Nov 30, 2023
0619c35
Add __dlpack_device__ implementation
AlenkaF Nov 30, 2023
55246ea
Update __dlpack_device__ - ExportDevice
AlenkaF Nov 30, 2023
0fa84de
Add documentation on the python side
AlenkaF Nov 30, 2023
010f28e
Update C++ tests to use vector instead of function for expected data
AlenkaF Nov 30, 2023
53f2867
Add a deleter to the test
AlenkaF Dec 1, 2023
f9fbf2c
Include suggested changes from Joris
AlenkaF Dec 5, 2023
ab88549
Use C++17 nested namespace declarations
AlenkaF Dec 5, 2023
c832edd
Return Result instead of Status
AlenkaF Dec 5, 2023
0089e23
Update GetDLDataType
AlenkaF Dec 5, 2023
804878f
Fix Python spelling in C++
AlenkaF Dec 5, 2023
fea6fe3
Use ExportDevice to define the device struct in ExportArray
AlenkaF Dec 6, 2023
8071c9b
Fix tensor shape
AlenkaF Dec 6, 2023
9f21208
Linter
AlenkaF Dec 6, 2023
8a10e68
Move the methods from _dlpack.pxi to array.pxi and fix a typo in Pyth…
AlenkaF Dec 6, 2023
672043b
Fix capsule docstring and a typo in tests
AlenkaF Dec 6, 2023
ac85f4e
Add test helper function
AlenkaF Dec 6, 2023
7e0d5f8
Move tests to separate test_dlpack.py
AlenkaF Dec 6, 2023
0fc962c
Apply suggestions from code review
AlenkaF Dec 6, 2023
0d81ae0
Restructure DLMTensorCtx to ManagerCtx and use unique_ptr
AlenkaF Dec 6, 2023
d232de4
Rename pycapsule_deleter to dlpack_pycapsule_deleter and reorganize t…
AlenkaF Dec 7, 2023
602e7b5
Add link to dlpack GitHub repo
AlenkaF Dec 7, 2023
d9b3182
Add check to ExportDevice and restructure ExportArray
AlenkaF Dec 7, 2023
015ac3d
Remove old code
AlenkaF Dec 11, 2023
d2eb7c8
Add dlpack_abi.h to rat_exclude_files.txt.
AlenkaF Dec 11, 2023
15ead8f
dlpack.cc - Add unnamed namespace and optimize code in ExportDevice
AlenkaF Dec 12, 2023
5e87138
dlpack_test.cc - remove gen, add a check for memory deallocation and …
AlenkaF Dec 12, 2023
2160ecd
_dlpack.pxi - remove unsued imports
AlenkaF Dec 12, 2023
8bb7173
array.pxi - simplify the code and change docstrings
AlenkaF Dec 12, 2023
9389941
test_dlpack.py - add a check for memory deallocation
AlenkaF Dec 12, 2023
94dec4b
Change docstrings in __dlpack_device__
AlenkaF Dec 12, 2023
811d2b5
Add null type empty array to C++ tests
AlenkaF Dec 13, 2023
49a978f
Nits and cleanups
pitrou Dec 19, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions cpp/src/arrow/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -192,6 +192,7 @@ set(ARROW_SRCS
type_traits.cc
visitor.cc
c/bridge.cc
c/dlpack.cc
io/buffered.cc
io/caching.cc
io/compressed.cc
Expand Down
1 change: 1 addition & 0 deletions cpp/src/arrow/c/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
# under the License.

add_arrow_test(bridge_test PREFIX "arrow-c")
add_arrow_test(dlpack_test)

add_arrow_benchmark(bridge_benchmark)

Expand Down
133 changes: 133 additions & 0 deletions cpp/src/arrow/c/dlpack.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include "arrow/c/dlpack.h"

#include "arrow/array/array_base.h"
#include "arrow/c/dlpack_abi.h"
#include "arrow/device.h"
#include "arrow/type.h"
#include "arrow/type_traits.h"

namespace arrow::dlpack {

namespace {

Result<DLDataType> GetDLDataType(const DataType& type) {
AlenkaF marked this conversation as resolved.
Show resolved Hide resolved
DLDataType dtype;
dtype.lanes = 1;
dtype.bits = type.bit_width();
switch (type.id()) {
case Type::INT8:
case Type::INT16:
case Type::INT32:
case Type::INT64:
dtype.code = DLDataTypeCode::kDLInt;
return dtype;
case Type::UINT8:
case Type::UINT16:
case Type::UINT32:
case Type::UINT64:
dtype.code = DLDataTypeCode::kDLUInt;
return dtype;
case Type::HALF_FLOAT:
case Type::FLOAT:
case Type::DOUBLE:
dtype.code = DLDataTypeCode::kDLFloat;
return dtype;
case Type::BOOL:
// DLPack supports byte-packed boolean values
return Status::TypeError("Bit-packed boolean data type not supported by DLPack.");
default:
return Status::TypeError("DataType is not compatible with DLPack spec: ",
type.ToString());
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to later support fixed_shape_tensor extension arrays?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to later support fixed_shape_tensor extension arrays?

See #38868 for that


struct ManagerCtx {
std::shared_ptr<ArrayData> array;
DLManagedTensor tensor;
};

} // namespace

Result<DLManagedTensor*> ExportArray(const std::shared_ptr<Array>& arr) {
// Define DLDevice struct nad check if array type is supported
// by the DLPack protocol at the same time. Raise TypeError if not.
// Supported data types: int, uint, float with no validity buffer.
ARROW_ASSIGN_OR_RAISE(auto device, ExportDevice(arr))

// Define the DLDataType struct
const DataType& type = *arr->type();
std::shared_ptr<ArrayData> data = arr->data();
ARROW_ASSIGN_OR_RAISE(auto dlpack_type, GetDLDataType(type));

// Create ManagerCtx that will serve as the owner of the DLManagedTensor
std::unique_ptr<ManagerCtx> ctx(new ManagerCtx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I think it's best practice to use std::make_unique instead of wrapping new call. Any reason we can't use make_unique here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change to make_unique. @pitrou any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have the explicit delete as well, I personally like seeing the new+delete combo explicitly, even though make_unique is exactly equivalent AFAIU

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless that delete call can then be replaced with a unique_ptr reset() ?


// Define the data pointer to the DLTensor
// If array is of length 0, data pointer should be NULL
pitrou marked this conversation as resolved.
Show resolved Hide resolved
if (arr->length() == 0) {
ctx->tensor.dl_tensor.data = NULL;
} else {
const auto data_offset = data->offset * type.byte_width();
ctx->tensor.dl_tensor.data =
const_cast<uint8_t*>(data->buffers[1]->data() + data_offset);
}

ctx->tensor.dl_tensor.device = device;
ctx->tensor.dl_tensor.ndim = 1;
ctx->tensor.dl_tensor.dtype = dlpack_type;
ctx->tensor.dl_tensor.shape = const_cast<int64_t*>(&data->length);
ctx->tensor.dl_tensor.strides = NULL;
ctx->tensor.dl_tensor.byte_offset = 0;

ctx->array = std::move(data);
ctx->tensor.manager_ctx = ctx.get();
ctx->tensor.deleter = [](struct DLManagedTensor* self) {
delete reinterpret_cast<ManagerCtx*>(self->manager_ctx);
};
return &ctx.release()->tensor;
}

Result<DLDevice> ExportDevice(const std::shared_ptr<Array>& arr) {
// Check if array is supported by the DLPack protocol.
if (arr->null_count() > 0) {
return Status::TypeError("Can only use DLPack on arrays with no nulls.");
}
const DataType& type = *arr->type();
if (type.id() == Type::BOOL) {
return Status::TypeError("Bit-packed boolean data type not supported by DLPack.");
}
if (!is_integer(type.id()) && !is_floating(type.id())) {
return Status::TypeError("DataType is not compatible with DLPack spec: ",
type.ToString());
}

// Define DLDevice struct
DLDevice device;
if (arr->data()->buffers[1]->device_type() == DeviceAllocationType::kCPU) {
AlenkaF marked this conversation as resolved.
Show resolved Hide resolved
AlenkaF marked this conversation as resolved.
Show resolved Hide resolved
device.device_id = 0;
device.device_type = DLDeviceType::kDLCPU;
return device;
} else {
return Status::NotImplemented(
"DLPack support is implemented only for buffers on CPU device.");
}
}

} // namespace arrow::dlpack
51 changes: 51 additions & 0 deletions cpp/src/arrow/c/dlpack.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#pragma once

#include "arrow/array/array_base.h"
#include "arrow/c/dlpack_abi.h"

namespace arrow::dlpack {

/// \brief Export Arrow array as DLPack tensor.
///
/// DLMangedTensor is produced as defined by the DLPack protocol,
/// see https://dmlc.github.io/dlpack/latest/.
///
/// Data types for which the protocol is supported are
/// integer and floating-point data types.
///
/// DLPack protocol only supports arrays with one contiguous
/// memory region which means Arrow Arrays with validity buffers
/// are not supported.
///
/// \param[in] arr Arrow array
/// \return DLManagedTensor struct
ARROW_EXPORT
Result<DLManagedTensor*> ExportArray(const std::shared_ptr<Array>& arr);

/// \brief Get DLDevice with enumerator specifying the
/// type of the device data is stored on and index of the
/// device which is 0 by default for CPU.
///
/// \param[in] arr Arrow array
/// \return DLDevice struct
ARROW_EXPORT
Result<DLDevice> ExportDevice(const std::shared_ptr<Array>& arr);

} // namespace arrow::dlpack