Description
This is a question/idea concerning the xtensor/python wrapping.
I was wondering whether the way I'm currently interfacing xt::xarray with numpy arrays in Python is valid, and not an abomination. I'm writing down the ideas here, but I've also attached a self-contained, fully commented "example" project to this issue.
What this wrapping method allowed me to do:
- Wrap xarray and xstrided_view;
- Allow propagation of changes from Python to the C++ xtensor arrays;
- Create numpy views of the xtensor-wrapped arrays, and other similar things.
Full Example: pytensor.zip
C++ to Python
First I had to take care of transforming xtensor strides to numpy strides:
template <typename T>
inline typename xt::xarray<T>::strides_type strides_to_pystrides(typename xt::xarray<T>::strides_type strides)
{
    // numpy expects strides in bytes, while xtensor stores them in elements.
    for (auto &stride : strides)
        stride *= sizeof(T);
    return strides;
}
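A quick standalone sanity check (my own sketch, not part of the attached example, assuming the helper above is in scope): for a row-major 2x3 array of double, xtensor stores element strides {3, 1}, which become numpy byte strides {24, 8}:
#include <cassert>
#include <xtensor/xarray.hpp>

int main()
{
    xt::xarray<double> a = {{1., 2., 3.}, {4., 5., 6.}};
    // a.strides() == {3, 1}; after conversion we get byte strides.
    auto byte_strides = strides_to_pystrides<double>(a.strides());
    assert(byte_strides[0] == 3 * sizeof(double)); // 24
    assert(byte_strides[1] == 1 * sizeof(double)); // 8
}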
Once I had a way of adapting the strides, I transformed the xarray into a pybind11::array_t, as follows:
template <typename T>
inline pybind11::array_t<T> array_to_pyarray(xt::xarray<T> &src)
{
    // The memoryview aliases src's buffer; the implicit conversion to
    // array_t keeps sharing the memory, so src must outlive the Python object.
    return pybind11::memoryview::from_buffer(
        src.data(),                             // buffer
        src.shape(),                            // shape
        strides_to_pystrides<T>(src.strides()), // strides
        false                                   // readonly
    );
}
I've tested this, and changes made in Python are correctly registered in C++. The only thing I know I must avoid is overwriting the array with itself in Python; otherwise it will be a mess (more on this inside the attached full example).
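For context, here is a minimal sketch of how this can be used (module and function names here are mine, not from the attached example):
static xt::xarray<double> data = {{1., 2.}, {3., 4.}};

PYBIND11_MODULE(example, m)
{
    // The returned object aliases data's buffer: writes from Python are
    // visible in C++, as long as data outlives the Python-side object.
    m.def("get_data", []() { return array_to_pyarray<double>(data); });
}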
At this point I thought: maybe I can do the same with views? So I wrote the following function:
template <typename T>
inline pybind11::array_t<T> view_to_pyarray(xt::xstrided_view<xt::xarray<T> &, xt::xindex> &view)
{
    // Return a memory view, allowing changes from Python.
    // data() points at the start of the underlying array, so the view's
    // first element lives at data() + data_offset().
    return pybind11::memoryview::from_buffer(
        view.data() + view.data_offset(),       // buffer
        view.shape(),                           // shape
        strides_to_pystrides<T>(view.strides()) // strides (readonly defaults to false)
    );
}
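To see why the offset matters, here is a small standalone check (my own sketch, using xt::strided_view):
xt::xarray<double> a = {{9., 8.}, {7., 6.}};
auto v = xt::strided_view(a, {1, xt::all()}); // view of the second row
// v.data() aliases a's first element (9.0), while v.data_offset() == 2,
// so v.data() + v.data_offset() points at the view's first element (7.0).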
I mean, that's all from this side.
Python to C++
The interfacing in the other direction is done in a similar fashion.
I needed some "adaptation" functions.
First, for the shape:
template <typename T>
inline typename xt::xarray<T>::shape_type pyinfo_to_shape(const pybind11::buffer_info &info)
{
    // Copy the numpy shape into an xtensor shape container.
    size_t ndim = static_cast<size_t>(info.ndim);
    typename xt::xarray<T>::shape_type shape(ndim);
    for (size_t i = 0; i < ndim; ++i)
        shape[i] = info.shape[i];
    return shape;
}
Second, for the strides:
template <typename T>
inline typename xt::xarray<T>::strides_type pyinfo_to_strides(const pybind11::buffer_info &info)
{
    // numpy reports strides in bytes; xtensor wants them in elements.
    typename xt::xarray<T>::strides_type strides;
    for (size_t i = 0; i < info.strides.size(); ++i)
        strides.push_back(info.strides[i] / info.itemsize);
    return strides;
}
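Together these invert the C++-to-Python direction. A small sketch of what they produce (describe is a hypothetical helper of mine, not from the attached example):
template <typename T>
void describe(const pybind11::array_t<T> &src)
{
    pybind11::buffer_info info = src.request();
    auto shape   = pyinfo_to_shape<T>(info);   // e.g. {2, 3} for a 2x3 array
    auto strides = pyinfo_to_strides<T>(info); // byte strides {24, 8} -> element strides {3, 1}
    // (values shown for a C-contiguous 2x3 double array)
}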
Once I had both of those adaptation functions, I could easily modify the xt::xarray:
template <typename T>
inline xt::xarray<T> &pyarray_to_array(const pybind11::array_t<T> &src, xt::xarray<T> &dst)
{
    // Get the info regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transform the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);
    // Re-initialize the xarray if the shape is different.
    if (dst.shape() != shape)
        dst = xt::xarray<T>(shape);
    // Get the pointers to the buffers (flat copy: assumes src is C-contiguous).
    T *dst_ptr = dst.data();
    const T *src_ptr = static_cast<const T *>(info.ptr);
    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}
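One caveat: the flat copy assumes the incoming numpy buffer is C-contiguous; a sliced or transposed array would be copied incorrectly. A hardening of my own (not in the attached example) is to ask pybind11 for a contiguous, correctly-typed buffer up front, so non-conforming inputs get copied/converted before they reach the function:
template <typename T>
using c_array_t = pybind11::array_t<T, pybind11::array::c_style | pybind11::array::forcecast>;
// Using c_array_t<T> instead of pybind11::array_t<T> in the signatures above
// guarantees that info.ptr points at a dense row-major buffer.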
Then I did the same for the views:
template <typename T>
inline xt::xstrided_view<xt::xarray<T> &, xt::xindex> &pyarray_to_view(
    const pybind11::array_t<T> &src,
    xt::xstrided_view<xt::xarray<T> &, xt::xindex> &dst)
{
    // Get the info regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transform the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);
    if (shape != dst.shape())
        throw std::runtime_error("Shapes must match");
    if (size != dst.size())
        throw std::runtime_error("Sizes must match");
    // Get the pointers to the buffers (flat copy: valid only when the view is
    // contiguous in memory, e.g. a row of a row-major array).
    T *dst_ptr = dst.data() + dst.data_offset();
    const T *src_ptr = static_cast<const T *>(info.ptr);
    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}
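For non-contiguous views (e.g. a column), a stride-aware variant of mine could copy through the view's iterator instead, which traverses in row-major order and honors the strides (a sketch, assuming xtensor's view iterators; copy_into_view is a hypothetical name):
template <typename T>
inline void copy_into_view(const pybind11::array_t<T> &src,
                           xt::xstrided_view<xt::xarray<T> &, xt::xindex> &dst)
{
    pybind11::buffer_info info = src.request();
    const T *src_ptr = static_cast<const T *>(info.ptr);
    // The same shape/size checks as above would apply here.
    // Iterating dst honors its strides (requires <algorithm> for std::copy).
    std::copy(src_ptr, src_ptr + dst.size(), dst.begin());
}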
Error To Avoid
Let's take an excerpt from the example attached to this issue. Here is the C++ registration:
/// @brief Just an example of a dataset containing xtensor arrays and views.
class DataSet {
public:
    /// A set of arrays.
    xt::xarray<double> x1;

    DataSet() : x1{{9, 8}, {7, 6}, {5, 4}, {3, 2}} {
        // Nothing to do.
    }
};

PYBIND11_MODULE(pytensor, m) {
    pybind11::class_<DataSet>(m, "DataSet")
        .def(pybind11::init<>())
        .def_property("x1",
            [](DataSet &self) { return array_to_pyarray(self.x1); },
            [](DataSet &self, const pybind11::array_t<double> &m) { pyarray_to_array(m, self.x1); });
}
In Python I need to avoid overwriting the array with itself:
from pytensor import DataSet
# Create the dataset.
ds = DataSet()
# Reshape x1 and override it.
print("ds.x1 :\n", ds.x1, "\n")
ds.x1 = ds.x1.reshape(2, 2, 2)
print("ds.x1 :\n", ds.x1, "\n")
This will output:
ds.x1 :
[[9. 8.]
[7. 6.]
[5. 4.]
[3. 2.]]
ds.x1 :
[[[0.000000e+000 9.381848e-317]
[7.000000e+000 6.000000e+000]]
[[5.000000e+000 4.000000e+000]
[3.000000e+000 2.000000e+000]]]
So, except for this, the binding behaves nicely. (What presumably happens above: ds.x1.reshape(2, 2, 2) returns a numpy view that still aliases x1's old buffer; since the shape changed, the setter reallocates x1, and the copy then reads from the just-freed buffer, hence the garbage values.) I can also provide a solution to this problem:
from copy import copy
...
ds.x1 = copy(ds.x1.reshape(2, 2, 2))
Meh... this requires a copy, which is not that nice, but it works.
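Alternatively, the copy could live on the C++ side: here is a sketch of a self-assignment-safe setter of my own (pyarray_to_array_safe is a hypothetical name, not from the attached example) that buffers the incoming data before touching dst:
template <typename T>
inline xt::xarray<T> &pyarray_to_array_safe(const pybind11::array_t<T> &src, xt::xarray<T> &dst)
{
    pybind11::buffer_info info = src.request();
    auto shape = pyinfo_to_shape<T>(info);
    const T *src_ptr = static_cast<const T *>(info.ptr);
    // Copy into a temporary first (assuming, as above, a C-contiguous source),
    // so it no longer matters whether src aliases dst's current storage.
    xt::xarray<T> tmp(shape);
    std::copy(src_ptr, src_ptr + static_cast<size_t>(info.size), tmp.data());
    dst = std::move(tmp);
    return dst;
}
It still costs a copy, like copy() on the Python side, but the user no longer has to remember it.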
Conclusion
I'm no expert; I started working with xtensor a couple of months ago. I needed to interface something that was already using xt::xtensor to Python, without changing everything to xt::pyarray. I'm lazy, and I also didn't know how to make all the strided_view instances inside the code work with xt::pyarray.
I might be doing something nasty with this code, but I don't know all the meticulous details of how xtensor and Python actually handle the memory. I only know that both numpy and xtensor use row-major arrays by default, and I'm hoping that's enough.
Thank you very much for reading through my explanation.
Cheers,
Enrico