Skip to content

Is this a valid way of interfacing xtensor with python? #254

@Galfurian

Description

@Galfurian

This is a question/idea concerning the xtensor/python wrapping.

I was wondering if the following way I'm currently interfacing xt::xarray to numpy arrays in python is somehow valid and not an abomination. I'm writing down here the ideas, but I also wrote a self-contained and completely commented "example" project attached to this issue.

What this wrapping method allowed me to do:

  • Wrap xarray and xstrided_view;
  • Allow propagation of changes from python to C++ xtensor arrays;
  • Creating numpy views of the xtensor-wrapped arrays, and other similar things.

Full Example: pytensor.zip

C++ to Python

First I had to take care of transforming xtensor strides to numpy strides:

template <typename T>
inline typename xt::xarray<T>::strides_type strides_to_pystrides(typename xt::xarray<T>::strides_type strides)
{
    for (auto &stride : strides)
        stride *= sizeof(T);
    return strides;
}

Once I had a way of adapting the strides, I transformed the xarray into a pybind11::array_t, as follows:

template <typename T>
inline typename pybind11::array_t<T> array_to_pyarray(typename xt::xarray<T> &src)
{
    return pybind11::memoryview::from_buffer(
        src.data(),                             // buffer
        src.shape(),                            // shape
        strides_to_pystrides<T>(src.strides()), // strides
        false                                   // readonly
    );
}

I've tested, and the changes made in python are correctly registered in C++. The only thing I know I must avoid is overriding the array with itself in python, otherwise, it will be a mess (more on this inside the attached full example).

A this point I thought, maybe I can do the same with views? So I wrote the following function:

template <typename T>
inline pybind11::array_t<T> view_to_pyarray(typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &view)
{
    // Return a memory view, allowing changes from python.
    return pybind11::memoryview::from_buffer(
        view.data() + view.data_offset(),       // buffer
        view.shape(),                           // shape
        strides_to_pystrides<T>(view.strides()) // strides
    );
}

I mean, that's all from this side.

Python to C++

The interfacing in the other direction is done in a similar fashion.

I needed some "adaptation" functions.

First, for the shape:

template <typename T>
inline typename xt::xarray<T>::shape_type pyinfo_to_shape(const pybind11::buffer_info &info)
{
    size_t ndim = static_cast<size_t>(info.ndim);
    typename xt::xarray<T>::shape_type shape(ndim);
    for (size_t i = 0; i < ndim; ++i)
        shape[i] = info.shape[i];
    return shape;
}

Second, for the strides:

template <typename T>
inline typename xt::xarray<T>::strides_type pyinfo_to_strides(const pybind11::buffer_info &info)
{
    typename xt::xarray<T>::strides_type strides;
    for (size_t i = 0; i < info.strides.size(); ++i)
        strides.push_back(info.strides[i] / info.itemsize);
    return strides;
}

Once I had both those adaptation function, I could easily modify the xt::xarray:

template <typename T>
inline xt::xarray<T> &pyarray_to_array(const pybind11::array_t<T> &src, xt::xarray<T> &dst)
{
    // Get the infos regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transforms the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);

    // Re-initialize the xarray if the shape is different.
    if (dst.shape() != shape)
        dst = xt::xarray<T>(shape);

    // Get the pointers to the buffers.
    T *dst_ptr = static_cast<T *>(dst.data()), *src_ptr = static_cast<T *>(info.ptr);

    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}

Then, I've done the same for the views:

template <typename T>
inline typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &pyarray_to_view(
    const pybind11::array_t<T> &src,
    typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &dst)
{
    // Get the infos regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transforms the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);

    if (shape != dst.shape())
        throw std::runtime_error("Shapes must match");
    if (size != dst.size())
        throw std::runtime_error("Sizes must match");

    // Get the pointers to the buffers.
    T *dst_ptr = static_cast<T *>(dst.data() + dst.data_offset()), *src_ptr = static_cast<T *>(info.ptr);

    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}

Error To Avoid

Let us take an extract of the example I attached to the issue. Here is the C++ registration:

/// @brief Just an example of dataset containing xtensor arrays and views.
class DataSet {
public:
    /// A set of arrays.
    xt::xarray<double> x1;
    DataSet() : x1{{9, 8}, {7, 6}, {5, 4}, {3, 2}} {
        // Nothing to do.
    }
};
PYBIND11_MODULE(pytensor, m) {
    pybind11::class_<DataSet>(m, "DataSet")
        .def(pybind11::init<>())
        .def_property("x1",
            [](DataSet &self) { return array_to_pyarray(self.x1); },
            [](DataSet &self, const pybind11::array_t<double> &m) { pyarray_to_array(m, self.x1); });
}

In python I need to avoid overriding the same array with itself:

from pytensor import DataSet
# Create the dataset.
ds = DataSet()
# Reshape x1 and override it.
print("ds.x1    :\n", ds.x1, "\n")
ds.x1 = ds.x1.reshape(2, 2, 2)
print("ds.x1    :\n", ds.x1, "\n")

This will output:

ds.x1    :
 [[9. 8.]
 [7. 6.]
 [5. 4.]
 [3. 2.]] 
ds.x1    :
 [[[0.000000e+000 9.381848e-317]
  [7.000000e+000 6.000000e+000]]

 [[5.000000e+000 4.000000e+000]
  [3.000000e+000 2.000000e+000]]] 

So except for this, it is behaving nicely the binding. I can also provided a solution to this problem:

from copy import copy
...
ds.x1 = copy(ds.x1.reshape(2, 2, 2))

Meh.. this requires a copy, is not that nice but it works.

Conclusion

I'm no expert, I started working with xtensor a couple of months ago.

I needed to interface something that was already using xt::xtensor to python, without changing everything to xt::pyarray. I'm lazy, and also I didn't know how to make all the strided_view inside the code work with the xt::pyarray.

I might be doing something nasty with this code, but I don't know all the meticulous details of how xtensor and python are actually handling the memory. I only know that both numpy and xtensor use the row-major ordered arrays and I'm hoping that's enough.

Thank you very much for reading through my explanation.

Cheers,

Enrico

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions