Is this a valid way of interfacing xtensor with python?

This is a question/idea concerning the xtensor/python wrapping.

I was wondering if the following way I'm currently interfacing `xt::xarray` to numpy arrays in python is somehow valid and not an abomination. I'm writing down here the ideas, but I also wrote a self-contained and completely commented "example" project attached to this issue.

What this wrapping method allowed me to do:

- Wrap `xarray` and `xstrided_view`;
- Allow propagation of changes from python to C++ xtensor arrays;
- Creating numpy views of the xtensor-wrapped arrays, and other similar things.

Full Example: [pytensor.zip](https://github.com/xtensor-stack/xtensor-python/files/6284351/pytensor.zip)

## C++ to Python

First I had to take care of transforming xtensor strides to numpy strides:
```C++
template <typename T>
inline typename xt::xarray<T>::strides_type strides_to_pystrides(typename xt::xarray<T>::strides_type strides)
{
    for (auto &stride : strides)
        stride *= sizeof(T);
    return strides;
}
```

Once I had a way of adapting the strides, I transformed the xarray into a `pybind11::array_t`, as follows:
```C++
template <typename T>
inline typename pybind11::array_t<T> array_to_pyarray(typename xt::xarray<T> &src)
{
    return pybind11::memoryview::from_buffer(
        src.data(),                             // buffer
        src.shape(),                            // shape
        strides_to_pystrides<T>(src.strides()), // strides
        false                                   // readonly
    );
}
```
I've tested, and the changes made in python are correctly registered in C++. The only thing I know I must avoid is overriding the array with itself in python, otherwise, it will be a mess (more on this inside the attached full example).

A this point I thought, maybe I can do the same with views? So I wrote the following function:
```C++
template <typename T>
inline pybind11::array_t<T> view_to_pyarray(typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &view)
{
    // Return a memory view, allowing changes from python.
    return pybind11::memoryview::from_buffer(
        view.data() + view.data_offset(),       // buffer
        view.shape(),                           // shape
        strides_to_pystrides<T>(view.strides()) // strides
    );
}
```
I mean, that's all from this side.

## Python to C++
The interfacing in the other direction is done in a similar fashion.

I needed some "adaptation" functions.

First, for the shape:
```C++
template <typename T>
inline typename xt::xarray<T>::shape_type pyinfo_to_shape(const pybind11::buffer_info &info)
{
    size_t ndim = static_cast<size_t>(info.ndim);
    typename xt::xarray<T>::shape_type shape(ndim);
    for (size_t i = 0; i < ndim; ++i)
        shape[i] = info.shape[i];
    return shape;
}
```
Second, for the strides:
```C++
template <typename T>
inline typename xt::xarray<T>::strides_type pyinfo_to_strides(const pybind11::buffer_info &info)
{
    typename xt::xarray<T>::strides_type strides;
    for (size_t i = 0; i < info.strides.size(); ++i)
        strides.push_back(info.strides[i] / info.itemsize);
    return strides;
}
```

Once I had both those adaptation function, I could easily modify the xt::xarray:
```C++

template <typename T>
inline xt::xarray<T> &pyarray_to_array(const pybind11::array_t<T> &src, xt::xarray<T> &dst)
{
    // Get the infos regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transforms the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);

    // Re-initialize the xarray if the shape is different.
    if (dst.shape() != shape)
        dst = xt::xarray<T>(shape);

    // Get the pointers to the buffers.
    T *dst_ptr = static_cast<T *>(dst.data()), *src_ptr = static_cast<T *>(info.ptr);

    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}
```

Then, I've done the same for the views:
```C++
template <typename T>
inline typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &pyarray_to_view(
    const pybind11::array_t<T> &src,
    typename xt::xstrided_view<xt::xarray<T> &, xt::xindex> &dst)
{
    // Get the infos regarding the python array.
    pybind11::buffer_info info = src.request();
    // Transform the python shape.
    auto shape = pyinfo_to_shape<T>(info);
    // Transforms the buffer size to unsigned.
    auto size = static_cast<size_t>(info.size);

    if (shape != dst.shape())
        throw std::runtime_error("Shapes must match");
    if (size != dst.size())
        throw std::runtime_error("Sizes must match");

    // Get the pointers to the buffers.
    T *dst_ptr = static_cast<T *>(dst.data() + dst.data_offset()), *src_ptr = static_cast<T *>(info.ptr);

    // Copy the data.
    for (size_t i = 0; i < size; ++i)
        dst_ptr[i] = src_ptr[i];
    return dst;
}
```

## Error To Avoid
Let us take an extract of the example I attached to the issue. Here is the C++ registration:
```C++
/// @brief Just an example of dataset containing xtensor arrays and views.
class DataSet {
public:
    /// A set of arrays.
    xt::xarray<double> x1;
    DataSet() : x1{{9, 8}, {7, 6}, {5, 4}, {3, 2}} {
        // Nothing to do.
    }
};
PYBIND11_MODULE(pytensor, m) {
    pybind11::class_<DataSet>(m, "DataSet")
        .def(pybind11::init<>())
        .def_property("x1",
            [](DataSet &self) { return array_to_pyarray(self.x1); },
            [](DataSet &self, const pybind11::array_t<double> &m) { pyarray_to_array(m, self.x1); });
}
```
In python I need to avoid overriding the same array with itself:
```python
from pytensor import DataSet
# Create the dataset.
ds = DataSet()
# Reshape x1 and override it.
print("ds.x1    :\n", ds.x1, "\n")
ds.x1 = ds.x1.reshape(2, 2, 2)
print("ds.x1    :\n", ds.x1, "\n")
```
This will output:
```
ds.x1    :
 [[9. 8.]
 [7. 6.]
 [5. 4.]
 [3. 2.]] 
ds.x1    :
 [[[0.000000e+000 9.381848e-317]
  [7.000000e+000 6.000000e+000]]

 [[5.000000e+000 4.000000e+000]
  [3.000000e+000 2.000000e+000]]] 
```

So except for this, it is behaving nicely the binding. I can also provided a solution to this problem:
```python
from copy import copy
...
ds.x1 = copy(ds.x1.reshape(2, 2, 2))
```
Meh.. this requires a copy, is not that nice but it works.

## Conclusion
I'm no expert, I started working with xtensor a couple of months ago.

I needed to interface something that was already using `xt::xtensor` to python, without changing everything to `xt::pyarray`. I'm lazy, and also I didn't know how to make all the `strided_view` inside the code work with the `xt::pyarray`.

I might be doing something nasty with this code, but I don't know all the meticulous details of how xtensor and python are actually handling the memory. I only know that both numpy and xtensor use the row-major ordered arrays and I'm hoping that's enough.

Thank you very much for reading through my explanation.

Cheers,

Enrico

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Is this a valid way of interfacing xtensor with python? #254

C++ to Python

Python to C++

Error To Avoid

Conclusion

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Is this a valid way of interfacing xtensor with python? #254

Description

C++ to Python

Python to C++

Error To Avoid

Conclusion

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions