Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] NumPy 2 ABI changes (elsize) #40376

Closed
Tracked by #39532
seberg opened this issue Mar 6, 2024 · 4 comments
Closed
Tracked by #39532

[Python] NumPy 2 ABI changes (elsize) #40376

seberg opened this issue Mar 6, 2024 · 4 comments

Comments

@seberg
Copy link
Contributor

seberg commented Mar 6, 2024

Parent issue:

Describe the enhancement requested

As soon as we merge gh- in NumPy, it would be nice to have a small fix in here. I suspect the following changes should do things (the first two are obvious, the last is the annoying one).

The tricky remaining thing is that itemsize_ would be nice to be an intp/(Py_)ssize_t or int64 now and I am not sure if that has knock-on effects

diff --git a/python/pyarrow/src/arrow/python/arrow_to_pandas.cc b/python/pyarrow/src/arrow/python/arrow_to_pandas.cc
index cb9cbe5b9..e29f5b31f 100644
--- a/python/pyarrow/src/arrow/python/arrow_to_pandas.cc
+++ b/python/pyarrow/src/arrow/python/arrow_to_pandas.cc
@@ -278,3 +278,3 @@ Status PyArray_NewFromPool(int nd, npy_intp* dims, PyArray_Descr* descr, MemoryP
   // * Get better performance through custom allocators
-  int64_t total_size = descr->elsize;
+  int64_t total_size = PyDataType_ELSIZE(descr);
   for (int i = 0; i < nd; ++i) {
diff --git a/python/pyarrow/src/arrow/python/numpy_convert.cc b/python/pyarrow/src/arrow/python/numpy_convert.cc
index dfee88c09..f46487a57 100644
--- a/python/pyarrow/src/arrow/python/numpy_convert.cc
+++ b/python/pyarrow/src/arrow/python/numpy_convert.cc
@@ -48,3 +48,3 @@ NumPyBuffer::NumPyBuffer(PyObject* ao) : Buffer(nullptr, 0) {
     data_ = const_cast<const uint8_t*>(ptr);
-    size_ = PyArray_SIZE(ndarray) * PyArray_DESCR(ndarray)->elsize;
+    size_ = PyArray_NBYTES(ndarray);
     capacity_ = size_;
diff --git a/python/pyarrow/src/arrow/python/numpy_interop.h b/python/pyarrow/src/arrow/python/numpy_interop.h
index ce7baed25..1de28c20d 100644
--- a/python/pyarrow/src/arrow/python/numpy_interop.h
+++ b/python/pyarrow/src/arrow/python/numpy_interop.h
@@ -69,2 +69,7 @@
 
+// Backported NumPy 2 API.  and can be removed if 2 is required.
+#if NPY_ABI_VERSION < 0x02000000
+  #define PyDataType_ELSIZE(descr) ((descr)->elsize)
+#endif
+
 namespace arrow {
diff --git a/python/pyarrow/src/arrow/python/numpy_to_arrow.cc b/python/pyarrow/src/arrow/python/numpy_to_arrow.cc
index 8903df31b..04572536c 100644
--- a/python/pyarrow/src/arrow/python/numpy_to_arrow.cc
+++ b/python/pyarrow/src/arrow/python/numpy_to_arrow.cc
@@ -198,3 +198,3 @@ class NumPyConverter {
     length_ = static_cast<int64_t>(PyArray_SIZE(arr_));
-    itemsize_ = static_cast<int>(PyArray_DESCR(arr_)->elsize);
+    itemsize_ = static_cast<int64_t>(PyArray_ITEMSIZE(arr_));
     stride_ = static_cast<int64_t>(PyArray_STRIDES(arr_)[0]);
@@ -298,3 +298,3 @@ class NumPyConverter {
   int64_t stride_;
-  int itemsize_;
+  int64_t itemsize_;

@jorisvandenbossche do you know this would affect pandas? I am now worried that pandas wheels will fail due to importing arrow. OTOH, it seems like you must have solved that issue (or it is a non-issue).

Component(s)

C++, Python

@jorisvandenbossche
Copy link
Member

I am now worried that pandas wheels will fail due to importing arrow. OTOH, it seems like you must have solved that issue (or it is a non-issue).

PyArrow is not a build dependency, only a run dependency for pandas, so in case numpy breaks something in pyarrow, that wouldn't directly break pandas (only that functionality in pandas that requires pyarrow). Or am I misunderstanding your question?

@seberg
Copy link
Contributor Author

seberg commented Mar 6, 2024

Aha, so I guess we should get pandas wheels (as I suppose we have them and even if tests fail against 2.0, the new wheels tests will work when run against NumPy 1.x). But some things using arrow will be broken until we also have pyarrow wheels (with these fixes).

It is unclear how much of an effect that may be, but if you are not super worried I guess we'll have to see (if you are super worried, we can just stick to keeping elsize an int as few notice the other changes).

@jorisvandenbossche jorisvandenbossche changed the title NumPy 2 updates [Python] NumPy 2 ABI changes (elsize) Mar 6, 2024
@jorisvandenbossche
Copy link
Member

We do have nightly wheels building with numpy 2.0, so once your PR lands on the numpy side and we can do those fixes here, we can rather quickly have pyarrow nightly wheels that should work.

jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Mar 8, 2024
…escr->elsize

Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
jorisvandenbossche added a commit that referenced this issue Mar 13, 2024
…elsize (#40418)

### Rationale for this change

NumPy 2.0 is changing some ABI, see the issue description and numpy/numpy#25946 for more details. 

The changes here should make our code compatible both with current numpy 1.x and future numpy 2.x

* GitHub Issue: #40376

Lead-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche added this to the 16.0.0 milestone Mar 13, 2024
@jorisvandenbossche
Copy link
Member

Issue resolved by pull request 40418
#40418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants