Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change to non-deprecated NumPy C API #2498

Closed
rgommers opened this issue Jul 15, 2018 · 35 comments
Closed

Change to non-deprecated NumPy C API #2498

rgommers opened this issue Jul 15, 2018 · 35 comments

Comments

@rgommers
Copy link
Contributor

This is an issue to summarize the current status of NumPy C API support and what can/should be done to update Cython. (cc @mattip, this is what I was talking about yesterday - could be a useful thing to work on).

Currently Cython uses a NumPy API that has been deprecated since NumPy 1.6.0 in 2013.

The Cython docs say here: "For the time being, it is just a warning that you can ignore.".

It is about time to update this, it will save users a lot of confusion and shorten the build logs of many projects by hundreds of lines.

@nouiz wrote a summary of how to support both old (<1.6.0) and new API:

I would suggest that it's either no longer needed to support the old <1.6 API, or make that opt-in rather than the default (to avoid all the build warnings while remaining compatible with very old NumPy versions). Would be good to get the opinion of the Cython team on this before starting work on this.

Here is an example of changing to the new API and avoiding the build warnings for scipy.optimize: https://github.com/scipy/scipy/pull/4351/files. Linking because the use of numpy_nodepr_api there (the defines and passing them to distutils) may be useful to other projects; it's a bit nontrivial to get right.

@scoder
Copy link
Contributor

scoder commented Jul 16, 2018

Thanks for bringing this up and collecting the sources.
It's not entirely trivial to migrate, because we are currently exposing the array attributes directly to user code in numpy.pxd. Fixing Cython's own code is probably easy, but avoiding to break existing user code requires some thought.

@rgommers
Copy link
Contributor Author

It's not entirely trivial to migrate, because we are currently exposing the array attributes directly to user code in numpy.pxd. Fixing Cython's own code is probably easy, but avoiding to break existing user code requires some thought.

Good point. Probably best to respect NumPy's deprecation mechanism, not exposing array attributes only if

#if defined(NPY_NO_DEPRECATED_API) && NPY_NO_DEPRECATED_API >= NPY_1_7_API_VERSION

@jbrockmendel
Copy link
Contributor

I spent some time editing the numpy.pxd file to avoid these warnings, found the main blocker was PyArray_BASE because it is commented out with a note that it has the wrong refcount semantics. If that could be fixed (possibly also PyArray_DESCR, ill have to double check), then I think the pxd file can be made recent-numpy-friendly.

The refcount semantics are above my pay grade though, no idea how to make that viable.

@rgommers
Copy link
Contributor Author

Hmm, that doesn't ring a bell. Let's appeal to the institutional memory: @charris, @njsmith any ideas?

@njsmith
Copy link
Contributor

njsmith commented Jul 26, 2018

No idea about the refcount thing – I'd check git annotate to see who wrote it?

In general unfortunately the the API deprecations weren't really thought through as part of a comprehensive plan, so there may well be some idiosyncrasies. I'm not really sure where we're going with them either, or how much benefit they provide.

For .base in particular it seems pretty unusual that anyone would need to access it so quickly that they can't afford to go through python...

@charris
Copy link

charris commented Jul 26, 2018

Doesn't ring a bell here. A date from git blame might help. I don't know how complete the NumPy version is, it is internal to numpy.random. The new NPY_ARRAY_WRITEBACKIFCOPY probably needs to be in Cython, however.

@charris
Copy link

charris commented Jul 26, 2018

I'm not really sure where we're going with them either, or how much benefit they provide.

I think they are stalled out. We should probably make a decision if we are going to make further attempts to hide the internals, or just give up on the whole thing. Backwards compatibility is a real problem here, and I don't think f2py is completely compliant, it does some content swapping on ndarrays for which we have no interface. The functions are nice for reasons other than hiding, type checking, for instance, but they are far from complete.

@charris
Copy link

charris commented Jul 26, 2018

The only thing I can think of regarding the base is that there were several changes to the chain of references to work around chain length limitations. Or so ISTR.

@jbrockmendel
Copy link
Contributor

Now that I have a real keyboard:

The place saying there is an issue with refcount semantics: https://github.com/cython/cython/blob/master/Cython/Includes/numpy/__init__.pxd#L399

The place that currently accesses .base: https://github.com/cython/cython/blob/master/Cython/Includes/numpy/__init__.pxd#L991

If get_array_base and set_array_base are really needed (I haven't seen them used, so no idea), and PyArray_BASE remains off-limits, then I think we're SOL.

@charris
Copy link

charris commented Jul 27, 2018

I don't see either of get_array_base or set_array_base in NumPy, so I assume that they are somewhere in Cython. PyArray_BASE should work to get its value but I have no idea what the refcount semantics remark refers to, especially as it refers to PyArray_DESCR as well. I suppose if one wants to replace either one, and DECREF the old, that would be a problem.

Numpy itself does a lot of direct access to the structure, because the functions cannot be used as lvalues. Basically, we provide no way to set the structure values outside of NumPy. Maybe that needs to be looked at.

@mattip
Copy link
Contributor

mattip commented Jul 27, 2018

I wonder why cython provides that interface, since inside NumPy they either share the data pointer or should be using writeback semantics. The implementation of {g,s}et_array_base follows neither of those conventions. I hope there is no code in the wild that uses these functions.

@njsmith
Copy link
Contributor

njsmith commented Jul 27, 2018

If you need to mutate the array base in place (and I'm not sure that anyone does), then you're supposed to use PyArray_SetBaseObject rather than modifying it directly.

@jbrockmendel
Copy link
Contributor

@scoder: any idea which parts of the numpy pxd file are load-bearing?

@ numpy people: If the 1.7 deprecation is no longer popular, can a lot of this be solved by just disabling the warning?

@rgommers
Copy link
Contributor Author

@ numpy people: If the 1.7 deprecation is no longer popular, can a lot of this be solved by just disabling the warning?

This is probably not the best place to discuss that, but: generally we do prefer that people use the new over the old API right? Doesn't mean we have to ever turn the deprecation into an error. Aside from this Cython issue I haven't heard many complaints.

@jbrockmendel
Copy link
Contributor

This is probably not the best place to discuss that, but: generally we do prefer that people use the new over the old API right?

I'll be happy to move the conversation elsewhere, have a place in mind? Definitely don't want to hijack the thread.

Aside from this Cython issue I haven't heard many complaints.

This is probably right, but Google tells me there are many people who are confused by these warnings. In pandas we want to silence them largely because they flood the build log and make it easy to miss unrelated warnings.

Seems like the best case is for this to be updated in cython-land. Until that becomes a reality, please consider a mechanism to let downstream authors silence these warnings.

@rgommers
Copy link
Contributor Author

rgommers commented Aug 1, 2018

(sorry for the slow reply)

I'll be happy to move the conversation elsewhere, have a place in mind?

The numpy-discussion mailing list is probably best.

Seems like the best case is for this to be updated in cython-land. Until that becomes a reality, please consider a mechanism to let downstream authors silence these warnings.

The next opportunity for that is NumPy 1.16.0, which is a ways away. If we cannot solve the remaining Cython issue before the 1.16 release, then adding a new environment variable that silences the warning would not be unreasonable. I'll open a NumPy issue for that now (EDIT: done in numpy/numpy#11653)

@mattip
Copy link
Contributor

mattip commented Aug 10, 2018

@jbrockmendel the change to {sg}et_array_base has been merged. Could you try again to update the API version for cython?

@jbrockmendel
Copy link
Contributor

Could you try again to update the API version for cython?

I'm not clear on what you're asking for here. Do you mean see if the NPY_NO_DEPRECATED_API warnings can be gotten rid of?

@mattip
Copy link
Contributor

mattip commented Aug 10, 2018

You commented above

I spent some time editing the numpy.pxd file to avoid these warnings ...

Could you try that again on latest head?

@jbrockmendel
Copy link
Contributor

Could you try that again on latest head?

OK, I've made some progress. I'll make a dummy PR to show the relevant edits and describe what still breaks.

@mattip
Copy link
Contributor

mattip commented Aug 21, 2018

I put some work into this. I did not make it into a PR yet, the work is on a branch in my fork because I am really unsure about this. I made both pandas and scipy work with this branch of cython. Here are way too many of my random thoughts about the process, with some questions at the end.

Cython

getter ctypedef class attribute functions (probably a bad idea)

What to do with python-syntax (or cython-syntax) code like a.ndims? Currently it is compiled to a.nd in C, but now needs to become PyArray_NDIMS(__pyx_v_a). I hacked the syntax feature that turns ndims into "nd" and make it into a getter function. So the "nd" c-level attribute access

ctypedef class numpy.ndarray [object PyArrayObject]:
        cdef __cythonbufferdefaults__ = {"mode": "strided"}
        cdef:
            # Convert python __getattr__ access to c -> access.
            int ndims "nd"

now becomes this PyArray_NDIMS(self) getter function

ctypedef class numpy.ndarray [object PyArrayObject]:
        cdef __cythonbufferdefaults__ = {"mode": "strided"}
        cdef:
            # Convert python __getattr__ access to c functions.
            int ndims PyArray_NDIMS

I added a new option for getter functions inside ctypedef class, to support this. If it is a desired addition, I can flesh this out into a full PR with error checking and a test or two.

I wanted to document this new syntax extension, but could not find documentation for specifying cname attributes in ctypedef classes. There are examples of ctypedef use here and here but neither use the attribute renaming like where python ndims becomes c nd. Where should/does this exist in the documentation?

Changed defines like NPY_WRITEABLE -> NPY_ARRAY_WRITEABLE

I could not find a way to gracefully warn rather than simply error and did not feel like creating another questionable cython hack without asking first. It would be nice to have a way to mark the old ones as enum_deprecate or so, to provide a path for upgrading without erroring

Pandas

Pandas has fast routines for reductions and windowing functions that require the old API (direct access to the data pointer as a LHS value). I simply degraded the API version to the old one. Compiled code still uses the newer getter syntax. Here is an example of cython code and how it is compiled to C:

 /* "pandas/_libs/reduction.pyx":115
 *         chunk = self.dummy
 *         dummy_buf = chunk.data
 *         chunk.data = arr.data             # <<<<<<<<<<<<<<
 *         labels = self.labels
 *         has_labels = labels is not None
 */
  __pyx_t_2 = PyArray_BYTES(__pyx_v_arr);
  PyArray_BYTES(__pyx_v_chunk) = __pyx_t_2;

In pre-1.7 API PyArray_BYTES is a macro not a function and so can be on the LHS.

There was no use of the deprecated enums in cython code, so the changes were all to simply allow the old API.

Scipy

It was tedious but relatively painless to convert scipy to use the newer APIs. I found no places where code could not be converted. But after a few hours of sed replacement of var->data with PyArray_DATA(var)and such I began to ask more existential questions about code churn and why we need this.

Conclusion

I am not sure how I feel about all of this. Adopting the new API in cython will cause a lot of churn for downstream users. On the one hand this is water under the bridge, as @rgommers said the "new" API was released with NumPy 1.7 in Feb 2013. On the other hand, it is hard to see how the changes

  • enums, which replace NPY with NPY_ARRAY
  • attribute hiding of PyArrayObject attributes behind getter functions
    contribute to the advancement of NumPy as stated in its scope as an "in-memory, N-dimensional, homogeneously typed (single pointer + strided) array".

I am left with more questions than answers:

  • Is there interest in the approach to getter functions for attributes in ctypedef classes in cython?
  • Is there interest in a cython enum_deprecated extension to warn of renamed enums (I assume it is possible, but is it a good thing)?
  • Any opinions, as language designers and experienced developers, about the advantages of hiding struct attributes behind getter functions, and how necessary it is?
  • How annoying is it to put an enormous text block as the 21st comment on an issue tracker?
    This is not the correct forum for the discussion of the future of the NumPy API but I wanted to put this out there.

@robertwb
Copy link
Contributor

robertwb commented Sep 1, 2018

Cython itself should avoid all use of the deprecated API, so thanks for #2559. I am also in favor of a mapping of attributes to function calls, there is precedence for it with Python's @property and it would save downstream users from a huge amount of (uglyifying, in my opinion) churn.

@eric-wieser
Copy link

eric-wieser commented Oct 31, 2018

@njsmith:

If you need to mutate the array base in place ... then you're supposed to use PyArray_SetBaseObject rather than modifying it directly.

Note that almost all the internal uses of this function in numpy were removed as a fix for numpy/numpy#11265

@mattip
Copy link
Contributor

mattip commented Sep 22, 2019

Once gh-3115 goes in, we can move tests from the numpy_old tag to the numpy tag which should eliminate the warnings in tests.

@scoder
Copy link
Contributor

scoder commented Mar 22, 2020

Is this done now, or is there still anything missing?

Cython doesn't currently set the NumPy deprecated API marker itself, so that's left to users. Is that ok, or can/should we do anything else to help them?

@mattip
Copy link
Contributor

mattip commented Mar 22, 2020

I think we can close this. Things like this PR in scipy can be finished once cython releases 3.0.

@matanox
Copy link

matanox commented Mar 12, 2024

So is the warning text still expected with the latest cython releases?

@rgommers
Copy link
Contributor Author

@matanox yes it is, but you can now explicitly ask NumPy to not expose deprecated API since Cython itself doesn't use it anymore. Like so:

https://github.com/scipy/scipy/blob/33e93e04b0ca65b4938e685b28fda9846d68f657/scipy/meson.build#L73-L77

@matanox
Copy link

matanox commented Mar 12, 2024

Thanks a lot for all the work on cython and this issue.

With cython 3.0.8 I got the original warning, and by adding '-DNPY_NO_DEPRECATED_API=NPY_1_9_API_VERSION' in my extra_compile_args in my setup.py, as mentioned in the source where you referred me to, indeed the warning no longer shows.

As I gather from gleaning the comment trail, this is left for users to specify as I have, due to some reasons of correctness which I couldn't completely gather myself.

Should we expect to have to change from 1.9 in there to something else in the short term?

@rgommers
Copy link
Contributor Author

Should we expect to have to change from 1.9 in there to something else in the short term?

No, you should be all set, it shouldn't reappear.

due to some reasons of correctness which I couldn't completely gather myself.

NumPy is warning here that those APIs are deprecated. It cannot just not expose them, since that would break users that actually do need the deprecated APIs (like Cython used to).

@matanox
Copy link

matanox commented Mar 12, 2024

Thanks!!! I was half-guessing something along those lines.

I guess in highly deprecation-lagging code projects, leaving this suppression optional enables one to write/maintain code using deprecated numpy api going backwards than 1.9 while using cython for compilation.

Since the warning comes from numpy (pre-compilation?) code then this warning supression is practically global to the compilation process and cython does not want to globally block deprecated numpy api for those who do use deprecated numpy api in their code.

Probably because numpy either exposes and allows deprecated api use with a warning, or blocks the use of deprecated api, and hence blocking old api breaks user code that expects it.

@jakirkham
Copy link
Contributor

Random question (and happy to raise this in a new issue if needed)

Thought NumPy's Cython API moved upstream, but just saw this file

cdef extern from "numpy/npy_math.h" nogil:

Should this file be dropped? Was it already added upstream in NumPy somewhere?

@rgommers
Copy link
Contributor Author

See gh-5967 for that.

@jakirkham
Copy link
Contributor

Oops thanks Ralf! 🙏

GH search failed me 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants