Ambiguous return values #1

iritkatriel · 2023-05-09T14:38:23Z

Some C API functions can return a value which may or may not indicate an error, and require disambiguation with PyErr_Occurred(). They should be replaced by new versions with unambiguous return values.

Ambiguous return values

PyCapsule_GetDestructor
PyCapsule_GetContext
PyCapsule_GetName

PyDict_GetItemWithError

PyFloat_AsDouble
PyFloat_Unpack2
PyFloat_Unpack4
PyFloat_Unpack8

PyIter_Next

PyLong_AsLong
PyLong_AsLongAndOverflow
PyLong_AsLongLong
PyLong_AsLongLongAndOverflow
PyLong_AsSsize_t
PyLong_AsUnsignedLong
PyLong_AsSize_t
PyLong_AsUnsignedLongLong
PyLong_AsUnsignedLongMask
PyLong_AsDouble
PyLong_AsVoidPtr

PyMarshal_ReadLongFromFile
PyMarshal_ReadShortFromFile

PyNumber_AsSsize_t

PyUnicode_Compare

no return value, but can set an exception that can only be checked with PyErr_Occurred:

PyMarshal_WriteObjectToFile

iritkatriel · 2023-05-31T10:29:54Z

PyUnicode_Compare is an interesting one. It has only 3 possible return values: -1, 0 and 1. And -1 is ambiguous.

iritkatriel · 2023-06-01T18:10:33Z

~~We will track work on this here: python/cpython#105201~~

encukou · 2023-06-05T08:01:44Z

Maybe ambiguous return values are OK themselves, but it's the way we disambiguate that needs improving?

Maybe we should look focus on PyErr_Occurred itself. Why is it bad? Is it slow? Does it not work well with alternative implementations? Does it prevent optimizations?

Why is an int *err argument better than a variable in the interpreter state?

erlend-aasland · 2023-06-05T08:04:21Z

Maybe we should look focus on PyErr_Occurred itself.

IMO that's a related, but slightly different issue. I'd consider creating a separate issue for those problems.

iritkatriel · 2023-06-05T09:24:20Z

It' not a different issue, it's this issue. And I'm sure we've already discussed it elsewhere.

Ambiguous return values force us to access globals state. This has two problems:

It forces all implementations of the API to maintain this global state, whether it makes sense for them or not.
The usual problem with global state: you can never be sure where the value came from. You think PyIter_Next set the error indicator, but it may have been something else in the last iteration of the loop.

encukou · 2023-06-05T09:53:06Z

It forces all implementations of the API to maintain this global state, whether it makes sense for them or not.

Trouble is, they need to do that anyway if PyErr_Occurred stays in the API. Which it needs to, unless we overhaul all of error handling. Ambiguous return values themselves aren't the real problem.

you can never be sure where the value came from

Yup. Users ideally need assert !PyErr_Occurred() (or save/restore it). But that's a relatively minor papercut, so, we run into a general problem with incremental improvement: if the next small step is too small relative to the disruption (deprecation cycles or proliferation of similar API), it doesn't make sense to do it (yet). Perfect is the enemy of good :(

iritkatriel · 2023-06-05T10:07:14Z

we run into a general problem with incremental improvement

We should have an issue for that, and think how we can avoid being stuck with every mistake forever in the new api.

iritkatriel · 2023-06-05T14:28:05Z

It forces all implementations of the API to maintain this global state, whether it makes sense for them or not.

Trouble is, they need to do that anyway if PyErr_Occurred stays in the API. Which it needs to, unless we overhaul all of error handling.

I don't see why. If all functions are explicit in their output on whether there was an error or not, then why should users need PyErr_Occurred() to be part of the API? Internally the interpreter can continue to have the error in globals state, but it doesn't have to expose that through the API.

iritkatriel · 2023-06-05T14:29:30Z

We could change the title of this issue to "PyErr_Occurred() should not be part of the API".

encukou · 2023-06-05T14:56:10Z

Because you need PyErr_Occurred() to get the exception.
PyErr_ExceptionMatches works with the same kind of global state. (Currently it's a very thin wrapper over PyErr_Occurred+PyErr_GivenExceptionMatches).

Removing PyErr_Occurred but keeping PyErr_ExceptionMatches would do nothing for alternative implementations.
Removing both would mean all functions would need to return (somehow) the exception object, not just an indicator that it was set. That would be the overhaul all our error handling.

gvanrossum · 2023-06-05T17:56:39Z

I don't think this issue is about alternative implementations or about the problems with maintaining global state. It is about the uncertainty that remains when an API requires calling PyErr_Occurred() to be able to tell apart two different "irregular" outcomes.

For example, with PyIter_Next(), to tell the difference between "end of sequence" and "something went wrong". In particular, the problem is that anything you do between observing the NULL return value (in this example) and calling PyErr_Occurred() may affect the global state returned by the latter, and cause the wrong conclusion to be drawn.

Redesigning these "ambiguous" APIs incrementally to avoid this trap feels like a worthwhile improvement, since it can be done on a shorter timescale than redesigning the entire global error state system. We just need to word the motivation carefully, in particular we want to avoid the requirement to call PyErr_Occurred() immediately after certain API calls return NULL or 0 or -1, not the requirement to maintain global state about exceptions.

(Separately, we might want to do an audit of code that calls PyErr_Clear() in order to be able to cleanly call PyErr_Occurred() later, and consider a save+restore operation rather than unconditional clearing of the global exception state.)

erlend-aasland · 2023-06-05T20:01:11Z

We could change the title of this issue to "PyErr_Occurred() should not be part of the API".

PyErr_Occurred() can be useful in extension modules that deal with callbacks from external libraries. In sqlite3, for example, we use PyErr_Occurred() to bail early from the "step" callback (IIRC) in case the previous callback failed.

gvanrossum · 2023-06-05T20:32:15Z

PyErr_Occurred() can be useful in extension modules that deal with callbacks from external libraries. In sqlite3, for example, we use PyErr_Occurred() to bail early from the "step" callback (IIRC) in case the previous callback failed.

I presume you are referring to collation_callback. I'm guessing that C code not under your control calls this function multiple times without checking for an error response (since that sqlite API doesn't seem to be designed to have an error response).

That makes sense -- and yeah, if we're solving hypothetical problems, if the interpreter did not manage global error state at all (or at least did not provide access to it) you could solve it differently (e.g. put a flag in the callback_context), but since the interpreter does maintain such state, and PyErr_Occurred() does exist, (and it would be a lot of effort to remove them), it's practical to just use it.

To me, the problem we're trying to fix here is APIs that require using PyErr_Occurred() to distinguish between a valid "special" result and an outright error.

erlend-aasland · 2023-06-05T20:39:23Z

[...] if the interpreter did not manage global error state at all (or at least did not provide access to it) you could solve it differently (e.g. put a flag in the callback_context), but since the interpreter does maintain such state, and PyErr_Occurred() does exist, (and it would be a lot of effort to remove them), it's practical to just use it.

Exactly.

To me, the problem we're trying to fix here is APIs that require using PyErr_Occurred() to distinguish between a valid "special" result and an outright error.

Yep.

erlend-aasland · 2023-06-22T10:27:36Z

See also python/devguide#1121

serhiy-storchaka · 2023-06-22T10:33:33Z

I do not think that there is a serious problem urging us to to fix it. In most of these cases the value used to indicate error is an uncommon, unexpected result, so it usually does not have significant performance impact.

PyUnicode_Compare could return -2 on error, like PyUnicode_Find.

PyMarshal_* functions are legacy API, they are almost not used. It is not worth to touch them.

Functions like PyDict_GetItemWithError and PyIter_Next could return a special value (e.g. a pointer to a singleton or (PyObject*)-1) for absent value or for error. The question is what chose as an error indicator and what as a special case indicator. In any case this is incompatible with existing functions, so needs introducing new functions.

An alternate approach is used in _PyObject_LookupAttr. It returns one of three values (error, absent, exists) and requires providing an address of variable to save the result. In theory the disadvantage of this is that the compiler no longer able to use only register for the result and eliminate the stack variable.

vstinner · 2023-06-23T00:49:56Z

I do not think that there is a serious problem urging us to to fix it.

I don't think that this project only lists issues which MUST be fixed immediately. It's also about issues which should be avoided when designing new APIs.

markshannon · 2023-06-23T10:28:59Z

Maybe we should look focus on PyErr_Occurred itself. Why is it bad? Is it slow? Does it not work well with alternative implementations? Does it prevent optimizations?

Why is it bad? It is global state, with all the problems that entails
Is it slow? Yes. It forces us to save and restore it across calls, and to perform expensive checks after every call into C extension code.
Does it prevent optimizations? Yes, for both of the above reasons.

markshannon · 2023-06-23T10:34:43Z

PyUnicode_Compare could return -2 on error

Consistency is vital.
Everything should return -1 for an error, and it makes sense for comparison functions to potentially return unordered, even if string comparisons are always ordered.

vstinner · 2023-06-23T14:21:37Z

Consistency is vital. Everything should return -1 for an error

The Python C API has a big pile of legacy APIs. Many functions with known issues are kept for "backward compatibility": remove them is not worth it. Are you suggesting to quicky shift towards a "sane only" API by eaggerly remove the legacy APIs?

I propose to design a new API which would be smaller and with no known issues: issue #54.

erlend-aasland · 2023-06-24T08:38:12Z

See python/devguide#1123 for proposed guidelines.

encukou · 2023-07-03T07:53:04Z

A 2016 bug for this: python/cpython#70963

encukou · 2023-10-12T14:52:37Z

Guideline proposal: capi-workgroup/api-evolution#13

vstinner · 2024-06-12T07:56:40Z

I propose new PyLong API which returns 0 on success and -1 on error: python/cpython#120389

serhiy-storchaka · 2024-06-12T08:34:18Z

For converters from Python object there is another convention, used with the O& format unit in PyArg_Parse: returning 0 on failure and non-zero on success (usually 1, but can also be Py_CLEANUP_SUPPORTED for some converters).

serhiy-storchaka · 2024-06-12T08:44:28Z

In addition to standard converters in the public C API (PyUnicode_FSConverter, PyUnicode_FSDecoder), there is a large number of private converters (_PyLong_UnsignedShort_Converter, _Py_convert_optional_to_ssize_t, _PyEval_SliceIndex, _PyLong_FileDescriptor_Converter, _PyUnicode_WideCharString_Converter, etc).

iritkatriel added stakeholder: everyone fixable theme: error handling labels May 10, 2023

davidhewitt mentioned this issue May 17, 2023

C API functions should never be called when the error indicator is set #2

Open

iritkatriel mentioned this issue Jun 1, 2023

PyIter_Next has ambiguous return value python/cpython#105201

Open

iritkatriel added stakeholder: alternative implementations impacts implementations of python other than cpython stakeholder: extension authors and removed stakeholder: everyone labels Jun 5, 2023

erlend-aasland mentioned this issue Jun 19, 2023

C API: Add PyWeakref_GetRef() function python/cpython#105927

Closed

vstinner mentioned this issue Jun 23, 2023

C API: Add PyDict_GetItemRef() function python/cpython#106004

Closed

iritkatriel added the v label Jul 18, 2023

erlend-aasland mentioned this issue Aug 24, 2023

gh-108444: Add PyLong_AsInt() public function python/cpython#108445

Merged

encukou added the evolution-proposed label Oct 12, 2023

vstinner mentioned this issue Oct 13, 2023

Return value conventions capi-workgroup/api-evolution#13

Open

iritkatriel removed the v label Oct 23, 2023

encukou mentioned this issue Dec 4, 2023

Mandatory error handling capi-workgroup/api-evolution#5

Open

erlend-aasland mentioned this issue Apr 10, 2024

Introduce replacement API for PyIter_Next with a non-ambiguous return value capi-workgroup/decisions#21

Closed

vstinner mentioned this issue Jun 12, 2024

[C API] Add PyLong_FromInt64() and PyLong_ToInt64() python/cpython#120389

Open

vstinner mentioned this issue Jun 24, 2024

Add PyLong_FromInt64() and PyLong_ToInt64() capi-workgroup/decisions#32

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ambiguous return values #1

Ambiguous return values #1

iritkatriel commented May 9, 2023 •

edited

Loading

iritkatriel commented May 31, 2023

iritkatriel commented Jun 1, 2023 •

edited

Loading

encukou commented Jun 5, 2023

erlend-aasland commented Jun 5, 2023

iritkatriel commented Jun 5, 2023

encukou commented Jun 5, 2023

iritkatriel commented Jun 5, 2023

iritkatriel commented Jun 5, 2023

iritkatriel commented Jun 5, 2023

encukou commented Jun 5, 2023

gvanrossum commented Jun 5, 2023

erlend-aasland commented Jun 5, 2023

gvanrossum commented Jun 5, 2023

erlend-aasland commented Jun 5, 2023

erlend-aasland commented Jun 22, 2023

serhiy-storchaka commented Jun 22, 2023

vstinner commented Jun 23, 2023

markshannon commented Jun 23, 2023

markshannon commented Jun 23, 2023

vstinner commented Jun 23, 2023

erlend-aasland commented Jun 24, 2023

encukou commented Jul 3, 2023

encukou commented Oct 12, 2023

vstinner commented Jun 12, 2024 •

edited

Loading

serhiy-storchaka commented Jun 12, 2024

serhiy-storchaka commented Jun 12, 2024

Ambiguous return values #1

Ambiguous return values #1

Comments

iritkatriel commented May 9, 2023 • edited Loading