Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce replacement API for PyIter_Next with a non-ambiguous return value #21

Open
erlend-aasland opened this issue Apr 10, 2024 · 16 comments

Comments

@erlend-aasland
Copy link

Previous discussion:

I would like to pursue this issue. IMO, PyIter_Next having an ambiguous return value is a good enough motivation for adding a replacement API. If you disagree, feel free to just close this issue. AFAICS, three APIs have been discussed; I'm in favour of Mark's proposal (item 3):

1: PyObject *PyIter_NextItem(PyObject* iter, int *err)

Return values: a NULL pointer (loop termination or error) or a PyObject pointer (item returned)
Mentioned in python/cpython#105201 (comment)

Example usage
PyObject *item;
int err;
while (item = PyIter_NextItem(iterator, &err)) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
    /* error */
}
else {
    /* no error */
}

2: int PyIter_NextItem(PyObject* iter, PyObject **item)

Return values: -1 (error) or 0 (item returned or loop termination)
Mentioned in python/cpython#105201 (comment)

Example usage
PyObject *item;
int err;
while ((err = PyIter_NextItem(iterator, &item)) == 0 && *item != NULL) {
  /* do something with item */
  ...
  /* release reference when done */
  Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
  /* error */
}
else {
  /* no error */
}

3: int PyIter_NextItem(PyObject *iter, PyObject **next)

Return values: -1 (error), 0 (loop terminated), or 1 (item returned)
Mentioned in python/cpython#105201 (comment)

Example usage
PyObject *next;
int rc;
while ((rc = PyIter_NextItem(iterator, &next)) > 0) {
    use(next);
    Py_DECREF(next);
}
if (rc < 0) {
    /* cleanup */
    return -1;
}
/* done */
@gvanrossum
Copy link

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

@erlend-aasland
Copy link
Author

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

I'll quote @markshannon in capi-workgroup/problems#1 (comment):

  • Why is it bad? It is global state, with all the problems that entails
  • Is it slow? Yes. It forces us to save and restore it across calls, and to perform expensive checks after every call into C extension code.
  • Does it prevent optimizations? Yes, for both of the above reasons.

@erlend-aasland
Copy link
Author

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

@gvanrossum
Copy link

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Well, it saves one call to PyErr_Occurred(), so it should be a little bit faster. But it's still a call to PyErr_Occurred(), which still has all the disadvantages Mark mentions.

@erlend-aasland
Copy link
Author

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Ah, yes that is true. I guess it boils down to a cosmetic issue, for now. I'll let the WG decide if a cleaner API is worth it.

@gvanrossum
Copy link

My personal view is that we have bigger fish to fry.

@encukou
Copy link
Collaborator

encukou commented Apr 11, 2024

+1 for the third one, it matches where we've been going so far and I don't see a reason to change course. But there are some details that make this fish not worth frying right now, like:

  • on -1/0 return, should *next be set to NULL or left as it was?
  • can next itself be NULL (to advance the iterator without getting the value)?

@erlend-aasland
Copy link
Author

  • on -1/0 return, should *next be set to NULL or left as it was?

FWIW: yes, that would make for a saner public API.

  • can next itself be NULL (to advance the iterator without getting the value)?

What's the use-case? I could not find such a use-case in the code base.

@encukou
Copy link
Collaborator

encukou commented Apr 23, 2024

What's the use-case?

Consistency in whether NULL can be passed to an output parameter.

(Deciding whether we want that -- and how to treat output parameters in general -- is one of the proverbial bigger fish to fry.)

@gvanrossum
Copy link

I feel it's better not to allow setting an output parameter's address to NULL. It's a rare case, and it slows down the API function (it has to check whether the pointer is NULL) and may bulk it up (with a DECREF, assuming the output value is produced anyways).

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

@vstinner
Copy link

I like this API: int PyIter_NextItem(PyObject *iter, PyObject **item) (I renamed next to item):

  • Set an exception, set *item to NULL, and return -1 on error.
  • Set *item to NULL, and return 0 on StopIteration.
  • Set *item to a strong reference to the iterator item, and return 1 on success.

iter and item cannot be NULL.

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

I agree 100% :-) We already did that in other recently added functions such as PyDict_GetItemRef().

@vstinner
Copy link

IMO it's worth it to propose a better API for PyIter_Next() which has an ambiguous return value (require to check for PyErr_Occurred()).

@erlend-aasland @gvanrossum @zooba: What do you think of the proposed API #21 (comment) ? I understand that @encukou likes it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

@zooba
Copy link

zooba commented May 31, 2024

I like it. It makes this pattern work nicely (due to both errors and valid results coercing to true):

PyObject *iter = PyObject_GetIter(o);
if (!iter) goto error;

PyObject *item = NULL;
while (PyIter_NextItem(iter, &item)) {
    if (!item) goto error;
    // work on item
    Py_DECREF(item);
}
Py_DECREF(iter);

@erlend-aasland
Copy link
Author

What do you think of the proposed API #21 (comment) ?

I like it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

Yeah; it is unfortunate, but IMO it is not a blocker for providing a better PyIter_Next API.

@vstinner
Copy link

vstinner commented Jun 10, 2024

Ok, let's do a formal vote on the API #21 (comment):

@vstinner
Copy link

See also: capi-workgroup/problems#29

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants