Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PyLong_FromInt64() and PyLong_ToInt64() #32

Open
vstinner opened this issue Jun 24, 2024 · 39 comments
Open

Add PyLong_FromInt64() and PyLong_ToInt64() #32

vstinner opened this issue Jun 24, 2024 · 39 comments

Comments

@vstinner
Copy link

vstinner commented Jun 24, 2024

Add 8 new functions to the limited C API to convert C <stdint.h> numbers from/to Python int:

// Return a Python int on success.
// Set an exception and return NULL on error.
PyAPI_FUNC(PyObject*) PyLong_FromInt32(int32_t value);
PyAPI_FUNC(PyObject*) PyLong_FromUInt32(uint32_t value);
PyAPI_FUNC(PyObject*) PyLong_FromInt64(int64_t value);
PyAPI_FUNC(PyObject*) PyLong_FromUInt64(uint64_t value);

// Set *value and return 0 on success.
// Set an exception and return -1 on error.
PyAPI_FUNC(int) PyLong_AsInt32(PyObject *obj, int32_t *value);
PyAPI_FUNC(int) PyLong_AsUInt32(PyObject *obj, uint32_t *value);
PyAPI_FUNC(int) PyLong_AsInt64(PyObject *obj, int64_t *value);
PyAPI_FUNC(int) PyLong_AsUInt64(PyObject *obj, uint64_t *value);

UPDATE 1: I removed PyLong_FromInt32() and PyLong_FromUInt32() functions.

UPDATE 2: I renamed To functions to As functions.

UPDATE 3: I added back PyLong_FromInt32() and PyLong_FromUInt32() functions for 32-bit systems.

  • As functions call obj.__index__() if obj is not a Python int.
  • As functions converting to unsigned integer raises ValueError if value is negative
  • As functions raise OverflowError if the value is too big (doesn't fit into the C integer range)

Compared to PyLong_AsLong(), As function return value is status (success or error, 0 or -1) and the result is set in the value argument (on success). It avoids having to call PyErr_Occurred() to distinguish -1 (success) and -1 (error). See Problem #1 for details.

Links:

@zooba
Copy link

zooba commented Jun 24, 2024

There's nothing particularly controversial about this API, right? Other than the question of whether it should be "real" or macros?

I'm quite happy with it being a real API.

@serhiy-storchaka
Copy link

Since PyLong_To* functions have signature compatible with the converter signature used with the O& format unit in PyArg_Parse(), I think that it would be better to make the return value compatible with the converter convention: return 1 for a successful conversion and 0 if the conversion has failed. So these functions could be used with PyArg_Parse() (and other functions in this family).

Otherwise we can end with two sets of converters with opposite return conventions. It will be very errorprone.

@vstinner
Copy link
Author

I think that it would be better to make the return value compatible with the converter convention: return 1 for a successful conversion and 0 if the conversion has failed.

I don't think that this use case is common enough to not follow the C API Guidelines:
https://devguide.python.org/developer-workflow/c-api/index.html#guidelines-for-expanding-changing-the-public-api

For converters, I would prefer to add support for int32_t and uint64_t to Argument Clinic and reuse these functions.

@zooba
Copy link

zooba commented Jun 24, 2024

think that it would be better to make the return value compatible with the converter convention

It's tempting, for sure, but I think we're better to keep converters as the exception. We really want developers to have a single mental model for how to handle return values, and that unfortunately means converters need to be clearly labelled, and not just convenient functions.

Adding some form of sized version of i and u to PyArg_ParseTuple wouldn't be a bad step. That's our public API - Argument Clinic is still internal only, so we don't fix that until we need it, and it doesn't help public users at all when we do.

@encukou
Copy link
Collaborator

encukou commented Jun 25, 2024

Other than the question of whether it should be "real" or macros?

Thank you for humoring me on the macros thing; I think it's clear we want real functions :)

keep converters as the exception

Yes. We settled on using -1 (or NULL) for errors. Unfortunately some existing functions -- or even families of functions -- don't do that; we have to live with that.

Eventually, if we get to the point where O& sticks out too much, we might want to add a new PyArg_Parse format code. I don't think we're there yet.

@vstinner
Copy link
Author

@serhiy-storchaka has some concerns about calling obj.__index__() which can release the GIL indirectly:

Calling __index__ can release the GIL and make values saved in C variables (borrowed references, the size of a list, etc) invalid. Existing code can rely on atomicity of PyLong_AsUnsignedLong(), so calling __index__ in it is an unsafe change.

IMO it's good that signed and unsigned functions have the same behavior, that both call __index__(). Existing functions are inconsistent: signed functions (ex: PyLong_AsInt()) call __index__(), whereas PyLong_AsUnsignedLong() don't call the method.

The GIL issue can be documented.

@vstinner
Copy link
Author

@serhiy-storchaka @mdboom: Do you have any additional comment about these APIs, or can I just open a vote?

@mdboom
Copy link

mdboom commented Jun 27, 2024

I have nothing to add that hasn't already been said.

@serhiy-storchaka
Copy link

My comment was about the existing C API like PyLong_AsUnsignedLong() and why it is not safe to change it.

We need a C API which respects special methods like __index__, and it should be default. On other hand, we need a C API to access the content of the object. This is a contradiction. Creating two sets of functions will cause confusion, especially if there will be more than two sets, with different signatures and different calling conventions. I have an idea how to mitigate this problem, but it is for a different discussion. It does not stay on the way of this set of functions, because they all are just wrappers around PyLong_FromNativeBytes/PyLong_AsNativeBytes.

PyLong_FromInt32 and PyLong_FromUInt32 are redundant, because the integer argument can always be promoted to 64 bits for PyLong_FromInt64/PyLong_FromUInt64. We have both PyLong_FromLong and PyLong_FromLongLong because the latter was optional, not all compilers supported long long.

On other hand, the lack of PyLong_ToInt16 and PyLong_ToInt8 requires adding additional check after calling PyLong_ToInt32, and producing OverflowError with inconsistent error messages (depending on the input value) or writing more verbose code with PyLong_AsNativeBytes.

@zooba
Copy link

zooba commented Jun 27, 2024

Right, this is essentially the difference between the abstract and concrete APIs, a difference which has been eroding over the last few years.

If we think we can recover it, then we could argue the caller should PyLong_Check before using these functions, and use a more abstract function if they don't actually have an int already.

Alternatively, if we don't expect to shut the door to concrete APIs operating abstractly, then we could document that if PyLong_Check is true, these functions will not call back into Python code in order to convert to an integer.

I'm +0 on the latter, just out of pragmatism. But it saddens me to lose the clear abstract/concrete separation. (It will make those who want our API to be strongly typed sadder than me, though. I like it being largely abstract, I just think we ought to know which APIs are which.)

@vstinner
Copy link
Author

@serhiy-storchaka: It's uneasy for me to understand what do you propose.

Creating two sets of functions will cause confusion, especially if there will be more than two sets, with different signatures and different calling conventions.

So are you ok with PyLong_ToInt64() calling __index__() or not? What do you suggest?

PyLong_FromInt32 and PyLong_FromUInt32 are redundant

Do you suggest to not add them and use PyLong_FromInt64() and PyLong_FromUInt64()? That sounds reasonable.

On other hand, the lack of PyLong_ToInt16 and PyLong_ToInt8 requires adding additional check after calling PyLong_ToInt32 (...)

I didn't want to add too many functions at the beginning, so I started with the two most common sizes: 32 and 64 bits. We can add more functions later, once we agree on the design.

@vstinner
Copy link
Author

If we think we can recover it, then we could argue the caller should PyLong_Check before using these functions, and use a more abstract function if they don't actually have an int already.

I agree that PyLong_Check(), or even PyLong_CheckExact(), is a good way to check if an argument is an integer.

In general, users don't care much if it's exactly an integer or not (if the object type is Python int). For example, I'm used to pass a boolean when an integer is expected (bool inherits from int), even if a boolean is a different concept than an integer (to me at least).

@zooba
Copy link

zooba commented Jun 27, 2024

In general, users don't care much if it's exactly an integer or not (if the object type is Python int).

Right, but the case here is when the user cares a lot that the GIL is not released during their PyLong_ToInt## call. They're more concerned about that than the exact type of the value.

I suspect we could probably get away with PyArg_ParseTuple and friends doing the __index__ conversion and the specific converter functions just raising a TypeError.1 Surely the vast majority of not-ints appearing in an int-like context will be in function arguments or in our code (e.g. any GetItem implementations).

The three main questions for me are:

  • can users write correct code at all? (yes in both cases here)
  • is the simplest code going to be the most correct? (unclear here, but probably best served by removing transparent __index__ use)
  • can we explain the difference easily? (apparently not)

Footnotes

  1. I'd also be okay with removing the extra work from AsNativeBytes, since we haven't properly released that yet.

@erlend-aasland
Copy link

Why To and not As?

@serhiy-storchaka
Copy link

Alternatively, if we don't expect to shut the door to concrete APIs operating abstractly, then we could document that if PyLong_Check is true, these functions will not call back into Python code in order to convert to an integer.

Yes, this was my idea. It was discussed earlier, but I lose previous rounds (perhaps I was not persistent enough). It affects more than the C API, so it should be discussed separately. Where is the best place for this? We need not only the C API Workgroup.

So are you ok with PyLong_ToInt64() calling __index__() or not? What do you suggest?

I am OK with calling __index__() in a new C API. Details is a matter of a different discussion.

Do you suggest to not add them and use PyLong_FromInt64() and PyLong_FromUInt64()? That sounds reasonable.

Do you have a use case for PyLong_FromInt32()? Wait, do you have a use case for PyLong_FromInt64()? long long has at least 64 bits, so PyLong_FromLongLong() can be used for converting from int64_t. It seems that we do not need new functions for this direction of conversion, except maybe PyLong_FromMaxInt().

I didn't want to add too many functions at the beginning, so I started with the two most common sizes: 32 and 64 bits. We can add more functions later, once we agree on the design.

Just keep in mind that there is a need for such features.

@cavokz
Copy link

cavokz commented Jun 28, 2024

@serhiy-storchaka has some concerns about calling obj.__index__() which can release the GIL indirectly:

Calling __index__ can release the GIL and make values saved in C variables (borrowed references, the size of a list, etc) invalid. Existing code can rely on atomicity of PyLong_AsUnsignedLong(), so calling __index__ in it is an unsafe change.

IMO it's good that signed and unsigned functions have the same behavior, that both call __index__(). Existing functions are inconsistent: signed functions (ex: PyLong_AsInt()) call __index__(), whereas PyLong_AsUnsignedLong() don't call the method.

The GIL issue can be documented.

Is it possible to have it documented anyway? I've no clue of why calling __index__() should/could release the GIL and what kind of atomicity is lost. From a user prospective, I've never doubted that PyLong_AsUnsignedLong() and any of its friends may release the GIL although, hopefully, just temporarily.

@zooba
Copy link

zooba commented Jun 28, 2024

Where is the best place for this? We need not only the C API Workgroup.

Either the api-evolution repository would be good, if we want to develop a "rule", though you'd have to announce it on a relevant issue to get people's attention. Or I'm sure we can find a bug related to implicit __index__ calls releasing the GIL and causing a crash (we had one recently where PySys_Audit would do it, and there's one about bytearray somewhere, too). Probably since there's multiple bugs, we'd want to discuss a proposal in api-evolution on how to approach it (i.e. 1. don't do it, 2. document it and how to avoid it, 3. it's okay and users should use magic to be thread safe 🙃 )

Do you have a use case for PyLong_FromInt32()? Wait, do you have a use case for PyLong_FromInt64()?

Convenience, discovery, and pairing with the other API is a totally fine reason. (It may not be justified every time, but it's an okay reason.) We're looking at API development on a long scale (10+ years), and the goal is to have the right people (those who will benefit from it) using the right API in the right way. Making it easy to discover which APIs you should be using (and making it hard to discover the ones you shouldn't be using) is a big part of this, and the final decision/guess on how to do that is the WG's responsibility.

Is it possible to have it documented anyway?

Yes, documentation is cheap, at least for the APIs that already do this and won't be changing. It's worth discussing for any new APIs though.

@vstinner
Copy link
Author

@erlend-aasland:

Why To and not As?

As functions such as PyLong_AsLong() return the value directly, but can also return -1 on error and so PyErr_Occurred() should be checked.

long value = PyLong_AsLong(obj);
if (value == -1 && PyErr_Occurred()) { /* error */ }

vs

int64_t value;
if (PyLong_ToInt64(obj, &value) < 0) { /* error */ }

@vstinner
Copy link
Author

@serhiy-storchaka:

Wait, do you have a use case for PyLong_FromInt64()?

My initial motivation to propose these new <stdint.h> functions was capi-workgroup/api-evolution#10 discussion: replace int/unsigned long with int32_t/uint64_t to have a better ABI. I would like <stdint.h> types to be as convenient to use with the Python C API as the classic int/long types.

@erlend-aasland
Copy link

As functions such as PyLong_AsLong() return the value directly, but can also return -1 on error and so PyErr_Occurred() should be checked.

Not all As APIs are designed like this; it's not a defining trait for As APIs.

The current To APIs often suggests a transformation (ToUppercase, ToLowercase, etc.); its a different kind of APIs.

@serhiy-storchaka
Copy link

Currently it looks like a legend about Newton and a cat. He made a hole in the door to allow the cat to enter or leave without taking him away from his studies. When she brought in 5 kittens, he made 5 smaller holes for the kittens.

All cases with int64_t, int32_t, int16_t, int8_t, long, short, and signed char can be handled by PyLong_FromLongLong(). No need to add PyLong_From64Int(), PyLong_From32Int(), PyLong_From16Int(), PyLong_From8Int(), PyLong_FromInt(), PyLong_FromShort(), and PyLong_FromSignedChar(). If you want to cover absolute all cases, add a function that takes intmax_t argument, it will add a new value.

@zooba
Copy link

zooba commented Jun 28, 2024

I'm not sure we'll ever again see another programming language that won't allow implicitly upcasting integers (like Ada), so yeah, there's probably not a lot gained. If intmax_t is also in stdtypes, then can we have FromInt()?

it's not a defining trait for As APIs.

We should probably figure out what is a defining trait. I have this vague idea that To APIs convert PyObject->PyObject and As APIs convert PyObject->native, but I can't back that up fully.

@encukou
Copy link
Collaborator

encukou commented Jun 28, 2024

If intmax_t is also in stdtypes, then can we have FromInt()?

The size of intmax_t depends on the compiler (which perhaps delegates the decision to one of various platform ABI specs).
AFAIK, the reason we're adding FromInt64 even though we already have FromLongLong is making it possible to avoid types like that.

@zooba
Copy link

zooba commented Jun 28, 2024

The size of intmax_t depends on the compiler

True, and that matters most for cross-language callers, who probably need to rely on the exact size.

So FromInt64 is justified. You could argue that FromInt32 is justified on 32-bit platforms, where conversion to 64-bit might be more expensive (though I'd be incredibly surprised if it was prohibitive), and perhaps that's all we need to justify both?

@vstinner
Copy link
Author

@serhiy-storchaka @zooba: Would you feel better if PyLong_ToInt64() and similar functions don't call the __index__() method?

@serhiy-storchaka
Copy link

I think that the new C API should calls __index__ when the input is not a Python integer. Otherwise the users of this API will produce flawed extensions that only work with int, not other integer-like objects. We should not create new inconsistencies. But it should not call __index__ when the input is a Python integer, so the user can use PyLong_Check() to check whether the call will be safe regarding the GIL.

If Rust needs PyLong_FromInt64(), well, let add it. But PyLong_FromInt32 is redundant, I do not believe the cost of upcasting 32-bit integer to 64 bits is measureable in comparison with memory managing and other complex operations.

I am going to add a full set of converters compatible with PyArg_Parse(), including converters to native integer types and 8- and 16-bit integers to replace existing private functions, so PyLong_ToInt64() as it is presented here will not be in great help to me. I only need to find good names for them.

@erlend-aasland
Copy link

PyLong_FromInt32 may be redundant, but it makes for code that is easier to reason about. I think the readability win is worth it. I definitely prefer a set of APIs where size is explicitly included in the name.

Regarding naming, I still think we use As, not To.

@zooba
Copy link

zooba commented Jul 1, 2024

Would you feel better if PyLong_ToInt64() and similar functions don't call the __index__() method?

Only very slightly. I'd feel better overall if it's clearly documented (and if those people working on documenting threading guarantees are made aware that int conversions are not thread-safe1).

Footnotes

  1. Well, strictly the conversion is safe, but the rest of your program could be collapsing around you because of it.

@vstinner
Copy link
Author

vstinner commented Jul 2, 2024

Updates:

  • I renamed To functions to As functions.
  • I added back PyLong_FromInt32() and PyLong_FromUInt32() functions for 32-bit systems.

@vstinner
Copy link
Author

vstinner commented Jul 2, 2024

It seems like there is an overall agreement on the proposed API. Let me check with a vote:

@serhiy-storchaka
Copy link

I am sorry, how to vote?

@vstinner
Copy link
Author

vstinner commented Jul 2, 2024

I am sorry, how to vote?

If you agree, check your checkbox. If you disagree, add a message :-)

@serhiy-storchaka
Copy link

It seems that checking the checkbox does not have permanent effect. In any case I disagree.

  • PyLong_FromInt32() and PyLong_FromUInt32() are not needed.
  • The As function creates conflicts with already existing PyLong_As*() functions.

I am also not happy with incompatibility with PyArg_Parse , but I can live with this. We just need to add more new functions.

@zooba
Copy link

zooba commented Jul 2, 2024

One of the admins here needs to add Serhiy (and Mike) as an org member.

  • The As function creates conflicts with already existing PyLong_As*() functions.

What conflict? We don't have any fixed size As functions yet. (Perhaps you mean inconsistency in the API design?)

I am also not happy with incompatibility with PyArg_Parse

I missed the incompatibility. Last I saw you proposed to also add new converters for fixed-width ints? That's great, I'll be happy to see it, but it doesn't change anything about these APIs.

@serhiy-storchaka
Copy link

What conflict? We don't have any fixed size As functions yet. (Perhaps you mean inconsistency in the API design?)

All PyLong_As* functions have similar design. They take a Python integer object (or integer-like object) and return a C integer type. New functions have completely different design. It will be confusing. There is a reason to use similar style in naming similar functions.

Last I saw you proposed to also add new converters for fixed-width ints? That's great, I'll be happy to see it, but it doesn't change anything about these APIs.

It is not great to have two functions which do the same, but one of them returns 0 on success and other returns 0 on failure.

@zooba
Copy link

zooba commented Jul 2, 2024

There is a reason to use similar style in naming similar functions.

This is a valid point. So I guess one purpose of the vote is to decide whether we as a whole think that consistency (with APIs that we consider a bad design) outweighs introducing the preferred pattern.

I would be happy either way, but only between these two options.

It is not great to have two functions which do the same, but one of them returns 0 on success and other returns 0 on failure.

Converters are the exception here. If we could design them again, they'd be made consistent.

But these new functions are not intended to be converters, so they don't have to follow that pattern. We can insist on documenting that while they may look like converters, they're not suitable, but I suspect that pattern is obscure enough that it won't come up. Especially if we were to add format characters to PyArg_ParseTuple (which is what I intended when I mentioned your earlier proposal).

@erlend-aasland
Copy link

Serhiy:

New functions have completely different design. It will be confusing. There is a reason to use similar style in naming similar functions.

You don't know that; you're of course free to say that you believe it will be confusing, though.

I don't believe it will be confusing; it should be sufficient to provide good documentation.

@vstinner
Copy link
Author

@encukou @mdboom: Do you have concerns about the proposed API? You didn't vote.

@encukou
Copy link
Collaborator

encukou commented Aug 1, 2024

Sorry for the delay, +1 from me.

The As/To naming is a detail I'll happily leave to the rest of the group. Unlike things like “stealing” or “borrowing”, which I always want reflected from the name, in this case:

  • The compiler will reliably complain if you get the signature wrong (and so you can check the docs, or an IDE tooltip).
  • If you find these calls in someone else's code, the behaviour should be quite clear (if the code compiles).

We should, however, stick with the precedent we establish here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants