Bits counting #52

serhiy-storchaka · 2024-08-30T10:15:32Z

Previous attempt to add a C API for int.bit_length() failed because in most cases the user can use PyLong_AsNativeBytes() (see capi-workgroup/decisions#28). I still think that there are use cases for exact bits counting, so we may add such API in future. There are also other private C API functions that work with bits counts and may be exposed to public in future.

The main problem to me with this API was that it returns the bits count as size_t or Py_ssize_t. On 32-bit platform it is not difficult to create an integer or even several integers whose bits count does not fit in size_t or Py_ssize_t (This is only 0.5 or 0.25 GB), so this is a practical issue. I recently solved this by making these functions always using 64-bit integers, even on 32-bit platforms. And we can solve it on 64-bit platforms by forbidding creation of Python integers whose bits count does not fit in 64 bits (see https://discuss.python.org/t/imposing-a-hard-limit-on-python-integers/62302/10). There is no much practical difference between $2^{63}$, $2^{64}$ and $2^{66}$ bits -- all limits are insanely large and not achievable on practice. This makes _PyLong_NumBits() and _PyLong_Frexp() always successful.

There is still one implementation question: should we use signed or unsigned 64-bit integer type for bits count? It was a very practical question for 32-bit counts, but for 64-bit counts both $2^{63}$ and $2^{64}$ are very large, so no need to squeeze a bit. On other hand, the bits count is always non-negative, there is no use case for negative values.

This will affect private C API functions _PyLong_NumBits(), _PyLong_Frexp(), _PyLong_Rshift() and _PyLong_Lshift().

Implementation for unsigned count: python/cpython#123498.

The text was updated successfully, but these errors were encountered:

encukou · 2024-08-30T11:55:04Z

My vote goes to signed, so the sign bit can serve its usual purpose as an error indicator.

serhiy-storchaka · 2024-09-05T10:53:27Z

We can reserve (uint64_t)-1 for error indication. Actually, I already reserved it to avoid integer overflow in _PyLong_Frexp().

Implementation for signed count: python/cpython#123724

It is slightly more complex than the unsigned variant, because most of existing code already use uint64_t. If we even made _PyLong_Rshift() public, we will need to replace assert(shiftby >= 0) with a runtime check.

We can also use INT64_MAX instead of UINT64_MAX as the limit with the uint64_t type. Then the code can use uint64_t or int64_t without integer overflow.

vstinner · 2024-09-06T14:54:45Z

Using signed int64_t would be consistent with signed Py_ssize_t used basically everywhere in the Python C API.

I don't think that INT64_MAX bits is too small: such int number should be almost impossible to create or use in practice on nowadays computers.

serhiy-storchaka · 2024-09-29T07:40:52Z

Merged the signed version.

serhiy-storchaka mentioned this issue Sep 5, 2024

gh-123497: New limit for Python integers on 64-bit platforms python/cpython#123498

Closed

serhiy-storchaka closed this as completed Sep 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bits counting #52

Bits counting #52

serhiy-storchaka commented Aug 30, 2024 •

edited

Loading

encukou commented Aug 30, 2024

serhiy-storchaka commented Sep 5, 2024 •

edited

Loading

vstinner commented Sep 6, 2024

serhiy-storchaka commented Sep 29, 2024

Bits counting #52

Bits counting #52

Comments

serhiy-storchaka commented Aug 30, 2024 • edited Loading

encukou commented Aug 30, 2024

serhiy-storchaka commented Sep 5, 2024 • edited Loading

vstinner commented Sep 6, 2024

serhiy-storchaka commented Sep 29, 2024

serhiy-storchaka commented Aug 30, 2024 •

edited

Loading

serhiy-storchaka commented Sep 5, 2024 •

edited

Loading