Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bits counting #52

Closed
serhiy-storchaka opened this issue Aug 30, 2024 · 4 comments
Closed

Bits counting #52

serhiy-storchaka opened this issue Aug 30, 2024 · 4 comments

Comments

@serhiy-storchaka
Copy link

serhiy-storchaka commented Aug 30, 2024

Previous attempt to add a C API for int.bit_length() failed because in most cases the user can use PyLong_AsNativeBytes() (see capi-workgroup/decisions#28). I still think that there are use cases for exact bits counting, so we may add such API in future. There are also other private C API functions that work with bits counts and may be exposed to public in future.

The main problem to me with this API was that it returns the bits count as size_t or Py_ssize_t. On 32-bit platform it is not difficult to create an integer or even several integers whose bits count does not fit in size_t or Py_ssize_t (This is only 0.5 or 0.25 GB), so this is a practical issue. I recently solved this by making these functions always using 64-bit integers, even on 32-bit platforms. And we can solve it on 64-bit platforms by forbidding creation of Python integers whose bits count does not fit in 64 bits (see https://discuss.python.org/t/imposing-a-hard-limit-on-python-integers/62302/10). There is no much practical difference between $2^{63}$, $2^{64}$ and $2^{66}$ bits -- all limits are insanely large and not achievable on practice. This makes _PyLong_NumBits() and _PyLong_Frexp() always successful.

There is still one implementation question: should we use signed or unsigned 64-bit integer type for bits count? It was a very practical question for 32-bit counts, but for 64-bit counts both $2^{63}$ and $2^{64}$ are very large, so no need to squeeze a bit. On other hand, the bits count is always non-negative, there is no use case for negative values.

This will affect private C API functions _PyLong_NumBits(), _PyLong_Frexp(), _PyLong_Rshift() and _PyLong_Lshift().

Implementation for unsigned count: python/cpython#123498.

@encukou
Copy link
Contributor

encukou commented Aug 30, 2024

My vote goes to signed, so the sign bit can serve its usual purpose as an error indicator.

@serhiy-storchaka
Copy link
Author

serhiy-storchaka commented Sep 5, 2024

We can reserve (uint64_t)-1 for error indication. Actually, I already reserved it to avoid integer overflow in _PyLong_Frexp().

Implementation for signed count: python/cpython#123724

It is slightly more complex than the unsigned variant, because most of existing code already use uint64_t. If we even made _PyLong_Rshift() public, we will need to replace assert(shiftby >= 0) with a runtime check.

We can also use INT64_MAX instead of UINT64_MAX as the limit with the uint64_t type. Then the code can use uint64_t or int64_t without integer overflow.

@vstinner
Copy link
Contributor

vstinner commented Sep 6, 2024

Using signed int64_t would be consistent with signed Py_ssize_t used basically everywhere in the Python C API.

I don't think that INT64_MAX bits is too small: such int number should be almost impossible to create or use in practice on nowadays computers.

@serhiy-storchaka
Copy link
Author

Merged the signed version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants