gh-135487: fix `reprlib.Repr.repr_int` when given very large integers #135506

picnixz · 2025-06-14T10:46:19Z

Getting the number of digits may be wrong due to some round-off errors but I don't want to overcomplicate the code.

Issue: Using reprlib.repr fails for very large numbers #135487

serhiy-storchaka

I think that we can try to reproduce the result of old Python versions, before introducing the limitation on maximal digit count.

If the length exceeds maxlong, repr_int() returns the first i digits followed by fillvalue followed by the last j digits. I afraid that calculating the last j digits has the same complexity as calculating all digits. But for the first i digits it may be faster.

We could also use 10**math.floor(math.log(x)) to get first digits. There are several caveats. Firs, the precision of float is limited, and the larger x, the less significant digits we can get. But it still can be more than 10 digits in most practical cases. We can estimate the error and the number of reliable digits. Second, even with best effort, it can give wrong even the first digit -- it is difficult to differentiate 9999... from 10000.... In this case we should calculate the first i digits, multiply them by 100**(k-i), and then calculate the correction. This can be repeated iteratively, so we can get arbitrary number of first digits.

cc @tim-one

Doc/library/reprlib.rst

serhiy-storchaka · 2025-06-15T05:47:04Z

Lib/reprlib.py

+            # When math.log10(abs(x)) is overestimated or underestimated,
+            # the number of digits should be k - 1 or k + 1 respectively.
+            # For simplicity, we do not compute the exact number of digits.


It can differ more than by 1 for large numbers with more than 2**53 digits. Yes, it will take long time (years?) to calculate the decimal digits, but let be more future proof.

Ah yes, I actually didn't consider the possibility of having 2**53 digits...

MarcellPerger1 · 2025-06-19T09:40:45Z

I afraid that calculating the last j digits has the same complexity as calculating all digits

Calculating the last few digits of a number N with n digits is $O(n)$ using some modular arithmetic tricks:
We can find the last j digits of N by doing something like (pseudo-Python) sum((2**i) % (10**j) for i, v in enumerate(N.get_bits()) if v == 1). ¹
We can calculate (2**i) % (10**j) more efficiently:

First, notice how the last j digits of 2**i repeat after a while (e.g. the last 2 digits of $2^0,2^1,\ldots$ is 1,2,4,8,16,32,64,28,56,12,24,48,96,92,84,68,36,72,44,88,76,52,4,8,16,...).
So, ahead of time, we can work out where this cycle loops back round to and then use 2 lookup tables² to find the last j digits of (2**i) and this will be used as (2**i) % (10**j).
This will only involve only a subtraction and a i % (some value) and as mod is O(digits in i), this takes at most $O(\log n)$ time (although this is rather insignificant in practice as any integer where number of digits > $2^{64}$ wouldn't fit into memory so log(n) = log(number of digits) must be < 64)

Therefore, as we need to add $O(n)$ things each taking $O(\log n)$ time to compute, we can find the last few digits of an integer in $O(n\log n)$.
And I'm pretty sure calculating all digits has a worse time complexity than that.

Also, we would have to mod (10**j) after every few addition steps to keep the addition operation constant-time ↩
One for the initial non-repeating section (1,2 in the example above) and another one for the repeating section 4,8,16,32,...,76,52 ↩

picnixz · 2025-06-21T10:03:24Z

First of all, the "hard" issue only appears when we're considering integers with more than $2^{53}$ digits. We can handle those under that limit by knowing that math.log will be off by 1 (I think the off-by-1 issue only holds for $\log_{10}(N) < 2^{53}$).

So, if we were to enumerate over the bits of N to get the last digits, we will need to iterate over $\log_2(N) \approx 2^{53}$ bits... which is impossible. In particular,

log(n) = log(number of digits)

is incorrect. It is possible to have a number with $2^{48}$ digits and it is also possible to have a number with $2^{64}$ digits in the future.

Some comments to your solution:

sum((2**i) % (10**j) for i, v in enumerate(N.get_bits()) if v == 1)

This is not entirely accurate. What should be more accurate is:

def tail(n, j):
    M = 10 ** j
    bits = reversed(format(n, 'b'))
    res = sum((2 ** i) % M for i, v in bits if v == '1') % M
    return str(res)

There is a small correction to make at the end where you need to apply once again the modulus.

So, ahead of time, we can work out where this cycle loops back round to and then use 2 lookup tables

The lookup table requires $j$ but $j$ depends on maxstring. So we can't build the lookup table in advance (we don't know how large $j$ will be and it would be meaningless to build it for each instance, or hold a global cache as it could grow indefinitely).

More generally, the pattern is $(2^0, 2^1, ..., 2^{(j-1)}, \gamma, \gamma, \cdots)$ with $\gamma$ the cycle. For $j=2$, the cycle has length 20, for $j = 3$, it has length 100 and for $j = 4$ it has length 500 and for $j = 5$ it has length 2500 (it seems to be multiplied by 5 every time, though it's just an observation). It may grow fast very quick! (though we might have a general formula solely depending on $i$ and $j$ but that wouldn't solve the issue with the fact that we need to iterate over the bits of a very large integer)

MarcellPerger1 · 2025-06-22T18:19:54Z

iterate over $\log_2{N} ≈ 2^{53}$ bits... which is impossible

My apologies if I wasn't being clear: I was suggesting an algorithm for integers of less than a few million digits (that should be faster or at least better time complexity than just calculating all digits). These are more likely to actually occur in practice than integers with $2^{53}$ bits. Maybe it should only be run when the number of digits is below a certain threshold.

This is not entirely accurate.

I know, I just didn't want to over-complicate the explanation of the algorithm with the implementation details - it shouldn't affect the time complexity (I tried to make this clear by explicitly writing pseudo-Python there, I know int.get_bits() doesn't exist)

The lookup table requires $j$ but $j$ depends on maxstring.

Personally, I don't see why we have to fill the entire maxlong length with digits. Maybe we could just show the last (for example) 4 digits - I doubt any extra digits would provide much more useful information to the user.

Now that I think about it, the last few digits of the integer aren't even really that useful to the user. What's probably more useful is the length of the number and the first few digits in it.
Maybe a format like 123456...<N digits omitted> would be better.
Or if we still want the last (3 or 4) digits for consistency, 123456...<N digits omitted>...789

picnixz · 2025-06-24T10:16:35Z

Now that I think about it, the last few digits of the integer aren't even really that useful to the user. What's probably more useful is the length of the number and the first few digits in it.

I don't think we should make a guess of what's useful or not. We should rather try to comply to the contract that we ought to deliver. In this case, if the user wants to render many digits even for very large numbers, then so be it.

I think we can just use _pylong.int_to_decimal_string. It will be easier. And we can just cut down the resulting string. Or we can adapt int_to_decimal_string to be smarter and only get the first digits and last ones (though the last ones, we could just perform a modulus directly).

serhiy-storchaka

LGTM, except the index entry.

We can add more details later. The current goal is to get rid of ValueError.

Doc/library/reprlib.rst

miss-islington-app · 2025-06-24T11:37:28Z

Thanks @picnixz for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington-app · 2025-06-24T11:37:28Z

Thanks @picnixz for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

…tegers (pythonGH-135506) (cherry picked from commit e5f03b9) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

bedevere-app · 2025-06-24T11:37:37Z

GH-135886 is a backport of this pull request to the 3.13 branch.

…tegers (pythonGH-135506) (cherry picked from commit e5f03b9) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

bedevere-app · 2025-06-24T11:37:42Z

GH-135887 is a backport of this pull request to the 3.14 branch.

…ntegers (GH-135506) (#135886) gh-135487: fix `reprlib.Repr.repr_int` when given very large integers (GH-135506) (cherry picked from commit e5f03b9) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

…ntegers (GH-135506) (#135887) gh-135487: fix `reprlib.Repr.repr_int` when given very large integers (GH-135506) (cherry picked from commit e5f03b9) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

picnixz requested a review from serhiy-storchaka June 14, 2025 10:46

bedevere-app bot added the awaiting core review label Jun 14, 2025

bedevere-app bot mentioned this pull request Jun 14, 2025

Using reprlib.repr fails for very large numbers #135487

Closed

fix reprlib.Repr.repr_int

4ac21bb

picnixz force-pushed the fix/reprlib/value-error-digits-135487 branch from a50f94a to 4ac21bb Compare June 14, 2025 10:57

serhiy-storchaka reviewed Jun 15, 2025

View reviewed changes

update comment

813ab8c

serhiy-storchaka approved these changes Jun 24, 2025

View reviewed changes

Doc/library/reprlib.rst Outdated Show resolved Hide resolved

bedevere-app bot added awaiting merge and removed awaiting core review labels Jun 24, 2025

address review

2ffbec9

picnixz enabled auto-merge (squash) June 24, 2025 10:55

picnixz merged commit e5f03b9 into python:main Jun 24, 2025
40 checks passed

bedevere-app bot removed the awaiting merge label Jun 24, 2025

picnixz added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Jun 24, 2025

picnixz deleted the fix/reprlib/value-error-digits-135487 branch June 24, 2025 11:37

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jun 24, 2025

bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Jun 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

gh-135487: fix `reprlib.Repr.repr_int` when given very large integers #135506

gh-135487: fix `reprlib.Repr.repr_int` when given very large integers #135506

Uh oh!

picnixz commented Jun 14, 2025 •

edited by bedevere-app bot

Loading

Uh oh!

serhiy-storchaka left a comment

Uh oh!

Uh oh!

serhiy-storchaka Jun 15, 2025

Uh oh!

picnixz Jun 15, 2025

Uh oh!

MarcellPerger1 commented Jun 19, 2025

Uh oh!

picnixz commented Jun 21, 2025 •

edited

Loading

Uh oh!

MarcellPerger1 commented Jun 22, 2025

Uh oh!

picnixz commented Jun 24, 2025

Uh oh!

serhiy-storchaka left a comment

Uh oh!

Uh oh!

Uh oh!

miss-islington-app bot commented Jun 24, 2025

Uh oh!

miss-islington-app bot commented Jun 24, 2025

Uh oh!

bedevere-app bot commented Jun 24, 2025

Uh oh!

bedevere-app bot commented Jun 24, 2025

Uh oh!

Uh oh!

Uh oh!

gh-135487: fix reprlib.Repr.repr_int when given very large integers #135506

gh-135487: fix reprlib.Repr.repr_int when given very large integers #135506

Uh oh!

Conversation

picnixz commented Jun 14, 2025 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

serhiy-storchaka Jun 15, 2025

Choose a reason for hiding this comment

Uh oh!

picnixz Jun 15, 2025

Choose a reason for hiding this comment

Uh oh!

MarcellPerger1 commented Jun 19, 2025

Footnotes

Uh oh!

picnixz commented Jun 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MarcellPerger1 commented Jun 22, 2025

Uh oh!

picnixz commented Jun 24, 2025

Uh oh!

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

miss-islington-app bot commented Jun 24, 2025

Uh oh!

miss-islington-app bot commented Jun 24, 2025

Uh oh!

bedevere-app bot commented Jun 24, 2025

Uh oh!

bedevere-app bot commented Jun 24, 2025

Uh oh!

Uh oh!

gh-135487: fix `reprlib.Repr.repr_int` when given very large integers #135506

gh-135487: fix `reprlib.Repr.repr_int` when given very large integers #135506

picnixz commented Jun 14, 2025 •

edited by bedevere-app bot

Loading

picnixz commented Jun 21, 2025 •

edited

Loading