
BigInteger.ToString always has a leading zero in bin and hex #115618

@SolalPirelli

Description

I expect BigInteger to behave like other integer types when printed, but for binary and hex that's currently not the case.

This can mess with naive algorithms, such as finding the bits in a given range using Substring, since the string is not always the same length.

It's also somewhat annoying when debugging objects that print themselves using BigInteger. For example, I have a "bit vector" class, and a quick glance at a value starting with 0... makes me go "wait, I expected this value to be negative when interpreted as a signed integer...".

Reproduction Steps

using System;
using System.Numerics;


Console.WriteLine(9.ToString("D1"));
Console.WriteLine(new BigInteger(9).ToString("D1"));

Console.WriteLine(1.ToString("B1"));
Console.WriteLine(new BigInteger(1).ToString("B1"));

Console.WriteLine(15.ToString("X1"));
Console.WriteLine(new BigInteger(15).ToString("X1"));

Run (e.g. on dotnetfiddle) and you get:

9
9
1
01
F
0F

Expected behavior

BigInteger's bin and hex printing should be consistent with both its own decimal printing and standard integer types' bin/hex/dec printing.

Actual behavior

There is always a leading zero, even when the result can fit exactly in the requested number of digits.

Regression?

Yes for binary, no for hex. Changing the dotnetfiddle to .NET 8 outputs:

9
9
1
1
F
0F

Known Workarounds

Checking the length and truncating the string when needed
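
A minimal sketch of that workaround, assuming only non-negative values need the int-style output. `ToUnsignedDigits` is a hypothetical helper, not a .NET API: it takes the "X", "x" or "B" output, strips the sign-disambiguating leading zero, and pads back to the requested width.

```csharp
using System;
using System.Numerics;

// Hypothetical helper (not a .NET API): formats a non-negative BigInteger the way
// int/long do with "X<n>" or "B<n>", i.e. without the sign-disambiguating leading '0'.
static string ToUnsignedDigits(BigInteger value, string format, int minDigits)
{
    if (value.Sign < 0)
        throw new ArgumentOutOfRangeException(nameof(value), "Only non-negative values are supported.");

    // "X", "x" and "B" produce the shortest two's complement form, e.g. "0F" for 15.
    string digits = value.ToString(format).TrimStart('0');
    if (digits.Length == 0)
        digits = "0"; // the value was zero

    // Pad back to the requested width, like "X4" does for int.
    return digits.PadLeft(minDigits, '0');
}

Console.WriteLine(ToUnsignedDigits(new BigInteger(15), "X", 1)); // F
Console.WriteLine(ToUnsignedDigits(new BigInteger(15), "X", 4)); // 000F
Console.WriteLine(ToUnsignedDigits(BigInteger.One, "B", 1));     // 1
```

Negative values are rejected on purpose: without choosing a fixed width there is no unsigned form to fall back to.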

Configuration

.NET 9, x64, Windows 11. I don't have other OSes to try this on, and I don't know what dotnetfiddle runs on, but I assume this isn't arch- or OS-dependent.

Other information

No response

Activity

elgonzo commented on May 15, 2025

It's been like this forever, at least for hex formatting. While I personally also find this inconsistent with how the X specifier works for other integral types, and lament that the documentation doesn't mention it anywhere as far as I can tell (also since forever), there is a method to the "madness".

The underlying reason for the behavior seems to be the ability to round-trip BigInteger values through the X specifier, as is possible for other integral types. The problem with doing what you expect, however, is that BigInteger can hold negative numbers while also having an ungodly, arbitrarily large value range.

How would you represent for example -1 as a hex value here?

For long that would be easy: FFFFFFFFFFFFFFFF, because long is just 64-bit, so the resulting hex string is still relatively short.
For an as-yet hypothetical Int256 type, the hex string would be a longer FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF but still of (borderline) manageable length. Parsing a hex number with fewer digits than that into one of these types is relatively easy: the missing hex positions are simply filled with implied leading zeros.

But for BigInteger, this simple approach is impractical. What would the hex-formatted two's complement of -1 be? We can't do it like for int, long, or even Int256. I don't know what a practical limit for the BigInteger value range is, but I guess the hex representation of the two's complement of a negative value would amount to so many F characters that it would fill more than just a few lines of text if we chose the same approach as for int/long/etc.

I don't think I am going out on a limb here by claiming that nobody really wants the hex formatting of a negative BigInteger to be a desolate landscape of F characters filling a whole paragraph without a good reason :-)
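
To make the contrast concrete, a short sketch assuming a recent .NET runtime (the sign-extending padding for precision specifiers reflects the current BigInteger formatting code as far as I know; it isn't stated in this thread). A fixed-width long always prints all of its bits, while BigInteger prints the shortest form unless a width is requested.

```csharp
using System;
using System.Numerics;

// A fixed-width long always prints all 64 bits: 16 hex digits.
Console.WriteLine((-1L).ToString("X"));                 // FFFFFFFFFFFFFFFF

// BigInteger has no fixed width, so by default it prints the shortest
// two's complement form, which for -1 is a single digit.
Console.WriteLine(BigInteger.MinusOne.ToString("X"));   // F

// With an explicit precision, negative values are sign-extended with 'F'
// and non-negative values are padded with '0', matching long's output.
Console.WriteLine(BigInteger.MinusOne.ToString("X16")); // FFFFFFFFFFFFFFFF
Console.WriteLine(new BigInteger(15).ToString("X16"));  // 000000000000000F
```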

Therefore, to distinguish positive values from the two's complement of negative values while keeping hex strings of manageable size for values of small-ish magnitude, any positive value whose hex digits would also read as the two's complement of a negative value gets a 0 prefix, so that it isn't mistaken for a negative value:

  • new BigInteger(-1).ToString("X1") results in the string "F"
  • new BigInteger(15).ToString("X1") results in the string "0F" (two hex digits required to distinguish the value 15 from the two's complement of -1)

This is also the underlying working principle of the BigInteger.Parse method:

  • BigInteger.Parse("0F", NumberStyles.HexNumber) will yield the BigInteger value 15.
  • BigInteger.Parse("F", NumberStyles.HexNumber) will yield the BigInteger value -1.
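
A quick check of that round-trip property, as a sketch: it parses the two strings from the bullets, then formats every value in a small (arbitrarily chosen) range with "X" and parses it back with `NumberStyles.HexNumber`, counting mismatches.

```csharp
using System;
using System.Globalization;
using System.Numerics;

// A leading hex digit of 8-F marks a negative two's complement value.
Console.WriteLine(BigInteger.Parse("0F", NumberStyles.HexNumber)); // 15
Console.WriteLine(BigInteger.Parse("F", NumberStyles.HexNumber));  // -1

// Format each value with "X" and parse it back; report any mismatch.
int failures = 0;
for (int i = -300; i <= 300; i++)
{
    var value = new BigInteger(i);
    string hex = value.ToString("X");
    if (BigInteger.Parse(hex, NumberStyles.HexNumber) != value)
    {
        Console.WriteLine($"Round trip failed for {i}: \"{hex}\"");
        failures++;
    }
}
Console.WriteLine($"{failures} failures");
```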

The same reasoning applies to binary formatting. In .NET 8, BigInteger couldn't parse binary numbers, but this became possible in .NET 9, so BigInteger's binary formatting had to be updated to allow binary-formatted numbers to round-trip.


"There is always a leading zero, ..."

For hex formatting, not always: only for positive values whose hex representation would also be a valid two's complement of a negative value.
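
To see which values are affected, this sketch prints the default "X" output for a small range around zero. Since one hex digit covers only -8..7 in two's complement, the expectation under the rule above is that 8 through 15 gain a leading 0, while 0 through 7 and -8 through -1 print as a single digit.

```csharp
using System;
using System.Numerics;

// In two's complement, one hex digit covers only -8..7. So 0..7 print as one
// digit, 8..15 need a leading '0' ("08".."0F"), -8..-1 print as "8".."F",
// and -9 already needs two digits ("F7").
for (int i = -9; i <= 16; i++)
{
    string hex = new BigInteger(i).ToString("X");
    Console.WriteLine($"{i,3} -> {hex}");
}
```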

huoyaoyuan commented on May 15, 2025

Member

The output of .NET 8 is incorrect. The binary and hexadecimal representations of BigInteger are always treated as signed, with the most significant (leftmost) bit as the sign bit. In binary, 1 always means -1, and 01 means 1.
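
A sketch of that reading, assuming .NET 9 or later (per the earlier comment, that is when BigInteger gained binary parsing through `NumberStyles.BinaryNumber`):

```csharp
using System;
using System.Globalization;
using System.Numerics;

// Assumes .NET 9+, where BigInteger parses NumberStyles.BinaryNumber.
// The leftmost bit is the sign bit: "1" is -1, "01" is +1.
Console.WriteLine(BigInteger.Parse("1", NumberStyles.BinaryNumber));  // -1
Console.WriteLine(BigInteger.Parse("01", NumberStyles.BinaryNumber)); // 1

// Formatting with "B" emits the same shortest forms, so values round-trip.
Console.WriteLine(BigInteger.One.ToString("B"));      // 01
Console.WriteLine(BigInteger.MinusOne.ToString("B")); // 1
```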

KalleOlaviNiemitalo commented on May 16, 2025

Common Lisp would simply print the negative hexadecimal number with a minus sign, just like in any other radix.

printf in C uses "%x" for unsigned integers only, and I suppose .NET inherited the ToString "X" behaviour from there. If compatibility now prevents the behaviour of BigInteger.ToString("X") from being changed, perhaps one can instead define a different format string for signed hexadecimal numbers.
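
For comparison, a sketch of the sign-and-magnitude style described above. `ToSignMagnitudeHex` is a hypothetical helper, not an existing .NET format string: it formats the absolute value and prepends a minus sign for negatives.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not an existing .NET format string:
// sign-and-magnitude hex, e.g. -15 -> "-F" and 15 -> "F".
static string ToSignMagnitudeHex(BigInteger value)
{
    // Abs(value) is non-negative, so "X" adds at most one leading '0'
    // (the sign-disambiguating one); strip it, keeping "0" for zero.
    string magnitude = BigInteger.Abs(value).ToString("X").TrimStart('0');
    if (magnitude.Length == 0)
        magnitude = "0";
    return value.Sign < 0 ? "-" + magnitude : magnitude;
}

Console.WriteLine(ToSignMagnitudeHex(new BigInteger(-15))); // -F
Console.WriteLine(ToSignMagnitudeHex(new BigInteger(15)));  // F
Console.WriteLine(ToSignMagnitudeHex(BigInteger.Zero));     // 0
```

One caveat for round-tripping: as far as I know, `NumberStyles.AllowHexSpecifier` can only be combined with the whitespace flags, so `BigInteger.Parse` would not accept "-F" directly and a custom parser would be needed.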

SolalPirelli commented on May 17, 2025

Author

Thanks all, perhaps this is more of a documentation bug then?

AFAIK this isn't explicitly documented anywhere.
In fact, I interpreted the docs' encouragement to use "R" as the specifier for round-tripping BigInteger ("Recommended for the BigInteger type") to mean that other formats may not round-trip.

I guess the analogy with other integer types has flaws no matter what behavior is chosen, especially since BigInteger doesn't have the traditional signed/unsigned distinction other types have.

tannergooding commented on May 28, 2025

Member

"Thanks all, perhaps this is more of a documentation bug then?"

A lot of this is covered under https://learn.microsoft.com/en-us/dotnet/api/system.numerics.biginteger.tostring?view=net-9.0

There's potentially more explicit documentation that could be provided and changes are welcome.

"In fact I interpreted the encouragement to use "R" as a specifier to round-trip BigInteger to mean other formats may not roundtrip"

A lot of the format specifiers are standardized across types. It is not guaranteed that all formats or format specifiers will produce a round-trippable value in all scenarios; R is meant to guarantee this regardless of type. For the built-in numeric types, we try to ensure the default case is round-trippable, and that we generally produce something round-trippable for other values when no precision is specified.

"Common Lisp would simply print the negative hexadecimal number with a minus sign, just like in any other radix."

While many languages support something like -F (or -0xF for actual language syntax) to mean -15, this is not as common to see in the wild. Hex and binary are often used to get the "raw bits" and defaulting to giving -F instead of F1 (or FFF1, etc) is potentially misleading and confusing to people typically working with binary/hex.
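
Concretely, a sketch comparing -15 across types, assuming a recent .NET runtime: BigInteger's default form F1 is the same byte an 8-bit sbyte prints, and asking for four digits gives the FFF1 that a 16-bit short prints.

```csharp
using System;
using System.Numerics;

var minusFifteen = new BigInteger(-15);

// Shortest two's complement form: the same bits an 8-bit sbyte prints.
Console.WriteLine(minusFifteen.ToString("X"));  // F1
Console.WriteLine(((sbyte)-15).ToString("X"));  // F1

// Requesting more digits sign-extends with 'F', like a wider fixed-width type.
Console.WriteLine(minusFifteen.ToString("X4")); // FFF1
Console.WriteLine(((short)-15).ToString("X"));  // FFF1
```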

tannergooding commented on May 28, 2025

Member

"I guess the analogy with other integer types has flaws no matter what behavior is chosen, especially since BigInteger doesn't have the traditional signed/unsigned distinction other types have."

All big integers are signed. The nuance is that a big integer isn't fixed-width, so you don't know how many leading sign bits are needed.

This causes it to default to the shortest 2's complement sequence, so for binary 0b1 is -1 (you only need the sign bit) and 0b01 is +1 (you need a sign bit and the magnitude after).
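
The shortest width can also be computed without formatting. A sketch, assuming `BigInteger.GetBitLength()` (present in recent .NET versions and documented as the bit count of the shortest two's complement representation excluding the sign bit) and .NET 9 for the "B" output; `BinaryDigits` and `HexDigits` are hypothetical helper names. Adding one sign bit gives the binary digit count, and rounding that up to whole nibbles gives the hex digit count.

```csharp
using System;
using System.Numerics;

// Hypothetical helpers: digit counts of the shortest two's complement form.
// GetBitLength() excludes the sign bit, so add one for it; hex digits are
// that many bits rounded up to whole nibbles.
static long BinaryDigits(BigInteger value) => value.GetBitLength() + 1;
static long HexDigits(BigInteger value) => (BinaryDigits(value) + 3) / 4;

foreach (int i in new[] { -9, -8, -1, 0, 1, 7, 8, 15, 255 })
{
    var v = new BigInteger(i);
    string bin = v.ToString("B"); // shortest form assumes .NET 9 (see the .NET 8 output above)
    string hex = v.ToString("X");
    Console.WriteLine($"{i,4}: B={bin} ({BinaryDigits(v)} digits), X={hex} ({HexDigits(v)} digits)");
}
```

For example, 1 needs 2 bits (0b01) and -1 needs 1 (0b1), matching the comment above.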

dotnet-policy-service commented on May 28, 2025

Contributor

This issue has been marked needs-author-action and may be missing some important information.

        BigInteger.ToString always has a leading zero in bin and hex · Issue #115618 · dotnet/runtime