[API Proposal]: Support binary format specifier 'b' and 'B' (standard numeric format strings) #83619

Closed
RaphaelTetreault opened this issue Mar 17, 2023 · 24 comments · Fixed by #85392
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics help wanted [up-for-grabs] Good issue for external contributors
@RaphaelTetreault

RaphaelTetreault commented Mar 17, 2023

Background and motivation

.NET/C# supports a wide range of standard numeric format specifiers when calling .ToString on an integer value. Binary numeric formatting is currently only available by calling Convert.ToString([integer], base = 2). It would be convenient if int.ToString([format specifier][precision specifier]) accepted the unused characters b and B as the format specifier, followed by an optional precision specifier denoting the minimum number of binary digits to display.
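
For comparison, the workaround available today can be sketched as below. It uses only Convert.ToString(int, int) and String.PadLeft, both existing APIs; the variable names are illustrative.

```csharp
using System;

// Existing workaround: convert with base 2, then pad by hand.
int number = 42;
string unpadded = Convert.ToString(number, 2);   // "101010"
string padded = unpadded.PadLeft(16, '0');       // "0000000000101010"

Console.WriteLine(unpadded);
Console.WriteLine(padded);
```

The manual PadLeft call is the step that a precision specifier such as "b16" would replace.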

API Proposal

namespace System.Globalization;

[Flags]
public partial enum NumberStyles
{
    AllowBinarySpecifier = 0x00000400,
    BinaryNumber = AllowLeadingWhite | AllowTrailingWhite | AllowBinarySpecifier,
}
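
A minimal sketch of how these flags might be used for parsing, assuming a runtime where NumberStyles.BinaryNumber exists (.NET 8 or later). The input strings are made up for illustration.

```csharp
using System;
using System.Globalization;

// Assumes .NET 8 or later, where NumberStyles.BinaryNumber is available.
int parsed = int.Parse("101010", NumberStyles.BinaryNumber, CultureInfo.InvariantCulture);
Console.WriteLine(parsed); // 42

// BinaryNumber includes AllowLeadingWhite and AllowTrailingWhite,
// so surrounding spaces are accepted.
bool ok = byte.TryParse("  00101010  ", NumberStyles.BinaryNumber,
                        CultureInfo.InvariantCulture, out byte fromPadded);
Console.WriteLine($"{ok} {fromPadded}");
```

This mirrors how NumberStyles.HexNumber is used with the existing Parse and TryParse overloads.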

API Usage

The API would be like so:

string byteBinary = ((byte)42).ToString("B");
string intBinary = (42).ToString("b16");

such that:

Console.WriteLine(byteBinary);
Console.WriteLine(intBinary);

will output:

101010
0000000000101010
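
A runnable version of the usage above, assuming a runtime that implements the proposed specifier (.NET 8 or later). The interpolated-string line is an extra illustration that is not part of the original proposal; it relies on the same format string being passed through IFormattable.

```csharp
using System;

// Assumes .NET 8 or later, where "b"/"B" is a standard numeric format specifier.
string byteBinary = ((byte)42).ToString("B");   // no precision: minimum digits
string intBinary = 42.ToString("b16");          // precision 16: zero-padded
string interpolated = $"{42:B8}";               // same specifier in an interpolated string

Console.WriteLine(byteBinary);    // 101010
Console.WriteLine(intBinary);     // 0000000000101010
Console.WriteLine(interpolated);  // 00101010
```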

Alternative Designs

Enable the ToString method of every integer type (SByte, Int16, Int32, Int64, Int128, Byte, UInt16, UInt32, UInt64, UInt128) to accept the format specifiers b and B with an optional precision specifier, in order to output the value's binary representation as a string.

Risks

There should be none as the b and B format specifiers are not currently in use.
https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

@RaphaelTetreault RaphaelTetreault added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 17, 2023
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 17, 2023
@ghost

ghost commented Mar 18, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.


@huoyaoyuan
Member

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Mar 23, 2023
@tannergooding
Member

The design can/should be that all integer types support the B format specifier. We will not want to limit this to only a subset of types/values.

@stephentoub
Member

Would we need to consider the parsing direction as well? e.g. NumberStyles.AllowBinarySpecifier (even though the existing AllowHexSpecifier is, IMHO, poorly named)

@tannergooding
Member

It's probably worth considering them together, yes. I updated the OP to include AllowBinarySpecifier and BinaryNumber -- CC. @RaphaelTetreault as an FYI that I updated it

@RaphaelTetreault
Author

@tannergooding Thanks for the notice. My understanding here is that it would be of value to also implement binary string integer parsing? If so, there are at least three relevant issues currently open related to that.

[API Proposal]: Convert.ToString Methods #61719
Allow int.ToString and Parse to support radixes other than base-10 and base-16 #50491
Support plan for parsing of Binary numeric literal strings, e.g., Convert.Int32("0b1011") #19642

@tannergooding
Member

Yes, but those are either for Convert.ToString or under a new API surface.

In this case, your proposal actually fits the existing semantics e.g. int.ToString("B") is binary like int.ToString("X") is hex.

It then makes sense to consider the inverse (parsing) at the same time, and that simply requires two new members on NumberStyles.

I'll close the other three in favor of this proposal.

@terrajobst
Member

terrajobst commented Apr 11, 2023

Video

  • Looks good as proposed
    • b and B are the formatting modifiers
namespace System.Globalization;

[Flags]
public partial enum NumberStyles
{
    AllowBinarySpecifier = 0x00000400,
    BinaryNumber = AllowLeadingWhite | AllowTrailingWhite | AllowBinarySpecifier,
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 11, 2023
@stephentoub
Member

stephentoub commented Apr 16, 2023

Implementation progress:

General

Formatting... support 'b'/'B' in ToString/TryFormat:

Parsing... support AllowBinarySpecifier/BinaryNumber in {Try}Parse

@stephentoub stephentoub removed their assignment Apr 19, 2023
@stephentoub stephentoub added the help wanted [up-for-grabs] Good issue for external contributors label Apr 19, 2023
@stephentoub stephentoub added this to the 8.0.0 milestone Apr 19, 2023
@stephentoub
Member

stephentoub commented Apr 19, 2023

Everything here is done except for BigInteger. It still needs formatting and parsing support added.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 26, 2023
@lateapexearlyspeed
Contributor

Hi, I am working on the BigInteger part. Since BigInteger is not a fixed-length numeric type, I would like to ask for a definition of its binary format, so that each binary string uniquely represents one number (including its sign). The following is just an example of one possible format and its purpose; please clarify the eventual definition:

  • No precision specifier:

    • positive number:
      • When the highest bit in the top byte is not 1, e.g. 54:
        11_0110
      • When the highest bit in the top byte is 1, e.g. 214:
        0_1101_0110 - an additional '0' bit in the next higher byte distinguishes it from negative -42 (1101_0110)
    • negative number, e.g. -42:
      1101_0110
  • Precision specifier larger than the minimal necessary number of bits:

    • positive number, e.g. 22 with 7 digits requested:
      001_0110 - two '0' digits are added to reach 7 digits in total
    • negative number, e.g. -42 with 9 digits requested:
      1111_1111_1101_0110 - eight '1' digits are added, to distinguish it from positive 470 (1_1101_0110)

@RaphaelTetreault
Author

RaphaelTetreault commented May 5, 2023

The precision specifier asks for the desired number of digits. As mentioned, this means any positive number needs a leading 0 while a negative number needs a leading 1. Making an assumption about how BigInteger works, I would think it just prints all required digits of the underlying byte buffer and prepends a 0 if needed (negative numbers would always start with 1). I see that as most reasonable, given it is the minimum number of bits required to unambiguously represent a non-fixed-length integer.

+234 == 0b11101010, thus output 0_11101010
-106 == 0b11101010, thus output   11101010

The other option, which might cause incompatibility with binary parsing, would be to replace the conditional leading 0 or 1 with an explicit + or - sign. I'm not sure what the implications would be for fixed-length integer parsing, e.g.:

int.Parse("+1111", NumberStyles.BinaryNumber); // +15
int.Parse("-1111", NumberStyles.BinaryNumber); //  -1

Would that be allowed/supported?

Back to the original example but using signs, BigInteger could behave like so when printed:

+234 == 0b11101010, thus output +11101010 // 8 bits
-106 == 0b11101010, thus output  -1101010 // 7 bits

This style of signed binary is sort of like what Google Search does. You can try searching +234 in binary and -106 in binary and you will see negative signs on negative values, with no sign being interpreted as positive.

If it's worth anything, my vote goes to the former formatting.

@tannergooding
Member

tannergooding commented May 5, 2023

There is an existing behavior for how hex formatting works and we'd need to be consistent with it.

https://source.dot.net/#System.Runtime.Numerics/System/Numerics/BigNumber.cs,260b817fae02d08e,references

@lateapexearlyspeed
Contributor

lateapexearlyspeed commented May 6, 2023

Hi @RaphaelTetreault, thanks. Yes, I agree we should not prefix a sign character in binary.

-106 == 0b11101010, thus output 11101010

Just to confirm: the binary output should be two's complement, so -106 should output 1001_0110.

What I want to confirm is the binary format definition for the non-fixed-length BigInteger, so that each binary string uniquely represents one number both without a precision specifier (the "no digits" case) and with one (the "digits" case), and so that in every case the output can be parsed back to a BigInteger.

@tannergooding Yes, I had noticed the existing hex formatting definition. My sample binary proposal aligns with it as far as possible and gives examples for 5 cases. However, there are still some detailed differences between hex and binary, so I need to confirm whether the proposal is appropriate or needs updating. Could you have a look at it?

Just highlight one of differences here:
image

Because the highest non-sign 1 bit in the two's complement binary form of -921 is the 10th bit (counting from 0), which is lower than the 13th bit, the current BigInteger hex format is C67 rather than FC67 (one hex character covers 4 bits). So what should the binary output of BigInteger be for -921? (In my sample proposal it would be 1111_1100_0110_0111, aligned to a whole byte rather than to 1 or 4 bits.)
There are further considerations like this one, including the case where digits are specified.

@huoyaoyuan
Member

Notes related to this when working with #28657:

The decimal parsing/formatting of BigInteger can easily be shared. The binary/hexadecimal paths won't be shared because of the difference between BigInteger and small integers (arithmetic operations that create new values have a huge cost). A rewritten path for Hex/Bin is desired for BigInteger.

@tannergooding
Member

You'd need to return the shortest string. Hex returns C67 because that's sufficient and it returns at most 0 leading zero. It is always a multiple of 4-bits because 4 is the smallest unit needed for every hex digit.

In the case of binary, this would be at most 1 leading zero.
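
The existing hex behaviour referred to here can be observed directly. The snippet below uses only the existing BigInteger.ToString("X"); the chosen sample values are illustrative, and the expected strings in the comments follow the values discussed in this thread (C67 for -921, and a leading zero for positive values whose top hex digit has its high bit set).

```csharp
using System;
using System.Numerics;

// Existing BigInteger hex formatting: two's complement, shortest unambiguous form.
string negative = new BigInteger(-921).ToString("X"); // C67, not FC67
string needsZero = new BigInteger(15).ToString("X");  // 0F, because F alone would be negative
string noZero = new BigInteger(7).ToString("X");      // 7, top bit of the digit is already 0

Console.WriteLine(negative);
Console.WriteLine(needsZero);
Console.WriteLine(noZero);
```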

@lateapexearlyspeed
Contributor

So let me try to confirm my understanding of the desired binary format.

Based on:

You'd need to return the shortest string

and the existing hex format convention (prefixing a zero to a positive number in some cases, to distinguish it from a negative number with the same bits), one "shortest" string format for binary is as follows:

  • -921's shortest binary is: 100 0110 0111 (minimum of 11 digits, keeping only one leading 1)
  • to distinguish it from the above, +1127's binary will be: 0100 0110 0111 (minimum of 12 digits, because one zero is prefixed)

This way, the binary string of a positive number always carries one additional leading zero, which keeps it distinct from the shortest binary string of a negative number.

If this is not correct, please just provide the desired binary content for -921, thanks @tannergooding :)

Ref:
image
image

@tannergooding
Member

tannergooding commented May 9, 2023

That sounds correct. This then matches that for hex, 0..7 is 0..7 because the smallest unit is 4 bits and therefore the sign is present and 0. While 8..F is -8..-15, because the sign bit is set. Thus to represent 8-15 you need 08..0F

For binary, since the smallest unit is 1 bit, you therefore always need a leading 0 for positive numbers, except 0 itself:

  • etc
  • -4 = 100
  • -3 = 11
  • -2 = 10
  • -1 = 1
  • 0 = 0
  • 1 = 01
  • 2 = 010
  • 3 = 011
  • 4 = 0100
  • etc

@lateapexearlyspeed
Contributor

Almost the same understanding now, except for the use of two's complement? I can see that the existing hex format output of BigInteger uses it, so:

hex 8..F is -8..-15

not always be, eg, hex '9' should be -7

If new binary format should also use 2's complement, then binary examples of negative numbers as above should be (note that -3):

  • -4 = 100
  • -3 = 101
  • -2 = 10
  • -1 = 1

Should we also use two's complement for binary?

@tannergooding
Member

not always be, eg, hex '9' should be -7

Yes, sorry. This should have said "hex 8..F is -8..-1".

if new binary format should also use 2's complement, then binary examples of negative numbers as above should be (note that -3):

Yes. 11 would be an alternative representation of -1, much as FF is also -1 for hex. Just messed up my mental math when writing it down 😄
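
The rule settled on in this exchange (shortest two's complement string, first digit is the sign bit) can be sketched as follows. ToShortestBinary and ParseTwosComplementBinary are hypothetical helpers written for illustration, not .NET APIs and not the code from the eventual PR; precision specifiers are not handled.

```csharp
using System;
using System.Numerics;
using System.Text;

foreach (int n in new[] { -921, -4, -3, -2, -1, 0, 1, 2, 3, 4, 1127 })
{
    string s = ToShortestBinary(n);
    Console.WriteLine($"{n} = {s} (parses back to {ParseTwosComplementBinary(s)})");
}

// Hypothetical helper: shortest two's complement string, sign bit first.
static string ToShortestBinary(BigInteger value)
{
    if (value.IsZero) return "0";

    // Shifting right eventually leaves pure sign extension: 0 or -1.
    BigInteger signExtension = value.Sign < 0 ? BigInteger.MinusOne : BigInteger.Zero;
    var digits = new StringBuilder();
    BigInteger rest = value;
    while (rest != signExtension)
    {
        digits.Insert(0, rest.IsEven ? '0' : '1');
        rest >>= 1; // arithmetic shift, keeps the sign
    }
    digits.Insert(0, value.Sign < 0 ? '1' : '0'); // exactly one sign digit
    return digits.ToString();
}

// Hypothetical helper: inverse of the above; extra sign digits are harmless.
static BigInteger ParseTwosComplementBinary(string digits)
{
    if (digits.Length == 0) throw new FormatException("No digits.");

    // A leading '1' means all higher bits are 1, which is -1 before shifting.
    BigInteger result = digits[0] == '1' ? BigInteger.MinusOne : BigInteger.Zero;
    foreach (char c in digits)
    {
        if (c != '0' && c != '1') throw new FormatException($"Unexpected digit '{c}'.");
        result = (result << 1) + (c == '1' ? 1 : 0);
    }
    return result;
}
```

With this rule -3 comes out as 101, and 11 is accepted by the parser as a longer spelling of -1, which matches the correction above.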

@lateapexearlyspeed
Contributor

Hi, the PR is ready for review now. Could you please help review it? Thanks!

@tannergooding tannergooding modified the milestones: 8.0.0, Future Jul 24, 2023
@tannergooding
Member

The BigInteger support is, unfortunately, not going to make it for .NET 8. We had several bug fixes around BigInteger crop up and several other higher priority features that needed to land.

We can finish reviewing/merging the remaining PR anytime after main opens for .NET 9 next month.

@lateapexearlyspeed
Contributor

lateapexearlyspeed commented Aug 23, 2023

Hi @tannergooding, just a kind reminder that the PR can be reviewed now, as the main branch should already be open for .NET 9. Thanks!

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 12, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Nov 11, 2023