Added toBytes property to std.bigint.BigInt #6437

JonathanWilbur · 2018-04-09T00:36:29Z

Per issue 13804, I created a toBytes property for std.bigint.BigInt. This generates an arbitrarily-long array of unsigned bytes that represents the signed, native-endian binary representation of the BigInt.

https://issues.dlang.org/show_bug.cgi?id=13804

A Punycode encoder and decoder based on the original implementation in RFC 3492. I am submitting this to the D Standard Library (Phobos) because I believe it is a suitable candidate for a standard library module, on these grounds: 1.) It is critical to Uniform Resource Identifiers (URIs), which are ubiquitous, and are themselves critical for many programs. 2.) Phobos already has a module for Uniform Resource Identifiers, std.uri, yet no functionality for Punycode. 3.) It is critical to the Domain Name System (DNS), which is also ubiquitous, and itself critical for many programs. 4.) There are probably a few other ways that nobody has thought of for encoding and decoding Punycode, but only one way is specified clearly as an example implementation in the original RFC that specifies Punycode. This module is based upon the original suggested implementation in RFC 3492, and there is little--if any--reason why a developer would prefer an alternative implemen

dlang-bot · 2018-04-09T00:36:30Z

Thanks for your pull request and interest in making D better, @JonathanWilbur! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#6437"

JonathanWilbur · 2018-04-09T00:40:07Z

That punycode commit you see there was from a previous pull request. I removed it in a subsequent commit.

JonathanWilbur · 2018-04-09T00:46:43Z

Oh, also, the main rationale--the one I suspect everybody has for wanting this property--is the ability to convert large integers to arrays of bytes for ASN.1 encoding in RSA data structures. That's why I made it, FWIW.

Biotronic · 2018-04-09T05:45:48Z

Have you looked at BigUint (in std/internal/math/biguintcore.d)? Almost every function in BigInt is just a call to a function of BigUint, and I suggest this should be as well.

BigUint would then have a toBytes function that doesn't deal with signs, and BigInt's toBytes implementation would take that value and do 2's complement if necessary. BigUint.data is simply a uint[] and a simple cast should do most of what this PR does.
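
As a rough illustration of the split suggested above, the sketch below keeps the unsigned conversion and the sign handling in separate functions. The names `magnitudeToBytes` and `toTwosComplement` are hypothetical, and so are the assumptions that the magnitude is a `uint[]` with the least-significant word first and that the output is little-endian with one spare sign byte. The PR itself targets native-endian output and trims padding, which this sketch does not attempt.

```d
/// Hypothetical helper: little-endian bytes of an unsigned magnitude whose
/// 32-bit words are stored least-significant first (an assumption).
ubyte[] magnitudeToBytes(const(uint)[] words)
{
    auto ret = new ubyte[](words.length * 4);
    foreach (i, w; words)
        foreach (b; 0 .. 4)
            ret[i * 4 + b] = cast(ubyte) (w >> (8 * b));
    return ret;
}

/// Hypothetical helper: little-endian two's complement bytes built from a
/// sign flag plus the unsigned magnitude.
ubyte[] toTwosComplement(const(uint)[] magnitude, bool negative)
{
    // One extra most-significant zero byte keeps the sign bit clear
    // before any negation takes place.
    ubyte[] ret = magnitudeToBytes(magnitude) ~ cast(ubyte) 0x00;
    if (!negative)
        return ret;

    // Two's complement negation: invert every byte, then add one.
    foreach (ref b; ret)
        b ^= 0xFF;
    foreach (ref b; ret)
    {
        b++;            // wraps 0xFF to 0x00, in which case the carry continues
        if (b != 0)
            break;
    }
    return ret;
}
```

With these assumptions, a magnitude of `[1u]` with the sign flag set yields five `0xFF` bytes, that is, -1 sign-extended to 40 bits.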

ghost

This is mostly a style review. I suggest changing the for loops to foreach equivalent. Maybe an Appender could be used too. Also use do instead of body.

ghost · 2018-04-09T09:10:41Z

std/bigint.d

+    }
+    body
+    {
+        ubyte[] ret;


Use an appender!(ubyte[]), I think.

Well, I subsequently set the length of the array to the length that will be needed, then dump the uints in that array. How would Appender improve that?

okay, forget the appender thing.

ghost · 2018-04-09T09:11:19Z

std/bigint.d

+        This is just used by `toBytes` to convert unsigned bytes of the
+        BigInt to two's complement form.
+    */
+    private static void incrementBytes (ref ubyte[] value) nothrow pure @safe


Should take an output range of ubyte (confer with other comment proposing to use an Appender)

Can you clarify? What should the signature of this method be?

a template constraint taking only an output range of ubyte

So something like private static void incrementBytes(Range)(Range result, ubyte[] value) if (isOutputRange!Range)? You still need to pass the original data in there somehow.

OK. Let's forget the output range thing in this review and concentrate on the style.

ghost · 2018-04-09T09:12:04Z

std/bigint.d

+        version (BigEndian)
+        {
+            // This loop adds one to an arbitrary length array of bytes.
+            for (size_t i = (value.length - 1); i < 0u; i--)


style: there's no ambiguity with precedence so value.length-1 can live without parens.

ghost · 2018-04-09T09:12:12Z

std/bigint.d

+                value[i] = 0x00u;
+
+                // If the array of bytes is maxed out, append a next byte set to one.
+                if (i == (value.length - 1))


style: there's no ambiguity with precedence so value.length-1 can live without parens.

ghost · 2018-04-09T09:13:07Z

std/bigint.d

+    {
+        assert(ret.length > 0u);
+    }
+    body


use do instead of body. The latter will be deprecated.

Note: this comment applies to every body in the PR.

Good to know. I must have missed the memo.

ghost · 2018-04-09T09:13:44Z

std/bigint.d

+            size_t startOfNonPadding = 0u;
+            if (this >= 0)
+            {
+                for (size_t i = 0u; i < (ret.length - 1); i++)


style: there's no ambiguity with precedence so ret.length-1 can live without parens.

ghost · 2018-04-09T09:13:56Z

std/bigint.d

+            }
+            else
+            {
+                for (size_t i = 0u; i < (ret.length - 1); i++)


style: there's no ambiguity with precedence so ret.length-1 can live without parens.

by the way foreach(i; 0.. N) tends to be used more nowadays in D.

That's a good tip; definitely a cleaner way to do read-only loops. However, that won't work when you have to set the values in an array, because i in your example would be passed in by value, so setting i would not affect the original array from which the slice was taken. (And indeed, I have to do that in my code, particularly when converting an array of bytes to two's complement form.)

I don't know how slicing works (base+bound vs. big dumb copy vs. copy-on-write), but if it is the latter two of those three, then changes to the slice will not modify the original as well. But I'm not qualified to assert anything else. 😝
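
For what it is worth, a D slice is a pointer-plus-length view, so writing through an index obtained from `foreach (i; 0 .. N)` does reach the original storage even though `i` itself is a copy. The sketch below shows this with a hypothetical helper named `invertInPlace`; it is not code from the PR.

```d
/// Hypothetical helper: inverts every byte of `bytes` in place.
/// `i` is a by-value loop index, but `bytes[i]` refers to the storage
/// that the slice points at, so the caller sees the change.
void invertInPlace(ubyte[] bytes)
{
    foreach (i; 0 .. bytes.length)
        bytes[i] ^= 0xFF;
}

unittest
{
    ubyte[] original = [0x00, 0x0F, 0xF0, 0xFF];
    invertInPlace(original[1 .. 3]);   // slicing does not copy
    ubyte[] expected = [0x00, 0xF0, 0x0F, 0xFF];
    assert(original == expected);
}
```

Sharing only lasts while the slice still points at the same memory: an explicit `.dup`, or an append that reallocates, produces independent storage.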

Biotronic · 2018-04-09T09:23:55Z

std/bigint.d

+    body
+    {
+        ubyte[] ret;
+        ret.length = (this.uintLength << 2); // Multiply by 4


The compiler will do this optimization for you. Don't make the code harder to read without proof that the straightforward code is too slow, or prone to bugs.

Neat! I didn't know the compiler did this!

wilzbach

Do you have a machine to test version(BigEndian)?

wilzbach · 2018-04-09T15:11:38Z

std/bigint.d

+        version (BigEndian)
+        {
+            // This loop adds one to an arbitrary length array of bytes.
+            for (size_t i = value.length - 1; i < 0u; i--)


This would be foreach_reverse (i; 0 .. value.length), or foreach_reverse (i; size_t(0) .. value.length) if you want an explicit size_t, though I think in this case even int is fine.
Also note that i < 0 can never be true for an unsigned i, so the loop body never runs, and that IIRC BigEndian isn't tested on any CIs.
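
A minimal sketch of the big-endian loop written with `foreach_reverse` follows. The function name `incrementBigEndian` is hypothetical, and growing the array with a leading `0x01` byte on overflow is an assumption about the intended behaviour; the empty-array case mirrors the `value = [ 0x01u ]` branch quoted elsewhere in this review.

```d
/// Hypothetical big-endian increment: the last byte is least significant.
void incrementBigEndian(ref ubyte[] value)
{
    // Visits value.length - 1 down to 0 without an unsigned loop
    // condition that could never become false (or true).
    foreach_reverse (i; 0 .. value.length)
    {
        if (value[i] != 0xFF)
        {
            value[i]++;
            return;          // no carry left to propagate
        }
        value[i] = 0x00;     // overflowed: carry into the next byte
    }
    // Every byte overflowed (or the array was empty): add a leading one.
    value = [cast(ubyte) 0x01] ~ value;
}
```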

wilzbach · 2018-04-09T15:12:59Z

std/bigint.d

+        else version (LittleEndian)
+        {
+            // This loop adds one to an arbitrary length array of bytes.
+            for (size_t i = 0u; i < value.length; i++)


foreach (e; 0 .. value.length)

foreach (e; size_t(0) .. value.length)

foreach (i, ref e; value) <- the best
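
The `ref` form suggested here could look roughly like the following for the little-endian case. `incrementLittleEndian` is a hypothetical name, and the index is dropped because this particular sketch does not need it.

```d
/// Hypothetical little-endian increment: the first byte is least significant.
void incrementLittleEndian(ref ubyte[] value)
{
    foreach (ref e; value)
    {
        e++;                 // wraps 0xFF to 0x00
        if (e != 0)
            return;          // no carry left to propagate
    }
    // Every byte overflowed (or the array was empty): append a new top byte.
    value ~= cast(ubyte) 0x01;
}
```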

wilzbach · 2018-04-09T15:13:18Z

std/bigint.d

+    ubyte[] toBytes() const pure @system @property nothrow
+    in
+    {
+        assert(this.uintLength > 0u);


Needs a message

wilzbach · 2018-04-09T15:13:31Z

std/bigint.d

+    }
+    out (ret)
+    {
+        assert(ret.length > 0u);


Needs a message
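
To show the shape being asked for, here is a small sketch of `in`/`out` contracts that carry assert messages and use `do` rather than `body`. The function `padTo` and its message texts are made up for illustration and are unrelated to the PR's code.

```d
/// Hypothetical example: copies `src` into a zero-padded array of `length` bytes.
ubyte[] padTo(const(ubyte)[] src, size_t length)
in
{
    assert(length >= src.length, "padTo: length must not be smaller than src.length");
}
out (result)
{
    assert(result.length == length, "padTo: result must have the requested length");
}
do
{
    auto ret = new ubyte[](length);   // zero-initialised
    ret[0 .. src.length] = src[];     // element-wise copy
    return ret;
}
```

When a contract fails, the message appears in the resulting `AssertError`, which is usually easier to act on than a bare file and line number.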

wilzbach · 2018-04-09T15:13:48Z

std/bigint.d

+
+    /**
+        Converts the `BigInt` to a sequence of bytes that represents the
+        native-endian representation of that number.


Needs "Returns:"

edit: has been addressed.

wilzbach · 2018-04-09T15:13:57Z

std/bigint.d

+        ubyte[] ret = this.data.toBytes;
+        if (this.sign)
+        {
+            for (size_t u = 0u; u < ret.length; u++)


wilzbach · 2018-04-09T15:17:28Z

std/bigint.d

+    {
+        if (value.length == 0u)
+        {
+            value = [ 0x01u ];


JonathanWilbur · 2018-04-10T00:44:16Z

@wilzbach, I do not have a machine for testing on BigEndian. Actually, I assumed that one of the hosts in the D Auto-Tester fleet would be big endian...

JonathanWilbur · 2018-04-10T00:45:56Z

Also, I don't think that Circle CI failure is my fault: it says that Dscanner failed. I didn't change anything about Dscanner.

jmdavis · 2018-04-10T01:15:10Z

> Actually, I assumed that one of the hosts in the D Auto-Tester fleet would be big endian...

dmd only supports x86 and x86_64, so it doesn't support Big Endian at all, much as the language does. To do anything with Big Endian and D, you need either ldc or gdc and a Big Endian machine that they support. And since they're maintained separately from dmd, we can't test PRs against them. So, unfortunately, anything we've done with Big Endian in Phobos is done with the hope that it works and that when the changes finally get merged into gdc or ldc, if they're broken for Big Endian, someone will report it. Fortunately, not much code in Phobos cares about endianness.
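
One way to reduce reliance on an untested `version (BigEndian)` branch is to request an explicit byte order, so that the result is the same on every host. The sketch below uses `nativeToBigEndian` and `nativeToLittleEndian` from `std.bitmanip` together with `Endian` from `std.system`; the wrapper `wordBytes` is a hypothetical name and is not something the PR contains.

```d
import std.bitmanip : nativeToBigEndian, nativeToLittleEndian;
import std.system : Endian, endian;

/// Hypothetical helper: the four bytes of `word` in the requested order.
ubyte[] wordBytes(uint word, Endian order)
{
    ubyte[4] bytes = (order == Endian.bigEndian)
        ? nativeToBigEndian(word)
        : nativeToLittleEndian(word);
    return bytes[].dup;   // copy out of the stack buffer
}

unittest
{
    ubyte[] big = [0x01, 0x02, 0x03, 0x04];
    ubyte[] little = [0x04, 0x03, 0x02, 0x01];
    assert(wordBytes(0x01020304, Endian.bigEndian) == big);
    assert(wordBytes(0x01020304, Endian.littleEndian) == little);
}
```

Passing `endian` (the host's byte order from `std.system`) as the second argument gives the native-endian result, while both fixed orders remain testable on a little-endian machine.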

wilzbach · 2018-04-10T01:44:30Z

> Also, I don't think that Circle CI failure is my fault: it says that Dscanner failed. I didn't change anything about Dscanner.

Yep, not your fault. The GC is segfaulting spuriously, but so far no one knows why (or has cared to investigate).
The issue is: https://issues.dlang.org/show_bug.cgi?id=18720

JonathanWilbur · 2018-04-10T11:43:18Z

So what is the next step? Does somebody else need to review this? Do I need to change something?

Biotronic · 2018-04-10T12:04:15Z

I still hold that the correct location for most of this code is in std/internal/math/biguintcore.d. It should handle the unsigned stuff, and std/bigint.d should only do the 2's complement when that's necessary.

JonathanWilbur · 2018-04-10T16:10:59Z

@Biotronic, I guess you missed it, but I already moved most of it to biguintcore.d. Please review it again.

Biotronic

LGTM

JonathanWilbur · 2018-04-15T15:40:18Z

Just following up on this, @wilzbach (or anybody else), is there a reason this has not been merged yet? Do I need to change anything?

I should also add that the auto-tester appears to run on and off, so it looks like this is still in testing even though it has passed before. There was one strange failure a few days ago, which appears to come from:

make[1]: *** [generated/linux/debug/64/unittest/std/algorithm/searching.o] Killed
make[1]: Leaving directory `/media/ephemeral0/sandbox/at-client/pull-3115292-Linux_64_64/phobos'
make: *** [unittest-debug] Error 2
make: *** Waiting for unfinished jobs....

which I don't think has anything to do with my commit.

wilzbach · 2018-04-15T16:02:27Z

Thanks for the ping and sorry for not clarifying this. New symbols need the following:

a changelog entry
approval from @andralex
full ddoc documentation headers

Some CI have unfortunately spurious failures. While we are trying to weed them out, it's not always easy, but you can safely ignore such a failure.

wilzbach · 2018-04-15T19:06:09Z

BTW, at the end we need to squash all commits into one, so I recommend using rebases instead of merge commits, as the final squashing is then easier. It's not a blocking issue if you have trouble with this: we can help you out and force-push over your branch at the end. In this case, though, you used the master branch, so you might want to make sure that your punycode commit is safely stored in another branch.

n8sh · 2018-04-24T22:36:28Z

std/bigint.d

+        Returns: a `ubyte[]` array, representing the native-endian
+            representation of that number.
+    */
+    T opCast(T : ubyte[])() pure nothrow @system const


I am a bit wary of an opCast that allocates. Can someone else comment on whether that's normal for Phobos?

Searching for opCast in Phobos yields many results, but none of them seem to be allocating.

I am a little confused. What is the alternative? This opCast() override just calls the toBytes() accessor.

The alternative is to not have the opCast at all, and force the programmer to call toBytes() to get the byte representation. The advantage of this is the allocation is more explicit, while cast(byte[])myBigint may look like it's not doing anything scary - after all, for some possible implementations, it would simply return a different view of the same data.

Ah. I've never actually thought about that, but that makes sense! I will remove the opCast override then.

n8sh · 2018-04-25T14:37:55Z

std/internal/math/biguintcore.d

+        Returns: the native-endian unsigned representation of the `BigInt` in
+            the form of a `ubyte[]` array.
+    */
+    ubyte[] toBytes() const pure @system @property nothrow


Instead of allocating a new buffer would it be feasible to simply return a slice of this.data? Is DIP1000's scope qualifier for function return values useful for that?

Nope. toBytes is 2s complement, while BigInt uses signed magnitude.

For positive numbers.

I suppose that the internal array of uints that is this.data could be cast to a ubyte array if it is positive, and a slice returned, but I will have to test that. But yeah, when it's negative, that goes out the door, as Biotronic said.
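
The difference between returning a view and returning a fresh buffer can be sketched as follows. `viewAsBytes` and `copyAsBytes` are hypothetical names, and the sketch ignores sign handling entirely, so it only corresponds to the non-negative case discussed above. The byte order seen through the view is whatever the host uses.

```d
/// Hypothetical: reinterprets the words as bytes without allocating.
/// The result aliases `words`, so later writes to `words` show through.
const(ubyte)[] viewAsBytes(const(uint)[] words)
{
    return cast(const(ubyte)[]) words;   // length becomes words.length * 4
}

/// Hypothetical: same bytes, but in freshly allocated storage.
ubyte[] copyAsBytes(const(uint)[] words)
{
    return viewAsBytes(words).dup;
}

unittest
{
    uint[] words = [0u, 0u];
    auto view = viewAsBytes(words);
    auto copy = copyAsBytes(words);
    assert(view.length == 8 && copy.length == 8);

    words[0] = 0xFFFF_FFFF;
    assert(view[0] == 0xFF);   // the view follows the original
    assert(copy[0] == 0x00);   // the copy does not
}
```

The aliasing is what makes the view cheap, and it is also why a caller who mutates the source afterwards would see the returned bytes change.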

Jonathan Wilbur and others added 4 commits February 27, 2017 22:33

Added toBytes property to BigInt.

2bcdf53

Merge https://github.com/JonathanWilbur/phobos

893c82d

Removed punycode.

84e867a

ghost suggested changes Apr 9, 2018

Biotronic reviewed Apr 9, 2018

Revised BigInt.toBytes per PR dlang#6437.

50dd9cc

wilzbach reviewed Apr 9, 2018

std/bigint.d

+    {
+        if (value.length == 0u)
+        {
+            value = [ 0x01u ];

Uncovered

Changed for loops to foreach loops; added assert messages.

729e2cf

ghost approved these changes Apr 10, 2018

Biotronic approved these changes Apr 10, 2018

wilzbach added the "@andralex" (approval from Andrei is required) and "Needs Changelog" (a changelog entry needs to be added to /changelog) labels Apr 15, 2018

JonathanWilbur added 2 commits April 15, 2018 14:39

Merge https://github.com/dlang/phobos

a1c0cc1

Added more ddoc to new methods; added changelog item.

9b3b512

wilzbach removed the "Needs Changelog" (a changelog entry needs to be added to /changelog) label Apr 15, 2018

n8sh reviewed Apr 24, 2018

n8sh reviewed Apr 25, 2018

dlang-bot added the stalled label May 27, 2021

JonathanWilbur closed this Sep 8, 2023
