Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added toBytes property to std.bigint.BigInt #6437

Closed
wants to merge 8 commits into from

Conversation

JonathanWilbur
Copy link

Per issue 13804, I created a toBytes property for std.bigint.BigInt. This generates an arbitrarily-long array of unsigned bytes that represents the signed, native-endian binary representation of the BigInt.

https://issues.dlang.org/show_bug.cgi?id=13804

Jonathan Wilbur and others added 4 commits February 27, 2017 22:33
A Punycode encoder and decoder based on the original implementation in RFC 3492. I am submitting this to the D Standard Library (Phobos) because because I believe it is a suitable candidate for a standard library module, on these grounds:

1.) It is critical to Uniform Resource Identifiers (URIs), which are ubiquitous, and are themselves critical for many programs.
2.) Phobos already has a module for Uniform Resource Identifiers: std.uri, yet no functionality for Punycode.
3.) It is critical to the Domain Name System (DNS), which is also ubiquitous, and itself critical for many programs.
4.) There are probably a few other ways that nobody has thought of for encoding and decoding Punycode, but only one way is specified clearly as an example implementation in the original RFC that specifies Punycode. This module, is based upon the original suggested implementation in RFC 3492, and there is little--if any--reason why a developer would prefer an alternative implemen
@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @JonathanWilbur! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#6437"

@JonathanWilbur
Copy link
Author

That punycode commit you see there was from a previous pull request. I removed it in a subsequent commit.

@JonathanWilbur
Copy link
Author

Oh, also, the main rationale--the one I suspect everybody has for wanting this property--is the ability to convert large integers to arrays of bytes for ASN.1 encoding in RSA data structures. That's why I made it, FWIW.

@Biotronic
Copy link
Contributor

Have you looked at BigUInt (in std/internal/math/biguintcore.d)? Almost every function in BigInt is just a call to a function of BigUint, and I suggest this should be as well.

BigUint would then have a toBytes function that doesn't deal with signs, and BigInt's toBytes implementation would take that value and do 2's complement if necessary. BigUint.data is simply a uint[] and a simple cast should do most of what this PR does.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a style review. I suggest changing the for loops to foreach equivalent. Maybe an Appender could be used too. Also use do instead of body.

std/bigint.d Outdated
}
body
{
ubyte[] ret;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an appender!(ubyte[]) i think

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I subsequently set the length of the array to the length that will be needed, then dump the uints in that array. How would Appender improve that?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, forget the appender thing.

This is just used by `toBytes` to convert unsigned bytes of the
BigInt to two's complement form.
*/
private static void incrementBytes (ref ubyte[] value) nothrow pure @safe
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should take an output range of ubyte (confer with other comment proposing to use an Appender)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify? What should the signature of this method be?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a template constraint taking only an output range of ubyte

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like private static void incrementBytes(Range)(Range result, ubyte[] value) if (isOutputRange!Range)? You still need to pass the original data in there somehow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Let's forget the outrange thing in this review and concentrate about the style.

std/bigint.d Outdated
version (BigEndian)
{
// This loop adds one to an arbitrary length array of bytes.
for (size_t i = (value.length - 1); i < 0u; i--)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: there's no ambiguity with precedence so value.length-1 can live without parens.

std/bigint.d Outdated
value[i] = 0x00u;

// If the array of bytes is maxed out, append a next byte set to one.
if (i == (value.length - 1))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: there's no ambiguity with precedence so value.length-1 can live without parens.

std/bigint.d Outdated
{
assert(ret.length > 0u);
}
body
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use do instead of body. The latter will be deprecated.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this comments stands for all body in the PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. I must have missed the memo.

std/bigint.d Outdated
size_t startOfNonPadding = 0u;
if (this >= 0)
{
for (size_t i = 0u; i < (ret.length - 1); i++)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: there's no ambiguity with precedence so ret.length-1 can live without parens.

std/bigint.d Outdated
}
else
{
for (size_t i = 0u; i < (ret.length - 1); i++)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: there's no ambiguity with precedence so ret.length-1 can live without parens.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way foreach(i; 0.. N) tends to be used more nowadays in D.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good tip; definitely a cleaner way to do read-only loops. However, that won't work when you have to set the values in an array, because i in your example would be passed in by value, so setting i would not affect the original array from which the slice was taken. (And indeed, I have to do that in my code, particularly when converting an array of bytes to two's complement form.)

I don't know how slicing works (base+bound vs. big dumb copy vs. copy-on-write), but if it is the latter two of those three, then changes to the slice will not modify the original as well. But I'm not qualified to assert anything else. 😝

std/bigint.d Outdated
body
{
ubyte[] ret;
ret.length = (this.uintLength << 2); // Multiply by 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will do this optimization for you. Don't make the code harder to read without proof that the straightforward code is too slow, or prone to bugs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! I didn't know the compiler did this!

Copy link
Member

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a machine to test version(BigEndian)?

std/bigint.d Outdated
version (BigEndian)
{
// This loop adds one to an arbitrary length array of bytes.
for (size_t i = value.length - 1; i < 0u; i--)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be foreach_reverse (0u ... value.length - 1) or foreach_reverse (e; size_t(0) .. 10) if you want size_t - though I think in this case even int is fine.
Also note that i < 0 can never be true and that IIRC BigEndian isn't tested on any CIs.

std/bigint.d Outdated
else version (LittleEndian)
{
// This loop adds one to an arbitrary length array of bytes.
for (size_t i = 0u; i < value.length; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • foreach (e; 0 .. value.length)
  • foreach (e; size_t(0) .. value.length)
  • foreach (i, ref e; value) <- the best

std/bigint.d Outdated
ubyte[] toBytes() const pure @system @property nothrow
in
{
assert(this.uintLength > 0u);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a message

std/bigint.d Outdated
}
out (ret)
{
assert(ret.length > 0u);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a message


/**
Converts the `BigInt` to a sequence of bytes that represents the
native-endian representation of that number.
Copy link
Member

@wilzbach wilzbach Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs "Returns:"

edit: has been addressed.

std/bigint.d Outdated
ubyte[] ret = this.data.toBytes;
if (this.sign)
{
for (size_t u = 0u; u < ret.length; u++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foreach

{
if (value.length == 0u)
{
value = [ 0x01u ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncovered

@JonathanWilbur
Copy link
Author

@wilzbach , I do not have a machine for testing on BigEndian. Actually, I assumed that one of the hosts in the D Auto-Tester fleet would be big endian...

@JonathanWilbur
Copy link
Author

Also, I don't think that Circle CI failure is my fault: it says that Dscanner failed. I didn't change anything about Dscanner.

@jmdavis
Copy link
Member

jmdavis commented Apr 10, 2018

Actually, I assumed that one of the hosts in the D Auto-Tester fleet would be big endian...

dmd only supports x86 and x86_64, so it doesn't support Big Endian at all, much as the language does. To do anything with Big Endian and D, you need either ldc or gdc and a Big Endian machine that they support. And since they're maintained separately from dmd, we can't test PRs against them. So, unfortunately, anything we've done with Big Endian in Phobos is done with the hope that it works and that when the changes finally get merged into gdc or ldc, if they're broken for Big Endian, someone will report it. Fortunately, not much code in Phobos cares about endianness.

@wilzbach
Copy link
Member

Also, I don't think that Circle CI failure is my fault: it says that Dscanner failed. I didn't change anything about Dscanner.

Yep, not your fault. The GC is segfaulting spuriously, but so far no one knows why (or has cared to investigate).
The issue is: https://issues.dlang.org/show_bug.cgi?id=18720

@JonathanWilbur
Copy link
Author

So what is the next step? Does somebody else need to review this? Do I need to change something?

@Biotronic
Copy link
Contributor

I still hold that the correct location for most of this code is in std/internal/math/biguintcore.d. It should handle the unsigned stuff, and std/bigint.d should only do the 2's complement when that's necessary.

@JonathanWilbur
Copy link
Author

@Biotronic , I guess you missed it, but I already moved most of it to biguintcore.d. Please review it again.

Copy link
Contributor

@Biotronic Biotronic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JonathanWilbur
Copy link
Author

Just following up on this, @wilzbach (or anybody else), is there a reason this has not been merged yet? Do I need to change anything?

I should also add that the auto-tester appears to run on and off, so it looks like this is still in testing even though it has passed before. There was one strange failure a few days ago, which appears to come from:

make[1]: *** [generated/linux/debug/64/unittest/std/algorithm/searching.o] Killed
make[1]: Leaving directory `/media/ephemeral0/sandbox/at-client/pull-3115292-Linux_64_64/phobos'
make: *** [unittest-debug] Error 2
make: *** Waiting for unfinished jobs....

which I don't think has anything to do with my commit.

@wilzbach
Copy link
Member

Thanks for the ping and sorry for not clarifying this. New symbols need the following:

  • a changelog entry
  • approval from @andralex
  • full ddoc documentation headers

Some CI have unfortunately spurious failures. While we are trying to weed them out, it's not always easy, but you can safely ignore such a failure.

@wilzbach wilzbach added @andralex Approval from Andrei is required Needs Changelog A changelog entry needs to be added to /changelog labels Apr 15, 2018
@wilzbach wilzbach removed the Needs Changelog A changelog entry needs to be added to /changelog label Apr 15, 2018
@wilzbach
Copy link
Member

BTW at the end we need to squash all commits into one, so I recommend to use rebases instead of merge commits as then the final squashing is easier (though it's not a blocking issue if you have troubles with this. We can help you out with this if you need help and force-push over your branch at the end though I'm this case you used the master branch, so you might want to make sure that your punycode commit is safely stored in another branch)

Returns: a `ubyte[]` array, representing the native-endian
representation of that number.
*/
T opCast(T : ubyte[])() pure nothrow @system const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit wary of an opCast that allocates. Can someone else comment on whether that's normal for Phobos?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching for opCast in Phobos yields many results, but none of them seem to be allocating.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little confused. What is the alternative? This opCast() override just calls the toBytes() accessor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to not have the opCast at all, and force the programmer to call toBytes() to get the byte representation. The advantage of this is the allocation is more explicit, while cast(byte[])myBigint may look like it's not doing anything scary - after all, for some possible implementations, it would simply return a different view of the same data.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I've never actually thought about that, but that makes sense! I will remove the opCast override then.

Returns: the native-endian unsigned representation of the `BigInt` in
the form of a `ubyte[]` array.
*/
ubyte[] toBytes() const pure @system @property nothrow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of allocating a new buffer would it be feasible to simply return a slice of this.data? Is DIP1000's scope qualifier for function return values useful for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. toBytes is 2s complement, while BigInt uses signed magnitude.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For positive numbers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that the internal array of uints that is this.data could be cast to a ubyte array if it is positive, and a slice returned, but I will have to test that. But yeah, when it's negative, that goes out the door, as Biotronic said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@andralex Approval from Andrei is required stalled
Projects
None yet
6 participants