New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added toBytes property to std.bigint.BigInt #6437
Conversation
A Punycode encoder and decoder based on the original implementation in RFC 3492. I am submitting this to the D Standard Library (Phobos) because because I believe it is a suitable candidate for a standard library module, on these grounds: 1.) It is critical to Uniform Resource Identifiers (URIs), which are ubiquitous, and are themselves critical for many programs. 2.) Phobos already has a module for Uniform Resource Identifiers: std.uri, yet no functionality for Punycode. 3.) It is critical to the Domain Name System (DNS), which is also ubiquitous, and itself critical for many programs. 4.) There are probably a few other ways that nobody has thought of for encoding and decoding Punycode, but only one way is specified clearly as an example implementation in the original RFC that specifies Punycode. This module, is based upon the original suggested implementation in RFC 3492, and there is little--if any--reason why a developer would prefer an alternative implemen
Thanks for your pull request and interest in making D better, @JonathanWilbur! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub fetch digger
dub run digger -- build "master + phobos#6437" |
That punycode commit you see there was from a previous pull request. I removed it in a subsequent commit. |
Oh, also, the main rationale--the one I suspect everybody has for wanting this property--is the ability to convert large integers to arrays of bytes for ASN.1 encoding in RSA data structures. That's why I made it, FWIW. |
Have you looked at
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a style review. I suggest changing the for
loops to foreach
equivalent. Maybe an Appender could be used too. Also use do
instead of body
.
std/bigint.d
Outdated
} | ||
body | ||
{ | ||
ubyte[] ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use an appender!(ubyte[])
i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I subsequently set the length of the array to the length that will be needed, then dump the uint
s in that array. How would Appender
improve that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, forget the appender thing.
This is just used by `toBytes` to convert unsigned bytes of the | ||
BigInt to two's complement form. | ||
*/ | ||
private static void incrementBytes (ref ubyte[] value) nothrow pure @safe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should take an output range of ubyte (confer with other comment proposing to use an Appender)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify? What should the signature of this method be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a template constraint taking only an output range of ubyte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So something like private static void incrementBytes(Range)(Range result, ubyte[] value) if (isOutputRange!Range)
? You still need to pass the original data in there somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Let's forget the outrange thing in this review and concentrate about the style.
std/bigint.d
Outdated
version (BigEndian) | ||
{ | ||
// This loop adds one to an arbitrary length array of bytes. | ||
for (size_t i = (value.length - 1); i < 0u; i--) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: there's no ambiguity with precedence so value.length-1 can live without parens.
std/bigint.d
Outdated
value[i] = 0x00u; | ||
|
||
// If the array of bytes is maxed out, append a next byte set to one. | ||
if (i == (value.length - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: there's no ambiguity with precedence so value.length-1 can live without parens.
std/bigint.d
Outdated
{ | ||
assert(ret.length > 0u); | ||
} | ||
body |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use do
instead of body
. The latter will be deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this comments stands for all body
in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. I must have missed the memo.
std/bigint.d
Outdated
size_t startOfNonPadding = 0u; | ||
if (this >= 0) | ||
{ | ||
for (size_t i = 0u; i < (ret.length - 1); i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: there's no ambiguity with precedence so ret.length-1 can live without parens.
std/bigint.d
Outdated
} | ||
else | ||
{ | ||
for (size_t i = 0u; i < (ret.length - 1); i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: there's no ambiguity with precedence so ret.length-1 can live without parens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the way foreach(i; 0.. N)
tends to be used more nowadays in D.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good tip; definitely a cleaner way to do read-only loops. However, that won't work when you have to set the values in an array, because i
in your example would be passed in by value, so setting i
would not affect the original array from which the slice was taken. (And indeed, I have to do that in my code, particularly when converting an array of bytes to two's complement form.)
I don't know how slicing works (base+bound vs. big dumb copy vs. copy-on-write), but if it is the latter two of those three, then changes to the slice will not modify the original as well. But I'm not qualified to assert anything else. 😝
std/bigint.d
Outdated
body | ||
{ | ||
ubyte[] ret; | ||
ret.length = (this.uintLength << 2); // Multiply by 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler will do this optimization for you. Don't make the code harder to read without proof that the straightforward code is too slow, or prone to bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! I didn't know the compiler did this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a machine to test version(BigEndian)
?
std/bigint.d
Outdated
version (BigEndian) | ||
{ | ||
// This loop adds one to an arbitrary length array of bytes. | ||
for (size_t i = value.length - 1; i < 0u; i--) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be foreach_reverse (0u ... value.length - 1)
or foreach_reverse (e; size_t(0) .. 10)
if you want size_t
- though I think in this case even int
is fine.
Also note that i < 0
can never be true and that IIRC BigEndian
isn't tested on any CIs.
std/bigint.d
Outdated
else version (LittleEndian) | ||
{ | ||
// This loop adds one to an arbitrary length array of bytes. | ||
for (size_t i = 0u; i < value.length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foreach (e; 0 .. value.length)
foreach (e; size_t(0) .. value.length)
foreach (i, ref e; value)
<- the best
std/bigint.d
Outdated
ubyte[] toBytes() const pure @system @property nothrow | ||
in | ||
{ | ||
assert(this.uintLength > 0u); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a message
std/bigint.d
Outdated
} | ||
out (ret) | ||
{ | ||
assert(ret.length > 0u); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a message
|
||
/** | ||
Converts the `BigInt` to a sequence of bytes that represents the | ||
native-endian representation of that number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs "Returns:"
edit: has been addressed.
std/bigint.d
Outdated
ubyte[] ret = this.data.toBytes; | ||
if (this.sign) | ||
{ | ||
for (size_t u = 0u; u < ret.length; u++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
foreach
{ | ||
if (value.length == 0u) | ||
{ | ||
value = [ 0x01u ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncovered
@wilzbach , I do not have a machine for testing on |
Also, I don't think that Circle CI failure is my fault: it says that Dscanner failed. I didn't change anything about Dscanner. |
dmd only supports x86 and x86_64, so it doesn't support Big Endian at all, much as the language does. To do anything with Big Endian and D, you need either ldc or gdc and a Big Endian machine that they support. And since they're maintained separately from dmd, we can't test PRs against them. So, unfortunately, anything we've done with Big Endian in Phobos is done with the hope that it works and that when the changes finally get merged into gdc or ldc, if they're broken for Big Endian, someone will report it. Fortunately, not much code in Phobos cares about endianness. |
Yep, not your fault. The GC is segfaulting spuriously, but so far no one knows why (or has cared to investigate). |
So what is the next step? Does somebody else need to review this? Do I need to change something? |
I still hold that the correct location for most of this code is in std/internal/math/biguintcore.d. It should handle the unsigned stuff, and std/bigint.d should only do the 2's complement when that's necessary. |
@Biotronic , I guess you missed it, but I already moved most of it to biguintcore.d. Please review it again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Just following up on this, @wilzbach (or anybody else), is there a reason this has not been merged yet? Do I need to change anything? I should also add that the auto-tester appears to run on and off, so it looks like this is still in testing even though it has passed before. There was one strange failure a few days ago, which appears to come from:
which I don't think has anything to do with my commit. |
Thanks for the ping and sorry for not clarifying this. New symbols need the following:
Some CI have unfortunately spurious failures. While we are trying to weed them out, it's not always easy, but you can safely ignore such a failure. |
BTW at the end we need to squash all commits into one, so I recommend to use rebases instead of merge commits as then the final squashing is easier (though it's not a blocking issue if you have troubles with this. We can help you out with this if you need help and force-push over your branch at the end though I'm this case you used the master branch, so you might want to make sure that your punycode commit is safely stored in another branch) |
Returns: a `ubyte[]` array, representing the native-endian | ||
representation of that number. | ||
*/ | ||
T opCast(T : ubyte[])() pure nothrow @system const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit wary of an opCast
that allocates. Can someone else comment on whether that's normal for Phobos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Searching for opCast
in Phobos yields many results, but none of them seem to be allocating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little confused. What is the alternative? This opCast()
override just calls the toBytes()
accessor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is to not have the opCast
at all, and force the programmer to call toBytes()
to get the byte representation. The advantage of this is the allocation is more explicit, while cast(byte[])myBigint
may look like it's not doing anything scary - after all, for some possible implementations, it would simply return a different view of the same data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I've never actually thought about that, but that makes sense! I will remove the opCast
override then.
Returns: the native-endian unsigned representation of the `BigInt` in | ||
the form of a `ubyte[]` array. | ||
*/ | ||
ubyte[] toBytes() const pure @system @property nothrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of allocating a new buffer would it be feasible to simply return a slice of this.data
? Is DIP1000's scope
qualifier for function return values useful for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. toBytes
is 2s complement, while BigInt
uses signed magnitude.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For positive numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose that the internal array of uint
s that is this.data
could be cast to a ubyte
array if it is positive, and a slice returned, but I will have to test that. But yeah, when it's negative, that goes out the door, as Biotronic said.
Per issue 13804, I created a
toBytes
property forstd.bigint.BigInt
. This generates an arbitrarily-long array of unsigned bytes that represents the signed, native-endian binary representation of theBigInt
.https://issues.dlang.org/show_bug.cgi?id=13804