Improve speed of to!string for integral types #1452
I like this. If I'm not mistaken, it looks like by generating radix-specific code, the divisions run a lot faster, making the overall code faster, yet the algorithm remains unchanged. Power to us.
IMO, the big "if/else if/else if" block could be better rewritten as a switch? Also, it seems like your algorithm is basically:
do
{
    switch (radix)
    {}
} while (value);
Wouldn't it work even better if you
A further (minor) advantage is that you could get a better-sized buffer for each case (values not double checked):
Also, for the
That's my review for the algorithm. Feel free to disagree. In any case, it appears to work better than before anyway, so it's already a good improvement as-is.
In regards to the code itself, I see a lot of repetition in each case
Minor: Your editor is leaving trailing spaces as indentation on empty lines. Phobos doesn't keep trailing spaces. Could you remove them?
if (value < 0)
{
    if (radix == 10)
        return "-" ~ toImpl!(T)(-cast(long)value, radix);
This isn't your code, but I think this cast is wrong, as it involves integral promotion. The correct code would be:
toImpl!(T)(unsigned(value), radix);
The difference is this:
byte b = -3;
writeln(to!string(b, 16));
This prints FFFFFFFFFFFFFFFD, which is the long representation of my argument. It really should have printed only FD.
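The width difference described here can be reproduced without relying on the behavior of any particular Phobos version. The following is a minimal sketch, assuming only `std.conv.unsigned` and `std.format.format`; the variable names are illustrative and not taken from the PR.

```d
import std.conv : unsigned;
import std.format : format;
import std.stdio : writeln;

void main()
{
    byte b = -3;

    // Casting to long sign-extends the value to 64 bits first, so the hex
    // digits are those of a long rather than of a byte.
    string promoted = format("%X", cast(ulong) cast(long) b);

    // unsigned() keeps the original width: a byte becomes a ubyte.
    string sameWidth = format("%X", unsigned(b));

    writeln(promoted);  // FFFFFFFFFFFFFFFD
    writeln(sameWidth); // FD
}
```

Converting to the unsigned type of the same width is what keeps the output at two hex digits for a byte.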
Yes, I agree with you.
@monarchdodra I made some changes that reflect your comments.
How does it impact your timings? Are they still good?
I know I suggested mixin, but I'm wondering if we can't make do without? They are usually a sign of bad design. Can't we just dispatch to a specialized parameterized sub-function? That'd be the logical thing to do.
BTW, with your new design, if I'm thinking:
//Static cases here to create buffer.
case 10:
    toStringRadixConvert!(S, 10)(value, buffer[]);
    break;
...
Or just duplicate a bit of code :D Sometimes, it's the best thing to do ;)
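The idea of dispatching to a parameterized sub-function instead of using a mixin could look roughly like the sketch below. It is an illustration only, not the PR's code: `radixConvert` and `toRadixString` are hypothetical names, the buffer is fixed at 64 characters for simplicity, and only unsigned input and uppercase digits are handled.

```d
import std.stdio : writeln;

// Hypothetical helper (not the PR's actual code). When staticRadix is nonzero
// the radix is a compile-time constant, which lets the compiler replace the
// division and modulo with cheaper instruction sequences.
string radixConvert(uint staticRadix = 0)(ulong value, uint runtimeRadix = staticRadix)
{
    static if (staticRadix != 0)
        enum uint radix = staticRadix;
    else
        immutable uint radix = runtimeRadix;
    assert(radix >= 2 && radix <= 36, "radix must be in 2 .. 36");

    char[64] buffer = void; // 64 digits are enough for a ulong in base 2
    size_t index = buffer.length;
    do
    {
        immutable uint digit = cast(uint)(value % radix);
        buffer[--index] = cast(char)(digit < 10 ? '0' + digit : 'A' + digit - 10);
        value /= radix;
    } while (value);
    return buffer[index .. $].idup;
}

// Runtime dispatch: common radixes get a specialized instantiation,
// everything else falls back to the generic runtime path.
string toRadixString(ulong value, uint radix)
{
    switch (radix)
    {
        case 2:  return radixConvert!2(value);
        case 8:  return radixConvert!8(value);
        case 10: return radixConvert!10(value);
        case 16: return radixConvert!16(value);
        default: return radixConvert!0(value, radix);
    }
}

void main()
{
    writeln(toRadixString(255, 16)); // FF
    writeln(toRadixString(10, 2));   // 1010
    writeln(toRadixString(35, 36));  // Z
}
```

The conversion loop is written once, and each `case` only selects an instantiation, so no string mixin is needed to avoid repeating the loop body.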
As far as I remember, my timings were a little better with your proposal.
this version with optimization flags
this version without optimization flags
original version with optimization flags
original version without optimization flags
actual version with optimization flags
actual version without optimization flags
@monarchdodra "I know I suggested mixin, but I'm wondering if we can't make do without? They are usually a sign of bad design. Can't we just dispatch to a specialized parameterized sub-function?" Good idea 👍
caseHexDigits = lowerHexDigits;
}
if (radix == 10)
    return "-" ~ toImpl!(T)(-cast(long)value, radix);
This worries me (concatenation operators in Phobos code are always a red flag...). With this, aren't we paying the heap allocation cost twice instead of just once, producing a small piece of short-lived GC garbage in the process? Surely there's a reasonably easy way to avoid this.
I hadn't mentioned it yet, but these kinds of tricks always make me nervous, because of edge cases. For example, it'll fail to correctly print long.min, as long.min == -long.min. This seems to work though:
return "-" ~ toImpl!(T)(unsigned(cast(T)-value), radix);
This has the double advantage of handling T.min cases, and it doesn't trigger promotion either. The "cast" is to un-promote types that are smaller than int. AFAIK, there is 0 runtime overhead.
I don't see any other edge cases... Do you?
But yeah, the ~ is guaranteed relocation :/
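The T.min edge case and the effect of `unsigned(cast(T)-value)` can be checked in isolation. This is a small sketch under the assumption that only `std.conv.unsigned` is used; `magnitude` is a hypothetical helper name introduced here for illustration.

```d
import std.conv : unsigned;
import std.stdio : writeln;

// Hypothetical helper: absolute value of a negative number, returned in the
// unsigned type of the same width so that T.min is representable.
auto magnitude(T)(T value)
{
    // The cast undoes integral promotion for types narrower than int.
    return unsigned(cast(T) -value);
}

void main()
{
    long m = long.min;
    // Negating long.min wraps around to long.min itself.
    assert(-m == m);
    // Reinterpreted as ulong, the wrapped value is the correct magnitude.
    assert(magnitude(m) == cast(ulong) long.max + 1);

    byte b = byte.min;
    static assert(is(typeof(magnitude(b)) == ubyte));
    assert(magnitude(b) == 128);

    writeln(magnitude(m)); // 9223372036854775808
    writeln(magnitude(b)); // 128
}
```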
@JakobOvrum Yes, the concatenation is a bad idea; I have rewritten it.
That looks really, really good! Well done! I don't think I have anything more to add. I'm just wondering if it might be better to make toStringRadixConvert a full standalone function?
Also, now that you added new code, rather than "re-use" format, could you add some unittests? In particular, to test the above mentioned issues. EG:
char baseChar = letterCase == LetterCase.lower ? 'a' : 'A';
EEType[] buffer = void;
char mod = void;
mod can be placed directly in toStringRadixConvert, which should help with locality.
As for buffer, arguably, you could just have it returned by toStringRadixConvert, and have your code do return toStringRadixConvert!(S.sizeof * 3, 10);
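The `S.sizeof * 3` buffer size for radix 10 can be sanity-checked at compile time. The sketch below is an assumption-laden illustration, not code from the PR: `maxDigits` is a hypothetical helper, and it counts digits of the unsigned magnitude only, so any sign character would need extra room.

```d
import std.meta : AliasSeq;
import std.stdio : writeln;

// Hypothetical helper: digits needed to print the largest unsigned value
// that fits in `bytes` bytes, in the given radix (sign not included).
size_t maxDigits(size_t bytes, uint radix)
{
    ulong maxValue = bytes >= 8 ? ulong.max : (1UL << (bytes * 8)) - 1;
    size_t digits = 0;
    do
    {
        ++digits;
        maxValue /= radix;
    } while (maxValue);
    return digits;
}

// Evaluated at compile time via CTFE.
static assert(maxDigits(1, 10) == 3);   // 255
static assert(maxDigits(2, 10) == 5);   // 65535
static assert(maxDigits(4, 10) == 10);  // 4294967295
static assert(maxDigits(8, 10) == 20);  // 18446744073709551615

// sizeof * 3 is an upper bound on the decimal digit count for each width.
static foreach (T; AliasSeq!(ubyte, ushort, uint, ulong))
    static assert(T.sizeof * 3 >= maxDigits(T.sizeof, 10));

void main()
{
    writeln(maxDigits(8, 2));  // 64
    writeln(maxDigits(4, 16)); // 8
}
```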
@monarchdodra done :)
@monarchdodra "I'm just wondering if it might be better to make toStringRadixConvert a full standalone function?"
Looks good to me. I'll still leave it open for review for a little bit, in case somebody else has anything to say.
Improve speed of to!string for integral types
Update Phobos for druntime PR #1452 core.bitop changes
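import core.time : convert;
import std.conv : to;
import std.datetime : Clock;
import std.stdio : writefln, writeln;

void main()
{
    string a;
    auto start = Clock.currStdTime();
    for (int i = 0; i < 1_000_000; ++i)
    {
        a = to!string(i);
    }
    auto elapsed = Clock.currStdTime() - start; // elapsed time in hnsecs
    writeln(a); // print the last result so the conversions are actually used
    writefln("%s", convert!("hnsecs", "msecs")(elapsed) / 1000.0); // seconds
}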
Results in seconds for 1,000,000 conversions:
this version with optimization flags
gdc: 0.142
ldmd2: 0.127
dmd2: 0.214
this version without optimization flags
gdc: 0.407
ldmd2: 0.366
dmd2: 0.248
original version with optimization flags
gdc: 0.445
ldmd2: 0.438
dmd2: 0.595
original version without optimization flags
gdc: 0.875
ldmd2: 0.925
dmd2: 0.751