Skip to content

Conversation

Kozzi11
Copy link
Contributor

@Kozzi11 Kozzi11 commented Aug 5, 2013

string a;
auto A = Clock.currStdTime();

for(int i = 0; i < 1_000_000; ++i)
{
a = to!string(i);
}

auto B = Clock.currStdTime() - A;
writeln(a);
writefln("%s", convert!("hnsecs", "msecs")(B)/1000.0);

this version with optimalization flags
gdc: 0.142
ldmd2: 0.127
dmd2: 0.214

this version without optimalization flags
gdc: 0.407
ldmd2: 0.366
dmd2: 0.248

original version with optimalization flags
gdc: 0.445
ldmd2: 0.438
dmd2: 0.595

original version without optimalization flags
gdc: 0.875
ldmd2: 0.925
dmd2: 0.751

@monarchdodra
Copy link
Collaborator

I like this. If I'm not mistaken, it looks like by generating radix specic code, the divisions run a lot faster, making the overal code faster, yet the algorithm remains unchanged. Power to us.

IMO, the big "if/else if/else if" block could be better re-written as a switch? Also, it seems like your algorithm is basically:

do
{
    switch(radix)
    {}
}while (value)

Wouldn't it work even better if you switch once on radix, and then loop inside the cases? It avoids running the switch every time. Also, it may be worth re-organizing the cases in the order 10,16 then (2 and 8) or (8 and 2), since that's supposed to be the most common cases? (the compiler, according to spec, makes the assumptions that the switch cases are organized by frequency). As for if else, they are explicitly run in order.

A further (minor) advantage is that you could get a better sized buffer for each case (values not double checked):

  • 2 : S.sizeof * 8;
  • 8 : S.sizeof * 3;
  • 10 : S.sizeof * 3;
  • 16 : S.sizeof * 2;
  • default (S.sizeof * 6)
    Also, you can void initialize that buffer too.

Also, for the radix == 16 case, you could even benefit of checking letterCase just once, before starting your loop.

That's my review for the algorithm. Feel free to disagree. In any case, it appears to work better than before anyways, so it's already a good improvement as-is.


In regards to the code itself, I see a lot of repetition in each case 2/8/10 cases. I understand this is the root of the algorithm, but maybe this would be better implemented with a mixin template? Eg mixin template toStrConvert(uint radix) You'd write the code once, but you'd still benefit from a compile time known radix. This would be especially relevant if you move the loop inside the case, which would mean adding more code.

Minor: Your editor is leaving trailing spaces as indentation on empty lines. Phobos doesn't keep trailing spaces. Could you remove them?

if (value < 0)
{
if (radix == 10)
return "-" ~ toImpl!(T)(-cast(long)value, radix);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't your code, but I think this cast is wrong, as it involves integral promotion. The correct code would be:

toImpl!(T)(unsigned(value), radix);

The difference is this:

   byte b = -3;
   writeln(to!string(b, 16))

This prints FFFFFFFFFFFFFFFD, which is the long representation of my argument. It really should have only printed FD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with you.

@Kozzi11
Copy link
Contributor Author

Kozzi11 commented Aug 5, 2013

@monarchdodra I make some changes, which reflects your comments.

@monarchdodra
Copy link
Collaborator

I make some changes, which reflects your comments.

How does it impact your timings? Are they still good?

I know I suggested mixin, but I'm wondering if we can't make due without? They are usually a sign of bad design. Can't we just dispatch to a specialized parameterized sub-function? That'd be the logical thing to do. BTW, with your new design, if T is not a string, you'll allocated twice: First, from buffer to string, and second, from string to whatever new string type. It might be better to simply allocate a the buffer outside the switch.

I'm thinking:

//Static cases here to create buffer.

        case 10:
            toStringRadixConvert!(S, 10)(value, buffer[]);
            break;

... Or just duplicate a bit of code :D Sometimes, it's the best thing to do ;)

@Kozzi11
Copy link
Contributor Author

Kozzi11 commented Aug 5, 2013

What I remember my timings has been little better with yours proposal.
Here are new timings on my other computer

this version with optimalization flags
ldmd2: 0.128
dmd2: 0.155

this version without optimalization flags
ldmd2: 0.361
dmd2: 0.255

original version with optimalization flags
ldmd2: 0.311
dmd2: 0.394

original version without optimalization flags
ldmd2: 0.679
dmd2: 0.514

actual version with optimalization flags
ldmd2: 0.129
dmd2: 0.146

actual version without optimalization flags
ldmd2: 0.170
dmd2: 0.215

@Kozzi11
Copy link
Contributor Author

Kozzi11 commented Aug 6, 2013

@monarchdodra "I know I suggested mixin, but I'm wondering if we can't make due without? They are usually a sign of bad design. Can't we just dispatch to a specialized parameterized sub-function?"

Good idea 👍

caseHexDigits = lowerHexDigits;
}
if (radix == 10)
return "-" ~ toImpl!(T)(-cast(long)value, radix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worries me (the concatenation operators in Phobos code is always a red flag...). With this, aren't we paying the heap allocation cost twice instead of just once, producing a small piece of short-lived GC garbage in the process? Surely there's a reasonably easy way to avoid this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't mentioned it yet, but these kinds of tricks always make me nervous, because of edge cases. For example, it'll fail to correctly print long.min, as long.min == -long.min. This should (seems) to work though:

return "-" ~ toImpl!(T)(unsigned(cast(T)-value), radix);

This has the double advantage of handling T.min cases, and it doesn't trigger promotion either. The "cast" is to un-promote types that are smaller than int. AFAIK, there is 0 runtime overhead.

I don't see any other edge cases... Do you?

But yeah, the ~ is guaranteed relocation :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JakobOvrum
Yes, the concatenation is a bad idea, I has been rewrite it.

@monarchdodra
Copy link
Collaborator

That looks really really very good! Well done!

I don't think I have anything more to add. I'm just wondering if it might be better to make toStringRadixConvert a full standalone function? It is currently using the context pointer, rather than pass by value, to evaluate its "arguments", which may incur overhead? I guess we'd have to test.

Also, now that you added new code, rather than "re-use" format, could you add some unittests? In particular, to test the above mentioned issues. EG: to!string(cast(byte)-10, 16) and to!string(long.min)


char baseChar = letterCase == LetterCase.lower ? 'a' : 'A';
EEType[] buffer = void;
char mod = void;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mod can be placed directly in toStringRadixConvert, which should help with locality.

As for buffer, arguably, you could just have it returned by toStringRadixConvert, and have your code do return toStringRadixConvert!(S.sizeof * 3, 10);:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@monarchdodra done :)

@Kozzi11
Copy link
Contributor Author

Kozzi11 commented Aug 7, 2013

@monarchdodra "I'm just wondering if it might be better to make toStringRadixConvert a full standalone function?"
I tried it, and it does not have impact on speed

@monarchdodra
Copy link
Collaborator

Looks good to me. I'll still leave it open for review for a little bit, in case somebody else has anything to say.

andralex added a commit that referenced this pull request Aug 12, 2013
Improve speed of to!string for integral types
@andralex andralex merged commit ec62f9d into dlang:master Aug 12, 2013
andralex added a commit that referenced this pull request Jan 14, 2016
Update Phobos for druntime PR #1452 core.bitop changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants