Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a switch table for enum->string conversion, which avoids runtime memory allocation. #1540

Merged
merged 1 commit into from Sep 8, 2013

Conversation

ghost
Copy link

@ghost ghost commented Sep 2, 2013

We'll surely need a good review of this, but I think it's an interesting optimization!

string s1 = to!string(E.foo);
string s2 = to!string(E.foo);
assert(s1 == s2);
assert(s1.ptr is s2.ptr);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where you can see a difference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the issues encountered, i'd be reassured if we had tests that were a bit more thourough, and also tested the oposite:

    foreach(S; TypeTuple!(string, wstring, dstring, const(char[]), const(wchar[]), const(dchar[])))
    {
        auto s1 = to!S(E.foo);
        auto s2 = to!S(E.foo);
        assert(s1 == s2);
        assert(s1  is s2);
    }
    foreach(S; TypeTuple!(char[], wchar[], dchar[]))
    {
        auto s1 = to!S(E.foo);
        auto s2 = to!S(E.foo);
        assert(s1 == s2);
        assert(s1 !is s2);
    }

Also, simple array is is enough, no need for .ptr.

@JakobOvrum
Copy link
Member

Case statements can also be generated with static foreach. I'd hate to use string mixins when it's not necessary.

@monarchdodra
Copy link
Collaborator

Tweaking what you did just a little:

    else static if (isSomeString!T && is(S == enum) && canSwitchEnum!S)
    {
        enum rep(size_t I) = toStr!(immutable(T))(EnumMembers!S[I]);

        switch(value)
        {
            foreach (I, member; EnumMembers!S)
            {
                case member:
                    return to!T(rep!I);
                break;
            }

            default:
                return toStr!T(value);
        }
    }

This does it without mixins. Furthermore, it also handles wstring or dstring without allocating.

@monarchdodra
Copy link
Collaborator

I want to add about my above code: Originally, it was "just":

enum rep(size_t I) = toStr!T(EnumMembers!S[I]); //Simple initialization
//Later:
    return rep!I; //simple return

But this hits issue: http://d.puremagic.com/issues/show_bug.cgi?id=10950 , where the compiler doesn't dup for us. So we have to do it for the compiler.

This is triggered with the test case:

enum E {a}
unittest
{
    auto s1 = to!(char[])(E.a);
    auto s2 = to!(char[])(E.a);
    assert(s1.ptr != s2.ptr);
}

@ghost
Copy link
Author

ghost commented Sep 2, 2013

Case statements can also be generated with static foreach.

That's news to me!

@ghost
Copy link
Author

ghost commented Sep 2, 2013

@monarchdodra:

With your code this fails:

assert(s1.ptr is s2.ptr);

So it's still allocating.

Edit: The full example:

unittest
{
    enum E
    {
        foo,
        bar
    }

    string s1 = to!string(E.foo);
    string s2 = to!string(E.foo);
    assert(s1 == s2);
    assert(s1.ptr is s2.ptr);
}

@ghost
Copy link
Author

ghost commented Sep 2, 2013

The problem is as soon as you call toStr you're allocating, hence why I didn't call it but used string literals in the string mixin version.

@ghost
Copy link
Author

ghost commented Sep 2, 2013

Btw, the failing test-case seems to be due to a bug, but I can work around it: http://d.puremagic.com/issues/show_bug.cgi?id=10951

@monarchdodra
Copy link
Collaborator

With your code this fails:

Really ??? It's working on my machine :/ ???

The problem is as soon as you call toStr you're allocating, hence why I didn't call it but used string literals in the string mixin version.

I'm storing it in a manifest constant, so it shouldn't be allocating :/

@ghost
Copy link
Author

ghost commented Sep 2, 2013

@monarchdodra: We may have found a new regression. The following passes in 2.063, but fails in git-head:

module test;

import std.array;
import std.conv;
import std.traits;
import std.range;
import std.format;

T toStr(T, S)(S src)
    if (isSomeString!T)
{
    auto w = appender!T();
    FormatSpec!(ElementEncodingType!T) f;
    formatValue(w, src, f);
    return w.data;
}

template rep(T, S, size_t I) { enum rep = toStr!(immutable(T))(EnumMembers!S[I]); }

auto convert(T, S)(S value)
{
    switch (value)
    {
        foreach (I, member; EnumMembers!S)
        {
            case member:
                return to!T(rep!(T, S, I));
            break;
        }

        default:
            return toStr!T(value);
    }
}

void main()
{
    enum E
    {
        foo,
        bar
    }

    string s1 = convert!string(E.foo);
    string s2 = convert!string(E.foo);
    assert(s1 == s2);
    assert(s1.ptr is s2.ptr);
}

Note that I've had to refactor that template a bit since 2.063 doesn't support that new syntax (and nested templates are also fairly recent IIRC).

@ghost
Copy link
Author

ghost commented Sep 2, 2013

I don't understand the failures in the DMD test-runner. :s

@monarchdodra
Copy link
Collaborator

I don't understand the failures in the DMD test-runner. :s

Not you: dmd is failing all platforms.

http://d.puremagic.com/test-results/?projectid=1

@ghost
Copy link
Author

ghost commented Sep 2, 2013

Thanks.

I'm not sure whether that 2.063-2.064 difference is a compiler regression or just a library change.

@monarchdodra
Copy link
Collaborator

Reduced:

module test;

import std.array;

string toStr()
{
    auto w = appender!string();
    w.put('1');
    return w.data;
}

void main()
{
    enum string s = toStr();
    string s1 = s;
    string s2 = s;
    assert(s1 == s2);
    assert(s1.ptr is s2.ptr);
}

That said, I think it's just the extended http://d.puremagic.com/issues/show_bug.cgi?id=10950 (please checkout my latest comment).

Proof it's not an actual regression:

import std.array;

char[] toStr()
{
    auto w = appender!(char[])();
    w.put('1');
    return w.data;
}

void main()
{
    enum char[] s = toStr();
    char[] s1 = s;
    char[] s2 = s;
    assert(s1 == s2);
    assert(s1.ptr !is s2.ptr);
}

This passes on HEAD, but FAILS on 2.063.2. Long story short, an implementation change "flipped" the behavior, but neither is wrong nor correct...

foreach (member; NoDuplicates!(EnumMembers!S))
result ~= format(`case S.%s: return to!T("%s"); `, member, member);
// preserve old non-throwing behavior
result ~= format("default: return toStr!T(value); }");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for format here.

@monarchdodra
Copy link
Collaborator

I am saddened that my solution ended up not working due to a bug. In any case, your solution works, and I didn't see anything blocker.

@monarchdodra
Copy link
Collaborator

I am saddened that my solution ended up not working due to a bug.

I was able to improve the situation to this:

        enum rep(S val) = mixin(format(`"%s"%s`, toStr!string(val), ['c', 'w', 'd'][ElementEncodingType!T.sizeof/2]));

        switch(value)
        {
            foreach (member; EnumMembers!S)
            {
                case member:
                    return to!T(rep!member);
            }
            default:
                return toStr!T(value);
        }

It's horrific to have to jump through such hoops just to declare an enum ctfe string.

@ghost
Copy link
Author

ghost commented Sep 3, 2013

I ended up doing a similar thing to your suggestions, although I've factored out the char literal generation into another template, and also added checks to avoid duplicate case statement generation.

All in all this was a pretty intense adventure. :)

Thanks for all the help @monarchdodra.

{
// generate a switch statement with string literals instead of allocating memory
enum rep(S val) = mixin(format(`"%s"%s`,
toStr!string(val), charLiteralSuffix!(ElementEncodingType!T)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just about ready to merge. May I just request you add a @@@10950@@@ here (or something), in case someone was wondering "WTF he is using format on a string, to mixin a string... just ot get the original string...?"

@ghost
Copy link
Author

ghost commented Sep 6, 2013

Added comment for Issue 10950.

@JakobOvrum
Copy link
Member

LGTM now

@ghost
Copy link
Author

ghost commented Sep 7, 2013

I wonder if this can also be implemented for regular integrals? I'm not sure what the cost/benefit ratio is to use a switch table versus whatever toImpl does now for ints.

But, it does seem to currently allocate:

assert(to!string(1) is to!string(1));  // fails

Seems like we could have a bit of a performance win with a switch table?

@DmitryOlshansky
Copy link
Member

Seems like we could have a bit of a performance win with a switch table?

With 4 billion entries? Gosh.

It could be interesting to use a simple table for the first ~10 or ~100 tough (no need to switch at all btw).
Even better return different slices of some immutable string "0123456789" (which we seem to have in std.ascii) for anything under 10.

@ghost
Copy link
Author

ghost commented Sep 7, 2013

Well yeah I didn't mean an entire int range, chances are you'd run into compiler bugs at that point anyway. :)

monarchdodra added a commit that referenced this pull request Sep 8, 2013
Use a switch table for enum->string conversion, which avoids runtime memory allocation.
@monarchdodra monarchdodra merged commit d39696d into dlang:master Sep 8, 2013
@ghost
Copy link
Author

ghost commented Sep 8, 2013

Thanks for pulling. I wonder if it has any impact on performance in someones code.

@monarchdodra
Copy link
Collaborator

Thanks for pulling. I wonder if it has any impact on performance in someones code.

Individually? Probably not. But these things add up. Incremental improvements FTW.

@monarchdodra
Copy link
Collaborator

This may have had negative actual impacts:

http://d.puremagic.com/issues/show_bug.cgi?id=11009

I think we should bench this, and study its (compile) memory consumption.

@JakobOvrum
Copy link
Member

I suspect it's due to the format and string mixin...

edit:

The amount of template instantiations (see EraseAll) caused by NoDuplicates might be O(n^2), that could also be a killer.

@monarchdodra
Copy link
Collaborator

I suspect it's due to the format and string mixin...

Yeah. That said, instead of using an enum, we could just use an immutable static, as suggested in http://d.puremagic.com/issues/show_bug.cgi?id=10950:

template rep(S val)
{
    static immutable(T) rep = toStr!(immutable(T))(val);
}

This is clean, and idiomatic, and entirelly bypasses issue 10950: A further improvement which could improve the situation would be to move rep outside of the to template, so that it is only instanciated once for each character width (as immutable):

template enumRep(T, S, S val)
{
    static if (is(immutable(T) == T))
        static T enumRep = toStr!T(val);
    else
        alias enumRep = enumRep!(immutable(T), S, val);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants