
Conversation

monarchdodra
Collaborator

...when it compiles enum containing many items

http://d.puremagic.com/issues/show_bug.cgi?id=11009

This is a fixup of pull #1540.

Turns out NoDuplicates is scarily inefficient for large input.

This cleans up and simplifies the pull somewhat.

@ghost ghost self-assigned this Sep 11, 2013
@monarchdodra
Collaborator Author

Assigning to @AndrejMitrovic

@Orvid
Contributor

Orvid commented Sep 11, 2013

Any chance you have a large enum that I could use to test another version of the NoDuplicates template? It would be a bit slower, but should be quite a bit easier on the memory usage, due to the fact that only one template would be instantiated, rather than 4-5 for every enum member. Also, any suggestion on the number of elements at which I should switch between the faster, more memory-intensive implementation and the slower, less memory-intensive one?

@monarchdodra
Collaborator Author

It would be a bit slower, but should be quite a bit easier on the memory usage, due to the fact that only one template would be instantiated, rather than 4-5 for every enum member. Also, any suggestion on the number of elements at which I should switch between the faster, more memory-intensive implementation and the slower, less memory-intensive one?

I'm unsure; you'd have to weigh time against memory for both implementations, but that's hard to do, since we are talking about a compile-time function.

Any chance you have a large enum that I could use to test another version of the NoDuplicates template?

You can try this one: mixin(format("enum IDs\n{\n%( ID%03s,\n%|%)}", iota(0, 200)));

It has no actual duplicates, but that shouldn't be a problem, since it needs to be processed quickly too.
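
Spelled out as a self-contained snippet (the imports and the static assert are illustrative additions, not part of the original one-liner):

import std.format : format;
import std.range : iota;
import std.traits : EnumMembers;

// Expands to: enum IDs { ID000, ID001, ..., ID199, }
mixin(format("enum IDs\n{\n%( ID%03s,\n%|%)}", iota(0, 200)));

// Sanity check: 200 distinct members, no duplicates.
static assert(EnumMembers!IDs.length == 200);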

@Orvid
Contributor

Orvid commented Sep 11, 2013

Well, so far I have it so that, provided there are no duplicate values, you can have as many values as EnumMembers allows (dmd actually dies, thinking it's a recursive template instantiation, if the enum has more than 496 members), but for enums with multiple members sharing the same value, I'm still trying to work through the memory issues. (I'm using DPaste #A3BCD5BB as my testbed.)

@ghost

ghost commented Sep 11, 2013

It seems like most of those traits that end up using GenericEraseAll are memory-intensive.

Btw, you've changed the switch into a series of if statements. This improves memory usage at compile time, but what happens to performance at runtime? I hope the compiler is smart enough to optimize this back into a jump table.

@JakobOvrum
Contributor

I wrote a NoDuplicates specialized for expression tuples with a common element type (which is the case when it's used on the result of EnumMembers). It uses a CTFE function backed by an AA to do the actual work, then toTypeTuple on the result to turn it back into a tuple. Preliminary testing suggests it has performance benefits, at least for certain cases. I'll do more testing once I have time, but in the meantime, here's the code:

import std.traits : CommonType;

private CommonType!values[] noDuplicates(values...)()
{
    auto asArray = [values]; // relies on values having a common type
    alias Key = typeof(asArray[0]);

    bool[Key] foundSet; // In lieu of a hash set
    Key[] result; // foundSet.keys not used because it does not preserve order

    foreach(ref value; asArray)
    {
        if(value !in foundSet)
        {
            foundSet[value] = true;
            result ~= value;
        }
    }

    return result;
}

// TODO: does not account for invalid AA keys. What are the current rules
// on key types?
template NoDuplicates(values...) if(!is(CommonType!values == void))
{
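    // toTypeTuple (not defined in this snippet) is assumed to turn the
    // CTFE-computed array back into an expression tuple.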
    alias NoDuplicates = toTypeTuple!(noDuplicates!values());
}

edit:

Oh, and I was testing on 2.063 and encountered a "recursive alias" error when noDuplicates was nested inside NoDuplicates; that's why there are two templates.

edit2:

There was a stray auto after the return type of noDuplicates, left over from when the proper return type was commented out for a while; it's not supposed to be there. That's also why Key isn't simply typeof(return), which it should be.

@monarchdodra
Collaborator Author

It seems like most of those traits that end up using GenericEraseAll are memory-intensive.

Btw, you've changed the switch into a series of if statements. This improves memory usage at compile time, but what happens to performance at runtime? I hope the compiler is smart enough to optimize this back into a jump table.

Well, the implementation is now practically copied verbatim from formatValue()'s if (is(T == enum)) branch. So at worst, it is just as fast as before, but it is also non-allocating for wstring and dstring.

Yes, GenericEraseAll is very memory-intensive.

That said, I think I can bring the switch back into the equation. Instead of doing an "eager" NoDuplicates, we could just test inside the loop. This would be MUCH less compiler-intensive, since we don't actually need a duplicate-free list; we just want to skip the duplicates. E.g.:

        switch (value)
        foreach (I, member; EnumMembers!S)
        {
            static if (staticIndexOf!(member, EnumMembers!S[0 .. I]) == -1)
            {
                case member:
                ...

I think staticIndexOf is "O(log N)". If not, it can pretty easily be made to be so.

In any case, sleep now, tests later.
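
For concreteness, a rough self-contained sketch of that pattern; enumToString and its cast fallback are illustrative names and code, not the actual std.conv implementation, and it assumes the enum's base type is switchable (staticIndexOf was in std.typetuple at the time; it is std.meta today):

import std.conv : to;
import std.meta : staticIndexOf;
import std.traits : EnumMembers, OriginalType;

string enumToString(S)(S value) if (is(S == enum))
{
    switch (value)
    {
        foreach (I, member; EnumMembers!S)
        {
            // Only emit a case for the first member with a given value, so
            // duplicate enum values don't produce duplicate case labels.
            static if (staticIndexOf!(member, EnumMembers!S[0 .. I]) == -1)
            {
                case member:
                    return __traits(allMembers, S)[I];
            }
        }
        default:
            break;
    }

    // The value is not a named member: fall back to "cast(Type)" plus the raw value.
    return "cast(" ~ S.stringof ~ ")" ~ to!string(cast(OriginalType!S) value);
}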

@JakobOvrum
Contributor

staticIndexOf is O(n) and your example code is a textbook example of how an O(n^2) algorithm is born, as far as I can see.

@monarchdodra
Collaborator Author

staticIndexOf is O(n) and your example code is a textbook example of how an O(n^2) algorithm is born, as far as I can see.

It can be made O(log(N)) in depth (which is mostly what counts in metaprogramming).

@JakobOvrum
Contributor

Yes, that will help memory use, but it still leaves processing time at O(n^2). That should never be necessary.

edit:

Remember that the number of iterations here is directly linked to the number of instantiations, so it strongly affects memory use as well.

@monarchdodra
Collaborator Author

Yes, that will help memory use, but it still leaves processing time at O(n^2). That should never be necessary.

Well, it's O(n^2) operations, which may or may not end up being O(n^2) time. We'd have to test; the compiler can surprise us sometimes.

Another simple solution is to add a test: if EnumMembers has fewer than, say, 20 elements, do the switch; otherwise, go to "toStr" straight away.

@ghost

ghost commented Sep 11, 2013

The compiler could even be smart and avoid generating duplicate case errors if the statements for those cases are also equal. But that's pushing it. :)

@monarchdodra
Collaborator Author

Alright, I pushed an updated version. Observations:

  1. I was wrong: a straight-up NoDuplicates is faster.
  2. By placing a simple upper bound, we get an efficient switch for smaller enums while still being able to handle larger enum inputs (see the rough sketch below).
  3. Given that we now have fallback functionality, I opened up support for non-switchable types. This means that even double-based enums (for example) can benefit from a non-allocating to!string.
  4. In the case where nothing matches, instead of "just" calling formatValue, which would duplicate the entire body of the switch, I simply re-coded the very last part of the formatting: copying the word "cast" and formatting the value.

I think this fix is a good compromise that retains the original intent of the change. @AndrejMitrovic: How do you feel about this version? Should I change anything? If there is anything you don't like, do tell me. Or if you have a better idea, you can submit your own version; I'd review it.
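
A rough sketch of that overall shape, using hypothetical names (isSwitchable, maxSwitchLength, enumToString) rather than the code actually merged:

import std.conv : to;
import std.meta : staticIndexOf;
import std.traits : EnumMembers, OriginalType;

// Illustrative "can this type appear in a switch statement?" test.
private enum isSwitchable(T) = __traits(compiles, { T t; switch (t) { default: break; } });

// Illustrative upper bound on how many members still get a switch.
private enum maxSwitchLength = 50;

string enumToString(S)(S value) if (is(S == enum))
{
    static if (isSwitchable!(OriginalType!S) && EnumMembers!S.length <= maxSwitchLength)
    {
        // 2. Small enums with a switchable base type: compile-time generated switch,
        //    skipping members whose value already has a case label.
        switch (value)
        {
            foreach (I, member; EnumMembers!S)
            {
                static if (staticIndexOf!(member, EnumMembers!S[0 .. I]) == -1)
                {
                    case member:
                        return __traits(allMembers, S)[I];
                }
            }
            default:
                break;
        }
    }
    else
    {
        // 3. Large or non-switchable (e.g. double-based) enums: linear scan,
        //    still without allocating for the lookup itself.
        foreach (I, member; EnumMembers!S)
            if (value == member)
                return __traits(allMembers, S)[I];
    }

    // 4. Nothing matched: emit only the tail of the formatting path, i.e.
    //    "cast(Type)" followed by the raw base value.
    return "cast(" ~ S.stringof ~ ")" ~ to!string(cast(OriginalType!S) value);
}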

@ghost

ghost commented Sep 12, 2013

Looks good, although 50 seems like a rather low number. But it doesn't matter; we can easily tweak this later once we improve the NoDuplicates implementation. I'll wait for green and then pull.

ghost pushed a commit that referenced this pull request Sep 12, 2013
fix Issue 11009 - Regression (2.064 git-head): DMD consumes huge memory ...
@ghost ghost merged commit cf88224 into dlang:master Sep 12, 2013
@ghost

ghost commented Sep 12, 2013

@monarchdodra: Thanks for these fixups!

@monarchdodra
Collaborator Author

Looks good, although 50 seems like a rather low number.

The cost at exactly 50 is +0.5 seconds, which seems a little expensive to me. We'd have to benchmark the run time to see if there is even any difference anyway...

Regardless, the "true" optimization you initially created is making it non-allocating, and that is preserved regardless of what we decide the bound is, so I'm not worrying about it too much.

BTW, whilst we are improving "To", I have two open pulls:
#1570 and #1491. Care to review?

@monarchdodra monarchdodra deleted the fix11009 branch October 25, 2013 06:45
@ghost ghost removed their assignment Apr 28, 2015