Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cartesianProduct improvement #2276

Merged
merged 3 commits into from
Jul 10, 2014
Merged

Conversation

quickfur
Copy link
Member

Partial fix for issues:
https://issues.dlang.org/show_bug.cgi?id=9878
https://issues.dlang.org/show_bug.cgi?id=10693
https://issues.dlang.org/show_bug.cgi?id=10779
https://issues.dlang.org/show_bug.cgi?id=12957

The full fix is a little complex, so I thought at least the partial fix should go in first. Basically, this PR includes two fixes: standardize on lexicographic output order (for now), and for finite forward ranges, use a much less template-heavy implementation that doesn't involve an exponential number of template instantiations. For infinite ranges and non-forward ranges, we fall back to the old implementation, but I think the finite forward range case is probably the most common, and benefits the most from this fix, so it should be merged now rather than later.

@bearophile
Copy link

For infinite ranges and non-forward ranges, we fall back to the old implementation

So the output order changes according to the kind of range you are giving to this function? If this true, then it looks a little dangerous.

I'd also like some way to perform the optional "repeat" argument of the Python version of this function (named just product), is this possible, even just as template argument? (If it's possible is it to be asked for in a new ehancement request?)
https://docs.python.org/2/library/itertools.html#itertools.product

@quickfur
Copy link
Member Author

Once you have a non-forward input range or an infinite range as one of the arguments, you can no longer dictate what order the elements of the product should be returned in, because there is only a very limited number of orders that are actually implementable. For example, demanding that the elements of the product return tuples where elements of the input range are permuted, is impossible to satisfy, since you can't move backwards in an input range. Similarly, if one or more argument ranges are infinite, then certain element orderings may not be possible, because in order to cover all possible elements of the infinite product, you must traverse it in a limited number of ways. Requiring only forward ranges restricts the possible return orderings even further, so generally, allowing the user to choose the return order only needlessly complicates the implementation, and most of the time you have to reject the request order anyway since it's impossible to satisfy.

The most common use case of the cartesian product, however, involves finite forward ranges. This is the case targeted by this PR, giving them an efficient implementation that (for now) returns elements in the expected lexicographic ordering. For this reason, I also folded in a previous PR that changes the 2-argument case, so that for the finite forward range case, it will return the elements in the same lexicographic ordering. Allowing the user to specify other orderings in this case is possible, since this is the most flexible case, but that will be more complex to implement, and can be done later.

For now, I think the most pressing issue is that the current implementation works poorly in the common, finite forward range case. The exponential number of templates used by the current variadic cartesianProduct makes it impossible to use more than a small number of arguments before the compiler runs out of memory at compile-time. This PR at least addresses this most common use case, so that most people will find cartesianProduct usable, then we can focus on improving the more obscure cases later (infinite ranges and non-forward input ranges).

@quickfur
Copy link
Member Author

As for the "repeat" argument in Python, I believe you have an open enhancement request for it. I think it should be relatively easy to add once this PR is checked in. I'd prefer to implement it separately.

@bearophile
Copy link

Thank you for the answers, and for the work.

@bearophile
Copy link

In theory cartesianProduct should be pure nothrow @safe @nogc if the inputs are the same.

@quickfur
Copy link
Member Author

quickfur commented Jul 2, 2014

Attributes for template functions (including methods of templated types) are inferred by the compiler, so generally I wouldn't bother annotating them. Is something broken here that makes it non-pure, throwing, etc., when it shouldn't be?

@bearophile
Copy link

It's right to not annotate template functions, because the attributes are inferred. But I think it's also a very good idea to add to Phobos an unit test like this, that makes sure all attributes are inferred correctly in a simple use case:

unittest {
    void foo() pure nothrow @safe @nogc {
        immutable int[3] arr1 = [1, 2, 3];
        immutable int[2] arr2 = [1, 2];
        foreach (immutable t; cartesianProduct(arr1[], arr2[], arr1[])) {}
    }
}

@quickfur
Copy link
Member Author

quickfur commented Jul 2, 2014

Rebased to prevent compile failure with latest changes in dmd.

Added pure @safe nothrow @nogc to unittest to ensure cartesianProduct does not inadvertently introduce impurity, unsafety, hidden GC allocations, etc..

Also melded some commits together to make cleaner history.

@quickfur
Copy link
Member Author

quickfur commented Jul 7, 2014

Anybody else reviewing this PR? @andralex ? @monarchdodra ? @jmdavis ?

@WalterBright
Copy link
Member

How are we looking on unittest coverage for the new code? I.e.:

dmd -main -unittest -cov std/algorithm

@quickfur
Copy link
Member Author

Fully covered, from what I can tell (there are no 00000 lines in the .lst file inside the cartesianProduct functions).

@WalterBright
Copy link
Member

awesome!

@WalterBright
Copy link
Member

Auto-merge toggled on

WalterBright added a commit that referenced this pull request Jul 10, 2014
@WalterBright WalterBright merged commit bcceb54 into dlang:master Jul 10, 2014
@bearophile
Copy link

Wasn't this covered?
https://issues.dlang.org/show_bug.cgi?id=13091

@quickfur
Copy link
Member Author

Unfortunately, no, because the 2-argument case still uses the original deeply-templated implementation. I'm wondering if I should do a followup PR that replaces that with the new implementation as well, when no input ranges or infinite ranges are involved.

@bearophile
Copy link

I have found a case that I don't understand:

void main() {
    import std.typecons: Tuple, Nullable;
    import std.algorithm: cartesianProduct;
    string[] a;
    const Nullable!(Tuple!(string[])) b;
    //foreach (ab; cartesianProduct(a, b[0])) {} // fails
    //foreach (ab; cartesianProduct(a, b.get[0])) {} // fails
    const string[] c = b.get[0]; // OK
    foreach (ab; cartesianProduct(a, c)) {} // OK
}

@quickfur
Copy link
Member Author

pragma(msg, typeof(b[0])) reveals that its type is const(immutable(char)[][]), which is not a forward range because const arrays are not ranges (you can't modify them, so popFront doesn't work).
Ditto with b.get[0].

@quickfur
Copy link
Member Author

P.S. Hmph, apparently c also has that same type. So looks like the compiler has a hack to make iterating const arrays possible, but for whatever reason, when returned from wrapper types like Nullable, the hack didn't work. Probably file a compiler bug?

@bearophile
Copy link

Once I have understood the situation I'll open another bug report. But has cartesianProduct internal logic to convert const(int[]) to a const(int)[]?

@bearophile
Copy link

@quickfur quickfur deleted the cprod_improve1 branch July 16, 2014 05:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants