Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add more itertools: cartesianPower, combinations, combinationsRepeat #4026

Closed
wants to merge 2 commits into from

Conversation

wilzbach
Copy link
Member

As you might have realized I started to enjoy D and come from a Python background (sorry about the contribution "spam"), so I missed some functionality from Python's itertools in Phobos.
This adds a couple of useful combinatoric generators. They are all lazy and provide a length.

product: Cartesian product of r with repetitions.

For example "AB".combinationsRepeat(2).arrayreturns["AA", "AB", "BA", "BB"]`.

This is different to cartesianProduct in setopts as it allows repeats and avoids to duplicates the input range.

combinations: All k-combinations of r with repetitions.

For example "AB".combinations(2).arrayreturns["AB"]`.

combinationsRepeat: All k-combinations of r with repetitions.

For example "AB".combinationsRepeat(2).arrayreturns["AA", "AB", "BB"]`.

There is also a version which combines the three generators in one struct with statics ifs if you prefer this.

@wilzbach
Copy link
Member Author

I forgot: I didn't know where to put binomial, actually I was quite surprised it wasn't there. Would std.math be a suitable place?

@wilzbach
Copy link
Member Author

@bbasile thanks for reviewing my code - I will try harder next time to stick to the coding style ;-)

Updates

  • its now in "all allman"
  • tests are in pure nothrow @safe - I replace to!(dchar[]) with a couple of postfix d s instead
  • added unittests for binomial

@quickfur
Copy link
Member

IMO, product is a poor choice of name, especially since we already have cartesianProduct (does product mean a different kind of product?). Maybe cartesianPower might be better? See also: https://issues.dlang.org/show_bug.cgi?id=7128

_max_states = length;
if (_length > 0 && _max_states > 0)
{
_state = new size_t[repeat];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why don't you use the .length array property here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do admit this is very confusing - the number of elements (r.length) is cached (in the documentation I read that it is not guaranteed to be constant) and it is used quite often (in the while loop of popFront.
Moreover max_states computes the all possible states as this makes computations a lot easier and provides the user with a nice .length property.
To avoid this confusion I also renamed lengthComputed to isNrStatesComputed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I meant _state.length = repeat;, but maybe I miss something here (that why I've added the Q before).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh never mind - better naming doesn't hurt ;-)
@bbasile you didn't miss anything - your comment is absolutely valid and changed it to this makes absolutely sense.
I am still not entirely used to D. Sorry & thanks!

@wilzbach wilzbach force-pushed the itertools branch 2 times, most recently from 02d2575 to 55a2ade Compare March 1, 2016 00:05
@wilzbach
Copy link
Member Author

wilzbach commented Mar 1, 2016

@quickfur

IMO, product is a poor choice of name, especially since we already have cartesianProduct (does
product mean a different kind of product?)

I renamed it to cartesianPower and changed the documentation accordingly.
Moreover I also added the methods to the index header ;-)

@@ -11,11 +11,21 @@ $(T2 cache,
Eagerly evaluates and caches another range's $(D front).)
$(T2 cacheBidirectional,
As above, but also provides $(D back) and $(D popBack).)
$(T2 cartesianPower,
Lazily computes the Cartesian product of $(D r) for a number of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the text is confusing, e.g. you want "Lazily computes the Cartesian product of r with itself..." The example should show a power of 3.

Also, I'd like more in terms of motivation. From the few people who need Cartesian products at all, a fraction would need the power of a range, and those could write 'cartesianProduct(r, r, r)`. True, that can't be done if the exponent is a variable. But how often does that happen?

@wilzbach wilzbach force-pushed the itertools branch 5 times, most recently from 8b1cbb6 to bde2d10 Compare March 3, 2016 15:16
@wilzbach
Copy link
Member Author

wilzbach commented Mar 3, 2016

this is public and writable - not good

I also added a wrapper around empty for the existing permutation method.

Moreover I updated the code in regards to all your concerns (@andralex) - sorry about violating the code style.
I will try to be better next time ;-)

@wilzbach
Copy link
Member Author

wilzbach commented Mar 3, 2016

Also, I'd like more in terms of motivation. From the few people who need Cartesian products at all, a fraction would need the power of a range, and those could write 'cartesianProduct(r, r, r)`. True, that can't be done if the exponent is a variable. But how often does that happen?

At least for my use cases it is nearly always an variable.

I can argue that at least product is part of Python's core itertools, and for itertools.product one finds 118,373 matches on github. I guess the usage is even higher because - just product is valid too in python if correctly imported.

@wilzbach
Copy link
Member Author

wilzbach commented Mar 4, 2016

Does anyone know why autotester is failing randomly with this message?

make[1]: Leaving directory `/home/braddr/sandbox/at-client/pull-1978040-Linux_32_64/phobos'
timed out after 3600 seconds, step failed

@braddr ?

@wilzbach wilzbach force-pushed the itertools branch 2 times, most recently from 370bf72 to 9430c0d Compare March 6, 2016 14:03
@wilzbach
Copy link
Member Author

wilzbach commented Mar 6, 2016

Updates:

  • I added a lot of comments to the sometimes hard to understand popFront's method (all 3)
  • I made front private and added a getter function
  • I added many more unittests for combinations and combinationsRepeat -> it really turned out there was an edge case not covered in combinationsRepeat (when repeat was > number of elements)

@wilzbach
Copy link
Member Author

ping @andralex - what is your opinion on this addition? What is blocking this PR?

@@ -4257,6 +4267,11 @@ struct Permutations(Range)

next(2);
}

@property bool empty() pure nothrow @safe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this change is doing in this PR, but attributes for member functions of type templates are inferred, so explicit attributes are not necessary here (also @nogc is conspicuous by absence)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this change is doing in this PR

This was done on @andralex request to make empty readonly - I thought if this applies for the methods below I thought it applies for permutation as well.

also @nogc is conspicuous by absence

added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just use attribute inference. Explicit attributes are nice for documentation but we don't generally document range primitives.

@wilzbach wilzbach changed the title add more itertools: product, combinations, combinationsRepeat add more itertools: cartesianPower, combinations, combinationsRepeat Mar 14, 2016
@9il
Copy link
Member

9il commented Mar 14, 2016

@9il If I read the documentation correctly std.math is only for elementary operations, so only std.numeric is left?

Yes. In the same time, would you mind to place this functions to mir instead? std.numeric should will be deprecated in the future. We should think about new sci.* part of Phobos.

size_t binomial(size_t n, size_t k) pure nothrow @safe @nogc
in
{
assert(n > 0, "binomial: n must be > 0.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is unnecessary because n == 0 is also valid (as documented above).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now checks for n >= 0, there is a generalization of the binomial coefficient for negative integers which is seldomly used - should it be added too?

@wilzbach
Copy link
Member Author

ping pong @andralex for general feedback on inclusion (see my reasoning above).

void popFront()
{
// check whether we have reached the end
if (empty) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we use assert(!empty); in popFront, making it a logic error to call popFront on an empty range

@JakobOvrum
Copy link
Member

I'm no math guy but is there really no way to do these in constant space?

@wilzbach
Copy link
Member Author

I'm no math guy but is there really no way to do these in constant space?

AFAIK having this state array (aka pools) is the most common way to do this - have a look at the implementation in Python:

Allocation: https://github.com/python/cpython/blob/520c7a75f1a600b3b3a57ced782dfd046ddff678/Modules/itertoolsmodule.c#L2041
Iteration: https://github.com/python/cpython/blob/520c7a75f1a600b3b3a57ced782dfd046ddff678/Modules/itertoolsmodule.c#L2161

(all other comments are addressed & updated)

@wilzbach
Copy link
Member Author

Could someone add the @andralex label? Thanks ;-)

@9il
Copy link
Member

9il commented Mar 25, 2016

Please add algorithms complexity overview

}
body
{
if (k < 0 || k > n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k is unsigned

@9il
Copy link
Member

9il commented Mar 25, 2016

@greenify We need time for experiments with this functions before including them into Phobos.
Please include all this functions to your Mir PR, update module name with mir.combinatorics (or another name, not stat), fix issues with binomial I have listed here, and close this PR.

@9il
Copy link
Member

9il commented Mar 25, 2016

In addition, it is possible to compute FP binomial value with minimal rounding error using precise summation algorithm.

@wilzbach
Copy link
Member Author

Please include all this functions to your Mir PR, update module name with mir.combinatorics (or another name, not stat), fix issues with binomial I have listed here, and close this PR.

Done - even though I would prefer to have these iteration algorithms in std.algorithm.iteration eventually ;-)

@wilzbach wilzbach closed this Mar 28, 2016
@wilzbach
Copy link
Member Author

Update: 15 days and a lot of review & help (thanks so much @9il) later - an initial version of combinatorics is now part of mir. It now supports @nogc via the Allocator API.

See combinatorics at mir

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants