Build a library of iterator tools #12657
Comments
This is up for grabs as a GSoC 2019 project. Some suggestions to GSoC applicants:
|
Though not explicitly listed, this may be considered part of #6329 |
I want to take this up as a GSoC 2019 project. I was just wondering if beginning with implementing the For example, the basic implementation of product for integer ranges could be given as:
So what I need to do is write the tests and documentation, work on determining whether it can be made parallelizable or not, and make this process fast by reducing communication overheads, right? Please let me know if I'm missing something. |
Hi @akshansh2000, Regarding your
Beyond that, documentation, evaluating performance, and determining if the implementation should/can be parallelized would be your next steps (in any order you prefer). |
The basic implementation for the repeat tool is given:
Output:
Please let me know what I need to change in this. This could certainly be parallelized (only in the case of a bounded range), and I'll work on that next. The documentation and performance evaluation are further down the queue. |
@akshansh2000 You may want to make this into its own Pull Request. (Side note: if you write the code block as ```chpl you get syntax highlighting.)

```chpl
iter repeat (arg, times = 0) {
  if times == 0 then
    for count in 1.. do yield arg;
  else
    for count in 1..#times do yield arg;
}
```

You will also want a parallel iterator. I'll do this one for you so you get the gist of how to do so in Chapel...

```chpl
iter repeat (arg, times = 0, param tag : iterKind) where tag == iterKind.standalone {
  if times == 0 then
    forall count in 1.. do yield arg;
  else
    forall count in 1..#times do yield arg;
}
```

You'll also definitely want a leader-follower version so you can do something like this...

```chpl
forall (ix, a) in zip(repeat(fixedIdx), arr) {
  a = ix;
}
```

In the above example, you zip over an infinite stream |
Thanks for the clarification on the parallel iterator, @LouisJenkinsCS! I'll work on the leader-follower part and share my progress. (Thanks for the syntax highlighting part as well hahaha) Also, as for the PR, I'm supposed to make a Mason package and make a PR to the mason-registry repository, right? |
Hm... I'm not sure, honestly. Previously, students just submitted a PR to the repository and it ended up somewhere in the tree (likely labeled as some kind of prototype/test), but I suppose it makes sense if @ben-albrecht would want you to implement it as a Mason package. |
Alright, I'll wait for @ben-albrecht sir to reply, and work on my code until then. Thank you! |
Mason package vs package module is still an open question at this point. Though, I'm leaning towards a package module right now so that we can utilize the nightly performance testing. Once mason packages have this functionality, we could migrate this to a mason package, like we plan to do for other existing package modules (#10713). |
The repeat iterator is almost complete, I believe. Also, I'm a bit confused here. How do I implement infinite repetition for a parallel loop? Here's the code; I've commented the non-working code below.

```chpl
// Serial Iterator
iter repeat (arg, times = 0) {
  if times == 0 then
    for count in 1.. do yield arg;
  else
    for count in 1..#times do yield arg;
}

// Standalone Parallel Iterator
iter repeat (param tag : iterKind, arg, times = 0) where tag == iterKind.standalone {
  if times == 0 then
    coforall count in 1.. do yield arg; // parallel infinite loop creates problem
  else
    coforall count in 1..#times do yield arg;
}

// Procedure to compute chunks which are to be iterated through
proc computeChunk(r : range, myChunk, numChunks) where r.stridable == false {
  const numElems = r.length;
  const elemsPerChunk = numElems / numChunks;
  const mylow = r.low + (elemsPerChunk * myChunk);
  if (myChunk != numChunks - 1) then
    return mylow..#elemsPerChunk;
  else
    return mylow..r.high;
}

const numTasks = here.maxTaskPar; // max number of tasks supported by locale

// Leader
iter repeat (param tag : iterKind, arg, times = 0) where tag == iterKind.leader {
  if times == 0 then
    coforall task_id in 0.. do yield(0..1,); // parallel infinite loop creates problem
  else
    coforall task_id in 0..#numTasks {
      const working_iters = computeChunk(0..#times, task_id, numTasks);
      yield(working_iters,);
    }
}

// Follower
iter repeat (param tag : iterKind, arg, times = 0, followThis)
    where tag == iterKind.follower && followThis.size == 1 {
  const working_iters = followThis(1);
  for idx in working_iters do yield arg;
}

for element in repeat(1, 10) do write(element, ' '); // works - 1 1 1 1 1 1 1 1 1 1
writeln();

forall element in repeat('abc', 4) do write(element, ' '); // works - abc abc abc abc
writeln();

forall element in repeat(123) do write(element, ' '); // doesn't work, parallel infinite loop
writeln();

var arr: [1..5] int;

forall (idx, element) in zip(repeat(3, 5), arr) do element = idx; // works
writeln(arr); // - 3 3 3 3 3

forall (idx, element) in zip(repeat(2), arr) do element = idx; // doesn't work, parallel infinite loop
writeln(arr);
```

Thank you! |
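To make the chunking step in the leader above concrete, here is a small serial sketch of the same arithmetic. `chunkOf` is a hypothetical helper that mirrors `computeChunk`, written with `range.size` instead of `r.length`; it is an illustration, not the code from the post.

```chpl
// Hypothetical helper mirroring computeChunk above (assumption: an
// unstrided, bounded integer range is passed in).
proc chunkOf(r: range, myChunk: int, numChunks: int) {
  const elemsPerChunk = r.size / numChunks;          // integer division
  const myLow = r.low + elemsPerChunk * myChunk;     // start of this chunk
  if myChunk != numChunks - 1 then
    return myLow..#elemsPerChunk;
  else
    return myLow..r.high;                            // last chunk takes the remainder
}

// Split 10 iterations across 4 tasks.
for tid in 0..#4 do
  writeln("task ", tid, " -> ", chunkOf(0..#10, tid, 4));
// task 0 -> 0..1
// task 1 -> 2..3
// task 2 -> 4..5
// task 3 -> 6..9
```

Note that with this scheme the last task picks up every leftover element (4 of the 10 here), so the split is uneven whenever the element count is not a multiple of the task count.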
This is not supported today in Chapel. As a result, we will not be able to pursue parallel implementations of any infinite iterators for this project. |
Alright, so should I put the rest of the code into a module and create a PR? |
Oh yeah, I forgot that you can't |
Note that if I make the array the 'leader', it works fine. It has to do with the leader not knowing when to stop, which is definitely a problem. TIO |
Noted. So for now should I exclude the infinite-parallel iteration from the code and make a module out of the rest of my code, and throw an error when trying to use infinite iteration with a parallel loop? Also, should I open an issue regarding the above? |
You could halt when you have an infinite parallel loop for now, sure. The lack of a break statement in forall is a well-known issue, so no issue needed (unless Ben thinks differently) |
That sounds good to me, or just leave the parallel infinite iterator implementations out for now.
Despite being well-known, I think it's useful to have an issue to reference / search for. I've created a simple one here: #12700 |
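A minimal sketch of the "halt for now" option, assuming a hypothetical iterator name `repeatSketch` and an illustrative error message; it only covers the standalone case and is not the code that was eventually merged.

```chpl
// Serial overload: a parallel overload needs a serial one alongside it.
iter repeatSketch(arg, times = 0) {
  if times == 0 then
    for count in 1.. do yield arg;      // unbounded repetition is fine serially
  else
    for count in 1..#times do yield arg;
}

// Standalone parallel overload that refuses the unbounded case.
iter repeatSketch(param tag: iterKind, arg, times = 0)
    where tag == iterKind.standalone {
  if times == 0 then
    halt("repeatSketch: infinite iteration is not supported in a forall loop");
  forall count in 1..#times do yield arg;
}

// Bounded use in a forall loop; count the yielded elements.
var count: atomic int;
forall x in repeatSketch("abc", 4) do count.add(1);
writeln(count.read()); // 4
```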
@ben-albrecht, @LouisJenkinsCS, I completed the serial version of the cycle itertool, which returns separate elements from an iterable, e.g.:

```chpl
writeln(cycle('ABC'));
writeln(cycle(1..4, 2));
```

I was working on the parallel version, but I am not sure if it makes sense to parallelize. I don't think that a shuffling of order would be desirable. Is there some case that I'm missing where it might be used? |
I don't think a standalone parallel iterator makes sense for

```chpl
forall (dayOfMonth, dayOfWeek) in zip(1..30, cycle('MWTRFSU')) {
  April[dayOfMonth] = dayOfWeek;
}
```

|
@ben-albrecht, @LouisJenkinsCS, I am having some problem implementing the parallel version of the The serial version is working fine. However, the parallel version throws a

```chpl
use RangeChunk;

iter cycle(param tag: iterKind, arg, times = 0) throws
    where tag == iterKind.leader {
  var numTasks = if dataParTasksPerLocale > 0 then dataParTasksPerLocale else here.maxTaskPar;
  if numTasks > times then numTasks = times;
  if __primitive("method call resolves", arg, "these") { // to check if `arg` is iterable
    coforall tid in 0..#numTasks {
      const working_iters = chunk(0..#times, numTasks, tid);
      yield(working_iters,);
    }
  } else
    throw new owned IllegalArgumentError(
      "non-iterable type argument passed");
}

iter cycle(param tag: iterKind, arg, times = 0, followThis)
    where tag == iterKind.follower && followThis.size == 1 {
  const working_iters = followThis(1);
  for working_iters do
    for element in arg do
      yield element;
}
```

On running the following code, `forall (el, id) in zip(cycle('ABCD', 2), 1..#8) do writeln(el, ' ', id);` the error for unequal lengths as described above is thrown. However, on running `forall (el, id) in zip(cycle(['ABCD'], 8), 1..#8) do writeln(el, ' ', id);` the program works fine. What could I be missing? Thanks. |
|
Actually, One Similarly, in the second example, as the iterable length is 1, repeating it 8 times would require |
```chpl
use RangeChunk;

iter cycle(arg, times = 0, param tag: iterKind) throws
    where tag == iterKind.leader {
  writeln(arg.type : string);
  var numTasks = if dataParTasksPerLocale > 0 then dataParTasksPerLocale else here.maxTaskPar;
  if numTasks > times then numTasks = times;
  if __primitive("method call resolves", arg, "these") { // to check if `arg` is iterable
    coforall tid in 0..#numTasks {
      const working_iters = chunk(0..#times, numTasks, tid);
      yield(working_iters,);
    }
  } else
    throw new owned IllegalArgumentError(
      "non-iterable type argument passed");
}

iter cycle(arg, times = 0, followThis, param tag: iterKind)
    where tag == iterKind.follower && followThis.size == 1 {
  const working_iters = followThis(1);
  writeln(working_iters);
  var upperBound = working_iters.size;
  var _times = 0;
  label outerLoop
  while true do
    for element in arg {
      _times += 1;
      yield element;
      if _times == upperBound then break outerLoop;
    }
}

iter cycle(arg, times = 0) {}

forall (el, id) in zip(cycle('ABCD', 2), 1..#8) do writeln(el, ' ', id);
```

If you want it to go around in a cycle, it'd have to be a bit more complicated than what we have right now. If this is what you want you should obtain the |
I think I misunderstood the intention here. A string by itself is iterable and yields characters; if you want the string itself to be iterated over, the array literal is the best approach. |
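The distinction can be seen in a plain serial loop. This is only an illustration of how the two arguments iterate, independent of any `cycle` implementation:

```chpl
// Iterating a string yields its characters one at a time.
for ch in "ABCD" do write(ch, " ");   // A B C D
writeln();

// Iterating a one-element array literal yields the whole string once.
for s in ["ABCD"] do write(s, " ");   // ABCD
writeln();
```

So an iterator that repeats its argument sees four elements per pass in the first case and one element per pass in the second.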
@LouisJenkinsCS, does that mean storing the string as an array of characters? |
TIO That should work |
That is a great approach, @LouisJenkinsCS. I'll try to optimize it and see if we can get it done using even fewer resources. I'll get back here if I have any doubts. Thank you! |
Nice example @LouisJenkinsCS. I have a small suggestion - e.g.

```chpl
use Reflection only;
...
if Reflection.canResolveMethod(arg, "these") {
  ...
} else {
  throw new owned IllegalArgumentError(...);
}
```

|
Added "Itertools" module with repeat itertool

Part of #12657

This PR adds a draft `Itertools.chpl` file to the test directory, which contains the `Itertools` module with complete documentation and testing.

- Contains serial and parallel (standalone and leader-follower) iterators for the `repeat` itertool

[Contributed by @akshansh2000]
[Reviewed by @ben-albrecht]
This code doesn't work if the I was trying to debug it.

```chpl
iter abc() {
  for 1..8 do
    for j in 1..1 do
      yield j;
}
```

I then switched to a leader-follower one, but that didn't work quite well (TIO4). I set the default value of
This leader-follower one should yield
And I guess the initial error which I got a few days back ( Am I doing something wrong here? |
Most of our standard leader-follower iterators yield a tuple of indices using 0-based indexing. This is simply a convention so that if you're zippering something that's 0..7 with something that's 1..8 with something that's 1..16 by 2, they all have a common basis for what global iteration you're on. This means that an 8-element range's follower is expecting to receive indices in the 0..7 range. I expect that things are going wrong because your leader is yielding indices in the 1..8 range (but am surprised that the range iterator doesn't complain more about being given an iteration that's OOB... that seems like a bug). Specifically, if I make your leader return working_iters-1, it seems to work as expected: TIO |
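A small serial sketch of that 0-based convention, using the three ranges mentioned above. `nthIndex` is a hypothetical helper written for this illustration (it is not a library routine); it maps a global 0-based iteration number to the index each range would visit.

```chpl
// Hypothetical helper: the index a range visits on global iteration `i`,
// where `i` is 0-based as in the leader-follower convention.
proc nthIndex(r, i: int) {
  return r.low + i * r.stride;
}

const a = 0..7, b = 1..8, c = 1..16 by 2;

// A leader yielding 0..7 lets every follower translate into its own indices.
for i in 0..#8 do
  writeln(i, ": ", nthIndex(a, i), " ", nthIndex(b, i), " ", nthIndex(c, i));
// first line: 0: 0 1 1
// last line:  7: 7 8 15
```

A leader that yields 1..8 instead would push each follower one position too far, which is why iteration 8 falls outside an 8-element range.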
@bradcray, yes, I should've taken care of that. I understand now. One more thing, though. Why does an unbounded range starting from |
It would've been much easier to take care of if the range follower had complained at you that you were requesting unreasonable indices... It'd be good for one of us to look into why that was. |
I just did this and only now realized that it was complaining at you, although the error message could've been clearer:
I missed it the first time because I forgot that TIO separates stdout from stderr. Running it from my console made it much clearer... |
I think that's as expected. Unbounded ranges are treated a bit specially in that they are permitted to be zippered with anything without running into a length check validation failure. So since TIO5 wasn't 0-basing its indices, it yielded its second through #8'th element (so 2..9). [edit: There's a long-term intention to give users the ability to write their own unbounded iterators that can similarly conform to the leader's size, but we haven't ever completed the design and implementation of that feature]. |
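For reference, the special treatment of unbounded ranges looks like this when a bounded array leads the zip; a minimal sketch with illustrative values:

```chpl
var arr: [1..5] int;

// The array leads; the unbounded range follows and conforms to its size.
forall (a, i) in zip(arr, 10..) do
  a = i;

writeln(arr); // 10 11 12 13 14
```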
I'd love to help with it in any way possible! |
@akshansh2000: I wish I knew how to advise you to do so. It's a pretty big design challenge and likely to end up with a very different world than the one we have today. The general ideas we've had (that I'm aware of) are based on increasing the amount of interaction between a leader iterator and the follower iterators such that it's up to a follower to say "mismatched zipper iteration" or just let the follower consume as much of the leader as it wants. The thinking is that this interaction would also help fix a longstanding bug in which expressions like |
@bradcray, I guess I'd have to dig deep into the core concepts of Chapel for that, on it! |
Added the cycle itertool to Itertools.chpl

Part of #12657

The `cycle` itertool returns elements from an iterable over and over again, a specified number of times. Refer to [python itertools](https://docs.python.org/3/library/itertools.html#itertools.cycle) for more information.

This implementation includes a serial, leader, and follower iterator for `cycle()`. Note that the unbounded variant of this iterator can not be zippered in a `for/coforall` loop due to #13239.

- [x] Add the functioning code
- [x] Fix `forall` loops error in zippered contexts
- [x] Complete the documentation
- [x] Add tests

[Contributed by @akshansh2000]
[Reviewed by @ben-albrecht]
For anyone curious, the current progress of the itertools library can be found here: https://github.com/chapel-lang/chapel/tree/master/test/library/packages/Itertools Much work remains to be done. |
Add accumulate tool to Itertools

Part of #12657

Add the [accumulate tool](https://docs.python.org/3/library/itertools.html#itertools.accumulate) to the [Itertools module](https://github.com/chapel-lang/chapel/blob/master/test/library/packages/Itertools/Itertools.chpl).

- [x] Add the tool
- [x] Write relevant documentation
- [x] Write tests

This would allow such operations in arrays:

```chpl
use Itertools;

writeln(accumulate([1, 2, 3, 4, 5], operations.add)); // 1 3 6 10 15
writeln(accumulate([1, 2, 3, 4, 5], operations.multiply)); // 1 2 6 24 120
```

Currently, the supported operations are:

* add
* subtract
* multiply
* divide
* bitwise AND
* bitwise OR
* bitwise XOR

[Contributed by @akshansh2000]
[Reviewed by @ben-albrecht]
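For readers unfamiliar with the tool, here is a minimal serial sketch of running accumulation. The names `accumulateSketch` and `opKind` are hypothetical and chosen for this illustration; this is not the implementation in `Itertools.chpl`, and it only covers two of the listed operations on integer arrays.

```chpl
// Hypothetical operation selector (assumption: stands in for the module's own).
enum opKind { add, multiply }

// Yield the running result of applying `op` across `arr`, left to right.
iter accumulateSketch(arr: [] int, op: opKind) {
  var acc = 0;
  var first = true;
  for x in arr {
    if first {
      acc = x;            // the first element is yielded unchanged
      first = false;
    } else {
      select op {
        when opKind.add do acc += x;
        when opKind.multiply do acc *= x;
      }
    }
    yield acc;
  }
}

for v in accumulateSketch([1, 2, 3, 4, 5], opKind.add) do write(v, " ");
writeln(); // 1 3 6 10 15
for v in accumulateSketch([1, 2, 3, 4, 5], opKind.multiply) do write(v, " ");
writeln(); // 1 2 6 24 120
```

Each yielded value depends on the previous one, so this form is inherently serial; a parallel version would need a scan-style formulation instead.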
It would be useful to provide a toolkit of common serial and parallel iterators to users through a library. Python’s itertools library would make a good reference for target functionality as well as early performance comparisons. However, this task is not limited to iterators available in itertools.
Assuming we're porting python itertools, we would need to determine the following for each iterator:
There are a number of example recipes provided in the itertools documentation that demonstrate combining multiple iterators together to create powerful constructs. Being able to reproduce many (or all) of these constructs would be a good design goal.