
What are the intended semantics of <== and <<== #27

Closed
lizmat opened this issue May 15, 2019 · 11 comments
Labels: 6.e (Related to the next 6.e language release), language (Changes to the Raku Programming Language)

lizmat (Collaborator) commented May 15, 2019

See rakudo/rakudo#2899 for the start of this discussion.

AlexDaniel added the language label May 15, 2019
Kaiepi commented May 17, 2019

With rakudo/rakudo#2903, I think <== and ==> are brought in line with the spec.

The question is how <<== and ==>> are supposed to work. Should code like this be allowed to run?

[4,5,6] ==>> [1,2,3] ==>> my @foo;

Or should only one appending feed operator be allowed at a time?

my @foo;
@foo <<== [1,2,3];
@foo <<== [4,5,6];

If more than one should be allowed, should they be allowed in combination with their respective assigning operators, like this?

my @even <== grep { $_ %% 2 } <== 1..^100;
@even <<== grep { $_ %% 2 } <== 100...*;

Kaiepi commented May 18, 2019

Also, from the parallelization pullreq:

There's a problem with this... this benches slower than the current implementation of feed operators, even when there's blocking I/O going on at the same time. I think more discussion is needed about whether or not this should be implemented.

Feed operators were benching much faster in the first pullreq I made. Should we ignore the spec about parallelizing feed operators?

lizmat (Collaborator, Author) commented May 19, 2019

FWIW, I don't think feeds need to create containers, so we can have that performance benefit. It's only the storing into the endpoint that should create containers, if the receiving end wants them (e.g. Array vs List).
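For illustration, the Array-vs-List container distinction being referred to, in plain Raku (background only, not part of the proposed feed implementation):

my @a = 1, 2, 3;      # Array: each element gets a fresh Scalar container
@a[0] = 42;           # so assignment to an element works
say @a;               # [42 2 3]

my $l = (1, 2, 3);    # List: bare values, no per-element containers
# $l[0] = 42;         # dies: List elements are not containers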

Kaiepi commented May 19, 2019

Disregard what I said about ignoring the spec; I figured out how to get parallelized feed operators to run 5x faster than the current implementation.

Kaiepi commented May 25, 2019

Before I can continue with my pullreq, there's something that needs to be resolved. Modules in the ecosystem are using feed operators with things that aren't iterable. Here's an example from CUID:

sub timestamp {
    (now.round(0.01) * 100)
    ==> to-base36()
    ==> adjust-by8()
    ==> padding-by8()
}

Should this behaviour be preserved?

lizmat (Collaborator, Author) commented May 25, 2019

Does that currently return an array or a scalar?

Kaiepi commented May 25, 2019

A scalar

lizmat (Collaborator, Author) commented May 26, 2019

Then I think an nqp::p6store will take care of that eventuality.

jnthn (Contributor) commented May 29, 2019

Before I can continue with my pullreq, there's something that needs to be resolved. Modules in the ecosystem are using feed operators with things that aren't iterable.

My feeling is that any function you feed a value into had better be happy with getting its input as a final extra Iterable argument (presumably a Seq with an underlying iterator that is pulling from a Channel). Or, once we support it, such an argument at the insertion point.
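For a rough feel of that shape with today's API (the stage name my-stage is made up for illustration), Channel.list gives a lazy sequence whose iterator pulls values as the producer sends them:

my $ch = Channel.new;
start { $ch.send($_) for 1..5; $ch.close }

# a hypothetical stage receiving its input as a final Iterable argument
sub my-stage($multiplier, +@input) {
    @input.map(* * $multiplier)
}

say my-stage(2, $ch.list);   # (2 4 6 8 10)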

If there are things in the ecosystem that don't play well with that (and I don't believe the example given here will), we may need to preserve the existing semantics for 6.d and below, and introduce the new ones for 6.e.PREVIEW and onwards.

The feed operators really haven't gotten that much attention to date. The implementation before the recent work was very much a case of "first draft", and certainly didn't explore the parallel aspects alluded to in the language design docs. I'd be surprised if we can make them behave usefully going forward without breaking some of the (less thought out, and probably accidental) past behaviors.

jnthn (Contributor) commented May 30, 2019

Also, some notes on the parallelism model with feed operators: it's quite different from the hyper/race approach.

In the hyper/race case, we take the data, divide it up into batches, and work on it. Where possible, for the sake of locality, we try to push a single batch through many operations, e.g. if you do @source.race.map(&foo).grep(&bar).map(&baz) then we'd send a batch, do the maps/grep in the worker, and send back the resulting values. In this model, the parallelism comes from dividing the input data. The back-pressure here is provided by the final consumer of the pipeline.
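To make the batching model concrete, a runnable sketch in today's Raku (the batch and degree values are arbitrary choices for illustration, not anything the feed work prescribes):

my @source = 1..1_000;
my @result = @source.race(batch => 64, degree => 4)
                    .map(* ** 2)
                    .grep(* %% 3)
                    .map(* + 1);
say @result.elems;   # each worker pushes its whole batch through the map/grep/map chain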

By contrast, the feed model is about a set of steps that execute in parallel. The parallelism is in the stages of the pipeline being run in parallel, not from dividing the data items. It can be seen as a simple case of a Staged Event-Driven Architecture. Since a given stage is single-threaded, it may be stateful - whereas if you try to do stateful things in a map block in a hyper/race it's going to be a disaster. The backpressure model here would ideally be that once a queue becomes full, you cannot put anything more into it. One possible solution here would be to make Channel take an optional bound. Then a send into a Channel that is considered full would block, so you can't put more in, meaning a fast stage can't overwhelm a slow one.
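A minimal sketch of that bound, built from today's Channel and Semaphore (the name BoundedChannel and the default bound of 64 are made up for illustration; the actual suggestion above is to build this into Channel itself):

class BoundedChannel {
    has Int       $.bound = 64;   # illustrative default
    has Channel   $!channel .= new;
    has Semaphore $!slots;

    submethod TWEAK() {
        $!slots = Semaphore.new($!bound);
    }

    method send($value) {
        $!slots.acquire;          # blocks once $!bound items are in flight
        $!channel.send($value);
    }

    method receive() {
        my $value = $!channel.receive;
        $!slots.release;          # a consumed item frees a slot for the producer
        $value
    }

    method close() { $!channel.close }
}

A fast stage sending into a full BoundedChannel then blocks until the slow downstream stage catches up, which is exactly the backpressure described.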

One slightly more general problem is that Channel today doesn't really fit our overall concurrency model very well: it blocks a real OS thread when we try and receive from it, whereas in reality we like non-blocking awaiting of things where possible. I mention that here mostly because I think the stages in a pipeline should be spawned on the thread pool scheduler, but it's quite clear that they won't be the best behaved schedulees with Channel as it exists today. Probably we should solve that at the level of Channel, though, so I'd just use Channel between the stages today. It means we get error and completion conveyance, which are easy to get wrong, so I'd rather not have more implementations of those. :-)
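For reference, the non-blocking way to consume a Channel that exists today goes through its Supply coercion, e.g.:

my $ch = Channel.new;
start {
    $ch.send($_) for 1..3;
    $ch.close;
}
react {
    # whenever taps $ch.Supply, so no OS thread parks in .receive
    whenever $ch -> $value {
        say "got $value";
    }
}

This also carries completion (channel closed) and errors for free, which is the conveyance referred to above.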

Some problems will be better parallelized with hyper/race, some with feed pipelines, but there's also the issue that some things aren't even worth bothering with. I fear the ==> operator is especially vulnerable to that: while I don't think too many folks will write .hyper because it looks prettier, they probably will write ==> for that reason. If we magically speed up their programs with parallelism that's great, but there's a decent chance it won't be worth it, and will in fact slow things down. That's a tricky problem, and it's also one we'll have to solve for the hyper/race model too. For now, I'd say just do the parallel implementation, and we'll investigate such heuristics and automatic decision making later. I don't think usage of ==> is widespread enough yet for us to really upset anything.

Kaiepi commented Jul 20, 2019

The parallelization part of this is done; all that's left is support for <<==, ==>>, and *. I have a question regarding how <<== and ==>> should work, though:

my @foo = (1, 2, 3);
(4, 5, 6) ==>> @foo ==>> my @bar;
say @bar; # OUTPUT: (1, 2, 3, 4, 5, 6)

What should the value of @foo be after running this? (1, 2, 3, 4, 5, 6) or (4, 5, 6)? I think (4, 5, 6) DWIMs better, but I'm not entirely sure.

vrurg added the 6.e label Nov 27, 2019
vrurg added this to In Development in v6.e Release Nov 27, 2019
lizmat closed this as completed May 26, 2020
v6.e Release automation moved this from In Development to Done May 26, 2020