Support "worker pool" pattern in actor builder and other related operators #172

Open
elizarov opened this Issue Nov 29, 2017 · 10 comments

@elizarov
Collaborator

elizarov commented Nov 29, 2017

The actor builder should natively support the "worker pool" pattern via an additional optional parameter concurrency that defaults to 1, so that if you have a list of some requests, they can all be processed concurrently, with a specified concurrency limit, using simple code like this:

val reqs: List<Request> = ...
val workers = actor(concurrency = n) {
    for (req in channel) processRequest(req)
}

This particular pattern seems to be quite common, with requests being stored either in a list or received from some other channel, so the proposal is to add concurrency to map and consumeEach, too, to be able to write something like:

incomingRequests.consumeEach(concurrency = n) { processRequest(it) }

UPDATE: We will consistently call it concurrency here. We can have dozens of concurrent coroutines which run on a single CPU core. We will reserve the name parallelism to denote limits on the number of CPU cores that are used.
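Neither actor nor consumeEach currently accepts a concurrency parameter, but the proposed semantics can be sketched with a hypothetical consumeEach(concurrency) extension: n workers all receive from the same channel, so each item is handed to exactly one of them. This is an illustration of the idea, not the library API.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical sketch of the proposed `concurrency` parameter:
// launch n workers that all receive from this channel, so each
// element is processed by exactly one worker.
suspend fun <E> ReceiveChannel<E>.consumeEach(
    concurrency: Int,
    action: suspend (E) -> Unit
) = coroutineScope {
    repeat(concurrency) {
        launch {
            for (item in this@consumeEach) action(item)
        }
    }
}

fun main() = runBlocking {
    val processed = ConcurrentLinkedQueue<Int>()
    val requests = produce { (1..10).forEach { send(it) } }
    requests.consumeEach(concurrency = 3) { processed += it }
    // every request was processed exactly once, by some worker
    check(processed.sorted() == (1..10).toList())
    println("ok")
}
```

Note that this naive sketch launches all n workers eagerly; a production version would also want to cancel the channel if action throws.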

@elizarov elizarov changed the title from Suppor "worker pool" pattern in `actor` builder and related operators to Support "worker pool" pattern in actor builder and other related operators Nov 29, 2017


@enleur


enleur commented Mar 7, 2018

Is this implementation of map too naive?

fun <E, R> ReceiveChannel<E>.map(
    context: CoroutineContext = kotlinx.coroutines.experimental.Unconfined,
    parallelism: Int = 1,
    transform: suspend (E) -> R
): ReceiveChannel<R> = produce(context, capacity = parallelism) {
    (0 until parallelism).map {
        launch(context) {
            consumeEach {
                send(transform(it))
            }
        }
    }.forEach { it.join() }
}
@elizarov


Collaborator

elizarov commented Mar 9, 2018

@enleur This is close. However, I'd like to have a slightly more efficient implementation that launches up to n coroutines only as they are needed, so that it starts up efficiently even for very large values of n.
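One way to get the lazy startup described here is a dispatcher that hands items to idle workers over a rendezvous channel and launches a new worker only when no idle worker is available and the limit has not been reached. The name lazyWorkerPool and the whole implementation below are an illustrative sketch, not anything from the library.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*
import java.util.concurrent.atomic.AtomicInteger

// Sketch: workers are launched lazily, one at a time, only when an item
// arrives and no already-running worker is idle. With a large limit and
// few items, only a few coroutines are ever created.
fun <E> CoroutineScope.lazyWorkerPool(
    concurrency: Int,
    process: suspend (E) -> Unit
): SendChannel<E> {
    val input = Channel<E>()
    val work = Channel<E>() // rendezvous hand-off to idle workers
    launch {
        var workers = 0
        for (item in input) {
            if (work.trySend(item).isSuccess) continue // an idle worker was waiting
            if (workers < concurrency) {
                workers++
                launch {
                    process(item) // seed the new worker with the current item
                    for (next in work) process(next)
                }
            } else {
                work.send(item) // at the limit: suspend until a worker frees up
            }
        }
        work.close() // input finished: let the workers drain and exit
    }
    return input
}

fun main() = runBlocking {
    val processed = AtomicInteger()
    coroutineScope {
        val pool = lazyWorkerPool<Int>(concurrency = 4) {
            delay(1)
            processed.incrementAndGet()
        }
        repeat(20) { pool.send(it) }
        pool.close()
    }
    check(processed.get() == 20)
    println("ok")
}
```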

@dobriy-eeh


dobriy-eeh commented Mar 15, 2018

As a proposal for an alternative implementation:

suspend fun <T> forkJoin(
        context: CoroutineContext = DefaultDispatcher,
        start: CoroutineStart = CoroutineStart.DEFAULT,
        outerBlock: (fork: (suspend () -> T) -> Unit) -> Unit
): List<T> {
    val deferreds = ArrayList<Deferred<T>>()
    outerBlock({ deferreds.add(async(context, start) { it() }) })
    return deferreds.map { it.await() }
}

Usage example 1:

val stream = listOf(1, 2, 3).stream()
val results = forkJoin<Int> { fork ->
    stream.forEach { fork { suspendFunc(it) } }
}

Usage example 2:

val results = forkJoin<Int> { fork ->
    for (i in 1..5) {
        if (i % 2 == 0)
            continue

        fork { suspendFunc(i) }
    }
}

The main advantage: this is quite flexible with respect to the outer "looping" code.
You are not limited to a strict interface for the source data: for example, stream-only or channel-only.
You can use any language feature to organize the fork loop: for, if, streams, and so on.

Also, you are not limited to exactly one 'request' parameter for the processing function; you may use a function with any number of parameters.

@fvasco


Contributor

fvasco commented Jul 18, 2018

Does the concurrent map preserve the order?

Should we introduce an optional parameter preserveOrder: Boolean = true for some operators (i.e. map, filter, ...)?

@elizarov


Collaborator

elizarov commented Jul 18, 2018

Sometimes you need the order preserved, sometimes you do not. I wonder what the default should be, and whether it should be controlled by a boolean or whether there should be separate operators.

@elizarov


Collaborator

elizarov commented Jul 20, 2018

Note that an alternative design approach to solve the use case of parallel processing is to introduce a dedicated parallel (?) combinator, so that channel.parallel().map { transform(it) } would perform transform in parallel for all incoming elements without preserving the order.
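The unordered-concurrent-map behavior this combinator would provide can be approximated today with a standalone helper: several coroutines read from one source channel and send transformed results to one output channel, so completion order, not input order, determines output order. The name parallelMap is hypothetical.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

// Sketch of an unordered concurrent map: `concurrency` coroutines all
// receive from the same source channel and send results to one output
// channel. The produce channel closes once all workers finish.
fun <E, R> CoroutineScope.parallelMap(
    source: ReceiveChannel<E>,
    concurrency: Int,
    transform: suspend (E) -> R
): ReceiveChannel<R> = produce {
    repeat(concurrency) {
        launch {
            for (item in source) send(transform(item))
        }
    }
}

fun main() = runBlocking {
    val nums = produce { (1..10).forEach { send(it) } }
    val squares = parallelMap(nums, concurrency = 4) { it * it }
    val result = mutableListOf<Int>()
    for (x in squares) result += x
    // results arrive in completion order, so compare as sets
    check(result.sorted() == (1..10).map { it * it })
    println("ok")
}
```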

@fvasco


Contributor

fvasco commented Jul 20, 2018

I am considering the following signature; it encapsulates the parallel blocks and allows reusing all the current operators.

suspend fun <E, R> ReceiveChannel<E>.parallel(
        parallelism: Int,
        block: suspend ProducerScope<R>.(ReceiveChannel<E>) -> Unit
): ReceiveChannel<R>

or

suspend fun <E, R> ReceiveChannel<E>.parallel(
        parallelism: Int,
        block: suspend ReceiveChannel<E>.() -> ReceiveChannel<R>
): ReceiveChannel<R>
@fvasco


Contributor

fvasco commented Jul 21, 2018

I took some time to expand on my previous message.

The idea behind it is to use a regular fork/join strategy. Forking and joining using Channels is pretty easy, so it is possible to use parallel pipelines to process items.

Multiple coroutines receive items from a single source ReceiveChannel and send results to the output channel.

suspend fun <E, R> ReceiveChannel<E>.pipelines(
        parallelism: Int,
        block: suspend ReceiveChannel<E>.() -> ReceiveChannel<R>
): ReceiveChannel<R>


val ids: ReceiveChannel<Int> = loadIds()
val largeItem = ids
        .pipelines(5) {
            map { loadItem(it) }
                    .filter { it.active }
        }
        .maxBy { it.size }

Unfortunately, with this syntax it is difficult to consume data in parallel, i.e. via consumeEach.

So an alternative syntax can be:

suspend fun <E, R> ReceiveChannel<E>.fork(
        parallelism: Int,
        block: suspend (ReceiveChannel<E>) -> R
): List<R>


val largeItem = ids
        .fork(5) {
            it.map { loadItem(it) }
                    .filter { it.active }
                    .maxBy { it.size }
        }
        .filterNotNull()
        .maxBy { it.size }

Obviously, consuming the items inside the fork function produces a List<Unit> and does not require the join phase.

I suspect that both operators are useful.
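The fork variant is straightforward to sketch: launch parallelism copies of the block, all consuming from the same source channel (a channel distributes each element to exactly one receiver), then join their results into a list. This is an illustrative implementation of the proposed signature, not library code.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.*

// Sketch of the proposed `fork`: run `parallelism` copies of `block`
// against the same source channel and collect one result per copy.
suspend fun <E, R> ReceiveChannel<E>.fork(
    parallelism: Int,
    block: suspend (ReceiveChannel<E>) -> R
): List<R> = coroutineScope {
    val source = this@fork
    List(parallelism) { async { block(source) } }.awaitAll()
}

fun main() = runBlocking {
    val ids = produce { (1..100).forEach { send(it) } }
    val partialSums = ids.fork(4) { ch ->
        var sum = 0
        for (x in ch) sum += x // each copy receives a disjoint subset
        sum
    }
    check(partialSums.size == 4 && partialSums.sum() == 5050)
    println("ok")
}
```

Using coroutineScope here means a failure in any copy cancels the others, which avoids the leaked-coroutine problem mentioned later in this thread.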

@gildor


Contributor

gildor commented Oct 23, 2018

I want to bump this issue.
This pattern comes up so often: I see questions about its implementation at least once a week on the Kotlin Slack #coroutines channel, and the quick ad-hoc implementations often have problems (similar to the problem we had before the awaitAll extension, when simple extension functions just used map { it.await() }, which leaks coroutines in case of an error).
