Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flow.share operator #1261

Open
elizarov opened this issue Jun 7, 2019 · 27 comments

Comments

@elizarov
Copy link
Member

commented Jun 7, 2019

The share() operator operates on Flow<T> and returns Flow<T>. It shall have the following semantics. The resulting flow is cold, but when one collector shart collecting from it, the it starts to collect from the upstream flow, activating the emitter upstream. The trick of the share operator is that when additional collectors appear in the downstream, they all "share" the same upstream emitter.

For example, consider the flow:

val flow = flow { 
    var i = 0
    while(true) { 
        delay(1000) 
        println("Emit $i")
        emit(i++)
    }
}

If you launch two collectors:

launch { flow.collect { println("A: got $it") } }
launch { flow.collect { println("B: got $it") } }

Then you shall see "Emit 0 / A: got 0 / Emit 0 / B: got 0 / Emit 1 / A: got 1 / Emit 1 / B: got 1 / ...".

However, if you change the flow to val flow = flow { /* same */ }.share(), then you shall see "Emit 0 / A: got 0 / B: got 0 / Emit 1 / A: got 1 / B: got 1 / ...", that is one emission gets delivered to both collectors.

Now if you need to artificially "start" the shared flow simply to keep it active, then you can always launch a dummy collector: launch { flow.collect {} } that works as a "reference" which is active until you cancel the resulting job.

TBD: Share operator might need some configuration with the respect to how much "history" to keep in memory for "late collectors". So far it seems that one non-negative integer is enough (with zero -- new collector don't get any history, with one -- only the most recent value, with more -- the specified number of recent values). What is unclear is what shall be the default value (if any).

UPDATE: It will have to be, actually, a shareIn(scope) operator. Otherwise, the scope it works in will be bound the the first collectors and when this callector is cancelled it will not be able to maintain emissions to other collectors.

@matejdro

This comment has been minimized.

Copy link

commented Jun 7, 2019

This looks good as #1086 implementation. One thing missing is the sharing timeout, which use case is described in #1086 (comment)

@streetsofboston

This comment has been minimized.

Copy link

commented Jun 7, 2019

One question about 'starting' a shared/published flow through launch { flow.collect {} }.

In #1086 , starting a shared Flow could only happen on a Flow that was already shared, because connect() could only be invoked on a ConnectableFlow. Any subsequent subscribers/cosumers/collectors to this shared flow would not restart the producer of the flow.

Say I want to create an extension function out of launch { flow.collect {} } called connect:

fun <T> Flow<T>.connect(scope: CoroutineScope) = scope.launch { collect {} }

I can't be sure that the receiver of the connect function is a shared/published Flow.
The argument for exposing a shared/published flow as a separate type, as a ConnectableFlow, is that I can then write this extension function as follows:

fun <T> ConnectableFlow<T>.connect(scope: CoroutineScope) = scope.launch { collect {} }

and non-shared/non-published Flows won't be able to get 'connect'ed.

@zach-klippenstein

This comment has been minimized.

Copy link
Contributor

commented Jun 7, 2019

What happens to the cached "history" when moving from one to zero collectors (cancelling the upstream collection), then back to one? Does the history get preserved across upstream collections or lost?

@streetsofboston

This comment has been minimized.

Copy link

commented Jun 7, 2019

@zach-klippenstein That is why Rx does not use share() for a shared cache, but cache(). The number of subscribers governs when the flow/stream starts, but it doesn't govern when the flow/stream stops.

With Rx' ConnectableObservable's refCount(), when there is at least 1 subscriber, the Flowable starts. When there are no more subscribers the Flowable stops.
With Rx' ConnectableObservable's autoConnect(), when there is at least 1 subscriber, the Flowable starts but even when there are no more subscribers, the Flowable continues. the 'history' won't get lost.

@zach-klippenstein

This comment has been minimized.

Copy link
Contributor

commented Jun 7, 2019

What is unclear is what shall be the default value (if any).

To me, the name "share" implies only multicasting, so zero would be a reasonable default. The caching/replaying behavior is an additional feature that is often used along with multicasting, but not necessarily implied by the name.

@zach-klippenstein

This comment has been minimized.

Copy link
Contributor

commented Jun 7, 2019

@streetsofboston That's how Rx works, yes, but I was asking for clarification about this operator because I didn't see it specified in the issue description.

I believe share(0) as is proposed here would be equivalent to RxJava's share() (i.e. publish().refCount()), and share(n) would be equivalent to RxJava's replay(n).refCount().

The behavior of saving the cache across "connections" would be similar to Jake Wharton's RxReplayingShare.

@matejdro

This comment has been minimized.

Copy link

commented Jun 7, 2019

What happens to the cached "history" when moving from one to zero collectors (cancelling the upstream collection), then back to one? Does the history get preserved across upstream collections or lost?

I would argue that this should be an option, since it could be useful to have it preserve data over cancelling (for example to use this flow as a memory cache if data takes a while to load after subscribing) or it would not be useful (for example when data from the flow changes frequently, rendering old values obsolete), depending on the context

What is unclear is what shall be the default value (if any).

I think that there shouldn't be any default value, but startWith operator could be added for this like RX.

@streetsofboston

This comment has been minimized.

Copy link

commented Jun 7, 2019

@zach-klippenstein That's how I modeled my prototype implementation of the ConnectableFlow as well, for the share() and cache() functions: https://gist.github.com/streetsofboston/39c3f8c882c9891b35e0ebc3cd812381

Roman wants to not expose yet another type (ConnectableFlow) from the library and only add one more function called share() for the most-used use-cases so that the Flow api doesn't become bloated. I agree with this sentiment. Keep it as small as possible.

And maybe the argument in favor of exposing a new ConnectableFlow type is strong enough (see my previous comment about a connect() extension function) so that it will be added to the Flow api ....

In the end it's up to the maintainers of kotlinx.coroutines to weigh these factors and make a decision :-)

However, I don't think overloading the proposed share() operator with caching functionality (share when size == 0, cache when size > 0) is a good idea. Better to spell out the functions' responsibilities and have a share() function without a size parameter that shares and a cache() function with size parameter that caches.

Maybe a separate, but related, issue should be opened for a Flow.cache operator.

@streetsofboston

This comment has been minimized.

Copy link

commented Jun 7, 2019

@matejdro With the ConnectableFlow from my gist, either would be possible:

Cache survives re-activations of the flow:
flow.replay(size).autoConnect()

Cache does not survive re-activations of the flow:
flow.replay(size).refCount()

@elizarov

This comment has been minimized.

Copy link
Member Author

commented Jun 7, 2019

This looks good as #1086 implementation. One thing missing is the sharing timeout, which use case is described in #1086 (comment)

Thanks for timeout reminder. I'll see how it can be incorporate here is some nicer way than just adding an additional "timeout" parameter to this operator.

@elizarov

This comment has been minimized.

Copy link
Member Author

commented Jun 7, 2019

Say I want to create an extension function out of launch { flow.collect {} } called connect:

We actually plan this kind of operator, tentatively to be called launchIn(scope) for all kinds of flows. The only missing piece in this operator's design is user-friendly error-handling.

@elizarov

This comment has been minimized.

Copy link
Member Author

commented Jun 7, 2019

However, I don't think overloading the proposed share() operator with caching functionality (share when size == 0, cache when size > 0) is a good idea. Better to spell out the functions responsibilities and have a share() function without a size parameter that shares and a cache() function with size parameter that caches.

I don't see how to decouple share and cache functionality. It seems that you cannot cache without sharing, so we can just as well have a single operator with optional integer parameter that defaults to 0. No need to overload, because share() is conceptually same as cache(0).

@fvasco

This comment has been minimized.

Copy link
Contributor

commented Jun 7, 2019

@elizarov

you can always launch a dummy collector: launch { flow.collect {} }

But the scope remains active until job is working

@streetsofboston

This comment has been minimized.

Copy link

commented Jun 8, 2019

@fvasco
That's true. It can be cancelled, though.
And it will resume when, in case of share(), all subscribers/collectors stop collecting.

@streetsofboston

This comment has been minimized.

Copy link

commented Jun 8, 2019

I don't see how to decouple share and cache functionality
My prototype implementation decouples the two. It requires two private implementations of ConnectableFlow, though...

@elizarov

This comment has been minimized.

Copy link
Member Author

commented Jun 8, 2019

My prototype implementation decouples the two. It requires two private implementations of ConnectableFlow, though...

I actually envision a single implementation where "replay size" (of zero or more) is just a configuration parameter.

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2019

That proposal lookg good. It leaves me wondering about Throwables handling though:

  1. How should exceptions/throwables be handled when there is replay? Are they replayed too? Or is throwing illegal if shared?
  2. How to recover from an exception/throwable in a share session? Thinking about end-user point of view (e.g. of a mobile/desktop app) can help find possibilities.
@elizarov

This comment has been minimized.

Copy link
Member Author

commented Jun 8, 2019

If you need to handle errors for all collectors in a shared flow (that is, handle emitter's errors), then you have to put error-handling operators before you call share(). Btw, take a look at the error handling proposal: #1263

@fvasco

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2019

My consideration is that the launch trick has non-trivial consequences.
This can be related to #1065

@streetsofboston

This comment has been minimized.

Copy link

commented Jun 8, 2019

I actually envision a single implementation where "replay size" (of zero or more) is just a configuration parameter.

You can design it to just have one "share" method with a replay-size parameter, but my point was that decoupling the share and cache use cases is possible. I'd argue for decoupling, but you can argue for one "share" method as well.

@zach-klippenstein

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2019

Cache survives re-activations of the flow:
flow.replay(size).autoConnect()

@streetsofboston This is slightly different. AutoConnect preserves the upstream connection as well. RxReplayingShare will still disconnect upstream when the ref count hits zero (like the refCount operator), but it will preserve the cache and continue to replay the cache to new subscribers. The latter is useful if the upstream subscription represents some expensive resource that should be shutdown if nobody is actively listening, but you still want to have some initial value to emit immediately if the upstream takes a while to "warm up", like @matejdro suggested.

In general I am wary of autoConnect in Rx because it's so easy to misuse and accidentally leak the upsteam connection. Structured concurrency makes that a lot safer, since the connection requires the collection to happen in a particular scope.

I think that there shouldn't be any default value, but startWith operator could be added for this like RX.

@matejdro The function of startWith, at least as Rx defines it, is unrelated to multicasting and caching, but I see what you meant. I read that open question as referring to the "default value" of the cache size parameter - zero, one, or something else - not the default value of the cache itself, which, as you said, would just be empty until the upstream emits something.

@zach-klippenstein

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2019

In @streetsofboston's proposal, share and cache are "sibling" operators with disjoint functionality, but both are composed of the same two steps: of multicasting and automatic connection. I also prefer that naming because caching implies sharing, but sharing does not imply caching.

You could also make share and cache coordinate through operator fusion (like the recent change to buffer), but it's unclear what cache would do if not adjacent to share, since it doesn't make sense to cache for a single collector:

flow.share() // returns a multicasted flow
  .cache(1) // configures the multicasted flow to cache 1 emission

I don't mind using a single operator for both, as long as the operator name communicates that it does both multicasting and caching (e.g. shareReplaying).

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Jun 8, 2019

Having a cache() operator for Flow would be misleading as you might use it incorrectly on any Flow. The fact that a flow is shared and has cache is an implementation detail, and we should avoid operators that make assumptions on the source Flow. Since cache only makes sense when the Flow is shared, and needs to be applied only on the sharing flow, not by the consumers, it should be a configuration, or an overload of the share operator.

@voddan

This comment has been minimized.

Copy link

commented Jun 11, 2019

How will the updated signature shareIn(scope) work with several dynamically connecting and disconnecting collectors? E.i. there are use cases for the share operator that require the flow to keep track of the scopes of its consumers. Here is a quote from a Slack message with such a use case:

What I'm looking is to create long time operation that persists through multiple scopes. For example:

  1. Screen A (with Scope A) starts long operation
  2. User presses a button to switch to Screen B
  3. Screen B (with Scope B) also starts this long operation, but since this operation is already running, it would just somehow "add its scope"
  4. Screen A is closed, so Scope A that started the operation is cancelled. But since Scope B is still using this operation, it would not stop until Scope B is also cancelled.
@elizarov

This comment has been minimized.

Copy link
Member Author

commented Jun 11, 2019

shareIn(GlobalScope) would do the trick.

@voddan

This comment has been minimized.

Copy link

commented Jun 11, 2019

shareIn(GlobalScope) would do the trick.

Would it close the flow if all of the consumers are cancelled? In terms of the above example that would mean canceling the long-running operation in the flow if both consumers, Screen A and Screen B, are closed.

@LouisCAD

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2019

@voddan Yes, that's the whole point of having a share operator for Flow instead of a ConflatedBroadcastChannel that you have to close manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.