Need help in understanding how to do parallel processing with reactive streams #100
The key thing to remember is: 1) a single subscription always runs sequentially; 2) multiple subscriptions may run concurrently. Whether things run in parallel depends on the GIL, the third-party libraries used, or, as in the example, the use of a process pool. The So yes, you are getting a stream of 500 futures, but the output is generated when they resolve, and they may resolve in parallel. Thus the order of the merged stream may be very different from the original order. Could you perhaps provide a running example and explain exactly what output you get, and what you expected, if you think the output was unexpected? Buffering using
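A minimal sketch of the point about futures resolving out of order, using only the standard library rather than RxPY. The `slow_identity` helper, the `run_out_of_order` function, and the delays are hypothetical, chosen only to make the reordering visible; the exact completion order depends on thread timing and is not guaranteed.

```python
import concurrent.futures
import time


def slow_identity(item, delay):
    """Hypothetical worker: sleep for `delay` seconds, then return the item."""
    time.sleep(delay)
    return item


def run_out_of_order():
    # Earlier items get longer delays, so they tend to finish later.
    delays = {0: 0.3, 1: 0.2, 2: 0.1, 3: 0.0}
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        futures = [
            executor.submit(slow_identity, item, delay)
            for item, delay in delays.items()
        ]
        # as_completed yields each future as it resolves,
        # which need not match the submission order.
        return [f.result() for f in concurrent.futures.as_completed(futures)]


if __name__ == "__main__":
    print("submitted:", [0, 1, 2, 3])
    print("completed:", run_out_of_order())
```

Every submitted item shows up exactly once in the result; only the ordering is left to the thread pool.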
@dbrattli thanks for the reply and the explanation. I will try to post a much more complete snippet by the end of the day. But here's something that I was trying today. Notice that I am using ThreadPoolExecutor instead of ProcessPoolExecutor. Things seem to be working fine with ProcessPoolExecutor. I was under the assumption that the choice should not make a difference, because we get a list of futures that will be processed by the subscriber, and only when all of these have been processed will the code move to the next line
and I see outputs varying from
|
Also, where does a scheduler fit in this picture? I was under the assumption that a scheduler is ideally how you should execute code in different threads.
Looks like we have a race condition and that one or more operators are not thread-safe. I need to investigate, so I am flagging this as a bug. About schedulers: you can write the same thing using schedulers:

scheduler = ThreadPoolScheduler()
xs = Observable.range(1, 5).flat_map(lambda x: Observable.just(x, scheduler=scheduler), mapper)

The
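To illustrate what "thread-safe" means for a count-based buffer fed from several threads, here is a rough standard-library sketch. It is not RxPY's implementation: the `LockedBuffer` class and the `run` function are hypothetical names, and the lock stands in for whatever synchronization an operator would need when items arrive concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedBuffer:
    """Hypothetical count-based buffer whose state is guarded by a lock."""

    def __init__(self, count):
        self._count = count
        self._lock = threading.Lock()
        self._items = []
        self.batches = []

    def on_next(self, item):
        # Append and flush under one lock so that two threads cannot
        # both observe a full buffer or lose an item between steps.
        with self._lock:
            self._items.append(item)
            if len(self._items) == self._count:
                self.batches.append(self._items)
                self._items = []

    def on_completed(self):
        # Flush whatever is left once no more items will arrive.
        with self._lock:
            if self._items:
                self.batches.append(self._items)
                self._items = []


def run(total=500, count=100, workers=5):
    buffer = LockedBuffer(count)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for item in range(total):
            executor.submit(buffer.on_next, item)
    # Leaving the with-block waits for all submitted tasks to finish.
    buffer.on_completed()
    return buffer.batches


if __name__ == "__main__":
    print([len(batch) for batch in run()])
```

Without the lock, the append and the length check could interleave across threads, which is one way a buffer could emit batches of the wrong size or drop items.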
Thanks, I appreciate it. I will try to contribute the schedulers example mentioned above. Is there any reason one would prefer using a scheduler over language-specific multithreading? I am just trying to understand whether one approach is preferred over the other, and the rationale behind it.
I can still recreate the race/thread issue - is there any further progress on this? This code sometimes completes normally, but sometimes it blows up with
|
In addition to sometimes raising that
|
I cannot reproduce this with RxPY v3, so I am closing this issue. Below is the equivalent v3 code; please re-open this issue if you can make it fail:

import concurrent.futures

import rx
from rx import operators as ops


def custom_print_buffer(items):
    # Print the size of each non-empty buffer.
    if items:
        print(len(items))


def custom_print(item):
    print(item)


def return_item(item):
    return item


x = rx.from_(range(500))

with concurrent.futures.ThreadPoolExecutor(5) as executor:
    x.pipe(
        # Each item becomes a future that resolves on the thread pool.
        ops.flat_map(lambda item: executor.submit(return_item, item)),
        ops.buffer_with_count(100)
    ).subscribe(custom_print_buffer)

print("Done executing")
Isn't RxPY v3 still in beta?
I concur that it appears to work with |
Yes, still beta, but hopefully not for long (days).
I wasn't sure if there is a dedicated SO tag for this, or a mailing list where I could post this question, so apologies in advance if I have posted this in the wrong place.
I am trying to wrap my head around the concept of how to consume a reactive stream in a parallel fashion because from my understanding the stream is consumed by the subscriber in a sequential fashion.
The only example I could find was https://github.com/ReactiveX/RxPY/blob/master/examples/parallel/timer.py . But I believe schedulers are supposed to do something along those lines as well, and I am not really able to understand what the difference between the two is.
I am trying with a sample snippet
The result of this code is
length of items 5
length of items 5
and then the code moves on to the next statement. Am I missing something fundamental here? My understanding is that we should be getting a stream of 500 futures, and the "length of items" statement should repeat 100 times.
The same logic works with :
Thanks for the help.