The intention of random subscription in Subscriber does not work as expected #269
Comments
Yes, you are correct. We should probably just remove this line from Broadway. The change needs to be added to GenStage instead. A PR for Broadway is welcome. |
@josevalim I'm excited that I could find something while learning to read the source. Do you mean that in Broadway we can just subscribe without shuffling? I wonder how to make the change in GenStage. As the documentation says, DemandDispatcher uses FIFO ordering. If we shuffle the |
Yes!
In the DemandDispatcher, there is a |
Hmm... On second thought, I wonder whether we really need to maintain the "FIFO" ordering, as the main characteristic of DemandDispatcher is to dispatch based on which consumer has the biggest demand. After events have been consumed and more demand has been sent, I don't think the consumers are still in any particular sequence. I also think library users who rely on the subscription pattern should not depend on any ordering in their implementation. Hence, from my point of view, it seems more suitable and easier to shuffle in May you advise of any situation I am not aware of that may require "FIFO"? |
The biggest concern is having the same producer always come back on top of the queue, although that would only happen if the system is idle, so perhaps not a big concern. The other concern is performance: if we need to get all equal consumers and shuffle them every time, what is the impact? Because of that, I think introducing only an initial shuffle is much more contained and hopefully will give enough entropy into the system. |
I am sorry that I don't quite follow the first concern. Shuffling during

    def subscribe(_opts, {pid, ref}, {demands, pending, max}) do
      {:ok, 0, {demands ++ [{0, pid, ref}], pending, max}}
    end |
The |
The I thought about the first |
IIRC |
Oh, yes. It's assuming there is a sorted list. But would my concern on the timing of

    # GenStage.DemandDispatcher
    @doc false
    def ask(counter, {pid, ref}, {demands, pending, max}) do
      if is_nil(max) do
        IO.inspect("In #{inspect(self())} when max is nil, demands length is #{Enum.count(demands)}")
      end

We can see that when the first call on the
|
You are correct, it needs to be sorted on the first call to |
I am trying to write a PR in GenStage. However, due to the entropy introduced by shuffling, I am still struggling with how the unit test should be written (possibly property-based testing is required?). I am also wondering whether the change causes unexpected impacts on other modules, as some strange warnings/errors show up if I run the unit tests several times. |
I would add a dispatcher unit test. What I would do is: start a dispatcher without the shuffle tag, with 3 subscriptions, and dispatch to one of them. Then I would do something like:
Another option you can use is to rely on the fact that Enum.random, which is most likely what we are using for shuffle, has a configurable seed (search for seed in the Enum docs). So you can set a consistent seed and then call the dispatcher once without the shuffle flag and once with the shuffle flag plus seed, and see that the result is different. |
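To make the seed suggestion concrete, here is a minimal sketch of how a fixed seed makes a shuffle reproducible inside a test. It does not touch the dispatcher itself: the `shuffle_with_seed` helper, the consumer names, and the seed tuple are hypothetical, chosen only for illustration. It assumes that `Enum.shuffle/1` draws from Erlang's process-local `:rand` generator, which can be seeded with `:rand.seed/2`.

```elixir
# Hypothetical helper: seed the process-local :rand generator, then shuffle.
# With the same seed, Enum.shuffle/1 returns the same permutation each time.
shuffle_with_seed = fn list, seed ->
  :rand.seed(:exsss, seed)
  Enum.shuffle(list)
end

# Hypothetical consumer identifiers standing in for subscriptions.
consumers = [:c1, :c2, :c3]

first = shuffle_with_seed.(consumers, {1, 2, 3})
second = shuffle_with_seed.(consumers, {1, 2, 3})

# Both runs used the same seed, so the two orders are identical,
# and shuffling never adds or drops an element.
IO.inspect({first, second}, label: "seeded shuffles")
```

A test built this way only pins the generator state; it still exercises the real shuffle, so it checks reproducibility rather than randomness.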
Sorry, I don't quite get the first suggestion. For the second one, I thought about using a seed, but I wonder whether applying a seed to generate a reproducible (fixed) outcome goes against the property of shuffling. Is the unit test in the PR good enough? |
New GenStage is out, thank you so much! |
It's my pleasure. |
There is one statement in Subscriber's init/1 function:
However, the outcome does not seem to work as intended after testing.
After reading Tuning Broadway RabbitMQ Pipelines for Latency by Dockyard and testing with the broadway_rabbitmq_experiment project they prepared, I found that the demands list in every Producer is exactly the same at the beginning, following the consumers' startup sequence.
As described in my blog, if I add debug messages in GenStage and Broadway, and run that experiment project using the pipeline configuration below, we can see the phenomenon.
My suspicion is that the mailbox of each Producer process is filled in the order of the Processors' startup sequence. That means that even though every Processor tries to subscribe to the Producers randomly by shuffling the producer names, each Producer still receives the subscription signal from Processor No.1 first, then No.2, then No.3, and finally No.10.
This phenomenon is obvious when the batch size of the messages is equal to or smaller than the processor count. But even if the batch size is much larger, say 50, besides the first 10 being sent to the No.1 processor, the rest will be biased towards the processors started earlier. I think the load can only become balanced after the system has been running for a while.
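The suspicion above can be illustrated with a small simulation that involves no real processes. It is only a sketch under stated assumptions: the consumers are assumed to start strictly one after another, each one sends all of its subscriptions before the next one starts, and the names `producers`, `consumers`, and `arrivals` are hypothetical. Each consumer shuffles the producer list before subscribing, yet every producer still records the consumers in startup order.

```elixir
# Hypothetical producers and consumers; consumers start in order 1..10.
producers = [:p1, :p2, :p3]
consumers = 1..10

# For each consumer, in startup order, shuffle the producers and "subscribe"
# by appending the consumer to that producer's arrival list (its mailbox order).
arrivals =
  Enum.reduce(consumers, %{}, fn consumer, acc ->
    producers
    |> Enum.shuffle()
    |> Enum.reduce(acc, fn producer, inner ->
      Map.update(inner, producer, [consumer], &(&1 ++ [consumer]))
    end)
  end)

# Print the arrival order seen by each producer.
IO.inspect(arrivals, label: "arrival order per producer")
```

Shuffling changes which producer a given consumer contacts first, but not the relative order in which different consumers reach the same producer, which is why the shuffle has to happen on the producer side instead.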
Output logs:
Below is the output (sorted) if I send a batch of 50 messages.