A single Verticle consumer is processing messages concurrently for a worker thread pool #3798
Duplicate of #3790 |
OK, so it has been handled already; at least I hope that unit test gives you guys another perspective. |
thanks @guidomedina, maybe we need to rethink the affinity between task queue and duplicate contexts to avoid concurrent access like this. I think we could introduce a new flag for worker execution: currently we have sequential/unordered, and we would add a third mode, per duplicate context (the current behavior), rolling back to sequential by default for the verticle. |
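The three modes mentioned above could be sketched roughly like this, using plain `java.util.concurrent` rather than Vert.x internals (all names here are hypothetical, not the proposed API):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical sketch (not Vert.x internals) of the three modes discussed:
// SEQUENTIAL serializes all tasks of the verticle, UNORDERED lets any
// worker thread run them in any order, and PER_DUPLICATE_CONTEXT
// serializes only tasks that share the same duplicated-context key.
class WorkerModes {
    enum Mode { SEQUENTIAL, UNORDERED, PER_DUPLICATE_CONTEXT }

    final ExecutorService pool = Executors.newFixedThreadPool(4);
    final ExecutorService serial = Executors.newSingleThreadExecutor();
    final Map<String, ExecutorService> perContext = new ConcurrentHashMap<>();

    void execute(Mode mode, String contextKey, Runnable task) {
        switch (mode) {
            case SEQUENTIAL:
                serial.submit(task);   // one queue for the whole verticle
                break;
            case UNORDERED:
                pool.submit(task);     // may run concurrently
                break;
            case PER_DUPLICATE_CONTEXT:
                perContext.computeIfAbsent(contextKey,
                        k -> Executors.newSingleThreadExecutor()).submit(task);
                break;
        }
    }

    public static void main(String[] args) throws Exception {
        WorkerModes wm = new WorkerModes();
        List<Integer> order = new CopyOnWriteArrayList<>();
        for (int i = 0; i < 5; i++) {
            int n = i;
            wm.execute(Mode.SEQUENTIAL, null, () -> order.add(n));
        }
        wm.serial.shutdown();
        wm.serial.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(order); // sequential mode preserves submission order
        wm.pool.shutdown();
    }
}
```

The point of the sketch is that "sequential" and "per duplicate context" differ only in which queue a task lands on, which is why the default matters so much.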
But I do understand what you are saying, which is a little different from the PR |
#3793 seems to bring a lot of complexity, I will have a look at the original issue and this one |
it seems to me we should
|
@vietj So I like that idea on the surface, but I do see some issues with it. While the PR does add some complexity, it does properly handle the issue of new duplicates being produced for tracing, as those will propagate the |

```java
context0.executeBlocking(ignored0 -> {
    // executes on the main TaskQueue for context0
    JDBC jdbc = /* whatever */;
    jdbc.getConnection(con -> {
        // executes on a new TaskQueue per invocation
        vertx.eventBus().consumer("addr", msg -> {
            vertx.executeBlocking(ignored1 -> {
                // executes on the main TaskQueue for context0 because there is a new DuplicatedContext
            });
        });
    });
});
```

If you don't completely separate the

I know it is a weird case with the placement of the event bus consumer, but I imagine more things will get tracing support and need a |
IMHO there should be no guarantees at all with executeBlocking/workers at least with regard to ordering. I think Ordered=true should be deprecated, because it implies one (identifiable) taskqueue, which a Context would need to enforce. Maybe it would be a good idea to tell executeBlocking on which Context to execute if the user should have control over it. |
Even though ordering and concurrency are closely related, the issue I'm reporting is that a single consumer on a single Verticle instance is able to consume messages concurrently. IMHO this breaks the core concept of what a Verticle originally is and does, which is similar to the guarantees you get in Akka, for example: you expect a single actor modifying state, but in this case that contract is broken, as demonstrated by my unit test. |
yes, perhaps there should be no ordering at all, and then if a user needs sequential execution they should chain the tasks themselves:

```java
executeBlocking(t1, ar -> {
    executeBlocking(t2, ar -> {
        // ...
    });
});
```

that's an interesting perspective.
|
@guidomedina The contract is valid if you are executing on the event loop; it is broken as soon as you use executeBlocking(). If you insist on ordered blocking execution, all the benefits of the Vert.x model will go down the drain. I would advise the following:

- Please re-think whether you really need total order. Very often this is not the case. Whatever you are doing in your blocking code, does it really have a relationship with the previous and next message? Multimedia streams and files/blobs broken up in pieces are cases I know of where it is needed, but you wouldn't use the event loop/EventBus for that, you'd use a normal TCP stream. Logging is the other case, but there are low-latency solutions for that, so you don't need executeBlocking().
- If you decide that you need it, consider using a CompletableFuture with a single-threaded Executor, chain them, and then store the latest CompletableFuture in the Context. This should give you the desired behavior.
- If you don't want to do that, maybe Vert.x is not the right tool for your use case. You are basically forcing everything into one thread. If the arrival rate of messages on the event bus exceeds the execution rate in the executeBlocking() code, the application will have high-latency problems. I doubt that Akka can do it better. Project Loom is the only thing that may help here.

BTW: Reduce setWorkerPoolSize() to 1. Your test will pass.

@Gattag Old behavior is ok, but it is still confusing, and it's not easy for Vert.x maintainers to support it, code-wise and user-wise.

@vietj and Vert.x maintainers: please keep this as simple as possible, keep the model simple. Otherwise you will get a lot of bug reports / issues with hard-to-reproduce concurrency bugs. I've been using Vert.x since 2016, very successfully. It's better not to support all cases and not to overpromise. With DuplicatedContext I think you've worked around old JDBC code and got better behavior, but the truth is that these problems need to be fixed in the driver, preferably an asynchronous one. |
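The suggestion in this thread to chain blocking tasks via a CompletableFuture on a single-threaded Executor can be sketched as follows. This is a minimal illustration, not Vert.x code; `lastTask` stands in for the reference the advice says should live in the Context:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the advice above: chain blocking tasks on a single-threaded
// Executor and keep the tail of the chain (in a verticle, the tail would
// be stored in the Context), so each task runs strictly after the last.
class OrderedBlocking {
    final ExecutorService worker = Executors.newSingleThreadExecutor();
    final AtomicReference<CompletableFuture<Void>> lastTask =
            new AtomicReference<>(CompletableFuture.completedFuture(null));

    // Chain onto whatever ran last; updateAndGet keeps this race-free.
    CompletableFuture<Void> submitOrdered(Runnable blockingTask) {
        return lastTask.updateAndGet(prev -> prev.thenRunAsync(blockingTask, worker));
    }

    public static void main(String[] args) throws Exception {
        OrderedBlocking ob = new OrderedBlocking();
        List<Integer> seen = new ArrayList<>(); // safe: all tasks run on one thread
        for (int i = 0; i < 5; i++) {
            int n = i;
            ob.submitOrdered(() -> seen.add(n));
        }
        ob.lastTask.get().get();  // wait for the tail of the chain
        System.out.println(seen); // prints [0, 1, 2, 3, 4]
        ob.worker.shutdown();
    }
}
```

This gives total order without relying on any Vert.x task-queue guarantees, which is exactly the trade the advice proposes.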
I have a complex queuing system sitting on top of Vert.x where each message processor does some heavy I/O; let me describe it:
What are the promises of my system:
I also have HTTP and TCP servers using Vert.x. This system has been in place for 3 years now; it was fine until 3.9.5, but now it isn't working with 4.0.x. I simply came here because something was working and now is not. I'm not asking the Vert.x guys to over-promise on anything; it will just take me some time to migrate it to something like Akka. I have used Vert.x and Akka in 2 projects now, including a low-latency distributed trading system, and I don't mean to over-burden Vert.x with things that aren't its specialty. |
This is weird, how did you make sure that Vert.x was able to parallelize the messages to different addresses but not to the same address? Did you deploy a verticle for each address? |
Yes, there are not too many addresses, but some parallelism is required, so an Actor system was too much for me. |
Ok, but that would mean you can afford a single-threaded Executor in each of your verticles; this way you are always sure about in-order processing. Are you sure that the heavy I/O code cannot become a bottleneck if too many messages arrive? |
If a message with the same hash or

Some of the processors have something like 64 addresses with 4 threads, so they are maxed out with minimal hash collision. I know, I know, Akka offers this, but it was easier to do with Vert.x |
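The hash-based routing described here can be sketched in a few lines. This is illustrative only, not the actual system; the address prefix and count are made up:

```java
// Illustrative only (not the actual system): route each message key to one
// of 64 fixed addresses so that all messages sharing a key are consumed by
// the same single-consumer verticle, while different keys run in parallel.
class HashRouter {
    static final int ADDRESSES = 64;

    static String addressFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes
        return "processor." + Math.floorMod(key.hashCode(), ADDRESSES);
    }

    public static void main(String[] args) {
        // The same key always maps to the same address
        System.out.println(addressFor("order-42").equals(addressFor("order-42")));
    }
}
```

With more addresses than worker threads, collisions only reduce parallelism, never break per-key ordering, which matches the trade-off described.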
I understand. Did it work with 3.9.5 and workerPoolSize set to 4 for each verticle? |
But I do believe the idea of worker threads is kind of dumb; maybe it should be eliminated, just letting the user pick between the main loop thread(s) or another thread pool. For example, it would be something like:
|
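The example code in the comment above was lost in extraction. Purely as a hypothetical sketch of the idea (the caller explicitly picks the event-loop-like thread or a worker pool), using plain `java.util.concurrent` rather than any proposed Vert.x API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch only: lets the caller explicitly pick between the
// "main loop" thread (serialized) and a separate worker pool (concurrent).
class PickThread {
    final ExecutorService mainLoop = Executors.newSingleThreadExecutor();
    final ExecutorService workers = Executors.newFixedThreadPool(4);

    <T> CompletableFuture<T> onMainLoop(Supplier<T> task) {
        return CompletableFuture.supplyAsync(task, mainLoop);
    }

    <T> CompletableFuture<T> onWorkers(Supplier<T> task) {
        return CompletableFuture.supplyAsync(task, workers);
    }

    public static void main(String[] args) throws Exception {
        PickThread p = new PickThread();
        String t1 = p.onMainLoop(() -> Thread.currentThread().getName()).get();
        String t2 = p.onMainLoop(() -> Thread.currentThread().getName()).get();
        System.out.println(t1.equals(t2)); // main-loop tasks share one thread
        p.mainLoop.shutdown();
        p.workers.shutdown();
    }
}
```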
Up to 3.9.5 the system works flawlessly, but you have to remember that if you name the worker thread pool it doesn't create a thread pool per Verticle, so it is basically one thread pool handling N addresses with N verticles and X threads |
Yes, that's the idea. I have a different idea, I did a similar thing a while ago:
(Maybe this can be done with Vertx Futures/Promises too) |
From the docs https://vertx.io/docs/vertx-core/java/#_verticles:
@guidomedina this is the documentation that you are referring to. I apologize to you, the contract seems to be exactly what you said. I still believe it is wrong and should be deprecated. |
I agree with you Andreas.
|
I think there are two things to address here for duplicate context: 1/ the worker execution that is currently broken |
I've made a draft branch to address the worker verticle execution here: #3802. We can use it for executeBlocking execution too |
@guidomedina can you check the branch I created? I ran your test and now it passes again |
@Gattag you said this is a duplicate of the executeBlocking task, but it is not; executeBlocking has different semantics than worker execution (it also allows parallel execution to happen). |
I think the executeBlocking contract should be clarified on how it works for duplicate contexts; perhaps an ordered execution by duplicate context is not the right default. I believe it is, since 1/ this is most of the time what the user intends to have (e.g. this solves the workarounds we used to have for the JDBC client) |
@vietj I said it was a duplicate because they are caused by the same thing under the hood, but they do have different semantics to an extent which I didn't fully conceptualize. I agree that it is more efficient than what I came up with, but the more I think about it, the more I think the added task queue behavior should be reverted in its entirety. The biggest thing I don't get, after looking more at the JDBC client now, is why storing a task queue per connection was considered a workaround. At this point all I see is |
@vietj I agree, but how easy is it to tell the user not to update the state? My main concern is the Context object itself, which has a lot of state. Does a duplicate context have its own Map for put/get? |
I fully agree @Gattag ! Thanks! |
3.9 concurrency behavior is what we should have in 4.x. I'm sorry to sorta change gears here from earlier; it just really doesn't seem like this will play out cleanly, and it adds cases for weird things to occur. I'm not sure if I'm not seeing the full picture here, and I'm sorry if I'm wasting time with all these questions. I'm just concerned |
It shares the same map as the parent context and has a local map that can be used for specific use cases that require storing data per duplicate context (e.g. tracing).
|
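The shared-map/local-map split described above can be sketched in a few lines. The names here are hypothetical and this is not the Vert.x source, just an illustration of the rule:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the rule described above (hypothetical names, not Vert.x
// source): a duplicated context shares the parent's map but gets its own
// local map, e.g. for per-request tracing data.
class Ctx {
    final Map<String, Object> data;       // shared with every duplicate
    final Map<String, Object> localData;  // private to this duplicate

    Ctx() {
        this.data = new ConcurrentHashMap<>();
        this.localData = new ConcurrentHashMap<>();
    }

    private Ctx(Ctx parent) {
        this.data = parent.data;                    // same map instance
        this.localData = new ConcurrentHashMap<>(); // fresh local map
    }

    Ctx duplicate() { return new Ctx(this); }

    public static void main(String[] args) {
        Ctx parent = new Ctx();
        Ctx dup = parent.duplicate();
        dup.data.put("k", "shared");
        dup.localData.put("trace", "only-here");
        System.out.println(parent.data.get("k"));          // visible: shared
        System.out.println(parent.localData.get("trace")); // null: not shared
    }
}
```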
You mean here that executeBlocking by default should serialize on the parent context and not on the duplicate, right? (besides this worker verticle execution bug)
|
@vietj Yes |
But really what I mean is that DuplicatedContext should have its own localData and that's it; there should be no special context with a special task queue |
The exception would be that internally JDBC could use a special context for that (a context that is not reachable outside of the internal workings of JDBC, so callbacks would not provide this context). But as I was saying earlier, I'm not grasping what was wrong with the solution of storing an independent TaskQueue with each JDBCConnection. I thought I understood before, but I realized I don't |
Looks good, the test is spot on (except for the comment I made); I should probably have made my tests much simpler, like you did.
In other words, the following will make Vert.x very reliable for any kind of thread pool: with one producer and one verticle with one consumer, the following should be true regardless of how many threads its assigned thread pool has:
|
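The list itself was lost in extraction, but the invariant being asked for can be stated as a check: a single consumer should never be observed inside its handler on two threads at once, however many threads the pool has. A minimal plain-Java sketch of that check (not the linked gist):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch (not the linked gist) of the invariant stated above:
// a properly serialized consumer is never inside its handler on two
// threads at once, regardless of the worker pool size.
class SequentialInvariant {

    // Runs 100 "messages" through a serialized consumer and returns the
    // maximum number of handlers observed executing at the same time.
    static int maxConcurrentHandlers() throws InterruptedException {
        ExecutorService consumer = Executors.newSingleThreadExecutor();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            consumer.submit(() -> {
                int now = inFlight.incrementAndGet();
                maxObserved.accumulateAndGet(now, Math::max);
                inFlight.decrementAndGet();
            });
        }
        consumer.shutdown();
        consumer.awaitTermination(5, TimeUnit.SECONDS);
        return maxObserved.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(maxConcurrentHandlers()); // 1 when properly serialized
    }
}
```

The linked gist applies the same idea to an event-bus consumer, where the bug report says the observed value exceeds 1 on 4.0.x.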
Version
Which version did you encounter this bug with? 4.0.2
Context
I'm using Vert.x Verticles to process messages sequentially per given address. Up to 3.9.5 a consumer would only process messages in sequential order, but now they seem to be processed concurrently.
Do you have a reproducer?
Yes, here is a simple unit test, it passes with 3.9.5 and fails with 4.0.2:
https://gist.github.com/guidomedina/ff20d1531bf59e046dd5fd5599918052