ARTEMIS-1999 Broker uses 100% core's CPU time if msg grouping is used #2203

franz1981 · 2018-07-31T12:39:21Z

The deliver loop won't give up trying to deliver messages when
back-pressure kicks in (credits and/or TCP) if msg grouping is used and
there are many consumers registered: this change will allow the loop
to exit by instructing the logic that the group consumer is the only
consumer to check.

franz1981 · 2018-07-31T12:40:22Z

@clebertsuconic Please take a look if the change seems to break any other logic and @michaelandrepearce if exclusive consumers should be affected by a similar issue (IMO it shouldn't be the case)

clebertsuconic · 2018-07-31T13:10:41Z

Would be possible a test? without a test I don't know how to validate the change TBH.

At least were you able to run the whole testsuite.

franz1981 · 2018-07-31T13:12:35Z

I have already run the entire test suite that is already filled of AMQP and CORE JMS message group tests AFAIK.
About the performance implications (not anymore 100% CPU) I don't know what could I write to validate it with a test...

clebertsuconic · 2018-07-31T14:23:38Z

@franz1981 make a test that will exercise the loop. Try to get a synchronize on Queue...

if the test hangs.. it's a bug... use a timeout tag on the test.

michaelandrepearce · 2018-07-31T14:38:20Z

@franz1981 i exclusive yes i think would have similar issue, after all it followed the same logic of message groups in part. It would be easy to fix, as in the same place / if statement just check for the exclusive flag. (best way to confirm this would be to extend the test case @clebertsuconic is suggesting to do the same with exclusive)

Btw your fix assumes a message group has been assigned already, what if all consumers are busy and a message group isnt assigned? it would still spin i assume.

I would probably change the check you are doing to see if groupID is set or not.

michaelandrepearce · 2018-07-31T14:47:52Z

artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/QueueImpl.java

@@ -2370,10 +2370,10 @@ private void deliver() {
               }
            }

-            if (pos == endPos) {
-               // Round robin'd all
+            if (pos == endPos || groupConsumer != null) {


groupConsumer is only set if a msg group is already assigned, or has succesfully handled. i would change this to check if groupID is not null.

franz1981 · 2018-07-31T14:53:30Z

exclusive yes i think would have similar issue, after all it followed the same logic of message groups in part. It would be easy to fix, as in the same place / if statement just check for the exclusive flag.

Nice, I suppose that would be better to raise a different issue/PR for that even if I'm tempted to do it fro this one: it is indeed fixing a similar but different issue.
I'm anyway opened to do it differently too 👍

what if all consumers are busy and a message group isnt assigned? it would still spin i assume.

if the message group isn't assigned it will round robin between the consumers until noDeliver == size: in that case it will stop spinning without burning any CPU, because deliverAsync won't be called anymore.
The reason why it won't stop spinning if groupConsumer is set is that the consumer round robin isn't happening in that case and it won't hit the logic that will stop the loop.

michaelandrepearce · 2018-07-31T15:11:41Z

@franz1981 once you do fix for this with test and this is merged, ill fix exclusive quickly, as then i can just rip your hard work :P :P :P

btw how you get a clean PR build ? without the damn MultiThreadAsynchronousFileTest failing...like it has been for most PR's recently?

franz1981 · 2018-07-31T15:21:32Z

@michaelandrepearce ahah fine!

btw how you get a clean PR build ? without the damn MultiThreadAsynchronousFileTest failing...like it has been for most PR's recently?

That's a nice question indeed...it isn't an intermittent failure? Probably I've been lucky...or unlucky depends on the point of views :P

clebertsuconic · 2018-07-31T16:09:27Z

@michaelandrepearce would be too much of a hack to add a -Ptravis profile and add a property to ignore those tests?

Those tests run at least daily on my CI and they never fail.

It's probably a question of limited resources or the use of hypervisors on a public cloud. We don't actually need those to validate PR buidls.

clebertsuconic · 2018-07-31T16:14:03Z

@franz1981 so are you adding a test?

michaelandrepearce · 2018-07-31T16:24:53Z

@clebertsuconic before i went on holiday it didnt seem to error as much as it is now, i wonder if some recent merge has destabilized the build and in particular this test case? I would worry about ignoring it, as its a concurrency test so it maybe actually highlighting an issue thats been introduced by some recent change / merge. Has there been any changes around the journals?

clebertsuconic · 2018-07-31T16:28:13Z

@michaelandrepearce there are no changes around the journal...

We already have a profile that will ignore a lot of tests.. and anyone running the full testsuite would still be able to capture regressions.

I will take a look on the failures, but if this is indeed environmental we should move them to the ignore-list on the -Pfast-tests

clebertsuconic · 2018-08-01T01:43:46Z

@franz1981 please watch the examples. On my private CI an example failed on your branch.

franz1981 · 2018-08-01T07:30:38Z

@clebertsuconic I have tried both message-group and message-group2 examples and they are working: which examples are not working for you?
I have rebased and run all the examples and now it seems to pass: i suppose that the failures wasn't related to the change I did.
I have added a test (that's quite strict I would say) to test it too: let me know if it seems ok 👍

clebertsuconic · 2018-08-03T16:00:22Z

@franz1981 can you rebase here? what to do with this?

clebertsuconic · 2018-08-03T18:16:33Z

just one nit-pick.. this test should be on QueueImplTest, away from timing... (integration-tests)

can you move it?

The deliver loop won't give up trying to deliver messages when back-pressure kicks in (credits and/or TCP) if msg grouping is used and there are many consumers registered: this change will allow the loop to exit by instructing the logic that the group consumer is the only consumer to check.

clebertsuconic · 2018-08-08T19:48:49Z

@franz1981 I have moved this into 2.6.x but I had to manually make the changes.. can you check 2.6.x please?

michaelandrepearce reviewed Jul 31, 2018

View reviewed changes

franz1981 force-pushed the ARTEMIS-1999 branch 2 times, most recently from 149dd7f to 226d12f Compare August 1, 2018 08:21

michaelandrepearce mentioned this pull request Aug 2, 2018

ARTEMIS-856 Test fixes - Alternative #2211

Merged

franz1981 force-pushed the ARTEMIS-1999 branch from 226d12f to 3105b09 Compare August 3, 2018 16:10

franz1981 force-pushed the ARTEMIS-1999 branch from 3105b09 to 755c6e2 Compare August 4, 2018 07:45

asfgit closed this in 3d2eadf Aug 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARTEMIS-1999 Broker uses 100% core's CPU time if msg grouping is used #2203

ARTEMIS-1999 Broker uses 100% core's CPU time if msg grouping is used #2203

franz1981 commented Jul 31, 2018

franz1981 commented Jul 31, 2018

clebertsuconic commented Jul 31, 2018

franz1981 commented Jul 31, 2018

clebertsuconic commented Jul 31, 2018

michaelandrepearce commented Jul 31, 2018 •

edited

Loading

michaelandrepearce Jul 31, 2018

franz1981 commented Jul 31, 2018 •

edited

Loading

michaelandrepearce commented Jul 31, 2018 •

edited

Loading

franz1981 commented Jul 31, 2018 •

edited

Loading

clebertsuconic commented Jul 31, 2018 •

edited

Loading

clebertsuconic commented Jul 31, 2018

michaelandrepearce commented Jul 31, 2018 •

edited

Loading

clebertsuconic commented Jul 31, 2018

clebertsuconic commented Aug 1, 2018

franz1981 commented Aug 1, 2018 •

edited

Loading

clebertsuconic commented Aug 3, 2018

clebertsuconic commented Aug 3, 2018

clebertsuconic commented Aug 8, 2018

ARTEMIS-1999 Broker uses 100% core's CPU time if msg grouping is used #2203

ARTEMIS-1999 Broker uses 100% core's CPU time if msg grouping is used #2203

Conversation

franz1981 commented Jul 31, 2018

franz1981 commented Jul 31, 2018

clebertsuconic commented Jul 31, 2018

franz1981 commented Jul 31, 2018

clebertsuconic commented Jul 31, 2018

michaelandrepearce commented Jul 31, 2018 • edited Loading

michaelandrepearce Jul 31, 2018

Choose a reason for hiding this comment

franz1981 commented Jul 31, 2018 • edited Loading

michaelandrepearce commented Jul 31, 2018 • edited Loading

franz1981 commented Jul 31, 2018 • edited Loading

clebertsuconic commented Jul 31, 2018 • edited Loading

clebertsuconic commented Jul 31, 2018

michaelandrepearce commented Jul 31, 2018 • edited Loading

clebertsuconic commented Jul 31, 2018

clebertsuconic commented Aug 1, 2018

franz1981 commented Aug 1, 2018 • edited Loading

clebertsuconic commented Aug 3, 2018

clebertsuconic commented Aug 3, 2018

clebertsuconic commented Aug 8, 2018

michaelandrepearce commented Jul 31, 2018 •

edited

Loading

franz1981 commented Jul 31, 2018 •

edited

Loading

michaelandrepearce commented Jul 31, 2018 •

edited

Loading

franz1981 commented Jul 31, 2018 •

edited

Loading

clebertsuconic commented Jul 31, 2018 •

edited

Loading

michaelandrepearce commented Jul 31, 2018 •

edited

Loading

franz1981 commented Aug 1, 2018 •

edited

Loading