
ARTEMIS-6009 Performance improvement when consuming large messages#6369

Merged
jbertram merged 2 commits into apache:main from AntonRoskvist:ARTEMIS-6009
Apr 21, 2026

Conversation

@AntonRoskvist
Contributor

The current solution reads the message payload using Java's default OutputStream implementation, which iterates over the given byte array and calls the single-byte write method once per byte.

This change instead passes the byte array as-is through the ActiveMQOutputStream associated with the large message, into its ActiveMQBuffer.
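A minimal sketch of the pattern described above, with hypothetical names (in the real change the stream is ActiveMQOutputStream and the sink is an ActiveMQBuffer; here a plain ByteBuffer stands in for the buffer):

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Sketch only: ChunkForwardingOutputStream and its ByteBuffer sink are
// stand-ins, not Artemis classes. The point is the bulk override below.
class ChunkForwardingOutputStream extends OutputStream {
    private final ByteBuffer buffer; // stand-in for the large message's ActiveMQBuffer

    ChunkForwardingOutputStream(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    @Override
    public void write(int b) {
        buffer.put((byte) b); // single-byte fallback path, still supported
    }

    // The fix: forward the whole chunk in one call, analogous to
    // ActiveMQBuffer#writeBytes(byte[], int, int), instead of letting
    // OutputStream's default implementation call write(int) per byte.
    @Override
    public void write(byte[] b, int off, int len) {
        buffer.put(b, off, len);
    }

    byte[] toByteArray() {
        byte[] out = new byte[buffer.position()];
        buffer.flip();
        buffer.get(out);
        return out;
    }
}
```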

I'm seeing a "real world" performance improvement of about 170% for a client handling exclusively large messages.

I'm not quite sure how I go about writing a meaningful test for this, any feedback on that would be greatly appreciated.

@jbertram
Contributor

jbertram commented Apr 19, 2026

Looking at the implementation of org.apache.activemq.artemis.api.core.ActiveMQBuffer#writeBytes(byte[], int, int) versus java.io.OutputStream#write(byte[], int, int) I can see why the former would be faster (i.e. since it's writing in chunks instead of each individual byte).
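The per-byte behavior described above is easy to see in isolation. The stand-alone class below (not Artemis code) overrides only write(int); the inherited java.io.OutputStream#write(byte[], int, int) then falls back to calling it once per byte:

```java
import java.io.IOException;
import java.io.OutputStream;

// Only write(int) is overridden, so the write(byte[], int, int)
// inherited from java.io.OutputStream calls it once per byte --
// exactly the behavior this PR avoids on the large-message path.
class PerByteCountingStream extends OutputStream {
    int singleByteCalls = 0;

    @Override
    public void write(int b) throws IOException {
        singleByteCalls++;
    }
}
```

Writing an 8 KiB chunk through such a stream produces 8192 separate write(int) calls, which is why a bulk writeBytes-style override pays off for large payloads.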

It might be worth having a JMH test (see tests/performance-jmh). If you put the test into its own commit, folks can cherry-pick it, run it on branches with different implementations, and compare the results. It would be great if you could summarize the results here, if possible.

Aside from that, regression tests would probably suffice.

@AntonRoskvist
Contributor Author

Thanks @jbertram,

I've never worked with JMH previously so that might take some time to get in place...

In the meantime, I have some additional figures from what I'm seeing when testing this change:

In a "real world" scenario, i.e. broker and client running on dedicated servers, communicating over a network:
Without PR: Client can process an average of 673 msgs/s, running on 100% CPU.
With PR: Client can process an average of 2200 msgs/s, using ~50% CPU.
(current bottleneck there is defined limit on network utilization from the cloud provider)

I've also set up and run a test locally, using a standalone broker (2.53.0), default configuration, with 300k messages preloaded like this:
bin/artemis producer --message-count 60000 --text-size 600000 --destination queue://LARGE.MSG.LOAD --threads 5 --url 'tcp://localhost:61616?compressLargeMessage=true'

Messages are then consumed by a "cli consumer" using either broker release 2.53.0 or a version built on top of this PR.

Messages in this test are compressed to save on storage space.

This is the command used to consume messages:
bin/artemis consumer --message-count 60000 --destination queue://LARGE.MSG.LOAD --threads 5

Results:
Without PR: Consumer finishes in 1316 seconds, averaging 228 msgs/s
With PR: Consumer finishes in 79 seconds, averaging 3797 msgs/s

I collected flame graphs from the local tests which I have added here:
org_flamegraph.html
pr_flamegraph.html

@tabish121
Contributor

The change makes sense and maps to how this is normally handled in other areas of the broker code as well. You would normally override those built-in stream methods, since the default implementation simply calls the single-byte write method in a loop, which is quite inefficient.

@AntonRoskvist
Contributor Author

Also, when I said: "I'm not quite sure how I go about writing a meaningful test for this, any feedback on that would be greatly appreciated."

I meant that I'm not quite sure how to go about writing a regression test to validate the new behavior... at least not without relaxing access to the ActiveMQOutputStream and adding some absolute spaghetti around it. I also tried using Mockito but gave up, as I simply could not get it to work properly.

I'll keep trying, but if anyone has an idea, it's probably better than what I'm currently piecing together.

@jbertram
Contributor

@AntonRoskvist those results are compelling! Nice work.

Previously when I said, "...regression tests would probably suffice," I meant that existing tests would probably suffice for detecting regressions.

Ultimately, if you provide a way for folks to independently verify the performance improvement and the existing test-suite is green then I think that's sufficient.

@tabish121
Contributor

I ran this through CI and all tests are passing. The commit message does not contain the related JIRA which needs to be fixed.

@AntonRoskvist changed the title from "Performance improvement when consuming large messages" to "ARTEMIS-6009 Performance improvement when consuming large messages" on Apr 21, 2026
@AntonRoskvist
Contributor Author

@jbertram @tabish121 thanks!

I've been unable to put together a decent JMH test for this, so I instead added a very simple test under "soak-tests" in a secondary commit. I'm very open to excluding that unless you feel it adds some value.

If nothing else it should serve as a simple way to try this out for yourselves.

@jbertram
Contributor

@AntonRoskvist I used your test on main and I got around 39s and on your branch it was around 9s. Given the test-suite is green I'm merging this. Thanks!

@jbertram jbertram merged commit ae17380 into apache:main Apr 21, 2026
6 checks passed
@clebertsuconic
Contributor

@AntonRoskvist / @jbertram I'm replacing the soak test with a MockedTest.

The soak test was taking 3 minutes on my laptop and 2 minutes on the CI.

The MockedTest I wrote validates that write(byte[], int, int) was used.
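The verification idea can be sketched without a mocking framework: a recording stream counts which overload was invoked, so an assertion can confirm the bulk path is taken. This is a stand-in illustration, not the actual MockedTest from the replacement PR:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hand-rolled stand-in for the mocked stream: records which write
// overload is called so a test can assert the bulk path was used.
// The actual replacement test uses a mocking framework for this check.
class RecordingOutputStream extends OutputStream {
    int bulkWrites = 0;
    int singleByteWrites = 0;

    @Override
    public void write(int b) throws IOException {
        singleByteWrites++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        bulkWrites++;
    }
}
```

With the PR's change in place, writing a payload through the stream should register one bulk write and zero single-byte writes.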

@clebertsuconic
Contributor

PR sent here to replace test: #6385

