[QUERY] Can excessive memory usage by reactor.util.concurrent.SpscArrayQueue while consuming from an EventHub be controlled? Or is this a bug? #39386

Closed
ChrisCollinsIBM opened this issue Mar 25, 2024 · 4 comments


ChrisCollinsIBM commented Mar 25, 2024

Query/Question
While consuming from an Event Hub, we're seeing an exceptionally large amount of memory in use on the Java heap.

In the example below there are PartitionEvent objects containing EventData, whose AmqpMessageBody bodies are roughly 148,000 bytes each (each body itself bundles multiple messages, which is why a single message is so large).

Each ArrayList contains about 500 of these events (and could hold 549 based on its instantiated size), and the SpscArrayQueue can hold up to 512 of these ArrayList objects.

Class Name                                                                                      | Shallow Heap | Retained Heap | Percentage
--------------------------------------------------------------------------------------------------------------------------------------------
reactor.util.concurrent.SpscArrayQueue @ 0x603b21700                                            |          400 | 8,433,524,592 |     21.14%
'- array java.lang.Object[512] @ 0x6052d6dd0                                                    |        2,064 | 8,433,524,192 |     21.14%
   '- java.util.ArrayList @ 0x6e97f36e0                                                         |           32 |    50,389,456 |      0.13%
      '- elementData java.lang.Object[549] @ 0x947f6cdb0                                        |        2,208 |    50,389,424 |      0.13%
         '- com.azure.messaging.eventhubs.models.PartitionEvent @ 0x6ef385a80                   |           32 |       148,496 |      0.00%
            '- eventData com.azure.messaging.eventhubs.EventData @ 0x6ef386b80                  |           32 |       148,432 |      0.00%
               '- annotatedMessage com.azure.core.amqp.models.AmqpAnnotatedMessage @ 0x6ef388d30|           48 |       148,224 |      0.00%
                  |- amqpMessageBody com.azure.core.amqp.models.AmqpMessageBody @ 0x6ef38ca50   |           32 |       147,696 |      0.00%
                  |- messageAnnotations java.util.HashMap @ 0x6ef38caa0                         |           48 |           288 |      0.00%
                  |- properties com.azure.core.amqp.models.AmqpMessageProperties @ 0x6ef38cb20  |           64 |            64 |      0.00%
                  |- deliveryAnnotations java.util.HashMap @ 0x6ef38ca70                        |           48 |            48 |      0.00%
                  |- footer java.util.HashMap @ 0x6ef38cad0                                     |           48 |            48 |      0.00%
                  '- header com.azure.core.amqp.models.AmqpMessageHeader @ 0x6ef38cb00          |           32 |            32 |      0.00%
--------------------------------------------------------------------------------------------------------------------------------------------

So we're trying to understand why there are over 56,000 messages in memory (8,433,524,592 bytes of retained heap / 148,496 bytes per sample message) when the default cache (100) and prefetch (300) values are being used with a batch size of 999.

There also appears to be one of these queues for each partition-pump-x-x thread, which scales the memory usage up massively when consuming from multiple partitions or Event Hubs. A sketch of the kind of setup in play follows below.
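
For context, here is a minimal sketch of the kind of processor setup in play (not our exact code; the connection strings, names, and handler body are placeholders). The batch size is the maxBatchSize argument to processEventBatch, and prefetch, if overridden at all, would be set on the builder:

import com.azure.messaging.eventhubs.EventProcessorClient;
import com.azure.messaging.eventhubs.EventProcessorClientBuilder;
import com.azure.messaging.eventhubs.checkpointstore.blob.BlobCheckpointStore;
import com.azure.storage.blob.BlobContainerAsyncClient;
import com.azure.storage.blob.BlobContainerClientBuilder;

import java.time.Duration;

public class ProcessorSketch {
    public static void main(String[] args) {
        // Placeholder connection details.
        String eventHubConnectionString = "<event-hub-connection-string>";
        String storageConnectionString = "<storage-connection-string>";

        BlobContainerAsyncClient checkpointContainer = new BlobContainerClientBuilder()
                .connectionString(storageConnectionString)
                .containerName("<checkpoint-container>")
                .buildAsyncClient();

        EventProcessorClient processor = new EventProcessorClientBuilder()
                .connectionString(eventHubConnectionString, "<event-hub-name>")
                .consumerGroup("$Default")
                .checkpointStore(new BlobCheckpointStore(checkpointContainer))
                // Batch size (999 in our case) is the maxBatchSize argument below.
                .processEventBatch(batchContext -> {
                    // actual processing logic omitted
                    batchContext.updateCheckpoint();
                }, 999, Duration.ofSeconds(30))
                .processError(errorContext ->
                        System.err.println("Error: " + errorContext.getThrowable()))
                // Prefetch defaults to 300; uncomment to override it if the builder
                // in your SDK version exposes this setter.
                //.prefetchCount(300)
                .buildEventProcessorClient();

        processor.start();
    }
}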

Why is this not a Bug or a feature Request?
Before filing this as a bug, we want to make sure there aren't existing settings we could use to control this behavior.

Setup (please complete the following information if applicable):

  • OS: RHEL7 / Java 8.0.7.20
  • Library/Libraries:
    com.azure:azure-core:1.28.0
    com.azure:azure-core-amqp:2.0.5.1
    com.azure:azure-messaging-eventhubs:5.11.2
    com.azure:azure-messaging-eventhubs-checkpointstore-blob:1.12.2
    com.azure:azure-storage-blob:12.21.1
    com.azure:azure-storage-common:12.16.1

Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • [ X ] Query Added
  • [ X ] Setup information Added
github-actions bot added the customer-reported, needs-triage, and question labels on Mar 25, 2024
@anuchandy (Member)

Hi @ChrisCollinsIBM, we optimized the memory allocation of the processor in version 5.18.0 (see azure-sdk-for-java/sdk/eventhubs/azure-messaging-eventhubs/CHANGELOG.md at main · Azure/azure-sdk-for-java (github.com)); this should lower the memory usage that you are seeing.

Each partition has a dedicated connection (link) to the service, and an in-memory queue exists for each partition receive. Each partition receive is managed by an instance of partition-pump-x-x, so, as you observed, the allocation adds up based on the number of partition receives hosted on one machine.
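
As a rough illustration only, the buffering scales linearly with the number of partition receives on the host. In the sketch below, the per-queue retained size is the ~8.4 GB figure from your heap dump and the partition count is a made-up example:

public class PerPartitionBufferEstimate {
    public static void main(String[] args) {
        // Observed in the heap dump above: one SpscArrayQueue retaining ~8.4 GB.
        long retainedBytesPerPartitionQueue = 8_433_524_592L;

        // Hypothetical example: partition receives hosted on this machine.
        int partitionReceivesOnHost = 4;

        long estimatedBytes = retainedBytesPerPartitionQueue * partitionReceivesOnHost;
        System.out.printf("Rough buffered memory across partition receives: %.1f GB%n",
                estimatedBytes / (1024.0 * 1024.0 * 1024.0));
    }
}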

@ChrisCollinsIBM (Author)

Thanks for the prompt response, @anuchandy. I did find #38572 when digging into the Event Hubs messaging history, so I presume that's the fix you're referring to.

Since we're using the prefetch default (300), what would you suggest as a reasonable batch size? I see 100 tossed around in many discussions, based on a 3:1 prefetch:batch ratio, but maybe that was cache-related. In other places I see a batch size of 10. Some guidance on this would be great, thanks!

github-actions bot removed the needs-triage label on Mar 26, 2024
@anuchandy (Member)

Hello @ChrisCollinsIBM, sorry for the late response. @conniey and I discussed this. We don't have a one-size-fits-all recommendation for tuning prefetch and batch size for optimal memory; there is also a third variable, the expected event size. Our suggestion is to run the application (with the actual event-processing logic) and tune these values to achieve the expected throughput. While doing this exercise, identify an appropriate value for the max heap size (-Xmx). The idea is: once the application reaches a steady state with the expected throughput, force a full GC using a tool such as JConsole and check how much memory is occupied afterwards. You want to size the heap such that only ~30% is occupied after a full GC; use this value to set the max heap size (-Xmx). Then size the host (e.g., container) memory to have an additional ~1 GB for the non-heap needs of the JVM instance.
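
For illustration, a small sketch of that sizing arithmetic; the post-full-GC live-set value is a placeholder you would measure yourself (e.g., with JConsole):

public class HeapSizingSketch {
    public static void main(String[] args) {
        // Placeholder: heap occupied right after forcing a full GC at steady state.
        double liveSetAfterFullGcGb = 3.0;

        // Target: the live set should be only ~30% of the heap after a full GC.
        double targetOccupancyAfterFullGc = 0.30;
        double recommendedXmxGb = liveSetAfterFullGcGb / targetOccupancyAfterFullGc;

        // Host/container memory: the heap plus roughly 1 GB for the JVM's non-heap needs.
        double recommendedHostMemoryGb = recommendedXmxGb + 1.0;

        System.out.printf("-Xmx ~= %.1f GB, host memory ~= %.1f GB%n",
                recommendedXmxGb, recommendedHostMemoryGb);
    }
}

For example, a ~3 GB live set after a full GC would suggest roughly -Xmx10g and about 11 GB of host memory.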

anuchandy self-assigned this on Apr 29, 2024
@anuchandy (Member)

Closing this; please refer to the previous comment.
