Skip to content

Conversation

@lenoch7
Copy link

@lenoch7 lenoch7 commented Nov 26, 2025

See full description in https://issues.apache.org/jira/browse/AMQ-9813

Simply:
It seems like 8b23e07 breaks previous contract of expireOldMessages() methods, which can caused that expired messages are not "processed", when expiration is triggered by tryAddMessageLast(MessageReference, long), addMessageFirst(MessageReference) methods.

@lenoch7 lenoch7 force-pushed the AMQ-9813_IncorrectQueueSizeNonPersistentMessageWithTTL branch from bcd0024 to bcc27b3 Compare November 26, 2025 08:25
- add "missing" invocation of discardExpiredMessage() method
  into tryAddMessageLast(), addMessageFirst(), probably caused
  in context of AMQ-5785
- use same "postponed" strategy (outside of synchronization) like was
  already done in original commit (see onUsageChanged() method)
@lenoch7 lenoch7 force-pushed the AMQ-9813_IncorrectQueueSizeNonPersistentMessageWithTTL branch from bcc27b3 to 5c5eb33 Compare November 26, 2025 10:18
@cshannon cshannon self-requested a review November 26, 2025 14:51
Copy link
Contributor

@cshannon cshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks pretty good to me, thanks for the PR. The discard callback definitely is getting missed in those methods and there's even a possible deadlock as well when discarding a single message still so this is a good catch and fix.

I had a few minor tweaks/suggestions mostly involved around avoiding extra creation of array lists when we don't need to. It was easier to just push up a follow on commit here: cshannon@ec20d0b

You can either apply the changes to your branch or I can apply them after. I think this can be merged (either with my changes applied now or later) assuming the CI tests all pass.

@cshannon cshannon self-requested a review November 26, 2025 16:49
Copy link
Contributor

@cshannon cshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a comment in line as well but I realized that this updated fix will now slightly increase the change of OOM (i say slighly as we are adding a single message) so I need to think through if there's something we can do to avoid that with how it's current written.

I suppose it may not be that big of a deal with a single message being added but it would be nice to fix it. One thing that could be done is to not decrement the references for the expired messages until after we process the discard loop, but in that case the hasSpace() second call after expiration becomes pretty pointless as the point of that check is to do it after memory was possibly freed up from the expiration.

if (!hasSpace()) {
if (isDiskListEmpty()) {
expireOldMessages();
expiredMessages = expireOldMessages();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I just realized that this is actually could be issue because the expireOldMessages() method is going to decrement the reference on each message, but the messages get copied to that temporary list. The problem is that now that we are hanging onto the list reference (to process later outside of the lock) we are more likely to blow memory because the hasSpace() method will think there is free space but those messages are still held in the heap until later. We are only adding one message but if that message happens to be very large it could be an issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for very quick feedback and analyze. I think, I got the point, which you mean. But unfortunately I have no idea, how to change/improve it, except the way you already mentioned - put new message to disk (temp storage) immediately without attempt to "dispose" expired messages before switch to disk (if I really got the point).

Maybe it will be better to close this PR without merge to give you more comfortable way to make fix/changes/improvements. I am happy, you got the problem, which I tried to report.

if (!hasSpace()) {
if (isDiskListEmpty()) {
expireOldMessages();
expiredMessages.addAll(expireOldMessages());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in addMessageFirst

@lenoch7
Copy link
Author

lenoch7 commented Nov 27, 2025

I am sorry for of topic, but...
I noticed, that one of build - Build 2 (https://ci-builds.apache.org/job/ActiveMQ/job/ActiveMQ/job/PR-1551/2/cloudbees-pipeline-explorer/) failed (org.apache.activemq.web.RestTest.testConsumeAsyncTimeout). I don't believe that it is caused by this PR (change). I guess (maybe), the errorResult async request was received as first request (before result request), regardless on "position" in the test code (because they are async request). "Stupid" sleep between these two async requests in the test? But it is different story, I am sorry again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants