Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMQ-7211 STOMP: Do not clear pending ACKs on COMMIT #359

Closed
wants to merge 2 commits into from
Closed

AMQ-7211 STOMP: Do not clear pending ACKs on COMMIT #359

wants to merge 2 commits into from

Conversation

borgstrom
Copy link

@borgstrom borgstrom commented May 21, 2019

This PR first adds a test that proves the behavior is broken: c3729b0

When you run this test against master it will fail:

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.activemq.transport.stomp.Stomp12Test
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 62.016 sec <<< FAILURE! - in org.apache.activemq.transport.stomp.Stomp12Test
testClientIndividualAckPrefetchTransaction(org.apache.activemq.transport.stomp.Stomp12Test)  Time elapsed: 61.822 sec  <<< FAILURE!
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.activemq.transport.stomp.Stomp12Test.testClientIndividualAckPrefetchTransaction(Stomp12Test.java:413)


Results :

Failed tests: 
  Stomp12Test.testClientIndividualAckPrefetchTransaction:413 null

Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

The test output shows that it only took 4 messages to trigger this condition:

2019-05-20 21:54:48,710 [me-limited test] - INFO  Stomp12Test                    - Broker sent: MESSAGE
expires:0
destination:/queue/org.apache.activemq.transport.stomp.Stomp12Test.testClientIndividualAckPrefetchTransaction
JMSXUserID:system
ack:ID:eborgstrom-NW3081-55336-1558414488397-6:5
subscription:1
priority:4
message-id:ID:eborgstrom-NW3081-55336-1558414488397-5:1:-1:1:5
timestamp:1558414488648

4
2019-05-20 21:54:48,711 [0.1:55337@55324] - WARN  ProtocolConverter              - Exception occurred processing: ACK -> org.apache.activemq.transport.stomp.ProtocolException: Unexpected ACK received for message-id [null]
2019-05-20 21:54:48,712 [0.1:55337@55324] - DEBUG ProtocolConverter              - Exception detail
org.apache.activemq.transport.stomp.ProtocolException: Unexpected ACK received for message-id [null]
	at org.apache.activemq.transport.stomp.ProtocolConverter.onStompAck(ProtocolConverter.java:476)
	at org.apache.activemq.transport.stomp.ProtocolConverter.onStompCommand(ProtocolConverter.java:251)
	at org.apache.activemq.transport.stomp.StompTransportFilter.onCommand(StompTransportFilter.java:85)
	at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
	at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:233)
	at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:215)
	at java.lang.Thread.run(Thread.java:748)
2019-05-20 21:54:48,716 [me-limited test] - INFO  Stomp12Test                    - Broker sent: ERROR
message:Unexpected ACK received for message-id [null]
content-type:text/plain

org.apache.activemq.transport.stomp.ProtocolE...d.java:748)

Which is the exception I reported in https://issues.apache.org/jira/browse/AMQ-7211

The fix for this (be19dca) is simply to NOT clear pending ACKs on transaction COMMIT & ABORT.

Looking at the ticket referenced in the commit that added the pedingAcks.clear() lines (52d95ee / https://issues.apache.org/jira/browse/AMQ-5423) I believe that changing the this.pedingAcks.get to this.pedingAcks.remove is the real solution to the memory leak that prompted the change and that clearing the acks at the end of a transaction is not needed and incorrect behavior.

CC @tabish121 -- since you commited 52d95ee and worked on AMQ-5423

@borgstrom
Copy link
Author

@tabish121 / @gtully / @cshannon -- Can I get someone to give this a review please?

@tabish121
Copy link
Contributor

I took a look at the PR and reviewed the STOMP code a bit. The fix isn't really complete and in review I've noticed a couple other oddities around transactions and the pending Acks tracking which also needs fixed.

@borgstrom
Copy link
Author

Thanks for looking, @tabish121.

The fix isn't really complete

Can you elaborate on what you mean by this? Do you mean that you'd like me to do more around the fix, and if so can you provide some guidance? Or do you mean that you are planning on doing some work on making the fix "complete"?

For me, the biggest thing is that the unit test I wrote here covers our use case and I would like to see it included in the tests to ensure that we aren't subject to regressions in the future.

Thanks again!

@tabish121
Copy link
Contributor

Clearing the pending acks was intended to mitigate leaks in the ack ID tracking and by removing the clears the original issue is reintroduced. The clear isn't correct as noted here but the accounting of the ack Ids needs to be done such that we aren't leaking them after TX operations or client and client-individual acknowledgements.

There is a more serious issue here in that the ack Ids are lost when a Transaction is aborted and so the client cannot again ACK a message or NACK it for that matter as the entries for some or all the TX enlisted messages are lost (depending on ACK mode).

I am working on a fix that should mitigate all these issues and adding additional tests to cover this

@tabish121
Copy link
Contributor

This issue along with loss of ack ids after TX abort is fixed in 063d24e

@tabish121 tabish121 closed this May 29, 2019
@borgstrom borgstrom deleted the stomp-individual-ack-transaction branch May 29, 2019 23:13
@borgstrom
Copy link
Author

Thank-you very much for the comprehensive fix & accompanying test cases, @tabish121!

@tabish121
Copy link
Contributor

I have not merged to the 5.15.x branch as yet, I'd appreciate if you could test the change on a local build with your application and see if the fix holds up to that use case.

@borgstrom
Copy link
Author

 INFO | Apache ActiveMQ 5.16.0-SNAPSHOT (localhost, ID:eborgstrom-NW3081-51785-1559173167453-0:1) started

Just sent 1000 messages through with transactions enabled and everything looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants