[SOL-79061] Application spins into an infinite loop throwing com.solacesystems.jcsmp.StaleSessionException when reconnectRetries is set to a +ve number #179

Closed
gvensan opened this issue Oct 19, 2022 · 6 comments
Labels: enhancement (New feature or request), tracked (Internally tracked by Solace's internal issue tracking system)

Comments

@gvensan

gvensan commented Oct 19, 2022

Please refer to the community post here.

With reconnectRetries set to a positive number, once the reconnect attempts are exhausted, the application spins, throwing the following exception in a loop. Even after the broker is back up, the application never reconnects.

The exception thrown in the loop is:

com.solacesystems.jcsmp.StaleSessionException: Tried to call receive on a stopped message consumer.
	at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.throwClosedException(FlowHandleImpl.java:2040) ~[sol-jcsmp-10.13.1.jar:na]
	at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:918) ~[sol-jcsmp-10.13.1.jar:na]
	at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:885) ~[sol-jcsmp-10.13.1.jar:na]
	at com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer.receive(FlowReceiverContainer.java:293) ~[spring-cloud-stream-binder-solace-core-3.3.2.jar:na]
	at com.solace.spring.cloud.stream.binder.util.FlowReceiverContainer.receive(FlowReceiverContainer.java:225) ~[spring-cloud-stream-binder-solace-core-3.3.2.jar:na]
	at com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener.receive(InboundXMLMessageListener.java:114) [spring-cloud-stream-binder-solace-core-3.3.2.jar:na]
	at com.solace.spring.cloud.stream.binder.inbound.InboundXMLMessageListener.run(InboundXMLMessageListener.java:91) [spring-cloud-stream-binder-solace-core-3.3.2.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_321]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_321]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_321]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_321]
	at java.lang.Thread.run(Thread.java:750) [na:1.8.0_321]
Caused by: com.solacesystems.jcsmp.JCSMPTransportException: (JCSMPTransportException) Error receiving data from underlying connection.
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel$ClientChannelReconnect.call(TcpClientChannel.java:2439) ~[sol-jcsmp-10.13.1.jar:na]
	... 4 common frames omitted
Caused by: com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Message VPN Unavailable
	at com.solacesystems.jcsmp.protocol.impl.TcpChannel.executePostOnce(TcpChannel.java:232) ~[sol-jcsmp-10.13.1.jar:na]
	at com.solacesystems.jcsmp.protocol.impl.ChannelOpStrategyClient.performOpen(ChannelOpStrategyClient.java:97) ~[sol-jcsmp-10.13.1.jar:na]
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.performOpenSingle(TcpClientChannel.java:418) ~[sol-jcsmp-10.13.1.jar:na]
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.access$800(TcpClientChannel.java:114) ~[sol-jcsmp-10.13.1.jar:na]
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel$ClientChannelReconnect.call(TcpClientChannel.java:2275) ~[sol-jcsmp-10.13.1.jar:na]
	... 4 common frames omitted
@Nephery
Collaborator

Nephery commented Oct 19, 2022

Set reconnect-retries=-1 to retry forever. If not, then once all reconnect attempts have been exhausted, the health of the binder goes DOWN and the user will need to restart their application to recover. This is expected behavior.
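
To make the two modes concrete, here is a minimal Java sketch of the retry semantics described above. It is a simplified model, not the binder's or JCSMP's actual code: the class `ReconnectModel`, the `Health` enum, and the `reconnect` method are hypothetical names introduced for illustration, and the wait between attempts is omitted.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical model of the reconnect-retries semantics; not Solace code. */
public class ReconnectModel {
    enum Health { UP, DOWN }

    /**
     * Calls tryConnect until it succeeds or the retry budget runs out.
     * reconnectRetries == -1 means retry forever; a value >= 0 bounds the attempts.
     */
    static Health reconnect(int reconnectRetries, BooleanSupplier tryConnect) {
        int attempts = 0;
        while (reconnectRetries == -1 || attempts < reconnectRetries) {
            attempts++;
            if (tryConnect.getAsBoolean()) {
                return Health.UP;
            }
        }
        // Budget exhausted: in this model nothing retries again until a restart.
        return Health.DOWN;
    }

    public static void main(String[] args) {
        // Simulated broker that only becomes reachable on the fifth attempt.
        int[] calls = {0};
        BooleanSupplier brokerBackOnFifthTry = () -> ++calls[0] >= 5;

        System.out.println(reconnect(3, brokerBackOnFifthTry));
        calls[0] = 0;
        System.out.println(reconnect(-1, brokerBackOnFifthTry));
    }
}
```

With a broker that only becomes reachable on the fifth attempt, a budget of 3 ends in DOWN, while -1 keeps trying and ends in UP.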

As for the logging, this issue is very similar to #174 with the key difference being that while this issue is about the session dying, #174 was specifically about flow receivers dying.

I'm not entirely sure whether the fix for #174 also covers this (it likely does), but we should leave this open to capture the use case.

Nephery added the bug and enhancement labels, then removed the bug label, on Oct 19, 2022.
Nephery changed the title from "Application spins into an infinite loop throwing com.solacesystems.jcsmp.StaleSessionException when reconnectRetries is set to a +ve number" to "[SOL-79061] Application spins into an infinite loop throwing com.solacesystems.jcsmp.StaleSessionException when reconnectRetries is set to a +ve number" on Oct 19, 2022.
Nephery added the tracked label on Oct 19, 2022.
@gvensan
Author

gvensan commented Oct 20, 2022

Hi @Nephery, thanks for looking into this. I agree that #174 is a related issue.

But the question is: with reconnectRetries set to 3 (say), when the reconnect is not successful after 3 attempts, what is the right behavior?
a) Application can exit/crash
b) Throw an exception

But why would it invoke or wake up to call receiveMessage on a stale connection? That is the root cause of infinite logging and is affecting fail-over scenarios. Is this a bug that will get addressed, or is any workaround available (configuration settings)?

Can you give a hint on what the resolution would look like?

@Nephery
Collaborator

Nephery commented Oct 20, 2022

> But the question is: with reconnectRetries set to 3 (say), when the reconnect is not successful after 3 attempts, what is the right behavior?
> a) Application can exit/crash
> b) Throw an exception

None of the above. Once all reconnect attempts have been exhausted, the health of the binder just goes DOWN.

> But why would it invoke or wake up to call receiveMessage on a stale connection? That is the root cause of infinite logging and is affecting fail-over scenarios. Is this a bug that will get addressed, or is any workaround available (configuration settings)?
> Can you give a hint on what the resolution would look like?

Handling this will be part of this resolution. Likely I'll just close the underlying consumer threads when a StaleSessionException is received. Since the session isn't able to recover in this scenario, there isn't any point keeping them alive.
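
As a rough illustration of that idea, the sketch below shows a polling loop that stops the first time it sees a stale-session error instead of calling receive again and logging the same exception forever. It is not the binder's implementation: `StaleAwareConsumerLoop`, the `Receiver` interface, and `runUntilStale` are hypothetical names, and the nested `StaleSessionException` is only a stand-in for `com.solacesystems.jcsmp.StaleSessionException` so the example compiles without JCSMP on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: end a polling loop on a fatal error instead of retrying forever. */
public class StaleAwareConsumerLoop {

    /** Stand-in for com.solacesystems.jcsmp.StaleSessionException (assumption for this sketch). */
    static class StaleSessionException extends Exception {
        StaleSessionException(String message) { super(message); }
    }

    /** Hypothetical receiver abstraction; returns a message, or null on timeout. */
    interface Receiver {
        String receive() throws StaleSessionException;
    }

    /**
     * Polls until the receiver reports a stale session or the thread is interrupted,
     * then returns the number of messages handled. The error is logged once.
     */
    static int runUntilStale(Receiver receiver) {
        int handled = 0;
        while (!Thread.currentThread().isInterrupted()) {
            try {
                String message = receiver.receive();
                if (message != null) {
                    handled++;
                }
            } catch (StaleSessionException e) {
                // The session cannot recover, so leave the loop rather than spin.
                System.err.println("Session is stale, stopping consumer: " + e.getMessage());
                break;
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        // Simulated receiver: two messages, then the session goes stale.
        AtomicInteger calls = new AtomicInteger();
        Receiver receiver = () -> {
            if (calls.incrementAndGet() > 2) {
                throw new StaleSessionException("Tried to call receive on a stopped message consumer.");
            }
            return "msg-" + calls.get();
        };
        System.out.println("handled=" + runUntilStale(receiver));
        System.out.println("receive calls=" + calls.get());
    }
}
```

In this simulation the loop handles two messages and exits on the third call, so the exception is reported exactly once.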

@gvensan
Author

gvensan commented Oct 20, 2022

Sounds good - that sounds logical. Any ETA for this fix?

@mackenza
Contributor

As for the ETA, we are trying to get this and #174 done in Q1 CY23.

@carolmorneau
Collaborator

Closed with #215


4 participants