[SOL-79060] Health indicator should capture binding health statuses #145

Closed
GreenRover opened this issue Jun 27, 2022 · 10 comments · Fixed by #237
Labels
enhancement New feature or request tracked Internally tracked by Solace's internal issue tracking system

Comments

@GreenRover
Contributor

Scenario:

  • Application starts up
  • A queue will be provisioned
  • Someone deletes the queue (by mistake)

Result:

2022-06-26T08:03:52.664+0000 ERROR Unable to send request message
org.springframework.messaging.MessagingException: Unable to send message to topic tms/monitoring/monalesy/p/v1/serviceState/request; nested exception is com.solacesystems.jcsmp.JCSMPTransportException: (JCSMPTransportException) Error receiving data from underlying connection.
    at com.solace.spring.cloud.stream.binder.util.ErrorChannelSendingCorrelationKey.send(ErrorChannelSendingCorrelationKey.java:57) ~[spring-cloud-stream-binder-solace-core-3.2.1.jar!/:?]
    at com.solace.spring.cloud.stream.binder.outbound.JCSMPOutboundMessageHandler.handleMessagingException(JCSMPOutboundMessageHandler.java:142) ~[spring-cloud-stream-binder-solace-core-3.2.1.jar!/:?]
    at com.solace.spring.cloud.stream.binder.outbound.JCSMPOutboundMessageHandler.handleMessage(JCSMPOutboundMessageHandler.java:98) ~[spring-cloud-stream-binder-solace-core-3.2.1.jar!/:?]
    at org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$SendingHandler.handleMessageInternal(AbstractMessageChannelBinder.java:1074) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]
    at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:56) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:317) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:272) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:222) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]
    at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:164) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]
    at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:144) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]

Expected:
The SolaceBinderHealthIndicator of the application changes to unhealthy.

Because my last 10 pull requests were not merged, I am only creating an issue.

@GreenRover
Contributor Author

Ping @Mrc0113

@Nephery
Collaborator

Nephery commented Aug 23, 2022

The solace binder health indicator currently only captures the health of the binder's PubSub+ session. It does not currently capture any binding health statuses since their flows were configured to try reconnecting forever. But it seems that some cases (like this one) aren't captured by the flow reconnect feature.

To add this, we'll need to do some digging to see if there's a good way to add binding statuses to the binder's health indicator (since it's not a composite health indicator), or to find out if and how other SCSt binders capture binding health statuses.

If there's no good way to add this to the existing indicator, we might have to add some custom config option to change the solace binder health indicator into a composite health indicator, which when enabled, would show the health statuses for both the binder's session as well as for all its bindings.
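
As a rough illustration of the composite idea described above, here is a minimal sketch in plain Java. It is not the binder's code and does not use Spring Boot Actuator types: the class `CompositeHealthSketch`, the `Status` enum, and the component names are all hypothetical. It only shows the aggregation rule that a composite indicator would need, namely that the overall status is DOWN as soon as the session or any binding is DOWN.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of a composite health check; hypothetical, not the binder's actual classes. */
final class CompositeHealthSketch {

    /** Only two statuses are modeled here; a real indicator may expose more. */
    enum Status { UP, DOWN }

    /** The overall status is DOWN if any component (session or binding) is DOWN. */
    static Status aggregate(Map<String, Status> components) {
        for (Status status : components.values()) {
            if (status == Status.DOWN) {
                return Status.DOWN;
            }
        }
        return Status.UP;
    }

    public static void main(String[] args) {
        Map<String, Status> components = new LinkedHashMap<>();
        components.put("session", Status.UP);          // the binder's PubSub+ session
        components.put("bindings/input", Status.DOWN); // hypothetical binding whose queue was deleted
        System.out.println(aggregate(components));     // prints DOWN
    }
}
```

With this rule, a healthy session no longer hides a broken binding, which is the behavior requested in this issue.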

@Nephery Nephery added the enhancement New feature or request label Aug 23, 2022
@Nephery Nephery changed the title HealthCheck: does not react on missing queues HealthCheck: does not capture binding health statuses Aug 23, 2022
@Nephery Nephery changed the title HealthCheck: does not capture binding health statuses Health indicator should capture binding health statuses Aug 23, 2022
@mackenza
Contributor

This has been logged in the Solace Jira and we are targeting Q1CY2023 for a fix.

@Nephery Nephery changed the title Health indicator should capture binding health statuses [SOL-79060] Health indicator should capture binding health statuses Sep 29, 2022
@Nephery Nephery added the tracked Internally tracked by Solace's internal issue tracking system label Sep 29, 2022
@Nephery
Collaborator

Nephery commented Oct 21, 2022

@GreenRover Just double checking, but did you post the correct stacktrace?

Looking again at the one you posted, this looks like an error on the producer side. But you shouldn't get this error by just deleting the queue. Deleting the queue should result in a different error, and it would be one on the consumer.

I was only able to reproduce a similar stacktrace by killing the session (e.g. session reconnect attempts exhausted). But the health for that is already captured by the existing health indicator (i.e. the PubSub+ session health).

@GreenRover
Contributor Author

I have now retested with:
<solace-spring-cloud.version>2.3.2
<spring-cloud.version>2021.0.4

A "deleted queue" gives the following log messages:
2022-10-25T07:33:10.381+0200 INFO Client-1: Error Response (503) - Service Unavailable; subCode: 50; flowId=7
2022-10-25T07:33:10.410+0200 INFO Client-1: Got BIND ('scst/wk/last_value/plain/last_value/state/_') Error Response (503) - Unknown Queue
But health is still "UP". --> Should be "DOWN"

Not connected but reconnecting:
Health is still "UP". (sub status: RECONNECTING) --> OK as expected
Sending via StreamBridge blocks for as long as status=RECONNECTING --> OK as expected

Not connected, reconnect attempts exhausted:
Health is "DOWN". (sub status: DOWN) --> OK as expected
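
The expectations above can be summarized as a small mapping from scenario to expected health status. This is only a sketch of what the test expects, not of how the binder computes health; the class `ExpectedHealthSketch` and the `Scenario` names are hypothetical.

```java
/** Encodes the expected health per scenario from the retest; all names are hypothetical. */
final class ExpectedHealthSketch {

    /** The scenarios covered by the retest, plus the normal connected case. */
    enum Scenario { CONNECTED, RECONNECTING, RECONNECTS_EXHAUSTED, QUEUE_DELETED }

    /** Returns the status that /actuator/health is expected to report. */
    static String expectedStatus(Scenario scenario) {
        switch (scenario) {
            case CONNECTED:
            case RECONNECTING:         // stays UP, reported with sub status RECONNECTING
                return "UP";
            case RECONNECTS_EXHAUSTED: // already reported as DOWN
            case QUEUE_DELETED:        // currently reported as UP, expected to be DOWN
                return "DOWN";
            default:
                throw new IllegalArgumentException("Unknown scenario: " + scenario);
        }
    }

    public static void main(String[] args) {
        for (Scenario scenario : Scenario.values()) {
            System.out.println(scenario + " -> " + expectedStatus(scenario));
        }
    }
}
```

Only the `QUEUE_DELETED` case differs from the observed behavior.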

org.springframework.messaging.MessagingException: Unable to send message to topic tms/monitoring/monalesy/p/v1/serviceState/request; nested exception is com.solacesystems.jcsmp.JCSMPTransportException: (JCSMPTransportException) Error receiving data from underlying connection.
I am not able to reproduce this at the moment. I have only seen it during a production issue that is not 100% understood.

@Nephery
Collaborator

Nephery commented Oct 26, 2022

Is the queue that you're deleting the one the input binding is consuming messages from? Or do you have a queue subscribed to the output binding destination, tms/monitoring/monalesy/p/v1/serviceState/request, and that is the queue you are deleting?

@GreenRover
Contributor Author

The queue I delete is the one from the input binding.

@GreenRover
Contributor Author

With release 2.5.0, when a queue is deleted manually, the service logs:

2023-05-05T10:46:20.725+0200 WARN Received error while trying to read message from endpoint scst/wk/sensor.XXX/plain/sensor/FOOO/_/_
com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
	at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:211) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl.handleAssuredCtrlMessage(SubFlowManagerImpl.java:570) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleAssuredCtrlMsg(TcpClientChannel.java:1768) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleMessage(TcpClientChannel.java:1733) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.processRead(SubscriberMessageReader.java:98) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.read(SubscriberMessageReader.java:140) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.smf.SimpleSmfClient.read(SimpleSmfClient.java:1206) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.processReactorChannels(SyncEventDispatcherReactor.java:206) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.eventLoop(SyncEventDispatcherReactor.java:157) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor$SEDReactorThread.run(SyncEventDispatcherReactor.java:338) ~[sol-jcsmp-10.16.0.jar:?]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

But /actuator/health still reports status UP, and the application remains a zombie forever, because the queue is never recreated and the application is never killed.

@Nephery Nephery mentioned this issue Oct 25, 2023
Nephery added a commit that referenced this issue Oct 26, 2023
# Global Changes
* Solace PubSub+ Messaging API for Java (JCSMP) upgraded to `10.21.0`
* Spring Boot upgraded to `3.1.5`
* Spring Cloud upgraded to `2022.0.4`

# Specific Project Changes
## Solace Spring Cloud Stream Binder
* Added health indicators to capture flow health
  * closes #145
* Added support for Solace PubSub+ partitioned queues
* Fixed potential error channel name collisions
@mackenza
Contributor

@GreenRover, apologies for taking so long to address this issue. Did you also log an RT for this? If so, do you know the ticket reference #?

@GreenRover
Contributor Author

Hello Andreaw, no, there is no related RT.
The retest was successful.
