[SOL-79060] Health indicator should capture binding health statuses #145

GreenRover · 2022-06-27T08:59:17Z

Scenario:

The application starts up
A queue is provisioned
Someone deletes the queue (by mistake)

Result:

2022-06-26T08:03:52.664+0000 ERROR Unable to send request message
org.springframework.messaging.MessagingException: Unable to send message to topic tms/monitoring/monalesy/p/v1/serviceState/request; nested exception is com.solacesystems.jcsmp.JCSMPTransportException: (JCSMPTransportException) Error receiving data from underlying connection.
    at com.solace.spring.cloud.stream.binder.util.ErrorChannelSendingCorrelationKey.send(ErrorChannelSendingCorrelationKey.java:57) ~[spring-cloud-stream-binder-solace-core-3.2.1.jar!/:?]
    at com.solace.spring.cloud.stream.binder.outbound.JCSMPOutboundMessageHandler.handleMessagingException(JCSMPOutboundMessageHandler.java:142) ~[spring-cloud-stream-binder-solace-core-3.2.1.jar!/:?]
    at com.solace.spring.cloud.stream.binder.outbound.JCSMPOutboundMessageHandler.handleMessage(JCSMPOutboundMessageHandler.java:98) ~[spring-cloud-stream-binder-solace-core-3.2.1.jar!/:?]
    at org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$SendingHandler.handleMessageInternal(AbstractMessageChannelBinder.java:1074) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]
    at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:56) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:317) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:272) ~[spring-integration-core-5.5.8.jar!/:5.5.8]
    at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:222) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]
    at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:164) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]
    at org.springframework.cloud.stream.function.StreamBridge.send(StreamBridge.java:144) ~[spring-cloud-stream-3.2.1.jar!/:3.2.1]

Expected:
The SolaceBinderHealthIndicator of the application changes to unhealthy.

Because my last 10 pull requests were not merged, I am only creating an issue.


GreenRover · 2022-08-23T11:33:40Z

Ping @Mrc0113

Nephery · 2022-08-23T15:07:58Z

The solace binder health indicator currently only captures the health of the binder's PubSub+ session. It does not currently capture any binding health statuses since their flows were configured to try reconnecting forever. But it seems that some cases (like this one) aren't captured by the flow reconnect feature.

To add this, we'll need to do some digging around to see if there's a good way to add binding statuses to the binder's health indicator (since it's not a composite health indicator) or if/how other SCSt binders capture binding health statuses.

If there's no good way to add this to the existing indicator, we might have to add some custom config option to change the solace binder health indicator into a composite health indicator, which when enabled, would show the health statuses for both the binder's session as well as for all its bindings.
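To illustrate the composite idea described above, here is a minimal sketch in plain Java, assuming a simple "worst status wins" rule across the session and all bindings. The names `HealthStatus`, `CompositeHealthSketch`, `aggregate`, and the binding names are hypothetical and are not the binder's or Spring Boot Actuator's API; a real implementation would presumably build on Actuator's composite health contributor support instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model for illustration only; not the binder's real classes.
// Declaration order doubles as severity: UP < RECONNECTING < DOWN.
enum HealthStatus { UP, RECONNECTING, DOWN }

final class CompositeHealthSketch {

    // Returns the most severe status among the session and all bindings.
    static HealthStatus aggregate(HealthStatus session, Map<String, HealthStatus> bindings) {
        HealthStatus worst = session;
        for (HealthStatus status : bindings.values()) {
            if (status.compareTo(worst) > 0) {
                worst = status;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Map<String, HealthStatus> bindings = new LinkedHashMap<>();
        bindings.put("input-0", HealthStatus.DOWN);  // assumed: the binding's queue was deleted
        bindings.put("output-0", HealthStatus.UP);
        // The session itself is still connected, but one binding is broken.
        System.out.println(aggregate(HealthStatus.UP, bindings));
    }
}
```

With this rule, a healthy session no longer hides a broken binding, which is the behavior requested in this issue.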

mackenza · 2022-09-29T14:28:54Z

This has been logged in the Solace Jira and we are targeting Q1CY2023 for a fix.

Nephery · 2022-10-21T18:48:45Z

@GreenRover Just double checking, but did you post the correct stacktrace?

Looking again at the one you posted, this looks like an error on the producer side. But you shouldn't get this error by just deleting the queue. Deleting the queue should result in a different error, and it would be one on the consumer.

I was only able to reproduce a similar stacktrace by killing the session (e.g. session reconnect attempts exhausted). But the health for that is already captured by the existing health indicator (i.e. the PubSub+ session health).

GreenRover · 2022-10-25T05:55:11Z

I retested now with:
<solace-spring-cloud.version>2.3.2
<spring-cloud.version>2021.0.4

A "deleted queue" gives the following log messages:
2022-10-25T07:33:10.381+0200 INFO Client-1: Error Response (503) - Service Unavailable; subCode: 50; flowId=7
2022-10-25T07:33:10.410+0200 INFO Client-1: Got BIND ('scst/wk/last_value/plain/last_value/state/_') Error Response (503) - Unknown Queue
But health is still "UP". --> Should be "DOWN"

Not connected but reconnecting:
Health is still "UP". (sub status: RECONNECTING) --> OK as expected
Sending via StreamBridge blocks as long as status=RECONNECTING --> OK as expected

Not connected, reconnect attempts exhausted:
Health is "DOWN". (sub status: DOWN) --> OK as expected

org.springframework.messaging.MessagingException: Unable to send message to topic tms/monitoring/monalesy/p/v1/serviceState/request; nested exception is com.solacesystems.jcsmp.JCSMPTransportException: (JCSMPTransportException) Error receiving data from underlying connection.
I am not able to reproduce this at the moment. I have only seen it during a production issue that is not yet fully understood.

Nephery · 2022-10-26T17:45:33Z

Is the queue that you're deleting the one the input binding is consuming messages from? Or do you have a queue subscribed to the output binding destination, tms/monitoring/monalesy/p/v1/serviceState/request, and that is the queue you are deleting?

GreenRover · 2022-11-30T09:23:01Z

The queue I delete is the one from the input binding.

GreenRover · 2023-05-05T08:48:51Z

With release 2.5.0, when a queue is deleted manually, the service logs:

2023-05-05T10:46:20.725+0200 WARN Received error while trying to read message from endpoint scst/wk/sensor.XXX/plain/sensor/FOOO/_/_
com.solacesystems.jcsmp.JCSMPErrorResponseException: 503: Unknown Queue
	at com.solacesystems.jcsmp.impl.flow.BindRequestTask.execute(BindRequestTask.java:211) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.impl.flow.SubFlowManagerImpl.handleAssuredCtrlMessage(SubFlowManagerImpl.java:570) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleAssuredCtrlMsg(TcpClientChannel.java:1768) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel.handleMessage(TcpClientChannel.java:1733) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.processRead(SubscriberMessageReader.java:98) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SubscriberMessageReader.read(SubscriberMessageReader.java:140) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.smf.SimpleSmfClient.read(SimpleSmfClient.java:1206) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.processReactorChannels(SyncEventDispatcherReactor.java:206) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor.eventLoop(SyncEventDispatcherReactor.java:157) ~[sol-jcsmp-10.16.0.jar:?]
	at com.solacesystems.jcsmp.protocol.nio.impl.SyncEventDispatcherReactor$SEDReactorThread.run(SyncEventDispatcherReactor.java:338) ~[sol-jcsmp-10.16.0.jar:?]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

But /actuator/health still reports status UP, and the application remains a zombie forever, because the queues are not recreated and the application is not killed.
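As a rough sketch of the behavior being asked for, the following plain-Java example tracks one health state per binding and reports unhealthy as soon as any flow fails with an unrecoverable error such as `503: Unknown Queue`. The class `FlowHealthTracker`, its callback methods, and the binding name used in the usage note are hypothetical; this is not how the binder is implemented, only one way the idea could be modeled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative and not the binder's real classes.
final class FlowHealthTracker {

    enum FlowHealth { UP, RECONNECTING, DOWN }

    // One entry per binding, keyed by binding name.
    private final Map<String, FlowHealth> flows = new ConcurrentHashMap<>();

    // The flow was bound (or re-bound) successfully.
    void onFlowBound(String bindingName) {
        flows.put(bindingName, FlowHealth.UP);
    }

    // The flow lost its connection and is retrying.
    void onFlowReconnecting(String bindingName) {
        flows.put(bindingName, FlowHealth.RECONNECTING);
    }

    // The flow failed with an error that retrying will not fix,
    // for example a bind rejected because the queue no longer exists.
    void onFlowError(String bindingName, Exception cause) {
        flows.put(bindingName, FlowHealth.DOWN);
    }

    // Returns null for a binding that was never registered.
    FlowHealth healthOf(String bindingName) {
        return flows.get(bindingName);
    }

    // Overall health is lost as soon as any single flow is DOWN.
    boolean isHealthy() {
        return !flows.containsValue(FlowHealth.DOWN);
    }
}
```

In this model, calling `onFlowError("input-0", ex)` after the queue is deleted makes `isHealthy()` return `false`, so an orchestrator watching the health endpoint could restart the application instead of leaving it running without a consumer.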

# Global Changes

* Solace PubSub+ Messaging API for Java (JCSMP) upgraded to `10.21.0`
* Spring Boot upgraded to `3.1.5`
* Spring Cloud upgraded to `2022.0.4`

# Specific Project Changes

## Solace Spring Cloud Stream Binder

* Added health indicators to capture flow health
  * closes #145
* Added support for Solace PubSub+ partitioned queues
* Fixed potential error channel name collisions

mackenza · 2023-10-26T18:47:09Z

@GreenRover, apologies for taking so long to address this issue. Did you also log an RT for this? If so, do you know the ticket reference #?

GreenRover · 2023-10-31T10:25:22Z

Hello Andreaw, no there is no related RT.
The retest was successful.

GreenRover mentioned this issue Jun 29, 2022

[SOL-76960] Binder does not cleanup flows if flow is opened but binding creation fails #146

Open

Nephery added the enhancement New feature or request label Aug 23, 2022

Nephery changed the title ~~HealthCheck: does not react on missing queues~~ HealthCheck: does not capture binding health statuses Aug 23, 2022

Nephery changed the title ~~HealthCheck: does not capture binding health statuses~~ Health indicator should capture binding health statuses Aug 23, 2022

GreenRover mentioned this issue Sep 22, 2022

[SOL-79061] Very verbose logging when connection to broker is interuppted: #174

Closed

Nephery changed the title ~~Health indicator should capture binding health statuses~~ [SOL-79060] Health indicator should capture binding health statuses Sep 29, 2022

Nephery added the tracked Internally tracked by Solace's internal issue tracking system label Sep 29, 2022

Nephery mentioned this issue Oct 25, 2023

Release 3.1.0 #237

Merged

Nephery closed this as completed in #237 Oct 26, 2023
