Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Non recoverable JMS connection caused by: java.lang.NullPointerException: null #29881

Closed
3 tasks done
adelinor opened this issue Jul 8, 2022 · 19 comments
Closed
3 tasks done
Assignees
Labels
azure-spring All azure-spring related issues azure-spring-servicebus Spring service bus related issues. bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team
Milestone

Comments

@adelinor
Copy link

adelinor commented Jul 8, 2022

Describe the bug
Consumer implemented with spring-cloud-azure-starter-servicebus-jms loses the JMS connection and is not able to recover it.

Exception or Stack Trace
The stack trace below shows that the connection is lost after or while processing the 31st message. Attempts to recover the connection can be seen with the message: Encountered a JMSException - resetting the underlying JMS Connection.

run_10011log.txt

On the 10th attempt the connection is not recovered and the consumer becomes stale. The only way to resume the consumption of messages is to stop the app and start it again.

To Reproduce

  1. Create a premium Azure Service Bus
  2. Load more than a thousand messages to a queue
  3. Consume messages with a spring boot app which connects with the version 4.3.0 of the spring-cloud-azure-starter-servicebus-jms deployed in a network outside Azure.

Code Snippet
This is not identifiable as logs only report a cause of null.

Expected behavior
The expected behaviour is that the process for resetting the underlying JMS connection eventually succeeds.

Screenshots
If applicable, add screenshots to help explain your problem.

Setup (please complete the following information):

  • OS: Linux
  • IDE: n/a
  • Library/Libraries: com.azure.spring:spring-cloud-azure-starter-servicebus-jms:4.3.0 with Spring boot version 2.6.9 and Apache camel version 2.25.2
  • Java version: 8
  • App Server/Environment: Oracle Weblogic
  • Frameworks: Spring Boot, Apache Camel

I do not suspect a dependency version mismatch. I nevertheless provide the effective pom which includes all transitive dependencies:
effective-pom.xml.txt

Additional context
The capacity was increased at the Azure Service Bus level to 8 MU but it did not affect the behaviour of not recovering the JMS connection. It still failed after a few hundred messages were received.

Information Checklist
Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@ghost ghost added needs-triage This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jul 8, 2022
@Netyyyy Netyyyy added bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. azure-spring All azure-spring related issues labels Jul 11, 2022
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Jul 11, 2022
@Netyyyy Netyyyy added this to the 2022-08 milestone Jul 11, 2022
@yiliuTo yiliuTo added the azure-spring-servicebus Spring service bus related issues. label Jul 11, 2022
@yiliuTo
Copy link
Member

yiliuTo commented Jul 11, 2022

@adelinor Thanks for reporting this. I tested your case with only spring-cloud-azure-starter-servicebus-jms and it can work. Given you are using Apache camel in your project, could you share a minimum project to show how you send/receive messages with Azure Service Bus via Camel, as well if you have any custom configuration/usage for Spring JMS?

@adelinor
Copy link
Author

Dear @yiliuTo , Many thanks for coming back on this issue. The project I use is the following:
camel-jms-calculator.zip

@yiliuTo yiliuTo removed the question The issue doesn't require a change to the product in order to be resolved. Most issues start as that label Jul 12, 2022
@yiliuTo
Copy link
Member

yiliuTo commented Jul 14, 2022

Hi @adelinor , thanks for sharing your project. Could you also share your Service Bus service configuration? like below or something more if you have other custom configurations.
image

And I noticed that you are using Weblogic as well as the WebLogicJtaTransactionManager for JMS transactions, however after reading your log information, I suppose there should be something wrong when a REMOTE_LINK_OPEN package gets recieved from Service Bus, which seems not realted with either Weblogic or transactions. Thus I tried to reproduce your issue with only camel jms (copied from your project) but no spring transactions, and running on Tomcat. But I haven't reproduced it now. Thouands of messages can be successfully received and consumed.

Thus besides for the Service Bus service configuration, could you help to test if you can get this error without Weblogic or transaction usage? This can help us to narrow down the scope.

@adelinor
Copy link
Author

Hi @yiliuTo ,

The service bus was setup as follows:

  • Create namespace in West Europe with the premium pricing tier and a capacity of 1 MU.
  • Create queues calculator/inbox and calculator/outbox with settings Max delivery count = 3 and Enable dead lettering on message expiration.
  • Create shared access policy for connection string with privileges: send / listen / manage.

The sample code attached earlier was updated to use the default transaction manager (data source based one):

  1. JmsConfig.java
@Configuration
@EnableJms
@Slf4j
public class JmsConfig {

	/**
	 * @param connectionFactory   Auto configured by spring boot
	 * @param transactionManager  Required parameter
	 * @return Camel JMS configuration
	 */
	@Bean
	JmsConfiguration jmsConfiguration(
			ConnectionFactory connectionFactory,
			PlatformTransactionManager transactionManager) {
		JmsConfiguration result = new JmsConfiguration();
		result.setConnectionFactory(connectionFactory);

		if (transactionManager != null) {
			LOGGER.info("Setting transacted=true for the JMS configuration");
			result.setTransactionManager(transactionManager);
			result.setTransacted(true);
			// Default is CACHE_AUTO which should resolve to
			// CACHE_NONE in a transaction enabled environment but
			// defaulting it to avoid any doubt
			result.setCacheLevelName("CACHE_NONE");
		}
		return result;
	}

	/**
	 * @param jmsConfiguration Required parameter
	 * @return Camel JMS endpoint
	 */
	@Bean
	JmsComponent jmsComponent(JmsConfiguration jmsConfiguration) {
		return JmsComponent.jmsComponent(jmsConfiguration);
	}
}
  1. application.properties to disable jta:
spring.jta.enabled=false

5000 messages were pushed to the calculator/inbox queue. The same behaviour can be seen as per the last run where at some point the app becomes stale. This time the last exception is more explicit:

Caused by: org.apache.qpid.jms.provider.ProviderException: Local transactions cannot span multiple top-level entities such as queue or topic. Reference:c20722d3-b551-4ec5-8bde-43c598768983, TrackingId:c89f587b0000043a000522f362d11d74_G9S1_B5S2, SystemTracker:myqueues:Queue:calculator/inbox, Timestamp:2022-07-15T07:55:44 [condition = amqp:not-allowed]
	at org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToNonFatalException(AmqpSupport.java:181) ~[qpid-jms-client-0.53.0.jar:na]
	at org.apache.qpid.jms.provider.amqp.AmqpFixedProducer.applyDeliveryStateUpdate(AmqpFixedProducer.java:252) ~[qpid-jms-client-0.53.0.jar:na]
	... 28 common frames omitted

Please find logs for the entire execution:
run_5000log.txt

@yiliuTo
Copy link
Member

yiliuTo commented Jul 28, 2022

Hi @adelinor sorry for the late response, given the log of

Local transactions cannot span multiple top-level entities such as queue or topic.

Service Bus has some specific rules for cross-entity transactions, like it will create a session with specified name crossentity-coordinator for cross-entity transaction operations. Thus for other common transaction managers, it cannot meet this requirement of Service Bus.

However, I have tested with the native JMS transacted session with your sample code, and find that can work. I am connecting to MS Service Bus team internally to confirm this and will update to you if any. Would you like to try using the transacted session for camel JMS instead of other transaction managers?

@adelinor
Copy link
Author

Hi @yiliuTo , as it seems difficult to reproduce the issue, it would be beneficial to add more logging in the SDK so that the cause of the problem can be better located.

This sample app consumes a message to perform operations in the database so the transaction manager needs to encompass database operations.

Kind regards

@yiliuTo
Copy link
Member

yiliuTo commented Aug 5, 2022

Hi @adelinor ,

it would be beneficial to add more logging in the SDK so that the cause of the problem can be better located.

While considering that what the library spring-cloud-azure-starter-servicebus-jms does is to provide auto-configuration of caching/pooled ConnectionFactory as well as JmsListenerContainerFactory for org.apache.qpid:qpid-jms-client, but those factory implementation are used from the third-party library, like org.springframework:spring-jms and org.messaginghub:pooled-jms, thus we cannot modify the logs for those.

And as for the cause of the cross-entity transaction error, as mentioned in jmstoolbox/jmstoolbox#137 (comment), the support from Service Bus service side on cross-entity transaction of JMS specs is only for transacted session (locally transacted JMS session). And according to the underlying Spring JMS code, see JmsAccessor and DefaultJmsListenerContainerFactoryConfigurer, to set the transacted JMS session or transaction managers are mutually exclusive, thus I think that's the reason.

@yiliuTo yiliuTo added the needs-author-feedback More information is needed from author to address the issue. label Aug 5, 2022
@yiliuTo yiliuTo modified the milestones: 2022-08, 2022-09 Aug 9, 2022
@ghost ghost added the no-recent-activity There has been no recent activity on this issue. label Aug 16, 2022
@ghost
Copy link

ghost commented Aug 16, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@adelinor
Copy link
Author

adelinor commented Aug 18, 2022

Hi @yiliuTo ,

The JMSConfiguration is set as transacted. If the transaction configuration was wrong, no message would get processed. The observed behaviour is that the processing works as expected for a small number of messages (less than 100) and fails with the error reported in this issue with larger volumes.

The error reported, java.lang.NullPointerException: null, does not allow you to find the cause of the problem. Could you share the test suite you are using to validate the behaviour? I could try to execute from my machine and see how it behaves.

Kind regards

@ghost ghost added needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-author-feedback More information is needed from author to address the issue. no-recent-activity There has been no recent activity on this issue. labels Aug 18, 2022
@adelinor
Copy link
Author

Hi @yiliuTo,

I would like to share with you some further tests I have done on this issue. I did two runs where both start with 1000 messages in the inbox queue and with empty tables.
The first run is done by setting the environment variable SPRING_JTA_ENABLED=false (which corresponds to the spring boot configuration spring.jta.enabled=false).
When the app starts, the logs show that spring boot configures the JPA transaction manager the one configured from the datasource.
Despite 35 occurrences of the exception reported in this issue, the processing goes on and ends when all 1000 messages are consumed. The problem is that only 997 records get saved in the database.

The second run is done by setting the environment variable SPRING_JTA_ENABLED=true .
When the app starts, the logs show that spring boot configures the Weblogic JTA transaction manager.
The app hangs after processing 174 messages. The database has 173 messages: i.e. data for one item is lost.

In both scenarios above I come to the conclusion that we do not have reliability with this setup: i.e. use of Azure Service Bus via JMS.

@yiliuTo
Copy link
Member

yiliuTo commented Aug 25, 2022

Hi @adelinor , thanks for the detailed information. The test I used is a pure spring-jms application without camel jms since both camel-jms and Azure Service Bus JMS starter relys on the connection /listener container/transaction-management implementation from spring-jms, and here is the application I use. And for spring-jms, it by default configure the transacted session when no transaction manager is found, thus in my test I didn't specifically configure the transaction.

However, I found that when using different ways for cross-entity transaction of spring-jms, it behaves differently of using JmsTemplate or @sendto annotation, when using @sendto, it will encounter the same error of Local transactions cannot span multiple top-level entities such as queue or topic. as what you have observed with camel, while when using JmsTemplate, it works normally. Thus I am reading the spring-jms code to understand the reason, and also plan to compare spring-jms with pure Qpid JMS implemenation about the transacted-session to see if can get more insights. And communication about spring-jms transacted session is welcome.

@adelinor
Copy link
Author

The article https://www.infoworld.com/article/2077963/distributed-transactions-in-spring--with-and-without-xa.html describes very well the options that for managing transactions with various transactional resources. That is the case of the camel-jms-calculator.

The option of not using JTA, i.e. using the default JPA transaction manager, is referred as the Wing-and-a-Prayer pattern:

  • most items will fall in the lucky situation where the database and the Azure service bus transactions got synchronized
  • some items will not, they fail silently due to the lack of synchronization between transactions

Using JTA would guarantee the reliability but the Azure Service Bus may not support XA and this might explain why the applications hangs.

The XA and the Last Resource Gambit would be a fit for this scenario:

  • the database being an XA resource
  • Azure Service Bus as the single non XA resource
    The challenge here is to understand how this could be setup from a configuration perspective.

@ezYakaEagle442
Copy link

ezYakaEagle442 commented Sep 5, 2022

@adelinor could you please try to reproduce the issue with Java 11?
https://camel.apache.org/manual/what-are-the-dependencies.html#_java_support

https://camel.apache.org/manual/camel-3-migration-guide.html

Also your POM does not include camel-azure-servicebus , could you please try to reproduce with Camel latest version below ?:
https://camel.apache.org/components/3.18.x/azure-servicebus-component.html

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-azure-servicebus</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>

https://camel.apache.org/components/3.18.x/jdbc-component.html#_endpoint_query_option_resetAutoCommit

Camel will set the autoCommit on the JDBC connection to be false, commit the change after executed the statement and reset the autoCommit flag of the connection at the end, if the resetAutoCommit is true. If the JDBC connection doesn’t support to reset the autoCommit flag, you can set the resetAutoCommit flag to be false, and Camel will not try to reset the autoCommit flag. When used with XA transactions you most likely need to set it to false so that the transaction manager is in charge of committing this tx.

@ezYakaEagle442
Copy link

@adelinor also:
Distributed transactions aren't supported, but transacted sessions are supported.
https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-java-how-to-use-jms-api-amqp#unsupported-features-and-restrictions

@saragluna saragluna modified the milestones: 2022-09, 2022-10 Sep 6, 2022
@adelinor
Copy link
Author

adelinor commented Sep 6, 2022

@ezYakaEagle442 ,

could you please try to reproduce the issue with Java 11?

The Weblogic Application server runs on Java 8. Nevertheless in order to try this option, I updated slightly the pom file to make the app runnable from Spring boot. Changes were to:

  • add the Oracle database driver
  • update datasource configuration to specify url, credentials rather than using a JNDI lookup

With JDK 11 (version "11.0.16" 2022-07-19 LTS), the same error as reported in this issue occurs but the application is not blocked and all messages end up in the out queue. The database though misses records: 18 out of 5000.

could you please try to reproduce with Camel latest version below ?

I changed further the pom to use the camel version 3.18.1, updated the camel spring boot starter is updated to use the new groupId name org.apache.camel.springboot, and finally added:

<dependency>
	<groupId>org.apache.camel</groupId>
	<artifactId>camel-azure-servicebus</artifactId>
	<version>${camel.version}</version>
</dependency>

With that setup, a batch of 5000 messages get consumed, No error appear in the logs. The database misses 15 records out of the 5000 processed.

@yiliuTo yiliuTo modified the milestones: 2022-10, 2022-11 Oct 9, 2022
@yiliuTo yiliuTo assigned fangjian0423 and unassigned yiliuTo Oct 9, 2022
@fangjian0423
Copy link
Contributor

hi @adelinor ,

We have a meeting internally to discuss this issue, and aware that finally the POC successfully.

Is there any other problems we need to help you?

@ghost ghost added the no-recent-activity There has been no recent activity on this issue. label Oct 27, 2022
@ghost
Copy link

ghost commented Oct 27, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@adelinor
Copy link
Author

adelinor commented Oct 27, 2022

Dear @fangjian0423 ,

Indeed by creating separate transaction managers:

  • one for JMS transactions
  • one for the database transactions

I was able to process successfully 100,00 messages without a loss or errors using a standard (i.e. not premium) Azure ServiceBus resource. Note that:

  • I excluded the dependency on Apache Camel as its spring boot starter was not coping with two transaction managers. This dependency could be re-instated after finding the correct way to configure Camel without the auto-config.
  • I had to wrap all database operations behind a single façade. This was easy to do in this small PoC but could be very un-natural in a larger project.

To conclude, this issue can be closed. As a take-away it would be beneficial to the community to have guidance on how to approach the use case of processing messages while performing a transaction with a database.

Kind regards

@ghost ghost removed needs-author-feedback More information is needed from author to address the issue. no-recent-activity There has been no recent activity on this issue. labels Oct 27, 2022
@fangjian0423
Copy link
Contributor

Thank you for your feedback and suggestion.

We will mention the transaction in our docs in the future.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
azure-spring All azure-spring related issues azure-spring-servicebus Spring service bus related issues. bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team
Projects
Archived in project
Development

No branches or pull requests

6 participants