fewChangesAroundDiagnosticsHandler #39077

Merged: 7 commits merged on Mar 7, 2024

@xinlian12 xinlian12 commented Mar 5, 2024

Issues:
There have been reports of applications hanging because of the System.exit call in DiagnosticsProvider.

Changes:

  • Call System.exit only for error cases
  • Suppress any exception thrown while calling diagnosticsHandlers and only log the error message
  • Write to System.err for error cases
  • By default, System.exit is still called for JVM error cases, but this can be disabled by setting the following system property or environment variable: COSMOS.DIAGNOSTICS_PROVIDER_SYSTEM_EXIT_ON_ERROR
  • If the customer has registered a mapper that converts errors to exceptions (NOT RECOMMENDED), the total mapper execution count will show up in CosmosDiagnostics:
    image
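
To make the intended behavior in the list above concrete, here is a minimal sketch of how handler invocation could look. It is not the SDK's code: `SafeHandlerInvoker`, `invokeAll`, and the injected `exitAction` are hypothetical names, `System.err` stands in for the real logger, and treating the flag as enabled unless it is set to `false` is an assumption. Only the property name comes from this description.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; class and method names are NOT the SDK's.
final class SafeHandlerInvoker {
    // Property name taken from the PR description.
    static final String EXIT_ON_ERROR_PROPERTY =
        "COSMOS.DIAGNOSTICS_PROVIDER_SYSTEM_EXIT_ON_ERROR";

    // Assumption: exiting stays enabled unless the property is explicitly "false".
    static boolean exitOnErrorEnabled() {
        return !"false".equalsIgnoreCase(System.getProperty(EXIT_ON_ERROR_PROPERTY));
    }

    // Calls every handler and returns how many of them threw an Exception.
    // exitAction stands in for System.exit so the sketch can run without
    // terminating the JVM.
    static <T> int invokeAll(List<Consumer<T>> handlers, T diagnostics, Runnable exitAction) {
        int failures = 0;
        for (Consumer<T> handler : handlers) {
            try {
                handler.accept(diagnostics);
            } catch (Exception e) {
                // Ordinary exceptions are suppressed; only the message is logged.
                failures++;
                System.err.println("Diagnostics handler failed: " + e.getMessage());
            } catch (Error e) {
                // JVM errors are always reported on System.err.
                System.err.println("Fatal error in diagnostics handler: " + e);
                if (exitOnErrorEnabled()) {
                    exitAction.run();
                }
                // Assumption: with the flag disabled, the sketch logs and continues.
            }
        }
        return failures;
    }
}
```

Under the same assumption about accepted values, the exit could be turned off at startup with `-DCOSMOS.DIAGNOSTICS_PROVIDER_SYSTEM_EXIT_ON_ERROR=false`.
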
"cosmos-rntbd-epoll-2-4" #164 daemon prio=5 os_prio=0 cpu=1651247.45ms elapsed=306164.59s tid=0x00007f8d9047b800 nid=0x7eb8 in Object.wait()  [0x00007f8cf9184000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11.0.18/Native Method)
	- waiting on <no object reference available>
	at java.lang.Thread.join(java.base@11.0.18/Thread.java:1300)
	- waiting to re-lock in wait() <0x000000070208acd8> (a java.lang.Thread)
	at java.lang.Thread.join(java.base@11.0.18/Thread.java:1375)
	at java.lang.ApplicationShutdownHooks.runHooks(java.base@11.0.18/ApplicationShutdownHooks.java:107)
	at java.lang.ApplicationShutdownHooks$1.run(java.base@11.0.18/ApplicationShutdownHooks.java:46)
	at java.lang.Shutdown.runHooks(java.base@11.0.18/Shutdown.java:130)
	at java.lang.Shutdown.exit(java.base@11.0.18/Shutdown.java:174)
	- locked <0x0000000702086da0> (a java.lang.Class for java.lang.Shutdown)
	at java.lang.Runtime.exit(java.base@11.0.18/Runtime.java:116)
	at java.lang.System.exit(java.base@11.0.18/System.java:1752)
	at com.azure.cosmos.implementation.DiagnosticsProvider.endSpan(DiagnosticsProvider.java:250)
	...
	...
	...
	at java.util.concurrent.CompletableFuture.completeExceptionally(java.base@11.0.18/CompletableFuture.java:2088)
	at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.messageReceived(RntbdRequestManager.java:1144)
	at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdRequestManager.channelRead(RntbdRequestManager.java:214)
"SpringApplicationShutdownHook" #22 prio=5 os_prio=0 cpu=128.90ms elapsed=1643.40s tid=0x00007f8d9428e000 nid=0x6a14 in Object.wait()  [0x00007f8ce44be000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11.0.18/Native Method)
	- waiting on <no object reference available>
	at java.lang.Object.wait(java.base@11.0.18/Object.java:328)
	at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:275)
	- waiting to re-lock in wait() <0x0000000755a9fe40> (a io.netty.util.concurrent.PromiseTask)
	at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:35)
	at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdClientChannelPool.close(RntbdClientChannelPool.java:540)
	at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdServiceEndpoint.close(RntbdServiceEndpoint.java:366)
	at com.azure.cosmos.implementation.directconnectivity.rntbd.RntbdServiceEndpoint$Provider.close(RntbdServiceEndpoint.java:732)
	at com.azure.cosmos.implementation.directconnectivity.RntbdTransportClient.close(RntbdTransportClient.java:201)
	at com.azure.cosmos.implementation.directconnectivity.StoreClientFactory.close(StoreClientFactory.java:73)
	at com.azure.cosmos.implementation.LifeCycleUtils.closeQuietly(LifeCycleUtils.java:16)
	at com.azure.cosmos.implementation.RxDocumentClientImpl.close(RxDocumentClientImpl.java:4540)
	at com.azure.cosmos.CosmosAsyncClient.close(CosmosAsyncClient.java:558)
	at org.springframework.beans.factory.support.DisposableBeanAdapter.destroy(DisposableBeanAdapter.java:239)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(DefaultSingletonBeanRegistry.java:587)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(DefaultSingletonBeanRegistry.java:559)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(DefaultListableBeanFactory.java:1161)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons(DefaultSingletonBeanRegistry.java:520)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingletons(DefaultListableBeanFactory.java:1154)
	at org.springframework.context.support.AbstractApplicationContext.destroyBeans(AbstractApplicationContext.java:1106)
	at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1075)
	at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.doClose(ServletWebServerApplicationContext.java:172)
	at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021)
	- locked <0x0000000701262548> (a java.lang.Object)
	at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
	at org.springframework.boot.SpringApplicationShutdownHook$$Lambda$4052/0x0000000801678c40.accept(Unknown Source)
	at java.lang.Iterable.forEach(java.base@11.0.18/Iterable.java:75)
	at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
	at java.lang.Thread.run(java.base@11.0.18/Thread.java:829)
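
One reading of the two traces above is a circular wait: the `cosmos-rntbd-epoll-2-4` thread calls `System.exit` and joins the shutdown hooks, while `SpringApplicationShutdownHook` closes the Cosmos client and waits on a Netty promise that presumably needs an event-loop thread to make progress. The sketch below reproduces that shape with plain JDK classes. Everything in it is illustrative: `ExitFromEventLoopSketch` and `hookStalls` are hypothetical names, a single-thread executor stands in for the event loop, and a timeout is used so the example finishes instead of hanging.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simulation; no SDK, Netty, or Spring classes are involved.
final class ExitFromEventLoopSketch {

    // Returns true if the simulated shutdown hook could not finish its
    // "close" while the event-loop thread was blocked waiting for the hook.
    static boolean hookStalls(long waitMillis) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        AtomicBoolean closeTimedOut = new AtomicBoolean(false);
        try {
            Future<?> exitCall = eventLoop.submit(() -> {
                // Stand-in for the shutdown hook: closing the client needs
                // the event loop to run one more task.
                Thread hook = new Thread(() -> {
                    Future<?> closeTask = eventLoop.submit(() -> { });
                    try {
                        closeTask.get(waitMillis, TimeUnit.MILLISECONDS);
                    } catch (TimeoutException e) {
                        // The real code waits without a timeout and never returns.
                        closeTimedOut.set(true);
                    } catch (InterruptedException | ExecutionException e) {
                        // Not expected in this sketch.
                    }
                });
                hook.start();
                // Stand-in for System.exit: block the event-loop thread
                // until the hook has finished.
                try {
                    hook.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            exitCall.get();
            return closeTimedOut.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("hook stalled: " + hookStalls(200));
    }
}
```

The queued close task can only run after the event-loop thread returns from `join`, and `join` only returns after the hook gives up waiting. Without the timeout neither side would ever proceed, which matches both threads sitting in `Object.wait()` in the dump.
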

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM

@xinlian12
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12
Member Author

/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 xinlian12 merged commit b052394 into Azure:main Mar 7, 2024
71 checks passed
jeet1995 pushed a commit to jeet1995/azure-sdk-for-java that referenced this pull request Mar 12, 2024
* few changes in diagnostics provider

---------

Co-authored-by: annie-mac <xinlian@microsoft.com>

(cherry picked from commit b052394)
jeet1995 added a commit that referenced this pull request Mar 18, 2024
* fewChangesAroundDiagnosticsHandler (#39077)

* few changes in diagnostics provider

---------

Co-authored-by: annie-mac <xinlian@microsoft.com>

(cherry picked from commit b052394)

* fixSampleRateForQuery (#37015)

* fix

* update changelog

* fix and refactor

* add logs for debugging

* Revert "fix and refactor"

This reverts commit 81b4bf9.

* fix

* Removed unused imports

* Fixed changelog

* Added synchronized block to record feed response as well when tracing is not enabled

* Fixing test regression

* Update CHANGELOG.md

* Update DiagnosticsProvider.java

* Update RxDocumentClientImplTest.java

* Update DiagnosticsProvider.java

* Update CosmosPagedFlux.java

---------

Co-authored-by: annie-mac <xinlian@microsoft.com>
Co-authored-by: Kushagra Thapar <kuthapar@microsoft.com>

(cherry picked from commit 0814173)

* Revert "fixSampleRateForQuery (#37015)"

This reverts commit 07f7759.

* Adding changes from #37015 (#37015).

* Adding changes from #37015 (#37015).

* Adding changes from #37015 (#37015).

* Remove need for winutils.

* Updated CHANGELOG and removed System.exit() calls from ImplementationBridgeHelpers.

* Updated versions for release.

* Updated versions for release.

* Updated CHANGELOG.

---------

Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>
Co-authored-by: Fabian Meiswinkel <fabianm@microsoft.com>
jeet1995 added a commit that referenced this pull request Mar 18, 2024
* fewChangesAroundDiagnosticsHandler (#39077)

* few changes in diagnostics provider

---------

Co-authored-by: annie-mac <xinlian@microsoft.com>

(cherry picked from commit b052394)

* Clean cherry pick.

* Adding changes from #37015.

* Remove need for winutils.

* Updated CHANGELOG.md.

* Removing System.exit() calls from ImplementationBridgeHelpers.

* Removing System.exit() calls from ImplementationBridgeHelpers.

* Removing System.exit() calls from ImplementationBridgeHelpers.

* Updated CHANGELOG.

* Updated versions for release.

* Updated versions for release.

* Updated CHANGELOG.

---------

Co-authored-by: Annie Liang <64233642+xinlian12@users.noreply.github.com>
drielenr pushed a commit that referenced this pull request Apr 2, 2024
* few changes in diagnostics provider

---------

Co-authored-by: annie-mac <xinlian@microsoft.com>
4 participants