
[BUG] UndeliverableException when creating resource group and network security group in heavy load #33056

wangwenbj opened this issue Jan 18, 2023 · 54 comments
Assignees: XiaofeiCao
Labels: ARM, customer-reported, Mgmt, needs-team-attention, pillar-reliability, question, Service Attention


@wangwenbj

Describe the bug
We encountered the following errors under heavy load when creating resource groups and network security groups with the new Azure Java SDK. The HTTP client is OkHttpClient. This issue does not happen with the old RxJava-based SDK, though.

Exception or Stack Trace

Exception in thread "RxCachedThreadScheduler-141" io.reactivex.rxjava3.exceptions.UndeliverableException: The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | reactor.core.Exceptions$ReactiveException: java.lang.InterruptedException
at io.reactivex.rxjava3.plugins.RxJavaPlugins.onError(RxJavaPlugins.java:372)
at io.reactivex.rxjava3.internal.operators.single.SingleFromCallable.subscribeActual(SingleFromCallable.java:49)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleResumeNext.subscribeActual(SingleResumeNext.java:39)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleSubscribeOn$SubscribeOnObserver.run(SingleSubscribeOn.java:89)
at io.reactivex.rxjava3.core.Scheduler$DisposeTask.run(Scheduler.java:644)
at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:65)
at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:56)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

To Reproduce
This issue cannot be reproduced easily. It happens every now and then in our production environment, and we have no way to catch and handle it.

During large-scale resource group creation we encounter this issue occasionally. I have reproduced it only once locally, by provisioning 100 resource groups in parallel.

Code Snippet
ResourceGroup.DefinitionStages.WithCreate creator = this.azureResoureManager.resourceGroups()
        .define(resourceGroupName)
        .withRegion(region);
// convert the Reactor Mono returned by createAsync() into an RxJava3 Single
return ReactorToRxV3Interop.monoToSingle(creator.createAsync());

Expected behavior
No exception happens, or if an exception does happen, we have a way to catch it inside the reactor chain.

Screenshots
API error; no screenshots.

Additional context
This part of the log is what we catch in our customized OkHttp interceptor. However, after the exception is thrown, the upper chain loses track of it, which causes the chain to never terminate.

2023-01-11T17:05:44.011Z [trace_id=9492315ecd8cdf9e9db291d40c42e57b] [transaction_id=1e99ae844e81ce79] ERROR [gement.azure.com/...] .i.i.AzureResilienceInterceptorImpl.logRetryInfoForError:506 - Exception: java.io.IOException: Canceled
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:72)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at com.vmware.horizon.sg.clouddriver.impl.azure.internal.interceptor.AzureResilienceInterceptorImpl.intercept(AzureResilienceInterceptorImpl.java:117)
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at com.vmware.horizon.sg.clouddriver.impl.azure.internal.DynamicThrottleInterceptor.intercept(DynamicThrottleInterceptor.java:80)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.kt:221)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)


Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@ghost ghost added needs-triage This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 18, 2023
@joshfree joshfree added ARM Mgmt This issue is related to a management-plane library. pillar-reliability The issue is related to reliability, one of our core engineering pillars. (includes stress testing) labels Jan 19, 2023
@ghost ghost removed the needs-triage This is a new issue that needs to be triaged to the appropriate team. label Jan 19, 2023
@joshfree
Member

Thank you for reaching out to us via this github issue, @wangwenbj. @weidongxu-microsoft will be able to help route your issue further. Please note that if this problem requires immediate attention, please refer to Azure support plan details here: https://github.com/Azure/azure-sdk-for-java/blob/main/SUPPORT.md#support

@weidongxu-microsoft
Member

weidongxu-microsoft commented Jan 20, 2023

@wangwenbj

What is the version of the SDK?
What is the version of azure-core-http-okhttp?

Also, may I ask why you chose OkHttpClient over NettyClient?

@wangwenbj
Author

wangwenbj commented Jan 20, 2023 via email

@wangwenbj
Author

wangwenbj commented Jan 30, 2023 via email

@XiaofeiCao
Contributor

Hi @wangwenbj ,
I've tried creating 100 resource groups multiple times but was not able to reproduce the issue...

You can refer to this doc for throttling control.

P.S. You don't have to write your own ReactorToRxV3Interop. There's official support for converting a Mono to an RxJava3 Single.
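
For reference, a minimal sketch of that conversion without a custom adaptor, assuming RxJava3 is on the classpath (azureResourceManager, resourceGroupName, and region stand in for your own objects). Since Mono implements the Reactive Streams Publisher interface, Single.fromPublisher can wrap it directly; the reactor-adapter add-on also ships an RxJava3Adapter if you prefer a dedicated bridge:

import com.azure.resourcemanager.resources.models.ResourceGroup;
import io.reactivex.rxjava3.core.Single;

// Mono<T> implements org.reactivestreams.Publisher<T>, so RxJava3 can subscribe to it directly.
Single<ResourceGroup> createSingle = Single.fromPublisher(
        azureResourceManager.resourceGroups()
                .define(resourceGroupName)
                .withRegion(region)
                .createAsync());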

@wangwenbj
Author

wangwenbj commented Jan 30, 2023 via email

@XiaofeiCao
Contributor

XiaofeiCao commented Jan 30, 2023

OK, got it.

Anything you can think of that might have caused this issue?

I'm not sure. From the log I can't tell the root cause of the exception. And for your description:

after the exception is thrown, the upper chain lost track of this exception. Which caused the chain to never stop.

I don't quite understand; can you elaborate on this? What do you mean by "never stop"?

@wangwenbj
Author

wangwenbj commented Jan 30, 2023 via email

@XiaofeiCao
Contributor

Thanks @wangwenbj

I saw a blocking get operation get canceled in DynamicThrottleInterceptor (Exception: java.io.IOException: Canceled), and an InterruptedException is thrown. You may want some special error handling here, as described in the RxJava3 error-handling guide:

In addition, some 3rd party libraries/code throw when they get interrupted by a cancel/dispose call which leads to an undeliverable exception most of the time. Internal changes in 2.0.6 now consistently cancel or dispose a Subscription/Disposable before cancelling/disposing a task or worker (which causes the interrupt on the target thread).

// in some library
try {
   doSomethingBlockingly();
} catch (InterruptedException ex) {
   // check if the interrupt is due to cancellation
   // if so, no need to signal the InterruptedException
   if (!disposable.isDisposed()) {
      observer.onError(ex);
   }
}

If the library/code already did this, the undeliverable InterruptedExceptions should stop now. If this pattern was not employed before, we encourage updating the code/library in question.

By the way, could you show me the code snippet of DynamicThrottleInterceptor, please?

@wangwenbj
Author

wangwenbj commented Jan 30, 2023 via email

@XiaofeiCao
Contributor

OK, our track 1 library uses RxJava and your code uses RxJava3. There is a difference in error handling since RxJava2, especially for undeliverable exceptions:

One important design requirement for 2.x is that no Throwable errors should be swallowed. This means errors that can't be emitted because the downstream's lifecycle already reached its terminal state or the downstream cancelled a sequence which was about to emit an error.

My best guess is that this error actually happens with the old RxJava as well, but gets swallowed. You can try adding a global error handler that handles specific exceptions based on whether they represent a likely bug or an ignorable application/network state, as described in the RxJava3 error-handling guide:

RxJavaPlugins.setErrorHandler(e -> {
    if (e instanceof UndeliverableException) {
        e = e.getCause();
    }
    if ((e instanceof IOException) || (e instanceof SocketException)) {
        // fine, irrelevant network problem or API that throws on cancellation
        return;
    }
    if (e instanceof InterruptedException) {
        // fine, some blocking code was interrupted by a dispose call
        return;
    }
    if ((e instanceof NullPointerException) || (e instanceof IllegalArgumentException)) {
        // that's likely a bug in the application
        Thread.currentThread().getUncaughtExceptionHandler()
            .uncaughtException(Thread.currentThread(), e);
        return;
    }
    if (e instanceof IllegalStateException) {
        // that's a bug in RxJava or in a custom operator
        Thread.currentThread().getUncaughtExceptionHandler()
            .uncaughtException(Thread.currentThread(), e);
        return;
    }
    // replace Log.warning with your logging framework of choice
    Log.warning("Undeliverable exception received, not sure what to do", e);
});

@wangwenbj
Author

wangwenbj commented Feb 1, 2023 via email

@XiaofeiCao
Contributor

XiaofeiCao commented Feb 1, 2023

Sure. Would you help me confirm the code at line 80 of DynamicThrottleInterceptor? I assume the exception originated there?

at com.vmware.horizon.sg.clouddriver.impl.azure.internal.DynamicThrottleInterceptor.intercept(DynamicThrottleInterceptor.java:80)

@wangwenbj
Author

wangwenbj commented Feb 1, 2023 via email

@XiaofeiCao
Contributor

Hi @wangwenbj , I saw a very similar situation where the chain got stalled when a non-IOException was thrown in an interceptor:
square/retrofit#3453

I wonder if this is the case here. What did you do with the exception after you logged it in your custom interceptor (AzureResilienceInterceptorImpl.logRetryInfoForError)? Did you wrap it into some other non-IOException?

@wangwenbj
Author

wangwenbj commented Feb 8, 2023 via email

@XiaofeiCao
Contributor

XiaofeiCao commented Feb 8, 2023

Hi @wangwenbj

why this issue is not happening in the old rxjava version SDK?

I'm not sure. Are you using the same version of OkHttp3 as before?

My other speculation is that the Rxjava->Rxjava3 adaptor that you used before behaves differently than the Reactor->Rxjava3 adaptor you are using now. This is pure speculation...

General good practice (from the official OkHttp documentation) is that you don't throw your own exceptions in interceptors, whether they are IOExceptions or not.
Instead, if you want to signal a failure, return a synthetic HTTP response:

 @Throws(IOException::class)
 override fun intercept(chain: Interceptor.Chain): Response {
   if (myConfig.isInvalid()) {
     return Response.Builder()
         .request(chain.request())
         .protocol(Protocol.HTTP_1_1)
         .code(400)
         .message("client config invalid")
         .body("client config invalid".toResponseBody(null))
         .build()
   }

   return chain.proceed(chain.request())
 }
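
For a Java codebase, a rough equivalent of the same idea might look like the sketch below; the quota check and the 429 status are placeholders chosen for the throttling scenario discussed here, not code from either project:

import java.io.IOException;

import okhttp3.Interceptor;
import okhttp3.MediaType;
import okhttp3.Protocol;
import okhttp3.Request;
import okhttp3.Response;
import okhttp3.ResponseBody;

final class ThrottleGuardInterceptor implements Interceptor {
    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        if (quotaExceeded(request)) { // placeholder check, not from the original thread
            // Signal the failure with a synthetic response instead of throwing a custom exception.
            return new Response.Builder()
                    .request(request)
                    .protocol(Protocol.HTTP_1_1)
                    .code(429)
                    .message("client-side throttled")
                    .body(ResponseBody.create(MediaType.get("text/plain"), "client-side throttled"))
                    .build();
        }
        return chain.proceed(request);
    }

    private boolean quotaExceeded(Request request) {
        return false; // placeholder for the real quota/delay calculation
    }
}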

@wangwenbj
Author

wangwenbj commented Feb 9, 2023 via email

@wangwenbj
Author

wangwenbj commented Feb 15, 2023 via email

@XiaofeiCao
Contributor

Thanks @wangwenbj. And does the UndeliverableException still persist?

@wangwenbj
Author

wangwenbj commented Mar 1, 2023 via email

@XiaofeiCao
Contributor

XiaofeiCao commented Mar 2, 2023

Hi wen,

Thanks for the clarification. Rxjava translators should be fine.

For 1, I don't think so, since our track 1 SDK has been officially deprecated since March 2022.
For 4, do you mean

chain.blockingGet()

or

chain.map(v -> {
    anotherChain.blockingGet();
    return v;
})

?
The latter is not correct, since one shouldn't make blocking calls inside a chain. A code snippet would help us better understand your situation.

Another thing: have you set any callTimeout on the OkHttpClient (or OkHttpAsyncHttpClient)? We can't control the timeout exception since it comes directly from OkHttp itself. You could set the call timeout to a higher value if this is the case.
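
For illustration, a minimal sketch of raising the call timeout on a pre-built OkHttpClient and handing it to azure-core (the five-minute value is an arbitrary placeholder, not a recommendation):

import java.time.Duration;

import com.azure.core.http.HttpClient;
import com.azure.core.http.okhttp.OkHttpAsyncHttpClientBuilder;
import okhttp3.OkHttpClient;

OkHttpClient okHttpClient = new OkHttpClient.Builder()
        .callTimeout(Duration.ofMinutes(5)) // placeholder value; 0 disables the whole-call timeout
        .build();

// Wrap the pre-configured OkHttpClient so the Azure SDK uses it for all requests.
HttpClient httpClient = new OkHttpAsyncHttpClientBuilder(okHttpClient).build();

The resulting HttpClient can then be supplied when building the resource manager, for example through its configure().withHttpClient(...) step.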

@XiaofeiCao
Contributor

XiaofeiCao commented Mar 2, 2023

Also, I saw from the stacktrace that there's a blockingGet in AzureResilienceInterceptorImpl:

com.vmware.horizon.sg.clouddriver.impl.azure.internal.interceptor.AzureResilienceInterceptorImpl.intercept(AzureResilienceInterceptorImpl.java:117) at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)

Usually it should be fine to do blocking calls in OkHttp interceptors. However, if you could share what you do with the blockingGet, it would help us better understand the situation. You could do that in my personal repo: https://github.com/XiaofeiCao/ioexception_repro, or email me if that's possible.

Further question, does this line always appear in the exception's stacktrace?
Or does the exception happen somewhere else too? If so, could you share the stacktrace?

@wangwenbj
Author

wangwenbj commented Mar 2, 2023 via email

@XiaofeiCao
Contributor

@wangwenbj Could you show me how you set up your OkHttpClient, please? Or do you leave it at the defaults?

@wangwenbj
Author

wangwenbj commented Mar 8, 2023 via email

@XiaofeiCao
Contributor

XiaofeiCao commented Mar 9, 2023

Thanks @wangwenbj for your code snippet!

I was able to reproduce your situation in my demo repo test.

Exception in thread "Thread-11" reactor.core.Exceptions$ReactiveException: java.lang.InterruptedException
    at reactor.core.Exceptions.propagate(Exceptions.java:396)
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:91)
    at reactor.core.publisher.Mono.block(Mono.java:1742)
    at com.azure.resourcemanager.resources.implementation.DeploymentsClientImpl.checkExistence(DeploymentsClientImpl.java:7569)
    at com.azure.resourcemanager.resources.implementation.DeploymentsImpl.checkExistence(DeploymentsImpl.java:102)
    at com.azure.resourcemanager.repro.ioexception.test.undeliverable.CallTimeoutMockTests$1.run(CallTimeoutMockTests.java:129)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.InterruptedException
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1048)
    at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)
    ... 5 more
java.io.IOException: Canceled
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:72)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at com.azure.resourcemanager.repro.ioexception.test.undeliverable.CallTimeoutMockTests.lambda$buildHttpClient$1(CallTimeoutMockTests.java:177)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
    at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

It's very similar to this issue, in which the calling thread got interrupted.

The IOException: Canceled is logged in the OkHttpClient interceptor and is caused by the thread interruption. Now it's all about finding where this interruption occurred.

Does this error always occur on this line?

com.azure.resourcemanager.resources.implementation.DeploymentsClientImpl.checkExistence(DeploymentsClientImpl.java:7569)

@wangwenbj
Author

wangwenbj commented Mar 9, 2023 via email

@XiaofeiCao
Contributor

Hi @wangwenbj ,

Thanks for the information.

Sorry for not making my point clear. The above demo only simulated the error log. It's a guess of what actually happened.

I'm still trying to reproduce it under normal conditions.

@XiaofeiCao
Contributor

XiaofeiCao commented Mar 10, 2023

I've updated my real-time test to 100 concurrent resource group creations and deletions.

I'll leave it running till the bug is reproduced.

Meanwhile, may I know what you do after you throw an Exception in DynamicThrottleInterceptor when the calculated quota delay is positive?

long delay = getQuotaDelay(requestMethod, requestUrl, clientId);
if (delay > 0) {
    throw new Exception();
}

@weidongxu-microsoft
Member

weidongxu-microsoft commented Mar 10, 2023

Let's make it simpler.

@XiaofeiCao , you already have the test running. Configure it to match the author's setup as closely as possible (same OkHttpClient config, same interceptor configuration, same scale, same AKS instance configuration if need be), and run it until we see the same problem.

If we reproduce it, we diagnose and fix it. If we don't see it, that does not prove there is no bug in the SDK, but at least it means a bug is unlikely.

The reason is that we apparently cannot get the code from the author's stress test, and even if we had it, it might contain too much code that does not belong to the SDK and could itself be a cause. We'd like to limit Xiaofei's reproduction to a relatively simple scenario with minimal non-SDK code, so that it focuses on reproducing an SDK bug.

@wangwenbj , if you think Xiaofei's test fails to reproduce the problem, please let him know what you'd like him to change.
Both Xiaofei and I have email addresses in our profiles, and you can email us anything you think might help diagnose the problem.

@wangwenbj
Author

wangwenbj commented Mar 13, 2023 via email

@ghost ghost added the needs-team-attention This issue needs attention from Azure service team or SDK team label Mar 13, 2023
@ghost

ghost commented Mar 13, 2023

Thank you for your feedback. This has been routed to the support team for assistance.

@wangwenbj
Author

@XiaofeiCao
According to the Azure network support team, this issue seems to happen in the following sequence:

  1. Submit a request, e.g. create a resource group.
  2. The request succeeds on the service side within seconds.
  3. Using the new Azure SDK, we do not see any response for 20 minutes and finally time out on the client side.
  4. The Azure service registers a client failure after 20 minutes and then refuses the request.

Screenshot 2023-03-15 at 13 14 58

Screenshot 2023-03-15 at 13 15 04

@navba-MSFT navba-MSFT added Service Attention This issue is responsible by Azure service team. and removed CXP Attention labels Mar 17, 2023
@ghost

ghost commented Mar 17, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @armleads-azure.


@navba-MSFT

@armleads-azure Could you please look into this ? Thanks in advance.

CC @jennyhunter-msft @josephkwchan

@XiaofeiCao
Contributor

Thanks @wangwenbj , I was able to get the request log from your second screenshot. I believe it's a NetworkSecurityGroup query?

Strangely, the httpStatusCode is 404, which means the NSG is not deployed (or, less likely, the client sent the wrong URL)...

I couldn't locate the log from your first screenshot. Are they targeting the same networkSecurityGroup?

@XiaofeiCao
Contributor

Hi @wangwenbj , would you try replacing the sync call

Single.fromCallable(() -> azureResourceManager.deployments().checkExistence(resourceGroupName, nsgName))

with the async one below, and see if the exception is thrown again?

azureResourceManager.deployments().manager().serviceClient().getDeployments().checkExistenceAsync(resourceGroupName, nsgName)

Also, avoid any sync HTTP calls in a Reactor/RxJava chain, like the first code snippet (checkExistence's implementation is checkExistenceAsync().block()). I tried it in my repo and it got stuck:
https://github.com/XiaofeiCao/ioexception_repro/blob/5db3fbcb4c6b03196d0b56f8555c9fa7849210b7/src/test/java/com/azure/resourcemanager/repro/ioexception/test/undeliverable/BatchCreateResourceGroupTests.java#L107

@wangwenbj
Author

wangwenbj commented Mar 28, 2023 via email

@XiaofeiCao
Contributor

XiaofeiCao commented Mar 28, 2023

I see.

Wrapping a sync call into async is tricky in this case. If you are doing a simple sync call with no IO operations involved, e.g. getting a model's innerModel properties, you can safely do that.

But if IO operations are involved, I think you should always avoid it. In this case, checkDeploymentExists() is implemented as checkDeploymentExistsAsync().block(), which involves an HTTP invocation. You should always resort to an async variant if one is available.

Unfortunately, in this case we didn't provide an async variant in the convenience layer. You could use serviceClient-level code instead, which is

azureResourceManager.deployments().manager().serviceClient().getDeployments().checkExistenceAsync(resourceGroupName, nsgName)

Then wrap it using Single.fromPublisher.
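
Putting that together, a minimal sketch (resourceGroupName and nsgName are your own values, and this assumes the async variant returns a Mono<Boolean>, as the boolean result of the sync checkExistence suggests):

import io.reactivex.rxjava3.core.Single;
import reactor.core.publisher.Mono;

Mono<Boolean> existsMono = azureResourceManager
        .deployments()
        .manager()
        .serviceClient()
        .getDeployments()
        .checkExistenceAsync(resourceGroupName, nsgName);

// No block() anywhere in the chain; the HTTP call stays fully asynchronous.
Single<Boolean> exists = Single.fromPublisher(existsMono);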

@XiaofeiCao
Contributor

Hi, does the issue still persist?

You could also try

Single.fromCallable(() ->
        azureResourceManager
                .deployments()
                .checkExistence(resourceGroupName, nsgName))
        .subscribeOn(Schedulers.io());

@wangwenbj
Author

Hi Xiaofei,

So what you mean is that we could prevent this from happening if we did not throw an IOException? That's probably not easy to do. Let me describe our usage:

  1. We added an exception that extends IOException at the OkHttpClient interceptor level to avoid hitting the Azure quota limit (this is not the root cause of this issue, but it is an IOException). We use an RxJava wrapper to retry on that exception, which is why we need the async/sync transformation. Do you have any suggestions for this implementation, since once this exception is thrown, an IOException is unavoidable?
  2. We are trying to modify the usages that have been identified; we need some more time for testing.

The issue originally identified here could be caused by requests for creating, getting, and maybe updating resources. Before we rolled back to the old SDK, it happened in many places, so my guess is that it could be a framework-level issue or in some common area. With the same implementation on the old Azure SDK, no similar issue has happened since. I hope this helps identify the real root cause.

Also, I will keep trying and keep you updated on any progress. Thanks!

@waynewang1989

@XiaofeiCao We just encountered another issue which might be related to this one.
We are trying to re-enable the new Azure SDK in our production environment, and when we try to call this API:

azure.resourceGroups().getByName(name);

We have several threads hanging on this API call. Please take some time to check. The SDK version is <com.azure.resourcemanager.version>2.26.0</com.azure.resourcemanager.version>.
We are opening an SR ticket with the Azure support team in the meantime. Please let us know if there's anything you need from the backend service.
Thanks!

@XiaofeiCao
Contributor

Thanks @waynewang1989 for reporting!

To clarify, does the thread hang forever, or does it terminate after some time with a failure (similar to the 20-minute behavior seen before)?

If the latter, is the error stack trace also similar to before?

@waynewang1989

@XiaofeiCao,
It seems to hang forever. We have an upper-layer timeout reported. From the logs, no REST call to Azure is happening via the configured OkHttpClient (as the client of HttpClient).
And this is a sync call.
