com.microsoft.azure.CloudException: A retryable error occurred. #1924

menih · 2017-09-26T04:17:18Z

This happens when creating a load balancer. A second call works just fine. It seems to be some sort of concurrency, timing, or resource-readiness constraint. Do any of the arguments for loadBalancers().define() have a state requirement? We're creating the resource group right before LB creation.

menih · 2017-09-26T04:29:19Z

And we also create new security groups that are being attached to the network associated with the LB.

martinsawicki · 2017-09-27T12:13:03Z

we'd need to see a repro code sample.

menih · 2017-09-27T15:20:48Z

There is not much to show... just a plain define()...create() call...

azure.loadBalancers().define(name)
    .withRegion(region)
    .withExistingResourceGroup(rgName)
    .defineLoadBalancingRule(LOAD_BALANCER_RULE)
        .withProtocol(TransportProtocol.TCP)
        .fromExistingSubnet(getVirtualNetwork(vNetId), subnetName)
        .fromFrontendPort(LOAB_BALANCER_PORT)
        .toBackend(LOAD_BALANCER_BACKEND_POOL)
        .withProbe(LOAD_BALANCER_HEALTH_PROBE)
        .attach()
    .defineHttpProbe(LOAD_BALANCER_HEALTH_PROBE)
        .withRequestPath("/")
        .withPort(LOAB_BALANCER_PORT)
        .attach()
    .create()

menih · 2017-09-28T06:55:24Z

It happens intermittently. And the question is really more about what can cause this kind of exception...?


martinsawicki · 2017-10-04T22:46:38Z

Yep, there is nothing unique about the sample code shown here; it looks much like our test cases and samples. I haven't seen this error in this specific context, but the best guess right now is that there may be some transient issue in Azure. In general, SDK issues don't manifest themselves as retryable errors from the service, but more commonly as very consistent NPEs or the like. That is especially true in cases like this, where the code sample shown doesn't do anything concurrently: it should result in just a single, simple REST call to Azure networking.

Another possibility is that your app is getting throttled because too many requests are going into Azure within a short time window. Just a guess: it would not be related to this API specifically, but maybe there are a bunch of other things your app is doing concurrently that could trigger throttling on the service side?
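If the failure really is transient or caused by throttling, one common client-side mitigation is to retry the call with an increasing delay. The following is only a minimal sketch of that idea, not something the SDK provides: `RetrySketch`, `withRetry`, and the retry limits are hypothetical names and values chosen for illustration, and the lambda in `main` is a stand-in for the `create()` call from this thread. In a real application the predicate would presumably check for `com.microsoft.azure.CloudException`; that check is an assumption and is not shown here so that the example stays self-contained.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not part of the Azure SDK. */
final class RetrySketch {

    private RetrySketch() { }

    /**
     * Runs the operation and retries it while isRetryable accepts the failure.
     * The delay doubles after every failed attempt (exponential backoff).
     */
    static <T> T withRetry(Callable<T> operation,
                           Predicate<Exception> isRetryable,
                           int maxAttempts,
                           long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Give up on the last attempt or when the failure is not retryable.
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for the create() call: fails on the first attempt, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 2) {
                throw new IllegalStateException("A retryable error occurred.");
            }
            return "created";
        }, e -> e instanceof IllegalStateException, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Keeping the retry decision in a predicate means non-retryable failures (for example, validation errors) still surface immediately instead of being retried.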

menih · 2017-10-05T03:03:44Z

Whatever the reason may be, Microsoft Azure needs to take a very close, fundamental look at their architecture and address it. We're an ISV providing products in the Azure environment, and we need to give customers a lot of explanation about the reasons for all these failures. Another, unrelated issue is the different responses from the get() and list() APIs: the IDs of objects are not the same (!). One is cased, the other lower-cased. This applies to many APIs, namely VMs, Route Tables, Subnets, VNETs, etc.


martinsawicki · 2017-10-11T22:35:17Z

yes, resource ID casings in Azure REST APIs may vary for the same resource, depending on how they are fetched. Strictly speaking, resource IDs are case-insensitive from the point of view of the service (therefore comparisons should ideally be done in case-insensitive ways), but ideally the casing should be consistent and preserved. This is a known issue. We've been advised not to try to work around that on the SDK side, but rely on the Azure backend addressing this issue eventually -- it's being looked at...
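Until the casing is made consistent, client code can treat resource IDs as case-insensitive, as described above. Here is a minimal sketch of that approach; `ResourceIds`, `sameResource`, and `normalize` are hypothetical helper names, and the IDs in `main` are made-up examples rather than values from this thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Azure SDK. */
final class ResourceIds {

    private ResourceIds() { }

    /** Compares two resource IDs while ignoring casing differences. */
    static boolean sameResource(String idA, String idB) {
        return idA != null && idA.equalsIgnoreCase(idB);
    }

    /** Produces a stable, lower-cased form that is safe to use as a map or set key. */
    static String normalize(String id) {
        return id.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Made-up IDs that differ only in casing, as get() and list() results might.
        String fromGet = "/subscriptions/0000/resourceGroups/MyGroup"
                + "/providers/Microsoft.Network/virtualNetworks/MyVNet";
        String fromList = "/subscriptions/0000/resourcegroups/mygroup"
                + "/providers/microsoft.network/virtualnetworks/myvnet";

        System.out.println(sameResource(fromGet, fromList));

        // Normalize on both write and read so lookups do not depend on casing.
        Map<String, String> namesById = new HashMap<>();
        namesById.put(normalize(fromGet), "MyVNet");
        System.out.println(namesById.get(normalize(fromList)));
    }
}
```

Normalizing with `Locale.ROOT` avoids locale-specific lower-casing surprises, and normalizing only for comparison leaves the originally returned ID untouched for display or for passing back to the service.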

Btw, in case of such intermittent issues with backend services, rather than the SDK, which looks like the most likely case here, I'd recommend submitting them to Azure Support, since they could investigate a specific failure in a specific subscription based on service logs, and could get a deeper understanding of the root cause.

menih · 2017-10-12T00:27:51Z

Not ideally... but literally... It would be good to at least document this. Other side comments on the design: 1) It is uncommon to use user-defined names in IDs; it suggests the names are immutable. 2) It is uncommon for a unique object identifier to be case-insensitive. I think Microsoft Azure should probably reconsider and use something like a UUID, which is most common in the industry. This would kill two birds with one stone...


vhvb1989 · 2019-09-20T17:36:21Z

@martinsawicki , was this fixed on #1971? can we close this issue as well?

joshfree · 2019-10-01T23:53:38Z

Closing out (several year) old issues in the repo

Gagan059 · 2020-01-05T15:43:15Z

I observed a different flavor of this issue today. The load balancer was deployed already through the portal, and this error showed because the backend pool was still being deployed. I had to wait for it to finish before I could create the health probe without this problem.
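The workaround described above (waiting for a dependent resource to finish deploying before making the next change) can be automated with a simple polling loop. This is a rough sketch under stated assumptions: `ProvisioningWait` and `waitUntilSucceeded` are hypothetical names, the state is supplied by the caller through a `Supplier`, and how that state is read from the SDK or REST API is deliberately left out. It assumes the resource reports a provisioning state that eventually becomes `Succeeded`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of the Azure SDK. */
final class ProvisioningWait {

    private ProvisioningWait() { }

    /**
     * Polls the supplied state until it reads "Succeeded" or the poll budget runs out.
     * Returns true if the state was reached, false otherwise.
     */
    static boolean waitUntilSucceeded(Supplier<String> stateSupplier,
                                      int maxPolls,
                                      long pollIntervalMillis) throws InterruptedException {
        for (int poll = 0; poll < maxPolls; poll++) {
            if ("Succeeded".equalsIgnoreCase(stateSupplier.get())) {
                return true;
            }
            // Sleep only if another poll will follow.
            if (poll < maxPolls - 1) {
                Thread.sleep(pollIntervalMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for repeatedly re-reading the resource: two in-progress reads, then done.
        Iterator<String> states = Arrays.asList("Updating", "Updating", "Succeeded").iterator();
        boolean ready = waitUntilSucceeded(states::next, 5, 10L);
        System.out.println(ready ? "safe to continue" : "still deploying");
    }
}
```

A bounded number of polls keeps the caller from waiting forever if the deployment fails or stalls; in that case the method returns false and the caller can decide how to proceed.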

martinsawicki self-assigned this Oct 4, 2017

martinsawicki mentioned this issue Nov 14, 2017 in "java.lang.IllegalArgumentException: Parameter resourceGroupName is required and cannot be null. #1971" (Closed)

praries880 added the Network - Load Balancer label Nov 14, 2018

praries880 unassigned martinsawicki Nov 14, 2018

joshfree added the Mgmt This issue is related to a management-plane library. label Oct 1, 2019

joshfree closed this as completed Oct 1, 2019

github-actions bot locked and limited conversation to collaborators Apr 13, 2023
