com.microsoft.azure.CloudException: A retryable error occurred. #1924

Closed
menih opened this issue Sep 26, 2017 · 11 comments
Labels
Mgmt This issue is related to a management-plane library. Network - Load Balancer

Comments

@menih

menih commented Sep 26, 2017

This happens when creating a load balancer. A second call works just fine. It seems like some sort of concurrency, timing, or resource-readiness constraint. Do any of the arguments for loadBalancers().define() have state requirements? We're creating the resource group right before creating the LB.

@menih
Author

menih commented Sep 26, 2017

And we also create new security groups that are being attached to the network associated with the LB.

@martinsawicki

We'd need to see a code sample that reproduces this.

@menih
Author

menih commented Sep 27, 2017

There's not much to show, just a plain define()...create() call:

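```java
import com.microsoft.azure.management.network.LoadBalancer;
import com.microsoft.azure.management.network.TransportProtocol;

LoadBalancer lb = azure.loadBalancers().define(name).withRegion(region).withExistingResourceGroup(rgName)
    // TCP rule: frontend on an existing subnet -> backend pool, health-checked by the probe below
    .defineLoadBalancingRule(LOAD_BALANCER_RULE).withProtocol(TransportProtocol.TCP)
    .fromExistingSubnet(getVirtualNetwork(vNetId), subnetName).fromFrontendPort(LOAB_BALANCER_PORT)
    .toBackend(LOAD_BALANCER_BACKEND_POOL).withProbe(LOAD_BALANCER_HEALTH_PROBE).attach()
    // HTTP probe referenced by name in the rule above
    .defineHttpProbe(LOAD_BALANCER_HEALTH_PROBE).withRequestPath("/").withPort(LOAB_BALANCER_PORT).attach()
    .create();
```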

@menih
Author

menih commented Sep 28, 2017 via email

@martinsawicki martinsawicki self-assigned this Oct 4, 2017
@martinsawicki

Yep, there is nothing unique about the sample code shown here; it looks much like our test cases and samples. I haven't seen this error in this specific context, but the best guess right now is that there may be some transient issue in Azure. In general, SDK issues don't manifest themselves as retryable errors from the service, but more commonly as very consistent NPEs or the like. That is especially true in cases like this, where the code sample doesn't do anything concurrently: it should result in just a single, simple REST call to Azure networking. Another possibility is that your app is getting throttled because too many requests are going into Azure within a short time window. Just a guess: it would not be related to this API specifically, but maybe there are a bunch of other things your app is doing concurrently that could trigger throttling on the service side?
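If the failure really is transient or throttling-related, as suggested above, a common client-side mitigation is to retry the call with exponential backoff. The sketch below is a generic helper, not an Azure SDK API; the class name `RetryWithBackoff`, the attempt count, and the delays are illustrative assumptions.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/**
 * Minimal retry helper with exponential backoff (a sketch, not part of the Azure SDK).
 * The caller decides which exceptions count as transient via isRetryable.
 */
final class RetryWithBackoff {
    private RetryWithBackoff() {}

    static <T> T call(Callable<T> action,
                      Predicate<Exception> isRetryable,
                      int maxAttempts,
                      long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on non-transient errors, or once the attempt budget is spent.
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2; // double the wait after each failed attempt
            }
        }
    }
}
```

For the repro above, the whole `azure.loadBalancers().define(name)...create()` chain could be passed in as the `Callable`, with a predicate such as `e -> e instanceof CloudException`. That filter is deliberately broad; narrowing it to the specific retryable error you actually see would be safer. Throttled Azure Resource Manager responses typically carry a `Retry-After` header, which a fuller implementation would honor instead of a fixed doubling schedule.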

@menih
Author

menih commented Oct 5, 2017 via email

@martinsawicki

Yes, resource ID casing in Azure REST APIs may vary for the same resource, depending on how the resource is fetched. Strictly speaking, resource IDs are case-insensitive from the service's point of view (so comparisons should ideally be done case-insensitively), but ideally the casing should be consistent and preserved. This is a known issue. We've been advised not to try to work around it on the SDK side, but to rely on the Azure backend addressing it eventually; it's being looked at.
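Until the casing is consistent on the service side, client code that compares resource IDs can do so case-insensitively, as recommended above. A minimal sketch; the `ResourceIds` helper and its method names are hypothetical, not SDK APIs:

```java
import java.util.Locale;

/** Sketch: compare Azure resource IDs case-insensitively (helper names are hypothetical). */
final class ResourceIds {
    private ResourceIds() {}

    /** True if both IDs refer to the same resource, ignoring differences in casing. */
    static boolean sameResource(String a, String b) {
        return a != null && a.equalsIgnoreCase(b);
    }

    /** Canonical form for use as a map or set key; Locale.ROOT avoids locale-specific case rules. */
    static String normalize(String id) {
        return id.toLowerCase(Locale.ROOT);
    }
}
```

Normalizing with `Locale.ROOT` keeps the result independent of the JVM's default locale (the classic pitfall being the Turkish dotted and dotless i).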

Btw, for intermittent issues like this that stem from the backend services rather than the SDK (which looks like the most likely case here), I'd recommend submitting them to Azure Support, since they can investigate a specific failure in a specific subscription using service logs and get a deeper understanding of the root cause.

@menih
Author

menih commented Oct 12, 2017 via email

@vhvb1989
Member

@martinsawicki, was this fixed in #1971? Can we close this issue as well?

@joshfree joshfree added the Mgmt This issue is related to a management-plane library. label Oct 1, 2019
@joshfree
Member

joshfree commented Oct 1, 2019

Closing out several-year-old issues in the repo.

@joshfree joshfree closed this as completed Oct 1, 2019
@Gagan059

Gagan059 commented Jan 5, 2020

I observed a different flavor of this issue today. The load balancer had already been deployed through the portal, and this error appeared because the backend pool was still being deployed. I had to wait for that deployment to finish before I could create the health probe without hitting this error.
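The workaround described here (waiting for the resource to finish provisioning before sending the next change) can be sketched as a small polling loop. This assumes you can read the resource's provisioning state through some callback; Azure networking resources report states such as `Succeeded`, `Updating`, and `Failed`. `ProvisioningWait`, the `provisioningState` supplier, and the poll settings are hypothetical, and how you actually fetch the state (for example by re-reading the load balancer) depends on your SDK version.

```java
import java.time.Duration;
import java.util.function.Supplier;

/**
 * Sketch: poll a provisioning state until it reports "Succeeded" before issuing
 * the next update (e.g. adding a health probe). The supplier is hypothetical.
 */
final class ProvisioningWait {
    private ProvisioningWait() {}

    /** Returns true once the state is Succeeded; false if it Failed or maxPolls ran out. */
    static boolean waitUntilSucceeded(Supplier<String> provisioningState,
                                      Duration pollInterval,
                                      int maxPolls) throws InterruptedException {
        for (int poll = 1; poll <= maxPolls; poll++) {
            String state = provisioningState.get();
            if ("Succeeded".equalsIgnoreCase(state)) {
                return true; // resource is ready for the next operation
            }
            if ("Failed".equalsIgnoreCase(state)) {
                return false; // the deployment itself failed; waiting longer won't help
            }
            if (poll < maxPolls) {
                Thread.sleep(pollInterval.toMillis()); // still Creating/Updating, wait and re-check
            }
        }
        return false; // timed out while the resource was still being provisioned
    }
}
```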

@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2023

6 participants