Delete failure "the number of retries has been exceeded: StatusCode=404" #596
Comments
Sorry for the delay. I looked into this a bit. If you attempt to delete a namespace that doesn't exist, the service returns a 204 initial response, so it's nothing that simple, unfortunately. In the normal success case, the initial response is a 202. Polling on the endpoint in the I thought perhaps it's a race condition, i.e. two goroutines attempting to delete the same namespace, however in my repro one of the calls to Other possibility is it's a bug in the endpoint itself.
Do I understand that you can't repro this? It takes a while to get the error for us but we get it periodically. I ran the test 30 times in parallel (for different namespaces in different resource groups) and got one error, for example.
In the problematic case, the initial response is 202 for us and 404 happens later in the polling loop.
No, it doesn't, it issues one deletion command with awaiting and then gives up (we delete all stuck resources nightly but that's irrelevant).
I found this which suggests this might be the case. Do you have a backchannel to the service teams? That line has been in the code for 3 years... Still, it would be nice to handle this case on the go-autorest side. Thank you for looking into this!
Correct, I wasn't able to repro this. I've updated my test app to create/delete the namespace in succession 50 times; let's see what happens. In what region are you seeing this happen?
Running in a loop, I was able to repro the issue. I will follow up with the service team to find out more.
The service team investigated the issue; it's a bug on their side. They will work on deploying a fix, but in the meantime you will need to work around the behavior. While you could write your own LRO polling loop, it might be simpler to check the status code along with a non-nil error.

    if err != nil && d.Response().StatusCode != http.StatusNotFound {
        // real error, not due to spurious 404. handle appropriately
    }
Thank you @jhendrixMSFT! We did exactly this as a workaround. Hoping to see the fix on the service side.
Glad that's working. Given this isn't a bug in the SDK, I'm going to close this issue.
We have a test in CI/CD which, as part of the cleanup, calls the Event Hub namespace DELETE operation. The operation is marked with x-ms-long-running-operation, so we call WaitForCompletion on the initial response.
Sometimes, but not always, WaitForCompletion fails (returns an error) with
It would seem that a 404 on a DELETE operation is actually exactly what we need: this means the resource is successfully deleted. However, I'm not sure how exactly the awaiting goes sideways. I logged the initial response and it's a
Any idea what goes wrong here or how I can work around this behavior?