
Limit number of retries in Azure ARM destroy_node. #1134

Closed
wants to merge 5 commits

Conversation

@tetron
Contributor

commented Oct 16, 2017

Limit number of retries in Azure ARM destroy_node.

Description

Azure ARM has several retry loops related to deleting the NIC and VHD as part of node cleanup. This bounds those loops so they don't retry forever.

Status

  • done, ready for review

Checklist (tick everything that applies)

  • Code linting (required, can be done after the PR checks)
  • Documentation
  • ICLA (required for bigger changes)
@pquentin
Contributor

left a comment

Thanks, the change makes sense! There's a small docstring change needed. Also, the lint does not pass.

A possible future improvement is to use libcloud.utils.misc.retry, which would reduce verbosity here and allow adding exponential back-off.
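For illustration, the kind of bounded retry with exponential back-off suggested above could look like the sketch below. The helper name, parameters, and defaults are hypothetical stand-ins, not libcloud.utils.misc.retry's actual API:

```python
import time


def retry_with_backoff(check, max_retries=10, base_delay=1.0, backoff=2.0):
    """Hypothetical helper: call check() up to max_retries times.

    Returns True as soon as check() succeeds. Sleeps between attempts
    with an exponentially growing delay (base_delay, then
    base_delay * backoff, and so on). Returns False once all retries
    are exhausted, so the loop can never spin forever.
    """
    delay = base_delay
    for _ in range(max_retries):
        if check():
            return True
        time.sleep(delay)
        delay *= backoff
    return False
```

With a helper of this shape, each of the polling loops in destroy_node would collapse to a single call such as `retry_with_backoff(node_is_gone, max_retries=ex_retries)`.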

@@ -706,6 +709,10 @@ def destroy_node(self, node, ex_destroy_nic=True, ex_destroy_vhd=True):
             this node (default True).
         :type node: ``bool``
+        :param ex_retries: Number of times to retry checking if the node is gone,
+            destroying the NIC or destroying the VHD.

@pquentin
Contributor

commented Oct 17, 2017

Please mention the default value.

@pquentin
Contributor

commented Oct 17, 2017

Now that I think more about it, I think ex_retries is misleading because it suggests that we retry destroying, but we only check if it's gone multiple times. Do you have an idea for a better name than ex_retries that would convey this?

@@ -733,12 +740,14 @@ def destroy_node(self, node, ex_destroy_nic=True, ex_destroy_vhd=True):

         # Poll until the node actually goes away (otherwise attempt to delete
         # NIC and VHD will fail with "resource in use" errors).
-        while do_node_polling:
+        retries = ex_retries
+        while do_node_polling and retries > 0:
             try:
                 time.sleep(10)

@pquentin
Contributor

commented Oct 17, 2017

I was wondering: in your experience, how much time does it take for a node to go away?

@tetron
Author, Contributor

commented Oct 20, 2017

A couple of minutes. The initial delete request is asynchronous (it responds with "202 Accepted"), which is why we have an idle loop waiting for the node to really go away: we can't delete the other resources (the NIC and VHD) until their leases are released.

@pquentin
Contributor

commented Oct 20, 2017

Sure, thanks for the answer. In this case the initial sleep of 10 seconds is acceptable; otherwise it would have made more sense to poll before sleeping.
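The poll-before-sleep ordering mentioned here can be sketched as follows; wait_until_gone and is_node_gone are illustrative names, not the driver's actual code:

```python
import time


def wait_until_gone(is_node_gone, retries=10, delay=10):
    """Illustrative sketch: poll first, then sleep between attempts.

    Checking before the first sleep avoids a needless initial delay
    when the resource is already gone, and the loop is bounded by
    retries so it cannot spin forever.
    """
    for _ in range(retries):
        if is_node_gone():
            return True
        time.sleep(delay)
    return False
```

By contrast, the loop in the PR sleeps before each check, which costs an extra 10 seconds in the best case but is acceptable given that node deletion typically takes minutes.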

@pquentin

Contributor

commented Oct 31, 2017

@tetron Any progress?

@ldipenti

Contributor

commented Oct 31, 2017

@pquentin I've renamed ex_retries as requested and added a new param to control the polling delay.

@codecov-io


commented Oct 31, 2017

Codecov Report

Merging #1134 into trunk will increase coverage by <.01%.
The diff coverage is 75.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##            trunk    #1134      +/-   ##
==========================================
+ Coverage   85.49%   85.49%   +<.01%     
==========================================
  Files         348      348              
  Lines       66499    66527      +28     
  Branches     5921     5922       +1     
==========================================
+ Hits        56852    56876      +24     
- Misses       7239     7243       +4     
  Partials     2408     2408
Impacted Files                              Coverage        Δ
libcloud/compute/drivers/azure_arm.py       51% <53.33%>    (+0.15%) ⬆️
libcloud/test/compute/test_azure_arm.py     99.33% <90.9%>  (-0.67%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4270962...1c4f533. Read the comment docs.

@pquentin

Contributor

commented Nov 6, 2017

@ldipenti Do you think you could test the retry behavior and fix the conflict from #1137 that I just merged? Thank you!

@asfgit asfgit closed this in 0c5bee8 Nov 8, 2017

@pquentin

Contributor

commented Nov 8, 2017

Thank you! Merged in trunk.

@ldipenti

Contributor

commented Nov 8, 2017

Great! Thanks.
