Fix a race condition in GCE’s list_nodes() #727
    for i in v.get('instances', []):
        try:
            list_nodes.append(self._to_node(i))
        except ResourceNotFoundError:
I think silently swallowing the exception justifies a comment :-)
Thanks @lhuard1A for the contribution. Could you please add a comment about the swallowed exception? Then I can merge this.
Invoking GCE’s `list_nodes()` while some VMs are being shut down can cause the following exception to be raised from `list_nodes()`:

```
  File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 1411, in list_nodes
    v.get('instances', [])]
  File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 5065, in _to_node
    extra['boot_disk'] = self.ex_get_volume(bd['name'], bd['zone'])
  File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 3982, in ex_get_volume
    response = self.connection.request(request, method='GET').object
  File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 684, in request
    *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 736, in request
    response = responseCls(**kwargs)
  File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 119, in __init__
    self.object = self.parse_body()
  File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 259, in parse_body
    raise ResourceNotFoundError(message, self.status, code)
libcloud.common.google.ResourceNotFoundError: {'domain': 'global', 'message': "The resource 'projects/lenaic/zones/europe-west1-c/disks/devops-reg' was not found", 'reason': 'notFound'}
```

The above error occurred while the `devops-reg` machine was being deleted.

The issue occurs when the following events happen in this order:

* [`list_nodes()` sends a request to list all the instances.](https://github.com/apache/libcloud/blob/trunk/libcloud/compute/drivers/gce.py#L1622) At this point, the `devops-reg` instance still existed.
* The `devops-reg` instance is deleted.
* `list_nodes()` calls `_to_node`, which calls [`ex_get_volume`, which attempts to retrieve information about the volumes](https://github.com/apache/libcloud/blob/trunk/libcloud/compute/drivers/gce.py#L4235). But because the instance has been deleted since it was listed, `ex_get_volume` raises a `ResourceNotFoundError` exception.
When this happens, we should simply discard the node that was deleted during the execution of `list_nodes()` and return the information about the other nodes.
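The proposed behaviour can be sketched outside of libcloud as follows. This is only an illustration under stated assumptions: `FakeGCEDriver`, the local `ResourceNotFoundError` stand-in, the `web-1` instance, and the one-argument `ex_get_volume` are hypothetical simplifications, not the driver's actual code. The `disks` set plays the role of the state after `devops-reg` was deleted, while `instances` is the stale listing taken before the deletion.

```python
class ResourceNotFoundError(Exception):
    """Local stand-in for libcloud.common.google.ResourceNotFoundError."""


class FakeGCEDriver:
    """Hypothetical in-memory driver that mimics the race; not libcloud code."""

    def __init__(self, instances, disks):
        self._instances = list(instances)  # stale snapshot from the listing
        self._disks = set(disks)           # disks that still exist right now

    def ex_get_volume(self, name):
        # Simplified lookup: fails once the disk has been deleted.
        if name not in self._disks:
            raise ResourceNotFoundError("The resource %r was not found" % name)
        return {"name": name}

    def _to_node(self, instance):
        # Building a node needs a second lookup, which is where the race bites.
        return {
            "name": instance["name"],
            "boot_disk": self.ex_get_volume(instance["boot_disk"]),
        }

    def list_nodes(self):
        nodes = []
        for instance in self._instances:
            try:
                nodes.append(self._to_node(instance))
            except ResourceNotFoundError:
                # The instance was deleted after it was listed but before its
                # details were fetched, so skip it instead of failing the call.
                continue
        return nodes


driver = FakeGCEDriver(
    instances=[
        {"name": "web-1", "boot_disk": "web-1"},
        {"name": "devops-reg", "boot_disk": "devops-reg"},
    ],
    disks={"web-1"},  # the devops-reg disk vanished after the listing
)
names = [node["name"] for node in driver.list_nodes()]
print(names)  # prints ['web-1']
```

Catching only `ResourceNotFoundError`, rather than a broad `Exception`, keeps unrelated failures such as authentication or quota errors visible to the caller.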
f49cea8 to 6d2b3cf
Thanks for the review @tonybaloney.
LGTM 👍
Signed-off-by: anthony-shaw <anthony.p.shaw@gmail.com>