workflows: retry GCP VM creation up to 3 times #17068

Merged
aanm merged 1 commit into master from pr/workflows-external-workloads-vm-retry Aug 7, 2021

nbusseneau
Member

@nbusseneau nbusseneau commented Aug 6, 2021

Fixes an issue where the GCP VM fails to create:

Run gcloud compute instances create cilium-cilium-cli-1046557892-vm \
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - The resource 'projects/***/regions/us-west2/subnetworks/default' is not ready

According to the GCP documentation, this error is caused by simultaneous resource operations (https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-vm-creation); however, we have no control over that. Their only recommendation is to add retries.

Since this is an infrastructure issue and the retry is easily contained to the GCP VM creation step, this is acceptable.

Picked up from cilium/cilium-cli#463.

Fixes: #17063

@nbusseneau nbusseneau added the area/CI (Continuous Integration testing issue or flake) and release-note/ci (This PR makes changes to the CI.) labels Aug 6, 2021
@nbusseneau nbusseneau requested review from a team as code owners August 6, 2021 13:03
@nbusseneau nbusseneau requested a review from aanm August 6, 2021 13:03
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
@nbusseneau
Member Author

Link to test run of workflow changes: https://github.com/cilium/cilium/actions/runs/1105157552
In this instance, we did not need to retry. For an example of how it behaves on retry, see this run from cilium-cli: https://github.com/cilium/cilium-cli/actions/runs/1102850699

  • Adds warnings in the logs:
Run nick-invision/retry@45ba062d357edb3b29c4a94b456b188716f61020

WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - The resource 'projects/***/regions/us-west2/subnetworks/default' is not ready

Warning: Attempt 1 failed. Reason: Child_process exited with error code 1

WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - The resource 'projects/***/regions/us-west2/subnetworks/default' is not ready

Warning: Attempt 2 failed. Reason: Child_process exited with error code 1

WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/***/zones/us-west2-a/instances/cilium-cilium-cli-1102850699-vm].
NAME                             ZONE        MACHINE_TYPE                   PREEMPTIBLE  INTERNAL_IP   EXTERNAL_IP     STATUS
cilium-cilium-cli-1102850699-vm  us-west2-a  custom (e2, 2 vCPU, 4.00 GiB)  true         10.168.0.109  34.102.118.172  RUNNING
Command completed after 3 attempt(s).
  • And also adds warnings directly on the workflow run page.
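
The retry behaviour visible in the log above can be approximated in plain shell. This is only a sketch of the general technique, not the `nick-invision/retry` action the workflow actually uses: the `retry` function, its argument order, and the wait time in the usage comment are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): run a command up to
# MAX_ATTEMPTS times, waiting WAIT_SECONDS between failed attempts.
# Usage: retry MAX_ATTEMPTS WAIT_SECONDS COMMAND [ARGS...]
retry() {
  local max_attempts="$1" wait_seconds="$2"
  shift 2
  local attempt=1 status

  while true; do
    # Capture the exit code without tripping `set -e` in the caller.
    status=0
    "$@" || status=$?

    if [ "$status" -eq 0 ]; then
      echo "Command completed after ${attempt} attempt(s)."
      return 0
    fi

    echo "Warning: Attempt ${attempt} failed. Reason: exit code ${status}" >&2

    # Give up once the attempt budget is spent, propagating the last error.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Error: giving up after ${attempt} attempt(s)." >&2
      return "$status"
    fi

    attempt=$((attempt + 1))
    sleep "$wait_seconds"
  done
}

# Example invocation (assumed values; remaining gcloud flags omitted):
# retry 3 30 gcloud compute instances create "$VM_NAME"
```

Keeping the loop around the single VM creation command matches the reasoning in the description: the flake is confined to one step, so only that step is repeated, and a persistent failure still fails the job with the original exit code.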

@nbusseneau
Member Author

Removing temporary test commit.

@nbusseneau nbusseneau force-pushed the pr/workflows-external-workloads-vm-retry branch from 70acc51 to c0414ea August 6, 2021 13:16
@nbusseneau nbusseneau added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label Aug 6, 2021
@nbusseneau
Member Author

Marking as ready-to-merge since all reviews are in and this PR does not need to run any additional CI.

@aanm aanm merged commit 7ed317f into master Aug 7, 2021
@aanm aanm deleted the pr/workflows-external-workloads-vm-retry branch August 7, 2021 00:08
Development

Successfully merging this pull request may close these issues.

CI: External workloads (ci-external-workloads) installation-and-connectivity: Create GCP VM failed