Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(deploy): retry gcloud ssh connections if it fails #5292

Merged
merged 1 commit into from
Sep 28, 2022

Conversation

gustavovalverde
Copy link
Member

Previous behavior

From time to time SSH connections to deployed VMs fails with the following error: kex_exchange_identification: Connection closed by remote host

Expected behavior

If the connection fails, attempt to reconnect once again (or multiple times)

Solution

Add the ConnectionAttempts and ConnectTimeout with 20 and 5 values respectively, which attempst to reconnect 19 more times every 5 seconds

Review

Anyone can review this

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
  • How do you know it works? Does it have tests?

Follow Up Work

If this issue keeps raising, even after the retries, we might want to apply some best SSH practices for Google Cloud (which we might want to apply anyways) as stated here https://cloud.google.com/compute/docs/instances/connecting-advanced

Previous behavior
From time to time SSH connections to deployed VMs fails with the following
error: `kex_exchange_identification: Connection closed by remote host`

Expected behavior
If the connection fails, attempt to reconnect once again (or multiple times)

Solution
Add the `ConnectionAttempts` and `ConnectTimeout` with 20 and 5 values
respectively, which attempst to reconnect 19 more times every 5 seconds
@gustavovalverde gustavovalverde added C-bug Category: This is a bug A-devops Area: Pipelines, CI/CD and Dockerfiles P-Critical 🚑 I-integration-fail Continuous integration fails, including build and test failures labels Sep 28, 2022
@gustavovalverde gustavovalverde requested a review from a team as a code owner September 28, 2022 15:35
@gustavovalverde gustavovalverde self-assigned this Sep 28, 2022
@gustavovalverde gustavovalverde requested review from dconnolly and removed request for a team September 28, 2022 15:35
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Sep 28, 2022
@teor2345 teor2345 added C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG and removed C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels Sep 28, 2022
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this seems like it might work, I guess we have to merge to find out.

Next time we need to change these parameters, let's put the settings in one place using a SSH config file or workflow environmental variables?
That way we don't have to modify the workflow in 10 places for the same change.

mergify bot added a commit that referenced this pull request Sep 28, 2022
@gustavovalverde
Copy link
Member Author

Next time we need to change these parameters, let's put the settings in one place using a SSH config file or workflow environmental variables?

Yeah, I think we should even see how to accomplish a DRY(er) approach on this workflows.

@mergify mergify bot merged commit aaad60d into main Sep 28, 2022
@mergify mergify bot deleted the retry-ssh-connection branch September 28, 2022 21:45
@teor2345
Copy link
Contributor

Next time we need to change these parameters, let's put the settings in one place using a SSH config file or workflow environmental variables?

Yeah, I think we should even see how to accomplish a DRY(er) approach on this workflows.

Another thing we can do is move workflow and Docker scripts into .sh files.

gustavovalverde added a commit that referenced this pull request Oct 4, 2022
Previous behavior:
From time to time SSH connections to deployed VMs fails with the following
error: `kex_exchange_identification: Connection closed by remote host`

This was still happening after implementing #5292

Excpected behavior:
Ensure we're not creating SSH key pairs on the fly to improve our connections
guarantees

Solution:
- Enable the Cloud Identity-Aware Proxy API in GCP
- Create a firewall rule to enable connections from IAP
- Grant the required IAM permissions to enable IAP TCP forwarding
- Generate an SSH keys pair and set a private key as an input param
- Set the GitHub Action SA to have authorized ssh connection to the VMs
- Implement the `google-github-actions/ssh-compute` action to connect
mergify bot pushed a commit that referenced this pull request Oct 5, 2022
* refactor(ssh): connect using `ssh-compute` action by Google

Previous behavior:
From time to time SSH connections to deployed VMs fails with the following
error: `kex_exchange_identification: Connection closed by remote host`

This was still happening after implementing #5292

Excpected behavior:
Ensure we're not creating SSH key pairs on the fly to improve our connections
guarantees

Solution:
- Enable the Cloud Identity-Aware Proxy API in GCP
- Create a firewall rule to enable connections from IAP
- Grant the required IAM permissions to enable IAP TCP forwarding
- Generate an SSH keys pair and set a private key as an input param
- Set the GitHub Action SA to have authorized ssh connection to the VMs
- Implement the `google-github-actions/ssh-compute` action to connect

* fix(ssh): id `compute-ssh` cannot be used more than once within the same scope

* fix(ci): try to enclose commands to override parsing issues

* tmp: remove ssh_args

* fix(action): secrets must be inherited to be used

* tmp: validate command enclosing fixes executin

* fix(ssh): ssh_args are not implemented correctly

* fix(ssh): login with the root user

* fix(privelege): uso sudo with docker commands

* tmp: add sudo

* fix(ssh): use sudo for all docker commands

* fix(ssh): add missing `sudo` commands

* fix(ssh): get sync height from ssh stdout

* fix(height): get the height correctly
@teor2345 teor2345 mentioned this pull request Oct 11, 2022
32 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG I-integration-fail Continuous integration fails, including build and test failures
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants