Documentation of Process to Upgrade AMI for AWS Deployment #7680

Merged 7 commits on Jan 27, 2023
@@ -158,3 +158,40 @@ While running the restore command, if it prompts any error, follow the steps given below:
- Also check the hab svc status in the Automate node by running `hab svc status`.
- If the deployment-service is not healthy, reload it using `hab svc load chef/deployment-service`.
- Now check the status of the Automate node, and then try running the restore command from the bastion.
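For reference, a healthy node shows the deployment-service with state `up`; an illustrative (not exact) `hab svc status` output:

```sh
hab svc status
# Illustrative output; the deployment-service line should report state "up":
# package                                       type        desired  state  elapsed (s)  pid   group
# chef/deployment-service/0.1.0/20230101000000  standalone  up       up     86400        3130  deployment-service.default
```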

For **Disaster Recovery or AMI upgrade**, while running the restore in a secondary cluster that is in a different region, follow the steps given below.

- Make a curl request in any OpenSearch node: `curl -XGET https://localhost:9200/_snapshot?pretty --cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem --key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem --cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem -k`
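For reference, the response lists each registered snapshot repository with its settings; the `region` value is what must match the primary cluster. An illustrative, abridged response with placeholder bucket and region values:

```json
{
  "chef-automate-es6-compliance-service" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "<PRIMARY-CLUSTER-BUCKET-NAME>",
      "base_path" : "elasticsearch/automate-elasticsearch-data/chef-automate-es6-compliance-service",
      "region" : "<PRIMARY-CLUSTER-REGION>"
    }
  }
}
```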
- Check the curl request's response. If the region does not match the primary cluster's region, follow the below steps:
1. Modify the region in the frontend nodes by patching the below config with the command `chef-automate config patch <file-name>.toml --fe`:

```toml
[global.v1.external.opensearch.backup.s3.settings]
region = "<FIRST-CLUSTER-REGION>"
```

2. Make a PUT request in an OpenSearch node by running this script:

```bash
# Re-register each snapshot repository against the primary cluster's bucket and region
indices=(
  chef-automate-es6-automate-cs-oc-erchef
  chef-automate-es6-compliance-service
  chef-automate-es6-event-feed-service
  chef-automate-es6-ingest-service
)
for index in "${indices[@]}"; do
  curl -XPUT -k -H 'Content-Type: application/json' "https://<IP>:9200/_snapshot/$index" --data-binary @- << EOF
{
  "type" : "s3",
  "settings" : {
    "bucket" : "<YOUR-PRIMARY-CLUSTER-BUCKET-NAME>",
    "base_path" : "elasticsearch/automate-elasticsearch-data/$index",
    "region" : "<YOUR-PRIMARY-CLUSTER-REGION>",
    "role_arn" : " ",
    "compress" : "false"
  }
}
EOF
done
```
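To confirm the repositories were re-registered with the correct region, the same `_snapshot` endpoint can be queried again; a sketch for a single repository, reusing the certificate paths from the earlier curl request:

```bash
# The "region" in the response should now match the primary cluster
curl -XGET "https://localhost:9200/_snapshot/chef-automate-es6-ingest-service?pretty" \
  --cacert /hab/svc/automate-ha-opensearch/config/certificates/root-ca.pem \
  --key /hab/svc/automate-ha-opensearch/config/certificates/admin-key.pem \
  --cert /hab/svc/automate-ha-opensearch/config/certificates/admin.pem -k
```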
@@ -138,6 +138,7 @@ When the primary cluster fails, accomplish the failover by updating DNS records
```sh
sudo chef-automate backup restore <backup-url-to-object-storage>/automate/$id/ --patch-config /path/to/current_config.toml --airgap-bundle /var/tmp/frontend-4.x.y.aib --skip-preflight --s3-access-key "Access_Key" --s3-secret-key "Secret_Key"
```

If the restore is unsuccessful, check [Troubleshooting](/automate/ha_backup_restore_aws_s3/#troubleshooting).

### Switch to Disaster Recovery Cluster

Steps to switch to the disaster recovery cluster are as follows:
@@ -79,4 +79,92 @@ We can also pass a flag in the upgrade command to avoid the prompt for workspace upgrade.
```sh
chef-automate upgrade run --airgap-bundle latest.aib --auto-approve --workspace-upgrade no
```

{{< note >}}

AMI upgrade applies only to AWS deployment; in an On-Premise Deployment, all the resources are managed by the customers themselves.

{{< /note >}}

## AMI Upgrade Setup For AWS Deployment

{{< note >}}

In the following section, the old cluster with older AMI images is referred to as the **Primary Cluster**, and the cluster with the upgraded AMI is referred to as the **New Cluster**.

{{< /note >}}

{{< note >}}

The AWS deployment should be configured with S3. Both the Primary and the New cluster should be configured with the same S3 bucket.

{{< /note >}}

### Steps to set up the AMI Upgraded Cluster

1. Deploy the New cluster into the same or a different region with the S3 backup configuration. You can refer to the [AWS Deployment steps](/automate/ha_aws_deploy_steps/#deployment).

2. Do the backup configuration only if you have not provided the backup configuration at the time of deployment. Refer to the backup section for the [S3 configuration](/automate/ha_backup_restore_aws_s3/#configuration-in-provision-host); a minimal sketch follows below.
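As a sketch of that configuration (the linked section is authoritative; the bucket name and endpoint below are placeholders), the patch applied to both clusters points at the same bucket:

```toml
[global.v1.backups]
  location = "s3"

[global.v1.backups.s3.bucket]
  # Must be the same bucket on the Primary and the New cluster
  name = "<YOUR-S3-BUCKET-NAME>"
  endpoint = "https://s3.amazonaws.com"
```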

3. On the Primary Cluster

- Take a backup of the Primary cluster from the bastion by running the below command:

```sh
chef-automate backup create --no-progress > /var/log/automate-backups.log
```

- Create a bootstrap bundle; this bundle captures any local credentials or secrets that aren't persisted in the database. To create the bootstrap bundle, run the following command in one of the Automate nodes:

```sh
chef-automate bootstrap bundle create bootstrap.abb
```

- Copy `bootstrap.abb` to all Automate and Chef Infra frontend nodes in the New cluster.
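One way to copy the bundle is over SSH from the node where it was created; the key path and node IPs below are placeholders:

```bash
# Copy bootstrap.abb to every Automate and Chef Infra frontend node of the New cluster
for node in <AUTOMATE-NODE-IP> <CHEF-SERVER-NODE-IP>; do
  scp -i /path/to/ssh-key.pem bootstrap.abb ec2-user@"$node":/home/ec2-user/
done
```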


4. On the New AMI-upgraded Cluster

- Install `bootstrap.abb` on all the frontend nodes (Chef Infra Server and Automate nodes) by running the following command:

```cmd
sudo chef-automate bootstrap bundle unpack bootstrap.abb
```

- Run the following command on the bastion to get the ID of the backups:

```sh
chef-automate backup list
```
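The backup ID is the value in the first column of the output; an illustrative example (your IDs and ages will differ):

```sh
chef-automate backup list
#        Backup        State      Age
# 20230125183802   completed   2 days old
```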

- Make sure all the services in the New cluster are up and running by running the following command from the bastion:

```sh
chef-automate status
```

- On the New cluster, trigger the restore command from the bastion.

- To run the restore command, you need to add the OpenSearch credentials to the applied config. If using Chef Managed OpenSearch, you need the Automate config. Run the below command in the Chef Automate node to get the applied config into `current_config.toml`:

```bash
sudo chef-automate config show > current_config.toml
```

- Add the below config into `current_config.toml` (without any changes), and copy `current_config.toml` to the bastion:

```toml
[global.v1.external.opensearch.auth.basic_auth]
username = "admin"
password = "admin"
```

- On the New cluster, use the following restore command to restore the backup of the Primary Cluster from the bastion:

```cmd
sudo chef-automate backup restore s3://<s3-bucket-name>/<path-to-backup>/<backup-id>/ --patch-config /path/to/current_config.toml --airgap-bundle /path/to/airgap-bundle --skip-preflight --s3-access-key "Access_Key" --s3-secret-key "Secret_Key"
```

- If you want to reuse the same custom domain used previously, update your DNS record to point to the Load-Balancer FQDN of the New cluster.
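For example, with Amazon Route 53 the record can be repointed with a single `UPSERT`; the hosted zone ID, domain name, and load balancer FQDN below are placeholders:

```bash
# Point the custom domain at the New cluster's load balancer
aws route53 change-resource-record-sets \
  --hosted-zone-id <HOSTED-ZONE-ID> \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "automate.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "<NEW-CLUSTER-LB-FQDN>"}]
      }
    }]
  }'
```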

- Once the restore is successful, you can destroy the Primary Cluster.
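One way to do this, assuming the Primary cluster was provisioned from its own bastion workspace (verify the command against your Automate version's documentation before running):

```bash
# From the Primary cluster's bastion: tear down that cluster's AWS resources
chef-automate cleanup --aws-deployment
```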