Disaster recovery plan
Martin Remmelgas edited this page Jun 16, 2026 · 1 revision
- 8h (based on the largest SLA)
- 4 hours for users and teams data and application configurations.
- 7 days for build history.
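One way to read the 4-hour and 7-day figures above is as the maximum acceptable age of the most recent backup for each kind of data. Under that reading, which is an assumption of this sketch, a freshness check could look like the following. The category names `core` and `build_history` are hypothetical labels, not names used by the backup tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the figures above are treated as the maximum tolerated age
# of the latest backup. Category names are hypothetical labels.
MAX_BACKUP_AGE = {
    "core": timedelta(hours=4),          # users and teams data, application configurations
    "build_history": timedelta(days=7),  # build history
}


def is_fresh(category, backup_time, now=None):
    """Return True if the backup taken at backup_time is within the allowed age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - backup_time <= MAX_BACKUP_AGE[category]
```

Passing `now` explicitly keeps the check deterministic, which is convenient when it is exercised during a recovery test.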
Major goals of a disaster recovery plan
Hardware and Software Inventory
Information services backup overview
Database: Backup and Restore policy
International escalations procedure
This plan assumes all MongoDB nodes are unavailable and describes how to replace the cluster and restore its data from backup files.
- Checklist before start:
  - GCP `us-east1-b` zone is operational as expected
  - Access has been granted to download the latest backup files
- Start new instances for the new cluster using the same instance type and MongoDB version as the existing cluster.
- Provision the new nodes to set up monitoring (see https://github.com/codemagic-dev/ansible/blob/main/setup_grafana_monitoring.yml).
- Ensure MongoDB is connected to the new cluster.
- Download the latest backup files to the master host.
- Restore the files in the following order:
  - backup file with the `all` prefix
  - `applications` file
  - `teams` file
  - `users` file
  - `audit_log` file
- Run the cluster and ensure data is available in the `app` and `vmm` databases.
- Update DNS settings to point to the new IP addresses.
- Restart the backend and worker services and monitor the logs to confirm that the MongoDB connection is established successfully.
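The restore order from the steps above can be sketched as a small helper. This is a minimal sketch, not the team's actual tooling. It assumes the backup files are `mongodump` archives whose names contain the keywords listed in the restore step, and that they are restored with `mongorestore --archive`. The file names and the connection string in the usage example are hypothetical.

```python
from pathlib import Path

# Restore order taken from the plan. How each keyword appears in a real
# backup file name is an assumption made for this sketch.
RESTORE_ORDER = ["all", "applications", "teams", "users", "audit_log"]


def restore_rank(filename):
    """Return the position of a backup file in the restore order."""
    name = Path(filename).name
    if name.startswith("all"):
        return 0
    for rank, keyword in enumerate(RESTORE_ORDER[1:], start=1):
        if keyword in name:
            return rank
    raise ValueError(f"unrecognised backup file: {filename}")


def sort_for_restore(filenames):
    """Sort backup files into the order in which they should be restored."""
    return sorted(filenames, key=restore_rank)


def restore_commands(filenames, uri):
    """Build one mongorestore command (as an argument list) per file, in order."""
    return [
        ["mongorestore", f"--uri={uri}", f"--archive={name}"]
        for name in sort_for_restore(filenames)
    ]


if __name__ == "__main__":
    # Hypothetical file names and connection string, for illustration only.
    files = ["users.archive", "audit_log.archive", "all_2026.archive",
             "teams.archive", "applications.archive"]
    for command in restore_commands(files, "mongodb://localhost:27017"):
        print(" ".join(command))
```

The commands are only printed, not executed, so the order can be reviewed before anything touches the new cluster.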
- The test should be conducted using a standalone MongoDB configuration.
- The test should be conducted using a temporary VPC whose default firewall settings prohibit any outside connection to the MongoDB ports.
- The test does not include steps related to the production environment, such as configuring monitoring (step 3), updating DNS (step 8), and restarting production services (step 9).
- The hosts and VPC should be deleted after the test results are recorded in this document.
- Disk requirements: ~100 GB of available disk space.
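The disk requirement above can be verified on the test host before downloading the backup files. The following is a minimal sketch using Python's standard library. It assumes decimal gigabytes (10^9 bytes), and the path to check is a placeholder for wherever the backups will be stored.

```python
import shutil

REQUIRED_GB = 100     # from the plan: ~100 GB of available disk space
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def has_enough_space(free_bytes, required_gb=REQUIRED_GB):
    """Return True if free_bytes covers the required number of gigabytes."""
    return free_bytes >= required_gb * BYTES_PER_GB


def check_path(path="/"):
    """Check the free space on the filesystem that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return has_enough_space(usage.free)


if __name__ == "__main__":
    # "/" is a placeholder; point this at the backup download directory.
    print("enough space" if check_path("/") else "not enough space")
```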