
Conversation


@SofiaSazonova SofiaSazonova commented Feb 2, 2025

Feature or Bugfix

  • Feature

Detail

  • Depending on a CDK parameter, a CodeBuild project is created to copy data from one Aurora cluster to another
  • It can be used for an Aurora upgrade or an emergency restore
  • To deploy, add the following to the DeploymentEnvironments section of cdk.json:

```json
"aurora_migration_enabled": true,
"old_aurora_connection_secret_arn": ...arn...
```

Copy the source DB connection secret before deploying! The copy's ARN is what goes into old_aurora_connection_secret_arn.
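The secret backup could be sketched as below. This is a hypothetical helper, not part of data.all: the function and secret names are illustrative, and it assumes a boto3 Secrets Manager client passed in by the caller.

```python
# Hypothetical sketch: back up the Aurora connection secret before deploying.
# Secret names below are placeholders. The client is passed in so the helper
# can be exercised without real AWS credentials.

def backup_secret(sm_client, secret_id: str, backup_name: str) -> str:
    """Copy the secret's current value into a new, independent secret.

    Returns the ARN of the copy; this is the value to provide as
    old_aurora_connection_secret_arn in cdk.json.
    """
    value = sm_client.get_secret_value(SecretId=secret_id)["SecretString"]
    response = sm_client.create_secret(Name=backup_name, SecretString=value)
    return response["ARN"]

# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("secretsmanager")
#   print(backup_secret(client, "dataall-aurora-connection", "dataall-aurora-backup"))
```

Making an independent copy (rather than relying on the in-place secret) matters because, per the steps below, the original secret is overwritten during the upgrade.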

Upgrade Recommendations

Once deployed, it will create a new v2 cluster with an empty database, plus a CodeBuild job that, when triggered manually, will copy the data from the old cluster to the new one. The CodeBuild job is provided as a helper and is not mandatory; users can opt to perform the copy with their own scripts.

Suggested steps

  1. Communicate a data.all platform downtime to stakeholders
  2. If you want to use the provided CodeBuild migration job, you must set the configuration above in your cdk.json
    1. Upon upgrade the db secret will be overwritten, hence you need to make a backup and provide its ARN in the config
  3. Put data.all into maintenance mode: log in as Admin, go to Admin Settings → Maintenance → pick “NO-ACCESS” → click “Start Maintenance”
  4. If you previously used a Postgres version other than 13, change the code of the CodeBuild job: edit line 38 of deploy/stacks/aurora_migration_task.py
  5. Push the changes to your code repo
  6. Wait for the backend deployment to complete
  7. Copy the data from the old Aurora cluster to the new one
    1. If using the CodeBuild job from step 2, locate it and start the build. All the required inputs will be prepopulated
  8. Validate that data.all has all the data, is operating properly, and the pipeline is green
  9. Stop “maintenance mode”
  10. If the CodeBuild job was used, you can now remove the relevant configs from your cdk.json to delete it.
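Step 7.1 (triggering the migration build) could be sketched as below. The helper and the project name are hypothetical, assuming a boto3 CodeBuild client; locate the real project name in the CodeBuild console after deployment.

```python
# Hypothetical sketch: trigger the provided migration CodeBuild job (step 7.1).
# The project name is a placeholder, not the actual name data.all generates.

def start_migration_build(cb_client, project_name: str) -> str:
    """Start a build of the migration project and return the build id."""
    response = cb_client.start_build(projectName=project_name)
    return response["build"]["id"]

# Usage (requires AWS credentials):
#   import boto3
#   build_id = start_migration_build(boto3.client("codebuild"), "aurora-migration-task")
```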

IMPORTANT!

  1. Do not run the pipeline right away. First, try to create the CodeBuild job and the v2 cluster manually to test. Once you eliminate the v1 cluster from your pipeline, you won't be able to create another v1 cluster again.
  2. Do not delete the old cluster until you are completely sure.
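The validation in step 8 could be sketched as a per-table row-count comparison between the two clusters. This is an illustrative sketch, not data.all tooling: the table names are made up, and the helpers assume DB-API cursors (e.g. from psycopg2) supplied by the caller.

```python
# Hypothetical sketch: compare row counts per table between old and new clusters
# to sanity-check the copy (step 8). The table list is illustrative only.

ILLUSTRATIVE_TABLES = ["environment", "dataset", "share_object"]

def row_counts(cursor, tables):
    """Return {table: row count} using a DB-API cursor."""
    counts = {}
    for table in tables:
        # Table names come from our own trusted list, not user input.
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    return counts

def migration_matches(old_cursor, new_cursor, tables=ILLUSTRATIVE_TABLES):
    """True when both clusters report identical per-table row counts."""
    return row_counts(old_cursor, tables) == row_counts(new_cursor, tables)
```

Row counts are only a coarse check; application-level smoke tests in the data.all UI remain the real validation.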

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use standard, proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@SofiaSazonova
Contributor Author

```python
'/usr/lib/postgresql/$SRC_PGVER/bin/pg_dump -x -Fc -v > db.dump',
'export PGHOST=$TGT_HOST PGPORT=$TGT_PORT PGUSER=$TGT_USER PGPASSWORD=$TGT_PWD',
'/usr/lib/postgresql/$TGT_PGVER/bin/pg_isready',
'/usr/lib/postgresql/$TGT_PGVER/bin/pg_restore -v -x -O -C -c -d postgres db.dump',
```
Contributor

according to the pg_restore docs

  • -c (clean) will DROP the objects that are about to be restored
  • -C (create) will create the database (not sure what happens if it already exists)
  • The combination of the two flags will DROP and RECREATE the entire db

I am a bit concerned that customers might rerun this job after they have migrated and the two dbs have diverged, which would effectively restore the target db to the point in time of the old db.

Perhaps I am overthinking it but is there anything we can do to prevent this?

I think we need the -c because the db_migrations trigger func will run after the deployment of the new cluster and will "pollute" the database. But maybe we don't need the -C, because you already create the default database in CDK. wdyt?

Contributor Author

The new cluster is created with a DB in it, and some things (e.g. permissions) are already there. So it's easier and safer to drop everything and restore. It works the second time as well.
This code gives us a copy of the old DB in the new cluster. That's exactly what we need.

It's not a 'normal' thing to do, so I think that's the best option.

@dlpzx dlpzx added this to v2.8.0 Mar 6, 2025
@dlpzx dlpzx moved this to Review in progress in v2.8.0 Mar 6, 2025
@petrkalos petrkalos merged commit 37467f4 into data-dot-all:main Jun 2, 2025
10 checks passed
@petrkalos petrkalos moved this from Review in progress to Done in v2.8.0 Jun 23, 2025
