Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved production -> staging database replication #1874

Closed
2 tasks
AetherUnbound opened this issue Apr 22, 2023 · 1 comment
Closed
2 tasks

Improved production -> staging database replication #1874

AetherUnbound opened this issue Apr 22, 2023 · 1 comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟩 priority: low Low priority and doesn't need to be rushed 🧭 project: thread An issue used to track a project and its progress skip-changelog 🧱 stack: api Related to the Django API
Projects

Comments

@AetherUnbound
Copy link
Contributor

AetherUnbound commented Apr 22, 2023

Start Date ETA Project Lead Actual Ship Date
TBD TBD TBD TBD

Description

#1154 describes a mechanism for updating staging by restoring snapshots of production. In the discussion for that issue, a few alternative proposals are brought forward (from @zackkrida's comment):

Name Pros Cons
1. Foreign Data Wrapper to point staging media tables to production - Access to latest data in real-time
- Minimal code required
- Can downsize staging DB
- Simple to secure read-only connection
- Increase in reads to production DB
2. Postgres' "logical replication" feature to keep production and staging media tables in sync - Near-realtime updates automatically
- Staging DB has its own media records which can be deleted and modified
- Limited experience with feature
- Increase in load on production DB
3. Foreign Data Wrapper to insert all records from production media tables into staging - Staging DB has its own media records which can be deleted and modified - Slower process
- Scheduled updates
- May duplicate data unnecessarily
  1. Use a foreign data wrapper to point staging media tables to their production counterparts. This means we read all media data directly from the production db without storing it in the staging db. It also means that staging always has access to the latest data in realtime. This also doesn't require us to write much code, at all. From a security perspective it's also quite simple to make sure this is a read-only connection that can't modify the production DB. The only thing to consider here is that we'd be increasing reads to the production DB but I think that's a minor concern. We can also dramatically downsize the staging DB with this approach.
  2. Use Postgres' "logical replication" feature keep the production and staging media tables in sync. I don't have experience with this feature, but it allows for Postgres to automatically snyc changes from a table in one DB to another using a pub/sub model. Basically, in the production DB we CREATE PUBLICATION pub_name FOR TABLE images; and are able to subscribe to this in the staging DB. The only benefit here is that we get near-realtime updates automatically and the staging DB has its own media records which can be deleted and modified.
  3. We could use a foreign data wrapper to insert all records from the production media tables into the staging media tables. We could do this incrementally, based on the created_at date of the newest records in the staging media tables, or we could do something more like the data refresh where we copy over all records. This would be slower, and scheduled, but the staging DB has its own media records which can be deleted and modified.

Some aspects to consider when planning this project:

  • Handling migration differences between staging and production
  • Impact on the production database
  • Per-table policies/refresh rates
  • Handling conflicting data

Documents

  • Project Proposal
  • Implementation Plan

Issues

Prior Art

#1154

@AetherUnbound AetherUnbound added skip-changelog ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository 🟩 priority: low Low priority and doesn't need to be rushed 🧭 project: thread An issue used to track a project and its progress 🧱 stack: api Related to the Django API labels Apr 22, 2023
@openverse-bot openverse-bot added this to Backlog in Openverse Apr 22, 2023
@AetherUnbound
Copy link
Contributor Author

This project was not completed in 2023 and we are closing it in favor of prioritizing other work. Closing this issue.

@AetherUnbound AetherUnbound closed this as not planned Won't fix, can't repro, duplicate, stale Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟩 priority: low Low priority and doesn't need to be rushed 🧭 project: thread An issue used to track a project and its progress skip-changelog 🧱 stack: api Related to the Django API
Projects
No open projects
Openverse
  
Backlog
Development

No branches or pull requests

1 participant