This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Add documentation describing the data migration process we should follow #1030

Closed
1 task
sarayourfriend opened this issue Dec 5, 2022 · 0 comments · Fixed by #1082
Labels
📄 aspect: text Concerns the textual material in the repository 🌟 goal: addition Addition of new feature 🟩 priority: low Low priority and doesn't need to be rushed

Problem

As part of the ECS-ification of the Django service, we are moving towards automated handling of database migrations. Under this new approach, migrations will be applied automatically when a new version of the Django application is deployed. To avoid deployments that take hours, we must not rely on Django database migrations for data migrations: that is, we cannot rely on SQL run as part of a migration to transform the data in the database. Besides potentially lasting hours (depending on their contents), such migrations also create additional database load, which we want to avoid.

Description

In lieu of using Django migrations to transform the data in the database, we will instead follow a data migration strategy that relies on Django management commands to programmatically transform the data. This has several benefits (some repeated from above):

  1. The transformation can be throttled to avoid overwhelming the database;
  2. It encourages zero-downtime deployment planning;
  3. The transformation can be unit tested, which makes it easier to cover data edge cases that are easy to forget about (and even harder to handle) in a regular SQL data migration;
  4. Deployments never take longer than a few minutes because we avoid all long-running migrations.
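
To illustrate the throttling and testability points above, here is a minimal, framework-agnostic sketch of the loop such a management command might run. Everything in it is an assumption made for illustration: the helper name `migrate_in_batches`, its parameters, and the default batch size and pause are hypothetical and not taken from the Openverse codebase. In a real Django management command, this loop would presumably be called from the command's `handle()` method, with `fetch_batch` and `save_batch` backed by ORM queries.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def migrate_in_batches(
    fetch_batch: Callable[[int], List[T]],
    transform: Callable[[T], T],
    save_batch: Callable[[List[T]], None],
    batch_size: int = 1000,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Transform records batch by batch, pausing between batches.

    `fetch_batch(n)` must return up to `n` records that still need
    migrating, and an empty list once none remain. Returns the number
    of records processed.
    """
    total = 0
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            return total
        save_batch([transform(record) for record in batch])
        total += len(batch)
        # Throttle: give the database room between batches.
        sleep(pause_seconds)
```

Because the transformation and the sleep function are passed in as plain callables, the transformation can be unit tested against edge cases without a database, and tests can replace `sleep` to avoid real delays.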

We need to document this process in the Sphinx documentation and spread the word about it to Openverse contributors. If possible, it would also be nice to add a linting check that verifies we are not introducing migrations that include data transformations.
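
One possible shape for such a linting check, offered as a rough sketch and not a finished tool: parse each migration file with Python's `ast` module and flag calls to Django's `RunPython` and `RunSQL` operations, which are the usual vehicles for data changes in a migration. The function name `find_data_operations` is hypothetical. Note also the assumption that these two operations are a good proxy: `RunSQL` can carry schema-only changes too, so a real check would probably need an allowlist or an opt-out comment.

```python
import ast
from typing import List

# Django migration operations commonly used for data changes.
DATA_OPERATIONS = {"RunPython", "RunSQL"}


def find_data_operations(source: str) -> List[str]:
    """Return "name:lineno" for each data-operation call found in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches both `migrations.RunPython(...)` and a bare `RunPython(...)`.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name in DATA_OPERATIONS:
            found.append(f"{name}:{node.lineno}")
    return found
```

A CI step could run this over the contents of each new file in the migrations directories and fail when the returned list is non-empty. The source is only parsed, never imported, so the check does not need Django installed.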

Alternatives

Additional context

Please refer to https://github.com/WordPress/openverse-infrastructure/issues/176 for the original discussion motivating this change. The repository is private; if you do not have access but would like to see the issue, ping a core contributor and they can share the discussion with you.

Implementation

  • 🙋 I would be interested in implementing this feature.
@sarayourfriend sarayourfriend added 🟩 priority: low Low priority and doesn't need to be rushed 🌟 goal: addition Addition of new feature 📄 aspect: text Concerns the textual material in the repository labels Dec 5, 2022
@openverse-bot openverse-bot added this to Backlog in Openverse Dec 5, 2022
@sarayourfriend sarayourfriend self-assigned this Jan 13, 2023
@sarayourfriend sarayourfriend moved this from Backlog to In progress in Openverse Jan 13, 2023
Openverse automation moved this from In progress to Done! Feb 8, 2023