Skip to content
This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Add zero-downtime deployments & data transformations guide #1082

Merged
merged 6 commits into from
Feb 8, 2023

Conversation

sarayourfriend
Copy link
Contributor

Fixes

Fixes #1030 by @sarayourfriend

Description

I am still working on this and there are significant sections and details still missing that I want to add before undrafting this. I'll update the PR description when I undraft the PR.

Testing Instructions

Checklist

  • My pull request has a descriptive title (not a vague title like
    Update index.md).
  • My pull request targets the default branch of the repository (main) or
    a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible
    errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@openverse-bot openverse-bot added this to In progress in Openverse PRs Jan 13, 2023
@openverse-bot openverse-bot added 🌟 goal: addition Addition of new feature 📄 aspect: text Concerns the textual material in the repository 🟩 priority: low Low priority and doesn't need to be rushed labels Jan 13, 2023
@github-actions
Copy link

github-actions bot commented Jan 13, 2023

API Developer Docs Preview: Ready

https://wordpress.github.io/openverse-api/_preview/1082

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite eye-opening. Thank you for starting to write this, Sara. The way of how zero-downtime deployments work makes so much sense after reading it! (I was a bit worried at the begging of the API ECS-ification project for this part TBH). I wonder if you are reflecting on it from pure previous experience or if you know more documentation and resources on this topic. I'd love to read more.

Comment on lines 59 to 67

## Django management command data migrations

### Why use management commands for data migrations instead of SQL?

Django comes with a data migration feature built in that allows executing data transformations using SQL. If you want to move data between two columns, it is trivial to do so with SQL and Django makes it just as easy. [Documentation for this Django feature is available here](https://docs.djangoproject.com/en/4.1/topics/migrations/#data-migrations).

When considering the potential issues with using SQL data migrations with our current deployment strategy, keep in mind the following details:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a doubt about this section. The title and subtitle make it sound like Django management commands are the solution to the problems described here, but I'm not sure how that is the case. After all, DB-related commands are mostly plain SQL under the hood.
I can see the Django commands as an easier way to handle migrations automatically for our future improved CI/CD pipeline but how are they supposed to solve the problems of time?

Copy link
Contributor Author

@sarayourfriend sarayourfriend Jan 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to find a way to clarify the reasons why to use management commands at the start, in a clearer and more concise way. I'll summarise it here though, and you can let me know if there are still doubts that need to be clarified and addressed.

The reason why management commands solve the issue at hand (long-running data migrations), isn't because they make data migrations shorter or not use SQL. Of course, as you said, they still use SQL as they'll still be manipulating the database via the same tools that a migration file would. The difference is that a management command allows us to break up the SQL commands to iterate over the set of data that needs to be transformed and critically not at application startup. Additionally, they have the following features:

  1. They can be stopped and started to allow for other more critical operations to take precedence or to recover from unexpected errors
  2. They can be metered to prevent creating significant read/write lag that would hurt the overall application performance

Both of these might be doable in pure SQL (maybe through a stored procedure, for example), but it's more effort, harder to test and have standardised tools, and I don't think we have great tools at the moment for creating or maintaining stored procedures in the API anyway.

Management commands solve these problems in a familiar setting, with testing tools that we already know how to use and for which we have invested time into building utilities like data fixtures and helpers.

The main benefit, though, is being able to break up the transformation into smaller, iterable chunks, that can be executed outside of application startup time (which we can't do with Django migrations without making significant complications to our deployment workflow).

@sarayourfriend
Copy link
Contributor Author

This is based on previous experience. I actually haven't read any additional documents describing this process for Django, but the overall difficulty with database schema changes is a well known issue with automated zero-downtime deployments. There are lots of resources online discussing the issue and describing scenarios similar to the column name change example I share in the document.

Doing a quick search, I can't find anyone describing the process here using management commands. The only tool that Django has for dealing with part of this issue (data loss due to an unexpected error during a long-running data transformation) is to set migrations to be non-atomic: https://docs.djangoproject.com/en/4.1/howto/writing-migrations/#non-atomic-migrations

Here's a guide about zero-downtime migrations in Django, but focused exclusively on adding/removing tables and columns, rather than things like long-running data transformations that you can put into Django SQL migrations: https://gist.github.com/majackson/493c3d6d4476914ca9da63f84247407b

It's a useful resource still though, as it gives a good list of steps for each of some very common situations, so I'll include a link to it as well.

@sarayourfriend sarayourfriend changed the title Add data migration guide Add zero-downtime deployments & data transformations guide Jan 30, 2023
@sarayourfriend
Copy link
Contributor Author

@krysal In response to your questions, I realised that it might make the document make more sense (motivations wise) to reframe it as a general document about zero-downtime deployments with data transformations as a special case. I changed the language to more clearly distinguish between a "Django migration based data transformation" and a "management command based data transformation", primarily by switching to use "data transformation" rather than "data migration" as the generic term. Hopefully this helps clarify what was already there, but I am undrafting now and am eager to hear further thoughts on whether the suggested guidelines for data transformations make more sense and what I need to further clarify (or scrap entirely 😅)

@sarayourfriend sarayourfriend marked this pull request as ready for review January 30, 2023 03:46
@sarayourfriend sarayourfriend requested a review from a team as a code owner January 30, 2023 03:46
@openverse-bot openverse-bot moved this from In progress to Needs review in Openverse PRs Jan 30, 2023
Copy link
Contributor

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is seriously excellent! Thanks for making it a bit more generic too 🙂 I think one other piece that could be helpful here is an example or template Django management command which performs a data transformation as described in the document. I don't think it necessarily needs to be part of this PR, but I know for myself it'd be a useful reference when making the data transformations (especially if we end up deleting the code once the transformation is run). Thanks also for your example regarding column renaming, it feels appropriate particularly for #719 😅

api/docs/guides/zero-downtime-database-management.md Outdated Show resolved Hide resolved
api/docs/guides/zero-downtime-database-management.md Outdated Show resolved Hide resolved
@sarayourfriend
Copy link
Contributor Author

I think one other piece that could be helpful here is an example or template Django management command which performs a data transformation as described in the document.

That's a great idea and something I'd love to do. I think there are some clever ways to make some generic tools or a base class that gives the outline.

But also, yes, I think it's sufficiently complex work to have as a separate issue, if that is okay with other reviewers.

Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read it carefully, and I must reiterate, this is an excellent piece of documentation! ⭐️ Thanks a lot for writing it. It makes clear the need for Django management commands for data migrations and an outline of how to do it.

I think it may include these two benefits you mentioned in the issue description too:

  • Can be throttled to prevent overwhelming database load;
  • The transformation can be unit tested including more easily testing data edge cases that might be easy to forget about (and even harder to handle) in a regular SQL data migration;

Also, +1 for an example or template with throttling, but this may wait even for when the need arises.

Comment on lines 124 to 126
1. Once the data transformation is complete, deploy a new version of the
application that removes the old column and the fallback reads to it and only
uses the new column.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's pertinent to add this note here or in a new step.

  1. Once the data transformation is complete, deploy a new version of the
    application that removes the old column and the fallback reads to it and only
    uses the new column. Also, add the corresponding constraints for the said column if required, e.g. non-nullable, default value, etc.

Comment on lines +209 to +210
- Migrations are run _at the time of deployment_ by the first instance of the
new version of the application that runs in the pool.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe link the code here? I know it can drift in the future, but it will help to know that this can be configured as well.

Suggested change
- Migrations are run _at the time of deployment_ by the first instance of the
new version of the application that runs in the pool.
- Migrations are run _at the time of deployment_ by the first instance of the
new version of the application that runs in the pool (when is configured for that with the [`DJANGO_MIGRATE_DB_ON_STARTUP`](https://github.com/WordPress/openverse-api/blob/main/api/env.docker#L15) variable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be the case once ECS is deployed, so maybe I'll make a note about that instead. The variable would be set in our infrastructure anyway, which we can't link to from this document, so just linking to the variable doesn't give a ton of new information to the reader, but it's probably easier to figure out if we've transitioned to the ECS deployment yet.

Openverse PRs automation moved this from Needs review to Reviewer approved Feb 8, 2023
@sarayourfriend
Copy link
Contributor Author

Thanks @krysal. I added the two additional benefits to the document, not sure why I forgot to include those, but they are important ones. I also added a clarification about the "automatic" migration running that should be more timeless than either of our initial suggestions. At some point in the future we can remove it, but the information will never be inaccurate (unless we stop zero-downtime deployments 😅)

@sarayourfriend sarayourfriend merged commit 2450c38 into main Feb 8, 2023
Openverse PRs automation moved this from Reviewer approved to Merged! Feb 8, 2023
@sarayourfriend sarayourfriend deleted the add/data-migration-documentation branch February 8, 2023 19:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
📄 aspect: text Concerns the textual material in the repository 🌟 goal: addition Addition of new feature 🟩 priority: low Low priority and doesn't need to be rushed
Projects
No open projects
Openverse PRs
  
Merged!
Development

Successfully merging this pull request may close these issues.

Add documentation describing the data migration process we should follow
4 participants