Live data migrations #1086

Closed
Diggsey opened this Issue Aug 10, 2017 · 5 comments

@Diggsey
Contributor

Diggsey commented Aug 10, 2017

It would be useful to have a way to do live data migrations with diesel.

Here's how I would like to do these migrations:

  1. Have a version number column on tables that need migrating
  2. Deploy new version of service, which can read both row versions, but only writes new versions
  3. Run data migration, to convert all remaining old versions to the new version

This means the service code only needs to deal with the most recent two versions, with legacy versions only needing to be handled in the migration files themselves.
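Step 2 above can be sketched in plain Rust (no Diesel, no database): the service normalizes whatever version it reads into the newest in-memory type, and always serializes the newest version on write. The `RecordV2` type and the `"version|payload"` encoding are invented for illustration; a real service would use Diesel row types and the version column from step 1.

```rust
// Sketch of "read both versions, write only the new one". The migration used
// as an example later in this thread is "add 10 to value".

#[derive(Debug, PartialEq)]
struct RecordV2 {
    value: i64,
}

/// Read a stored row of either version, normalizing to V2 in memory.
fn read(stored: &str) -> Option<RecordV2> {
    let (version, payload) = stored.split_once('|')?;
    let raw: i64 = payload.parse().ok()?;
    match version {
        "1" => Some(RecordV2 { value: raw + 10 }), // upgrade V1 on read
        "2" => Some(RecordV2 { value: raw }),
        _ => None, // unknown versions are rejected, not silently migrated
    }
}

/// Write always emits the newest version.
fn write(rec: &RecordV2) -> String {
    format!("2|{}", rec.value)
}

fn main() {
    // Round-tripping an old row persists it as V2.
    let old = read("1|5").unwrap();
    assert_eq!(old, RecordV2 { value: 15 });
    assert_eq!(write(&old), "2|15");
    println!("ok");
}
```

With this in place, the step-3 data migration only has to rewrite rows that still carry version 1; code deployed at step 2 never produces new ones.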

However, this doesn't work currently for two reasons:

  1. Typically you would run structural migrations before deploying the new service, on the basis that the old service can continue working, ignoring any new columns, whereas data migrations need to run afterwards.
  2. The migration runs in a transaction, so a data migration (which may take some time to run) will either block the service or cause it to deadlock, depending on the isolation level.

For the data migration to be unobtrusive, it needs to update rows in relatively small batches, with one transaction per batch, so that it doesn't cause too many conflicts with other applications accessing the database.
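The batching loop described above can be sketched in pure Rust, with a `Vec` standing in for the table and each call to `migrate_batch` standing in for one short transaction (in a real database this would be something like `UPDATE ... WHERE id IN (SELECT id ... WHERE version = 1 LIMIT n)`); the types and the batch driver are hypothetical, not Diesel API:

```rust
// Sketch of a batched data migration: one small "transaction" per batch so
// the table is never locked for the whole migration.

#[derive(Debug, Clone, PartialEq)]
struct Row {
    version: u32,
    value: i64,
}

/// Migrate up to `batch_size` old rows, returning how many were touched.
/// Returning 0 signals that the migration is complete.
fn migrate_batch(rows: &mut [Row], batch_size: usize) -> usize {
    let mut migrated = 0;
    for row in rows.iter_mut() {
        if migrated == batch_size {
            break;
        }
        if row.version == 1 {
            row.value += 10; // the example migration: "add 10 to value"
            row.version = 2;
            migrated += 1;
        }
    }
    migrated
}

fn main() {
    let mut table: Vec<Row> = (0..7).map(|v| Row { version: 1, value: v }).collect();
    let mut batches = 0;
    while migrate_batch(&mut table, 3) > 0 {
        batches += 1; // each loop iteration = one short transaction
    }
    assert_eq!(batches, 3); // 7 rows in batches of 3 -> 3 transactions
    assert!(table.iter().all(|r| r.version == 2));
    println!("done in {} batches", batches);
}
```

Because each batch commits independently, other applications only ever contend with a single small transaction at a time.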

@killercup

Member

killercup commented Aug 14, 2017

This seems like a good idea, but I'm not sure how actionable that is for Diesel?

Your 3 steps can be implemented in application code, right? I guess we could in theory offer helpers for that, but those might also work as an external crate.

Or, asked in another way: What needs to change to make this easier/possible?

@Diggsey

Contributor

Diggsey commented Aug 14, 2017

I think it's fine for this to be handled in application code or via a library, but this was more about having a story for it: maybe that means implementing a demo application that shows how one might achieve this, and endorsing that particular technique.

The reason for this is that when choosing a technology stack, it's important to have confidence that these kinds of things are doable, even if they require additional work.

Here are some difficulties that I ran into when attempting to do the above:

  1. I couldn't find out if it's possible to access a different column when deserializing a field, which would be required to read the version number.
  2. The migration infrastructure provided by diesel is not extensible in a way that would allow including data migrations.

In light of these difficulties, I changed my strategy:

  • Only support data migrations on JSON fields (there are alternative approaches for simpler columns)
  • Store the version number in JSON fields
  • Handle the data migration as part of the deserialization step, and require the codebase to handle all historical versions
  • Either never bother eagerly migrating all of the older rows, or implement an application-level post-deploy hook which fetches and migrates all of the rows
  • Use a macro to prevent the upgrade logic from growing out of control as the number of versions increases
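The upgrade chain behind that last bullet can be sketched without a macro or JSON: an enum of historical versions, one `step` function per hop, and a fold to the latest version at deserialization time. All names and field shapes here are invented; the linked repo does this over JSON fields with a macro instead.

```rust
// Sketch of chained version upgrades: each version only knows how to step to
// the next one, so reading any historical version is a fold over the chain.

#[derive(Debug, PartialEq)]
enum Doc {
    V1 { name: String },
    V2 { name: String, count: u32 },               // V2 added a count
    V3 { name: String, count: u32, active: bool }, // V3 added a flag
}

impl Doc {
    /// One upgrade hop; V3 (the latest version) is a fixed point.
    fn step(self) -> Doc {
        match self {
            Doc::V1 { name } => Doc::V2 { name, count: 0 },
            Doc::V2 { name, count } => Doc::V3 { name, count, active: true },
            latest @ Doc::V3 { .. } => latest,
        }
    }

    /// Upgrade any version to the latest, as part of deserialization.
    fn upgrade(mut self) -> Doc {
        loop {
            self = match self {
                latest @ Doc::V3 { .. } => return latest,
                older => older.step(),
            };
        }
    }
}

fn main() {
    let latest = Doc::V1 { name: "basket".into() }.upgrade();
    assert_eq!(
        latest,
        Doc::V3 { name: "basket".into(), count: 0, active: true }
    );
    println!("upgraded");
}
```

Adding version N+1 then only requires one new enum variant and one new `step` arm, which is what keeps the upgrade logic from growing out of control.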

You can see my experiments here: https://github.com/Diggsey/rust-crud-example

@killercup

Member

killercup commented Aug 14, 2017

I have not thought about it long enough to give a qualified answer, but…

Quick thought: How about having a view for each table version? Like,

CREATE VIEW baskets_v1 AS
    SELECT *
    FROM baskets
    WHERE version = 1;

and putting that in a

mod schema {
    mod v1 {
        table! { baskets… }
    }
}

At least in postgres, simple views like this allow INSERT/UPDATE/…

The migration infrastructure provided by diesel is not extensible in a way that would allow including data migrations.

Huh? We allow arbitrary SQL? With embed_migrations! you can include the migration process in your app.

@Diggsey

Contributor

Diggsey commented Aug 14, 2017

Quick thought: How about having a view for each table version? Like,

That's an interesting idea, but I'm not sure how it would work in practice. If my data migration is "add 10 to value", there's no way postgres could figure out that values inserted into the old view should be modified accordingly.

Huh? We allow arbitrary SQL? With embed_migrations! you can include the migration process in your app.

Each migration is run in a transaction, which is unsuitable for large data migrations. That said, I do like the simplicity of Diesel's migration system, which avoids certain problems that e.g. Alembic has by being pure SQL. You could embed the migration process in your app, although I'm not sure how you would keep structural and data migrations in the same history.

@sgrif

Member

sgrif commented Dec 16, 2017

I think I should clarify what the intended role of Diesel's migrations is here. The terms "structural" vs. "data" migrations have been thrown around, which is fine. Diesel's migrations are really only meant for "structural" migrations. Things like UPDATE should only be run if they are in service of some structural change (e.g. denormalization) and can be reversed. If it's just "oops, all values in this column should be lowercased", Diesel migrations are a poor tool.

However, that distinction has little to do with the issue that seems to be at hand here, which is zero downtime deploys. For backwards compatible changes (adding tables or columns), there's not much to do here. You run your migrations, then you deploy code. For destructive changes, the process with Diesel is the same as anything else:

  • Deploy code which works with both the new and old schema
  • Deploy the schema change
  • Deploy code which only works with the new schema

However, it sounds like you're also asking about things like how to deploy migrations which would require a significant amount of down time. I think that is something which is out of scope for Diesel. There are existing tools (e.g. https://github.com/shopify/lhm) which handle this. Perhaps one will be written in Rust eventually, but I don't think the language that tools like that are written in is relevant here.
