Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: run all migrations in a transaction #10966

Merged
merged 9 commits into from Dec 1, 2023

Conversation

coadler
Copy link
Member

@coadler coadler commented Nov 30, 2023

Updates https://github.com/coder/customers/issues/365

This PR updates our migration framework to run all migrations in a single transaction. This is the same behavior we had in v1 and ensures that failed migrations don't bring the whole deployment down. If a migration fails now, it will automatically be rolled back to the previous version, allowing the deployment to continue functioning.

@coadler
Copy link
Member Author

coadler commented Nov 30, 2023

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

@coadler coadler force-pushed the colin/featrunallmigrationsinatransaction branch from 5a5be7b to 8c20d9d Compare November 30, 2023 22:33
@coadler coadler marked this pull request as ready for review December 1, 2023 05:48
@@ -1 +1,22 @@
ALTER TYPE login_type ADD VALUE IF NOT EXISTS 'token';
-- ALTER TYPE login_type ADD VALUE IF NOT EXISTS 'token';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comment explaining why we need to do the below now instead

ALTER TYPE resource_type ADD VALUE IF NOT EXISTS 'group';

COMMIT;
ALTER TYPE resource_type ADD VALUE IF NOT EXISTS 'group';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also do the new table + rename dance here as well?
Seems safer to just always do it that way instead of someone suddenly adding a new migration referencing this enum type and wondering why their new migration doesn't work.

Copy link
Member Author

@coadler coadler Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only went back historically and fixed usage of ADD VALUE if we reference the newly added value later in a different migration. It is otherwise fine to use within a transaction.

Edit: Ah sorry I didn't fully read. I see what you mean but I wasn't sure if it was worth going back to every type. It was really a pain to try and figure out for each enum what the exact values were at the time of the migration. I don't really mind following up though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix all of them in a follow up 👍

Comment on lines 1 to 3
ALTER TYPE build_reason ADD VALUE IF NOT EXISTS 'autolock';
ALTER TYPE build_reason ADD VALUE IF NOT EXISTS 'failedstop';
ALTER TYPE build_reason ADD VALUE IF NOT EXISTS 'autodelete';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new table + rename dance here too

@@ -1,5 +1,3 @@
BEGIN;

-- Add "exectrace" to workspace_agent_subsystem type.
ALTER TYPE workspace_agent_subsystem ADD VALUE 'exectrace';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new table + rename

DROP TYPE display_app;
COMMIT;

DROP TYPE display_app;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does dropping a type also potentially have the same issue as adding a type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's dropping the entire type it doesn't have the same consequences as ADD VALUE.

@mafredri
Copy link
Member

mafredri commented Dec 1, 2023

It would be pretty nice if we could code-gen the in-transaction type modifications. Code-gen would essentially apply migrations without transactions, and when there's a type-change, it generates code for the new type/rename dance on all tables using the type. It might be somewhat nice to implement our own custom SQL here as well for managing types (ALTER TYPE .. DROP VALUE), a bit strange, granted 😂.

Are types/enums the only issue we may run into in migrations running under a single tx?

Comment on lines +13 to +14
safer, but means that 'ALTER TYPE resource_type ADD VALUE' cannot be used if the
enum value needs to be referenced in another migration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should document how to work around this limitation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation alone is not sufficient IMO (hence suggesting code-gen). This is the type of issue that might surface a week, month, or year later when a migration using the earlier type is added. And only some customers will hit it (those that rarely update).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really a bug that customers will run into, it'll always be caught by tests. The issue only pops up when you attempt to use an enum value added by ADD VALUE in a later migration. Otherwise, these are fine to use.

Copy link
Member

@mafredri mafredri Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can always catch this as it is now, (e.g. without codegen, linting, etc). One scenario I can think of is:

  1. Customer A and B run v2.0.0
  2. We add a value (foo(bar)) in v2.0.1, but we forget the type dance because there's nothing to catch it when the value is unused
  3. Customer B updates to v2.0.1
  4. We release v2.0.2 and add a migration that modifies rows and change foo(baz) => foo(bar), we don't catch it because we don't have fixtures for everything and the relevant tables are empty
  5. Customer A and B update to v2.0.2
  6. Customer A runs into an error

Copy link
Member Author

@coadler coadler Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration tests will catch this since it doesn't matter if the tables are empty. Any reference to the new enum will fail. I ran into this exact scenario while writing this.

https://github.com/coder/coder/blob/main/coderd/database/migrations/000103_add_apikey_name.up.sql does an UPDATE using the login_type(token) enum added using ADD VALUE in an earlier migration. The tests are able to catch this without fixtures since it doesn't matter if any rows are in the table, it still has to convert 'token' to the proper enum which causes the error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, that's good then. I wasn't aware that postgres would catch it for an empty table. Thanks for clarifying. 👍🏻

@coadler
Copy link
Member Author

coadler commented Dec 1, 2023

@mafredri

It would be pretty nice if we could code-gen the in-transaction type modifications. Code-gen would essentially apply migrations without transactions, and when there's a type-change, it generates code for the new type/rename dance on all tables using the type. It might be somewhat nice to implement our own custom SQL here as well for managing types (ALTER TYPE .. DROP VALUE), a bit strange, granted 😂.

I think it would be cool, but I'm not sure entirely worthwhile. The whole DROP TYPE & CREATE TYPE dance is pretty copy-pastable and doesn't take more than a few minutes. Given how infrequently these happen I think this is a fine trade-off.

Are types/enums the only issue we may run into in migrations running under a single tx?

In terms of Postgres limitations, yes. They only cause an issue if you ADD VALUE and then try to reference that same value in a later migration. There are other limitations though such as making two temp tables with the same name in different migrations, but those are more semantics.

@coadler coadler force-pushed the colin/featrunallmigrationsinatransaction branch from 423df44 to 3ca5ebf Compare December 1, 2023 21:32
@coadler coadler merged commit 8e684c8 into main Dec 1, 2023
22 checks passed
@coadler coadler deleted the colin/featrunallmigrationsinatransaction branch December 1, 2023 22:11
@github-actions github-actions bot locked and limited conversation to collaborators Dec 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants