New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: run all migrations in a transaction #10966
Conversation
Current dependencies on/for this PR: This stack of pull requests is managed by Graphite. |
5a5be7b
to
8c20d9d
Compare
@@ -1 +1,22 @@ | |||
ALTER TYPE login_type ADD VALUE IF NOT EXISTS 'token'; | |||
-- ALTER TYPE login_type ADD VALUE IF NOT EXISTS 'token'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment explaining why we need to do the below now instead
ALTER TYPE resource_type ADD VALUE IF NOT EXISTS 'group'; | ||
|
||
COMMIT; | ||
ALTER TYPE resource_type ADD VALUE IF NOT EXISTS 'group'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also do the new table + rename dance here as well?
Seems safer to just always do it that way instead of someone suddenly adding a new migration referencing this enum type and wondering why their new migration doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only went back historically and fixed usage of ADD VALUE
if we reference the newly added value later in a different migration. It is otherwise fine to use within a transaction.
Edit: Ah sorry I didn't fully read. I see what you mean but I wasn't sure if it was worth going back to every type. It was really a pain to try and figure out for each enum what the exact values were at the time of the migration. I don't really mind following up though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix all of them in a follow up 👍
ALTER TYPE build_reason ADD VALUE IF NOT EXISTS 'autolock'; | ||
ALTER TYPE build_reason ADD VALUE IF NOT EXISTS 'failedstop'; | ||
ALTER TYPE build_reason ADD VALUE IF NOT EXISTS 'autodelete'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new table + rename dance here too
@@ -1,5 +1,3 @@ | |||
BEGIN; | |||
|
|||
-- Add "exectrace" to workspace_agent_subsystem type. | |||
ALTER TYPE workspace_agent_subsystem ADD VALUE 'exectrace'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new table + rename
DROP TYPE display_app; | ||
COMMIT; | ||
|
||
DROP TYPE display_app; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does dropping a type also potentially have the same issue as adding a type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's dropping the entire type it doesn't have the same consequences as ADD VALUE
.
It would be pretty nice if we could code-gen the in-transaction type modifications. Code-gen would essentially apply migrations without transactions, and when there's a type-change, it generates code for the new type/rename dance on all tables using the type. It might be somewhat nice to implement our own custom SQL here as well for managing types ( Are types/enums the only issue we may run into in migrations running under a single tx? |
safer, but means that 'ALTER TYPE resource_type ADD VALUE' cannot be used if the | ||
enum value needs to be referenced in another migration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should document how to work around this limitation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation alone is not sufficient IMO (hence suggesting code-gen). This is the type of issue that might surface a week, month, or year later when a migration using the earlier type is added. And only some customers will hit it (those that rarely update).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really a bug that customers will run into, it'll always be caught by tests. The issue only pops up when you attempt to use an enum value added by ADD VALUE
in a later migration. Otherwise, these are fine to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can always catch this as it is now, (e.g. without codegen, linting, etc). One scenario I can think of is:
- Customer A and B run v2.0.0
- We add a value (
foo(bar)
) in v2.0.1, but we forget the type dance because there's nothing to catch it when the value is unused - Customer B updates to v2.0.1
- We release v2.0.2 and add a migration that modifies rows and change
foo(baz) => foo(bar)
, we don't catch it because we don't have fixtures for everything and the relevant tables are empty - Customer A and B update to v2.0.2
- Customer A runs into an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The migration tests will catch this since it doesn't matter if the tables are empty. Any reference to the new enum will fail. I ran into this exact scenario while writing this.
https://github.com/coder/coder/blob/main/coderd/database/migrations/000103_add_apikey_name.up.sql does an UPDATE
using the login_type(token)
enum added using ADD VALUE
in an earlier migration. The tests are able to catch this without fixtures since it doesn't matter if any rows are in the table, it still has to convert 'token'
to the proper enum which causes the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok, that's good then. I wasn't aware that postgres would catch it for an empty table. Thanks for clarifying. 👍🏻
I think it would be cool, but I'm not sure entirely worthwhile. The whole
In terms of Postgres limitations, yes. They only cause an issue if you |
423df44
to
3ca5ebf
Compare
Updates https://github.com/coder/customers/issues/365
This PR updates our migration framework to run all migrations in a single transaction. This is the same behavior we had in v1 and ensures that failed migrations don't bring the whole deployment down. If a migration fails now, it will automatically be rolled back to the previous version, allowing the deployment to continue functioning.