Summary
`DataProviderMigrate migrate` does not detect when a column-level CHECK constraint expression has changed in `schema.yaml`. The migration runs "successfully" with N ops but silently skips the constraint replacement, leaving prod with the stale CHECK definition. New writes from the updated app code then fail with `CheckViolationError` and the operator is left to drop+recreate the constraint manually with raw SQL — exactly the workflow [MIGRATIONS-PROD] is supposed to eliminate.
Repro (from real NAP prod outage 2026-05-09)
- `schema.yaml` had:
```yaml
- name: kind
type: Text
checkConstraint: "kind IN ('request','input_tokens','output_tokens','sandbox_seconds')"
```
- PR renamed the kinds + added an `llm_` prefix:
```yaml
- name: kind
type: Text
checkConstraint: "kind IN ('llm_request','llm_input_tokens','llm_output_tokens','machine_seconds')"
```
- Operator ran `DataProviderMigrate migrate --schema migrations/schema.yaml --provider postgres`. Tool reported `23 of 23 operation(s)` applied + `Migration completed successfully`.
- Operations were 5 `AddColumnOperation`, 4 `CreateTableOperation`, 5 `CreateIndexOperation`, 1 `GrantPrivilegesOperation`, 2 `EnableRlsOperation`, 2 `EnableForceRlsOperation`, 4 `CreateRlsPolicyOperation`. Zero check-constraint operations.
- Application then 500'd on every chat turn with:
```
asyncpg.exceptions.CheckViolationError:
new row for relation "usage_events" violates check constraint "usage_events_kind_check"
```
- Querying `pg_constraint` confirmed prod still had the old constraint:
```
usage_events_kind_check |
CHECK ((kind = ANY (ARRAY['request'::text, 'input_tokens'::text, 'output_tokens'::text, 'sandbox_seconds'::text])))
```
Expected
DataProviderMigrate should diff the existing constraint definition against `schema.yaml`'s `checkConstraint`, and emit a drop-and-recreate (or equivalent `AlterCheckConstraintOperation`) when they differ. "Schema converged" should mean every CHECK matches.
Workaround applied in prod
Manual SQL — exactly the thing [MIGRATIONS-PROD] forbids:
```sql
BEGIN;
ALTER TABLE public.usage_events DROP CONSTRAINT usage_events_kind_check;
UPDATE public.usage_events SET kind = 'llm_request' WHERE kind = 'request';
UPDATE public.usage_events SET kind = 'llm_input_tokens' WHERE kind = 'input_tokens';
UPDATE public.usage_events SET kind = 'llm_output_tokens' WHERE kind = 'output_tokens';
UPDATE public.usage_events SET kind = 'machine_seconds' WHERE kind = 'sandbox_seconds';
ALTER TABLE public.usage_events ADD CONSTRAINT usage_events_kind_check
CHECK (kind IN ('llm_request','llm_input_tokens','llm_output_tokens','machine_seconds'));
COMMIT;
```
(Drop must come BEFORE the UPDATE, or the old constraint blocks the rename.)
Severity
S1 — silent prod outage. The tool exits 0 and reports "Migration completed successfully" while leaving the schema in a state guaranteed to break the new app revision. Any consumer who renames an enum-style CHECK constraint will hit this.
Versions
- DataProviderMigrate: `/tmp/dp-tool/DataProviderMigrate` (NAP CI installs `0.9.10-beta`)
- PostgreSQL: 15 (Supabase managed)
Tracking
NAP outage filed at https://github.com/Nimblesite/NimblesiteAgenticPlatform/issues/36 — operator workaround applied at ~08:25Z.
Summary
`DataProviderMigrate migrate` does not detect when a column-level CHECK constraint expression has changed in `schema.yaml`. The migration runs "successfully" with N ops but silently skips the constraint replacement, leaving prod with the stale CHECK definition. New writes from the updated app code then fail with `CheckViolationError` and the operator is left to drop+recreate the constraint manually with raw SQL — exactly the workflow [MIGRATIONS-PROD] is supposed to eliminate.
Repro (from real NAP prod outage 2026-05-09)
```yaml
type: Text
checkConstraint: "kind IN ('request','input_tokens','output_tokens','sandbox_seconds')"
```
```yaml
type: Text
checkConstraint: "kind IN ('llm_request','llm_input_tokens','llm_output_tokens','machine_seconds')"
```
```
asyncpg.exceptions.CheckViolationError:
new row for relation "usage_events" violates check constraint "usage_events_kind_check"
```
```
usage_events_kind_check |
CHECK ((kind = ANY (ARRAY['request'::text, 'input_tokens'::text, 'output_tokens'::text, 'sandbox_seconds'::text])))
```
Expected
DataProviderMigrate should diff the existing constraint definition against `schema.yaml`'s `checkConstraint`, and emit a drop-and-recreate (or equivalent `AlterCheckConstraintOperation`) when they differ. "Schema converged" should mean every CHECK matches.
Workaround applied in prod
Manual SQL — exactly the thing [MIGRATIONS-PROD] forbids:
```sql
BEGIN;
ALTER TABLE public.usage_events DROP CONSTRAINT usage_events_kind_check;
UPDATE public.usage_events SET kind = 'llm_request' WHERE kind = 'request';
UPDATE public.usage_events SET kind = 'llm_input_tokens' WHERE kind = 'input_tokens';
UPDATE public.usage_events SET kind = 'llm_output_tokens' WHERE kind = 'output_tokens';
UPDATE public.usage_events SET kind = 'machine_seconds' WHERE kind = 'sandbox_seconds';
ALTER TABLE public.usage_events ADD CONSTRAINT usage_events_kind_check
CHECK (kind IN ('llm_request','llm_input_tokens','llm_output_tokens','machine_seconds'));
COMMIT;
```
(Drop must come BEFORE the UPDATE, or the old constraint blocks the rename.)
Severity
S1 — silent prod outage. The tool exits 0 and reports "Migration completed successfully" while leaving the schema in a state guaranteed to break the new app revision. Any consumer who renames an enum-style CHECK constraint will hit this.
Versions
Tracking
NAP outage filed at https://github.com/Nimblesite/NimblesiteAgenticPlatform/issues/36 — operator workaround applied at ~08:25Z.