Skip to content

DataProviderMigrate misses CHECK constraint changes on existing columns #57

@MelbourneDeveloper

Description

@MelbourneDeveloper

Summary

`DataProviderMigrate migrate` does not detect when a column-level CHECK constraint expression has changed in `schema.yaml`. The migration runs "successfully" with N ops but silently skips the constraint replacement, leaving prod with the stale CHECK definition. New writes from the updated app code then fail with `CheckViolationError` and the operator is left to drop+recreate the constraint manually with raw SQL — exactly the workflow [MIGRATIONS-PROD] is supposed to eliminate.

Repro (from real NAP prod outage 2026-05-09)

  1. `schema.yaml` had:
    ```yaml
    • name: kind
      type: Text
      checkConstraint: "kind IN ('request','input_tokens','output_tokens','sandbox_seconds')"
      ```
  2. PR renamed the kinds + added an `llm_` prefix:
    ```yaml
    • name: kind
      type: Text
      checkConstraint: "kind IN ('llm_request','llm_input_tokens','llm_output_tokens','machine_seconds')"
      ```
  3. Operator ran `DataProviderMigrate migrate --schema migrations/schema.yaml --provider postgres`. Tool reported `23 of 23 operation(s)` applied + `Migration completed successfully`.
  4. Operations were 5 `AddColumnOperation`, 4 `CreateTableOperation`, 5 `CreateIndexOperation`, 1 `GrantPrivilegesOperation`, 2 `EnableRlsOperation`, 2 `EnableForceRlsOperation`, 4 `CreateRlsPolicyOperation`. Zero check-constraint operations.
  5. Application then 500'd on every chat turn with:
    ```
    asyncpg.exceptions.CheckViolationError:
    new row for relation "usage_events" violates check constraint "usage_events_kind_check"
    ```
  6. Querying `pg_constraint` confirmed prod still had the old constraint:
    ```
    usage_events_kind_check |
    CHECK ((kind = ANY (ARRAY['request'::text, 'input_tokens'::text, 'output_tokens'::text, 'sandbox_seconds'::text])))
    ```

Expected

DataProviderMigrate should diff the existing constraint definition against `schema.yaml`'s `checkConstraint`, and emit a drop-and-recreate (or equivalent `AlterCheckConstraintOperation`) when they differ. "Schema converged" should mean every CHECK matches.

Workaround applied in prod

Manual SQL — exactly the thing [MIGRATIONS-PROD] forbids:

```sql
BEGIN;
ALTER TABLE public.usage_events DROP CONSTRAINT usage_events_kind_check;
UPDATE public.usage_events SET kind = 'llm_request' WHERE kind = 'request';
UPDATE public.usage_events SET kind = 'llm_input_tokens' WHERE kind = 'input_tokens';
UPDATE public.usage_events SET kind = 'llm_output_tokens' WHERE kind = 'output_tokens';
UPDATE public.usage_events SET kind = 'machine_seconds' WHERE kind = 'sandbox_seconds';
ALTER TABLE public.usage_events ADD CONSTRAINT usage_events_kind_check
CHECK (kind IN ('llm_request','llm_input_tokens','llm_output_tokens','machine_seconds'));
COMMIT;
```

(Drop must come BEFORE the UPDATE, or the old constraint blocks the rename.)

Severity

S1 — silent prod outage. The tool exits 0 and reports "Migration completed successfully" while leaving the schema in a state guaranteed to break the new app revision. Any consumer who renames an enum-style CHECK constraint will hit this.

Versions

  • DataProviderMigrate: `/tmp/dp-tool/DataProviderMigrate` (NAP CI installs `0.9.10-beta`)
  • PostgreSQL: 15 (Supabase managed)

Tracking

NAP outage filed at https://github.com/Nimblesite/NimblesiteAgenticPlatform/issues/36 — operator workaround applied at ~08:25Z.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions