Skip to content

Bug: MS-SQL create-tables fails with schema-qualified FK targets and multiple cascade paths #101

@myyong

Description

@myyong

Problem

Two related bugs surface when running `create-tables` against an MS-SQL destination whose source `orm.yaml` was generated from a schema-qualified PostgreSQL or MS-SQL source (e.g. `SRC_SCHEMA=mimic100`).


Bug 1 — `NoReferencedTableError` on schema-qualified FK targets

FK targets are stored in `orm.yaml` with the source schema prefix:

```yaml
gender_concept_id:
foreign_keys:
- mimic100.concept.concept_id # 3-part: schema.table.column
```

When `dict_to_metadata` builds the destination `MetaData`, tables are registered without a schema prefix (just `concept`). SQLAlchemy interprets `mimic100.concept.concept_id` as schema `mimic100`, table `concept` — and cannot find it in the MetaData. This causes a crash during `metadata.create_all()`'s topological sort:

```
sqlalchemy.exc.NoReferencedTableError: Foreign key associated with column
'person.gender_concept_id' could not find table 'mimic100.concept'
```

Fix

Added `_unqualify_fk_target(fk)` in `datafaker/serialize_metadata.py` to strip the schema qualifier before constructing `ForeignKey` objects:

```python
def _unqualify_fk_target(fk: str) -> str:
parts = fk.split(".")
return ".".join(parts[-2:]) if len(parts) == 3 else fk
```

The original 3-part string is still passed to `ignore_fk` so that `should_ignore_fk`'s schema-aware lookup continues to work correctly.


Bug 2 — MS-SQL error 1785 (multiple cascade paths)

Once Bug 1 is fixed, MS-SQL raises error 1785 when creating tables that have multiple FK columns all pointing to the same target table with `ON DELETE CASCADE`:

```
ProgrammingError: ('42000', "Introducing FOREIGN KEY constraint
'person_ethnicity_source_concept_id_fkey' on table 'person' may cause cycles or
multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or
modify other FOREIGN KEY constraints.")
```

OMOP-style schemas (and many vocabulary-heavy schemas) commonly have many FK columns on a single table all referencing the same vocabulary table (e.g. `concept`), which triggers this MS-SQL restriction. PostgreSQL allows multiple cascade paths freely.

Fix

Added a `@compiles(CreateTable, "mssql")` hook in `datafaker/create.py` that strips `ON DELETE CASCADE` from MS-SQL DDL, mirroring the existing DuckDB hook:

```python
@compiles(CreateTable, "mssql")
def remove_mssql_on_delete_cascade(element, compiler, **kw):
text = compiler.visit_create_table(element, **kw)
return text.replace(" ON DELETE CASCADE", "")
```

Dropping CASCADE is safe for datafaker because rows are inserted in topological (FK-respecting) order — referential integrity is enforced by insert order, not by the database engine.


Tests added

`tests/test_serialize_metadata_mssql.py`

  • `TestUnqualifyFkTarget` — 3 unit tests for `_unqualify_fk_target`: 3-part target drops schema, 2-part unchanged, single-part unchanged
  • `TestSchemaQualifiedFKResolution.test_schema_qualified_fk_resolves_in_metadata` — builds a MetaData with a schema-qualified FK target and asserts the FK resolves without `NoReferencedTableError`

`tests/test_create_mssql.py`

  • `TestMSSQLRemoveOnDeleteCascade` — 3 tests: CASCADE absent from MS-SQL DDL, FOREIGN KEY constraint preserved, DuckDB hook not broken

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions