
_insert_dataset raises CircularDependencyError on empty datasets #22

@alialavia

Summary

SqlProof.client_for_dataset({}) calls _insert_dataset unconditionally,
which calls insertion_order(schema_info.tables) to topo-sort tables
before iterating any rows. If the target schema has a foreign-key cycle,
the topo sort raises CircularDependencyError — even when the dataset
is empty and no rows ever need to land.

Reproduction

Any Postgres schema with a mutual FK cycle. In our schema the cycle is:

  • content.current_snapshot_id → content_import_snapshots(id)
  • content_import_snapshots.content_id → content(id)

(A common "row points at the most recent child" pattern.)

from sqlproof import SqlProof

proof = SqlProof.from_connection_string(dsn)
with proof.client_for_dataset({}) as client:   # raises here
    pass

Output:

sqlproof.exceptions.CircularDependencyError: Circular foreign-key dependency detected: ai_response_contexts, ai_responses, blog_drafts, content, content_edit_feedback, content_edits, content_findings, content_import_attempts, content_import_snapshots, content_versions, gap_content_drafts, geo_citations_old, geo_questions, webflow_published_items, wordpress_published_posts, workflow_metadata

Note: the table list in the error reports everything transitively
downstream of the cycle, not just the two tables that form it — which
makes the actual cycle hard to spot.
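
To illustrate why the list gets so long, here is a minimal sketch. It assumes insertion_order behaves like a Kahn-style topological sort, which I haven't confirmed from the sqlproof source. The `deps` mapping, the `users` table, and the FK edges for the downstream tables are made up for illustration. Every table that depends on a cycle member ends up unplaced, and filtering the leftovers down to tables that can reach themselves recovers the real cycle:

```python
from collections import deque


def kahn_leftovers(deps):
    """Return the tables a Kahn-style topo sort cannot place.

    deps maps table name -> set of tables it references via FK.
    """
    remaining = {t: set(d) for t, d in deps.items()}
    ready = deque(sorted(t for t, d in remaining.items() if not d))
    while ready:
        done = ready.popleft()
        del remaining[done]
        for table, pending in remaining.items():
            if done in pending:
                pending.discard(done)
                if not pending:
                    ready.append(table)
    return set(remaining)


def cycle_members(deps, candidates):
    """Keep only the candidates that can reach themselves by following FKs."""

    def reaches_itself(start):
        stack, seen = list(deps[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(deps.get(node, ()))
        return False

    return {t for t in candidates if reaches_itself(t)}


# Hypothetical FK graph: two tables form the cycle, two more hang off it.
deps = {
    "users": set(),
    "content": {"content_import_snapshots", "users"},
    "content_import_snapshots": {"content"},
    "content_edits": {"content"},
    "content_edit_feedback": {"content_edits"},
}
leftovers = kahn_leftovers(deps)
cycle = cycle_members(deps, leftovers)
```

Reporting something like `cycle` rather than `leftovers` in the error message would point straight at the two offending tables.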

Why this is a bug

For an empty dataset the topo-sort result is never used. The loop body
in src/sqlproof/core.py:250-261 finds no rows for any table and exits:

for table in insertion_order(schema_info.tables):
    rows = dataset.get(table.name, [])
    for row in rows:
        if not row:
            continue
        # … insert

The effect: every property test that uses the standard
proof.client_for_dataset({}) pattern against a schema with a cycle
can't even open the client. In our repo this took out the entire
DB-backed test suite (~30 tests), not just the ones that would
actually have wanted to insert into the cyclic tables.

Suggested fix

Short-circuit _insert_dataset when the dataset has no rows to insert:

def _insert_dataset(client, schema_info, dataset):
    if not any(rows for rows in dataset.values()):
        return
    for table in insertion_order(schema_info.tables):
        …

Two-line patch. Doesn't change behavior for non-empty datasets and
doesn't paper over real cycle errors when rows actually need to land —
those still get raised, with the same message, at the same call site.
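
A self-contained sketch of that behavior, runnable without a database. The exception class, `insertion_order`, and `insert_dataset` below are simplified stand-ins I wrote for illustration, not the real sqlproof internals; the stand-in sort always raises, the way the real one does on a cyclic schema:

```python
class CircularDependencyError(Exception):
    """Stand-in for sqlproof.exceptions.CircularDependencyError."""


def insertion_order(tables):
    """Stand-in sort that always fails, as on a schema with an FK cycle."""
    raise CircularDependencyError("Circular foreign-key dependency detected")


def insert_dataset(tables, dataset, insert_row):
    """Insert rows in dependency order; skip the sort if no rows are given.

    Returns the number of rows passed to insert_row.
    """
    # Guard from the suggested fix: nothing to insert, so no ordering needed.
    if not any(rows for rows in dataset.values()):
        return 0
    inserted = 0
    for table in insertion_order(tables):
        for row in dataset.get(table, []):
            if not row:
                continue
            insert_row(table, row)
            inserted += 1
    return inserted


calls = []
record = lambda table, row: calls.append((table, row))

empty_result = insert_dataset(["content"], {}, record)
empty_list_result = insert_dataset(["content"], {"content": []}, record)

try:
    insert_dataset(["content"], {"content": [{"id": 1}]}, record)
    raised_for_rows = False
except CircularDependencyError:
    raised_for_rows = True
```

Both empty cases return without touching the sort, and the dataset that actually has rows still raises.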

Workaround we're using

This reaches into a private attribute and skips the framework's
setup/teardown lifecycle, but it unblocks tests today:

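import pytest
from sqlproof import SqlProof

@pytest.fixture
def cycle_safe_db(proof: SqlProof):
    # _db_manager is private; this bypasses client_for_dataset entirely.
    with proof._db_manager.acquire() as client:
        client.execute("SAVEPOINT my_test")
        try:
            yield client
        finally:
            # Undo whatever the test wrote, then drop the savepoint.
            client.execute("ROLLBACK TO SAVEPOINT my_test")
            client.execute("RELEASE SAVEPOINT my_test")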

Related (probably a separate issue / feature request)

There's no public escape hatch for schemas with a real cycle the test
author can't fix (e.g. tables owned by another team, third-party
extensions, materialized snapshots of historical schemas). An
excluded_tables or included_tables argument on SqlProofConfig
would let those projects opt cyclic tables out of introspection
entirely. The empty-dataset fix above unblocks the common case; this
would unblock the harder case. Happy to file separately if useful.
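
A rough sketch of what that filtering could look like, to make the request concrete. `select_tables` and its `included` / `excluded` parameters are hypothetical names, not an existing sqlproof API, and the FK map is a toy example. It uses the standard library's graphlib for the sort:

```python
from graphlib import CycleError, TopologicalSorter


def select_tables(fk_deps, included=None, excluded=()):
    """Return an FK map restricted to the selected tables.

    fk_deps maps table name -> set of tables it references via FK.
    Edges pointing at dropped tables are removed along with the tables.
    """
    keep = set(fk_deps) if included is None else set(included) & set(fk_deps)
    keep -= set(excluded)
    return {t: {d for d in fk_deps[t] if d in keep} for t in sorted(keep)}


# Toy schema: the two-table cycle from this issue plus one unrelated table.
fk_deps = {
    "content": {"content_import_snapshots"},
    "content_import_snapshots": {"content"},
    "geo_questions": set(),
}

try:
    list(TopologicalSorter(fk_deps).static_order())
    full_schema_sorts = True
except CycleError:
    full_schema_sorts = False

filtered = select_tables(fk_deps, excluded={"content_import_snapshots"})
order = list(TopologicalSorter(filtered).static_order())
```

One caveat: dropping an FK edge only hides it from the ordering. Rows inserted into `content` with a non-null current_snapshot_id would still need the excluded table populated some other way.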

Environment

  • sqlproof: 0.1.0a1 (editable, local path)
  • Python: 3.11.8
  • Postgres: 16 (Supabase local)
  • macOS 14
