fix: Disable semantic_check on populate antijoin (parallels #1383) #1453
dimitri-yatsenko wants to merge 1 commit into
The same fix that #1383 applied to the Job table's antijoin in refresh() is now applied to AutoPopulate._populate_direct's antijoin and the progress() fallback path.

The two-arg subtract `key_source - self` triggers QueryExpression.__sub__, which calls .restrict(Not(...)) with semantic_check=True by default. The semantic-check requirement is wrong here: this antijoin is a plain set-difference, not a join. We ask "which key_source rows aren't yet in self." Whether the same-named PK attribute carries the same source-table lineage tag on both sides is irrelevant.

Where it bites: dj.Imported / dj.Computed tables whose primary key is fully inherited from a single FK, with no own-table PK attributes. On those, self.proj() returns the PK attribute with lineage=None (or pointing to self rather than the FK parent), while key_source's matching attribute carries the parent's lineage tag. The semantic check fails with:

    Cannot join on attribute 'X': different lineages (schema.parent.X vs None). Use .proj() to rename one of the attributes.

This pattern is legitimate ("one row downstream per parent row, no intermediate ID") but rare in typical Elements / SciOps pipelines, which extend the inherited PK with own-table attributes (trial_id, experiment_id, etc.) that anchor proj()'s lineage. That's why the existing #1405 test suite didn't surface it.

Changes:
- src/datajoint/autopopulate.py
  - Import Not from .condition at module top.
  - _populate_direct: replace `(LHS - self.proj())` with `LHS.restrict(Not(self.proj()), semantic_check=False)`.
  - progress(): same swap on the no-common-attrs fallback branch.
- tests/integration/test_autopopulate.py
  - New test_populate_antijoin_fk_inherited_pk regression test: Spec(Manual) -> Item(Imported with only -> Spec), the minimal shape that triggers the bug. Without the fix Item.populate() raises DataJointError; with the fix it populates correctly, progress() reports correct counts, and partial-then-full populate works.
Stacked on top of #1452 (the secrets-loading + dead-code fix); rebase to master after that lands.
Closing: the diagnosis was wrong. I assumed […] Both sides of the populate antijoin carry the same lineage tag. The […] The original […] Apologies for the noise. #1452 is unaffected and still stands on its own.
Summary
#1383 disabled `semantic_check` for the jobs table antijoin in `Job.refresh()`. The same defect lives in `AutoPopulate._populate_direct`'s antijoin (and the `progress()` fallback). This PR applies the analogous fix.

The `-` operator on a `QueryExpression` calls `.restrict(Not(...))` with default `semantic_check=True`. For a plain set-difference ("which rows of the LHS aren't in the RHS?"), the semantic check is wrong: we don't care whether the same-named PK attribute carries the same source-table lineage tag on each side; we just want the set difference.
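The distinction drawn here, between an antijoin guarded by a lineage check and a plain set difference, can be sketched with a small in-memory model. This is a toy under stated assumptions, not DataJoint's implementation: `Relation`, `antijoin`, and `LineageError` are hypothetical names, and the lineage tags are the ones the description claims each side carries.

```python
from dataclasses import dataclass


class LineageError(Exception):
    """Toy stand-in for the error a failed semantic check would raise."""


@dataclass
class Relation:
    """Hypothetical in-memory relation: rows plus one lineage tag per attribute."""
    rows: list      # list of dicts, one per row
    lineage: dict   # attribute name -> lineage tag (str) or None


def antijoin(lhs, rhs, semantic_check=True):
    """Return the rows of lhs that have no match in rhs on the shared attributes."""
    common = [name for name in lhs.lineage if name in rhs.lineage]
    if semantic_check:
        # Toy version of the check: shared names must also share a lineage tag.
        for name in common:
            if lhs.lineage[name] != rhs.lineage[name]:
                raise LineageError(
                    f"attribute {name!r}: {lhs.lineage[name]} vs {rhs.lineage[name]}"
                )
    # Plain set difference on the shared attribute values.
    taken = {tuple(row[name] for name in common) for row in rhs.rows}
    return [row for row in lhs.rows if tuple(row[name] for name in common) not in taken]


# Assumed lineage tags, mirroring the scenario described in this PR.
key_source = Relation(
    rows=[{"spec_id": 1}, {"spec_id": 2}, {"spec_id": 3}],
    lineage={"spec_id": "schema.spec.spec_id"},
)
populated = Relation(rows=[{"spec_id": 1}], lineage={"spec_id": None})

pending = antijoin(key_source, populated, semantic_check=False)
print(pending)  # [{'spec_id': 2}, {'spec_id': 3}]
```

With `semantic_check=True` the same call raises `LineageError` in this model, because the two `spec_id` tags differ. Whether the real tables carry differing tags is exactly the premise the PR depends on.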
Where it bites
`dj.Imported` / `dj.Computed` tables whose PK is fully inherited from a single foreign key, with no own-table PK attributes, for example `Spec` (Manual) → `Item` (Imported, defined with only `-> Spec`).

On master: `Item.proj()` returns `spec_id` with `lineage=None` while the key_source's `spec_id` carries `Spec`'s lineage. The semantic check rejects the antijoin.

Why this didn't surface from #1405's test suite
The existing regression tests (`test_populate_antijoin_with_secondary_attrs`, `test_populate_antijoin_overlapping_attrs`) both use tables that either have multiple FK parents (so PK attributes come from multiple sources and `proj()` lineage gets anchored) or extend the FK-inherited PK with own-table attributes. Elements / SciOps pipelines almost always follow that shape.

The single-FK-inherited-PK + no-own-PK-attrs pattern is uncommon but legitimate: "one row downstream per parent row, no intermediate ID".
Changes
- `src/datajoint/autopopulate.py`
  - Import `Not` from `.condition` at module top.
  - `_populate_direct`: replace `(LHS - self.proj())` with `LHS.restrict(Not(self.proj()), semantic_check=False)`.
  - `progress()`: same swap on the no-common-attrs fallback branch.
- `tests/integration/test_autopopulate.py`
  - New `test_populate_antijoin_fk_inherited_pk` regression test. Constructs the minimal shape (`Spec(Manual) → Item(Imported)`) that triggers the bug. Exercises partial populate, `progress()` counts, full populate, and confirms no re-processing.
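The sequence of checks the regression test is described as making (partial populate, `progress()` counts, full populate, no re-processing) can be sketched against an in-memory stand-in. The test body itself is not shown in this PR text, so everything here is an assumption: `ToyPopulator` is a hypothetical class, and its `populate(max_calls=...)` and `progress()` only imitate the shape of the real methods, with `progress()` returning a `(remaining, total)` pair.

```python
class ToyPopulator:
    """Hypothetical stand-in for an auto-populated table keyed by its parent's PK."""

    def __init__(self, parent_keys):
        self.parent_keys = list(parent_keys)  # plays the role of key_source
        self.rows = {}                        # populated rows, keyed by spec_id
        self.make_calls = []                  # records every key that was processed

    def _pending(self):
        # The antijoin: parent keys that do not yet have a row in this table.
        return [key for key in self.parent_keys if key not in self.rows]

    def progress(self):
        # (remaining, total), as assumed in the lead-in above.
        return len(self._pending()), len(self.parent_keys)

    def populate(self, max_calls=None):
        pending = self._pending()
        if max_calls is not None:
            pending = pending[:max_calls]
        for key in pending:
            self.make_calls.append(key)
            self.rows[key] = {"spec_id": key}


item = ToyPopulator([1, 2, 3])

item.populate(max_calls=1)           # partial populate
after_partial = item.progress()      # two of three keys still pending

item.populate()                      # full populate
after_full = item.progress()         # nothing left to do

item.populate()                      # a further call must not re-process anything
```

After the final call, `item.make_calls` still lists each key exactly once, which is the "no re-processing" property the test description mentions.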
Test plan
- `test_populate_antijoin_fk_inherited_pk` exercises the bug and the fix
- Existing regression tests (`test_populate_antijoin_with_secondary_attrs`, `test_populate_antijoin_overlapping_attrs`) unchanged and should still pass
- […] (testcontainers needs Docker, which isn't running on the contributor box)