test(indexing): cross-modality ModalityWorker.derive+sync contract supplement #1730

Merged: earayu merged 1 commit into main from chenyexuan/wave4-modality-worker-tests on Apr 27, 2026

@earayu earayu commented Apr 27, 2026

Summary

Per the standby task from @earayu2 / @不穷 (msg=981ae30d), this PR supplements the existing per-modality acceptance tests in test_t1_2_graph.py / test_t1_3_vector_fulltext.py / test_t1_4_summary_vision.py with cross-modality contract tests that exercise invariants the spec promises across all 5 modalities.

Disjoint from PR #1729 (Wave 3 hard-cut): this branch is based on origin/main and verified passing against the pre-Wave-3 codebase. Per PM msg=ca69198a: open in parallel, with no merge dependency on Wave 3.

Coverage (11 tests, +456 LOC, 1 new file)

  1. §C.7 reschedule contract (2 tests): summary + vision derive() with a missing upstream source returns DeriveResult(derived_artifact_path="") (the empty-string signal the orchestrator interprets as "derive incomplete, leave PENDING for next reconciler cycle"). Vector + fulltext are pass-through and covered by existing per-modality "no-op on missing chunks" tests.

  2. N-call replay convergence (4 tests, parametrized over vector/fulltext/summary/vision): 5 consecutive sync() calls produce a backend state byte-equivalent to that of a single sync. This extends the existing 2-call tests to longer replays, as would occur under a retry storm (§D.4).

  3. Cross-document parse_version isolation (4 tests, parametrized): sync() of doc-A's (doc_a, parse_version) slot must NOT touch doc-B's backend state. Locks the §D.1 DELETE scope: WHERE document_id=A AND parse_version=V only. Uses two distinct doc bodies because the parser's content-hashed chunk_id would otherwise collide cross-doc (a fixture limitation, not a production-code invariant violation; this is called out in the test docstring).

  4. All-5-modality enum discriminator (1 test): each ModalityWorker subclass binds the class-level modality attribute to the matching Modality enum value. This attribute is the orchestrator's route key, so a misbind would silently mis-route work.
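
Items 2 and 3 above both follow from a sync that replaces exactly one (document_id, parse_version) slot. The following is a minimal sketch of that idea, not the project's implementation: `FakeBackend`, `Row`, `delete_slot`, `snapshot`, and this free-standing `sync` function are hypothetical names invented for illustration, and the in-memory set stands in for whatever store each modality really uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Row:
    # Hypothetical row shape; only the field names document_id,
    # parse_version and chunk_id come from the PR description.
    document_id: str
    parse_version: int
    chunk_id: str
    payload: str


@dataclass
class FakeBackend:
    """Hypothetical in-memory stand-in for a modality backend."""

    rows: set = field(default_factory=set)

    def delete_slot(self, document_id: str, parse_version: int) -> None:
        # Mirrors the scope: WHERE document_id=A AND parse_version=V only.
        self.rows = {
            r
            for r in self.rows
            if not (r.document_id == document_id and r.parse_version == parse_version)
        }

    def snapshot(self) -> list:
        # Deterministic ordering so two backends can be compared directly.
        return sorted(
            self.rows, key=lambda r: (r.document_id, r.parse_version, r.chunk_id)
        )


def sync(backend: FakeBackend, document_id: str, parse_version: int, chunks: dict) -> None:
    """Replace one slot: scoped delete, then re-insert the current chunks."""
    backend.delete_slot(document_id, parse_version)
    for chunk_id, payload in chunks.items():
        backend.rows.add(Row(document_id, parse_version, chunk_id, payload))
```

Under these assumptions, calling `sync` five times with the same arguments leaves the same snapshot as calling it once, and syncing `doc_a` never removes rows that belong to `doc_b`, because the delete predicate matches on both keys.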

Graph modality is intentionally NOT in the parametric sweeps because test_t1_2_graph.py already covers the §D.3 lineage semantics exhaustively (D3.6 5-step scenario + Nebula race + byte-equivalent re-sync + tenant_scope_key propagation). The all-5-modality enum test is the only place this file touches graph.
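
The §C.7 reschedule signal and the enum discriminator check can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the repository's code: the enum string values, the five concrete worker class names, the `derive(upstream_source)` signature, the placeholder artifact path, and the helpers `should_stay_pending` and `mismatched_discriminators` are all hypothetical. Only `ModalityWorker`, `Modality`, the class-level `modality` attribute, and `DeriveResult(derived_artifact_path="")` are taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    # The five modality names appear in the PR; the string values are assumed.
    VECTOR = "vector"
    FULLTEXT = "fulltext"
    GRAPH = "graph"
    SUMMARY = "summary"
    VISION = "vision"


@dataclass
class DeriveResult:
    derived_artifact_path: str


class ModalityWorker:
    modality: Modality  # class-level route key read by the orchestrator


class VectorWorker(ModalityWorker):
    modality = Modality.VECTOR


class FulltextWorker(ModalityWorker):
    modality = Modality.FULLTEXT


class GraphWorker(ModalityWorker):
    modality = Modality.GRAPH


class VisionWorker(ModalityWorker):
    modality = Modality.VISION


class SummaryWorker(ModalityWorker):
    modality = Modality.SUMMARY

    def derive(self, upstream_source):
        # Hypothetical signature: None models a missing upstream source.
        if upstream_source is None:
            # Empty path means "derive incomplete, leave PENDING".
            return DeriveResult(derived_artifact_path="")
        # Placeholder path for illustration only.
        return DeriveResult(derived_artifact_path="derived/summary.json")


def should_stay_pending(result: DeriveResult) -> bool:
    """Orchestrator-side reading of the empty-string signal (assumed helper)."""
    return result.derived_artifact_path == ""


EXPECTED = {
    VectorWorker: Modality.VECTOR,
    FulltextWorker: Modality.FULLTEXT,
    GraphWorker: Modality.GRAPH,
    SummaryWorker: Modality.SUMMARY,
    VisionWorker: Modality.VISION,
}


def mismatched_discriminators(expected: dict) -> list:
    """Return names of worker classes whose `modality` is misbound."""
    return [cls.__name__ for cls, m in expected.items() if cls.modality is not m]
```

A single table-driven check like `mismatched_discriminators(EXPECTED) == []` catches a misbound route key for any of the five workers, which is cheaper than discovering it through mis-routed work at runtime.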

Test plan

  • pytest the new file → 11 passed (against origin/main)
  • pytest the new file → 11 passed (against the chenyexuan/celery-wave3-cutover Wave 3 branch; the Wave 3 model NOT NULL flip and dispatcher changes don't perturb ModalityWorker contracts)
  • ruff check + format --check clean
  • CI lint-and-unit + e2e-http-smoke + e2e-http-provider + provider-preflight (CI to run)

🤖 Generated with Claude Code

…pplement

Per @earayu2 / @不穷 msg=981ae30d standby task: supplement the existing
per-modality acceptance tests in
``test_t1_2_graph.py`` / ``test_t1_3_vector_fulltext.py`` /
``test_t1_4_summary_vision.py`` with **cross-modality contract tests**
that exercise invariants the spec promises across all 5 modalities.

Per @不穷 msg=0d35f537 scope decision: this lands as a follow-up PR
(NOT into the current Wave 3 PR #1729). The branch is based on
``origin/main`` and verified passing against the pre-Wave-3 codebase;
no Wave 3 dependency.

Coverage added (11 tests):

1. **§C.7 reschedule contract** (2 tests): summary + vision
   ``derive()`` with a missing upstream source returns
   ``DeriveResult(derived_artifact_path="")`` (the empty-string signal
   the orchestrator interprets as "derive incomplete, leave PENDING
   for next reconciler cycle"). Vector + fulltext are pass-through and
   covered by existing per-modality "no-op on missing chunks" tests.

2. **N-call replay convergence** (4 tests, parametrized across
   vector/fulltext/summary/vision): 5 consecutive ``sync()`` calls
   produce a backend state byte-equivalent to a single sync (extends
   existing 2-call tests under arbitrary retry storms — §D.4).

3. **Cross-document parse_version isolation** (4 tests, same params):
   ``sync()`` of doc-A's ``(doc_a, parse_version)`` slot must NOT
   touch doc-B's backend state. Locks the §D.1 DELETE scope:
   ``WHERE document_id=A AND parse_version=V`` only. Uses two
   distinct doc bodies because the parser's content-hashed
   ``chunk_id`` would otherwise collide cross-doc — fixture
   limitation, not a production-code invariant violation; called out
   in the test docstring.

4. **All-5-modality enum discriminator** (1 test): each
   ``ModalityWorker`` subclass binds the class-level ``modality``
   attribute to the matching ``Modality`` enum value (orchestrator
   route key — a misbind would silently mis-route work).

Graph modality is intentionally NOT in the parametric sweeps because
``test_t1_2_graph.py`` already covers the §D.3 lineage semantic
exhaustively (D3.6 5-step scenario + Nebula race + byte-equivalent
re-sync + tenant_scope_key propagation). The all-5-modality enum
test is the only graph touch here.

Local gates:
- pytest tests/unit_test/indexing/test_modality_worker_contract.py → 11 passed
- ruff check + format --check clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@earayu earayu merged commit 07599fa into main Apr 27, 2026
4 checks passed
@earayu earayu deleted the chenyexuan/wave4-modality-worker-tests branch April 27, 2026 05:37