Surface OC_SUBTREE_GROUP relationships in semantic search (#1645) #1669
…loses #1645)

Two-front delivery on the materialised OC_SUBTREE_GROUP rows from #1646:

(a) Annotation hits now carry a block_context for the smallest enclosing subtree-group. CoreAnnotationVectorStore._attach_block_context_sync runs a single join after reranking, picks the smallest containing OC_SUBTREE_GROUP per hit, and returns a bounded SUBTREE_GROUP_BLOCK_TEXT_MAX_CHARS-capped block_text. Wired into search/async_search/hybrid_search/async_hybrid_search/global_search and surfaced through PydanticAIVectorSearchResponse and the semanticSearch GraphQL resolver.

(b) Relationships are first-class vector targets. Polymorphic Embedding.relationship FK (migration 0073) + partial-unique constraint, Relationship now mixes in HasEmbeddingMixin, new calculate_embeddings_for_relationship_batch task dispatched by build_subtree_groups_for_document (dual-embedding strategy: default + corpus-preferred). New CoreRelationshipVectorStore runs cosine distance against the Embedding table, scoped by Relationship.objects.visible_to_user. New semanticSearchRelationships GraphQL query returns relationship pk, source/target annotation pks, label, block_text, and document/corpus context.

Doc-viewer jump-to: ?rel=<relationship_pk> URL param parsed by CentralRouteManager into a new selectedRelationshipId reactive var. useJumpToRelationship hook (mounted in DocumentKnowledgeBase) watches the param + allRelationsAtom and, once relations are loaded, selects the relation and scrolls its source annotation into view. CorpusAnnotationCards click handler now forwards block_context.relationship_id as a relationshipId query param so users landing from a leaf-annotation hit also see the containing block selected.

Permissioning mirrors resolve_semantic_search: empty-list response for missing-or-denied document/corpus prevents IDOR enumeration; corpus-scoped queries derive the embedder from Corpus.preferred_embedder so relationship vectors stay in the corpus's frozen embedding space.

Tests cover smallest-enclosing-group selection, root-level pass-through, block_text truncation, end-to-end relationship retrieval, IDOR denial, and the rel= URL-builder round-trip.
Code Review - PR #1669: Surface OC_SUBTREE_GROUP relationships in semantic search
Overall this is a well-structured, well-documented PR with solid security hygiene. The IDOR model mirrors the annotation store exactly, the migration is correct, and the test coverage hits the meaningful paths. The comments below are mostly about one DRY violation, one potential front-end bug, and a handful of minor issues.
🐛 Potential Bug -
|
- Fix relationship deep-link bug: useJumpToRelationship now compares
numeric PKs (URL ?rel=<pk> carries raw PK, but RelationGroup.id is
a Relay global ID), so the find() actually matches.
- Clear hoveredAnnotationId on relationship deselection so the stale
hover indicator doesn't linger after ?rel= is cleared.
- DRY block-text construction: extract join_block_text_parts() and
share it between synthesize_relationship_block_text (embedder),
CoreRelationshipVectorStore._shape_results, and
CoreAnnotationVectorStore._attach_block_context_sync. Cap/truncation
logic now lives in one place.
- Fix N+1 in calculate_embeddings_for_relationship_batch: order the
prefetch by id and read raw_text from prefetched objects when the
cache is populated.
- Fix smallest-enclosing-block bug: Count("target_annotations") with
target_annotations__in=hit_ids in the same queryset is restricted by
the filter join, collapsing every group's descendant_count to 1 and
forcing a tie-break on lowest pk (i.e. always selecting ROOT instead
of the innermost enclosing block). Compute descendant_count from the
prefetched target set instead. Existing test now correctly passes.
- Remove dead visible_qs.exists() check, dead RuntimeError branch in
CoreRelationshipVectorStore.__init__, and dead getattr fallback for
similarity_score.
- Fix test isolation: mutating tests now refetch rows so shared
cls-attributes aren't dirtied for sibling tests.
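The smallest-enclosing-block fix in the list above can be illustrated without the ORM. This is a minimal sketch, not the PR's code: `Group` and `smallest_enclosing_groups` are hypothetical names, and the tie-break on the lower pk is an assumption based on the commit message. The point it shows is that each group's size comes from its full target set rather than from a join already narrowed to the hit ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """Hypothetical stand-in for a materialised OC_SUBTREE_GROUP row."""
    pk: int
    target_ids: frozenset


def smallest_enclosing_groups(hit_ids, groups):
    """Map each hit annotation id to the pk of its smallest containing group."""
    best = {}
    for group in groups:
        # Size is taken from the full target set, not from a filtered join.
        size = len(group.target_ids)
        for hit_id in hit_ids:
            if hit_id not in group.target_ids:
                continue
            current = best.get(hit_id)
            # Prefer fewer descendants; break ties on the lower pk (assumption).
            if current is None or (size, group.pk) < current:
                best[hit_id] = (size, group.pk)
    return {hit_id: pk for hit_id, (_size, pk) in best.items()}


root = Group(pk=1, target_ids=frozenset({10, 11, 12, 13}))
inner = Group(pk=2, target_ids=frozenset({12, 13}))
result = smallest_enclosing_groups([10, 13], [root, inner])
```

If the count were taken from the filtered join instead, both groups would report a single matching target for hit 13 and the tie-break would fall back to pk 1, which is the ROOT-selection symptom the commit describes.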
…tionship hook

Backend (opencontractserver/tests/test_semantic_search_graphql.py):
- New SemanticSearchRelationshipsQueryTest class covering the semantic_search_relationships GraphQL resolver: basic dispatch, corpus/document scoping via Relay global IDs, IDOR contract (invisible corpus/document → empty list), limit clamping, and anonymous rejection.
- New SemanticSearchBlockContextFieldTest covering the BlockContext field mapping on semanticSearch, exercising the GraphQL type wiring introduced alongside OC_SUBTREE_GROUP relationship surfacing.

Frontend (useJumpToRelationship.test.tsx):
- Unit tests pinning the URL-PK ↔ Relay-global-ID bridge that was fixed earlier in this PR: a regression test ensures ?rel=<raw-pk> resolves against RelationGroup.id (Relay ID) via numeric comparison, not string equality.
- Covers no-op paths, non-numeric fallback, hover-clear on deselection, and the lastAppliedRef guard.
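The hook itself is TypeScript; the same URL-PK to Relay-global-ID bridge is sketched here in Python to match the backend snippets in this thread. It assumes the common Relay convention of a base64-encoded `TypeName:pk` string. `relay_id_to_pk`, `matches_rel_param`, and the `RelationshipType` type name are hypothetical, and falling back to string equality is only one reading of the "non-numeric fallback" mentioned above.

```python
import base64
import binascii


def relay_id_to_pk(global_id):
    """Decode a base64 'TypeName:pk' id; return the int pk, or None if malformed."""
    try:
        decoded = base64.b64decode(global_id, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None
    _type_name, sep, raw_pk = decoded.partition(":")
    if not sep:
        return None
    try:
        return int(raw_pk)
    except ValueError:
        return None


def matches_rel_param(rel_param, relation_global_id):
    """Compare a raw ?rel= value against a relation's Relay global id."""
    if rel_param.isascii() and rel_param.isdigit():
        # Numeric comparison: the URL carries a raw pk, the relation a Relay id.
        return relay_id_to_pk(relation_global_id) == int(rel_param)
    # Assumed fallback for non-numeric params: plain string equality.
    return rel_param == relation_global_id


# Hypothetical type name, used only to build a sample global id.
sample_id = base64.b64encode(b"RelationshipType:42").decode("ascii")
```

Comparing `"42"` to the encoded id as strings can never match, which is the bug the regression test guards against.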
Code Review: #1645 — Surface OC_SUBTREE_GROUP in semantic searchThis is a well-scoped, well-documented PR. The architecture is consistent, security properties are maintained, and test coverage is solid. A few items below worth addressing before merge. OverviewThe PR builds end-to-end relationship-targeted vector search on top of the materialised OC_SUBTREE_GROUP rows from #1646. The approach is sound: add a polymorphic Issues1.
2. Accessing Django private
In
    prefetched = getattr(relationship, "_prefetched_objects_cache", None) or {}
    if "source_annotations" in prefetched:
        sources = [a.raw_text or "" for a in src_qs.all()]
3. The docstring says "the M2M join is annotated with
4. Relay global ID / raw PK inconsistency in the GraphQL schema
5. No retry/scroll when virtualised page has not rendered yet
In
    const ref = targetId ? refs[targetId] : undefined;
    if (ref && typeof ref.scrollIntoView === "function") {
      ref.scrollIntoView({ behavior: "smooth", block: "center" });
When the source annotation lives on a page not yet materialised by the virtualised renderer (common for large PDFs on initial load),
6.
    embed_func=cast(
        "Callable[[HasEmbeddingMixin, BaseEmbedder, str], bool]",
        _embed_relationship,
    ),
Minor / Nits
SecurityIDOR contract is correctly maintained: both the store ( Test CoverageCoverage is strong across all new components:
Summary: Items 1 (utility file placement), 3 (misleading docstring), and 5 (scroll no-op on unrendered pages) should be fixed before merge. Items 2 and 4 are lower urgency but worth a follow-up. The rest are optional improvements. |
…relationships-d6CAR
- Move join_block_text_parts and synthesize_relationship_block_text from
tasks/embeddings_task.py to utils/embeddings.py so the GraphQL-facing
vector stores no longer depend on a Celery task module (SRP / utility
placement).
- Drop the _prefetched_objects_cache introspection in
synthesize_relationship_block_text and always go through
``order_by("id").values_list``; the cache shape is a Django internal
that has shifted across versions, and the two extra value queries are
negligible for a Celery task path.
- Fix the misleading docstring in _attach_block_context_sync — there is
no ``Count(target_annotations)`` annotation in the queryset; the
descendant count is computed in Python from the prefetched ID set
(as the inline comment explains).
- Make _apply_dual_embedding_strategy generic in the embeddable type
(TypeVar bound to HasEmbeddingMixin) and drop the cast that was
forcing _embed_relationship's Relationship-typed signature through
the HasEmbeddingMixin-shaped parameter.
- In useJumpToRelationship, track scroll application separately from
selection so the deep-link retries scrolling once the virtualised
renderer mounts the source annotation's page. Selection is still
applied exactly once per relId change, so user-driven edits are not
fought.
- Update test_embedding_manager assertion to match the new error
message that now lists relationship_id among the valid parent IDs.
- Replace the ``as any`` in useJumpToRelationship.test.tsx with a
properly-typed AnnotationLabelType literal to keep the any-baseline
gate green.
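The TypeVar change in the list above is easier to see in a small example. This is a sketch under simplifying assumptions, not the PR's code: `HasEmbedding` stands in for HasEmbeddingMixin, the embedder argument is reduced to a path string, and all names are hypothetical. Binding the type variable lets a Relationship-typed embed function pass type checking without a cast.

```python
from typing import Callable, List, Protocol, TypeVar


class HasEmbedding(Protocol):
    """Hypothetical stand-in for HasEmbeddingMixin."""
    pk: int


E = TypeVar("E", bound=HasEmbedding)


def apply_dual_embedding_strategy(
    item: E,
    embedder_paths: List[str],
    embed_func: Callable[[E, str], bool],
) -> int:
    """Call embed_func once per embedder path; return the number of successes."""
    return sum(1 for path in embedder_paths if embed_func(item, path))


class Relationship:
    """Toy model exposing only what the sketch needs."""

    def __init__(self, pk: int, text: str) -> None:
        self.pk = pk
        self.text = text


def embed_relationship(rel: Relationship, embedder_path: str) -> bool:
    # A real implementation would call the embedder; here empty text just fails.
    return bool(rel.text)


succeeded = apply_dual_embedding_strategy(
    Relationship(1, "clause text"), ["default", "corpus-preferred"], embed_relationship
)
```

Because `E` is inferred as `Relationship` at the call site, `embed_func` is checked against `Callable[[Relationship, str], bool]` directly.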
Code Review — PR #1669: Surface OC_SUBTREE_GROUP relationships in semantic searchThis is a well-engineered, substantial feature addition. The architecture is sound and the PR description is exemplary. Below is a breakdown of findings across the stack. OverviewThe PR extends the vector search pipeline end-to-end: materialised Strengths
Issues1.
|
…relationships-d6CAR
… test

- _embed_relationship: accept optional precomputed_text so the dual-embedding strategy only synthesizes the relationship block text once per relationship instead of once per embedder pass.
- core_relationship_vector_store._run_vector_search: return (Relationship, similarity) tuples instead of mutating a dynamic similarity_score attribute on the Django model. Drops the type: ignore[attr-defined] suppression at the consumer site.
- useJumpToRelationship: add a defensive 300 ms retry timer for the scroll-into-view path so deep-links into virtualized PDF pages reliably catch the late-registered annotation ref instead of stalling silently if the atom re-render doesn't land.
- Add async_search coverage: delegation-to-sync contract and the text-only query rebuild path that flows the regenerated embedding back through search.

Note: review item #2 (raw Django PKs vs Relay global IDs on the new relationship_id GraphQL fields) is intentionally deferred; the choice is documented on the field descriptors and changing the convention would cascade across the resolver, hook, and URL routing.
Code Review — #1669: Surface OC_SUBTREE_GROUP relationships in semantic searchThis is a well-designed PR that adds a solid end-to-end feature. The security model is consistent, the async wrapping is handled correctly, and the core algorithm (smallest-enclosing-group selection via a single batched join) avoids N+1 queries. A few items to address: DRY Violation —
|
Extract shared vector-store machinery into BaseVectorStore (user resolution, IDOR checks, embedder resolution, query-embedding generation) so CoreAnnotationVectorStore and CoreRelationshipVectorStore share the permission-filtering contract instead of duplicating it.

Also:
- Tighten EmbeddingManager.store_embedding to require exactly-one parent FK (previously any/all were silently accepted).
- Trim multi-paragraph docstrings on _attach_block_context_sync, BlockContext, VectorSearchResult, join_block_text_parts and synthesize_relationship_block_text per CLAUDE.md.
- Document the ordering invariant (by id, not document position) in synthesize_relationship_block_text.
- Remove the void useAtomValue / void PdfAnnotations smell in useJumpToRelationship.test.tsx; the imports were genuinely unused.
- Add tests: store_embedding rejects multiple parent FKs; both corpus_id + document_id provided where the document isn't in the corpus returns [].

Update test patches to target generate_embeddings_from_text / get_embedder at their new home in base_vector_store.
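The exactly-one-parent rule described in this commit can be sketched as a standalone check. `require_single_parent` is a hypothetical helper, not the signature of EmbeddingManager.store_embedding. It counts a parent as provided when it is not None, so a pk of 0 still counts; that is a design choice of the sketch.

```python
def require_single_parent(**parent_ids):
    """Return the (name, value) of the single non-None parent id, else raise."""
    provided = [(name, value) for name, value in parent_ids.items() if value is not None]
    if len(provided) != 1:
        valid = ", ".join(sorted(parent_ids))
        raise ValueError(
            f"Exactly one of {valid} must be provided; got {len(provided)}."
        )
    return provided[0]


chosen = require_single_parent(annotation_id=None, document_id=None, relationship_id=7)
```

Raising early turns the "zero parents" and "several parents" cases into the same explicit error instead of leaving the second one to a later database constraint.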
    if running == 0:
        if len(chunk) >= max_chars:
            parts.append(chunk[:max_chars])
            running = max_chars
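The fragment above appears to belong to the shared block-text truncation logic. The following is a sketch of how such a capped join could work, assuming a newline separator and a hard character cap; `join_block_text_parts_sketch` is a hypothetical name and the PR's actual join_block_text_parts may differ in its details.

```python
def join_block_text_parts_sketch(chunks, max_chars, separator="\n"):
    """Join non-empty chunks with separator, never exceeding max_chars."""
    if max_chars <= 0:
        return ""
    parts = []
    running = 0  # characters already committed, separators included
    for chunk in chunks:
        if not chunk:
            continue
        if running == 0:
            if len(chunk) >= max_chars:
                # First chunk alone fills the budget: truncate and stop.
                parts.append(chunk[:max_chars])
                running = max_chars
                break
            parts.append(chunk)
            running = len(chunk)
            continue
        remaining = max_chars - running - len(separator)
        if remaining <= 0:
            break
        piece = chunk[:remaining]
        parts.append(piece)
        running += len(separator) + len(piece)
        if running >= max_chars:
            break
    return separator.join(parts)


capped = join_block_text_parts_sketch(["abc", "defgh"], max_chars=6)
```

Keeping the cap in one helper means the embedder input and the text returned to GraphQL clients are truncated identically.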
Code Review — PR #1669: Surface OC_SUBTREE_GROUP relationships in semantic searchOverviewThis is a well-architected feature addition that wires the materialised Issues1. Prefetch wasted in
|
…relationships-d6CAR
The generate_embeddings_from_text and agenerate_embeddings_from_text functions are now imported and used in base_vector_store.py (BaseVectorStore parent class), not in core_vector_stores.py. Update the @patch decorators in TestHybridSearch to target the base module so mocks bind successfully.
Cover the new code paths surfaced by issue #1645:
- _embed_relationship: empty-text/None-vector/None-embedding short-circuits, precomputed_text reuse, and synthesize fallback.
- calculate_embeddings_for_relationship_batch: empty-list early return, explicit-embedder load failure, per-relationship outcome counting, dual-embedding dispatch when no embedder is supplied, and individual failure aggregation.
- PydanticAIVectorSearchResponse: async_from_core_results block_context dict-building and label-resolution branches, plus the sync variant.
Items 1 and 2 from the most recent automated review on #1669:
- synthesize_relationship_block_text uses values_list, which bypasses Django's prefetch cache (it stores instances, not raw column tuples), so the prefetch_related calls in calculate_embeddings_for_relationship_batch were doing zero work. Drop them and document why; the 2N queries are bounded by the small subtree-group cardinality per batch.
- Correct the stale 'inlined' comment in the dual-embedding branch: the code calls _apply_dual_embedding_strategy directly; update the comment to describe the actual delegation.
Code Review: Surface OC_SUBTREE_GROUP Relationships in Semantic Search (#1645)
This is a well-architected feature. The

Performance
1. N+1 queries in
The block-context attachment loop calls
    # core_vector_stores.py – add to the Relationship queryset before the loop
    qs = qs.prefetch_related("source_annotations", "target_annotations")
2. N+1 in

Security / IDOR
3. Block-context attachment does not re-apply corpus/document scope (Medium)
Suggested fix: carry the scope into the attachment helper:
    if self.corpus_id:
        qs = qs.filter(corpus_id=self.corpus_id)
    if self.document_id:
        qs = qs.filter(document_id=self.document_id)
4. No explicit
The resolver relies on
    user = info.context.user
    if not user or not user.is_authenticated:
        return []
5. Frontend
    const relParam = searchParams.get("rel");
    const relationshipId = relParam && /^\d+$/.test(relParam) ? relParam : null;
    selectedRelationshipId(relationshipId);

Consistency / Race Condition
6. Embedding task dispatch outside the atomic block (Low)
In

Test Coverage
7. No async integration test for
8. No coverage for empty source-annotation edge case (Low)
    def test_synthesize_with_no_source_annotations(self):
        rel.source_annotations.clear()
        text = synthesize_relationship_block_text(rel)
        self.assertEqual(text, "")  # or assert it equals only target text

Minor / Code Quality
9.
    self.user_id = int(user_id) if isinstance(user_id, str) and user_id else user_id

Positives Worth Noting
|
Code Review: PR #1669 — Surface OC_SUBTREE_GROUP relationships in semantic searchOverall this is a well-designed, comprehensive implementation. The end-to-end wiring from materialised subtree groups through to vector search and frontend deep-linking is coherent, and the design decisions are sound. A few things to call out: Strengths
Issues & Suggestions
1. Pagination over a pre-fetched slice: potential correctness gap
In
    results = store.search(RelationshipVectorSearchQuery(query_text=query, similarity_top_k=limit + offset))
    paginated_results = results[offset : offset + limit]
The store is told to fetch
This is the same pattern used by the annotation resolver, so I understand the consistency argument, but it's worth noting that deep pagination will be costly. The existing 2. Raw PKs vs Relay global IDs — document the contract explicitly
Suggestion: either encode all IDs consistently (even in this type), or add a note to 3.
|
Resolves CHANGELOG conflict by keeping both entries (subtree-group semantic search + PII scanner analyzer).
Codecov flagged three files on PR 1669 below patch threshold:
* useJumpToRelationship.ts (75%): added unit tests for the scroll happy path, the source-ref → target-ref fallback, the deferred setTimeout retry path, and the malformed-relay-id catch branch. Also tightened test isolation: hook instances from prior tests now unmount in afterEach so shared scroll-spy calls don't leak across cases.
* CorpusAnnotationCards.tsx (30%): extracted the blockContext derivation (annotation id → containing relationship id) and the click-handler queryParams builder into a pure corpusAnnotationCardsHelpers module so they can be unit-tested without mounting the full Apollo + router + reactive-var stack.
* CentralRouteManager.tsx (50%): added Phase-2 tests pinning that ?rel=42 populates selectedRelationshipId and absence clears it.
* Document the raw-PK ID convention on SemanticSearchRelationshipResultType so future callers don't accidentally feed those values into resolvers expecting Relay global IDs (e.g. node(id: $documentId)).
* Add a SECURITY CONTRACT docstring to _attach_block_context_sync stating that it does not apply visible_to_user filtering on its own: callers must pass results already permission-filtered by the store's visibility queryset, or block_text could leak to users who can't see the underlying source/target annotations.
* Remove the 'Lazy import keeps the GraphQL module light' explainer comment per the project's no-explain-what conventions; the pattern is consistent with resolve_semantic_search and self-evident.
Code Review: Surface OC_SUBTREE_GROUP Relationships in Semantic Search (#1645)This is a well-architected feature that correctly extends the semantic search surface to cover materialised subtree-group relationships. The permission model mirrors the existing annotation-store patterns, the migration is clean and backward-compatible, and the test coverage is thorough. A few items worth addressing before merge: Security
The method fetches Suggested mitigation (low-effort): the method already receives the full PerformanceAnnotation text fetch in Lines 612-616: after picking winning groups, the method runs a new
Migration
One minor suggestion: add a comment in the migration (or in Code Quality
Test mock path fixes in The 8 Test CoverageThe new
Two coverage gaps:
Frontend
The dual Nit: the cleanup return Raw PK vs Relay ID in deep-link URL The convention that SummaryOverall this is high-quality work. The two most actionable items before merge are:
The security contract in |
…relationships-d6CAR
…ral decoy

Migration 0073: add inline comment explaining the partial unique constraint. The condition (relationship IS NOT NULL) protects existing annotation/note/message/document rows that legally carry relationship=NULL.

Test: add coverage for the structural=True filter in _attach_block_context_sync. A non-structural OC_SUBTREE_GROUP look-alike must not contribute block context even when no genuine structural group is available; this guards against analyzer-copied label rows polluting embedded context.
Code Review — PR #1669: Surface OC_SUBTREE_GROUP Relationships in Semantic SearchThis is a well-architected end-to-end feature with strong separation of concerns, correct IDOR handling, solid test coverage, and clean migration safety. The key concerns are two correctness issues in the embedding pipeline that warrant attention before merge. High PriorityB1 —
|
…relationships-d6CAR
Code Review: Surface OC_SUBTREE_GROUP relationships in semantic search (#1645)This is a well-structured, ambitious PR that cleanly extends the semantic search pipeline end-to-end. The Bugs / Correctness Issues1.
provided = [x for x in (...) if x] # ← treats 0 as not-providedShould be 2.
# If you want to enforce "exactly one," just check parent_references != 1This comment says "if you want to enforce exactly one" as if it's optional, but 3.
The frontend currently uses Performance Considerations4. Extra database round-trip inside
structural_set_id = (
Document.objects.filter(pk=self.document_id)
.values_list("structural_annotation_set_id", flat=True)
.first()
)This is a separate query that runs on every 5. Potential duplicate rows in
The query JOINs 6.
For a corpus with deep subtree groups (hundreds of annotations per group) and a full top-k result set, the Code Quality / Consistency7.
8. Corpus-id fallback in
# corpus_id is NULL on structural relationships; best-effort
# fallback to scoping context for breadcrumbs / deep-links
# (this is a hint, not ground truth — a structural set can be
# shared across corpora).If the GraphQL client deep-links using the returned 9. As noted in point 2, Security10.
The comment "relies on upstream permission gates" is correct: hit annotation IDs come from Minor / Nits
SummaryThe architecture is sound and the security patterns are consistent with the rest of the codebase. The four items that should be addressed before merge are:
The remaining items are improvements that can be addressed in follow-up issues. |
…relationships-d6CAR
|
Code Review: PR 1669 - Surface OC_SUBTREE_GROUP in semantic search. Full review text below - this is a placeholder while formatting is resolved. |
|
Code Review: Surface OC_SUBTREE_GROUP relationships in semantic search Overview Well-architected, substantial feature. Thorough, security-conscious implementation with solid test coverage. Positives
Issue 1 (Medium): Breaking change in store_embedding
File: opencontractserver/shared/Managers.py line ~1571
Old check: if not any([...]) (zero parents -> error). New check: len(provided) != 1 (zero OR multiple parents -> ValueError). Semantically correct but a breaking change: callers that inadvertently passed two parent IDs previously got a DB constraint violation later; now they get ValueError immediately. Worth grepping all store_embedding call sites before merging.

Issue 2 (Medium): Raw Django PKs in BlockContextType / SemanticSearchRelationshipResultType
File: config/graphql/social_types.py
relationship_id, source_annotation_id, target_annotation_ids, document_id, and corpus_id are raw Django PKs, breaking the Relay global ID convention used everywhere else. Code-gen tools type these as ID! indistinguishable from Relay-encoded IDs. A future resolver that receives one and calls from_global_id() on it will silently return wrong results (no error, just no data). Consider prefixing as relationship_pk/document_pk, encoding as Relay IDs server-side, or at minimum making the warning visible in GraphQL introspection descriptions rather than only in Python comments.

Issue 3 (Low-Medium): _attach_block_context_sync security pre-condition is implicit
File: opencontractserver/llms/vector_stores/core_vector_stores.py
The security comment is correct today, but as new entry points are added the pre-condition is easy to forget. Consider accepting a user parameter and applying visible_to_user internally to make the gate mechanical rather than relying on caller knowledge.

Issue 4 (Low): block_text ordering mismatch for multi-source relationships
File: opencontractserver/llms/vector_stores/core_relationship_vector_store.py, _shape_results
synthesize_relationship_block_text orders sources by ID (order_by id). _shape_results builds ordered_ids from sources[0].id from the unsorted prefetch cache. Not a bug for current single-source OC_SUBTREE_GROUP rows, but latent for future multi-source types. Fix: source_id = min(ann.id for ann in sources) if sources else None.

Issue 5 (Low): Duplicate block_context dict building
File: opencontractserver/llms/vector_stores/pydantic_ai_vector_stores.py
Identical blocks in from_core_results and async_from_core_results both build the block_context dict. A _block_context_to_dict(bc) helper eliminates the duplication.

Issue 6 (Low/FYI): Dual-embedding task branch unreachable from dispatcher
File: opencontractserver/utils/subtree_groups.py, _dispatch_relationship_embeddings
The dispatcher always passes an explicit embedder_path to calculate_embeddings_for_relationship_batch, so the task's dual-embedding branch (embedder_path=None) is dead code from this call site. Dual-embedding is achieved by dispatching two separate tasks instead. Functionally correct; worth a docstring note so future maintainers understand the branch exists for direct callers, not the dispatcher.

Issue 7 (Nitpick): similarity_top_k = limit + offset vs. cap
limit is capped at SEMANTIC_SEARCH_MAX_RESULTS but similarity_top_k=limit+offset is uncapped. With a large offset the underlying vector scan can significantly exceed the declared cap. Matches the existing annotation resolver pattern; fine in practice.

Summary
The core feature is solid and the security model is sound. The two items most worth addressing before merge are the raw-PK ID convention (long-lived API footgun) and auditing the store_embedding exactly-one-parent guard for existing callers. Everything else is low-priority polish. Test coverage is strong. The BaseVectorStore extraction is a genuine improvement. |
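Issue 7 in this review can be made concrete with a short sketch. It shows one possible way to bound the scan, which is not what the PR does today (the review notes the PR matches the existing annotation resolver). `clamp_page` is a hypothetical helper and `max_results` stands in for SEMANTIC_SEARCH_MAX_RESULTS, whose value is not given here.

```python
def clamp_page(limit, offset, max_results):
    """Return (top_k, start, stop) with both limit and top_k bounded by max_results."""
    limit = max(1, min(limit, max_results))
    offset = max(0, offset)
    # Cap the scan itself, not only the page size.
    top_k = min(limit + offset, max_results)
    return top_k, offset, offset + limit


top_k, start, stop = clamp_page(limit=20, offset=40, max_results=50)
page = list(range(top_k))[start:stop]  # stand-in for ranked hits
```

The trade-off is that pages starting beyond the cap come back short or empty, so a client would need to treat a short page as the end of the result set.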
…relationships-d6CAR
- Make _attach_block_context_sync apply Relationship.visible_to_user internally so block-context exposure is mechanical, not caller-policed
- Sort source/target annotations by id in _shape_results so block_text mirrors synthesize_relationship_block_text for multi-source future
- Extract _block_context_to_dict helper to drop duplicated payload building between sync and async pydantic-ai response builders
- Document that calculate_embeddings_for_relationship_batch's no-path dual-embedding branch is for direct callers, not the dispatcher
Captures CorpusAnnotationCards browse-mode rendering — provides a stable visual baseline for the semantic search annotations card surface that this PR rewires for block-context deep linking.
Code Review: Surface OC_SUBTREE_GROUP Relationships in Semantic Search (#1645)Well-executed feature PR. The architecture is clean, security is properly enforced end-to-end, and the test coverage is comprehensive. Below are observations ranging from minor clarifications to small improvement opportunities — no blocking issues found. Security ✅
Performance ✅
Potential Issues / Minor Observations1. Implicit assumption: single source annotation per OC_SUBTREE_GROUP ( source_id = next(iter(rel.source_annotations.all()), None)
2. _, corpus_pk = from_global_id(corpus_id)If a caller passes a malformed or wrong-type Relay ID, 3. The dispatch function accepts a assert AnnotationLabel.objects.filter(pk=label_id, text=OC_SUBTREE_GROUP_LABEL_NAME).exists()(or raise 4. Block-context dict serialization is duplicated in two places
5. Error messages in When embedding fails, the log message uses the generic exception 6. Test gap: The GraphQL resolver tests ( Nits
Summary
The feature is solid. The notes above are mostly hardening opportunities rather than correctness issues. The most actionable items are (2) guarding |
Summary
Implements end-to-end vector search over materialised OC_SUBTREE_GROUP relationships, enabling semantic search to return entire document blocks (not just individual annotations) and deep-link the document viewer directly to them.

Key Changes

Backend
- New CoreRelationshipVectorStore (core_relationship_vector_store.py): Mirrors CoreAnnotationVectorStore but searches the polymorphic Embedding.relationship slot. Enforces the same visibility/IDOR model via Relationship.objects.visible_to_user() and scopes by corpus/document.
- Relationship embedding pipeline (embeddings_task.py):
  - synthesize_relationship_block_text(): Concatenates source + target annotation text (newline-separated, capped at SUBTREE_GROUP_BLOCK_TEXT_MAX_CHARS). This is the same string the embedder sees, so GraphQL clients can render snippets without re-fetching.
  - calculate_embeddings_for_relationship_batch(): Dual-embedding task (default + corpus-preferred embedder) for subtree groups, dispatched automatically by the materialiser.
- Block-context augmentation (core_vector_stores.py):
  - BlockContext dataclass: Surfaces the smallest enclosing OC_SUBTREE_GROUP for annotation hits.
  - _attach_block_context_sync(): Joins annotation hits against materialised subtree groups, picks the smallest enclosing block per hit, and attaches it to VectorSearchResult.block_context.
- Relationship embedding support (annotations/models.py):
  - Relationship now inherits HasEmbeddingMixin so subtree groups can be embedded.
  - get_embedding_reference_kwargs() wires the polymorphic Embedding.relationship FK.
- Database schema (migrations/0073_embedding_relationship.py): Embedding.relationship FK + partial unique constraint for polymorphic embedding storage.
- Subtree materialiser integration (utils/subtree_groups.py): _dispatch_relationship_embeddings(): Enqueues embedding tasks for fresh subtree groups outside the atomic block (idempotent upsert prevents double-embedding on retries).
- GraphQL API (config/graphql/):
  - BlockContextType GraphQL type with relationship ID, source/target annotation IDs, and bounded block text.
  - SemanticSearchResultType.block_context field: Populated post-hoc for annotation hits inside a subtree.
  - semantic_search_relationships query: Returns SemanticSearchRelationshipResult hits ranked by cosine similarity, scoped by corpus/document.

Frontend
- Deep-linking hook (useJumpToRelationship.ts): Wires the URL ?rel=<pk> parameter to select the relationship and scroll the source annotation into view.
- Relationship selection UI (CorpusAnnotationCards.tsx): Maps annotation IDs to containing relationship IDs so clicks can deep-link the doc viewer to the block (not just the leaf).
- Route integration (CentralRouteManager.tsx, navigationUtils.ts): Adds rel query parameter support for relationship deep-links.
- GraphQL types (frontend/src/graphql/queries.ts): New BlockContextPayload and SemanticSearchRelationshipResult interfaces.

Notable Implementation Details
- Visibility enforcement: Both stores use visible_to_user() + Q-OR filters to handle structural relationships (anchored via StructuralAnnotationSet) and non-structural rows uniformly, the same pattern as CoreAnnotationVectorStore.
- Idempotent embedding: The materialiser's delete-then-insert pattern keeps relationship PKs stable; dual-embedding tasks use add_embedding()'s upsert logic to short-circuit unchanged inputs.
- Block-text consistency: synthesize_relationship_block_text() is shared between the embedder (input)