feat: co-occurrence relation extraction for KG edges #178
Conversation
Root cause: brain_digest found entities but never created edges because relation extraction only ran via the LLM path (which requires an llm_caller). No rule-based relation extractor existed. Fix: Added extract_cooccurrence_relations() — infers "co_occurs_with" relations between entities of different types in the same text. Runs without LLM, integrated into extract_entities_combined() as step 4. Confidence = min(entity_a, entity_b) * 0.7 (conservative). Same-type pairs skipped (too noisy). Deduplicated. 5 new tests covering: pair creation, single entity, dedup, same-type exclusion, integration with combined extraction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
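The mechanism described above can be sketched as a minimal standalone implementation. The `Entity` and `Relation` dataclasses below are hypothetical stand-ins for the PR's `ExtractedEntity`/`ExtractedRelation` (field names are assumptions based on the description), not the actual project code:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Entity:
    # Hypothetical stand-in for the PR's ExtractedEntity
    text: str
    entity_type: str
    confidence: float

@dataclass
class Relation:
    # Hypothetical stand-in for the PR's ExtractedRelation
    source_text: str
    target_text: str
    relation_type: str
    confidence: float

def cooccurrence_relations(entities: list[Entity]) -> list[Relation]:
    """Infer co_occurs_with edges between entities of different types."""
    relations: list[Relation] = []
    seen: set[tuple[str, ...]] = set()
    for a, b in combinations(entities, 2):
        if a.entity_type == b.entity_type:
            continue  # same-type pairs skipped as too noisy
        pair = tuple(sorted((a.text, b.text)))
        if pair in seen:
            continue  # deduplicate repeated mentions of the same pair
        seen.add(pair)
        relations.append(Relation(a.text, b.text, "co_occurs_with",
                                  min(a.confidence, b.confidence) * 0.7))
    return relations
```

A project/technology pair like ("BrainLayer", 0.9) and ("SQLite", 0.8) yields one `co_occurs_with` relation with confidence `min(0.9, 0.8) * 0.7`, while two same-type entities yield none.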
@coderabbitai review

✅ Actions performed: Review triggered.
📝 Walkthrough

The changes add a new rule-based co-occurrence relation extractor, integrated into the combined entity-extraction path, along with tests for it.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/pipeline/entity_extraction.py`:
- Line 517: The current pair key uses raw casing (pair = (a.text, b.text) if
a.text < b.text else (b.text, a.text)), which causes inconsistent deduplication
for the same entity with different case; change the comparison to use a
case-normalized form (use Unicode-aware casefold or lowercase) for ordering
(e.g., compare a_key = a.text.casefold() and b_key = b.text.casefold()), but
keep the original a.text/b.text values for the stored pair if you need to
preserve original casing; update the code that constructs pair in
entity_extraction.py to order by the normalized keys instead of the raw texts.
- Around line 513-520: The nested pairwise loop in entity_extraction.py (the for
i, a in enumerate(entities): for b in entities[i + 1 :] loop that builds
relations and uses seen) can blow up to O(n^2); before generating pairs filter
and/or cap candidates (e.g., filter entities by a confidence threshold field
like entity.confidence, or limit to top-K entities by confidence), then produce
at most max_relations_per_chunk relations (stop generating once count reaches
the cap). Implement this in the code around the loop that builds pair and seen:
first create a filtered_entities list (apply threshold or take top-K by
confidence), then run the pairwise loop on that list and break early when the
relation count reaches max_relations_per_chunk; make max_relations_per_chunk and
confidence threshold configurable constants or function parameters.
In `@tests/test_kg_relations.py`:
- Around line 10-52: Add two edge-case tests to TestCooccurrenceRelations:
implement test_empty_entities_returns_empty which calls
extract_cooccurrence_relations([]) and asserts an empty list (relations == []),
and implement test_three_different_types_produce_three_relations which creates
three ExtractedEntity instances of distinct entity_type values and asserts
len(extract_cooccurrence_relations(entities)) == 3 to verify the N*(N-1)/2
co-occurrence behavior; reference the existing ExtractedEntity and
extract_cooccurrence_relations symbols when adding these tests.
- Around line 35-43: The test test_no_duplicate_relations doesn't exercise
deduplication because it supplies only unique entities; update it to include
duplicate entity instances (e.g., two ExtractedEntity objects with the same
text/entity_type/start/end or same text/source) so
extract_cooccurrence_relations receives repeated entity pairs, then assert that
only one relation is produced for that duplicate pair (check length and that
pairs == set(pairs) or that count of that specific pair == 1). Locate this logic
in the test_no_duplicate_relations function and modify the entities list (using
ExtractedEntity) and the assertions accordingly so the deduplication behavior of
extract_cooccurrence_relations is actually validated.
- Around line 13-25: Update the test_two_entities_produce_relation test to
assert the exact confidence value produced by extract_cooccurrence_relations:
compute expected_conf = min(0.9, 0.8) * 0.7 and replace the loose check (0 <
rel.confidence <= 1.0) with an exact assertion rel.confidence == expected_conf;
ensure you reference the ExtractedEntity instances used in the test and the
relation object rel returned by extract_cooccurrence_relations and also add
import pytest at the top of the test file if the test runner or assertion style
requires it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d100aa5a-160f-4177-b456-d17123f4de51
📒 Files selected for processing (2)
src/brainlayer/pipeline/entity_extraction.py
tests/test_kg_relations.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
Files:
tests/test_kg_relations.py, src/brainlayer/pipeline/entity_extraction.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest` for testing
Files:
tests/test_kg_relations.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use Python/Typer CLI architecture for the main package in `src/brainlayer/`
All scripts and CLI must use `paths.py:get_db_path()` for resolving database path instead of hardcoding
Implement chunk lifecycle management with columns `superseded_by`, `aggregated_into`, `archived_at`; default search excludes lifecycle-managed chunks
Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first
Drop FTS triggers before bulk deletes on `chunks` table and recreate after; batch deletes in 5-10K chunks with checkpoint every 3 batches
Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection
Use `ruff check src/ && ruff format src/` for linting and formatting
Files:
src/brainlayer/pipeline/entity_extraction.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs
🔇 Additional comments (3)
src/brainlayer/pipeline/entity_extraction.py (2)
500-531: `co_occurs_with` will be normalized to `related_to` downstream.

Per kg_extraction.py lines 99-102, any `relation_type` not in `CANONICAL_RELATION_TYPES` is normalized to `"related_to"`. The set at lines 31-41 does not include `"co_occurs_with"`. If the semantic distinction matters (e.g., for filtering or weighting co-occurrence edges differently), add `"co_occurs_with"` to `CANONICAL_RELATION_TYPES`. If all you need is some edge, the current behavior is fine; just note that `co_occurs_with` becomes `related_to` after validation. [raise_major_issue, request_verification]

```bash
#!/bin/bash
# Verify CANONICAL_RELATION_TYPES contents
rg -n "CANONICAL_RELATION_TYPES" src/brainlayer/pipeline/kg_extraction.py -A 12
```
580-582: Integration looks correct; co-occurrence runs on deduplicated entities.

Running after entity deduplication ensures relations reference the canonical entity text. The relations are appended to `all_relations`, preserving any LLM-extracted relations when `use_llm=True`.

tests/test_kg_relations.py (1)

58-69: Integration test validates the core PR objective.

This test confirms that `extract_entities_combined` produces `co_occurs_with` relations without an LLM, which is the main fix described in the PR objectives.
```python
for i, a in enumerate(entities):
    for b in entities[i + 1 :]:
        if a.entity_type == b.entity_type:
            continue
        pair = (a.text, b.text) if a.text < b.text else (b.text, a.text)
        if pair in seen:
            continue
        seen.add(pair)
```
🧹 Nitpick | 🔵 Trivial
Potential quadratic blowup for large entity lists.
For n entities, this produces up to n*(n-1)/2 relations. With 50+ entities in a chunk, you get 1000+ relations—most of which may be noise.
Consider capping output (e.g., top-k by confidence) or limiting to entities above a confidence threshold. This is a minor concern for typical chunk sizes but could degrade performance on outlier chunks.
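The growth the reviewer flags can be quantified directly: n entities produce at most n*(n-1)/2 candidate pairs. A quick standalone check (not project code):

```python
from math import comb

# Upper bound on candidate co-occurrence pairs per chunk of n entities;
# same-type filtering and deduplication can only reduce this count.
pair_counts = {n: comb(n, 2) for n in (5, 10, 50, 100)}
```

At 50 entities the "1000+" figure above corresponds to `comb(50, 2) == 1225` candidate pairs, which is what motivates a cap or top-K filter.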
🛡️ Optional: Add a cap on relation count

```diff
+MAX_COOCCURRENCE_RELATIONS = 50  # Cap to avoid noise explosion
+
 def extract_cooccurrence_relations(entities: list[ExtractedEntity]) -> list[ExtractedRelation]:
     ...
     relations: list[ExtractedRelation] = []
     seen: set[tuple[str, str]] = set()
     for i, a in enumerate(entities):
         for b in entities[i + 1 :]:
+            if len(relations) >= MAX_COOCCURRENCE_RELATIONS:
+                break
             if a.entity_type == b.entity_type:
                 continue
             ...
+        if len(relations) >= MAX_COOCCURRENCE_RELATIONS:
+            break
     return relations
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/entity_extraction.py` around lines 513 - 520, The
nested pairwise loop in entity_extraction.py (the for i, a in
enumerate(entities): for b in entities[i + 1 :] loop that builds relations and
uses seen) can blow up to O(n^2); before generating pairs filter and/or cap
candidates (e.g., filter entities by a confidence threshold field like
entity.confidence, or limit to top-K entities by confidence), then produce at
most max_relations_per_chunk relations (stop generating once count reaches the
cap). Implement this in the code around the loop that builds pair and seen:
first create a filtered_entities list (apply threshold or take top-K by
confidence), then run the pairwise loop on that list and break early when the
relation count reaches max_relations_per_chunk; make max_relations_per_chunk and
confidence threshold configurable constants or function parameters.
```python
for b in entities[i + 1 :]:
    if a.entity_type == b.entity_type:
        continue
    pair = (a.text, b.text) if a.text < b.text else (b.text, a.text)
```
🧹 Nitpick | 🔵 Trivial
Lexicographic ordering uses raw text casing—may cause inconsistent deduplication.
a.text < b.text compares by Unicode codepoint, so "Apple" vs "apple" yields different orderings. If the same entity appears with different casing from different sources (e.g., "SQLite" vs "sqlite"), they won't dedupe correctly.
Consider normalizing to lowercase for the pair key:
♻️ Proposed fix
```diff
-        pair = (a.text, b.text) if a.text < b.text else (b.text, a.text)
+        key_a, key_b = a.text.lower(), b.text.lower()
+        pair = (key_a, key_b) if key_a < key_b else (key_b, key_a)
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
key_a, key_b = a.text.lower(), b.text.lower()
pair = (key_a, key_b) if key_a < key_b else (key_b, key_a)
```
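A standalone illustration (not the project's code) of why a case-normalized key deduplicates mixed-case mentions while the current raw ordering does not:

```python
def pair_key(a: str, b: str) -> tuple[str, str]:
    # Order by casefolded text so mixed-case mentions map to one key
    ka, kb = a.casefold(), b.casefold()
    return (ka, kb) if ka < kb else (kb, ka)

def raw_key(a: str, b: str) -> tuple[str, str]:
    # Current behavior: case-sensitive codepoint ordering
    return (a, b) if a < b else (b, a)
```

With this, `pair_key("SQLite", "BrainLayer")` and `pair_key("sqlite", "brainlayer")` are the same key, whereas `raw_key` produces two distinct keys for the same logical pair.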
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/entity_extraction.py` at line 517, The current pair
key uses raw casing (pair = (a.text, b.text) if a.text < b.text else (b.text,
a.text)), which causes inconsistent deduplication for the same entity with
different case; change the comparison to use a case-normalized form (use
Unicode-aware casefold or lowercase) for ordering (e.g., compare a_key =
a.text.casefold() and b_key = b.text.casefold()), but keep the original
a.text/b.text values for the stored pair if you need to preserve original
casing; update the code that constructs pair in entity_extraction.py to order by
the normalized keys instead of the raw texts.
```python
class TestCooccurrenceRelations:
    """Rule-based relation extraction from co-occurring entities."""

    def test_two_entities_produce_relation(self):
        """Two entities in the same text should produce a co-occurrence relation."""
        entities = [
            ExtractedEntity(text="BrainLayer", entity_type="project", start=0, end=10, confidence=0.9, source="seed"),
            ExtractedEntity(text="SQLite", entity_type="technology", start=20, end=26, confidence=0.8, source="seed"),
        ]
        relations = extract_cooccurrence_relations(entities)
        assert len(relations) >= 1
        rel = relations[0]
        assert rel.source_text == "BrainLayer"
        assert rel.target_text == "SQLite"
        assert rel.relation_type == "co_occurs_with"
        assert 0 < rel.confidence <= 1.0

    def test_no_relations_for_single_entity(self):
        """A single entity can't have co-occurrence relations."""
        entities = [
            ExtractedEntity(text="BrainLayer", entity_type="project", start=0, end=10, confidence=0.9, source="seed"),
        ]
        relations = extract_cooccurrence_relations(entities)
        assert len(relations) == 0

    def test_no_duplicate_relations(self):
        """Same entity pair should produce at most one relation."""
        entities = [
            ExtractedEntity(text="Foo", entity_type="project", start=0, end=3, confidence=0.9, source="seed"),
            ExtractedEntity(text="Bar", entity_type="technology", start=10, end=13, confidence=0.8, source="seed"),
        ]
        relations = extract_cooccurrence_relations(entities)
        pairs = [(r.source_text, r.target_text) for r in relations]
        assert len(pairs) == len(set(pairs))

    def test_same_type_entities_not_related(self):
        """Entities of the same type shouldn't get co-occurrence relations (too noisy)."""
        entities = [
            ExtractedEntity(text="Foo", entity_type="project", start=0, end=3, confidence=0.9, source="seed"),
            ExtractedEntity(text="Bar", entity_type="project", start=10, end=13, confidence=0.8, source="seed"),
        ]
        relations = extract_cooccurrence_relations(entities)
        assert len(relations) == 0
```
🧹 Nitpick | 🔵 Trivial
Consider adding edge case tests.
Missing coverage for:
- Empty entity list (should return empty list)
- Three entities of different types (should produce 3 relations)
💚 Additional test cases
```python
def test_empty_entities_returns_empty(self):
    """Empty input should return empty relations."""
    relations = extract_cooccurrence_relations([])
    assert relations == []

def test_three_different_types_produce_three_relations(self):
    """N entities of different types produce N*(N-1)/2 relations."""
    entities = [
        ExtractedEntity(text="A", entity_type="project", start=0, end=1, confidence=0.9, source="seed"),
        ExtractedEntity(text="B", entity_type="technology", start=5, end=6, confidence=0.8, source="seed"),
        ExtractedEntity(text="C", entity_type="person", start=10, end=11, confidence=0.7, source="seed"),
    ]
    relations = extract_cooccurrence_relations(entities)
    assert len(relations) == 3  # A-B, A-C, B-C
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_kg_relations.py` around lines 10 - 52, Add two edge-case tests to
TestCooccurrenceRelations: implement test_empty_entities_returns_empty which
calls extract_cooccurrence_relations([]) and asserts an empty list (relations ==
[]), and implement test_three_different_types_produce_three_relations which
creates three ExtractedEntity instances of distinct entity_type values and
asserts len(extract_cooccurrence_relations(entities)) == 3 to verify the
N*(N-1)/2 co-occurrence behavior; reference the existing ExtractedEntity and
extract_cooccurrence_relations symbols when adding these tests.
```python
def test_two_entities_produce_relation(self):
    """Two entities in the same text should produce a co-occurrence relation."""
    entities = [
        ExtractedEntity(text="BrainLayer", entity_type="project", start=0, end=10, confidence=0.9, source="seed"),
        ExtractedEntity(text="SQLite", entity_type="technology", start=20, end=26, confidence=0.8, source="seed"),
    ]
    relations = extract_cooccurrence_relations(entities)
    assert len(relations) >= 1
    rel = relations[0]
    assert rel.source_text == "BrainLayer"
    assert rel.target_text == "SQLite"
    assert rel.relation_type == "co_occurs_with"
    assert 0 < rel.confidence <= 1.0
```
🧹 Nitpick | 🔵 Trivial
Test should verify the confidence calculation.
The test checks 0 < rel.confidence <= 1.0 but doesn't verify the expected value min(0.9, 0.8) * 0.7 = 0.56. Adding an exact assertion would catch regressions in the confidence formula.
💚 Proposed enhancement
```diff
     assert 0 < rel.confidence <= 1.0
+    # min(0.9, 0.8) * 0.7 = 0.56
+    assert rel.confidence == pytest.approx(0.56)
```

Add `import pytest` at the top of the file.
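The `pytest.approx` suggestion is not cosmetic: on CPython's IEEE-754 doubles, `min(0.9, 0.8) * 0.7` is not exactly the literal `0.56`, so comparing against the literal fails, while comparing against an identically computed `expected_conf` (as the aggregated prompt suggests) passes. A quick standalone check:

```python
conf = min(0.9, 0.8) * 0.7

# Exact comparison against the literal fails due to binary rounding
exact_match = (conf == 0.56)

# A tolerance-based check (what pytest.approx performs) passes
close_match = abs(conf - 0.56) < 1e-12

# The identically computed expression compares equal bit-for-bit
expected_conf = min(0.9, 0.8) * 0.7
same_computation = (conf == expected_conf)
```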
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_kg_relations.py` around lines 13 - 25, Update the
test_two_entities_produce_relation test to assert the exact confidence value
produced by extract_cooccurrence_relations: compute expected_conf = min(0.9,
0.8) * 0.7 and replace the loose check (0 < rel.confidence <= 1.0) with an exact
assertion rel.confidence == expected_conf; ensure you reference the
ExtractedEntity instances used in the test and the relation object rel returned
by extract_cooccurrence_relations and also add import pytest at the top of the
test file if the test runner or assertion style requires it.
```python
def test_no_duplicate_relations(self):
    """Same entity pair should produce at most one relation."""
    entities = [
        ExtractedEntity(text="Foo", entity_type="project", start=0, end=3, confidence=0.9, source="seed"),
        ExtractedEntity(text="Bar", entity_type="technology", start=10, end=13, confidence=0.8, source="seed"),
    ]
    relations = extract_cooccurrence_relations(entities)
    pairs = [(r.source_text, r.target_text) for r in relations]
    assert len(pairs) == len(set(pairs))
```
🧹 Nitpick | 🔵 Trivial
Test doesn't fully verify deduplication—only checks uniqueness of returned pairs.
This test passes even if duplicates were never possible in the first place. To properly test deduplication, provide duplicate entity pairs (e.g., same text appearing twice in the input list) and verify only one relation is produced.
💚 Proposed fix to test actual deduplication logic
```diff
     def test_no_duplicate_relations(self):
-        """Same entity pair should produce at most one relation."""
+        """Duplicate entity pairs in input should produce only one relation."""
         entities = [
             ExtractedEntity(text="Foo", entity_type="project", start=0, end=3, confidence=0.9, source="seed"),
             ExtractedEntity(text="Bar", entity_type="technology", start=10, end=13, confidence=0.8, source="seed"),
+            # Duplicate mention of same pair
+            ExtractedEntity(text="Foo", entity_type="project", start=20, end=23, confidence=0.85, source="gliner"),
+            ExtractedEntity(text="Bar", entity_type="technology", start=30, end=33, confidence=0.75, source="gliner"),
         ]
         relations = extract_cooccurrence_relations(entities)
-        pairs = [(r.source_text, r.target_text) for r in relations]
-        assert len(pairs) == len(set(pairs))
+        # Should only produce one Foo-Bar relation despite 4 possible pairings
+        foo_bar_relations = [r for r in relations if {r.source_text, r.target_text} == {"Foo", "Bar"}]
+        assert len(foo_bar_relations) == 1
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def test_no_duplicate_relations(self):
    """Duplicate entity pairs in input should produce only one relation."""
    entities = [
        ExtractedEntity(text="Foo", entity_type="project", start=0, end=3, confidence=0.9, source="seed"),
        ExtractedEntity(text="Bar", entity_type="technology", start=10, end=13, confidence=0.8, source="seed"),
        # Duplicate mention of same pair
        ExtractedEntity(text="Foo", entity_type="project", start=20, end=23, confidence=0.85, source="gliner"),
        ExtractedEntity(text="Bar", entity_type="technology", start=30, end=33, confidence=0.75, source="gliner"),
    ]
    relations = extract_cooccurrence_relations(entities)
    # Should only produce one Foo-Bar relation despite 4 possible pairings
    foo_bar_relations = [r for r in relations if {r.source_text, r.target_text} == {"Foo", "Bar"}]
    assert len(foo_bar_relations) == 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_kg_relations.py` around lines 35 - 43, The test
test_no_duplicate_relations doesn't exercise deduplication because it supplies
only unique entities; update it to include duplicate entity instances (e.g., two
ExtractedEntity objects with the same text/entity_type/start/end or same
text/source) so extract_cooccurrence_relations receives repeated entity pairs,
then assert that only one relation is produced for that duplicate pair (check
length and that pairs == set(pairs) or that count of that specific pair == 1).
Locate this logic in the test_no_duplicate_relations function and modify the
entities list (using ExtractedEntity) and the assertions accordingly so the
deduplication behavior of extract_cooccurrence_relations is actually validated.
Summary

Root cause found: brain_digest found entities via seed matching but never created knowledge graph edges. Relation extraction only ran via the LLM path (`extract_entities_llm`), which requires an `llm_caller` parameter. When brain_digest calls `process_chunk()` without an LLM, the relations list is always empty.

Fix: Added `extract_cooccurrence_relations()`, a rule-based relation extractor that infers `co_occurs_with` relations between entities of different types that appear in the same text. This runs as step 4 in `extract_entities_combined()`, always active regardless of LLM availability.

How it works

Each inferred relation's confidence is `min(entity_a.confidence, entity_b.confidence) * 0.7`.

Stub tools status
Investigated brain_update, brain_expand, brain_tags: they already return `isError: true` with deprecation messages. Not fake-success stubs. No fix needed.

MCP response visibility

Investigated brain_search response format: the Python MCP server uses clean `TextContent(type="text", text=...)` with no ANSI codes. Format is correct per MCP spec. The "invisible results" issue is a Claude Code UI behavior (collapses tool results), not a BrainLayer bug.

Test plan
- New tests in `tests/test_kg_relations.py`
- `ruff check` + `ruff format` clean

🤖 Generated with Claude Code
Note

Add co-occurrence relation extraction for knowledge graph edges

- Adds `extract_cooccurrence_relations` in entity_extraction.py that generates `co_occurs_with` relations between entities of different types appearing in the same text.
- Pairs are deduplicated by a sorted `(a.text, b.text)` key; confidence is set to `0.7 * min(a.confidence, b.confidence)`.
- `extract_entities_combined` now calls this function after sorting final entities and extends `all_relations` with the result, regardless of whether LLM extraction is enabled.

Macroscope summarized 6487081.
Summary by CodeRabbit

Release Notes

New Features
- Rule-based co-occurrence relation extraction now creates knowledge graph edges without requiring an LLM.

Tests
- Added tests covering co-occurrence relation extraction.