feat: promote concept tags into KG entities #199
Conversation
@coderabbitai review
@codex review
✅ Actions performed: Review triggered.
📝 Walkthrough

This change introduces a tag-to-entity promotion pipeline that discovers high-frequency concept tags not yet in the knowledge graph and promotes them into entities. It includes a CLI script, core pipeline logic with tag classification, candidate discovery, and entity creation, updated database schema seeds, and comprehensive test coverage.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as promote_tag_entities.py
    participant Store as VectorStore
    participant DB as SQLite DB
    participant Pipeline as tag_entity_promotion
    CLI->>Store: Initialize VectorStore
    Store->>DB: Connect & seed entity_type_hierarchy
    Store-->>CLI: Ready
    CLI->>Pipeline: promote_tag_entities(store, min_count, limit, dry_run)
    Pipeline->>Pipeline: find_promotion_candidates(store, min_count, limit)
    Pipeline->>DB: Query chunk_tags for unlinked tags
    DB-->>Pipeline: Tag frequency results
    Pipeline->>Pipeline: Filter & classify each tag
    Pipeline-->>Pipeline: Return candidates with entity_type
    alt not dry_run
        Pipeline->>DB: Check schema for mention_type column
        DB-->>Pipeline: Schema info
        loop For each candidate
            Pipeline->>Store: upsert_entity(auto-tag-<slug>)
            Store->>DB: Insert/get kg_entities entry
            DB-->>Store: Entity ID
            Store-->>Pipeline: Entity confirmed
            Pipeline->>DB: INSERT kg_entity_chunks with mention_type
            DB-->>Pipeline: Links created count
            Pipeline->>Pipeline: Update stats
        end
    end
    Pipeline-->>CLI: Return promotion stats (JSON)
    CLI->>Store: Close VectorStore
    CLI->>DB: Disconnect
    CLI-->>User: Print stats, exit 0
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/promote_tag_entities.py`:
- Around line 19-28: The script opens the canonical DB implicitly which can
accidentally write to the wrong database; add an explicit CLI option (e.g.,
--db-path) to argparse and use that value when constructing
VectorStore(get_db_path()) (or replace get_db_path() with the provided arg), and
require explicit confirmation for non-dry runs (either a --confirm flag or an
interactive prompt) before calling promote_tag_entities and performing writes;
ensure the code path that performs writes checks args.dry_run and the
confirmation flag/response before proceeding so accidental writes are prevented.
In `@src/brainlayer/pipeline/tag_entity_promotion.py`:
- Around line 132-146: The SQL uses raw ct.tag which leads to split counts and
missed joins for casing/whitespace variants; update all queries (the
SELECT/GROUP BY/ORDER BY block that defines query and any later link queries) to
normalize keys using lower(trim(ct.tag)) (alias it e.g. norm_tag) everywhere you
reference ct.tag, use the same normalized expression in the JOIN condition
against lower(e.name), in NOT IN placeholders convert the excluded tags to
lower(trim(...)), and GROUP BY the normalized expression so counts and
subsequent linking (the code that inserts/promotes using ct.tag) operate on the
normalized tag values consistently.
- Around line 181-223: The loop that upserts entities and inserts into
kg_entity_chunks (uses cursor, store.conn, upsert_entity, kg_entity_chunks,
chunk_tags, stats) must be protected with exclusive-write orchestration and
SQLITE_BUSY retries: wrap the entire promotion sequence for all candidates in a
single exclusive write transaction (e.g., acquire a DB-level exclusive lock via
BEGIN EXCLUSIVE or an application-level one-writer mutex) so only one promoter
runs at a time, perform a WAL checkpoint before taking the exclusive lock, and
implement retry/backoff on SQLITE_BUSY (or set busy_timeout) for transient
contention; ensure you commit or rollback the exclusive transaction and release
the lock in a finally block so partial writes don’t occur and stats remain
consistent.
In `@tests/test_tag_entity_promotion.py`:
- Around line 47-57: Update the
test_find_candidates_skips_existing_and_activity_tags test (and similar tests
between lines 60-91) to include mixed-case tag variants (e.g., insert chunks
with ["Telegram", "telegram", "existing-topic"]) using the existing helper
_insert_chunk_with_tags so tags are created with differing cases, then call
find_promotion_candidates(store, min_count=2) and assert that the result
normalizes case by returning a single candidate "telegram" (lowercased) and that
existing entities inserted via store.upsert_entity("existing-topic", "topic",
"existing-topic") are still skipped; this ensures find_promotion_candidates and
the chunk/link logic are case-normalized end-to-end.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d22d7fd4-60c1-48e0-b866-3e437f31b05c
📒 Files selected for processing (4)
- scripts/promote_tag_entities.py
- src/brainlayer/pipeline/tag_entity_promotion.py
- src/brainlayer/vector_store.py
- tests/test_tag_entity_promotion.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Macroscope - Correctness Check
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
Files:
- src/brainlayer/vector_store.py
- scripts/promote_tag_entities.py
- tests/test_tag_entity_promotion.py
- src/brainlayer/pipeline/tag_entity_promotion.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use Python/Typer CLI architecture for the main package in `src/brainlayer/`
All scripts and CLI must use `paths.py:get_db_path()` for resolving database path instead of hardcoding
Implement chunk lifecycle management with columns `superseded_by`, `aggregated_into`, `archived_at`; default search excludes lifecycle-managed chunks
Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first
Drop FTS triggers before bulk deletes on the `chunks` table and recreate after; batch deletes in 5-10K chunks with checkpoint every 3 batches
Implement retry logic on `SQLITE_BUSY` errors; each worker must use its own database connection
Use `ruff check src/ && ruff format src/` for linting and formatting
Files:
- src/brainlayer/vector_store.py
- src/brainlayer/pipeline/tag_entity_promotion.py
src/brainlayer/vector_store.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use sqlite-vec storage via APSW in `vector_store.py` for vector operations
Files:
src/brainlayer/vector_store.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest` for testing
Files:
tests/test_tag_entity_promotion.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Build extraction, classification, chunking, embedding, and indexing pipeline with post-processing for enrichment, brain graph, and Obsidian export
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Build extraction, classification, chunking, embedding, and indexing pipeline with post-processing for enrichment, brain graph, and Obsidian export
Applied to files:
src/brainlayer/pipeline/tag_entity_promotion.py
📚 Learning: 2026-03-29T23:19:51.321Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-29T23:19:51.321Z
Learning: Applies to src/brainlayer/**/*.py : Implement chunk lifecycle management with columns `superseded_by`, `aggregated_into`, `archived_at`; default search excludes lifecycle-managed chunks
Applied to files:
src/brainlayer/pipeline/tag_entity_promotion.py
📚 Learning: 2026-03-17T01:04:22.497Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: Applies to src/brainlayer/mcp/**/*.py and brain-bar/Sources/BrainBar/MCPRouter.swift: The 8 required MCP tools are `brain_search`, `brain_store`, `brain_recall`, `brain_entity`, `brain_expand`, `brain_update`, `brain_digest`, `brain_tags`. `brain_tags` is the 8th tool, replacing `brain_get_person`, as defined in the Phase B spec merged in PR `#72`. The Python MCP server already implements `brain_tags`. Legacy `brainlayer_*` aliases must be maintained for backward compatibility.
Applied to files:
src/brainlayer/pipeline/tag_entity_promotion.py
🔇 Additional comments (2)
src/brainlayer/vector_store.py (1)
630-635: Type hierarchy seed additions look consistent with promotion tests.

The new `entity_type_hierarchy` seed rows are coherent and match the expected `child_type -> parent_type` assertions used by the new regression test coverage.

tests/test_tag_entity_promotion.py (1)

92-108: Hierarchy seed regression assertion is solid.

This test correctly locks in the new taxonomy edges and protects future seed regressions.
```python
    parser = argparse.ArgumentParser(description="Promote high-frequency chunk tags into KG entities")
    parser.add_argument("--min-count", type=int, default=500, help="Minimum tagged chunk count to promote")
    parser.add_argument("--limit", type=int, default=None, help="Optional candidate limit")
    parser.add_argument("--dry-run", action="store_true", help="Show candidates without writing")
    args = parser.parse_args()

    store = None
    try:
        store = VectorStore(get_db_path())
        stats = promote_tag_entities(
```
Add explicit DB-target and write confirmation for non-dry runs.
Line 27 currently opens the canonical DB implicitly. A mistaken shell/env can write to the wrong database with no guard.
Proposed hardening
```diff
 def main() -> int:
     parser = argparse.ArgumentParser(description="Promote high-frequency chunk tags into KG entities")
+    parser.add_argument("--db-path", type=Path, default=None, help="Override DB path (defaults to get_db_path())")
     parser.add_argument("--min-count", type=int, default=500, help="Minimum tagged chunk count to promote")
     parser.add_argument("--limit", type=int, default=None, help="Optional candidate limit")
     parser.add_argument("--dry-run", action="store_true", help="Show candidates without writing")
+    parser.add_argument("--yes", action="store_true", help="Confirm writes when not using --dry-run")
     args = parser.parse_args()
+    if not args.dry_run and not args.yes:
+        parser.error("Refusing to write without --yes (or use --dry-run).")
+
+    db_path = args.db_path or get_db_path()
     store = None
     try:
-        store = VectorStore(get_db_path())
+        store = VectorStore(db_path)
         stats = promote_tag_entities(
             store,
             min_count=args.min_count,
             limit=args.limit,
             dry_run=args.dry_run,
         )
```

Also applies to: 34-35
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/promote_tag_entities.py` around lines 19 - 28, The script opens the
canonical DB implicitly which can accidentally write to the wrong database; add
an explicit CLI option (e.g., --db-path) to argparse and use that value when
constructing VectorStore(get_db_path()) (or replace get_db_path() with the
provided arg), and require explicit confirmation for non-dry runs (either a
--confirm flag or an interactive prompt) before calling promote_tag_entities and
performing writes; ensure the code path that performs writes checks args.dry_run
and the confirmation flag/response before proceeding so accidental writes are
prevented.
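The proposed guard can be exercised as a standalone snippet. This is a minimal sketch assuming the flag names from the suggestion above (`--db-path`, `--yes`); the `VectorStore` wiring and actual promotion call are omitted.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Promote high-frequency chunk tags into KG entities")
    parser.add_argument("--db-path", type=Path, default=None)
    parser.add_argument("--min-count", type=int, default=500)
    parser.add_argument("--limit", type=int, default=None)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--yes", action="store_true")
    return parser


def parse_and_guard(argv: list[str]) -> argparse.Namespace:
    """Parse args and refuse non-dry runs that lack explicit confirmation."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.dry_run and not args.yes:
        # parser.error prints usage and raises SystemExit(2)
        parser.error("Refusing to write without --yes (or use --dry-run).")
    return args
```

With this shape, `--dry-run` stays frictionless while any writing invocation must opt in with `--yes`.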
```python
        query = f"""
            SELECT ct.tag, COUNT(*) as cnt
            FROM chunk_tags ct
            LEFT JOIN kg_entities e ON lower(e.name) = lower(ct.tag)
            WHERE e.id IS NULL
              AND ct.tag IS NOT NULL
              AND ct.tag != ''
              AND ct.tag NOT LIKE 'act:%'
              AND ct.tag NOT LIKE 'dom:%'
              AND ct.tag NOT LIKE 'meta/%'
              AND lower(ct.tag) NOT IN ({placeholders})
            GROUP BY ct.tag
            HAVING COUNT(*) >= ?
            ORDER BY cnt DESC, ct.tag ASC
        """
```
Normalize tag keys in SQL to prevent split counts and missed links.
Lines 133-146 aggregate by raw ct.tag, and lines 209/219 link by exact tag equality. Mixed-case/whitespace variants can be undercounted or unlinked.
Proposed normalization fix
```diff
-        query = f"""
-            SELECT ct.tag, COUNT(*) as cnt
+        query = f"""
+            SELECT lower(trim(ct.tag)) AS normalized_tag, COUNT(*) as cnt
             FROM chunk_tags ct
-            LEFT JOIN kg_entities e ON lower(e.name) = lower(ct.tag)
+            LEFT JOIN kg_entities e ON lower(e.name) = lower(trim(ct.tag))
             WHERE e.id IS NULL
               AND ct.tag IS NOT NULL
               AND ct.tag != ''
               AND ct.tag NOT LIKE 'act:%'
               AND ct.tag NOT LIKE 'dom:%'
               AND ct.tag NOT LIKE 'meta/%'
-              AND lower(ct.tag) NOT IN ({placeholders})
-            GROUP BY ct.tag
+              AND lower(trim(ct.tag)) NOT IN ({placeholders})
+            GROUP BY normalized_tag
             HAVING COUNT(*) >= ?
-            ORDER BY cnt DESC, ct.tag ASC
+            ORDER BY cnt DESC, normalized_tag ASC
         """
@@
-            "tag": row[0],
+            "tag": row[0],
             "count": row[1],
             "entity_type": classify_tag_entity_type(row[0]),
@@
-            WHERE ct.tag = ?
+            WHERE lower(trim(ct.tag)) = ?
             """,
-            (entity_id, tag),
+            (entity_id, tag.lower().strip()),
         )
@@
-            WHERE ct.tag = ?
+            WHERE lower(trim(ct.tag)) = ?
             """,
-            (entity_id, tag),
+            (entity_id, tag.lower().strip()),
         )
```

Also applies to: 153-161, 203-222
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/tag_entity_promotion.py` around lines 132 - 146, The
SQL uses raw ct.tag which leads to split counts and missed joins for
casing/whitespace variants; update all queries (the SELECT/GROUP BY/ORDER BY
block that defines query and any later link queries) to normalize keys using
lower(trim(ct.tag)) (alias it e.g. norm_tag) everywhere you reference ct.tag,
use the same normalized expression in the JOIN condition against lower(e.name),
in NOT IN placeholders convert the excluded tags to lower(trim(...)), and GROUP
BY the normalized expression so counts and subsequent linking (the code that
inserts/promotes using ct.tag) operate on the normalized tag values
consistently.
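The split-count failure mode is easy to reproduce with the standard-library sqlite3 module (the project itself uses APSW; this is only an illustrative sketch with a minimal table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_tags (chunk_id TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO chunk_tags VALUES (?, ?)",
    [("c1", "Telegram"), ("c2", "telegram"), ("c3", " telegram ")],
)

# Raw grouping treats every case/whitespace variant as a distinct tag.
raw = conn.execute("SELECT tag, COUNT(*) FROM chunk_tags GROUP BY tag").fetchall()

# Normalized grouping merges the variants into a single candidate.
normalized = conn.execute(
    "SELECT lower(trim(tag)) AS norm_tag, COUNT(*) FROM chunk_tags GROUP BY norm_tag"
).fetchall()

print(raw)         # three rows, each with count 1
print(normalized)  # [('telegram', 3)]
```

With a `min_count` threshold of 2 or 3, the raw query would miss this tag entirely while the normalized query promotes it.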
```python
    cursor = store.conn.cursor()
    kg_entity_chunk_cols = {row[1] for row in cursor.execute("PRAGMA table_info(kg_entity_chunks)")}
    has_mention_type = "mention_type" in kg_entity_chunk_cols

    for candidate in candidates:
        tag = candidate["tag"]
        entity_type = candidate["entity_type"]
        entity_id = f"auto-tag-{_slugify_tag(tag)}"
        existing = store.get_entity_by_name(entity_type, tag)
        if existing is None:
            store.upsert_entity(
                entity_id,
                entity_type,
                tag,
                metadata={"source": "tag-promotion", "tag_count": candidate["count"]},
                confidence=0.8,
                importance=0.6,
            )
            stats["entities_created"] += 1
        else:
            entity_id = existing["id"]

        if has_mention_type:
            cursor.execute(
                """
                INSERT OR IGNORE INTO kg_entity_chunks (entity_id, chunk_id, relevance, context, mention_type)
                SELECT ?, ct.chunk_id, 0.8, 'tag-promotion', 'tag'
                FROM chunk_tags ct
                WHERE ct.tag = ?
                """,
                (entity_id, tag),
            )
        else:
            cursor.execute(
                """
                INSERT OR IGNORE INTO kg_entity_chunks (entity_id, chunk_id, relevance, context)
                SELECT ?, ct.chunk_id, 0.8, 'tag-promotion'
                FROM chunk_tags ct
                WHERE ct.tag = ?
                """,
                (entity_id, tag),
            )
        stats["links_created"] += store.conn.changes()
```
Protect promotion writes with exclusive-write orchestration and BUSY retries.
This function performs bulk writes (kg_entities, kg_entity_chunks) without explicit one-writer coordination. Under concurrent enrichment/MCP writes, this can fail partially or contend heavily.
As per coding guidelines: "Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work" and "Never run bulk database operations while enrichment workers are writing; always stop workers and checkpoint WAL first."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/pipeline/tag_entity_promotion.py` around lines 181 - 223, The
loop that upserts entities and inserts into kg_entity_chunks (uses cursor,
store.conn, upsert_entity, kg_entity_chunks, chunk_tags, stats) must be
protected with exclusive-write orchestration and SQLITE_BUSY retries: wrap the
entire promotion sequence for all candidates in a single exclusive write
transaction (e.g., acquire a DB-level exclusive lock via BEGIN EXCLUSIVE or an
application-level one-writer mutex) so only one promoter runs at a time, perform
a WAL checkpoint before taking the exclusive lock, and implement retry/backoff
on SQLITE_BUSY (or set busy_timeout) for transient contention; ensure you commit
or rollback the exclusive transaction and release the lock in a finally block so
partial writes don’t occur and stats remain consistent.
```python
    def test_find_candidates_skips_existing_and_activity_tags(self, store):
        from brainlayer.pipeline.tag_entity_promotion import find_promotion_candidates

        _insert_chunk_with_tags(store, "chunk-1", ["telegram", "debugging", "existing-topic"])
        _insert_chunk_with_tags(store, "chunk-2", ["telegram", "debugging", "existing-topic"])
        store.upsert_entity("existing-topic", "topic", "existing-topic")

        candidates = find_promotion_candidates(store, min_count=2)

        assert [candidate["tag"] for candidate in candidates] == ["telegram"]
```
🧹 Nitpick | 🔵 Trivial
Add a mixed-case tag regression test.
Current scenarios only use lowercase tags. Please add coverage for variants like ["Telegram", "telegram"] to ensure candidate counting and chunk linking are case-normalized end-to-end.
Also applies to: 60-91
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_tag_entity_promotion.py` around lines 47 - 57, Update the
test_find_candidates_skips_existing_and_activity_tags test (and similar tests
between lines 60-91) to include mixed-case tag variants (e.g., insert chunks
with ["Telegram", "telegram", "existing-topic"]) using the existing helper
_insert_chunk_with_tags so tags are created with differing cases, then call
find_promotion_candidates(store, min_count=2) and assert that the result
normalizes case by returning a single candidate "telegram" (lowercased) and that
existing entities inserted via store.upsert_entity("existing-topic", "topic",
"existing-topic") are still skipped; this ensures find_promotion_candidates and
the chunk/link logic are case-normalized end-to-end.
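A self-contained version of the requested scenario, using plain sqlite3 instead of the PR's `store` fixture and `_insert_chunk_with_tags` helper (table shapes simplified), makes the expected normalization explicit:

```python
import sqlite3


def find_candidates_normalized(conn, min_count):
    """Sketch of a case/whitespace-normalized candidate query."""
    return conn.execute(
        """
        SELECT lower(trim(ct.tag)) AS norm_tag, COUNT(*) AS cnt
        FROM chunk_tags ct
        LEFT JOIN kg_entities e ON lower(e.name) = lower(trim(ct.tag))
        WHERE e.id IS NULL
        GROUP BY norm_tag
        HAVING COUNT(*) >= ?
        ORDER BY cnt DESC, norm_tag ASC
        """,
        (min_count,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_tags (chunk_id TEXT, tag TEXT)")
conn.execute("CREATE TABLE kg_entities (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO chunk_tags VALUES (?, ?)",
    [
        ("chunk-1", "Telegram"), ("chunk-1", "existing-topic"),
        ("chunk-2", "telegram"), ("chunk-2", "existing-topic"),
    ],
)
conn.execute("INSERT INTO kg_entities VALUES ('existing-topic', 'existing-topic')")

candidates = find_candidates_normalized(conn, min_count=2)
# mixed-case variants collapse to a single 'telegram' candidate;
# 'existing-topic' is skipped because it already has an entity
```

The real regression test would assert the same shape through `find_promotion_candidates` against the project fixtures.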
Summary

- Promote high-frequency tags from chunk_tags into KG entities with conservative type heuristics
- Seed new entity types: topic, protocol, community, health_metric, workflow, and device

Test plan

- python3 -m pytest tests/test_tag_entity_promotion.py -v
- python3 -m pytest tests/test_tag_entity_promotion.py tests/test_kg_standard.py -q
- python3 -m pytest (full suite not captured to completion in this turn)

Note
Promote high-frequency concept tags into knowledge graph entities
- Scans `chunk_tags` for high-frequency tags, filters out tags already in `kg_entities` and activity-like tags, then creates new KG entities with deterministic IDs and links them to their source chunks.
- Each tag is classified as `person`, `technology`, `community`, or `topic`.
- Adds a CLI with `--min-count`, `--limit`, and `--dry-run` flags that prints promotion statistics as JSON.
- Updates `VectorStore._init_db` to seed new entity type hierarchy entries: `topic`, `protocol`, `community`, `health_metric`, `workflow`, and `device`.
- Seeds `entity_type_hierarchy` on every startup (via `INSERT OR IGNORE`).

Macroscope summarized f2d1a35.
Summary by CodeRabbit
New Features
Tests