
fix: watch nested JSONL transcripts recursively#236

Merged
EtanHey merged 4 commits into main from feat/pr-a5-enrichment-watcher-globs on Apr 12, 2026

Conversation

@EtanHey
Owner

@EtanHey EtanHey commented Apr 11, 2026

Summary

  • recursively discover watcher JSONL files so nested session artifacts like subagents/*.jsonl and sibling nested JSONL files are picked up
  • anchor project extraction on the directory immediately under projects/ so nested files still map back to the owning repo
  • preserve full encoded repo names like brainlayer-grill instead of collapsing them to grill
  • add RED/GREEN tests proving nested watcher discovery, canonical project attribution, and startup backfill of pre-existing nested files

Why

  • the watcher previously only scanned one level deep, so real conversation content under nested session folders never got ingested
  • nested source paths were also mis-attributed because the bridge normalized from the immediate parent directory rather than the project root
  • this is the concrete code-side fix for the scanner finding referenced in the collab about subagent JSONLs and brainlayer-grill self-ingestion

Test plan

  • pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py -q
  • ruff check src/brainlayer/watcher.py src/brainlayer/watcher_bridge.py tests/test_jsonl_watcher.py tests/test_watcher_bridge.py
  • ruff format --check src/brainlayer/watcher.py src/brainlayer/watcher_bridge.py tests/test_jsonl_watcher.py tests/test_watcher_bridge.py

CodeRabbit note

  • I attempted local cr review --plain twice before push, but CodeRabbit returned a rate-limit error both times (try after 22 minutes, then try after 16 minutes 41 seconds on retry). I am explicitly not claiming a local CR pass here; please use the GitHub CodeRabbit review on this PR as the review source.

Context

  • Collab: /Users/etanheyman/Gits/orchestrator/collab/brainlayer-a5-a6-flex-restart-bundle.md

Note

Fix JSONLWatcher to discover nested JSONL transcripts recursively

  • Changes _discover_jsonl_files in watcher.py to use rglob('*.jsonl') instead of iterating only immediate children, so session subdirectories (e.g. subagents/) are included.
  • Fixes _extract_project_from_source and _normalize_project_name in watcher_bridge.py to correctly resolve project names from nested paths and encoded directory names (e.g. -Users-...-Gits-<repo>-...).
  • Moves Gemini service tier from a top-level service_tier field to http_options.extra_body.serviceTier in enrichment_controller.py to align with the SDK's expected config shape.

Macroscope summarized 1368b80.

Summary by CodeRabbit

  • New Features

    • Watcher now recursively discovers JSONL files in nested subdirectories.
  • Bug Fixes

    • Improved project name extraction for a wider range of encoded path formats.
    • Gemini generation settings now surface via HTTP options (includes service tier and timeout behavior).
  • Tests

    • Added and updated integration/unit tests covering nested discovery, project extraction, and Gemini config behavior.

@coderabbitai

coderabbitai Bot commented Apr 11, 2026

📝 Walkthrough

Moved Gemini service-tier into http_options.extra_body.serviceTier; added a helper to build Gemini HTTP options with optional timeout; made JSONL discovery recursive to include nested subagents; improved project-name extraction for encoded path formats and added related tests.

Changes

  • Gemini Configuration Refactor — src/brainlayer/enrichment_controller.py, tests/test_enrichment_controller.py
    Relocated service_tier into http_options.extra_body.serviceTier; added a _build_gemini_http_options helper that builds Gemini HTTP options with an optional timeout.
  • JSONL Discovery (recursive) — src/brainlayer/watcher.py, tests/test_jsonl_watcher.py
    Changed _discover_jsonl_files() to use project_dir.rglob("*.jsonl") so nested .jsonl files are included; added an integration test asserting nested subagent discovery.
  • Project Name Extraction & Normalization — src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
    Enhanced _normalize_project_name() to parse encoded paths (e.g., starting with -Users-/-home-) by locating marker segments and joining the following parts; adjusted _extract_project_from_source() to prefer .../projects/... segments and simplified the fallback. Added tests for encoded and nested subagent paths.
  • Test Double and Test Updates — tests/test_enrichment_flex_integration.py, tests/test_watcher_bridge.py, tests/test_jsonl_watcher.py
    Made FakeClient generate_content timestamp logging conditional for a specific message; updated/added tests asserting the new Gemini config structure, nested JSONL discovery, and project extraction behaviors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes


Poem

🐰 I hopped through folders wide and deep,
I nudged the tiers where secrets sleep,
I sniffed the nested jsonl trail,
I stitched the name from encoded trail,
A happy rabbit, code to keep!

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 16.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title clearly and specifically summarizes the main change: enabling recursive discovery of nested JSONL files in the watcher, the primary objective across the modified files.
  • Description check — ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.




@greptile-apps greptile-apps Bot left a comment


Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@EtanHey
Owner Author

EtanHey commented Apr 12, 2026

@coderabbitai review

@EtanHey
Owner Author

EtanHey commented Apr 12, 2026

@cursor @BugBot review

@EtanHey
Owner Author

EtanHey commented Apr 12, 2026

@coderabbitai Nested Claude transcript artifacts under session folders (especially subagents/*.jsonl) belong to the project directory immediately under projects/, not to the immediate parent folder. Treat the project root segment as the canonical owner for watcher ingestion, and preserve full repo names like brainlayer-grill rather than collapsing them to the last dash-separated token. Please learn this design for future reviews on watcher paths.

@EtanHey
Owner Author

EtanHey commented Apr 12, 2026

@codex review

@cursor

cursor Bot commented Apr 12, 2026

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 12, 2026


@EtanHey, understood — I've recorded this design for future watcher path reviews.


✏️ Learnings added
Learnt from: EtanHey
URL: https://github.com/EtanHey/brainlayer/pull/236

Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.

Learnt from: EtanHey
URL: https://github.com/EtanHey/brainlayer/pull/236

Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip

@EtanHey
Owner Author

EtanHey commented Apr 12, 2026

Merged current main into this branch in commit 4b59737 so PR #236 is no longer behind PR-A6. Re-verified on the updated branch: pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py tests/test_enrichment_controller.py tests/test_enrichment_flex_integration.py -q -> 133 passed, 2 warnings; ruff check and ruff format --check are clean. @coderabbitai review @codex review @cursor @BugBot re-review

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

@EtanHey, thanks for the update! Glad to hear the merge from main went smoothly and that all 133 tests are passing with clean linting. Triggering a fresh review now.


🧠 Learnings used
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Real-time watcher via LaunchAgent (`com.brainlayer.watch.plist`) uses 4-layer content filters: entry type whitelist → classify → chunk min-length → system-reminder strip

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Treat retrieval correctness, write safety, and MCP stability as critical-path concerns in BrainLayer reviews
✅ Actions performed

Review triggered.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b59737e47

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +164 to +168
    if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
        name = "-".join(repo_parts) if repo_parts else raw
    else:
        name = raw


P1 — Fall back to repo tail when path markers are missing

In _normalize_project_name, encoded names that start with -Users-/-home- but do not contain one of the hard-coded markers now fall back to name = raw. For common Claude paths like -home-alice-code-myrepo, this stores the full encoded path as the project instead of myrepo, which regresses prior behavior and causes project attribution/filtering mismatches downstream (records are written under the encoded string, while callers typically filter by normalized repo name).


Comment on lines +185 to +186
    if "projects" in parts:
        project_index = parts.index("projects") + 1


P2 — Resolve project root from the last projects path segment

_extract_project_from_source uses parts.index("projects"), which always picks the first occurrence in the absolute path. In environments where another projects directory appears earlier (for example home paths rooted under /projects/...), this returns the wrong segment as the project root and misattributes ingested chunks, breaking project-scoped retrieval.


@EtanHey
Owner Author

EtanHey commented Apr 12, 2026

Follow-up fix pushed in 560b9f26 after the branch surfaced a real SDK compatibility issue from the merged A6 work. google-genai==1.63.0 does not accept top-level service_tier on GenerateContentConfig; the supported transport is http_options.extra_body.serviceTier. This commit switches realtime + extraction to that SDK-compatible shape and adds a validation test against the real SDK model class.

Also hardened tests/test_enrichment_flex_integration.py::test_sustained_rate_no_contention to count only the 100 prompts owned by the test. Full-suite order was picking up unrelated background auto-enrich calls from other tests; that noise was inflating the raw call counter without changing the workload this test is meant to measure.

Verification on this branch after the fix:

  • pytest tests/test_enrichment_controller.py -q -k "service_tier or validates_against_sdk" -> 3 passed
  • pytest tests/test_enrichment_flex_integration.py::test_sustained_rate_no_contention -q -> 1 passed
  • pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py tests/test_enrichment_controller.py tests/test_enrichment_flex_integration.py -q -> 134 passed, 2 warnings
  • pytest tests/ -v --tb=short -m "not integration" -x now gets past the previous enrichment blockers and stops at tests/test_eval_baselines.py::TestMemoryRetrieval::test_whoop_discussion_findable

That WHOOP eval failure is already red on local main at cc39f38f as well, so it is baseline repo debt rather than a regression from this branch. @coderabbitai review @codex review @cursor @BugBot re-review

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

@EtanHey, thanks for the detailed update. A few observations on commit 560b9f26:

SDK compatibility fix — confirmed correct. The GenerateContentConfig class in google-genai does not expose service_tier as a top-level field. The SDK's HttpOptions supports an extra_body field for adding extra parameters to the request body, where "the structure must match the backend API's request structure" — so routing serviceTier through http_options.extra_body.serviceTier is the right transport path for this SDK version.

Test hardening — isolating test_sustained_rate_no_contention to count only its own 100 prompts rather than picking up background auto-enrich calls from other tests is the correct fix for that flakiness vector.

WHOOP eval — noted that test_whoop_discussion_findable is already red on main at cc39f38f, making it repo debt rather than a regression introduced by this branch.

Triggering a fresh incremental review now.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Treat retrieval correctness, write safety, and MCP stability as critical-path concerns in BrainLayer reviews

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Real-time watcher via LaunchAgent (`com.brainlayer.watch.plist`) uses 4-layer content filters: entry type whitelist → classify → chunk min-length → system-reminder strip

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T23:47:49.746Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: `service_tier="flex"` is the intentional default for all Gemini enrichment calls. Pass-2 enrichment is asynchronous backlog work where 1–15 minute latency is acceptable, and the 50% Gemini Flex Inference discount materially reduces backlog cost. This is locked by R84b design (§8 Q2). The `BRAINLAYER_GEMINI_SERVICE_TIER` environment variable is purely an operational escape hatch (e.g. `standard`), not the intended runtime default. Do not flag `service_tier="flex"` as a concern on this code path.

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: Google-genai SDK auto-retries are explicitly disabled (`http_options.retry_options.attempts = 1`) in the enrichment controller. Retry policy is intentionally owned at the application layer, not delegated to the SDK. Do not flag this as missing retry logic.

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
✅ Actions performed

Review triggered.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/brainlayer/watcher_bridge.py`:
- Around line 185-188: The current logic in watcher_bridge.py uses
parts.index("projects") which picks the first "projects" token and can
mis-attribute nested paths; change it to locate the last "projects" segment
(e.g., find the last index of "projects" in parts) and then use that index+1 to
pass into _normalize_project_name so ownership resolves to the canonical project
directory under the final "projects" segment.
- Around line 156-168: The parsing block that splits raw by "-" (variables raw,
parts, markers, last_marker_idx, repo_parts, name) mistakenly chooses the last
occurrence of a structural marker, which can pick up tokens inside a
dash-separated repo name; change the logic to locate the first structural marker
after the "-Users-" / "-home-" prefix (stop at the first match instead of
recording the last), then build repo_parts from parts[last_marker_idx+1:] as
before so the full encoded directory name (all remaining dash-separated tokens)
is preserved as the canonical repo identifier.
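The two inline fixes above (last "projects" is handled separately; this covers first-marker selection plus the tail fallback) can be sketched together as follows. The marker set and function name are illustrative assumptions, not the repo's actual code:

```python
def normalize_project_name(raw: str) -> str:
    """Recover a repo name from an encoded Claude directory name
    like '-Users-alice-Gits-brainlayer-grill'."""
    if not raw.startswith(("-Users-", "-home-")):
        return raw
    parts = raw.split("-")
    markers = {"Gits", "code", "src", "repos"}  # hypothetical structural markers
    # Stop at the FIRST marker so a repo whose own name contains a marker
    # token (e.g. "my-code-tools") is not split apart.
    first = next((i for i, p in enumerate(parts) if p in markers), -1)
    if 0 <= first < len(parts) - 1:
        return "-".join(p for p in parts[first + 1:] if p)
    # No marker matched: fall back to the path tail, never the full
    # encoded string, so downstream project filters still line up.
    tail = [p for p in parts if p]
    return tail[-1] if tail else raw
```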

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c6d6edff-d80c-4a6a-8a61-48290710c135

📥 Commits

Reviewing files that changed from the base of the PR and between cc39f38 and 560b9f2.

📒 Files selected for processing (7)
  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/watcher.py
  • src/brainlayer/watcher_bridge.py
  • tests/test_enrichment_controller.py
  • tests/test_enrichment_flex_integration.py
  • tests/test_jsonl_watcher.py
  • tests/test_watcher_bridge.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests

**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches

Files:

  • tests/test_enrichment_flex_integration.py
  • src/brainlayer/watcher.py
  • tests/test_enrichment_controller.py
  • src/brainlayer/enrichment_controller.py
  • tests/test_jsonl_watcher.py
  • src/brainlayer/watcher_bridge.py
  • tests/test_watcher_bridge.py
src/brainlayer/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat

Files:

  • src/brainlayer/watcher.py
  • src/brainlayer/enrichment_controller.py
  • src/brainlayer/watcher_bridge.py
src/brainlayer/watcher.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/brainlayer/watcher.py: Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip
Persist watcher offsets in ~/.local/share/brainlayer/offsets.json; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Send Axiom telemetry (startup, flush, error, heartbeat at 60s intervals) to brainlayer-watcher dataset

Files:

  • src/brainlayer/watcher.py
🧠 Learnings (13)
📓 Common learnings
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T23:47:49.746Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: `service_tier="flex"` is the intentional default for all Gemini enrichment calls. Pass-2 enrichment is asynchronous backlog work where 1–15 minute latency is acceptable, and the 50% Gemini Flex Inference discount materially reduces backlog cost. This is locked by R84b design (§8 Q2). The `BRAINLAYER_GEMINI_SERVICE_TIER` environment variable is purely an operational escape hatch (e.g. `standard`), not the intended runtime default. Do not flag `service_tier="flex"` as a concern on this code path.
📚 Learning: 2026-04-12T00:00:23.977Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.

Applied to files:

  • src/brainlayer/watcher.py
  • tests/test_jsonl_watcher.py
  • src/brainlayer/watcher_bridge.py
  • tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks

Applied to files:

  • src/brainlayer/watcher.py
  • src/brainlayer/watcher_bridge.py
  • tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)

Applied to files:

  • src/brainlayer/watcher.py
  • src/brainlayer/watcher_bridge.py
  • tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip

Applied to files:

  • src/brainlayer/watcher.py
📚 Learning: 2026-04-11T23:47:49.746Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T23:47:49.746Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: `service_tier="flex"` is the intentional default for all Gemini enrichment calls. Pass-2 enrichment is asynchronous backlog work where 1–15 minute latency is acceptable, and the 50% Gemini Flex Inference discount materially reduces backlog cost. This is locked by R84b design (§8 Q2). The `BRAINLAYER_GEMINI_SERVICE_TIER` environment variable is purely an operational escape hatch (e.g. `standard`), not the intended runtime default. Do not flag `service_tier="flex"` as a concern on this code path.

Applied to files:

  • tests/test_enrichment_controller.py
  • src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/*enrichment*.py : Enrichment backend priority: Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort), configurable via `BRAINLAYER_ENRICH_BACKEND` environment variable

Applied to files:

  • tests/test_enrichment_controller.py
  • src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-11T16:54:45.605Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: Google-genai SDK auto-retries are explicitly disabled (`http_options.retry_options.attempts = 1`) in the enrichment controller. Retry policy is intentionally owned at the application layer, not delegated to the SDK. Do not flag this as missing retry logic.

Applied to files:

  • tests/test_enrichment_controller.py
  • src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-11T16:54:45.605Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py` and `src/brainlayer/pipeline/rate_limiter.py`: Gemini API calls in the enrichment pipeline are gated by a token bucket rate limiter. The rate is controlled by `BRAINLAYER_ENRICH_RATE` (default `5/s`, burst `10`) to keep throughput inside the Gemini Flex intended envelope. This default supersedes the earlier 0.2 (12 RPM) default for the Gemini Flex integration path.

Applied to files:

  • src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var

Applied to files:

  • src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-12T00:00:23.977Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.

Applied to files:

  • src/brainlayer/watcher_bridge.py
  • tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Real-time watcher via LaunchAgent (`com.brainlayer.watch.plist`) uses 4-layer content filters: entry type whitelist → classify → chunk min-length → system-reminder strip

Applied to files:

  • src/brainlayer/watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher.py : Watcher uses Axiom telemetry for startup, flush, error, and heartbeat (60s) events to `brainlayer-watcher` dataset

Applied to files:

  • tests/test_watcher_bridge.py
🔇 Additional comments (10)
src/brainlayer/watcher.py (1)

287-299: Recursive JSONL discovery looks correct for nested session artifacts.

rglob("*.jsonl") with the file guard cleanly captures nested transcripts (e.g., subagents/*.jsonl) without changing tailing behavior.

Based on learnings: nested session artifacts belong to the canonical project directory and must be discovered recursively.
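The rglob behavior being praised can be sketched roughly like this (function name and directory layout are hypothetical; the actual watcher code is not quoted in this thread):

```python
import tempfile
from pathlib import Path

def discover_jsonl(root: Path) -> list[Path]:
    # rglob walks every nested directory, so session artifacts like
    # subagents/*.jsonl are found; is_file() guards against any
    # directory whose name happens to end in .jsonl.
    return sorted(p for p in root.rglob("*.jsonl") if p.is_file())

root = Path(tempfile.mkdtemp())
(root / "session" / "subagents").mkdir(parents=True)
(root / "top.jsonl").write_text("{}\n")
(root / "session" / "subagents" / "agent.jsonl").write_text("{}\n")

found = [p.relative_to(root).as_posix() for p in discover_jsonl(root)]
print(found)
# → ['session/subagents/agent.jsonl', 'top.jsonl']
```

A one-level `glob("*.jsonl")` over the same tree would return only `top.jsonl`, which is exactly the gap this PR closes.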

tests/test_watcher_bridge.py (2)

119-136: Project extraction test updates are aligned with the new canonical naming behavior.

These assertions correctly lock in both pass-through simple names and nested-source extraction to brainlayer-grill.

Based on learnings: preserve full encoded repo names and attribute nested artifacts to the directory directly under projects.


300-331: Great end-to-end regression coverage for nested startup backfill.

This test validates the actual watcher→bridge→DB path for nested subagent files and confirms canonical project attribution.

Based on learnings: nested transcript files under session folders must map to the canonical project root under projects.

tests/test_jsonl_watcher.py (1)

236-253: Nice targeted test for recursive nested JSONL discovery.

This is a solid regression guard for the rglob behavior change.

Based on learnings: watcher discovery must include nested session artifacts such as subagents/*.jsonl.

tests/test_enrichment_flex_integration.py (1)

37-40: Scoped timestamp capture is a good stabilization for this rate test.

Limiting call_times updates to the targeted prompt content reduces noise from unrelated generate_content calls while keeping the concurrency-safe lock path intact.

tests/test_enrichment_controller.py (2)

144-144: Assertion update correctly tracks the new Gemini config shape.

The check now matches the nested http_options.extra_body.serviceTier structure used by the controller.


489-503: SDK-backed schema validation test is a strong regression guard.

Validating with GenerateContentConfig.model_validate(...) helps catch future config-shape drift early.

src/brainlayer/enrichment_controller.py (3)

301-307: Nice centralization of Gemini HTTP option construction.

Routing config through _build_gemini_http_options() reduces duplicated wiring and keeps service-tier behavior consistent.


347-347: Good reuse of the shared HTTP-options helper in extraction path.

Using the same builder at this call site keeps service-tier and timeout wiring consistent with the rest of Gemini integrations.


314-320: No changes needed. The request-level http_options are safe: extra_body and timeout do not override the client-level retry behavior (which is preserved at attempts=1 per the SDK's per-request retry semantics), and timeout_ms units are milliseconds as expected by the SDK.

Comment on lines +156 to +168
if raw.startswith("-Users-") or raw.startswith("-home-"):
parts = raw.split("-")
markers = {"Gits", "Desktop", "projects", "config"}
last_marker_idx = -1
for i, part in enumerate(parts):
if part in markers:
last_marker_idx = i

if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
name = "-".join(repo_parts) if repo_parts else raw
else:
name = raw

⚠️ Potential issue | 🟠 Major

Prevent repo-name truncation when marker tokens appear inside the repo name.

Line 158–Line 166 can mis-parse names like ...-Gits-my-config-service into just service because config is treated as a structural marker. That breaks canonical project attribution.

💡 Suggested fix
-    if raw.startswith("-Users-") or raw.startswith("-home-"):
-        parts = raw.split("-")
-        markers = {"Gits", "Desktop", "projects", "config"}
-        last_marker_idx = -1
-        for i, part in enumerate(parts):
-            if part in markers:
-                last_marker_idx = i
-
-        if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
-            repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
-            name = "-".join(repo_parts) if repo_parts else raw
-        else:
-            name = raw
+    if raw.startswith("-Users-") or raw.startswith("-home-"):
+        decoded = None
+        if "-Gits-" in raw:
+            decoded = raw.rsplit("-Gits-", 1)[1].lstrip("-")
+        elif "-projects-" in raw:
+            decoded = raw.rsplit("-projects-", 1)[1].lstrip("-")
+        elif "-config-" in raw:
+            decoded = raw.rsplit("-config-", 1)[1].lstrip("-")
+        name = decoded or raw
     else:
         name = raw

Based on learnings: preserve the full encoded directory name as the canonical repo identifier, without collapsing dash-separated tokens.
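For illustration, the quoted logic can be run standalone to reproduce the truncation (helper name hypothetical; body copied from the diff above):

```python
def normalize_with_last_marker(raw: str) -> str:
    # Reproduction of the quoted marker scan: it records the LAST
    # marker hit, so a marker token inside the repo name itself wins.
    parts = raw.split("-")
    markers = {"Gits", "Desktop", "projects", "config"}
    last_marker_idx = -1
    for i, part in enumerate(parts):
        if part in markers:
            last_marker_idx = i
    if 0 <= last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1:] if p]
        return "-".join(repo_parts) if repo_parts else raw
    return raw

print(normalize_with_last_marker("-Users-alice-Gits-my-config-service"))
# → 'service'  (repo name 'my-config-service' is truncated)
```

The same function does preserve dash-separated names that contain no marker token, e.g. it returns `brainlayer-grill` for `-Users-alice-Gits-brainlayer-grill`, which is why the bug only surfaces when a structural marker appears inside the repo name.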

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/watcher_bridge.py` around lines 156 - 168, The parsing block
that splits raw by "-" (variables raw, parts, markers, last_marker_idx,
repo_parts, name) mistakenly chooses the last occurrence of a structural marker,
which can pick up tokens inside a dash-separated repo name; change the logic to
locate the first structural marker after the "-Users-" / "-home-" prefix (stop
at the first match instead of recording the last), then build repo_parts from
parts[last_marker_idx+1:] as before so the full encoded directory name (all
remaining dash-separated tokens) is preserved as the canonical repo identifier.

Comment on lines +185 to +188
if "projects" in parts:
project_index = parts.index("projects") + 1
if project_index < len(parts):
return _normalize_project_name(parts[project_index])

⚠️ Potential issue | 🟠 Major

Use the last projects segment to avoid mis-attribution on multi-projects paths.

Line 186 currently anchors on the first projects token, which can resolve the wrong owner when multiple projects segments exist in a path.

💡 Suggested fix
-    if "projects" in parts:
-        project_index = parts.index("projects") + 1
+    project_positions = [i for i, segment in enumerate(parts) if segment == "projects"]
+    if project_positions:
+        project_index = project_positions[-1] + 1
         if project_index < len(parts):
             return _normalize_project_name(parts[project_index])

Based on learnings: nested transcript artifacts must resolve ownership to the canonical project directory directly under projects.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/brainlayer/watcher_bridge.py` around lines 185 - 188, The current logic
in watcher_bridge.py uses parts.index("projects") which picks the first
"projects" token and can mis-attribute nested paths; change it to locate the
last "projects" segment (e.g., find the last index of "projects" in parts) and
then use that index+1 to pass into _normalize_project_name so ownership resolves
to the canonical project directory under the final "projects" segment.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 560b9f265a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +164 to +168
if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
name = "-".join(repo_parts) if repo_parts else raw
else:
name = raw

P1: Preserve basename when marker-based decode misses

When an encoded project folder starts with -Users-/-home- but does not contain one of the hard-coded markers (Gits, Desktop, projects, config), this branch falls back to the full encoded token (for example -home-alice-repo) instead of the repo basename (repo). That is a regression from the previous behavior and causes watcher-ingested rows to be written under non-canonical project names, which then breaks project-scoped retrieval for those environments because search/scoping expects the basename.


Comment on lines +185 to +187
if "projects" in parts:
project_index = parts.index("projects") + 1
if project_index < len(parts):

P2: Resolve project from the last projects path segment

Using parts.index("projects") selects the first projects segment in the absolute path. If the configured watcher source lives under another directory named projects, project extraction can pick the wrong segment (e.g. .claude) and mis-attribute all ingested chunks. This is a real regression for custom source paths because top-level files that previously mapped via parent directory now get an incorrect project label.


@EtanHey
Owner Author

EtanHey commented Apr 12, 2026

CI follow-up on 1368b802:

  • The new real-SDK validation test was correct locally, but the GitHub unit matrix does not install the optional cloud dependency in every environment.
  • The failing 3.13 job raised ModuleNotFoundError: No module named "google".
  • I updated tests/test_enrichment_controller.py to use pytest.importorskip("google.genai"), which preserves the SDK-shape assertion when the package is present and cleanly skips it when the repo is running without the cloud extra.
  • Fresh local verification on this exact patch:
    • pytest tests/test_enrichment_controller.py -q -k "service_tier or validates_against_sdk" → 3 passed, 70 deselected, 2 warnings
    • pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py tests/test_enrichment_controller.py tests/test_enrichment_flex_integration.py -q → 134 passed, 2 warnings
    • ruff check tests/test_enrichment_controller.py → clean
    • ruff format --check tests/test_enrichment_controller.py → clean
    • cr review --plain on the staged diff → Review completed: No findings ✔
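
The importorskip fix can be sketched as follows (the validated dict shape is an assumption based on the serviceTier discussion earlier in this review, not quoted code):

```python
import pytest

def test_gemini_config_shape():
    # importorskip skips the test (rather than failing with
    # ModuleNotFoundError) when the optional cloud extra is absent;
    # when google-genai is installed, the SDK-shape check still runs.
    types = pytest.importorskip("google.genai.types")
    cfg = types.GenerateContentConfig.model_validate(
        {"http_options": {"extra_body": {"serviceTier": "flex"}}}
    )
    assert cfg.http_options is not None
```

In CI matrices without the cloud extra, the job reports the test as skipped instead of erroring, which matches the 3.13 failure mode described above.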

@coderabbitai review
@codex review
@cursor @BugBot re-review

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

@EtanHey, the pytest.importorskip("google.genai") fix is exactly the right approach here — it's the canonical pytest pattern for optional dependencies, keeping the SDK-shape assertion active when the google-genai package is installed and producing a clean skip (not a failure) in environments that don't include the cloud extra. No logic changes, minimal diff, and all 134 tests + linting remain clean. Triggering a fresh incremental review now.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: Google-genai SDK auto-retries are explicitly disabled (`http_options.retry_options.attempts = 1`) in the enrichment controller. Retry policy is intentionally owned at the application layer, not delegated to the SDK. Do not flag this as missing retry logic.

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)

Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1368b8022b


Comment on lines +164 to +168
if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
name = "-".join(repo_parts) if repo_parts else raw
else:
name = raw

P1: Handle encoded paths without marker segments

Restore a fallback when no marker is found in _normalize_project_name. With the new logic, encoded Claude folder names such as -Users-alice-Code-my-repo or -home-bob-src-service-api now return the full encoded token instead of the repo name, because only Gits/Desktop/projects/config are recognized. In environments where repos are not under those exact directory names, watcher-ingested rows get mis-grouped project values, which breaks project-scoped search and recall consistency.


Comment on lines +185 to +187
if "projects" in parts:
project_index = parts.index("projects") + 1
if project_index < len(parts):

P2: Select the correct projects segment in source paths

Use the last relevant projects segment (or anchor on .claude/projects) when deriving the project root. parts.index("projects") picks the first match, so paths like /projects/alice/.claude/projects/-Users-alice-Gits-brainlayer/... resolve to alice instead of the encoded repo directory. This causes systematic project mis-attribution for users whose home/workspace path already includes a projects directory.


@EtanHey EtanHey merged commit cda100e into main Apr 12, 2026
6 checks passed
@EtanHey EtanHey deleted the feat/pr-a5-enrichment-watcher-globs branch April 12, 2026 04:51