fix: watch nested JSONL transcripts recursively #236
Conversation
@coderabbitai review
@coderabbitai Nested Claude transcript artifacts under session folders (especially subagents/*.jsonl) belong to the project directory immediately under projects/, not to the immediate parent folder. Treat the project root segment as the canonical owner for watcher ingestion, and preserve full repo names like brainlayer-grill rather than collapsing them to the last dash-separated token. Please learn this design for future reviews on watcher paths.
@codex review
✅ Actions performed: Review triggered.
✏️ Learnings added
Merged current main into this branch in commit 4b59737, so PR #236 is no longer behind PR-A6. Re-verified on the updated branch: pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py tests/test_enrichment_controller.py tests/test_enrichment_flex_integration.py -q -> 133 passed, 2 warnings; ruff check and ruff format --check are clean.

@coderabbitai review @codex review @cursor @BugBot re-review
🧠 Learnings used. ✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b59737e47
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
        name = "-".join(repo_parts) if repo_parts else raw
    else:
        name = raw
Fall back to repo tail when path markers are missing
In _normalize_project_name, encoded names that start with -Users-/-home- but do not contain one of the hard-coded markers now fall back to name = raw. For common Claude paths like -home-alice-code-myrepo, this stores the full encoded path as the project instead of myrepo, which regresses prior behavior and causes project attribution/filtering mismatches downstream (records are written under the encoded string, while callers typically filter by normalized repo name).
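One way to address this, sketched here as a hypothetical standalone helper (the real `_normalize_project_name` in watcher_bridge.py may differ in details), is to keep the marker-based decode but fall back to the last dash token instead of the full encoded string when no marker matches:

```python
def normalize_project_name(raw: str) -> str:
    """Decode a Claude-style dash-encoded project folder name.

    Marker-based decode preserves full repo names (e.g. brainlayer-grill);
    when no marker is present, fall back to the final non-empty token
    (the prior behavior) rather than the full encoded path.
    """
    if not (raw.startswith("-Users-") or raw.startswith("-home-")):
        return raw

    parts = raw.split("-")
    markers = {"Gits", "Desktop", "projects", "config"}
    last_marker_idx = -1
    for i, part in enumerate(parts):
        if part in markers:
            last_marker_idx = i

    if 0 <= last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1:] if p]
        if repo_parts:
            return "-".join(repo_parts)

    # Fallback: no marker matched. The dash encoding is ambiguous
    # (path separators and in-name dashes look identical), so the best
    # available guess is the final non-empty token.
    tail = [p for p in parts if p]
    return tail[-1] if tail else raw

print(normalize_project_name("-home-alice-code-myrepo"))            # myrepo
print(normalize_project_name("-Users-etan-Gits-brainlayer-grill"))  # brainlayer-grill
```

This keeps the new full-name preservation for marker paths while restoring a sane basename for environments that do not use the hard-coded directory names.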
    if "projects" in parts:
        project_index = parts.index("projects") + 1
Resolve project root from the last `projects` path segment
_extract_project_from_source uses parts.index("projects"), which always picks the first occurrence in the absolute path. In environments where another projects directory appears earlier (for example home paths rooted under /projects/...), this returns the wrong segment as the project root and misattributes ingested chunks, breaking project-scoped retrieval.
Follow-up fix pushed. Also hardened the WHOOP eval test isolation. Verification on this branch after the fix:
That WHOOP eval failure is already red on local.
SDK compatibility fix: confirmed correct. Test hardening (isolating the WHOOP eval): noted. Triggering a fresh incremental review now.

🧠 Learnings used. ✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/watcher_bridge.py`:
- Around line 185-188: The current logic in watcher_bridge.py uses
parts.index("projects") which picks the first "projects" token and can
mis-attribute nested paths; change it to locate the last "projects" segment
(e.g., find the last index of "projects" in parts) and then use that index+1 to
pass into _normalize_project_name so ownership resolves to the canonical project
directory under the final "projects" segment.
- Around line 156-168: The parsing block that splits raw by "-" (variables raw,
parts, markers, last_marker_idx, repo_parts, name) mistakenly chooses the last
occurrence of a structural marker, which can pick up tokens inside a
dash-separated repo name; change the logic to locate the first structural marker
after the "-Users-" / "-home-" prefix (stop at the first match instead of
recording the last), then build repo_parts from parts[last_marker_idx+1:] as
before so the full encoded directory name (all remaining dash-separated tokens)
is preserved as the canonical repo identifier.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c6d6edff-d80c-4a6a-8a61-48290710c135
📒 Files selected for processing (7)
src/brainlayer/enrichment_controller.py, src/brainlayer/watcher.py, src/brainlayer/watcher_bridge.py, tests/test_enrichment_controller.py, tests/test_enrichment_flex_integration.py, tests/test_jsonl_watcher.py, tests/test_watcher_bridge.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches
Files:
tests/test_enrichment_flex_integration.py, src/brainlayer/watcher.py, tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py, tests/test_jsonl_watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat
Files:
src/brainlayer/watcher.py, src/brainlayer/enrichment_controller.py, src/brainlayer/watcher_bridge.py
src/brainlayer/watcher.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/watcher.py: Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip
Persist watcher offsets in ~/.local/share/brainlayer/offsets.json; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Send Axiom telemetry (startup, flush, error, heartbeat at 60s intervals) to brainlayer-watcher dataset
Files:
src/brainlayer/watcher.py
🧠 Learnings (13)
📓 Common learnings
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T23:47:49.746Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: `service_tier="flex"` is the intentional default for all Gemini enrichment calls. Pass-2 enrichment is asynchronous backlog work where 1–15 minute latency is acceptable, and the 50% Gemini Flex Inference discount materially reduces backlog cost. This is locked by R84b design (§8 Q2). The `BRAINLAYER_GEMINI_SERVICE_TIER` environment variable is purely an operational escape hatch (e.g. `standard`), not the intended runtime default. Do not flag `service_tier="flex"` as a concern on this code path.
📚 Learning: 2026-04-12T00:00:23.977Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.
Applied to files:
src/brainlayer/watcher.py, tests/test_jsonl_watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Applied to files:
src/brainlayer/watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)
Applied to files:
src/brainlayer/watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip
Applied to files:
src/brainlayer/watcher.py
📚 Learning: 2026-04-11T23:47:49.746Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T23:47:49.746Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: `service_tier="flex"` is the intentional default for all Gemini enrichment calls. Pass-2 enrichment is asynchronous backlog work where 1–15 minute latency is acceptable, and the 50% Gemini Flex Inference discount materially reduces backlog cost. This is locked by R84b design (§8 Q2). The `BRAINLAYER_GEMINI_SERVICE_TIER` environment variable is purely an operational escape hatch (e.g. `standard`), not the intended runtime default. Do not flag `service_tier="flex"` as a concern on this code path.
Applied to files:
tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/*enrichment*.py : Enrichment backend priority: Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort), configurable via `BRAINLAYER_ENRICH_BACKEND` environment variable
Applied to files:
tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-11T16:54:45.605Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: Google-genai SDK auto-retries are explicitly disabled (`http_options.retry_options.attempts = 1`) in the enrichment controller. Retry policy is intentionally owned at the application layer, not delegated to the SDK. Do not flag this as missing retry logic.
Applied to files:
tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-11T16:54:45.605Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py` and `src/brainlayer/pipeline/rate_limiter.py`: Gemini API calls in the enrichment pipeline are gated by a token bucket rate limiter. The rate is controlled by `BRAINLAYER_ENRICH_RATE` (default `5/s`, burst `10`) to keep throughput inside the Gemini Flex intended envelope. This default supersedes the earlier 0.2 (12 RPM) default for the Gemini Flex integration path.
Applied to files:
src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
Applied to files:
src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-12T00:00:23.977Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.
Applied to files:
src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Real-time watcher via LaunchAgent (`com.brainlayer.watch.plist`) uses 4-layer content filters: entry type whitelist → classify → chunk min-length → system-reminder strip
Applied to files:
src/brainlayer/watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher.py : Watcher uses Axiom telemetry for startup, flush, error, and heartbeat (60s) events to `brainlayer-watcher` dataset
Applied to files:
tests/test_watcher_bridge.py
🔇 Additional comments (10)

src/brainlayer/watcher.py (1)

287-299: Recursive JSONL discovery looks correct for nested session artifacts. rglob("*.jsonl") with the file guard cleanly captures nested transcripts (e.g., subagents/*.jsonl) without changing tailing behavior. Based on learnings: nested session artifacts belong to the canonical project directory and must be discovered recursively.

tests/test_watcher_bridge.py (2)

119-136: Project extraction test updates are aligned with the new canonical naming behavior. These assertions correctly lock in both pass-through simple names and nested-source extraction to brainlayer-grill. Based on learnings: preserve full encoded repo names and attribute nested artifacts to the directory directly under projects/.

300-331: Great end-to-end regression coverage for nested startup backfill. This test validates the actual watcher→bridge→DB path for nested subagent files and confirms canonical project attribution. Based on learnings: nested transcript files under session folders must map to the canonical project root under projects/.

tests/test_jsonl_watcher.py (1)

236-253: Nice targeted test for recursive nested JSONL discovery. A solid regression guard for the rglob behavior change. Based on learnings: watcher discovery must include nested session artifacts such as subagents/*.jsonl.

tests/test_enrichment_flex_integration.py (1)

37-40: Scoped timestamp capture is a good stabilization for this rate test. Limiting call_times updates to the targeted prompt content reduces noise from unrelated generate_content calls while keeping the concurrency-safe lock path intact.

tests/test_enrichment_controller.py (2)

144-144: Assertion update correctly tracks the new Gemini config shape. The check now matches the nested http_options.extra_body.serviceTier structure used by the controller.

489-503: SDK-backed schema validation test is a strong regression guard. Validating with GenerateContentConfig.model_validate(...) helps catch future config-shape drift early.

src/brainlayer/enrichment_controller.py (3)

301-307: Nice centralization of Gemini HTTP option construction. Routing config through _build_gemini_http_options() reduces duplicated wiring and keeps service-tier behavior consistent.

347-347: Good reuse of the shared HTTP-options helper in the extraction path. Using the same builder at this call site keeps service-tier and timeout wiring consistent with the rest of the Gemini integrations.

314-320: No changes needed. The request-level http_options are safe: extra_body and timeout do not override the client-level retry behavior (which is preserved at attempts=1 per the SDK's per-request retry semantics), and timeout_ms units are milliseconds as expected by the SDK.
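As a rough illustration of the config shape being reviewed here (a sketch only, not the controller's actual `_build_gemini_http_options`; the real code builds SDK types, and the 120s timeout default is an assumption for illustration):

```python
import os

def build_gemini_http_options(timeout_ms: int = 120_000) -> dict:
    """Sketch of centralized Gemini HTTP options: serviceTier rides in
    extra_body (merged into the request payload), while timeout stays a
    transport-level setting in milliseconds.

    BRAINLAYER_GEMINI_SERVICE_TIER is the operational escape hatch; the
    intended runtime default is "flex" (per the R84b design notes).
    """
    tier = os.environ.get("BRAINLAYER_GEMINI_SERVICE_TIER", "flex")
    return {
        "timeout": timeout_ms,  # milliseconds
        "extra_body": {"serviceTier": tier},
    }
```

Centralizing this in one builder is what keeps the service-tier and timeout wiring identical across the enrichment and extraction call sites.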
    if raw.startswith("-Users-") or raw.startswith("-home-"):
        parts = raw.split("-")
        markers = {"Gits", "Desktop", "projects", "config"}
        last_marker_idx = -1
        for i, part in enumerate(parts):
            if part in markers:
                last_marker_idx = i

        if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
            repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
            name = "-".join(repo_parts) if repo_parts else raw
        else:
            name = raw
Prevent repo-name truncation when marker tokens appear inside the repo name.
Line 158–Line 166 can mis-parse names like ...-Gits-my-config-service into just service because config is treated as a structural marker. That breaks canonical project attribution.
💡 Suggested fix
- if raw.startswith("-Users-") or raw.startswith("-home-"):
- parts = raw.split("-")
- markers = {"Gits", "Desktop", "projects", "config"}
- last_marker_idx = -1
- for i, part in enumerate(parts):
- if part in markers:
- last_marker_idx = i
-
- if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
- repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
- name = "-".join(repo_parts) if repo_parts else raw
- else:
- name = raw
+ if raw.startswith("-Users-") or raw.startswith("-home-"):
+ decoded = None
+ if "-Gits-" in raw:
+ decoded = raw.rsplit("-Gits-", 1)[1].lstrip("-")
+ elif "-projects-" in raw:
+ decoded = raw.rsplit("-projects-", 1)[1].lstrip("-")
+ elif "-config-" in raw:
+ decoded = raw.rsplit("-config-", 1)[1].lstrip("-")
+ name = decoded or raw
else:
    name = raw

Based on learnings: preserve the full encoded directory name as the canonical repo identifier, without collapsing dash-separated tokens.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/watcher_bridge.py` around lines 156 - 168, The parsing block
that splits raw by "-" (variables raw, parts, markers, last_marker_idx,
repo_parts, name) mistakenly chooses the last occurrence of a structural marker,
which can pick up tokens inside a dash-separated repo name; change the logic to
locate the first structural marker after the "-Users-" / "-home-" prefix (stop
at the first match instead of recording the last), then build repo_parts from
parts[last_marker_idx+1:] as before so the full encoded directory name (all
remaining dash-separated tokens) is preserved as the canonical repo identifier.
    if "projects" in parts:
        project_index = parts.index("projects") + 1
        if project_index < len(parts):
            return _normalize_project_name(parts[project_index])
Use the last projects segment to avoid mis-attribution on multi-projects paths.
Line 186 currently anchors on the first projects token, which can resolve the wrong owner when multiple projects segments exist in a path.
💡 Suggested fix
- if "projects" in parts:
- project_index = parts.index("projects") + 1
+ project_positions = [i for i, segment in enumerate(parts) if segment == "projects"]
+ if project_positions:
+ project_index = project_positions[-1] + 1
if project_index < len(parts):
    return _normalize_project_name(parts[project_index])

Based on learnings: nested transcript artifacts must resolve ownership to the canonical project directory directly under projects.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/watcher_bridge.py` around lines 185 - 188, The current logic
in watcher_bridge.py uses parts.index("projects") which picks the first
"projects" token and can mis-attribute nested paths; change it to locate the
last "projects" segment (e.g., find the last index of "projects" in parts) and
then use that index+1 to pass into _normalize_project_name so ownership resolves
to the canonical project directory under the final "projects" segment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 560b9f265a
    if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
        name = "-".join(repo_parts) if repo_parts else raw
    else:
        name = raw
Preserve basename when marker-based decode misses
When an encoded project folder starts with -Users-/-home- but does not contain one of the hard-coded markers (Gits, Desktop, projects, config), this branch falls back to the full encoded token (for example -home-alice-repo) instead of the repo basename (repo). That is a regression from the previous behavior and causes watcher-ingested rows to be written under non-canonical project names, which then breaks project-scoped retrieval for those environments because search/scoping expects the basename.
    if "projects" in parts:
        project_index = parts.index("projects") + 1
        if project_index < len(parts):
Resolve project from the last `projects` path segment
Using parts.index("projects") selects the first projects segment in the absolute path. If the configured watcher source lives under another directory named projects, project extraction can pick the wrong segment (e.g. .claude) and mis-attribute all ingested chunks. This is a real regression for custom source paths because top-level files that previously mapped via parent directory now get an incorrect project label.
CI follow-up on
@coderabbitai review
🧠 Learnings used. ✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1368b8022b
    if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
        name = "-".join(repo_parts) if repo_parts else raw
    else:
        name = raw
Handle encoded paths without marker segments
Restore a fallback when no marker is found in _normalize_project_name. With the new logic, encoded Claude folder names such as -Users-alice-Code-my-repo or -home-bob-src-service-api now return the full encoded token instead of the repo name, because only Gits/Desktop/projects/config are recognized. In environments where repos are not under those exact directory names, watcher-ingested rows get mis-grouped project values, which breaks project-scoped search and recall consistency.
    if "projects" in parts:
        project_index = parts.index("projects") + 1
        if project_index < len(parts):
Select the correct `projects` segment in source paths
Use the last relevant projects segment (or anchor on .claude/projects) when deriving the project root. parts.index("projects") picks the first match, so paths like /projects/alice/.claude/projects/-Users-alice-Gits-brainlayer/... resolve to alice instead of the encoded repo directory. This causes systematic project mis-attribution for users whose home/workspace path already includes a projects directory.
Summary
- subagents/*.jsonl and sibling nested JSONL files are picked up
- Project attribution anchors on the directory directly under projects/, so nested files still map back to the owning repo
- Full encoded repo names like brainlayer-grill are preserved instead of collapsing them to grill

Why
Test plan
- pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py -q
- ruff check src/brainlayer/watcher.py src/brainlayer/watcher_bridge.py tests/test_jsonl_watcher.py tests/test_watcher_bridge.py
- ruff format --check src/brainlayer/watcher.py src/brainlayer/watcher_bridge.py tests/test_jsonl_watcher.py tests/test_watcher_bridge.py

CodeRabbit note
Ran cr review --plain twice before push, but CodeRabbit returned a rate-limit error both times ("try after 22 minutes", then "try after 16 minutes 41 seconds" on retry). I am explicitly not claiming a local CR pass here; please use the GitHub CodeRabbit review on this PR as the review source.

Context
/Users/etanheyman/Gits/orchestrator/collab/brainlayer-a5-a6-flex-restart-bundle.md

Note
Fix
- JSONLWatcher now discovers nested JSONL transcripts recursively: _discover_jsonl_files in watcher.py uses rglob('*.jsonl') instead of iterating only immediate children, so session subdirectories (e.g. subagents/) are included.
- _extract_project_from_source and _normalize_project_name in watcher_bridge.py resolve project names correctly from nested paths and encoded directory names (e.g. -Users-...-Gits-<repo>-...).
- The service_tier field moved to http_options.extra_body.serviceTier in enrichment_controller.py to align with the SDK's expected config shape.

Macroscope summarized 1368b80.
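The recursive-discovery change can be sketched as follows (a simplified stand-in for `_discover_jsonl_files`; the real method carries additional guards and offset bookkeeping):

```python
from pathlib import Path

def discover_jsonl_files(root: Path) -> list[Path]:
    """Find every *.jsonl under root, including nested session artifacts
    such as <session>/subagents/*.jsonl.

    rglob replaces the old immediate-children iteration; the is_file()
    guard skips any directory that happens to match the glob."""
    return sorted(p for p in root.rglob("*.jsonl") if p.is_file())
```

With this shape, a transcript at projects/<repo>/<session>/subagents/a.jsonl is discovered alongside top-level session files, and project attribution is then resolved separately from the path.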
Summary by CodeRabbit
New Features
Bug Fixes
Tests