fix: watch nested JSONL transcripts recursively #236
Conversation
@coderabbitai review
@coderabbitai Nested Claude transcript artifacts under session folders (especially subagents/*.jsonl) belong to the project directory immediately under projects/, not to the immediate parent folder. Treat the project root segment as the canonical owner for watcher ingestion, and preserve full repo names like brainlayer-grill rather than collapsing them to the last dash-separated token. Please learn this design for future reviews on watcher paths.
@codex review
✅ Actions performed: Review triggered.
✏️ Learnings added
Merged current main into this branch in commit 4b59737, so PR #236 is no longer behind PR-A6. Re-verified on the updated branch: pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py tests/test_enrichment_controller.py tests/test_enrichment_flex_integration.py -q -> 133 passed, 2 warnings; ruff check and ruff format --check are clean.

@coderabbitai review @codex review @cursor @BugBot re-review
🧠 Learnings used. ✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b59737e47
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
        name = "-".join(repo_parts) if repo_parts else raw
    else:
        name = raw
Fall back to repo tail when path markers are missing
In _normalize_project_name, encoded names that start with -Users-/-home- but do not contain one of the hard-coded markers now fall back to name = raw. For common Claude paths like -home-alice-code-myrepo, this stores the full encoded path as the project instead of myrepo, which regresses prior behavior and causes project attribution/filtering mismatches downstream (records are written under the encoded string, while callers typically filter by normalized repo name).
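One way to address this, sketched here as a hypothetical standalone helper (the real `_normalize_project_name` in watcher_bridge.py may differ in details), is to keep the marker-based decode but fall back to the last dash token instead of the full encoded string when no marker matches:

```python
def normalize_project_name(raw: str) -> str:
    """Decode a Claude-style dash-encoded project folder name.

    Marker-based decode preserves full repo names (e.g. brainlayer-grill);
    when no marker is present, fall back to the final non-empty token
    (the prior behavior) rather than the full encoded path.
    """
    if not (raw.startswith("-Users-") or raw.startswith("-home-")):
        return raw

    parts = raw.split("-")
    markers = {"Gits", "Desktop", "projects", "config"}
    last_marker_idx = -1
    for i, part in enumerate(parts):
        if part in markers:
            last_marker_idx = i

    if 0 <= last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1:] if p]
        if repo_parts:
            return "-".join(repo_parts)

    # Fallback: no marker matched. The dash encoding is ambiguous
    # (path separators and in-name dashes look identical), so the best
    # available guess is the final non-empty token.
    tail = [p for p in parts if p]
    return tail[-1] if tail else raw

print(normalize_project_name("-home-alice-code-myrepo"))            # myrepo
print(normalize_project_name("-Users-etan-Gits-brainlayer-grill"))  # brainlayer-grill
```

This keeps the new full-name preservation for marker paths while restoring a sane basename for environments that do not use the hard-coded directory names.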
    if "projects" in parts:
        project_index = parts.index("projects") + 1
Resolve project root from the last `projects` path segment
_extract_project_from_source uses parts.index("projects"), which always picks the first occurrence in the absolute path. In environments where another projects directory appears earlier (for example home paths rooted under /projects/...), this returns the wrong segment as the project root and misattributes ingested chunks, breaking project-scoped retrieval.
Follow-up fix pushed. Also hardened the WHOOP eval test isolation. Verification on this branch after the fix:
That WHOOP eval failure is already red on local.
SDK compatibility fix: confirmed correct. Test hardening (isolating the WHOOP eval): noted. Triggering a fresh incremental review now.

🧠 Learnings used. ✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/brainlayer/watcher_bridge.py`:
- Around line 185-188: The current logic in watcher_bridge.py uses
parts.index("projects") which picks the first "projects" token and can
mis-attribute nested paths; change it to locate the last "projects" segment
(e.g., find the last index of "projects" in parts) and then use that index+1 to
pass into _normalize_project_name so ownership resolves to the canonical project
directory under the final "projects" segment.
- Around line 156-168: The parsing block that splits raw by "-" (variables raw,
parts, markers, last_marker_idx, repo_parts, name) mistakenly chooses the last
occurrence of a structural marker, which can pick up tokens inside a
dash-separated repo name; change the logic to locate the first structural marker
after the "-Users-" / "-home-" prefix (stop at the first match instead of
recording the last), then build repo_parts from parts[last_marker_idx+1:] as
before so the full encoded directory name (all remaining dash-separated tokens)
is preserved as the canonical repo identifier.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c6d6edff-d80c-4a6a-8a61-48290710c135
📒 Files selected for processing (7)
src/brainlayer/enrichment_controller.py, src/brainlayer/watcher.py, src/brainlayer/watcher_bridge.py, tests/test_enrichment_controller.py, tests/test_enrichment_flex_integration.py, tests/test_jsonl_watcher.py, tests/test_watcher_bridge.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: Macroscope - Correctness Check
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Flag risky DB or concurrency changes explicitly and do not hand-wave lock behavior
Enforce one-write-at-a-time concurrency constraint; reads are safe but brain_digest is write-heavy and must not run in parallel with other MCP work
Run pytest before claiming behavior changed safely; current test suite has 929 tests
**/*.py: Use paths.py:get_db_path() for all database path resolution; all scripts and CLI must use this function rather than hardcoding paths
When performing bulk database operations: stop enrichment workers first, checkpoint WAL before and after, drop FTS triggers before bulk deletes, batch deletes in 5-10K chunks, and checkpoint every 3 batches
Files:
tests/test_enrichment_flex_integration.py, src/brainlayer/watcher.py, tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py, tests/test_jsonl_watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
src/brainlayer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/**/*.py: Use retry logic on SQLITE_BUSY errors; each worker must use its own database connection to handle concurrency safely
Classification must preserve ai_code, stack_trace, and user_message verbatim; skip noise entries entirely and summarize build_log and dir_listing entries (structure only)
Use AST-aware chunking via tree-sitter; never split stack traces; mask large tool output
For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via enrichment_controller.py, and Ollama as offline last-resort; allow override via BRAINLAYER_ENRICH_BACKEND env var
Configure enrichment rate via BRAINLAYER_ENRICH_RATE environment variable (default 0.2 = 12 RPM)
Implement chunk lifecycle columns: superseded_by, aggregated_into, archived_at on chunks table; exclude lifecycle-managed chunks from default search; allow include_archived=True to show history
Implement brain_supersede with safety gate for personal data (journals, notes, health/finance); use soft-delete for brain_archive with timestamp
Add supersedes parameter to brain_store for atomic store-and-replace operations
Run linting and formatting with: ruff check src/ && ruff format src/
Run tests with pytest
Use PRAGMA wal_checkpoint(FULL) before and after bulk database operations to prevent WAL bloat
Files:
src/brainlayer/watcher.py, src/brainlayer/enrichment_controller.py, src/brainlayer/watcher_bridge.py
src/brainlayer/watcher.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/brainlayer/watcher.py: Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip
Persist watcher offsets in ~/.local/share/brainlayer/offsets.json; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Send Axiom telemetry (startup, flush, error, heartbeat at 60s intervals) to brainlayer-watcher dataset
Files:
src/brainlayer/watcher.py
🧠 Learnings (13)
📓 Common learnings
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T23:47:49.746Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: `service_tier="flex"` is the intentional default for all Gemini enrichment calls. Pass-2 enrichment is asynchronous backlog work where 1–15 minute latency is acceptable, and the 50% Gemini Flex Inference discount materially reduces backlog cost. This is locked by R84b design (§8 Q2). The `BRAINLAYER_GEMINI_SERVICE_TIER` environment variable is purely an operational escape hatch (e.g. `standard`), not the intended runtime default. Do not flag `service_tier="flex"` as a concern on this code path.
📚 Learning: 2026-04-12T00:00:23.977Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher.py and src/brainlayer/watcher_bridge.py: Nested Claude transcript artifacts under session folders (e.g., subagents/*.jsonl) belong to the project directory immediately under `projects/`, not to the immediate parent folder. The project root segment (the directory directly under `projects/`) is the canonical owner for watcher ingestion, regardless of how deeply nested the JSONL file is.
Applied to files:
src/brainlayer/watcher.py, tests/test_jsonl_watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Persist watcher offsets in `~/.local/share/brainlayer/offsets.json`; implement rewind detection (file shrink = checkpoint restore) and soft-archive reverted chunks
Applied to files:
src/brainlayer/watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Watcher offsets persisted in `~/.local/share/brainlayer/offsets.json` to survive restarts; implement rewind detection (file shrink = checkpoint restore)
Applied to files:
src/brainlayer/watcher.py, src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/watcher.py : Implement 4-layer content filtering for real-time JSONL watcher: entry type whitelist → classify → chunk min-length → system-reminder strip
Applied to files:
src/brainlayer/watcher.py
📚 Learning: 2026-04-11T23:47:49.746Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T23:47:49.746Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: `service_tier="flex"` is the intentional default for all Gemini enrichment calls. Pass-2 enrichment is asynchronous backlog work where 1–15 minute latency is acceptable, and the 50% Gemini Flex Inference discount materially reduces backlog cost. This is locked by R84b design (§8 Q2). The `BRAINLAYER_GEMINI_SERVICE_TIER` environment variable is purely an operational escape hatch (e.g. `standard`), not the intended runtime default. Do not flag `service_tier="flex"` as a concern on this code path.
Applied to files:
tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-01T01:24:44.281Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T01:24:44.281Z
Learning: Applies to src/brainlayer/*enrichment*.py : Enrichment backend priority: Groq (primary/cloud) → Gemini (fallback) → Ollama (offline last-resort), configurable via `BRAINLAYER_ENRICH_BACKEND` environment variable
Applied to files:
tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-11T16:54:45.605Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py`: Google-genai SDK auto-retries are explicitly disabled (`http_options.retry_options.attempts = 1`) in the enrichment controller. Retry policy is intentionally owned at the application layer, not delegated to the SDK. Do not flag this as missing retry logic.
Applied to files:
tests/test_enrichment_controller.py, src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-11T16:54:45.605Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-11T16:54:45.605Z
Learning: Applies to `src/brainlayer/enrichment_controller.py` and `src/brainlayer/pipeline/rate_limiter.py`: Gemini API calls in the enrichment pipeline are gated by a token bucket rate limiter. The rate is controlled by `BRAINLAYER_ENRICH_RATE` (default `5/s`, burst `10`) to keep throughput inside the Gemini Flex intended envelope. This default supersedes the earlier 0.2 (12 RPM) default for the Gemini Flex integration path.
Applied to files:
src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-06T08:40:13.531Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T08:40:13.531Z
Learning: Applies to src/brainlayer/**/*.py : For enrichment backend selection: use Groq as primary backend (cloud, configured in launchd plist), Gemini as fallback via `enrichment_controller.py`, and Ollama as offline last-resort; allow override via `BRAINLAYER_ENRICH_BACKEND` env var
Applied to files:
src/brainlayer/enrichment_controller.py
📚 Learning: 2026-04-12T00:00:23.977Z
Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-04-12T00:00:23.977Z
Learning: Applies to src/brainlayer/watcher_bridge.py: When extracting the project/repo name from a watcher path, preserve the full encoded directory name (e.g., `brainlayer-grill`) as-is. Do not collapse or strip dash-separated tokens — the full name is the canonical repo identifier.
Applied to files:
src/brainlayer/watcher_bridge.py, tests/test_watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher*.py : Real-time watcher via LaunchAgent (`com.brainlayer.watch.plist`) uses 4-layer content filters: entry type whitelist → classify → chunk min-length → system-reminder strip
Applied to files:
src/brainlayer/watcher_bridge.py
📚 Learning: 2026-04-06T11:15:05.940Z
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T11:15:05.940Z
Learning: Applies to src/brainlayer/watcher.py : Watcher uses Axiom telemetry for startup, flush, error, and heartbeat (60s) events to `brainlayer-watcher` dataset
Applied to files:
tests/test_watcher_bridge.py
🔇 Additional comments (10)

src/brainlayer/watcher.py (1)

287-299: Recursive JSONL discovery looks correct for nested session artifacts. rglob("*.jsonl") with the file guard cleanly captures nested transcripts (e.g., subagents/*.jsonl) without changing tailing behavior. Based on learnings: nested session artifacts belong to the canonical project directory and must be discovered recursively.

tests/test_watcher_bridge.py (2)

119-136: Project extraction test updates are aligned with the new canonical naming behavior. These assertions correctly lock in both pass-through simple names and nested-source extraction to brainlayer-grill. Based on learnings: preserve full encoded repo names and attribute nested artifacts to the directory directly under projects/.

300-331: Great end-to-end regression coverage for nested startup backfill. This test validates the actual watcher→bridge→DB path for nested subagent files and confirms canonical project attribution. Based on learnings: nested transcript files under session folders must map to the canonical project root under projects/.

tests/test_jsonl_watcher.py (1)

236-253: Nice targeted test for recursive nested JSONL discovery. A solid regression guard for the rglob behavior change. Based on learnings: watcher discovery must include nested session artifacts such as subagents/*.jsonl.

tests/test_enrichment_flex_integration.py (1)

37-40: Scoped timestamp capture is a good stabilization for this rate test. Limiting call_times updates to the targeted prompt content reduces noise from unrelated generate_content calls while keeping the concurrency-safe lock path intact.

tests/test_enrichment_controller.py (2)

144-144: Assertion update correctly tracks the new Gemini config shape. The check now matches the nested http_options.extra_body.serviceTier structure used by the controller.

489-503: SDK-backed schema validation test is a strong regression guard. Validating with GenerateContentConfig.model_validate(...) helps catch future config-shape drift early.

src/brainlayer/enrichment_controller.py (3)

301-307: Nice centralization of Gemini HTTP option construction. Routing config through _build_gemini_http_options() reduces duplicated wiring and keeps service-tier behavior consistent.

347-347: Good reuse of the shared HTTP-options helper in the extraction path. Using the same builder at this call site keeps service-tier and timeout wiring consistent with the rest of the Gemini integrations.

314-320: No changes needed. The request-level http_options are safe: extra_body and timeout do not override the client-level retry behavior (which is preserved at attempts=1 per the SDK's per-request retry semantics), and timeout_ms units are milliseconds as expected by the SDK.
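As a rough illustration of the config shape being reviewed here (a sketch only, not the controller's actual `_build_gemini_http_options`; the real code builds SDK types, and the 120s timeout default is an assumption for illustration):

```python
import os

def build_gemini_http_options(timeout_ms: int = 120_000) -> dict:
    """Sketch of centralized Gemini HTTP options: serviceTier rides in
    extra_body (merged into the request payload), while timeout stays a
    transport-level setting in milliseconds.

    BRAINLAYER_GEMINI_SERVICE_TIER is the operational escape hatch; the
    intended runtime default is "flex" (per the R84b design notes).
    """
    tier = os.environ.get("BRAINLAYER_GEMINI_SERVICE_TIER", "flex")
    return {
        "timeout": timeout_ms,  # milliseconds
        "extra_body": {"serviceTier": tier},
    }
```

Centralizing this in one builder is what keeps the service-tier and timeout wiring identical across the enrichment and extraction call sites.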
    if raw.startswith("-Users-") or raw.startswith("-home-"):
        parts = raw.split("-")
        markers = {"Gits", "Desktop", "projects", "config"}
        last_marker_idx = -1
        for i, part in enumerate(parts):
            if part in markers:
                last_marker_idx = i

        if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
            repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
            name = "-".join(repo_parts) if repo_parts else raw
        else:
            name = raw
Prevent repo-name truncation when marker tokens appear inside the repo name.
Line 158–Line 166 can mis-parse names like ...-Gits-my-config-service into just service because config is treated as a structural marker. That breaks canonical project attribution.
💡 Suggested fix
- if raw.startswith("-Users-") or raw.startswith("-home-"):
- parts = raw.split("-")
- markers = {"Gits", "Desktop", "projects", "config"}
- last_marker_idx = -1
- for i, part in enumerate(parts):
- if part in markers:
- last_marker_idx = i
-
- if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
- repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
- name = "-".join(repo_parts) if repo_parts else raw
- else:
- name = raw
+ if raw.startswith("-Users-") or raw.startswith("-home-"):
+ decoded = None
+ if "-Gits-" in raw:
+ decoded = raw.rsplit("-Gits-", 1)[1].lstrip("-")
+ elif "-projects-" in raw:
+ decoded = raw.rsplit("-projects-", 1)[1].lstrip("-")
+ elif "-config-" in raw:
+ decoded = raw.rsplit("-config-", 1)[1].lstrip("-")
+ name = decoded or raw
else:
    name = raw

Based on learnings: preserve the full encoded directory name as the canonical repo identifier, without collapsing dash-separated tokens.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/watcher_bridge.py` around lines 156 - 168, The parsing block
that splits raw by "-" (variables raw, parts, markers, last_marker_idx,
repo_parts, name) mistakenly chooses the last occurrence of a structural marker,
which can pick up tokens inside a dash-separated repo name; change the logic to
locate the first structural marker after the "-Users-" / "-home-" prefix (stop
at the first match instead of recording the last), then build repo_parts from
parts[last_marker_idx+1:] as before so the full encoded directory name (all
remaining dash-separated tokens) is preserved as the canonical repo identifier.
    if "projects" in parts:
        project_index = parts.index("projects") + 1
        if project_index < len(parts):
            return _normalize_project_name(parts[project_index])
Use the last projects segment to avoid mis-attribution on multi-projects paths.
Line 186 currently anchors on the first projects token, which can resolve the wrong owner when multiple projects segments exist in a path.
💡 Suggested fix
- if "projects" in parts:
- project_index = parts.index("projects") + 1
+ project_positions = [i for i, segment in enumerate(parts) if segment == "projects"]
+ if project_positions:
+ project_index = project_positions[-1] + 1
if project_index < len(parts):
    return _normalize_project_name(parts[project_index])

Based on learnings: nested transcript artifacts must resolve ownership to the canonical project directory directly under projects.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/brainlayer/watcher_bridge.py` around lines 185 - 188, The current logic
in watcher_bridge.py uses parts.index("projects") which picks the first
"projects" token and can mis-attribute nested paths; change it to locate the
last "projects" segment (e.g., find the last index of "projects" in parts) and
then use that index+1 to pass into _normalize_project_name so ownership resolves
to the canonical project directory under the final "projects" segment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 560b9f265a
    if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
        name = "-".join(repo_parts) if repo_parts else raw
    else:
        name = raw
Preserve basename when marker-based decode misses
When an encoded project folder starts with -Users-/-home- but does not contain one of the hard-coded markers (Gits, Desktop, projects, config), this branch falls back to the full encoded token (for example -home-alice-repo) instead of the repo basename (repo). That is a regression from the previous behavior and causes watcher-ingested rows to be written under non-canonical project names, which then breaks project-scoped retrieval for those environments because search/scoping expects the basename.
    if "projects" in parts:
        project_index = parts.index("projects") + 1
        if project_index < len(parts):
Resolve project from the last `projects` path segment
Using parts.index("projects") selects the first projects segment in the absolute path. If the configured watcher source lives under another directory named projects, project extraction can pick the wrong segment (e.g. .claude) and mis-attribute all ingested chunks. This is a real regression for custom source paths because top-level files that previously mapped via parent directory now get an incorrect project label.
CI follow-up on
@coderabbitai review
🧠 Learnings used. ✅ Actions performed: Review triggered.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1368b8022b
    if last_marker_idx >= 0 and last_marker_idx < len(parts) - 1:
        repo_parts = [p for p in parts[last_marker_idx + 1 :] if p]
        name = "-".join(repo_parts) if repo_parts else raw
    else:
        name = raw
Handle encoded paths without marker segments
Restore a fallback when no marker is found in _normalize_project_name. With the new logic, encoded Claude folder names such as -Users-alice-Code-my-repo or -home-bob-src-service-api now return the full encoded token instead of the repo name, because only Gits/Desktop/projects/config are recognized. In environments where repos are not under those exact directory names, watcher-ingested rows get mis-grouped project values, which breaks project-scoped search and recall consistency.
    if "projects" in parts:
        project_index = parts.index("projects") + 1
        if project_index < len(parts):
Select the correct `projects` segment in source paths
Use the last relevant projects segment (or anchor on .claude/projects) when deriving the project root. parts.index("projects") picks the first match, so paths like /projects/alice/.claude/projects/-Users-alice-Gits-brainlayer/... resolve to alice instead of the encoded repo directory. This causes systematic project mis-attribution for users whose home/workspace path already includes a projects directory.
Summary
- subagents/*.jsonl and sibling nested JSONL files are picked up
- Project attribution anchors on the directory directly under projects/, so nested files still map back to the owning repo
- Full encoded repo names like brainlayer-grill are preserved instead of collapsing them to grill

Why
Test plan
- pytest tests/test_jsonl_watcher.py tests/test_watcher_bridge.py -q
- ruff check src/brainlayer/watcher.py src/brainlayer/watcher_bridge.py tests/test_jsonl_watcher.py tests/test_watcher_bridge.py
- ruff format --check src/brainlayer/watcher.py src/brainlayer/watcher_bridge.py tests/test_jsonl_watcher.py tests/test_watcher_bridge.py

CodeRabbit note
Ran cr review --plain twice before push, but CodeRabbit returned a rate-limit error both times ("try after 22 minutes", then "try after 16 minutes 41 seconds" on retry). I am explicitly not claiming a local CR pass here; please use the GitHub CodeRabbit review on this PR as the review source.

Context
/Users/etanheyman/Gits/orchestrator/collab/brainlayer-a5-a6-flex-restart-bundle.md

Note
Fix
- JSONLWatcher now discovers nested JSONL transcripts recursively: _discover_jsonl_files in watcher.py uses rglob('*.jsonl') instead of iterating only immediate children, so session subdirectories (e.g. subagents/) are included.
- _extract_project_from_source and _normalize_project_name in watcher_bridge.py resolve project names correctly from nested paths and encoded directory names (e.g. -Users-...-Gits-<repo>-...).
- The service_tier field moved to http_options.extra_body.serviceTier in enrichment_controller.py to align with the SDK's expected config shape.

Macroscope summarized 1368b80.
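The recursive-discovery change can be sketched as follows (a simplified stand-in for `_discover_jsonl_files`; the real method carries additional guards and offset bookkeeping):

```python
from pathlib import Path

def discover_jsonl_files(root: Path) -> list[Path]:
    """Find every *.jsonl under root, including nested session artifacts
    such as <session>/subagents/*.jsonl.

    rglob replaces the old immediate-children iteration; the is_file()
    guard skips any directory that happens to match the glob."""
    return sorted(p for p in root.rglob("*.jsonl") if p.is_file())
```

With this shape, a transcript at projects/<repo>/<session>/subagents/a.jsonl is discovered alongside top-level session files, and project attribution is then resolved separately from the path.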
Summary by CodeRabbit
New Features
Bug Fixes
Tests