
chore: rewrite AGENTS.md + gitignore scratch (council Phase A) #146

Closed
Gradata wants to merge 42 commits into main from feat/council-phase-a-hygiene

Conversation


Gradata (Owner) commented Apr 30, 2026

Phase A — clean rebase onto current main. Replaces stale 41-commit base (m1 work already merged via #144) with single Phase A commit only.

Diff

  • Gradata/.gitignore +7
  • Gradata/AGENTS.md +194

(2 files, +201/-0)

Tests

4061 pass / 4 skip / 3 fail — all 3 failures verified PRE-EXISTING on origin/main (cloud_sync URL fixes pending in fix/coderabbit-followups; pattern_graduation_integration is a known flake).

Branch updated to rebase/council-phase-a (single commit ffc05f0).

Gradata and others added 30 commits April 20, 2026 15:16
Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json`
vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns`
-> `corrections`). Added `_transform_row` per-table mapper with deterministic
uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone
UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each
batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's
opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` /
`GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both
backend and SDK.

Co-Authored-By: Gradata <noreply@gradata.ai>
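For illustration, the scrub-and-deterministic-id approach described above amounts to something like the following sketch. The function names, the uuid5 namespace, and the key layout are assumptions for this example, not the actual `_cloud_sync` code:

```python
import uuid

# Illustrative namespace — the real code's namespace choice is unknown.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "gradata.ai")

def scrub(text: str) -> str:
    """Drop NUL bytes and lone UTF-16 surrogates, both of which
    Postgres JSONB rejects at insert time."""
    return "".join(
        ch for ch in text
        if ch != "\x00" and not 0xD800 <= ord(ch) <= 0xDFFF
    )

def deterministic_id(table: str, local_pk: str) -> str:
    """The same local row always maps to the same cloud uuid,
    so repeat pushes upsert instead of duplicating."""
    return str(uuid.uuid5(_NAMESPACE, f"{table}:{local_pk}"))
```

The uuid5 trick is what makes re-pushing idempotent: identity is derived from the row, not minted per push.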
…provider synth

Phase 1 of the learning-pipeline revamp. Rule graduation now flows through
the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN,
>= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline.
Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED
header, regenerated only at session_close after the pipeline fires. LLM
synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with
claude CLI fallback (Max-plan OAuth) so users without an exportable key
still get synthesis. Meta-rule deterministic fallback now warns loudly
instead of silently discarding. Drops five env-flag gates in favour of
file-based signals.

Co-Authored-By: Gradata <noreply@gradata.ai>
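The strict-vs-inclusive threshold semantics above can be sketched as follows. The threshold constants here are placeholders — only the comparison operators reflect the commit:

```python
# Placeholder thresholds; the real values live in _graduation.
PATTERN_THRESHOLD = 0.5
RULE_THRESHOLD = 0.9

def graduate(state: str, confidence: float) -> str:
    """Canonical graduation: strict > into PATTERN, inclusive >= into RULE."""
    if state == "INSTINCT" and confidence > PATTERN_THRESHOLD:   # strict >
        return "PATTERN"
    if state == "PATTERN" and confidence >= RULE_THRESHOLD:      # inclusive >=
        return "RULE"
    return state
```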
Adds --cloud / --no-cloud flags to the doctor CLI command and the
underlying diagnose() function. Flips the default cloud endpoint to
api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py
(all passing).

Co-Authored-By: Gradata <noreply@gradata.ai>
Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped past the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").

Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
  "how come", "you missed/forgot/failed/didnt"

All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.

Co-Authored-By: Gradata <noreply@gradata.ai>
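A rough sketch of shorthand-tolerant patterns in the spirit of this commit — the actual pattern set in the hook is larger and differs in detail:

```python
import re

# Illustrative patterns only; not the hook's real regex set.
CHALLENGE = re.compile(
    r"\bwhy (r|u|are|did|would|do|does|is|not)\b|\bhow come\b"
    r"|\byou (missed|forgot|failed|didn'?t)\b",
    re.IGNORECASE,
)
NEGATION = re.compile(r"\b(don'?t|can'?t|shouldn'?t|never)\b", re.IGNORECASE)

def detect(msg: str) -> list[str]:
    """Return the implicit-feedback signal types present in a message."""
    signals = []
    if CHALLENGE.search(msg):
        signals.append("challenge")
    if NEGATION.search(msg):
        signals.append("negation")
    return signals
```

The key move is the no-apostrophe variants (`don'?t`) and the single-letter shorthands (`r`, `u`), which the original `\bwhy (did|would|are) you\b` pattern could not match.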
14 new tests pinning the regex expansion from 5a6da45. Covers real
corrections observed this session ("Why r you not asking council",
"Why flag.. we don't skip we do work") plus shorthand cases
(dont / cant / again / you missed / how come). Dual-signal cases
assert both types detect. Full suite: 37 passed, 1 pre-existing skip.

Co-Authored-By: Gradata <noreply@gradata.ai>
Five post-launch metrics with precise definitions (activation, D7
retention, time-to-first-graduation, free->Pro conversion,
correction-rate decay). Numeric triggers: pivot if activation is <20% with
flat decay at D30; kill if <100 installs at D60; scale if >1K installs and
>=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8
of the pre-launch gap analysis.

Co-Authored-By: Gradata <noreply@gradata.ai>
The source-provenance docstring referenced "cloud-side LLM synthesis"
which is stale since the graduation-cloud-gate was removed. Synthesis
runs on the user's machine via rule_synthesizer.py's two-provider path
(Anthropic SDK with user's key, or Claude Code Max CLI OAuth).

Co-Authored-By: Gradata <noreply@gradata.ai>
Graduation and meta-rule LLM synthesis run entirely locally as of a
few sessions ago (rule_synthesizer.py uses user's own Anthropic key or
Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly
still claimed "cloud runs better graduation engine" and implied a
cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy
paragraph to match reality: free is functionally complete; Pro is
visualization, history, export, and the future community corpus.

NOTE: this file is listed in .gitignore per the earlier
"untrack private files" cleanup. Force-added at request.

Co-Authored-By: Gradata <noreply@gradata.ai>
Test was checking the pre-transform local key name. _cloud_sync._transform_row
correctly emits brain_id (cloud schema) from tenant_id (local schema); the
assertion was stale.

Co-Authored-By: Gradata <noreply@gradata.ai>
Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.

Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
  members included), storing {category, description, task} in context so
  session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
    REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
    session shares the lesson's category (or description substring).
    CONFIRMED otherwise (rule survived the session).

Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.

Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.

Co-Authored-By: Gradata <noreply@gradata.ai>
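The REJECTED/CONFIRMED resolution rule above reduces to a small predicate. This is a sketch under assumed event shapes (dicts with `type`/`category` keys), not the session_close implementation:

```python
def resolve(pending_category: str, session_events: list[dict]) -> str:
    """A PENDING lesson_applications row is REJECTED if any correction-type
    event in the session shares the lesson's category; CONFIRMED otherwise
    (the rule survived the session)."""
    correction_types = {"CORRECTION", "IMPLICIT_FEEDBACK", "RULE_FAILURE"}
    for ev in session_events:
        if ev["type"] in correction_types and ev.get("category") == pending_category:
            return "REJECTED"
    return "CONFIRMED"
```

(The real code also matches on description substrings; that branch is omitted here.)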
Sweeps the remaining docs that still claimed cloud gated any part of
the learning loop. Actual architecture (as of the graduation-local
pivot):

  Local SDK owns: correction capture, graduation, meta-rule clustering
  AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth),
  rule-to-hook promotion, manifest computation.

  Cloud owns: dashboard/visualization, cross-device sync, team brains,
  managed backups, future opt-in corpus donation.

Files touched:
- docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md — cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow.
- docs/concepts/meta-rules.md — synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize.

README.md was already accurate; no changes there.

Co-Authored-By: Gradata <noreply@gradata.ai>
Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
  and escalate OperationalError to warning (missing-table surfaces).

Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
  no longer propagates and kills the whole push batch.

Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
  from the public repo; add both to .gitignore. These contain kill
  triggers, pricing, and PII that belong in the private brain vault only.

Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
  the detail field, restoring the documented status vocabulary
  ({ok, warn, fail, skip, missing, error}).

Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
  input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
  test_lesson_applications.py.

Verification: full suite 3802 pass, 22 skip, 2 xfailed.
Gradata is fully local-first now. Cloud-gate stubs and "requires cloud"
skip markers were legacy artifacts from an earlier architecture where
discovery/synthesis lived server-side. This commit finishes the port:

- meta_rules.discover_meta_rules + merge_into_meta run locally:
  category grouping + greedy semantic-similarity clustering, zombie
  filter on RULE-state lessons below 0.90, decay after 20 sessions,
  count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer,
  test_meta_rule_generalization, test_multi_brain_simulation,
  test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close
  rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter
  that rejects "cut:/added:" edit-distance summaries.
- Bump test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges
  (deprecated adapter shims, cloud docs, stale module docstrings).

Suite: 3809 passed, 14 skipped, 2 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>
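The count/(count+3) smoothing mentioned above, written out:

```python
def smoothed_confidence(support_count: int) -> float:
    """Approaches 1.0 as supporting lessons accumulate, but a single
    observation only reaches 0.25 — fresh clusters start unconfident."""
    return support_count / (support_count + 3)
```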
…xtures

discover_meta_rules is implemented now (local-first). The
  if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.

Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).

Suite: 3813 passed, 10 skipped, 2 xfailed.

Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env

xfails (2) are tracked for v0.7 reconciliation in test docstring.

Co-Authored-By: Gradata <noreply@gradata.ai>
Found while clearing remaining skipped/xfailed tests:

Bug: agent_graduation._update_lesson_confidence had
  confidence = max(0.0, confidence - MISFIRE_PENALTY)
but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added
confidence on rejection. Test test_rejection_decreases_confidence was
xfail'd with 'API drift, reconcile in v0.7' — it was a real bug.

Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY).
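A minimal reproduction of the sign bug (the starting confidence here is arbitrary; the -0.15 penalty is from the commit):

```python
MISFIRE_PENALTY = -0.15  # negative by convention in _confidence.py

confidence = 0.50
buggy = max(0.0, confidence - MISFIRE_PENALTY)  # subtracting a negative ADDS on rejection
fixed = max(0.0, confidence + MISFIRE_PENALTY)  # rejection now decreases confidence
```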

Other cleanups in the same pass:

- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern
  was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson
  graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete stale module-level skipif guarding 5
  tests behind ANTHROPIC/OPENAI keys they never actually use. They only
  exercise local brain.correct/convergence/efficiency — no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test
  already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data — dev-time exploration
  script with zero asserts, requiring GRADATA_LESSONS_PATH env var.

Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed.

Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl,
which does not exist on Windows. Complementary Windows paths skip on
Linux — running on each platform covers all 4. Cannot be eliminated.

From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ten stale notes

Co-Authored-By: Gradata <noreply@gradata.ai>
…ate refresh

- agent_graduation: add _extract_output() to handle all Claude Code PostToolUse
  payload key variants (tool_response/tool_output/tool_result/output/response)
  so plan-mode agents no longer silently drop output
- session_close: add _load_soul_mandatories() (VOICE rules from soul.md injected
  into brain_prompt.md) and _refresh_loop_state() (regenerates loop-state.md on
  session close with live DB + lesson counts); raise Stop hook timeout to 90 s
- _events: add _redact_payload() (recursive email PII redaction) wired into
  emit() before any write; raw side-log to events.raw.jsonl (best-effort);
  redactor failure aborts write (fail closed)

Co-Authored-By: Gradata <noreply@gradata.ai>
…e watermarks

- _ulid.py: minimal stdlib ULID generator (no external dep); ulid_from_iso()
  preserves timestamp sort order during historical backfill
- device_uuid.py: atomic read-or-create of per-brain dev_<hex> device id;
  race-safe via O_EXCL temp file + os.replace
- 002_add_event_identity: adds event_id/device_id/content_hash/correction_chain_id/
  origin_agent columns + indexes to events table; chunked 10k-row backfill that
  is idempotent and resumes on restart
- 003_add_sync_state: creates sync_state table if missing and adds device_id/
  last_push_event_id/last_pull_cursor/tenant_id watermark columns + composite indexes
- tests: 44 tests covering all migration paths, chunked backfill, idempotency,
  PII redaction (email), loop-state generation, and session_close functions

Co-Authored-By: Gradata <noreply@gradata.ai>
…ts DB

Reads the ~/.claude/projects/<project-hash>/*.jsonl file count as the session
number — the actual Anthropic session log — rather than MAX(session)
from the Gradata events table. The two diverged (314 vs 367). Falls
back to the events DB if the project dir can't be located.

Co-Authored-By: Gradata <noreply@gradata.ai>
Previous fix only counted the active project dir (314). Global sum
across all project dirs gives 659, matching the actual Anthropic
session log total. Falls back to events DB if projects dir missing.

Co-Authored-By: Gradata <noreply@gradata.ai>
…oop-state.md (367)

Session number was read from loop-state.md (Gradata events DB counter).
Now counts .jsonl files across all ~/.claude/projects/ dirs — the real
Claude Code session total, same logic as status_line.py.

Co-Authored-By: Gradata <noreply@gradata.ai>
Every silent except Exception: pass in the core library layers now emits
a _log.debug() so failures surface under GRADATA_LOG=debug without
breaking the best-effort semantics. Files touched: brain.py (telemetry
guard), context_wrapper.py (apply_brain_rules / context_for fallbacks),
_brain_manifest.py + _context_compile.py (added module loggers),
_context_packet.py (12 data-loading guards), _manifest_metrics.py
(7 DB query guards), _doctor.py (HTTP body read guard + contextlib
import), _mine_transcripts.py (SIM108 ternary), hooks/session_close.py
(4 x SIM105 OSError guards converted to contextlib.suppress).

Co-Authored-By: Gradata <noreply@gradata.ai>
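The guard pattern this commit rolls out everywhere: best-effort semantics are preserved, but the failure leaves a breadcrumb under debug logging. An illustrative helper (the real code inlines the try/except per call site):

```python
import logging

_log = logging.getLogger("gradata")

def best_effort(fn, *args, **kwargs):
    """Run fn; on any failure return None but log a debug breadcrumb
    so GRADATA_LOG=debug surfaces what used to be a silent pass."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        _log.debug("best-effort call %s failed", fn.__name__, exc_info=True)
        return None
```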
ruff check src/ --fix resolved 8 auto-fixable violations (E, F, I rules).
ruff format src/ reformatted 163 files to enforce consistent style.
Zero errors remain; 13 pre-existing warnings (optional cloud/framework
imports, lazy __all__ patterns) are unchanged.

Co-Authored-By: Gradata <noreply@gradata.ai>
Two tests expected s0/s42 but got s659 because _claude_session_count()
was walking the real ~/.claude/projects/. Add fake_home fixture so the
function returns None and falls back to the events DB as intended.

Co-Authored-By: Gradata <noreply@gradata.ai>
…eshold

New Stop hook writes a structured handoff to brain/sessions/handoff-{ts}.md
when context usage exceeds GRADATA_CTX_THRESHOLD (default 65%). inject_brain_rules
surfaces a <watchdog-alert> block at next session start so the LLM knows to
review the handoff and run /compact or /clear.

Also: bracket_confidence() in session_close for cache-key stability; remove
MAX_RULES render cap from inject_brain_rules (overshoot logic was masking gaps);
13 new tests in test_ctx_watchdog, tests in test_rule_synthesizer updated.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ript store + retroactive sweep

P1: call_provider() dispatch in rule_synthesizer.py routes by model prefix
(claude-* → Anthropic, gpt-*/o1/o3 → OpenAI, gemini-* → Google, http → generic).
session_close._refresh_brain_prompt now uses call_provider instead of inline SDK.

P2: _bracket_confidence() buckets FSRS floats into 3 stable bands (low/mid/high)
so per-tick confidence changes no longer bust the synthesis cache.

P3: New _transcript.py (log_turn, load_turns, cleanup_ttl) and
_transcript_providers.py (ProviderTranscriptSource + GradataTranscriptSource)
form the transcript store layer. _retroactive_sweep() in the waterfall runs
implicit_feedback patterns across all session turns (gated on GRADATA_TRANSCRIPT=1).
OpenAI, LangChain, CrewAI middleware adapters gain session_id + log_turn() calls.
21 new tests in test_transcript.py.

Co-Authored-By: Gradata <noreply@gradata.ai>
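The prefix-routing table in P1 can be sketched as a pure routing function — provider call bodies elided, since the dispatch rule is the point:

```python
def route_model(model: str) -> str:
    """Map a model id to a provider backend by prefix, per the
    dispatch rules above."""
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith(("gpt-", "o1", "o3")):
        return "openai"
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("http"):
        return "generic"  # arbitrary OpenAI-compatible endpoint
    raise ValueError(f"no provider route for model {model!r}")
```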
…only

The global Path.is_file patch in _run_main() caused inject_brain_rules to
also read a fake pending_handoff.txt and append a <watchdog-alert> block.
Test now extracts content between <brain-rules>...</brain-rules> before
counting lines, making it immune to any outer blocks appended to the result.

Co-Authored-By: Gradata <noreply@gradata.ai>
- pre_compact.py rewritten: when auto-compact fires with a pending handoff,
  replaces the compact summary verbatim with handoff content so no lossy
  LLM summarization occurs. Manual compact falls back to snapshot. Corrects
  field name from "type" → "trigger" (keeps legacy fallback).

- inject_brain_rules._build_watchdog_block() extracted from inline main():
  Phase 1 (pre-/clear): consumes pending_handoff.txt, stages content to
  post_clear_handoff.txt, injects <watchdog-alert> with run-/clear prompt.
  Phase 2 (post-/clear): consumes post_clear_handoff.txt, injects
  <session-handoff> into fresh session. Phase 2 takes priority if both exist.

- implicit_feedback: return None instead of signal name string to reduce
  UserPromptSubmit noise.

- tests/test_pre_compact.py: 9 tests covering both trigger paths.
- tests/test_inject_watchdog_phases.py: 8 tests covering both phases.

Co-Authored-By: Gradata <noreply@gradata.ai>
graph_first_check.py (PreToolUse, Glob|Grep): blocks exploratory code
searches until the session flag is set. Returns a block decision with
the exact ToolSearch call needed to unblock.

graph_session_track.py (PostToolUse, ToolSearch): writes a per-session
flag file when a ToolSearch query contains "code-review-graph", clearing
the block for the rest of the session.

inject_brain_rules.py: appends <code-graph-tools> directive to every
SessionStart injection, with the mandatory ToolSearch query string.

Both hooks registered in ~/.claude/settings.json. Bypass via
GRADATA_GRAPH_CHECK=0. 18 tests, smoke-tested end-to-end.

Co-Authored-By: Gradata <noreply@gradata.ai>
…tignore cleanup

- test_hooks_intelligence.py: implicit_feedback tests now assert result is None
  and verify IMPLICIT_FEEDBACK event via mock_emit (hook emits, doesn't return)
- session_close.py: reorder imports alphabetically (isort)
- .gitignore: add graphify temp files, run.log patterns, and /.archive/ personal
  Claude Code config backups so they never accidentally land in commits

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata and others added 12 commits April 24, 2026 03:29
… migration reference

- Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py: move legacy
  Streamlit dashboard per Phase 4 deprecation plan (gradata.ai web dashboard
  now covers all panels — /rules, /corrections, /self-healing, /observability)
- Gradata/migrations/supabase/: reference copies of cloud migrations 014-016
  applied to prod 2026-04-24 (corrections unique, events unique, brains.last_used_at)
- Gradata/docs/specs/cloud-sync-and-pricing.md: DRAFT v1 sync architecture +
  pricing tier spec

Co-Authored-By: Gradata <noreply@gradata.ai>
Stale file created by a subagent Bash redirect. Grouped with the existing
Windows cmd.exe stdout misparse artifact entries.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
- CHANGELOG.md: add [Unreleased] section covering 18 commits since 2026-04-23
  (cloud sync, hooks hardening, Supabase migrations, Streamlit archival,
  statusline session-count source, implicit_feedback emit-only contract)
- migrations/supabase/014,015: wrap constraint adds in DO blocks that check
  pg_constraint first, making re-runs safe on any DB (prod already had inline
  UNIQUE _key variants from CREATE TABLE; these migrations added redundant
  _unique variants, now documented as no-op on existing systems)
- migrations/supabase/README.md: document prod constraint state (both _key
  and _unique present on corrections + events) and drift-cleanup deferred

Co-Authored-By: Gradata <noreply@gradata.ai>
Critic audit flagged a silent-drop path: when resolve_brain_dir() returns
None (fresh install, CI env, unconfigured brain) the hook detected signals
but skipped emit() with no log — every correction became invisible.

- hooks/implicit_feedback.py: add debug log in the else branch recording
  how many signals were detected and of which types, so operators running
  `GRADATA_LOG_LEVEL=DEBUG` see the breadcrumb.
- tests/test_implicit_feedback.py: add TestMainNoBrainDir covering the
  main() path (previously only _detect_signals was tested) — verifies the
  debug log fires on detected signals, stays quiet on no-signal input, and
  short messages don't crash.

Co-Authored-By: Gradata <noreply@gradata.ai>
Watermark stalls from 23505 unique-violations were invisible unless a
caller grepped logs: _post() logged everything at WARNING. Now HTTP 409
and any "23505" body are logged at ERROR with a body snippet, and the
last error is persisted to brain_dir/cloud_push_error.json so
'gradata doctor' can surface it ('fail' for constraint violations,
'warn' for other non-2xx). Successful pushes clear the file.

_post() signature is now (accepted, error_info|None); call sites and
the three existing tests patching _post are updated. A _coerce_post_result
shim tolerates legacy int returns from any external patches.

Closes T17 from the overnight backlog (critic finding cycle-2 #1).
Addresses three cycle-3 council findings on commit 492c3dd:

1. Non-atomic write (critic #1, high-severity race). `_record_push_error`
   now writes to `<name>.tmp` then `os.replace`s into the target. Concurrent
   readers (doctor + daemon + MCP server) can no longer observe a truncated
   file that would mask a constraint violation as "error file unreadable".

2. PII leak in persisted error (critic #2). PostgREST 23505 bodies echo
   conflicting row values in `details`/`hint` fields, and `gradata doctor`
   prints the file verbatim. New `_scrub_error_body` parses the body as
   JSON and keeps only `code` + the first 120 chars of `message`
   (enough for the constraint name). Non-JSON bodies reduce to a length
   marker. Log messages use the scrubbed form too.

3. Removed the `_coerce_post_result` shim (verifier + critic). Zero tests
   exercised the bare-int branch it guarded; callers now destructure
   `_post` returns directly.

Tests: +2 (`test_post_error_body_scrubs_row_values`,
`test_scrub_error_body_handles_non_json`), 28/28 in the cloud test files
pass, 3944 passed / 3 skipped full suite. Ruff + pyright clean.

Co-Authored-By: Gradata <noreply@gradata.ai>
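Findings 1 and 2 reduce to two small patterns — write-to-temp-then-replace, and keep-only-what-doctor-needs. A sketch with assumed names (the real `_scrub_error_body` / `_record_push_error` may differ):

```python
import json
import os

def scrub_error_body(body: str) -> dict:
    """Keep only the error code and the first 120 chars of the message —
    enough for the constraint name, without the row values PostgREST
    echoes in details/hint. Non-JSON bodies reduce to a length marker."""
    try:
        parsed = json.loads(body)
        return {"code": parsed.get("code"),
                "message": str(parsed.get("message", ""))[:120]}
    except (ValueError, TypeError):
        return {"non_json_body_len": len(body)}

def record_push_error(path: str, info: dict) -> None:
    """Atomic write: concurrent readers (doctor, daemon, MCP server)
    never observe a truncated file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(info, fh)
    os.replace(tmp, path)
```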
When doctor reports on cloud_push_error.json, the detail string now names
the brain directory it checked. In multi-brain deployments, push() and
doctor() can resolve different brain_dirs silently — surfacing the path
lets users spot the divergence instead of chasing phantom "ok" reports.

Cycle-3 critic finding #3.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…metry

Three bugs kept last_sync_at frozen:
- cloud/client.py POSTed /brains/sync (path doesn't exist) -> /sync
- cloud/sync.py POSTed /v1/telemetry/metrics -> /api/v1/telemetry/metrics
- Stop hook never fired cloud sync because Claude Code doesn't call
  brain.end_session(). Added cloud_sync_tick() helper in _core.py and
  new _run_cloud_sync step in session_close.py waterfall.

Also elevated silent DEBUG failures to WARNING with HTTP status +
exc_info so the next failure mode surfaces in run.log.

3945 tests pass.

Co-Authored-By: Gradata <noreply@gradata.ai>
New CLI: gradata skill export <name> [--output-dir DIR] [--description STR]
                                      [--category CAT] [--no-meta]

The bet: Claude Skills' "gotchas" section is exactly what graduated
RULE-tier lessons are -- but generated from real corrections instead of
hand-written. This turns a brain into a portable, shippable Skill folder
with valid YAML frontmatter, category-grouped gotchas, and (when
available) injectable meta-principles.

- new module enhancements/skill_export.py reuses _parse_rules from
  rule_export so the RULE-only filter and [hooked] marker stripping
  stay consistent across exporters
- auto-generated frontmatter description lists rule categories with
  defensive 900-char clip (Anthropic 1024 ceiling)
- name slugified for safe folder name + frontmatter alignment
- description quote-escapes preserve YAML validity
- meta-rule loader degrades gracefully on missing system.db / table

24 new tests; full suite 3969 pass (+24, 0 regressions).

Unblocks M4 items 7 and 9 (self-dev Skill, composition Skill) per
plans/swift-toasting-origami.md.

Co-Authored-By: Gradata <noreply@gradata.ai>
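The frontmatter-safety details (slugified folder name, 900-char description clip, quote escaping) can be sketched like this — helper names are illustrative, not the `skill_export` API:

```python
import re

def slugify(name: str) -> str:
    """Folder-safe slug so the directory name and frontmatter stay aligned."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "skill"

def frontmatter_description(categories: list[str]) -> str:
    """Auto-generated description listing rule categories, clipped to 900
    chars defensively (Anthropic's ceiling is 1024) and quote-escaped to
    keep the YAML valid."""
    desc = "Gotchas covering: " + ", ".join(sorted(categories))
    return desc[:900].replace('"', '\\"')
```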
…ignore scratch dirs

P0-1: AGENTS.md previously described 'Sprites Work multi-agent
TypeScript/Claude Flow framework' which is unrelated to this Python SDK.
Council unanimously flagged as credibility-killer for first-time evaluators.
Replaced with accurate guidance for AGENTS.md-aware coding agents.

P0-3 (partial): added .tmp/, .archive/, sessions/handoff-*.md, /0,
/BrainDetail to .gitignore so scratch artifacts stop getting committed.
Existing tracked scratch files left in place — destructive removal
deferred for user review.

greptile-apps Bot commented Apr 30, 2026

Too many files changed for review. (243 files found, 100 file limit)


coderabbitai Bot commented Apr 30, 2026

Walkthrough

Major architectural shift from cloud-centric to local-first model: meta-rule synthesis, graduation, and synthesis now execute locally in SQLite; cloud becomes a downstream mirror for dashboards, team sharing, and backups. Includes new SQLite/Supabase migrations, cloud sync with deterministic ID generation and JSONB transformation, event-based PII redaction, local transcript logging, device ID management, skill export, and extensive documentation updates reflecting the new design.

Changes

Cohort / File(s) — Summary
Repository configuration
.gitignore, Gradata/.gitignore, Gradata/AGENTS.md
Extended to exclude Graphify outputs, Gradata runtime artifacts, session scratch files, and personal archives; added behavioral guidance for developers on testing, commits, and safe edits.
Architecture & Design Documentation
Gradata/docs/architecture/cloud-monolith-v2.md, Gradata/docs/architecture/multi-tenant-future-proofing.md, Gradata/docs/cloud/overview.md, Gradata/docs/cloud/dashboard.md, Gradata/docs/concepts/meta-rules.md, Gradata/docs/LEGACY_CLEANUP.md
Clarify that graduation, synthesis, and rule-to-hook promotion occur locally in SDK; cloud is downstream-only for visualization and sync; meta-rule synthesis works locally via direct Anthropic API or Claude Code Max OAuth without cloud dependency.
Changelog & Specifications
Gradata/CHANGELOG.md, Gradata/docs/specs/cloud-sync-and-pricing.md
Documents unreleased functional changes (dual-write cloud sync, local migrations, Anthropic JSONL sessions, deprecated Streamlit dashboard); introduces tiered pricing spec with feature matrix and sync protocol details.
Database Migrations
Gradata/migrations/supabase/014_corrections_unique.sql, Gradata/migrations/supabase/015_events_unique.sql, Gradata/migrations/supabase/016_brains_last_used_at.sql, Gradata/migrations/supabase/README.md, Gradata/src/gradata/_migrations/001_add_tenant_id.py, Gradata/src/gradata/_migrations/002_add_event_identity.py, Gradata/src/gradata/_migrations/003_add_sync_state.py, Gradata/src/gradata/_migrations/_ulid.py, Gradata/src/gradata/_migrations/device_uuid.py, Gradata/src/gradata/_migrations/fill_null_tenant.py, Gradata/src/gradata/_migrations/tenant_uuid.py, Gradata/src/gradata/_migrations/_runner.py
Supabase migrations add UNIQUE constraints on corrections/events and brains.last_used_at column; local SQLite migrations add event identity fields, sync state tracking, device/tenant IDs with race-safe generation; includes dependency-free ULID and device UUID utilities.
Core Event & Cloud Sync Rework
Gradata/src/gradata/_events.py, Gradata/src/gradata/_cloud_sync.py
Events now include PII redaction (email-based) with dual-write to canonical and raw side-log; cloud sync adds deterministic ID generation, row transformation with JSONB packing, deduplication, constraint violation classification, and persists push errors for diagnostics.
Meta-Rules Local Synthesis
Gradata/enhancements/meta_rules.py, Gradata/enhancements/meta_rules_storage.py, Gradata/enhancements/rule_synthesizer.py
Meta-rule discovery now clusters and synthesizes locally using configured LLM (direct API or Claude Code Max); merge_into_meta produces deterministic principles with applies_when/context_weights/examples fields; rule_synthesizer builds ranked wisdom blocks from graduated rules.
Transcript & Session Logging
Gradata/src/gradata/_transcript.py, Gradata/src/gradata/_transcript_providers.py
New opt-in transcript logger records conversation turns as JSONL per session; providers support both Claude Code and Gradata middleware sources; includes TTL-based cleanup.
New Skill & Export Utilities
Gradata/enhancements/skill_export.py, Gradata/src/gradata/cli.py
Skill export converts graduated rules to Claude Skill SKILL.md with YAML frontmatter and grouped gotchas; new CLI command gradata skill export with output-dir and metadata options; extends doctor diagnostics with cloud-specific checks.
Hooks & Session Management
Gradata/hooks/hooks.json, Gradata/skills/core/session-start/SKILL.md
Stop hook refactored into context-window watchdog (10s) followed by gated graduation sweep (90s); session-start skill documents minimal loading list, on-demand task-specific resources, and concise alert format.
Deprecated Dashboard Archive
Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
Complete Streamlit dashboard with four pages (Today, Is My AI Learning?, My Deals, Under the Hood) using SQLite backend and live event projections; archived as Streamlit UI is deprecated in favor of gradata.ai web routes.
Core Module Updates
Gradata/src/gradata/_core.py, Gradata/src/gradata/_doctor.py, Gradata/src/gradata/brain.py, Gradata/src/gradata/daemon.py
Cloud sync failures now logged at warning level with stack traces; doctor extended with cloud health probing; brain.correct telemetry logs debug on failure; new cloud_sync_tick function for hook-safe session syncing; rules_injected event now includes task in payload.
Brain Manifest & Context
Gradata/src/gradata/_brain_manifest.py, Gradata/src/gradata/_context_compile.py, Gradata/src/gradata/_context_packet.py
DB session cross-checks log at debug level instead of silently failing; fallback keyword search emits debug logs on error; context loaders replace silent exception suppression with targeted debug logging.
Query, Config, & Validator Modules
Gradata/src/gradata/_query.py, Gradata/src/gradata/_config.py, Gradata/src/gradata/_validator.py, Gradata/src/gradata/_db.py, Gradata/src/gradata/_paths.py
Primarily formatting refactors with no functional changes; validator results and db checks reflow for readability; config memory-weight fallback formatting unchanged.
Enhancements & Pattern Modules
Gradata/src/gradata/enhancements/*, Gradata/src/gradata/contrib/patterns/*, Gradata/src/gradata/enhancements/graduation/agent_graduation.py
Mostly formatting and code-quality improvements; agent_graduation now adds (instead of subtracts) MISFIRE_PENALTY for rejected outcomes; silent exception handlers replaced with debug logging across modules; no major functional changes to graduation, clustering, RAG, or scoring logic.
Utility & Helper Modules
Gradata/src/gradata/_export_brain.py, Gradata/src/gradata/_fact_extractor.py, Gradata/src/gradata/_file_lock.py, Gradata/src/gradata/_http.py, Gradata/src/gradata/_installer.py, Gradata/src/gradata/_manifest_helpers.py, Gradata/src/gradata/_manifest_metrics.py, Gradata/src/gradata/_mine_transcripts.py, Gradata/src/gradata/_telemetry.py, Gradata/src/gradata/_text_utils.py, Gradata/src/gradata/_types.py
Formatting refactors, debug logging additions for silent failures, and minor logic tweaks (e.g., timestamp generation in mine_transcripts uses UTC); no significant behavioral changes.
Client & Cloud Routes
Gradata/src/gradata/cloud/client.py, Gradata/src/gradata/cloud/sync.py
API base URL updated to https://api.gradata.ai/api/v1; sync endpoint changed from /brains/sync to /sync; HTTP error handling distinguishes constraint violations (409) from network errors with WARNING-level logging.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Both PRs modify gradata/enhancements/meta_rules.py and implement local LLM-based meta-rule synthesis with principle population.
  • Both PRs expand .gitignore to ignore Gradata runtime artifacts, session scratch files, and the pre-launch documentation draft.
  • Both PRs refactor the session-stop workflow and hook configuration (Stop hook behavior and session_close hook timeout).

Suggested labels

architecture, cloud, local-first, migrations, documentation


@coderabbitai coderabbitai Bot added the docs label Apr 30, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 31

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
Gradata/src/gradata/enhancements/clustering.py (1)

95-96: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid silent exception swallowing in _extract_domain fallback path.

Catching typed exceptions is good, but pass hides malformed scope_json and makes diagnosis harder. Log at warning level with context before falling back to "global".

Proposed fix
+import logging
 import re
 from dataclasses import dataclass, field
 
 from gradata._types import Lesson, LessonState
+
+logger = logging.getLogger(__name__)
...
             try:
                 scope = json.loads(lesson.scope_json)
                 return scope.get("domain", "") or "global"
             except (json.JSONDecodeError, TypeError):
-                pass
+                logger.warning("Invalid lesson.scope_json; defaulting domain to global", exc_info=True)
         return "global"

As per coding guidelines, "Gradata/**/*.py: Use typed exceptions or provide meaningful logging context (logger.warning(...) with exc_info=True) instead of silent failures".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/clustering.py` around lines 95 - 96, In
_extract_domain, do not silently swallow json.JSONDecodeError or TypeError for
the scope_json fallback; instead log a warning with context and the exception
info before returning "global". Replace the bare "except (json.JSONDecodeError,
TypeError): pass" with a logger.warning call (include scope_json value and
relevant identifier) and use exc_info=True so the stack/exception is recorded,
then proceed to the existing fallback return "global". Ensure you reference the
_extract_domain function and the scope_json variable when adding the log.
Gradata/src/gradata/enhancements/pattern_integration.py (1)

59-631: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

PR scope drift: this file change conflicts with the stated “no code changes” objective.

This PR is described as hygiene-only (AGENTS.md + .gitignore), but it includes broad edits in pattern_integration.py. Even if behavior is unchanged, this should be split into a separate PR (or removed here) to preserve reviewability and release traceability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/pattern_integration.py` around lines 59 -
631, This change set edits many functions (e.g., process_reflection_result,
process_guardrail_result, feed_q_router, process_loop_event,
process_parallel_failures, process_escalation and other helpers like
gates_from_graduated_rules, create_graduation_middleware,
strict_categories_from_rules) and therefore violates the PR's "no code changes"
scope; either revert these edits from this PR or extract them into a separate,
focused PR. To fix: remove the modifications to pattern_integration.py from this
branch (restore the file to the pre-change state) or create a new branch/PR that
contains only the pattern_integration.py changes with a clear description,
moving commits there and leaving this PR limited to the hygiene files (AGENTS.md
and .gitignore). Ensure the new PR references the functions above so reviewers
can find and evaluate the behavioral changes.
Gradata/src/gradata/_export_brain.py (2)

208-225: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

ctx is only partially honored, causing mixed-brain exports.

export_brain derives brain_dir/prospects_dir from ctx, but metadata and file collection still read from global _p paths. When ctx is provided, export content can come from multiple roots.

Suggested direction
-def read_version() -> str:
+def read_version(brain_dir: Path) -> str:
...
-def read_domain_name() -> str:
+def read_domain_name(working_dir: Path) -> str:
...
-def read_session_count() -> int:
+def read_session_count(brain_dir: Path) -> int:
...
-def collect_domain_files() -> list[tuple[str, Path]]:
+def collect_domain_files(working_dir: Path, gates_dir: Path) -> list[tuple[str, Path]]:
...
-def collect_brain_files(include_prospects: bool = True) -> list[tuple[str, Path]]:
+def collect_brain_files(brain_dir: Path, prospects_dir: Path, sessions_dir: Path, include_prospects: bool = True) -> list[tuple[str, Path]]:

Then pass ctx-resolved paths through export_brain(...) so all reads are scoped consistently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_export_brain.py` around lines 208 - 225, export_brain
currently uses ctx only for brain_dir/prospects_dir while still calling global
readers (read_version, read_domain_name, read_session_count, count_lessons,
_LESSONS_ARCHIVE/_LESSONS_ACTIVE) and collectors (collect_brain_files,
collect_domain_files) which read from module-global paths, leading to mixed-root
exports; update export_brain so that when ctx is provided you resolve/derive all
path inputs from ctx and pass them into the readers/collectors (or call
ctx-aware variants) — e.g., ensure
read_version/read_domain_name/read_session_count/count_lessons and
collect_brain_files/collect_domain_files are invoked with ctx-derived paths or a
ctx parameter so all metadata and file collection use the same root as
brain_dir/prospects_dir.
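The ctx-threading the comment asks for can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `ExportCtx` dataclass; the real module's `ctx` object, reader signatures, and `_p` path globals may differ:

```python
# Hypothetical sketch: resolve every path from ctx once, then pass it down,
# so readers and collectors never fall back to module-global paths.
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExportCtx:
    brain_dir: Path
    working_dir: Path


def read_version(brain_dir: Path) -> str:
    # Reader takes its root explicitly instead of reading a global _p path.
    vf = brain_dir / "VERSION"
    return vf.read_text(encoding="utf-8").strip() if vf.exists() else "0.0.0"


def export_brain(ctx: ExportCtx) -> dict:
    # All reads are scoped to ctx, so export content comes from one root.
    return {"version": read_version(ctx.brain_dir)}
```

The key property: once `ctx` is accepted, no helper reaches back to module state, so an export can never mix files from two brains.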

238-241: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid broad silent catch/continue in export pipeline.

Skipping files or falling back to a default manifest without logging makes export integrity hard to verify and debug in production.

Suggested fix
-        except Exception:
-            continue
+        except (OSError, UnicodeDecodeError) as exc:
+            logger.warning("Skipping unreadable export source %s", source_path, exc_info=exc)
+            continue
...
-    except Exception:
+    except (ImportError, OSError, ValueError, TypeError) as exc:
+        logger.warning("Falling back to minimal manifest generation", exc_info=exc)
         manifest = {

As per coding guidelines Gradata/**/*.py: “Use typed exceptions or provide meaningful logging context (logger.warning(...) with exc_info=True) instead of silent failures.”

Also applies to: 247-258

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_export_brain.py` around lines 238 - 241, The try/except
around source_path.read_text currently swallows all exceptions silently; update
it to either catch specific exceptions (e.g., OSError, UnicodeDecodeError) or
catch Exception as e and call the module logger (e.g., logger.warning) with a
clear message including source_path and exc_info=True, then continue. Apply the
same pattern to the later block referenced (lines ~247-258) so any file-read or
manifest-parsing errors are logged with context rather than being silently
ignored. Ensure you reference the same variables/methods (source_path.read_text,
manifest parsing code) when adding the logged exception handling.
Gradata/src/gradata/_migrations/tenant_uuid.py (1)

9-10: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove Sprites-specific path from documentation.

The docstring contains a Sprites-specific example path (SpritesWork/brain) which violates the coding guideline against leaking private-sibling paths into public docs. As per coding guidelines, references to Sprites-specific examples should not appear in public documentation.

📝 Proposed fix to use a generic example
-    from brain.scripts.migrations.tenant_uuid import get_or_create_tenant_id
-    tid = get_or_create_tenant_id(Path("C:/.../SpritesWork/brain"))
+    from gradata._migrations.tenant_uuid import get_or_create_tenant_id
+    tid = get_or_create_tenant_id(Path("/path/to/brain"))

As per coding guidelines: "Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, private emails, OneDrive paths, or Sprites-specific examples"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_migrations/tenant_uuid.py` around lines 9 - 10, The
docstring example in tenant_uuid.py leaks a Sprites-specific path; replace the
hardcoded Path("C:/.../SpritesWork/brain") example with a generic, non-private
placeholder (e.g. Path("/path/to/brain") or Path("path/to/brain")) in the
example that calls get_or_create_tenant_id so the documentation no longer
references Sprites-specific directories; update the example call that imports
get_or_create_tenant_id accordingly.
Gradata/src/gradata/_text_utils.py (1)

55-143: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

PR description inconsistency: formatting changes are code changes.

The PR objectives state "No code changes are included," but this file contains formatting changes to Python source. While these are non-functional (cosmetic only), they are still modifications to .py files. Consider clarifying the PR description to say "No functional code changes" or "Formatting-only code changes" to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_text_utils.py` around lines 55 - 143, The PR description
is misleading because formatting-only edits were made to Python source (e.g.,
changes in _FACTUAL_RE and the _STOP_WORDS set); update the PR title/body to
state "No functional code changes" or "Formatting-only code changes" so
reviewers know these are cosmetic edits and not behavioral changes to
functions/classes like _FACTUAL_RE or _STOP_WORDS.
Gradata/src/gradata/enhancements/rule_pipeline.py (1)

660-661: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Silent exception swallowing hides unexpected failures.

Catching Exception with pass silently discards unexpected errors (e.g., TypeError, AttributeError). While ImportError is expected for optional modules, other exceptions should be logged for debuggability. The same pattern appears at lines 671-672 and 681-682.

🔧 Suggested fix — log unexpected exceptions
-    except (ImportError, Exception):
-        pass
+    except ImportError:
+        pass  # clustering module not available
+    except Exception as exc:
+        _log.debug("build_knowledge_graph: clustering failed: %s", exc)

Apply the same pattern to lines 671-672 and 681-682.

As per coding guidelines: "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_pipeline.py` around lines 660 - 661,
Replace the silent "except (ImportError, Exception): pass" blocks with explicit
handling: keep "except ImportError: pass" for the optional-import case, then add
"except Exception as e:" that logs the unexpected error using the module logger
(e.g., logger.warning("Unexpected error importing ...", exc_info=True)) or
logging.warning(..., exc_info=True) so unexpected errors are not swallowed;
apply this change for each occurrence of the current pattern ("except
(ImportError, Exception): pass") in rule_pipeline.py.
Gradata/src/gradata/hooks/_generated_runner_core.py (1)

39-45: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop swallowing hook-runner exceptions without context.

Both exception paths fail silently, which hides hook/runtime failures and makes production triage hard. Please log typed failures (or at least warning-level context with exc_info=True) before returning/continuing.

Proposed fix
@@
-    except Exception:
-        return 0
+    except Exception:
+        _log.warning("generated_runner: failed to read stdin payload", exc_info=True)
+        return 0
@@
-        except (subprocess.TimeoutExpired, FileNotFoundError):
+        except subprocess.TimeoutExpired:
+            _log.warning(
+                "generated_runner: hook timed out (hook=%s timeout=%ss)",
+                hook_path,
+                per_hook_timeout,
+                exc_info=True,
+            )
+            continue
+        except FileNotFoundError:
+            _log.warning(
+                "generated_runner: required executable missing while running hook=%s",
+                hook_path,
+                exc_info=True,
+            )
             continue
         except Exception:
+            _log.warning("generated_runner: unexpected hook failure (hook=%s)", hook_path, exc_info=True)
             continue

As per coding guidelines: Gradata/**/*.py: Use typed exceptions or provide meaningful logging context (logger.warning(...) with exc_info=True) instead of silent failures.

Also applies to: 67-70

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/hooks/_generated_runner_core.py` around lines 39 - 45,
Replace the silent except blocks around the stdin read and the later exception
handling (the try that reads sys.stdin into raw and sets payload_json, and the
separate try at the 67-70 region) with either typed exception catches or at
minimum log the exception before returning; e.g., catch Exception as e and call
logger.warning("Failed reading hook payload", exc_info=True) or
logger.exception("Failed reading hook payload") (using the module's logger)
before returning 0, and prefer narrowing the exception type if possible instead
of a bare Exception; ensure you reference the variables raw, payload_json and
sys.stdin.read in the updated handlers so the logged context is meaningful.
Gradata/src/gradata/enhancements/similarity.py (1)

243-262: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log embedding fallback failures instead of silently swallowing them.

_get_embedding returns None on any exception without context, which makes provider/config/runtime failures invisible.

Proposed fix
 def _get_embedding(text: str) -> list[float] | None:
@@
-    except Exception:
+    except Exception:
+        import logging
+        logging.getLogger(__name__).warning(
+            "embedding request failed; falling back to TF similarity",
+            exc_info=True,
+        )
         return None

As per coding guidelines "Use typed exceptions or provide meaningful logging context (`logger.warning(...)` with `exc_info=True`) instead of silent failures".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/similarity.py` around lines 243 - 262, The
_get_embedding function currently swallows all exceptions and returns None
silently; update it to catch exceptions and log a meaningful message including
context (e.g., text snippet length, _EMBED_MODEL, and _OLLAMA_BASE) using the
module logger (or a provided logger) with exc_info=True (e.g., logger.warning or
logger.error) before returning None so provider/config/runtime failures are
visible; locate this in _get_embedding where requests.post is called and add the
logging in the except block without changing the return behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py`:
- Around line 1-26: This archived dashboard file should be removed from the PR:
delete the added file from the commit (or run git rm --cached to untrack if you
need it locally), and add the .archive/ pattern to .gitignore so it won't be
re-added; if you must retain a copy, move it out of the repo or into a
local-only location and remove hard-coded local paths/constants (e.g.,
BRAIN_DIR, DB_PATH, EVENTS_PATH, TASKS_DIR) before committing any replacement so
no C:/Users/olive/SpritesWork/... paths remain in tracked files.

In `@Gradata/AGENTS.md`:
- Line 80: Add language specifiers to the three fenced code blocks in AGENTS.md
by changing their opening fences to include a language (use "text" for these
non-code snippets): the layering diagram fence that contains "Layer 2 — Public
API        brain.py, cli.py, daemon.py, mcp_server.py", the lifecycle diagram
fence that contains "INSTINCT → PATTERN → RULE → META_RULE", and the commit
message template fence that contains "<type>(<scope>): <imperative description>"
so they render with proper syntax highlighting.

In `@Gradata/docs/architecture/multi-tenant-future-proofing.md`:
- Around line 21-23: The markdown heading "### 1. Local-first stays the source
of truth" lacks the required blank line spacing; add a single blank line before
the heading and a single blank line after the heading so the heading is
surrounded by blank lines (i.e., ensure there is an empty line above "### 1.
Local-first stays the source of truth" and an empty line below it).

In `@Gradata/docs/concepts/meta-rules.md`:
- Around line 47-50: Summary: The docs contain a typo in the Anthropic API env
var name; replace the incorrect backticked token `ANTHOPIC_API_KEY` with the
correct `ANTHROPIC_API_KEY`. Locate the string "set `ANTHOPIC_API_KEY`" in the
meta-rules.md text and update it to "set `ANTHROPIC_API_KEY`" so the environment
variable name is spelled correctly (reference token: ANTHROPIC_API_KEY).

In `@Gradata/docs/LEGACY_CLEANUP.md`:
- Around line 16-44: The markdown headings in LEGACY_CLEANUP.md (e.g., "### 1.
Deprecated adapter shims (scheduled v0.8.0)", "### 2. `_cloud_sync.py`
terminology", "### 3. Docstring drift in `meta_rules.py`", "### 4. Test-level
cloud gating", "### 5. `api_key` kwarg on `merge_into_meta`", etc.) are missing
required surrounding blank lines and trigger MD022; fix by inserting a single
blank line before and after each `###` heading so every heading has an empty
line above and below, then run markdownlint to confirm the MD022 warnings are
resolved.

In `@Gradata/hooks/hooks.json`:
- Around line 52-68: The change to hooks.json modifies runtime behavior by
adding the ctx_watchdog command entry ("python -m gradata.hooks.ctx_watchdog")
and extending the session_close hook timeout to 90000ms, so either update the PR
title/description to state it contains behavior-changing hook updates
(mentioning ctx_watchdog and session_close) or split these two hook updates into
a separate PR so this one remains hygiene-only; make sure the commit/PR text
clearly references the "Gradata: context-window watchdog — write handoff at
threshold" and "Gradata: gated graduation sweep (concurrency-locked, SDK-only
synth, throttled)" hook entries when you choose which option.

In `@Gradata/skills/core/session-start/SKILL.md`:
- Line 12: The SKILL.md contains hardcoded private Windows paths like
"C:/Users/olive/SpritesWork/brain/continuation.md" and
"C:/Users/olive/SpritesWork/brain/scripts/continuation.py" (also appears around
lines 17-21 and 43-50); replace these literal paths with environment-variable or
relative placeholders (for example use $BRAIN_DIR/continuation.md or
{brain_dir}/continuation.md and $BRAIN_DIR/scripts/continuation.py or
./scripts/continuation.py) and update the archive invocation to use the
placeholder (e.g., python $BRAIN_DIR/scripts/continuation.py archive) so no
user-specific paths remain in SKILL.md while preserving the intended commands
and file references.
- Around line 32-36: Add a language identifier to the fenced code block in
SKILL.md that contains the three template lines so the linter (MD040) stops
flagging it; edit the triple-backtick fence that encloses the block starting
with "[check] S[N] loaded..." and change it to use "text" (i.e., ```text) so the
code block is explicitly marked as plaintext.

In `@Gradata/src/gradata/_brain_manifest.py`:
- Around line 64-77: The try/except can leak the DB connection and currently
only logs a debug string; update the block that uses ctx, db, conn,
get_connection and version_info to ensure conn is always closed (use a
try/finally or a context manager around get_connection/conn) and replace the
weak _log.debug call with a meaningful warning that includes exception context
(e.g., _log.warning(..., exc_info=True)); prefer catching a more specific DB
exception (such as sqlite3.Error) if available, otherwise keep Exception but
still ensure conn.close() runs in finally and log with exc_info=True and clear
message.

In `@Gradata/src/gradata/_context_compile.py`:
- Around line 94-95: The except block currently catches Exception broadly and
only logs at DEBUG: change it to catch the specific exception types raised by
the fallback keyword search (e.g., KeyError, ValueError, LookupError — whichever
apply in this code path) or, if the set of possible exceptions is unknown, keep
a broad except but promote the log to a warning and include the traceback;
replace the existing "except Exception as e: _log.debug(...)" with either
"except SpecificErrorType as e: _log.warning(..., exc_info=True)" for each
expected type or a single "except Exception as e: _log.warning('Fallback keyword
search failed (non-fatal)', exc_info=True)" to ensure meaningful, visible logs;
locate the handler by searching for the except block using the _log.debug call
in _context_compile.py.

In `@Gradata/src/gradata/_context_packet.py`:
- Around line 98-99: Replace the broad "except Exception as e" handlers in
Gradata/src/gradata/_context_packet.py (the blocks that currently call
_log.debug("user_scope: corrections query failed (non-fatal): %s", e) and the
other similar DEBUG-only catches) with typed exception catches where possible
(e.g., catch the specific DB/API/ValueError exceptions raised by the query
logic) or, if the exact exception types are not easily determined, change the
handler to _log.warning("user_scope: corrections query failed (non-fatal)",
exc_info=True) so the failure is logged at warning level with full traceback;
apply the same change to each similar block referenced in the comment (the
groups at the later try/except locations) and keep the original non-fatal
behavior without suppressing exception details.

In `@Gradata/src/gradata/_core.py`:
- Around line 1403-1407: The code opens a DB with sqlite3.connect(db_path) which
bypasses the project's connection setup; replace that call with
get_connection(db_path) so the connection has the proper row_factory and other
configuration used elsewhere, and keep the surrounding logic that executes the
query ("SELECT data_json FROM events WHERE type = 'CORRECTION' AND session = ?")
using the same session_number variable and fetching rows into rows.

In `@Gradata/src/gradata/_events.py`:
- Around line 227-228: Replace the silent bare exception in the except block
(currently written as "except Exception: pass") with explicit logging: catch the
exception as "except Exception as e:" and call the module logger (e.g.,
logger.debug or logger.warning) with a descriptive message and exc_info=True so
the stacktrace is recorded (for example: logger.debug("Failed writing raw
side-log", exc_info=True)). If a more specific exception type is known for the
raw side-log write, prefer that specific exception instead of Exception;
otherwise log and continue to preserve the original behavior.

In `@Gradata/src/gradata/_export_brain.py`:
- Around line 123-133: The code is silently swallowing file read/parse errors in
the prospect mapping block (the try around f.read_text and re.search), which
hides malformed files and makes redaction nondeterministic; replace the bare
"except Exception: pass" with typed exception handling (e.g., except (OSError,
UnicodeDecodeError, re.error) as e:) and call the module logger with context
(e.g., logger.warning("Failed to read/parse prospect file %s for mapping: %s",
f, e, exc_info=True)) so failures are recorded and the loop can continue; ensure
a module-level logger exists or is imported and apply the same change to the
similar block that handles company mapping (the second try/except handling
fm_company and other parsing).

In `@Gradata/src/gradata/_installer.py`:
- Around line 174-196: Replace the bare "except Exception: pass" blocks around
parsing meta_file and manifest_file with typed exception handling: catch
json.JSONDecodeError and OSError (or more specific IO errors) for the
read_text/json.loads operations and call logger.warning with a clear context
message and exc_info=True (referencing meta_file, manifest_file and the
functions updating info and info.update). Apply the same change in both places
(the try/except around meta = json.loads(meta_file.read_text(...)) and the
try/except around manifest = json.loads(manifest_file.read_text(...))) so
parsing/IO failures are logged instead of silently ignored.
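The typed-exception pattern the comment describes could look like the sketch below. The helper name `load_meta` is illustrative, not the installer's actual API:

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_meta(meta_file: Path) -> dict:
    """Parse a JSON metadata file; log failures instead of swallowing them."""
    try:
        return json.loads(meta_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # OSError covers missing/unreadable files; JSONDecodeError covers
        # malformed content. Both are logged with a traceback, then skipped.
        logger.warning("Failed to parse %s; skipping", meta_file, exc_info=True)
        return {}
```

The same shape applies to both the `meta_file` and `manifest_file` blocks: catch only the failures you expect from `read_text`/`json.loads`, and let anything else propagate.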

In `@Gradata/src/gradata/_manifest_metrics.py`:
- Around line 49-50: Replace the bare exception-message debug logs in each
except Exception block so the logger includes traceback context by passing
exc_info=True; specifically, update the handlers that call
_log.debug("lesson_distribution read failed (non-fatal): %s", e) and the other
similar _log.debug(...) calls (the ones around the other blocks noted in the
review) to include exc_info=True as an extra kwarg (e.g., _log.debug(...,
exc_info=True)) so the full traceback is captured; apply this change to every
similar except block referenced (the clusters around the other message
locations).

In `@Gradata/src/gradata/_migrations/003_add_sync_state.py`:
- Around line 127-128: Remove the redundant sys.path insertion by deleting the
second sys.path.insert(0, str(Path(__file__).resolve().parent)) call immediately
before the import of tenant_uuid/get_or_create_tenant_id; keep the original
sys.path.insert at the top of the module and leave the from tenant_uuid import
get_or_create_tenant_id import intact so resolution still works.

In `@Gradata/src/gradata/_migrations/device_uuid.py`:
- Around line 55-77: When os.open(tmp, flags, 0o644) raises FileExistsError (PID
collision), do not fall through to return new_did; instead handle the stale-temp
edge by attempting to remove or rotate the existing tmp and retry the write/read
flow (or raise) so new_did is only returned if persisted. Update the logic
around the os.open/tmp/fpath handling (variables tmp, fd, new_did, fpath and the
_is_valid check) to either: (a) retry the create/write/atomic replace sequence a
bounded number of times after cleaning up the stale tmp, or (b) if retries fail,
raise an exception instead of returning new_did, ensuring the function never
returns an unpersisted new_did. Ensure the cleanup uses
contextlib.suppress(OSError) like the existing code and that any raised
exception includes context so callers can handle the persistence failure.
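A bounded-retry shape for that fix might look like this. It is a sketch under the assumptions stated in the comment (a `tmp` file created with `O_CREAT | O_EXCL`, atomic replace into `fpath`); the real module's temp-naming and validation logic may differ:

```python
import contextlib
import os
import uuid
from pathlib import Path


def persist_device_id(fpath: Path, retries: int = 3) -> str:
    """Return the persisted device id; never return an unpersisted new_did."""
    if fpath.exists():
        return fpath.read_text().strip()
    new_did = str(uuid.uuid4())
    tmp = fpath.with_suffix(f".tmp.{os.getpid()}")
    for _ in range(retries):
        try:
            fd = os.open(tmp, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            # Stale temp from a crashed writer (PID reuse): remove and retry
            # instead of falling through with an unwritten new_did.
            with contextlib.suppress(OSError):
                tmp.unlink()
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(new_did)
        os.replace(tmp, fpath)  # atomic rename on POSIX and Windows
        return new_did
    raise OSError(f"could not persist device id to {fpath}")
```

Raising after exhausting retries gives callers an explicit persistence failure to handle, rather than a device id that silently changes on the next run.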

In `@Gradata/src/gradata/_transcript_providers.py`:
- Around line 54-61: The TOCTOU race in _locate() can raise OSError during
max(all_jsonls, key=lambda p: p.stat().st_mtime) if a file disappears; fix it by
computing safe mtimes first: iterate all_jsonls, call p.stat() inside a
try/except OSError block, collect only (p, mtime) pairs that succeed, and then
choose the path with the largest mtime (or return None if none); update the code
paths that reference all_jsonls/max to use this prefiltered list so stat errors
are handled gracefully.

In `@Gradata/src/gradata/_transcript.py`:
- Around line 118-119: The except OSError: pass block should log the exception
like the earlier handler does; replace the silent pass with a debug log using
the module logger (_log) and include the exception details (e.g.,
_log.debug("OSError while <describe context>", exc_info=True) or
_log.debug("OSError: %s", e, exc_info=True)) so the OSError is recorded
consistently with the handler at lines ~95-96.

In `@Gradata/src/gradata/_types.py`:
- Around line 179-181: The field declaration for _contradiction_streak currently
uses unnecessary parentheses around the literal; update the class attribute
_contradiction_streak to a single-line assignment without parentheses (e.g.,
_contradiction_streak: int = 0) and remove the trailing comment line break so it
reads idiomatically while preserving the existing comment text inline or on the
preceding line as desired.

In `@Gradata/src/gradata/cloud/client.py`:
- Line 29: The current DEFAULT_ENDPOINT includes "/api/v1" but user-supplied
GRADATA_ENDPOINT values (e.g., "https://api.gradata.ai") are only trimmed for
trailing slashes and not normalized, causing _post(), connect(), and sync() to
construct routes missing the "/api/v1" prefix and return 404s; update the
endpoint normalization logic (where DEFAULT_ENDPOINT and GRADATA_ENDPOINT are
handled) to parse the provided URL, strip trailing slashes, and if the path
component is empty or just "/", append "/api/v1" (but do not double-append if
"/api/v1" or any other path already exists), so that _post(), connect(), and
sync() always construct full routes like "https://.../api/v1/brains/..."
regardless of host-only inputs.
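One way to express that normalization, as a sketch (the function name and the `/api/v1` default mirror the comment, not necessarily the client's real code):

```python
from urllib.parse import urlparse


def normalize_endpoint(raw: str, default_path: str = "/api/v1") -> str:
    """Append default_path only when the user supplied a bare host URL."""
    url = raw.rstrip("/")
    parsed = urlparse(url)
    if parsed.path in ("", "/"):
        return url + default_path
    return url  # a path is already present; don't double-append /api/v1
```

With this in place, `_post()`, `connect()`, and `sync()` can join routes onto the normalized base and get `https://.../api/v1/brains/...` whether the env var was host-only or already versioned.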

In `@Gradata/src/gradata/cloud/sync.py`:
- Around line 149-151: The except json.JSONDecodeError handlers currently
swallow non-JSON responses by logging and returning {} which is treated as
success and allows last_sync_at to advance; update those except blocks (the
json.JSONDecodeError handlers that call log.warning and return {}) to treat
non-JSON as a sync failure by returning a failure sentinel or raising a specific
exception instead of {}, and ensure the caller that updates last_sync_at checks
for that failure value (or catches the exception) so last_sync_at is not
advanced on non-JSON responses; apply the same change to both
json.JSONDecodeError handlers referenced in this file.
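A sentinel-based version of that contract could be sketched as below. The names (`SYNC_FAILED`, `parse_sync_response`, `maybe_advance_cursor`) are illustrative, and the cursor update is a placeholder for the module's real `last_sync_at` logic:

```python
import json
import logging

logger = logging.getLogger(__name__)

# Sentinel distinguishing "bad response" from any legitimate JSON value,
# including {} — an empty dict from the server should still count as success.
SYNC_FAILED = object()


def parse_sync_response(body: str):
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        logger.warning("sync: non-JSON response body; treating as failure")
        return SYNC_FAILED


def maybe_advance_cursor(body: str, state: dict) -> None:
    result = parse_sync_response(body)
    if result is SYNC_FAILED:
        return  # leave last_sync_at untouched so the batch is retried
    state["last_sync_at"] = "now"  # placeholder for the real timestamp write
```

Raising a dedicated exception instead of returning a sentinel works equally well; the essential point is that the caller can no longer mistake a gateway error page for a successful empty sync.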

In `@Gradata/src/gradata/enhancements/bandits/collaborative_filter.py`:
- Around line 164-167: The inline comment "# Cap below RULE" is placed next to
the precision argument of round() making it ambiguous; move or rephrase it so it
clearly references the min(0.89, ...) cap on lesson.confidence (e.g., place the
comment on the same line as min(0.89, lesson.confidence + boost) or as a
separate line above that expression), ensuring the update points to the
lesson.confidence update in the round(min(...), 2) call and leaves the precision
comment (if any) adjacent to the 2.
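The intended comment placement can be shown by splitting the cap from the rounding (values here are illustrative stand-ins for lesson.confidence and boost):

```python
lesson_confidence = 0.85  # hypothetical current confidence
boost = 0.10              # hypothetical collaborative-filter boost

# Cap below RULE threshold so a boosted lesson cannot auto-graduate.
capped = min(0.89, lesson_confidence + boost)
new_confidence = round(capped, 2)  # 2 = display precision
```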

In `@Gradata/src/gradata/enhancements/diff_engine.py`:
- Around line 280-283: The except block that sets _default_embedder_unavailable
currently uses _log.debug without exception context; change it to emit a
warning-level log with traceback by replacing the _log.debug call with
_log.warning and passing exc_info=True (keep the same descriptive message), so
the failure in loading the default embedder (the block that sets
_default_embedder_unavailable and returns None) is logged with full exception
context.

In `@Gradata/src/gradata/enhancements/learning_pipeline.py`:
- Line 73: The file Gradata/src/gradata/enhancements/learning_pipeline.py
contains formatting-only churn that is out of scope for this PR; revert those
whitespace/formatting edits (they appear around lines referenced in the review)
and restore the original file contents so no behavioral changes
occur—specifically undo any non-functional changes in the LearningPipeline class
and helper functions such as create_learning_pipeline / build_pipeline (or other
top-level functions in that module) so only the intended AGENTS.md and
.gitignore edits remain in the PR; you can restore the file from the base branch
(e.g., git checkout -- <file> or reset the hunks) and verify by running
tests/lint to ensure no functional diffs remain.

In `@Gradata/src/gradata/enhancements/meta_rules.py`:
- Around line 1209-1230: In the deterministic fallback branch (where
_try_llm_principle returns false) ensure the produced MetaRule is marked
non-injectable instead of relying on a log message: when constructing the
fallback principle and setting source="deterministic", also set the MetaRule
injectable flag or equivalent attribute (e.g., MetaRule.injectable = False or
MetaRule.source_kind = "deterministic" and ensure prompt formatter checks that
flag) so callers that forward new_metas cannot have deterministic meta-rules
rendered into prompts; update the creation path that yields new_metas (the code
that uses principle/source to build MetaRule objects) to explicitly set the
non-injectable marker and ensure the prompt formatter only renders meta-rules
with the injectable flag or in INJECTABLE_META_SOURCES.
- Around line 273-296: _cluster_by_similarity currently picks the first element
of the incoming lessons list as the centroid, making clusters non-deterministic
when input order varies; make the seed order deterministic by sorting the
lessons before clustering (inside _cluster_by_similarity) using a stable, unique
key from the Lesson objects (for example lesson.id if available, else a
deterministic fallback like lesson.description or a hash of it), so that
centroid selection (the centroid variable and unclustered list) is consistent
across runs; ensure the sort is stable and applied once at the start before the
greedy loop.
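The deterministic-seed fix for _cluster_by_similarity can be sketched like this (the Lesson fields, threshold, and `similarity` callable are illustrative, not the real meta_rules.py signatures):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lesson:
    id: str
    description: str

def cluster_by_similarity(lessons, similarity, threshold=0.8):
    # Sort once, before the greedy loop, on a stable unique key so the
    # centroid choice no longer depends on input order.
    unclustered = sorted(lessons, key=lambda l: l.id or l.description)
    clusters = []
    while unclustered:
        centroid = unclustered.pop(0)
        members = [centroid]
        rest = []
        for other in unclustered:
            (members if similarity(centroid, other) >= threshold
             else rest).append(other)
        unclustered = rest
        clusters.append(members)
    return clusters
```

Because the sort key is unique per lesson, shuffling the input list no longer changes which lesson becomes the centroid of each cluster.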

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py`:
- Around line 169-171: The catch-all exception handlers in rule_synthesizer.py
(e.g., the block logging "synth: anthropic SDK failed: %s" using _log.debug)
should be replaced with either more specific exception types where possible or
at minimum escalate the log to warning/error and include the stack trace via
exc_info=True; update the handler(s) around the anthropic SDK call and the other
similar blocks (the handlers at the other locations referenced: ~195-197,
~223-225, ~252-254) to call _log.warning(..., exc_info=True) or _log.error(...,
exc_info=True) and narrow the except clauses if you can catch specific
exceptions from the SDK rather than bare "except Exception". Ensure you keep the
current return None behavior after logging if that is desired.
- Around line 228-254: The _call_http function uses the incoming model string
directly as a base URL, so add strict transport validation: parse model with
urllib.parse.urlparse and allow only scheme "https" or allow "http" only when
the hostname is a loopback (localhost, 127.0.0.1, ::1); if the URL fails
validation log a debug/error and return None instead of making the request.
Update the same validation logic in the other HTTP helper that also treats model
as a URL (the function around lines 275-283, e.g., the async/http variant), and
use the validated URL when constructing the openai.OpenAI client to prevent
SSRF/bearer-key exposure.
- Around line 200-225: The _call_gemini function accepts a timeout but never
applies it to the Gemini request; update the GenerateContentConfig construction
(genai_types.GenerateContentConfig) to pass http_options using
genai_types.HttpOptions with timeout set to int(timeout * 1000) so the
per-request timeout (milliseconds) is enforced; ensure the new http_options
argument is included alongside system_instruction and max_output_tokens before
calling client.models.generate_content.

---

Outside diff comments:
In `@Gradata/src/gradata/_export_brain.py`:
- Around line 208-225: export_brain currently uses ctx only for
brain_dir/prospects_dir while still calling global readers (read_version,
read_domain_name, read_session_count, count_lessons,
_LESSONS_ARCHIVE/_LESSONS_ACTIVE) and collectors (collect_brain_files,
collect_domain_files) which read from module-global paths, leading to mixed-root
exports; update export_brain so that when ctx is provided you resolve/derive all
path inputs from ctx and pass them into the readers/collectors (or call
ctx-aware variants) — e.g., ensure
read_version/read_domain_name/read_session_count/count_lessons and
collect_brain_files/collect_domain_files are invoked with ctx-derived paths or a
ctx parameter so all metadata and file collection use the same root as
brain_dir/prospects_dir.
- Around line 238-241: The try/except around source_path.read_text currently
swallows all exceptions silently; update it to either catch specific exceptions
(e.g., OSError, UnicodeDecodeError) or catch Exception as e and call the module
logger (e.g., logger.warning) with a clear message including source_path and
exc_info=True, then continue. Apply the same pattern to the later block
referenced (lines ~247-258) so any file-read or manifest-parsing errors are
logged with context rather than being silently ignored. Ensure you reference the
same variables/methods (source_path.read_text, manifest parsing code) when
adding the logged exception handling.

In `@Gradata/src/gradata/_migrations/tenant_uuid.py`:
- Around line 9-10: The docstring example in tenant_uuid.py leaks a
Sprites-specific path; replace the hardcoded Path("C:/.../SpritesWork/brain")
example with a generic, non-private placeholder (e.g. Path("/path/to/brain") or
Path("path/to/brain")) in the example that calls get_or_create_tenant_id so the
documentation no longer references Sprites-specific directories; update the
example call that imports get_or_create_tenant_id accordingly.

In `@Gradata/src/gradata/_text_utils.py`:
- Around line 55-143: The PR description is misleading because formatting-only
edits were made to Python source (e.g., changes in _FACTUAL_RE and the
_STOP_WORDS set); update the PR title/body to state "No functional code changes"
or "Formatting-only code changes" so reviewers know these are cosmetic edits and
not behavioral changes to functions/classes like _FACTUAL_RE or _STOP_WORDS.

In `@Gradata/src/gradata/enhancements/clustering.py`:
- Around line 95-96: In _extract_domain, do not silently swallow
json.JSONDecodeError or TypeError for the scope_json fallback; instead log a
warning with context and the exception info before returning "global". Replace
the bare "except (json.JSONDecodeError, TypeError): pass" with a logger.warning
call (include scope_json value and relevant identifier) and use exc_info=True so
the stack/exception is recorded, then proceed to the existing fallback return
"global". Ensure you reference the _extract_domain function and the scope_json
variable when adding the log.

In `@Gradata/src/gradata/enhancements/pattern_integration.py`:
- Around line 59-631: This change set edits many functions (e.g.,
process_reflection_result, process_guardrail_result, feed_q_router,
process_loop_event, process_parallel_failures, process_escalation and other
helpers like gates_from_graduated_rules, create_graduation_middleware,
strict_categories_from_rules) and therefore violates the PR's "no code changes"
scope; either revert these edits from this PR or extract them into a separate,
focused PR. To fix: remove the modifications to pattern_integration.py from this
branch (restore the file to the pre-change state) or create a new branch/PR that
contains only the pattern_integration.py changes with a clear description,
moving commits there and leaving this PR limited to the hygiene files (AGENTS.md
and .gitignore). Ensure the new PR references the functions above so reviewers
can find and evaluate the behavioral changes.

In `@Gradata/src/gradata/enhancements/rule_pipeline.py`:
- Around line 660-661: Replace the silent "except (ImportError, Exception):
pass" blocks with explicit handling: keep "except ImportError: pass" for the
optional-import case, then add "except Exception as e:" that logs the unexpected
error using the module logger (e.g., logger.warning("Unexpected error importing
...", exc_info=True)) or logging.warning(..., exc_info=True) so unexpected
errors are not swallowed; apply this change for each occurrence of the current
pattern ("except (ImportError, Exception): pass") in rule_pipeline.py.
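The split-handler pattern looks like this (the `load_optional` wrapper is an illustrative name; in rule_pipeline.py the same two except clauses would sit around each optional import site):

```python
import importlib
import logging

logger = logging.getLogger(__name__)

def load_optional(module_name: str):
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Optional dependency genuinely absent — fine to skip silently.
        return None
    except Exception:
        # Unexpected failure (syntax error in the module, broken env, ...):
        # surface it instead of swallowing.
        logger.warning("Unexpected error importing %s", module_name,
                       exc_info=True)
        return None
```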

In `@Gradata/src/gradata/enhancements/similarity.py`:
- Around line 243-262: The _get_embedding function currently swallows all
exceptions and returns None silently; update it to catch exceptions and log a
meaningful message including context (e.g., text snippet length, _EMBED_MODEL,
and _OLLAMA_BASE) using the module logger (or a provided logger) with
exc_info=True (e.g., logger.warning or logger.error) before returning None so
provider/config/runtime failures are visible; locate this in _get_embedding
where requests.post is called and add the logging in the except block without
changing the return behavior.

In `@Gradata/src/gradata/hooks/_generated_runner_core.py`:
- Around line 39-45: Replace the silent except blocks around the stdin read and
the later exception handling (the try that reads sys.stdin into raw and sets
payload_json, and the separate try at the 67-70 region) with either typed
exception catches or at minimum log the exception before returning; e.g., catch
Exception as e and call logger.warning("Failed reading hook payload",
exc_info=True) or logger.exception("Failed reading hook payload") (using the
module's logger) before returning 0, and prefer narrowing the exception type if
possible instead of a bare Exception; ensure you reference the variables raw,
payload_json and sys.stdin.read in the updated handlers so the logged context is
meaningful.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: da609f72-a493-459e-81df-68ba29a39481

📥 Commits

Reviewing files that changed from the base of the PR and between 5635a66 and 397f7ae.

⛔ Files ignored due to path filters (1)
  • .claude/hooks/statusline/sprites-statusline.js is excluded by !.claude/**
📒 Files selected for processing (242)
  • .gitignore
  • Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
  • Gradata/.gitignore
  • Gradata/AGENTS.md
  • Gradata/CHANGELOG.md
  • Gradata/docs/LEGACY_CLEANUP.md
  • Gradata/docs/architecture/cloud-monolith-v2.md
  • Gradata/docs/architecture/multi-tenant-future-proofing.md
  • Gradata/docs/cloud/dashboard.md
  • Gradata/docs/cloud/overview.md
  • Gradata/docs/concepts/meta-rules.md
  • Gradata/docs/specs/cloud-sync-and-pricing.md
  • Gradata/hooks/hooks.json
  • Gradata/migrations/supabase/014_corrections_unique.sql
  • Gradata/migrations/supabase/015_events_unique.sql
  • Gradata/migrations/supabase/016_brains_last_used_at.sql
  • Gradata/migrations/supabase/README.md
  • Gradata/skills/core/session-start/SKILL.md
  • Gradata/src/gradata/__init__.py
  • Gradata/src/gradata/_brain_manifest.py
  • Gradata/src/gradata/_cloud_sync.py
  • Gradata/src/gradata/_config.py
  • Gradata/src/gradata/_config_paths.py
  • Gradata/src/gradata/_context_compile.py
  • Gradata/src/gradata/_context_packet.py
  • Gradata/src/gradata/_core.py
  • Gradata/src/gradata/_data_flow_audit.py
  • Gradata/src/gradata/_db.py
  • Gradata/src/gradata/_doctor.py
  • Gradata/src/gradata/_events.py
  • Gradata/src/gradata/_export_brain.py
  • Gradata/src/gradata/_fact_extractor.py
  • Gradata/src/gradata/_file_lock.py
  • Gradata/src/gradata/_http.py
  • Gradata/src/gradata/_installer.py
  • Gradata/src/gradata/_manifest_helpers.py
  • Gradata/src/gradata/_manifest_metrics.py
  • Gradata/src/gradata/_migrations/001_add_tenant_id.py
  • Gradata/src/gradata/_migrations/002_add_event_identity.py
  • Gradata/src/gradata/_migrations/003_add_sync_state.py
  • Gradata/src/gradata/_migrations/_runner.py
  • Gradata/src/gradata/_migrations/_ulid.py
  • Gradata/src/gradata/_migrations/device_uuid.py
  • Gradata/src/gradata/_migrations/fill_null_tenant.py
  • Gradata/src/gradata/_migrations/tenant_uuid.py
  • Gradata/src/gradata/_mine_transcripts.py
  • Gradata/src/gradata/_paths.py
  • Gradata/src/gradata/_query.py
  • Gradata/src/gradata/_stats.py
  • Gradata/src/gradata/_telemetry.py
  • Gradata/src/gradata/_tenant.py
  • Gradata/src/gradata/_text_utils.py
  • Gradata/src/gradata/_transcript.py
  • Gradata/src/gradata/_transcript_providers.py
  • Gradata/src/gradata/_types.py
  • Gradata/src/gradata/_validator.py
  • Gradata/src/gradata/_workers.py
  • Gradata/src/gradata/adapters/mem0.py
  • Gradata/src/gradata/audit.py
  • Gradata/src/gradata/brain.py
  • Gradata/src/gradata/brain_inspection.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/cloud/client.py
  • Gradata/src/gradata/cloud/sync.py
  • Gradata/src/gradata/context_wrapper.py
  • Gradata/src/gradata/contrib/enhancements/eval_benchmark.py
  • Gradata/src/gradata/contrib/enhancements/install_manifest.py
  • Gradata/src/gradata/contrib/enhancements/quality_gates.py
  • Gradata/src/gradata/contrib/enhancements/truth_protocol.py
  • Gradata/src/gradata/contrib/patterns/__init__.py
  • Gradata/src/gradata/contrib/patterns/agent_modes.py
  • Gradata/src/gradata/contrib/patterns/context_brackets.py
  • Gradata/src/gradata/contrib/patterns/evaluator.py
  • Gradata/src/gradata/contrib/patterns/execute_qualify.py
  • Gradata/src/gradata/contrib/patterns/guardrails.py
  • Gradata/src/gradata/contrib/patterns/human_loop.py
  • Gradata/src/gradata/contrib/patterns/loop_detection.py
  • Gradata/src/gradata/contrib/patterns/mcp.py
  • Gradata/src/gradata/contrib/patterns/memory.py
  • Gradata/src/gradata/contrib/patterns/middleware.py
  • Gradata/src/gradata/contrib/patterns/orchestrator.py
  • Gradata/src/gradata/contrib/patterns/parallel.py
  • Gradata/src/gradata/contrib/patterns/pipeline.py
  • Gradata/src/gradata/contrib/patterns/q_learning_router.py
  • Gradata/src/gradata/contrib/patterns/rag.py
  • Gradata/src/gradata/contrib/patterns/reconciliation.py
  • Gradata/src/gradata/contrib/patterns/reflection.py
  • Gradata/src/gradata/contrib/patterns/sub_agents.py
  • Gradata/src/gradata/contrib/patterns/task_escalation.py
  • Gradata/src/gradata/contrib/patterns/tools.py
  • Gradata/src/gradata/contrib/patterns/tree_of_thoughts.py
  • Gradata/src/gradata/correction_detector.py
  • Gradata/src/gradata/daemon.py
  • Gradata/src/gradata/detection/addition_pattern.py
  • Gradata/src/gradata/enhancements/_sanitize.py
  • Gradata/src/gradata/enhancements/bandits/collaborative_filter.py
  • Gradata/src/gradata/enhancements/bandits/contextual_bandit.py
  • Gradata/src/gradata/enhancements/behavioral_engine.py
  • Gradata/src/gradata/enhancements/causal_chains.py
  • Gradata/src/gradata/enhancements/cluster_manager.py
  • Gradata/src/gradata/enhancements/clustering.py
  • Gradata/src/gradata/enhancements/contradiction_detector.py
  • Gradata/src/gradata/enhancements/dedup.py
  • Gradata/src/gradata/enhancements/diff_engine.py
  • Gradata/src/gradata/enhancements/edit_classifier.py
  • Gradata/src/gradata/enhancements/freshness.py
  • Gradata/src/gradata/enhancements/git_backfill.py
  • Gradata/src/gradata/enhancements/graduation/agent_graduation.py
  • Gradata/src/gradata/enhancements/graduation/judgment_decay.py
  • Gradata/src/gradata/enhancements/graduation/rules_distillation.py
  • Gradata/src/gradata/enhancements/graduation/scoring.py
  • Gradata/src/gradata/enhancements/instruction_cache.py
  • Gradata/src/gradata/enhancements/learning_pipeline.py
  • Gradata/src/gradata/enhancements/lesson_discriminator.py
  • Gradata/src/gradata/enhancements/llm_provider.py
  • Gradata/src/gradata/enhancements/llm_synthesizer.py
  • Gradata/src/gradata/enhancements/memory_taxonomy.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/enhancements/meta_rules_storage.py
  • Gradata/src/gradata/enhancements/metrics.py
  • Gradata/src/gradata/enhancements/observation_hooks.py
  • Gradata/src/gradata/enhancements/pattern_extractor.py
  • Gradata/src/gradata/enhancements/pattern_integration.py
  • Gradata/src/gradata/enhancements/pipeline_rewriter.py
  • Gradata/src/gradata/enhancements/profiling/tone_profile.py
  • Gradata/src/gradata/enhancements/prompt_synthesizer.py
  • Gradata/src/gradata/enhancements/reporting.py
  • Gradata/src/gradata/enhancements/retrieval_fusion.py
  • Gradata/src/gradata/enhancements/router_warmstart.py
  • Gradata/src/gradata/enhancements/rule_canary.py
  • Gradata/src/gradata/enhancements/rule_context_bridge.py
  • Gradata/src/gradata/enhancements/rule_export.py
  • Gradata/src/gradata/enhancements/rule_integrity.py
  • Gradata/src/gradata/enhancements/rule_pipeline.py
  • Gradata/src/gradata/enhancements/rule_synthesizer.py
  • Gradata/src/gradata/enhancements/rule_to_hook.py
  • Gradata/src/gradata/enhancements/rule_verifier.py
  • Gradata/src/gradata/enhancements/scoring/brain_scores.py
  • Gradata/src/gradata/enhancements/scoring/calibration.py
  • Gradata/src/gradata/enhancements/scoring/correction_tracking.py
  • Gradata/src/gradata/enhancements/scoring/failure_detectors.py
  • Gradata/src/gradata/enhancements/scoring/gate_calibration.py
  • Gradata/src/gradata/enhancements/scoring/loop_intelligence.py
  • Gradata/src/gradata/enhancements/scoring/memory_extraction.py
  • Gradata/src/gradata/enhancements/scoring/reports.py
  • Gradata/src/gradata/enhancements/scoring/success_conditions.py
  • Gradata/src/gradata/enhancements/self_improvement/__init__.py
  • Gradata/src/gradata/enhancements/self_improvement/_confidence.py
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
  • Gradata/src/gradata/enhancements/similarity.py
  • Gradata/src/gradata/enhancements/skill_export.py
  • Gradata/src/gradata/events_bus.py
  • Gradata/src/gradata/graph.py
  • Gradata/src/gradata/hooks/_base.py
  • Gradata/src/gradata/hooks/_generated_runner_core.py
  • Gradata/src/gradata/hooks/_installer.py
  • Gradata/src/gradata/hooks/_profiles.py
  • Gradata/src/gradata/hooks/agent_graduation.py
  • Gradata/src/gradata/hooks/agent_precontext.py
  • Gradata/src/gradata/hooks/auto_correct.py
  • Gradata/src/gradata/hooks/brain_maintain.py
  • Gradata/src/gradata/hooks/claude_code.py
  • Gradata/src/gradata/hooks/client.py
  • Gradata/src/gradata/hooks/config_protection.py
  • Gradata/src/gradata/hooks/config_validate.py
  • Gradata/src/gradata/hooks/context_inject.py
  • Gradata/src/gradata/hooks/ctx_watchdog.py
  • Gradata/src/gradata/hooks/daemon.py
  • Gradata/src/gradata/hooks/dispatch_post.py
  • Gradata/src/gradata/hooks/duplicate_guard.py
  • Gradata/src/gradata/hooks/generated_runner.py
  • Gradata/src/gradata/hooks/generated_runner_post.py
  • Gradata/src/gradata/hooks/graph_first_check.py
  • Gradata/src/gradata/hooks/graph_session_track.py
  • Gradata/src/gradata/hooks/implicit_feedback.py
  • Gradata/src/gradata/hooks/inject_brain_rules.py
  • Gradata/src/gradata/hooks/jit_inject.py
  • Gradata/src/gradata/hooks/pre_compact.py
  • Gradata/src/gradata/hooks/rule_enforcement.py
  • Gradata/src/gradata/hooks/secret_scan.py
  • Gradata/src/gradata/hooks/self_review.py
  • Gradata/src/gradata/hooks/session_boot.py
  • Gradata/src/gradata/hooks/session_close.py
  • Gradata/src/gradata/hooks/session_persist.py
  • Gradata/src/gradata/hooks/stale_hook_check.py
  • Gradata/src/gradata/hooks/status_line.py
  • Gradata/src/gradata/hooks/telemetry_summary.py
  • Gradata/src/gradata/hooks/tool_failure_emit.py
  • Gradata/src/gradata/hooks/tool_finding_capture.py
  • Gradata/src/gradata/inspection.py
  • Gradata/src/gradata/integrations/anthropic_adapter.py
  • Gradata/src/gradata/integrations/openai_adapter.py
  • Gradata/src/gradata/mcp_server.py
  • Gradata/src/gradata/mcp_tools.py
  • Gradata/src/gradata/middleware/__init__.py
  • Gradata/src/gradata/middleware/_core.py
  • Gradata/src/gradata/middleware/anthropic_adapter.py
  • Gradata/src/gradata/middleware/crewai_adapter.py
  • Gradata/src/gradata/middleware/langchain_adapter.py
  • Gradata/src/gradata/middleware/openai_adapter.py
  • Gradata/src/gradata/notifications.py
  • Gradata/src/gradata/onboard.py
  • Gradata/src/gradata/rules/rule_context.py
  • Gradata/src/gradata/rules/rule_engine/__init__.py
  • Gradata/src/gradata/rules/rule_engine/_formatting.py
  • Gradata/src/gradata/rules/rule_ranker.py
  • Gradata/src/gradata/rules/scope.py
  • Gradata/src/gradata/safety.py
  • Gradata/src/gradata/security/correction_hash.py
  • Gradata/src/gradata/security/correction_provenance.py
  • Gradata/src/gradata/security/manifest_signing.py
  • Gradata/src/gradata/sidecar/watcher.py
  • Gradata/tests/conftest.py
  • Gradata/tests/test_agent_graduation.py
  • Gradata/tests/test_bug_fixes.py
  • Gradata/tests/test_cloud_row_push.py
  • Gradata/tests/test_cloud_sync.py
  • Gradata/tests/test_cluster_injection.py
  • Gradata/tests/test_ctx_watchdog.py
  • Gradata/tests/test_doctor_cloud.py
  • Gradata/tests/test_emit_pii_redaction.py
  • Gradata/tests/test_graph_enforcement.py
  • Gradata/tests/test_hooks_intelligence.py
  • Gradata/tests/test_hooks_learning.py
  • Gradata/tests/test_implicit_feedback.py
  • Gradata/tests/test_inject_watchdog_phases.py
  • Gradata/tests/test_integration_workflow.py
  • Gradata/tests/test_lesson_applications.py
  • Gradata/tests/test_llm_synthesizer.py
  • Gradata/tests/test_mem0_adapter.py
  • Gradata/tests/test_meta_rule_generalization.py
  • Gradata/tests/test_meta_rules.py
  • Gradata/tests/test_migration_002_event_identity.py
  • Gradata/tests/test_migration_003_sync_state.py
  • Gradata/tests/test_multi_brain_simulation.py
  • Gradata/tests/test_pipeline_e2e.py
  • Gradata/tests/test_pre_compact.py
  • Gradata/tests/test_rule_pipeline.py
  • Gradata/tests/test_rule_synthesizer.py
  • Gradata/tests/test_session_close_loop_state.py
  • Gradata/tests/test_skill_export.py
  • Gradata/tests/test_transcript.py
💤 Files with no reviewable changes (1)
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py

Comment on lines +1 to +26
"""
Gradata Dashboard — Your AI's fitness tracker.
===============================================
Run: streamlit run C:/Users/olive/SpritesWork/brain/scripts/dashboard.py
"""

import json
import re
import sqlite3
from datetime import datetime
from pathlib import Path

import pandas as pd
import plotly.graph_objects as go
import streamlit as st

# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
BRAIN_DIR = Path("C:/Users/olive/SpritesWork/brain")
DB_PATH = BRAIN_DIR / "system.db"
EVENTS_PATH = BRAIN_DIR / "events.jsonl"
LESSONS_PATH = BRAIN_DIR / "lessons.md"
PROSPECTS_DIR = BRAIN_DIR / "prospects"
BRIEF_PATH = BRAIN_DIR / "morning-brief.md"
TASKS_DIR = Path("C:/Users/olive/.claude/scheduled-tasks")
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove this archived dashboard from the PR.

This file is being added under .archive/, which means it is exactly the kind of scratch artifact the repo now wants ignored. Tracked files remain tracked even after the ignore rule lands, so leaving this here will keep shipping a deprecated script plus hard-coded local paths like C:/Users/olive/... and SpritesWork.

Based on learnings: "Never commit scratch files like .tmp/, .archive/, sessions/handoff-*.md, or files named 0 or BrainDetail — these belong in .gitignore".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines 1
- 26, This archived dashboard file should be removed from the PR: delete the
added file from the commit (or run git rm --cached to untrack if you need it
locally), and add the .archive/ pattern to .gitignore so it won't be re-added;
if you must retain a copy, move it out of the repo or into a local-only location
and remove hard-coded local paths/constants (e.g., BRAIN_DIR, DB_PATH,
EVENTS_PATH, TASKS_DIR) before committing any replacement so no
C:/Users/olive/SpritesWork/... paths remain in tracked files.

Comment thread Gradata/AGENTS.md

Lower layers **never** import from higher layers. Violations are bugs and should be flagged in code review.

```
🧹 Nitpick | 🔵 Trivial | 💤 Low value

Add language specifiers to fenced code blocks.

Three fenced code blocks are missing language specifiers, which impacts rendering and syntax highlighting. Consider specifying the language or using text for plain diagrams/templates.

📝 Proposed fix for language specifiers

At line 80 (layering diagram):

-```
+```text
 Layer 2 — Public API        brain.py, cli.py, daemon.py, mcp_server.py

At line 103 (lifecycle diagram):

-```
+```text
 INSTINCT → PATTERN → RULE → META_RULE

At line 175 (commit template):

-```
+```text
 <type>(<scope>): <imperative description>

Also applies to: 103-103, 175-175

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 80-80: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/AGENTS.md` at line 80, Add language specifiers to the three fenced
code blocks in AGENTS.md by changing their opening fences to include a language
(use "text" for these non-code snippets): the layering diagram fence that
contains "Layer 2 — Public API        brain.py, cli.py, daemon.py,
mcp_server.py", the lifecycle diagram fence that contains "INSTINCT → PATTERN →
RULE → META_RULE", and the commit message template fence that contains
"<type>(<scope>): <imperative description>" so they render with proper syntax
highlighting.

Comment on lines 21 to 23
### 1. Local-first stays the source of truth
SDK writes to local SQLite + jsonl. Cloud is a **sync target + shared meta-rule source + proprietary scoring service**. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.
SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add required blank line spacing around the heading (MD022).

The heading block here is missing markdownlint-required surrounding blank lines.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/architecture/multi-tenant-future-proofing.md` around lines 21 -
23, The markdown heading "### 1. Local-first stays the source of truth" lacks
the required blank line spacing; add a single blank line before the heading and
a single blank line after the heading so the heading is surrounded by blank
lines (i.e., ensure there is an empty line above "### 1. Local-first stays the
source of truth" and an empty line below it).

Comment on lines +47 to +50
!!! info "Local by default"
Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHROPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.

The math, the events, and the storage are all open. Only the LLM-driven synthesis that turns `[rule, rule, rule] → "Verify before acting"` is cloud-gated.
Cloud becomes relevant when you want a hosted dashboard, cross-device sync, team brains, or (future) opt-in corpus donation. It does not re-synthesize or override what graduated locally.
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Typo: `ANTHOPIC_API_KEY` → `ANTHROPIC_API_KEY`.

Line 48 has a missing 'R' in the environment variable name.

📝 Proposed fix
-    Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHOPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.
+    Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHROPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 50-50: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/concepts/meta-rules.md` around lines 47 - 50, Summary: The docs
contain a typo in the Anthropic API env var name; replace the incorrect
backticked token `ANTHOPIC_API_KEY` with the correct `ANTHROPIC_API_KEY`. Locate
the string "set `ANTHOPIC_API_KEY`" in the meta-rules.md text and update it to
"set `ANTHROPIC_API_KEY`" so the environment variable name is spelled correctly
(reference token: ANTHROPIC_API_KEY).

Comment on lines +16 to +44
### 1. Deprecated adapter shims (scheduled v0.8.0)
- `src/gradata/integrations/anthropic_adapter.py` → `middleware.wrap_anthropic`
- `src/gradata/integrations/langchain_adapter.py` → `middleware.LangChainCallback`
- `src/gradata/integrations/crewai_adapter.py` → `middleware.CrewAIGuard`
Warnings are in place; remove the modules and their tests at v0.8.0.

### 2. `_cloud_sync.py` terminology
File posts to an optional external dashboard — fine to keep, but the
module docstring should make clear it is optional telemetry, not a
mandatory cloud dependency. Callers already tolerate absence.

### 3. Docstring drift in `meta_rules.py`
Module header still says "require Gradata Cloud" and "no-ops in the
open-source build". That is no longer true as of the local-first port —
rewrite the header to describe the local clustering algorithm.

### 4. Test-level cloud gating
Former `@_requires_cloud` / `skipif` markers were deleted in this cycle.
If any new test reintroduces a cloud gate, delete the gate instead — the
feature should either be local-first or not ship.

### 5. `api_key` kwarg on `merge_into_meta`
The old `merge_into_meta(..., api_key=...)` path routed into
`synthesise_principle_llm` directly. Current architecture drives LLM
distillation from `rule_synthesizer` at session close instead. The kwarg
is still accepted via `**kwargs` for forward compatibility but performs
no work — remove after one release.

### 6. Doc sweep

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix MD022 heading spacing to keep markdownlint clean.

Several ### headings are missing required surrounding blank lines, which triggers the reported lint warnings.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/LEGACY_CLEANUP.md` around lines 16 - 44, The markdown headings
in LEGACY_CLEANUP.md (e.g., "### 1. Deprecated adapter shims (scheduled
v0.8.0)", "### 2. `_cloud_sync.py` terminology", "### 3. Docstring drift in
`meta_rules.py`", "### 4. Test-level cloud gating", "### 5. `api_key` kwarg on
`merge_into_meta`", etc.) are missing required surrounding blank lines and
trigger MD022; fix by inserting a single blank line before and after each `###`
heading so every heading has an empty line above and below, then run
markdownlint to confirm the MD022 warnings are resolved.

Comment on lines +273 to +296
def _cluster_by_similarity(
    lessons: list[Lesson],
    threshold: float = 0.35,
) -> list[list[Lesson]]:
    """Greedy single-pass clustering by semantic similarity.

    Picks the first unclustered lesson as centroid, pulls in anything above
    ``threshold``, repeats on the remainder. Good enough for the cluster
    sizes we see (tens of lessons, not thousands).
    """
    unclustered = list(lessons)
    clusters: list[list[Lesson]] = []
    while unclustered:
        centroid = unclustered.pop(0)
        cluster = [centroid]
        remaining: list[Lesson] = []
        for lesson in unclustered:
            if semantic_similarity(centroid.description, lesson.description) >= threshold:
                cluster.append(lesson)
            else:
                remaining.append(lesson)
        clusters.append(cluster)
        unclustered = remaining
    return clusters

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the clustering seed order deterministic.

_cluster_by_similarity() uses the first unclustered lesson as the centroid, so cluster membership depends on the incoming list order. If callers load the same lessons in a different DB/list order, you can synthesize different clusters and therefore different meta-rule IDs from identical source data.

Suggested stabilization
 def _cluster_by_similarity(
     lessons: list[Lesson],
     threshold: float = 0.35,
 ) -> list[list[Lesson]]:
 @@
-    unclustered = list(lessons)
+    unclustered = sorted(lessons, key=_lesson_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 273 - 296,
_cluster_by_similarity currently picks the first element of the incoming lessons
list as the centroid, making clusters non-deterministic when input order varies;
make the seed order deterministic by sorting the lessons before clustering
(inside _cluster_by_similarity) using a stable, unique key from the Lesson
objects (for example lesson.id if available, else a deterministic fallback like
lesson.description or a hash of it), so that centroid selection (the centroid
variable and unclustered list) is consistent across runs; ensure the sort is
stable and applied once at the start before the greedy loop.

Comment on lines +1209 to +1230
    # Without creds we emit deterministic meta-rules that are stored but
    # never injected (INJECTABLE_META_SOURCES excludes them) — warn loudly
    # so the capability gap is visible instead of silent 100% discard.
    llm_principle = _try_llm_principle(rules, category)
    if llm_principle:
        principle = llm_principle
        source = "llm_synth"
    else:
        principle = f"Across {len(rules)} corrections in {category}: " + "; ".join(
            descriptions[:5]
        )
        if len(descriptions) > 5:
            principle += f" (and {len(descriptions) - 5} more)"
        source = "deterministic"
        _log.warning(
            "meta-rule synthesis degraded to deterministic for '%s' (%d rules) — "
            "no LLM creds. Resulting meta-rule will be stored but not injected. "
            "Set GRADATA_LLM_KEY+GRADATA_LLM_BASE or GRADATA_GEMMA_API_KEY to "
            "enable injectable LLM synthesis.",
            category,
            len(rules),
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce the non-injectable fallback contract in code, not just in the warning.

This branch says deterministic fallbacks are “stored but not injected”, but this module’s prompt formatter still renders every MetaRule it receives. A caller that forwards new_metas unchanged can still inject source="deterministic" principles into prompts, which is the exact regression this change is trying to avoid.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 1209 - 1230, In
the deterministic fallback branch (where _try_llm_principle returns false)
ensure the produced MetaRule is marked non-injectable instead of relying on a
log message: when constructing the fallback principle and setting
source="deterministic", also set the MetaRule injectable flag or equivalent
attribute (e.g., MetaRule.injectable = False or MetaRule.source_kind =
"deterministic" and ensure prompt formatter checks that flag) so callers that
forward new_metas cannot have deterministic meta-rules rendered into prompts;
update the creation path that yields new_metas (the code that uses
principle/source to build MetaRule objects) to explicitly set the non-injectable
marker and ensure the prompt formatter only renders meta-rules with the
injectable flag or in INJECTABLE_META_SOURCES.

Comment on lines +169 to +171
    except Exception as exc:
        _log.debug("synth: anthropic SDK failed: %s", exc)
        return None

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Exception handling/logging does not meet repo policy.

These handlers swallow all exceptions at debug level without stack traces; use typed exceptions where practical, or at least warning-level logs with exc_info=True.

As per coding guidelines Gradata/**/*.py: Use typed exceptions or provide meaningful logging context (logger.warning(...) with exc_info=True) instead of silent failures.

Also applies to: 195-197, 223-225, 252-254

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 169 - 171,
The catch-all exception handlers in rule_synthesizer.py (e.g., the block logging
"synth: anthropic SDK failed: %s" using _log.debug) should be replaced with
either more specific exception types where possible or at minimum escalate the
log to warning/error and include the stack trace via exc_info=True; update the
handler(s) around the anthropic SDK call and the other similar blocks (the
handlers at the other locations referenced: ~195-197, ~223-225, ~252-254) to
call _log.warning(..., exc_info=True) or _log.error(..., exc_info=True) and
narrow the except clauses if you can catch specific exceptions from the SDK
rather than bare "except Exception". Ensure you keep the current return None
behavior after logging if that is desired.

Comment on lines +200 to +225
def _call_gemini(
    model: str, system: str, user_prompt: str, max_tokens: int, timeout: float
) -> str | None:
    key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
    if not key:
        _log.debug("synth: GOOGLE_API_KEY / GEMINI_API_KEY not set")
        return None
    try:
        from google import genai
        from google.genai import types as genai_types

        client = genai.Client(api_key=key)
        config = genai_types.GenerateContentConfig(
            system_instruction=system,
            max_output_tokens=max_tokens,
        )
        resp = client.models.generate_content(
            model=model,
            contents=user_prompt,
            config=config,
        )
        text = resp.text
        return text.strip() if text else None
    except Exception as exc:
        _log.debug("synth: gemini SDK failed: %s", exc)
        return None

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For the current google-genai Python SDK, what is the official way to set a per-request timeout for client.models.generate_content(...) calls?

💡 Result:

The official way to set a per-request timeout for client.models.generate_content(...) calls in the current google-genai Python SDK (googleapis/python-genai) is to pass http_options within the config parameter of GenerateContentConfig. Example:

from google import genai
from google.genai import types

client = genai.Client(api_key='your-api-key')
response = client.models.generate_content(
    model='gemini-2.0-flash-exp',
    contents='Tell me a story',
    config=types.GenerateContentConfig(
        http_options=types.HttpOptions(timeout=120000)  # 2 minutes, in milliseconds
    ),
)

Timeouts are specified in milliseconds; a per-request value overrides client-level settings for that call. For client-wide defaults, set http_options when creating the Client:

client = genai.Client(
    api_key='your-api-key',
    http_options=types.HttpOptions(timeout=30000),  # 30 seconds
)

Note: some issues report potential bugs with timeout enforcement in certain scenarios (e.g., long generations or async streaming), but the documented official method for per-request control is the http_options field of GenerateContentConfig.


Apply timeout to Gemini config using http_options.

The timeout parameter is accepted but ignored in the Gemini request. The google-genai SDK supports per-request timeout via http_options in GenerateContentConfig (timeout in milliseconds):

Suggested fix
config = genai_types.GenerateContentConfig(
    system_instruction=system,
    max_output_tokens=max_tokens,
    http_options=genai_types.HttpOptions(timeout=int(timeout * 1000)),
)

Without this, the Gemini path can block indefinitely despite the caller's timeout contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 200 - 225,
The _call_gemini function accepts a timeout but never applies it to the Gemini
request; update the GenerateContentConfig construction
(genai_types.GenerateContentConfig) to pass http_options using
genai_types.HttpOptions with timeout set to int(timeout * 1000) so the
per-request timeout (milliseconds) is enforced; ensure the new http_options
argument is included alongside system_instruction and max_output_tokens before
calling client.models.generate_content.

Comment on lines +228 to +254
def _call_http(
    model: str, system: str, user_prompt: str, max_tokens: int, timeout: float
) -> str | None:
    """OpenAI-compatible HTTP endpoint. Model string IS the base URL.

    Set GRADATA_HTTP_API_KEY for auth, GRADATA_HTTP_MODEL for the model
    name to pass in the request body (defaults to 'default').
    """
    key = os.environ.get("GRADATA_HTTP_API_KEY", "dummy")
    model_name = os.environ.get("GRADATA_HTTP_MODEL", "default")
    try:
        import openai

        client = openai.OpenAI(api_key=key, base_url=model, timeout=timeout)
        resp = client.chat.completions.create(
            model=model_name,
            max_tokens=max_tokens,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user_prompt},
            ],
        )
        text = resp.choices[0].message.content
        return text.strip() if text else None
    except Exception as exc:
        _log.debug("synth: HTTP provider failed (%s): %s", model, exc)
        return None

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add HTTPS/localhost validation before using model-as-URL.

The HTTP route takes model as base_url without the same transport guard used elsewhere, which weakens SSRF/bearer-key protection.

🔒 Suggested fix
 import hashlib
 import logging
 import os
 from pathlib import Path
 
+from gradata._http import require_https
+
 _log = logging.getLogger(__name__)
@@
 def _call_http(
     model: str, system: str, user_prompt: str, max_tokens: int, timeout: float
 ) -> str | None:
@@
     key = os.environ.get("GRADATA_HTTP_API_KEY", "dummy")
     model_name = os.environ.get("GRADATA_HTTP_MODEL", "default")
+    require_https(model, "GRADATA_SYNTH_MODEL")
     try:
         import openai

Also applies to: 275-283

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 228 - 254,
The _call_http function uses the incoming model string directly as a base URL,
so add strict transport validation: parse model with urllib.parse.urlparse and
allow only scheme "https" or allow "http" only when the hostname is a loopback
(localhost, 127.0.0.1, ::1); if the URL fails validation log a debug/error and
return None instead of making the request. Update the same validation logic in
the other HTTP helper that also treats model as a URL (the function around lines
275-283, e.g., the async/http variant), and use the validated URL when
constructing the openai.OpenAI client to prevent SSRF/bearer-key exposure.

@Gradata
Owner Author

Gradata commented May 1, 2026

Superseded by clean rebase. The original branch carried 41 stale m1 commits (already merged via #144); reopening with just the Phase A commit (AGENTS.md + .gitignore) on rebase/council-phase-a.
