feat(hooks): opt-out env kill switches for 6 SDK hooks + audit fixes #133
Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json` vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns` -> `corrections`). Added `_transform_row` per-table mapper with deterministic uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` / `GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both backend and SDK. Co-Authored-By: Gradata <noreply@gradata.ai>
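A minimal sketch of the two ideas above — deterministic uuid5 ids so repeat pushes upsert, and scrubbing the characters Postgres JSONB rejects. The namespace constant and function names are illustrative, not the SDK's actual `_transform_row` / `_scrub` internals:

```python
import uuid

_NS = uuid.UUID("00000000-0000-0000-0000-000000000000")  # hypothetical namespace

def make_row_id(table: str, brain_id: str, local_pk: str) -> str:
    # uuid5 is a pure function of its inputs, so the same local row
    # always maps to the same cloud id across repeated pushes.
    return str(uuid.uuid5(_NS, f"{table}:{brain_id}:{local_pk}"))

def scrub(value):
    # Strip NUL bytes and lone UTF-16 surrogates — both rejected by JSONB.
    if isinstance(value, str):
        value = value.replace("\x00", "")
        return "".join(ch for ch in value if not 0xD800 <= ord(ch) <= 0xDFFF)
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value
```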
…provider synth
Phase 1 of the learning-pipeline revamp. Rule graduation now flows through the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN, >= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline. The injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED header, regenerated only at session_close after the pipeline fires. LLM synthesis gets a two-provider path: the anthropic SDK (ANTHROPIC_API_KEY) with a claude CLI fallback (Max-plan OAuth), so users without an exportable key still get synthesis. The meta-rule deterministic fallback now warns loudly instead of silently discarding. Drops five env-flag gates in favour of file-based signals.
Co-Authored-By: Gradata <noreply@gradata.ai>
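A hedged sketch of what a two-provider path like this can look like. The model id, error handling, and function shape are assumptions, not the SDK's actual rule_synthesizer code; `claude -p` is the CLI invocation named elsewhere in this PR:

```python
import os
import shutil
import subprocess

def synthesize(prompt: str) -> str | None:
    # Provider 1: Anthropic SDK, if the user has an exportable key.
    if os.environ.get("ANTHROPIC_API_KEY"):
        try:
            import anthropic
            client = anthropic.Anthropic()
            msg = client.messages.create(
                model="claude-sonnet-4-5",  # illustrative model id
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return msg.content[0].text
        except Exception:
            pass  # fall through to the CLI path
    # Provider 2: claude CLI (Max-plan OAuth), no exportable key required.
    if shutil.which("claude"):
        out = subprocess.run(["claude", "-p", prompt],
                             capture_output=True, text=True)
        if out.returncode == 0:
            return out.stdout
    return None  # no provider available; caller degrades gracefully
```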
Adds --cloud / --no-cloud flags to the doctor CLI command and the underlying diagnose() function. Flips the default cloud endpoint to api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py (all passing). Co-Authored-By: Gradata <noreply@gradata.ai>
Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").
Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
"how come", "you missed/forgot/failed/didnt"
All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.
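Illustrative patterns — not the SDK's exact list — showing how the shorthand variants above can be folded into the existing regexes:

```python
import re

CHALLENGE = re.compile(
    r"\bwhy (r|are|did|would) (you|u)\b"
    r"|\bwhy (not|is|does)\b"
    r"|\bhow come\b"
    r"|\byou (missed|forgot|failed|didn'?t)\b",
    re.IGNORECASE,
)
NEGATION = re.compile(r"\b(don'?t|can'?t|shouldn'?t|never)\b", re.IGNORECASE)
REMINDER = re.compile(r"\bagain\b|\bdon'?t forget\b", re.IGNORECASE)

# The two real corrections quoted above now match.
assert CHALLENGE.search("Why r you not asking")
assert NEGATION.search("Why flag.. we dont skip")
```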
Co-Authored-By: Gradata <noreply@gradata.ai>
14 new tests pinning the regex expansion from 5a6da45. Covers real corrections observed this session ("Why r you not asking council", "Why flag.. we don't skip we do work") plus shorthand cases (dont / cant / again / you missed / how come). Dual-signal cases assert both types detect. Full suite: 37 passed, 1 pre-existing skip. Co-Authored-By: Gradata <noreply@gradata.ai>
Five post-launch metrics with precise definitions (activation, D7 retention, time-to-first-graduation, free->Pro conversion, correction-rate decay). Numeric triggers: pivot <20% activation + flat decay at D30; kill <100 installs at D60; scale >1K installs + >=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8 of the pre-launch gap analysis. Co-Authored-By: Gradata <noreply@gradata.ai>
The source-provenance docstring referenced "cloud-side LLM synthesis" which is stale since the graduation-cloud-gate was removed. Synthesis runs on the user's machine via rule_synthesizer.py's two-provider path (Anthropic SDK with user's key, or Claude Code Max CLI OAuth). Co-Authored-By: Gradata <noreply@gradata.ai>
Graduation and meta-rule LLM synthesis run entirely locally as of a few sessions ago (rule_synthesizer.py uses user's own Anthropic key or Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly still claimed "cloud runs better graduation engine" and implied a cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy paragraph to match reality: free is functionally complete; Pro is visualization, history, export, and the future community corpus. NOTE: this file is listed in .gitignore per the earlier "untrack private files" cleanup. Force-added at request. Co-Authored-By: Gradata <noreply@gradata.ai>
Test was checking the pre-transform local key name. _cloud_sync._transform_row correctly emits brain_id (cloud schema) from tenant_id (local schema); the assertion was stale. Co-Authored-By: Gradata <noreply@gradata.ai>
Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.
Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
members included), storing {category, description, task} in context so
session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
session shares the lesson's category (or description substring).
CONFIRMED otherwise (rule survived the session).
Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.
Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.
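A minimal sketch of the receipt flow, assuming hypothetical column names (the real schema lives in onboard.py):

```python
import json
import sqlite3

def record_pending(conn: sqlite3.Connection, rule_id: str, category: str,
                   description: str, task: str, session_id: str) -> None:
    # One PENDING row per injected rule — the "receipt" that it fired.
    ctx = json.dumps({"category": category, "description": description,
                      "task": task})
    conn.execute(
        "INSERT INTO lesson_applications (rule_id, session_id, outcome, context)"
        " VALUES (?, ?, 'PENDING', ?)",
        (rule_id, session_id, ctx),
    )

def resolve_pending(conn: sqlite3.Connection, session_id: str,
                    rejected_categories: set[str]) -> None:
    # session_close: REJECTED if a correction in this session shares the
    # lesson's category, CONFIRMED otherwise (the rule survived).
    rows = conn.execute(
        "SELECT id, context FROM lesson_applications"
        " WHERE session_id = ? AND outcome = 'PENDING'",
        (session_id,),
    ).fetchall()
    for row_id, ctx_json in rows:
        category = json.loads(ctx_json).get("category")
        outcome = "REJECTED" if category in rejected_categories else "CONFIRMED"
        conn.execute("UPDATE lesson_applications SET outcome = ? WHERE id = ?",
                     (outcome, row_id))
```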
Co-Authored-By: Gradata <noreply@gradata.ai>
Sweeps the remaining docs that still claimed cloud gated any part of the learning loop. Actual architecture (as of the graduation-local pivot): the local SDK owns correction capture, graduation, meta-rule clustering and LLM synthesis (via the user's Anthropic key or Claude Code Max OAuth), rule-to-hook promotion, and manifest computation. Cloud owns dashboard/visualization, cross-device sync, team brains, managed backups, and future opt-in corpus donation.
Files touched:
- docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md — cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow.
- docs/concepts/meta-rules.md — synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize.
README.md was already accurate; no changes there.
Co-Authored-By: Gradata <noreply@gradata.ai>
Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
and escalate OperationalError to warning (missing-table surfaces).
Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
no longer propagates and kills the whole push batch.
Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
from the public repo; add both to .gitignore. These contain kill
triggers, pricing, and PII that belong in the private brain vault only.
Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
the detail field, restoring the documented status vocabulary
({ok, warn, fail, skip, missing, error}).
Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
test_lesson_applications.py.
Verification: full suite 3802 pass, 22 skip, 2 xfailed.
Gradata is fully local-first now. Cloud-gate stubs and "requires cloud" skip markers were legacy artifacts from an earlier architecture where discovery/synthesis lived server-side. This commit finishes the port:
- meta_rules.discover_meta_rules + merge_into_meta run locally: category grouping + greedy semantic-similarity clustering, zombie filter on RULE-state lessons below 0.90, decay after 20 sessions, count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer, test_meta_rule_generalization, test_multi_brain_simulation, test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter that rejects "cut:/added:" edit-distance summaries.
- Bump the test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges (deprecated adapter shims, cloud docs, stale module docstrings).
Suite: 3809 passed, 14 skipped, 2 xfailed.
Co-Authored-By: Gradata <noreply@gradata.ai>
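For reference, the count/(count+3) smoothing mentioned above behaves like this — confidence approaches 1.0 asymptotically, so a lesson seen once scores 0.25 rather than a brittle 1.0:

```python
def smoothed_confidence(count: int) -> float:
    # count/(count+3): slow to trust, never fully certain.
    return count / (count + 3)

assert smoothed_confidence(1) == 0.25
assert smoothed_confidence(9) == 0.75
assert round(smoothed_confidence(27), 2) == 0.9  # clears the 0.90 zombie filter
```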
…xtures
discover_meta_rules is implemented now (local-first). The
if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.
Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).
Suite: 3813 passed, 10 skipped, 2 xfailed.
Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env
xfails (2) are tracked for v0.7 reconciliation in test docstring.
Co-Authored-By: Gradata <noreply@gradata.ai>
Found while clearing remaining skipped/xfailed tests.
Bug: agent_graduation._update_lesson_confidence had confidence = max(0.0, confidence - MISFIRE_PENALTY), but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added confidence on rejection. Test test_rejection_decreases_confidence was xfail'd with 'API drift, reconcile in v0.7' — it was a real bug.
Fix: align with the canonical _confidence.py usage (confidence + MISFIRE_PENALTY).
Other cleanups in the same pass:
- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete the stale module-level skipif guarding 5 tests behind ANTHROPIC/OPENAI keys they never actually use. They only exercise local brain.correct/convergence/efficiency — no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (a live-API smoke test already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data — a dev-time exploration script with zero asserts, requiring the GRADATA_LESSONS_PATH env var.
Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed. The remaining 3 skips are test_file_lock.py POSIX paths that require fcntl, which does not exist on Windows. Complementary Windows paths skip on Linux — running on each platform covers all 4. Cannot be eliminated.
From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.
Co-Authored-By: Gradata <noreply@gradata.ai>
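A toy reproduction of the sign bug and its fix — the constant value is from the commit message; the variable names are illustrative:

```python
MISFIRE_PENALTY = -0.15  # negative, matching the canonical _confidence.py constants

confidence = 0.50
buggy = max(0.0, confidence - MISFIRE_PENALTY)  # subtracting a negative adds 0.15
fixed = max(0.0, confidence + MISFIRE_PENALTY)  # adding the signed constant subtracts

assert round(buggy, 2) == 0.65  # bug: rejection *increased* confidence
assert round(fixed, 2) == 0.35  # fix: rejection decreases it, as intended
```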
CRITICAL fixes:
- C1: rewrite the meta_rules.py module docstring. It still said 'require Gradata Cloud' / 'no-ops in the open-source build', which directly contradicted the local-first implementation in the same file. Now describes the real algorithm. Closes LEGACY_CLEANUP item #3.
- C2: drop the owner-name string from _NOISE_PATTERNS. The other entries are format-based (cut:/added:/content change) and filter just fine.
- C3: generalize the name-prefix strip regex in _build_principle from the hardcoded 'Oliver:' to a generic 'Name:' pattern.
HIGH fixes:
- H1: update the _update_lesson_confidence docstring to stop quoting the old -0.25 number and instead point at the canonical constants.
- H2: _apply_decay no longer mutates MetaRule in place — it uses dataclasses.replace() so refresh_meta_rules' persisted inputs aren't silently modified.
- H3: add a comment explaining why the call-site threshold=0.20 is intentionally looser than _cluster_by_similarity's 0.35 default (the category pre-filter handles most noise, and recall matters more here).
Suite clean on touched areas.
Co-Authored-By: Gradata <noreply@gradata.ai>
…tocol
Closes #127: HandoffWatchdog fires a preemptive resume-doc at 0.65 pressure (GRADATA_HANDOFF_THRESHOLD override), writes a compact Markdown handoff, and emits a handoff.triggered event so auto-compaction isn't the first signal the agent is out of budget.
Closes #128: MultimodalEmbedder Protocol + MultimodalInput validation + TextOnlyEmbedder default + embed_any router. The user supplies their own multimodal provider (Gemini, Voyage, CLIP); Gradata never hosts the endpoint. Falls back to text-only when no multimodal embedder is configured.
Both are provider-agnostic, local-first, and covered by unit tests (18 handoff + 20 embedder). Full suite: 3853 passed, 3 skipped.
Co-Authored-By: Gradata <noreply@gradata.ai>
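A minimal sketch of the pressure math described above, assuming the [0, 1] clamping shown in the review walkthrough; everything beyond the GRADATA_HANDOFF_THRESHOLD variable is illustrative:

```python
import os

def measure_pressure(tokens_used: int, tokens_max: int) -> float:
    # Clamp to [0, 1] so an over-budget session never reports pressure > 1.
    if tokens_max <= 0:
        return 0.0
    return max(0.0, min(1.0, tokens_used / tokens_max))

# 0.65 default, overridable via env, per the commit message.
THRESHOLD = float(os.environ.get("GRADATA_HANDOFF_THRESHOLD", "0.65"))

assert measure_pressure(130_000, 200_000) == 0.65
assert measure_pressure(250_000, 200_000) == 1.0  # over budget still clamps
```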
- HandoffWatchdog._fired is now init=False/repr=False/compare=False so the guard cannot be bypassed via the constructor and doesn't leak into equality.
- The _hash_vector zero-norm branch now returns a zero vector instead of an unnormalised one, honouring the Protocol's normalisation contract.
- Add a test covering the handoff.triggered event-emission path so an _events.emit signature drift can't silently regress.
Co-Authored-By: Gradata <noreply@gradata.ai>
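A sketch of that field configuration on a plain dataclass — the init/repr/compare flags are from the commit; the class shape around them is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class HandoffWatchdog:
    threshold: float = 0.65
    # Guard flag: not constructor-settable, hidden from repr, and excluded
    # from equality comparisons.
    _fired: bool = field(default=False, init=False, repr=False, compare=False)

w = HandoffWatchdog()
w._fired = True
assert w == HandoffWatchdog()  # compare=False keeps equality stable
# HandoffWatchdog(_fired=True) would raise TypeError: init=False blocks the bypass.
```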
test_capture_rule_failure.py reached out of Gradata/ via parents[4] to load .claude/hooks/reflect/scripts/capture_learning.py — a private Claude Code hook that is not part of the public SDK. The test would skip on every machine except the author's worktree, adding a phantom "skipped" count in CI for every downstream user. If we want coverage for the matcher, rewrite it as a pure unit test against a function exposed by the SDK, or keep it on the private side next to the hook it exercises. Suite after removal: 3854 passed, 2 skipped (the two legitimate POSIX tests in test_file_lock.py that run on Linux CI). Co-Authored-By: Gradata <noreply@gradata.ai>
Wires the watchdog to the next agent's context: when HandoffWatchdog
fires and writes a handoff doc, the new SessionStart hook loads the
most recent unconsumed *.handoff.md from {brain_dir}/handoffs/, wraps
it in <handoff>...</handoff>, and returns it to Claude Code. The agent
sees the handoff before brain-rules (primacy) and picks up where the
prior agent left off.
After injection the file moves to handoffs/consumed/ so the next
session won't re-inject it. Oversized bodies are truncated
(GRADATA_HANDOFF_MAX_CHARS, default 4000). Embedded </handoff> literals
are escaped so a hostile body cannot close our wrapper early.
Helpers added to gradata.contrib.patterns.handoff:
- default_handoff_dir(brain_dir) → Path (canonical location)
- pick_latest_unconsumed(dir) → Path | None
- consume_handoff(path) → moves to consumed/ subdir
Tests: +16 hook tests + 9 helper tests = 41 total on handoff+hook.
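Hypothetical usage of the three helpers, with the signatures as listed above (the brain_dir location is illustrative):

```python
from pathlib import Path

from gradata.contrib.patterns.handoff import (
    consume_handoff,
    default_handoff_dir,
    pick_latest_unconsumed,
)

brain_dir = Path.home() / ".gradata"          # illustrative location
handoff_dir = default_handoff_dir(brain_dir)  # {brain_dir}/handoffs/
doc = pick_latest_unconsumed(handoff_dir)     # Path | None
if doc is not None:
    body = doc.read_text()   # hook wraps this in <handoff>...</handoff>
    consume_handoff(doc)     # moves it to handoffs/consumed/
```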
Co-Authored-By: Gradata <noreply@gradata.ai>
Handoff now carries the timestamp of the rules the prior agent was operating under. On next SessionStart, inject_handoff writes a .handoff_active.json sentinel. inject_brain_rules reads it and, when lessons.md has not changed since the snapshot, suppresses the ranked <brain-rules> block — the handoff already carries that continuity. Mandatory directives, disposition, meta-rules, and the brain_prompt short-circuit still fire; only the ranked block is skipped. Gated by GRADATA_HANDOFF_RULES_DELTA=1 (default on). Co-Authored-By: Gradata <noreply@gradata.ai>
Sub-agent spawns were re-injecting rules already present in the parent session's context — measured ~500-2500 wasted tokens per multi-agent workflow. agent_precontext now reads brain_dir/.last_injection.json (written by inject_brain_rules on SessionStart) and skips any rule whose full_id appears in the parent manifest. Gated by GRADATA_SUBAGENT_DEDUP=1 (default on). Silent on missing manifest — falls back to full injection. Matches the feature-flag pattern used by the handoff-delta optimization. Co-Authored-By: Gradata <noreply@gradata.ai>
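A sketch of that dedup gate. The env-var name, manifest path, and missing-manifest fallback are from the commit; the manifest key (full_ids) and function shape are hypothetical:

```python
import json
import os
from pathlib import Path

def dedup_rules(rules, brain_dir: Path):
    # Opt-out flag, default on, matching the handoff-delta pattern.
    if os.environ.get("GRADATA_SUBAGENT_DEDUP", "1") != "1":
        return rules
    manifest_path = brain_dir / ".last_injection.json"
    try:
        manifest = json.loads(manifest_path.read_text())
    except (OSError, ValueError):
        return rules  # silent fallback to full injection
    seen = set(manifest.get("full_ids", []))  # hypothetical manifest key
    return [r for r in rules if r.full_id not in seen]
```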
brain_prompt.md had no size cap and grew unconstrained as the lesson corpus matured, costing 500-3000 tokens per session on the primary injection path. Add GRADATA_MAX_BRAIN_PROMPT_CHARS (default 4000) with truncation marker, matching the inject_handoff pattern. Co-Authored-By: Gradata <noreply@gradata.ai>
context_inject fires on every UserPromptSubmit and returned FTS snippets that frequently overlapped with rules already in the <brain-rules> block — ~200-500 wasted tokens per prompt. Drops any snippet with >70% Jaccard token overlap against an injected rule description. Reads brain_dir/.last_injection.json for the comparison corpus. Gated by GRADATA_CONTEXT_DEDUP=1 with threshold override via GRADATA_CONTEXT_DEDUP_THRESHOLD. Co-Authored-By: Gradata <noreply@gradata.ai>
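A minimal sketch of the Jaccard gate; the default threshold encoding and function names are illustrative:

```python
import os

def jaccard(a: str, b: str) -> float:
    # Token-set Jaccard: |intersection| / |union|.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

THRESHOLD = float(os.environ.get("GRADATA_CONTEXT_DEDUP_THRESHOLD", "0.7"))

def keep_snippet(snippet: str, injected_rules: list[str]) -> bool:
    # Drop FTS snippets that mostly restate an already-injected rule.
    return all(jaccard(snippet, rule) <= THRESHOLD for rule in injected_rules)
```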
_emit_event ran unconditionally before the 'if not ranked: return' guard, writing a JIT_INJECTION entry for every UserPromptSubmit even when zero rules matched. Most prompts are zero-match, so this was the dominant source of events.jsonl write amplification and hot-path I/O overhead. Moved the emit after the empty-guard so only successful injections emit — matches the success-only pattern in inject_handoff. Co-Authored-By: Gradata <noreply@gradata.ai>
…tart hooks
Projects with a superset JS replacement (e.g. the Sprites overlay) can now disable the Python SDK hook without patching SDK source. Default is on — setting the env var to "0" skips the hook and returns None.
Vars added (default "1"):
- GRADATA_BRAIN_MAINTAIN — Stop, brain_maintain.py
- GRADATA_SESSION_PERSIST — Stop, session_persist.py
- GRADATA_SECRET_SCAN — PreToolUse, secret_scan.py
- GRADATA_CONFIG_PROTECTION — PreToolUse, config_protection.py
- GRADATA_DUPLICATE_GUARD — PreToolUse, duplicate_guard.py
- GRADATA_CONFIG_VALIDATE — SessionStart, config_validate.py
secret_scan additionally emits a stderr warning when disabled — it is the sole line of defense against credential commits, so a silent opt-out on a misconfigured project is too risky.
Hook-overlap audit 2026-04-21 (.tmp/hook-overlap-audit-2026-04-21.md): items 10-14 + 17. Eliminates ~8-20s per Stop, ~200-400 tok per edit, ~1500 tok per session of duplicate work when a JS superset is active.
Tests: 3908 passed, 2 skipped (baseline 3828/2, +80 from unrelated).
Co-Authored-By: Gradata <noreply@gradata.ai>
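A sketch of the opt-out pattern these vars share, including the secret_scan stderr warning; the helper and hook names here are illustrative:

```python
import os
import sys

def _enabled(var: str) -> bool:
    # Opt-out semantics: any value except "0" keeps the hook active.
    return os.environ.get(var, "1") != "0"

def secret_scan_hook(payload):
    if not _enabled("GRADATA_SECRET_SCAN"):
        # Sole credential guard — warn loudly rather than vanish silently.
        print("gradata: secret_scan disabled via GRADATA_SECRET_SCAN=0",
              file=sys.stderr)
        return None  # hook skipped
    ...  # normal scan path
```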
Walkthrough

This PR shifts Gradata's architecture from cloud-centric to local-first by moving core learning-loop operations (graduation, meta-rule synthesis, rule-to-hook promotion) into the SDK, repositioning cloud as a visualization and sharing layer. It introduces local meta-rule discovery, a handoff mechanism for context-pressure handling, rule synthesis caching, RAG embedder infrastructure, enhanced cloud sync with row transformation, and numerous hook enhancements with configurable kill switches.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant SDK as SDK (Local)
participant Cache as Rule Synthesis Cache
participant Anthropic as Anthropic SDK
participant Claude as Claude CLI
participant LLM as LLM Provider
SDK->>SDK: Aggregate rules (mandatory, clustered, meta, disposition)
activate SDK
SDK->>Cache: Check deterministic cache key
Cache-->>SDK: Cache hit? Return cached block
alt Cache Miss
SDK->>Anthropic: Try ANTHROPIC_API_KEY path
alt SDK Available
Anthropic->>LLM: POST to claude-opus-4-7
LLM-->>Anthropic: <brain-wisdom>...</brain-wisdom>
Anthropic-->>SDK: Return synthesized block
else SDK Unavailable
SDK->>Claude: Fallback to 'claude -p' CLI
Claude->>LLM: CLI invocation --model ...
LLM-->>Claude: Output with <brain-wisdom>
Claude-->>SDK: Return extracted block
end
SDK->>Cache: Write cache (best-effort)
end
SDK-->>SDK: Return brain-wisdom block or None on failure
deactivate SDK
```

```mermaid
sequenceDiagram
participant App as Application
participant Watchdog as HandoffWatchdog
participant Synth as Synthesizer Callable
participant FS as File System
participant Events as Events System
App->>Watchdog: measure_pressure(tokens_used, tokens_max)
Watchdog-->>App: pressure [0,1] (clamped)
App->>Watchdog: check(tokens_used, tokens_max)
activate Watchdog
Watchdog->>Watchdog: Compute pressure
alt Pressure >= Threshold & Not Yet Fired
Watchdog->>Synth: invoke synthesizer()
Synth-->>Watchdog: HandoffDoc
Watchdog->>FS: Write handoff_dir/[task_id].[agent].[ts].handoff.md
FS-->>Watchdog: File written
Watchdog->>Events: emit("handoff.triggered")
Events-->>Watchdog: Event sent (or skipped)
Watchdog->>Watchdog: Mark _fired = True
Watchdog-->>App: Return HandoffDoc
else Below Threshold or Already Fired
Watchdog-->>App: Return None
end
deactivate Watchdog
App->>Watchdog: reset()
Watchdog->>Watchdog: Set _fired = False
```

```mermaid
sequenceDiagram
participant SDK as SDK Row Buffer
participant Transform as Row Transformer
participant Dedup as Deduplication
participant Scrub as Payload Sanitizer
participant HTTP as HTTP Batch Post
participant Cloud as Cloud Table
SDK->>Transform: For each SQLite row
activate Transform
Transform->>Transform: Coerce types (e.g., session to int|None)
Transform->>Transform: Parse JSON text columns
Transform->>Transform: Pack extra fields → data
Transform->>Transform: Generate deterministic UUID
Transform-->>SDK: Transformed row
deactivate Transform
SDK->>Dedup: Collect transformed rows
Dedup->>Dedup: Group by id, keep first of each
Dedup-->>SDK: Deduplicated batch
SDK->>Scrub: Sanitize for JSONB
Scrub->>Scrub: Remove NUL bytes recursively
Scrub-->>SDK: Sanitized payload
SDK->>HTTP: POST /[remapped_table] batch
HTTP->>Cloud: Send deduplicated, sanitized payload
Cloud-->>HTTP: 200 OK (count)
HTTP-->>SDK: Return accepted row count
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
#133 added opt-out env vars (GRADATA_SECRET_SCAN=0, _CONFIG_PROTECTION=0, _SESSION_PERSIST=0, etc.) that disable the corresponding hook. Dev shells often leave these set, which then flips 10 hook safety/intelligence tests from green to failing locally even though the code is correct. Session-scoped autouse fixture pops the seven kill-switches for the whole test session and restores them on teardown. Co-Authored-By: Gradata <noreply@gradata.ai>
Watchdog gap: the repo only had sdk-publish.yml (tag-triggered), so PRs shipped without pytest ever running. #133's hermeticity bug slipped through because CodeRabbit reviews code but doesn't execute tests. - Matrix: Python 3.11 + 3.12 (matches pyproject `requires-python = >=3.11`) - Scope: PRs touching Gradata/** and pushes to main - No lint yet — ruff surfaces 622 pre-existing errors that need a separate clean-up pass before it can be a blocking gate. Co-Authored-By: Gradata <noreply@gradata.ai>
…or (#134)

* fix(implicit_feedback): restore GAP signal category dropped in hook dedup

The hook-overlap audit removed the JS implicit-feedback hook as redundant with the SDK version, but verifier caught that the SDK SIGNAL_MAP was missing the GAP category: "what about", "you forgot/missed/skipped/dropped/ignored", "did you check/verify/test/review". CHALLENGE_PATTERNS already catches "you didn't/missed/forgot/failed" but lost the "what about" and "did you check" variants. Adding GAP_PATTERNS restores strict parity with the removed JS hook.
Tests: 48 implicit_feedback + hooks_intelligence tests pass.
Co-Authored-By: Gradata <noreply@gradata.ai>

* feat(implicit_feedback): emit tacit OUTPUT_ACCEPTED on silent follow-ups

Users rarely type "looks good" — they just send the next task. brain.correct() logs every CORRECTION but explicit approval words fire 20x less, making the correction ratio look broken (2289% over the last 14 days). Emit a tacit OUTPUT_ACCEPTED when a substantive follow-up prompt (>=30 chars) has no negation / challenge / reminder / gap signals.
- Adds `mode: "explicit" | "tacit"` to the event payload so the audit script can distinguish signal strength.
- Short acks ("ok", "go", "thanks") stay below the threshold — they are too ambiguous to infer prior-turn acceptance.
Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(hooks): eliminate false-positive REJECTED outcomes + dead code

session_close previously flagged rules as REJECTED when a correction's 30-char prefix happened to match any substring of the lesson description. "never hardcode secrets" and "never hardcode port numbers" collided on "never hardcode " and quietly poisoned the graduation pipeline. Require the shorter side to be ≥40 chars and to be a full substring match (either direction) before rejecting.
Also remove a dead `payload.get("tool_output", "")` expression in claude_code.py whose return value was never captured.
Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(cloud): align SDK base URL with /api/v1 prefix

The cloud API now mounts routes under /api/v1 (gradata-cloud@04a272f). The SDK was posting to /v1/telemetry/metrics — 404s. Rebases the default base to https://api.gradata.ai/api/v1 and trims the /v1 prefix from the per-call paths. Also applies ruff formatting to the security regression tests (no behavior change).
Co-Authored-By: Gradata <noreply@gradata.ai>

* refactor(hooks): collapse emit boilerplate into _base.emit_hook_event helper

Five hooks were repeating the same resolve_brain_dir() → BrainContext.from_brain_dir() → emit() dance. Centralize into emit_hook_event() so new hooks don't re-learn the pattern and failures log uniformly.
- _base.py: add emit_hook_event(event_type, source, data, brain_dir=None)
- implicit_feedback, agent_graduation, tool_failure_emit, self_review: migrated
- Net -13 lines, identical external behavior, all 90 hook tests pass
Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(sdk): audit-driven bug batch — unreachable code, shared-state mutation, off-by-one

From autoresearch audits on patterns/, enhancements/, and the top-level SDK:
- rag.py: two-pass query expansion was unreachable (Stage 3 unconditionally returned before Stage 4). Moved expansion inside the Stage 3 gate so cfg.two_pass actually takes effect when non-empty results exist.
- parallel.py: DependencyGraph.run mutated task.input_data on the shared ParallelTask instance. Re-running the graph saw stale upstream outputs. Use dataclasses.replace to scope the resolved input to the current run.
- guardrails.py: two dead expressions (.lower() and str()) whose results were discarded; removed.
- _confidence.py: sessions_since_fire off-by-one — reset to 0 then immediately += 1 produced a systematic overcount for fired lessons. Track via flag and skip the increment on fire. Added a defensive severity default for the fragile ternary on the CONTRADICTING path.
- meta_rules.py:685: refresh_meta_rules mutated existing_metas in place despite its contract; use dataclasses.replace so callers' references stay pristine.
- brain.py:_resolve_pending: held a SQLite connection open across lessons_lock. Close before acquiring the file lock; re-open only for the final UPDATE.
All 670 affected tests pass.
Co-Authored-By: Gradata <noreply@gradata.ai>

* test(conftest): scrub hook kill-switch env vars for hermetic runs

#133 added opt-out env vars (GRADATA_SECRET_SCAN=0, _CONFIG_PROTECTION=0, _SESSION_PERSIST=0, etc.) that disable the corresponding hook. Dev shells often leave these set, which then flips 10 hook safety/intelligence tests from green to failing locally even though the code is correct. A session-scoped autouse fixture pops the seven kill-switches for the whole test session and restores them on teardown.
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf(sdk): MEDIUM fixes — skip DDL reruns, drop O(n) dupe scan, log swallowed exceptions

- _events.py:_ensure_table: cache schema-initialized state per db_path so the 10+ CREATE/ALTER/INDEX DDL statements run once per process instead of on every emit() call. PRAGMAs still re-run per connection.
- reflection.py: the CritiqueChecklist duplicate-name scan was O(n²) via list.count in a loop; use Counter once.
- reporting.py: three `except Exception: pass` blocks in build_brain_briefing silently dropped rule/quality/correction extraction errors. Log at DEBUG so misconfigurations are diagnosable without changing the silent-return contract.
All 180/167 affected tests pass.
Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(rules): audit HIGH batch — non-deterministic rule_id, O(N×E) sort, duplicate TaskType

- _engine.py: replace `hash(lesson.description) % 10000` with `_make_rule_id(lesson)`. Python hash() is per-process randomized via PYTHONHASHSEED, so RuleCache and RuleGraph lookups keyed on rule_id broke across runs.
- _engine.py: pre-compute a difficulty_by_cat dict once before the sort to collapse O(N × E) compute_rule_difficulty calls inside the sort key to O(E + N).
- scope.py: merge duplicate `TaskType(name="research", ...)` entries. The second entry (with sales-flavored keywords) was dead — the first match always won. Unified the keyword list in the primary entry.
Tests: 418 passed (rules + scope + rule_engine scope).
Co-Authored-By: Gradata <noreply@gradata.ai>

* ci(sdk): add pytest on pull_request

Watchdog gap: the repo only had sdk-publish.yml (tag-triggered), so PRs shipped without pytest ever running. #133's hermeticity bug slipped through because CodeRabbit reviews code but doesn't execute tests.
- Matrix: Python 3.11 + 3.12 (matches pyproject `requires-python = >=3.11`)
- Scope: PRs touching Gradata/** and pushes to main
- No lint yet — ruff surfaces 622 pre-existing errors that need a separate clean-up pass before it can be a blocking gate.
Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(sdk): audit CRITICAL + HIGH batch — hash determinism, shared-state mutation, conn leaks, O(N^2)

CRITICAL
- integrations/embeddings.py: the trigram local-embedding used Python's built-in hash(), which is per-process randomized (PYTHONHASHSEED). The same text embedded across processes yielded different vectors, silently corrupting cosine similarity + clustering. Switched to md5 truncated to 8 bytes — stable, fast, deterministic.
- integrations/openai_adapter.py: patched_create mutated the caller's messages list and dict entries in place. Any caller that reused the list across calls permanently accumulated rules on the system message. Now clones the list + dicts and routes the clone to the underlying client via kwargs/args.
- sidecar/watcher.py: _try_emit_via_brain created a new Brain (fresh SQLite conn) on every detected change, never closed. Now caches a single self._brain_instance lazily on first use.
HIGH
- graph.py: O(N*E) node lookup inside the graduation-edge loop replaced with a single nodes_by_id dict. The O(M*E) "any(e.target == mr_id for e in edges)" per meta-rule replaced with a merged_into_targets set.
- mcp_server.py: the top-level except in _dispatch now logs via _log.exception with a traceback before returning the error dict.
- middleware/_core.py: RuleSource.load now mtime-caches parsed+filtered lessons when sourced from lessons.md. Previously every on_llm_start / on_llm_end re-read and re-parsed the file.
- integrations/langchain_adapter.py: 3 bare "except Exception: pass" blocks in load_memory_variables + save_context now log at debug level.
- rules/rule_tracker.py: get_rule_history previously pulled 500 events and filtered Python-side. Now queries with tags_json LIKE 'rule:<id>' — uses the tag the emitter already writes.
Tests: updated test_integrations.py to assert that messages passed into the underlying client (not the caller's list) contain injected rules + that the caller's list stays unmutated. 249 targeted tests pass.
Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(sdk): audit CRITICAL+HIGH batch 3 — conn leaks, double disk reads, silent except, dead code

CRITICAL
- brain.py::review_pending: conn.close() was outside try/finally. If fetchall() or row materialization raised, the SQLite handle leaked. Switched to `with contextlib.closing(get_connection(...)) as conn:`.
- _brain_manifest.py::generate: the session-count cross-check did get_connection + conn.close() with the close outside finally. Same fix.
- _manifest_metrics.py::_quality_metrics: previously read lessons.md twice per call (once here, once inside `_lesson_distribution`). Read once, pass the text through.
HIGH
- _manifest_helpers.py::_count_events, _get_tables: conn.close() only on the happy path. Switched to contextlib.closing.
- _manifest_metrics.py::_quality_metrics second conn block: same fix.
- _manifest_metrics.py:221: dead list-comprehension whose result was immediately discarded — deleted.
- brain.py::correct: the telemetry `except Exception: pass` now debug-logs so failures are visible.
- rules/rule_engine/_scoring.py::validate_assumptions: the bare `except: pass` on the scope_json parse now logs at debug level.
Tests: 602 passed (brain + manifest + scoring + confidence scope).
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf(brain): push _search_events fallback filter into SQL; log manifest parse errors

- brain.py::_search_events: the term filter now runs in SQL (LOWER(data_json) LIKE) instead of fetching 500 rows and Python-filtering. An empty query returns [] early.
- brain.py: delete the dead `with contextlib.suppress(ImportError): pass` trailer.
- cloud/client.py::_read_local_manifest: a corrupt brain.manifest.json now logs at warning level before returning an empty dict, instead of silently shipping empty payloads to the cloud.
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf(brain): batch 5 — lessons cache, env hoist, pairwise embed dedup, scope_json helper

- brain.py::_load_lessons: mtime-keyed parse cache so apply_brain_rules and other read-only callers reuse parsed lessons instead of re-parsing on every call. apply_brain_rules switched to the cached loader; the enhancements import check moved to find_spec so pyright sees it as a capability probe.
- _graduation.py: hoist Beta-LB env reads out of the per-lesson loop via a new _read_beta_lb_config() called once per graduate() invocation.
- similarity.py: expose semantic_vector / similarity_from_vectors so callers comparing one probe against many stored strings precompute stored vectors once (O(N*M) tokenization -> O(N+M)).
- _graduation.py dedup gate: precompute existing-rule vectors outside the candidate loop and use similarity_from_vectors.
- _manifest_metrics.py: add _parse_ts_utc() so naive/aware timestamp mixes from SQLite coerce to aware UTC before subtraction.
- _scoring.py::lesson_scope: shared scope_json parse helper; _engine.py and validate_assumptions now use it instead of inlining the try/json.loads pattern. Removes the unused logging import from _scoring.py.
Full suite: 3908 passed, 2 skipped.
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf(confidence): hoist json import to module top

Removes 7 redundant `import json as _json_*` statements from hot paths (parse_lessons per-meta-line, format_lessons per-lesson). Python caches imports so the cost is modest, but the stuttered aliases obscure intent.
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf: batch 7 — brain cache invalidation, loop_detection O(1), q_learning index

- brain.py: invalidate lessons + rule cache after patch_rule/forget/rollback/_resolve_pending writes; wrap _resolve_pending sqlite connections in contextlib.closing; cache the self_improvement capability check in __init__; add logger.debug to silent excepts in session/manifest.proof
- loop_detection.py: use Counter alongside deque for O(1) repeat detection (was O(window_size) per record)
- q_learning_router.py: hoist hmac/logging/platform/time imports to module level; precompute an agent_index dict for O(1) lookup (was O(N) list.index per update_reward)
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf: batch 8 — hoist env reads, wrap sqlite, logger.debug silent excepts

- meta_rules.py: _resolve_principle_creds() hoists GRADATA_LLM_* env reads out of the per-category loop; _try_llm_principle accepts precomputed creds
- reporting.py: wrap the health-report sqlite3 connection in contextlib.closing; replace two `except: pass` with logger.debug
- router_warmstart.py: wrap the warm-start sqlite connection in contextlib.closing (was leaked if an exception landed between connect and close)
- contrib/enhancements/quality_gates.py: wrap the success-report sqlite in contextlib.closing; replace `except: pass` with logger.debug
- brain.py: lineage() now uses get_connection() (consistent with the rest of brain.py) instead of raw sqlite3.connect
- test_agentic_synthesis.py: update mocks to accept the new creds kwarg
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf: batch 9 — stable RRF hashing, O(N^2) fix, precompute word-sets, log swallowed excepts

- rag.py: replace non-deterministic hash() with zlib.crc32 for RRF chunk IDs (PYTHONHASHSEED randomisation was silently breaking dedupe across processes/restarts)
- rag.py: order_by_relevance_position no longer uses list.insert(0, ...) — was O(N^2) per call, now O(N) via head/tail split + reverse
- rag.py: two-pass expansion + NaiveRAG.retrieve silent excepts now log at debug instead of masking misconfigured backends
- tree_of_thoughts.py: precompute rule_word_sets once outside the _default_scorer closure (was O(N*M) re-tokenisation per candidate x existing_rule)
- rule_context_bridge.py: wrap the WAL checkpoint conn in contextlib.closing; log the swallow
- brain.py: hoist the dataclasses import to module level (was inside health())
Co-Authored-By: Gradata <noreply@gradata.ai>

* perf: batch 10 — stable IDs, sqlite closing, dict-indexed intent registry

- rule_graph.py: wrap add_rule_relationship and get_related_rules sqlite connections in contextlib.closing (were leaked on exception)
- rule_tree.py: replace non-deterministic hash() with zlib.crc32 for lesson IDs written to persisted .md frontmatter (was changing across processes, breaking cross-run ID stability)
- contrib/patterns/memory.py: use heapq.nlargest for retrieve() when limit < len(matches); full sort only when returning everything
- contrib/patterns/orchestrator.py: mirror _REGISTERED_INTENT_PATTERNS into a dict index for O(1) classify_request lookup (was O(N) linear scan per classify call)
Co-Authored-By: Gradata <noreply@gradata.ai>

* chore: track handoff watchdog hooks in .claude/hooks/

Whitelist the user-prompt handoff-watchdog and session-start handoff-inject hooks (plus their dispatchers) so fresh clones keep context-pressure handling wired into Claude Code. Everything else under .claude/ remains ignored.
Co-Authored-By: Gradata <noreply@gradata.ai>

* chore(gitignore): drop dispatcher whitelist, keep only watchdog hooks

Refine the .gitignore watchdog carve-out — track only the two hook files themselves; leave dispatcher wiring machine-local.
Co-Authored-By: Gradata <noreply@gradata.ai>

* fix: address CodeRabbit review findings on #134

- brain.py: _resolve_pending re-checks resolution IS NULL and verifies rowcount in the UPDATE; prevents lost-race overwrites returning resolved=true when another worker already resolved
- rag.py: two-pass expanded retrieval returns the original user query (not expanded_query) so downstream telemetry/logging never surfaces mined corpus terms as user input
- cloud/sync.py: _normalize_api_base() upgrades legacy https://api.gradata.ai bases (no /api/v1 segment) on load; older cloud-config.json files self-heal instead of silently POSTing to unversioned endpoints
- hooks/session_close.py: enforce the 40-char floor on BOTH lesson_desc and the rejecting desc; gating only one side let short lessons match long descriptions via prefix containment
- hooks/implicit_feedback.py: drop forgot|missed from GAP_PATTERNS (already owned by CHALLENGE_PATTERNS); raise the tacit threshold to 60 chars and skip messages that look like questions — "can you explain ..." no longer counts as tacit acceptance
- guardrails.py: block_reason now reports only the first failing input check, aligning with the GuardedResult docstring contract

---------

Co-authored-by: Gradata <noreply@gradata.ai>
Summary
Kill switches added (default "1")
- GRADATA_BRAIN_MAINTAIN
- GRADATA_SESSION_PERSIST
- GRADATA_SECRET_SCAN
- GRADATA_CONFIG_PROTECTION
- GRADATA_DUPLICATE_GUARD
- GRADATA_CONFIG_VALIDATE

secret_scan additionally emits a stderr warning when disabled — it is the only credential guard, so a silent opt-out on a misconfigured project is too risky.

Estimated savings when JS superset active

~8-20 s per Stop, ~200-400 tokens per edit, and ~1500 tokens per session of duplicate work eliminated.
Test plan
- `pytest tests/` — 3908 passed, 2 skipped
- Code review (pr-review-toolkit:code-reviewer) — approved with a secret_scan warning ask, addressed
- Default "1" keeps every hook active (.claude/settings.json unchanged)

Generated with Gradata