fix(cloud/client): coerce float ts and non-int session for /sync by Gradata · Pull Request #165 · Gradata/gradata

Gradata · 2026-05-02T22:18:39Z

Running the backfill surfaced two type-coercion bugs in CloudClient, plus one debugging improvement:

- TypeError on float `ts` values from JIT_INJECTION events → coerce to str
- HTTP 422 on float/string `session` values → coerce to `int | None`
- Surface the HTTPError response body for future debugging

Found while running scripts/backfill_to_cloud.py against a historical events.jsonl.
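A minimal sketch of the two coercions, assuming a policy the PR text does not spell out: whole-number floats map to int, while non-whole floats and non-numeric strings (such as UUIDs) become None. The helper names `coerce_ts` and `coerce_session` are hypothetical and are not the actual `_format_event()` code.

```python
from typing import Any, Optional


def coerce_ts(value: Any) -> str:
    """Return the timestamp as a string so comparisons with the string
    watermark cursor never mix float and str (which raises TypeError)."""
    return "" if value is None else str(value)


def coerce_session(value: Any) -> Optional[int]:
    """Return an int session id, or None when the value is not integral.

    Assumed policy (hypothetical): whole floats such as 12.0 become 12,
    while 4.5 and non-numeric strings such as UUIDs become None.
    """
    if value is None or isinstance(value, bool):
        # bool is a subclass of int; treat it as "no session" rather than 0/1.
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value) if value.is_integer() else None
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return None  # e.g. a UUID string
    return None
```

String coercion only removes the TypeError: an epoch-seconds string such as "1776803751.89" and an ISO-8601 watermark still do not sort against each other chronologically, so normalising both sides to one representation may be worth considering.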

…B gate, add invariant + obfuscation tests

Council v4 verdict (council_2026-05-02T11-08-25.md) and v4-rerun (council_2026-05-02T11-59-00.md) flagged a small set of production-readiness items that don't depend on the larger dual-write work. This commit lands them independently so dual-write atomicity can ship as its own reviewable PR.

What
- Move src/gradata/enhancements/retrieval_fusion.py to enhancements/scoring/retrieval_fusion.py and update importers. Council vote 5/7: RRF is a ranking primitive and lives more naturally under scoring/ than as a sibling.
- Flip the GRADATA_BETA_LB_GATE default to ON in enhancements/self_improvement/_graduation.py. The 2026-04 ablation documented in that file showed ~15-20% of RULE-tier graduations were miscalibrated by format rather than content; shipping the fix opt-in was a textbook silent regression (council 5/7). Setting GRADATA_BETA_LB_GATE=0 preserves the override-off escape hatch.
- New tests/test_initial_confidence_invariant.py locks the INITIAL_CONFIDENCE / PATTERN_THRESHOLD = 0.60 boundary, which almost promoted every fresh lesson before the strict `>` comparison was wired in.
- New tests/test_score_obfuscation_gate.py: a CI gate that fails the build if any raw confidence float in [0,1] leaks into the prompt-bound <brain-rules> payload. build_brain_rules_block() in middleware/_core.py now obfuscates rule lines.

Why
- Each item is independently testable and low-risk, and clears the runway for the dual-write atomicity PR.
- Turning Beta-LB on by default closes a known correctness hole that would otherwise ship in every release until flipped.
- The obfuscation gate turns a comment-level guarantee (security/score_obfuscation.py) into an enforced one.

Test plan
- pytest tests/test_initial_confidence_invariant.py tests/test_score_obfuscation_gate.py tests/test_retrieval_fusion.py tests/test_rule_pipeline.py tests/test_middleware_core.py: 63 passed.
- pyright src/: 0 errors, 27 warnings (unchanged baseline).
- ruff on changed files: clean.

Layering check
- No Layer 0 → 2 imports introduced.

Risk
- The Beta-LB flip changes the default for graduation calibration. Users relying on the miscalibrated PATTERN→RULE behavior will see fewer graduations until they set GRADATA_BETA_LB_GATE=0. This is the intended fix.

Council references
- council_2026-05-02T11-08-25.md (initial v4 review)
- council_2026-05-02T11-59-00.md (v4 rerun, all 7 lenses through fallback chain)
- council_2026-05-02T12-24-08.md (PR sequencing decision)
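To see why the strict comparison matters, here is a sketch with both constants at the 0.60 value quoted in the commit message. `should_promote` and the test function are hypothetical stand-ins, not the repository's graduation code or its actual invariant test.

```python
INITIAL_CONFIDENCE = 0.60  # value quoted in the commit message
PATTERN_THRESHOLD = 0.60   # value quoted in the commit message


def should_promote(confidence: float, threshold: float = PATTERN_THRESHOLD) -> bool:
    """Hypothetical graduation check: strictly greater than the threshold."""
    return confidence > threshold


def test_fresh_lesson_is_not_promoted() -> None:
    # With ">=" a brand-new lesson sits exactly on the boundary and would promote.
    assert INITIAL_CONFIDENCE >= PATTERN_THRESHOLD
    # With strict ">" it stays unpromoted until its confidence rises above the threshold.
    assert not should_promote(INITIAL_CONFIDENCE)
```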

…octor --reconcile

Council v4 (council_2026-05-02T11-59-00.md) ranked dual-write atomicity as the #1 production blocker. A crash mid-write, between the events.jsonl append and the SQLite INSERT, could leave the brain in a silent split-brain state with no recovery path.

What
- src/gradata/_events.py
  - JSONL is the canonical source of truth: append + fsync happen FIRST, and the SQLite INSERT is now an idempotent projection derived from JSONL.
  - Added reconcile_jsonl_to_sqlite(), which scans JSONL past the SQLite watermark and replays missing rows.
  - A single SQLite projection helper is used by both the live write path and the retain orchestrator.
  - Env-gated crash-window delay for deterministic kill-9 testing only (no production effect).
- src/gradata/brain.py
  - Brain.__init__ runs JSONL → SQLite reconciliation after migrations.
  - Brain() resolves BRAIN_DIR / cwd when no explicit path is supplied.
  - observe(text, kind="correction") public event API used by the PR2 spec.
- src/gradata/cli.py + src/gradata/_doctor.py
  - New `gradata doctor --reconcile`: scans for drift, reports the count, replays missing JSONL rows into SQLite, and exits non-zero on inconsistency that can't be healed.
- tests/test_dualwrite_atomicity.py
  - Path-agnostic public-API tests covering: happy path, kill-9 mid-batch (JSONL must lead SQLite, never trail), reconcile replay, idempotency, doctor CLI drift report, and concurrent-writer JSONL line integrity.

Why
- Before: dual-write was claimed atomic in CLAUDE.md, but there was no two-phase commit and no recovery. A crash meant silent data loss or duplicate-on-replay.
- After: JSONL is the log and SQLite is the projection. Every reopen reconciles, and doctor --reconcile is the operator escape hatch. Property: jsonl_count >= sqlite_count, always.

Test plan
- pytest tests/test_dualwrite_atomicity.py: 6 passed.
- Full focused regression on the changed surface: 42 passed.
- Non-integration suite (excluding socket-bound daemon/plugin tests blocked by the sandbox): 4130 passed, 4 skipped.
- pyright src/: 0 errors, 27 warnings (unchanged baseline).

Layering check
- _events.py is Layer 0; Brain.__init__ in Layer 2 calls into it. No upward imports introduced.

Risk
- Reconcile-on-init runs on every Brain open. For a brain with 100k events this adds a one-time ~50-200 ms at startup. The watermark is incremental, so subsequent opens are O(drift), not O(total).
- Concurrent writers serialize via JSONL append + advisory lock. The throughput trade-off is acceptable for correctness.

Council references
- council_2026-05-02T11-59-00.md (v4 RISK class, all 7 lenses)
- council_2026-05-02T12-24-08.md (PR sequencing, TDD-first)

Stacks on #163.
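The append-then-project ordering and the replay step can be sketched with the standard library alone. This is an illustration under simplifying assumptions: every event carries a unique `id`, the projection is a hypothetical two-column `events` table, and reconciliation rescans the whole file instead of starting from a watermark as reconcile_jsonl_to_sqlite() does.

```python
import json
import os
import sqlite3
from pathlib import Path

# Hypothetical projection table; the real schema is tenant-scoped and richer.
SCHEMA = "CREATE TABLE IF NOT EXISTS events (event_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"


def append_jsonl(log: Path, event: dict) -> None:
    """Phase 1: make the event durable in the canonical log before anything else."""
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def project(conn: sqlite3.Connection, event: dict) -> bool:
    """Phase 2: idempotent projection. Returns True only if a new row was inserted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
        (event["id"], json.dumps(event, sort_keys=True)),
    )
    return cur.rowcount == 1


def emit(log: Path, conn: sqlite3.Connection, event: dict) -> None:
    append_jsonl(log, event)  # a crash after this line leaves JSONL ahead of SQLite
    project(conn, event)
    conn.commit()


def reconcile(log: Path, conn: sqlite3.Connection) -> dict:
    """Replay JSONL rows missing from SQLite; safe to run any number of times."""
    replayed = invalid = 0
    if log.exists():
        with log.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    invalid += 1  # e.g. a torn final line left by kill -9
                    continue
                if not isinstance(event, dict) or "id" not in event:
                    invalid += 1
                    continue
                if project(conn, event):
                    replayed += 1
        conn.commit()
    return {"replayed": replayed, "invalid": invalid}
```

The property jsonl_count >= sqlite_count holds in this sketch because SQLite rows are only ever created from events that were already appended to the log, and `INSERT OR IGNORE` makes a second replay a no-op.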

The backfill script and incremental sync both crashed on real-world events.jsonl rows containing:
- float `ts` values (epoch seconds, e.g. 1776803751.89): comparing them against the string watermark cursor raised TypeError.
- float or string `session` values (e.g. 4.5, UUID strings): the server schema rejects non-ints with HTTP 422.

Coerce `ts` to str and `session` to int|None at the format-event boundary. Also surface the response body in HTTPError so 4xx/5xx debugging is not opaque.

Discovered while running scripts/backfill_to_cloud.py against a brain with ~28k events accumulated since 2026-03-22.
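A sketch of surfacing the response body, assuming the client uses the standard library's `urllib.error.HTTPError` (the PR text does not say which HTTP library is used). The 500-character cap follows the CodeRabbit summary of `_post()`; `http_error_to_connection_error` is a hypothetical name.

```python
import urllib.error

MAX_BODY_CHARS = 500  # cap taken from the CodeRabbit summary of _post()


def http_error_to_connection_error(err: urllib.error.HTTPError) -> ConnectionError:
    """Build a ConnectionError whose message includes a snippet of the response body."""
    try:
        body = err.read().decode("utf-8", errors="replace")
    except Exception:
        body = ""  # body unavailable or already consumed; fall back to status only
    message = f"HTTP {err.code} {err.reason}"
    snippet = body[:MAX_BODY_CHARS]
    if snippet:
        message += f": {snippet}"
    return ConnectionError(message)
```

At a call site this would read `except urllib.error.HTTPError as err: raise http_error_to_connection_error(err) from err`, which keeps the original exception chained for tracebacks.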


coderabbitai · 2026-05-02T22:18:50Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 63a5c60b-8ced-4b9a-9412-95df96133b72

📥 Commits

Reviewing files that changed from the base of the PR and between b82a2dc and a3ae362.

📒 Files selected for processing (21)

Gradata/pyproject.toml
Gradata/src/gradata/_doctor.py
Gradata/src/gradata/_events.py
Gradata/src/gradata/brain.py
Gradata/src/gradata/cli.py
Gradata/src/gradata/cloud/client.py
Gradata/src/gradata/enhancements/rule_pipeline.py
Gradata/src/gradata/enhancements/scoring/retrieval_fusion.py
Gradata/src/gradata/enhancements/self_improvement/_graduation.py
Gradata/src/gradata/middleware/_core.py
Gradata/tests/test_dualwrite_atomicity.py
Gradata/tests/test_initial_confidence_invariant.py
Gradata/tests/test_middleware_core.py
Gradata/tests/test_retrieval_fusion.py
Gradata/tests/test_rule_graduated_events.py
Gradata/tests/test_rule_pipeline.py
Gradata/tests/test_rule_to_hook.py
Gradata/tests/test_rule_to_hook_promotion.py
Gradata/tests/test_safety_assertion.py
Gradata/tests/test_score_obfuscation_gate.py
Gradata/tests/test_wiring_compound.py

📝 Walkthrough


This PR implements dual-write event reconciliation for durability, flips the Beta LB gate default to enabled, obfuscates confidence scores in prompts for security, moves the retrieval-fusion module under `enhancements/scoring/`, and adds type coercion to cloud client event handling.

Changes

Dual-Write Event Reconciliation & Brain Initialization

| Layer / File(s) | Summary |
| --- | --- |
| Reconciliation Core `src/gradata/_events.py` | Adds `_insert_event_projection()` for idempotent tenant-scoped SQLite inserts and `reconcile_jsonl_to_sqlite()` to replay canonical JSONL into SQLite with drift/invalid-line metrics; `emit()` now uses `_insert_event_projection()` and respects `GRADATA_DUALWRITE_JSONL_FSYNC_DELAY_MS` before SQLite commit. |
| Brain Path Resolution `src/gradata/_doctor.py` | Adds public `resolve_brain_path(brain_dir)` wrapper to normalize brain directory resolution; `_probe_api()` error handling now uses `contextlib.suppress()`. |
| Brain Integration `src/gradata/brain.py` | `Brain.__init__` accepts optional `brain_dir`, resolves via `resolve_brain_dir()`, and runs `reconcile_jsonl_to_sqlite()` post-migration (swallowing errors at debug level); `observe()` signature changed to accept `messages: list[dict] |
| CLI Reconciliation Command `src/gradata/cli.py` | Adds `--reconcile` flag to `gradata doctor` that runs reconciliation, reports drift/replayed/counts, and exits early, skipping standard diagnostics. |
| Durability Tests `tests/test_dualwrite_atomicity.py` | Six comprehensive tests validate dual-write agreement, JSONL-as-canonical under kill-9, SQLite replay via reconciliation, idempotent re-initialization, `doctor --reconcile` healing, and concurrent-writer serialization; includes subprocess crash simulation and drift tolerance. |
| Configuration `pyproject.toml` | Adds `dualwrite` pytest marker for test classification. |

Beta LB Gate Default Reversal

| Layer / File(s) | Summary |
| --- | --- |
| Core Behavior `src/gradata/enhancements/self_improvement/_graduation.py` | `_read_beta_lb_config()` now defaults gate to enabled (`"1"` when unset) and disables only for `"0"`, `"false"`, `"no"`, or `"off"`; docstring updated to reflect enabled-by-default contract. |
| Test Enforcement `tests/test_*.py` (7 files) | Multiple tests explicitly set `GRADATA_BETA_LB_GATE=0` via `monkeypatch` to test behavior under the gate-disabled condition, ensuring consistent test outcomes across the new default. |
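The enabled-by-default contract can be expressed as a small parser. This sketch follows the table (unset means enabled; `0`, `false`, `no`, `off` disable); trimming whitespace and matching case-insensitively are assumptions, and `beta_lb_gate_enabled` is a hypothetical name rather than `_read_beta_lb_config()` itself.

```python
import os
from typing import Mapping, Optional

# Values that switch the gate off; anything else (including unset) leaves it on.
_DISABLE_VALUES = frozenset({"0", "false", "no", "off"})


def beta_lb_gate_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical reader for GRADATA_BETA_LB_GATE with an enabled-by-default contract."""
    source = os.environ if env is None else env
    raw = source.get("GRADATA_BETA_LB_GATE", "1")
    # Assumption: surrounding whitespace and letter case are ignored.
    return raw.strip().lower() not in _DISABLE_VALUES
```

Tests that exercise the old behaviour would then pin the gate off, for example with pytest's `monkeypatch.setenv("GRADATA_BETA_LB_GATE", "0")`.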

Confidence Obfuscation & Score Security

| Layer / File(s) | Summary |
| --- | --- |
| Middleware Obfuscation `src/gradata/middleware/_core.py` | `build_brain_rules_block()` now obfuscates each rendered rule line via `obfuscate_instruction()` instead of inserting raw confidence-annotated text. |
| Output Assertion Updates `tests/test_middleware_core.py` | Assertions updated to expect obfuscated `[RULE]` markers instead of confidence-parameterized `[RULE:0.95]` format. |
| Obfuscation Validation `tests/test_score_obfuscation_gate.py` | New module with regex-based assertion that no raw confidence floats leak into prompt text; covers both `Brain.apply_brain_rules()` and `build_brain_rules_block()` outputs. |
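A rough sketch of the kind of regex check such a gate might run, assuming the goal is to flag decimals in [0, 1] written as `0.95`, `.8`, or `1.0`. The pattern and function names are hypothetical; the actual test module may use a different expression.

```python
import re

# Hypothetical pattern: decimals in [0, 1] written as 0.95, .8 or 1.0/1.00 that are
# not glued to surrounding digits (so 2.0.1 and 10.5 are ignored), while a
# sentence-final period after the number is still allowed.
RAW_CONFIDENCE = re.compile(r"(?<!\d)(?<!\d\.)(?:0?\.\d+|1\.0+)(?!\d|\.\d)")


def leaked_confidences(prompt: str) -> list[str]:
    """Return every substring of the prompt that looks like a raw confidence score."""
    return RAW_CONFIDENCE.findall(prompt)


def assert_no_raw_confidence(prompt: str) -> None:
    leaks = leaked_confidences(prompt)
    if leaks:
        raise AssertionError(f"raw confidence values leaked into prompt: {leaks}")
```

The check is deliberately coarse: any unrelated decimal in [0, 1] inside rule text would also fail it, which errs on the side of blocking a leak.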

Module Reorganization & Type Robustness

| Layer / File(s) | Summary |
| --- | --- |
| Import Path Updates `src/gradata/enhancements/rule_pipeline.py`, `tests/test_retrieval_fusion.py`, `tests/test_rule_pipeline.py` | Retrieval-fusion utilities moved from `gradata.enhancements.retrieval_fusion` to `gradata.enhancements.scoring.retrieval_fusion`; all import statements updated consistently. |
| Cloud Client Type Coercion `src/gradata/cloud/client.py` | `sync()` coerces event `ts` to string before watermark comparison; `_format_event()` normalizes `ts` to string and converts `session` to `int \| None` with explicit type-handling rules; `_post()` appends HTTP error body snippet (up to 500 chars) to `ConnectionError` message. |
| Additional Test Updates `tests/test_initial_confidence_invariant.py` | New module validating that lessons at initial confidence remain in the INSTINCT state, and covering threshold-boundary promotion logic. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant BrainInit as Brain.__init__
    participant MigDB as DB Migrations
    participant ReconcileFunc as reconcile_jsonl_to_sqlite
    participant JSONL as events.jsonl
    participant SQLite as system.db
    participant Logger as Log (debug)

    BrainInit->>MigDB: run_migrations()
    MigDB->>SQLite: CREATE/ALTER tables
    BrainInit->>ReconcileFunc: reconcile_jsonl_to_sqlite(ctx)
    ReconcileFunc->>JSONL: Read events.jsonl
    alt JSONL exists
        ReconcileFunc->>JSONL: Parse & validate JSON lines
        ReconcileFunc->>SQLite: INSERT OR IGNORE per event
        ReconcileFunc->>ReconcileFunc: Track drift, replayed, invalid counts
        ReconcileFunc-->>BrainInit: Return metrics
    else JSONL missing/error
        ReconcileFunc-->>Logger: Log exception at debug level
        ReconcileFunc-->>BrainInit: Swallow error, continue
    end
    BrainInit->>BrainInit: Initialization complete
```

```mermaid
sequenceDiagram
    participant User as User / CLI
    participant CliCmd as gradata doctor --reconcile
    participant ResolvePath as resolve_brain_path
    participant Recon as reconcile_jsonl_to_sqlite
    participant Output as Output (JSON/text)

    User->>CliCmd: Run with --reconcile flag
    CliCmd->>ResolvePath: Resolve brain_dir
    alt Brain dir resolved
        ResolvePath-->>CliCmd: Return Path
        CliCmd->>Recon: Run reconciliation
        Recon-->>CliCmd: Return metrics dict
        CliCmd->>Output: Format & print drift, replayed, counts
        CliCmd->>User: Exit success
    else Brain dir not found
        ResolvePath-->>CliCmd: Return None
        CliCmd->>Output: Print error to stderr
        CliCmd->>User: Exit code 1
    end
```
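The CLI flow in the second diagram reduces to an exit-code decision. In this sketch the reconciler is injected as a callable, and the `remaining_drift` metric used to signal an unhealable inconsistency is an assumed key, not one documented in the PR; `run_doctor_reconcile` is a hypothetical name.

```python
import json
import sys
from pathlib import Path
from typing import Callable, Optional, TextIO


def run_doctor_reconcile(
    brain_dir: Optional[Path],
    reconcile: Callable[[Path], dict],
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> int:
    """Hypothetical body of `gradata doctor --reconcile`; returns the process exit code."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    if brain_dir is None:
        # Mirrors the "Brain dir not found" branch: message on stderr, exit code 1.
        print("error: could not resolve brain directory", file=err)
        return 1
    metrics = reconcile(brain_dir)
    print(json.dumps(metrics, sort_keys=True), file=out)
    # Assumed metric: drift still present after replay means the brain could not be healed.
    return 0 if metrics.get("remaining_drift", 0) == 0 else 1
```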

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

fix: P0 council bug fixes (atomic writes, BRAIN_DIR hard-fail, thread-safety) [rebased] #153: Modifies brain-directory resolution and doctor diagnostics, touching similar _doctor.py concerns with path handling and error semantics.
feat(sdk): middleware adapters for OpenAI / Anthropic / LangChain / CrewAI #32: Introduces middleware core foundations; this PR modifies build_brain_rules_block() in the same file to add obfuscation logic.

Suggested labels

bug, enhancement


wtfhungs added 3 commits May 2, 2026 14:32

greptile-apps Bot reviewed May 2, 2026


Gradata merged commit 8a438c9 into main May 2, 2026
6 of 9 checks passed

Gradata deleted the fix/backfill-typecoerce branch May 2, 2026 22:18

coderabbitai Bot added the bug Something isn't working label May 2, 2026
