Conversation
Council Phase 1 tasks 2-3. Migration 002 adds event_id (ULID), device_id, content_hash, correction_chain_id, origin_agent as nullable columns with chunked 10k backfill. Migration 003 creates sync_state(brain_id, device_id, last_push_event_id, last_pull_cursor) for watermark tracking. No legacy-* device_id prefix (Pragmatist cut): first device to run 002 assigns its device_id to all historical events. Co-Authored-By: Gradata <noreply@gradata.ai>
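The chunked backfill in Migration 002 can be sketched roughly like this (the `events` table and `device_id` column match the commit; the helper name and chunk constant are illustrative, not the SDK's actual migration code):

```python
import sqlite3

CHUNK = 10_000  # backfill batch size described in the migration


def backfill_device_id(conn: sqlite3.Connection, device_id: str) -> int:
    """Assign device_id to historical rows in chunks, committing between
    batches so the write lock is released. Returns rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE events SET device_id = ? WHERE rowid IN ("
            "  SELECT rowid FROM events WHERE device_id IS NULL LIMIT ?)",
            (device_id, CHUNK),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to backfill
        total += cur.rowcount
    return total
```

Committing per chunk is what keeps a 10k-row batch from holding the database lock for the whole migration.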
…afety net Council Phase 1 task 4, rejects Decision 8 as originally specified. Redacts emails in event data/tags inside emit() before any persistent write. Raw unredacted events tee to events.raw.jsonl (gitignored via *.jsonl pattern) for local debugging only. Fail-closed: if the redactor raises, neither raw side-log nor canonical events.jsonl is written. RetainOrchestrator.flush() re-runs redaction as a defense-in-depth safety net in case anyone queues raw events bypassing emit(). Co-Authored-By: Gradata <noreply@gradata.ai>
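A minimal sketch of the fail-closed ordering described above — redaction runs before either sink is touched, so a redactor exception leaves nothing on disk (function names and the email pattern are illustrative, not the real `emit()` implementation):

```python
import json
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(value):
    """Recursively replace email addresses in strings, dicts, and lists."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[redacted-email]", value)
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def emit(event: dict, canonical: list, raw_log: list) -> None:
    """Fail-closed: redaction runs first. If it raises, neither the raw
    side-log nor the canonical stream is written."""
    redacted = redact(event)                # may raise -> nothing persisted
    raw_log.append(json.dumps(event))       # local-only debugging tee
    canonical.append(json.dumps(redacted))  # persistent, redacted
```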
Council Phase 1 task 1. Unblocks Decision 4 materializer work
(deferred to Phase 2+) by making graduation state changes visible in
events.jsonl instead of only writing to SQLite lesson_transitions.
Instrumented 7 transitions: instinct->pattern, pattern->rule, demoted
below threshold, zero-confidence, untestable expired/idle, moved to
untestable. Each emits {category, description, old_state, new_state,
confidence, fire_count, reason}.
Co-Authored-By: Gradata <noreply@gradata.ai>
Council Phase 1 task 5. cloud/_credentials.py owns the single resolve_credential() chain: kwarg > ~/.gradata/key > GRADATA_API_KEY env > fallback. CloudClient._post() and /events/push both flow through this chain instead of using config.token directly — closes the silo where a keyfile-configured brain couldn't reach the wire. Also adds GRADATA_CLOUD_SYNC_DISABLE kill switch honored by both the new CloudClient.enabled and the legacy _cloud_sync.enabled() paths. key_scope field carried in cloud-config.json for future scoped keys. Co-Authored-By: Gradata <noreply@gradata.ai>
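The resolution chain reads as a simple first-match-wins lookup; a hedged sketch (signature and defaults are illustrative, not the actual `resolve_credential()` in `cloud/_credentials.py`):

```python
import os
from pathlib import Path


def resolve_credential(kwarg_key=None,
                       keyfile=Path.home() / ".gradata" / "key",
                       env_var="GRADATA_API_KEY",
                       fallback=None):
    """Precedence from the commit: kwarg > ~/.gradata/key > env var > fallback."""
    if kwarg_key:
        return kwarg_key
    try:
        text = keyfile.read_text().strip()
        if text:
            return text
    except OSError:
        pass  # missing or unreadable keyfile falls through to the env var
    env = os.environ.get(env_var)
    if env:
        return env
    return fallback
```

Routing both `CloudClient._post()` and `/events/push` through one function like this is what closes the keyfile-only silo the commit mentions.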
…deprecation Council Phase 1 task 6. Ships four subcommands as the primary path to configuring cloud sync:
- gradata cloud enable --key gk_live_... [--scope ...]
- gradata cloud rotate-key --key gk_live_...
- gradata cloud status
- gradata cloud disconnect
gradata login prints a deprecation notice redirecting users to the new path. Preserves the old command per User Advocate's "no silent breakage" objection to hard deletion. Co-Authored-By: Gradata <noreply@gradata.ai>
Council Phase 1 task 7. Pushes rows from the local events table to POST /events/push in 500-row chunks. Exponential backoff on 5xx, fatal on 4xx. Watermark advances via sync_state.last_push_event_id only after all batches of a run succeed, so partial failures re-push on the next call. Honors GRADATA_CLOUD_SYNC_DISABLE kill switch. Credential flows through the unified _credentials.resolve_credential() chain so keyfile, env var, and cloud-config.json all reach the wire. Co-Authored-By: Gradata <noreply@gradata.ai>
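The batch/retry/watermark discipline can be sketched as follows — the watermark is only returned once every chunk has landed, mirroring the all-batches-succeed rule (names and the injected `post` callable are illustrative, not the real push client):

```python
import time


def push_pending(rows, post, chunk=500, max_retries=4, backoff_s=0.5):
    """Push rows in fixed-size chunks. 5xx retries with exponential
    backoff; 4xx aborts immediately; the watermark (last event id) is
    returned only if every batch succeeded, so a partial failure leaves
    the old watermark in place and the next run re-pushes."""
    for start in range(0, len(rows), chunk):
        batch = rows[start:start + chunk]
        for attempt in range(max_retries):
            status = post(batch)
            if status < 300:
                break
            if 400 <= status < 500:
                raise RuntimeError(f"fatal client error {status}")
            time.sleep(backoff_s * (2 ** attempt))  # transient 5xx
        else:
            raise RuntimeError("exhausted retries on 5xx")
    return rows[-1]["event_id"] if rows else None
```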
Locks the /events/pull endpoint contract in code before Phase 2 wires the merge path. 501 Not Implemented from server surfaces as a clean 'disabled_server_side' status; 200 OK raises NotImplementedError loudly so nothing accidentally merges into local state before the materializer and Decision 9 merge semantics ship. Contract details in docs/specs/events-pull-contract.md. Co-Authored-By: Gradata <noreply@gradata.ai>
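A sketch of the contract guard (hypothetical helper name; the real logic lives in the pull client):

```python
def pull_events_guard(status_code: int) -> dict:
    """Phase 1 contract lock: the server is expected to return 501 until
    the merge path ships. A surprise 200 fails loudly so nothing merges
    into local state before the materializer and merge semantics exist."""
    if status_code == 501:
        return {"status": "disabled_server_side"}
    if status_code == 200:
        raise NotImplementedError("pull merge path not shipped; refusing 200 OK")
    return {"status": "error", "code": status_code}
```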
…ontract Addresses the four council blind spots that must be resolved before Phase 2 implementation:
- docs/specs/merge-semantics.md — Decision 9. Three-tier conflict resolution: last-write-wins for small deltas, user-surfaced conflict queue for >= 0.15 confidence gaps or opposite directions, source-authority override reserved for team RBAC in Phase 3.
- docs/errors.md — Every user-facing error code + message + remediation for cloud/migration/redaction/event-validation paths.
- docs/specs/retention-clock.md — first_cloud_store_ts is the retention clock. Rejects emit_ts (backup restore breaks) and push_ts (clock reset gaming). Tier-scoped retention windows.
- docs/specs/events-pull-contract.md — frozen interface for GET /events/pull, documents the Phase 1 disabled-server-side behavior and Phase 2 merge requirements.
Co-Authored-By: Gradata <noreply@gradata.ai>
…ction Pure event-stream → rule-state fold. Tier 1 LWW (|Δconf| < 0.15 and matching new_state), Tier 2 conflict hold on large drift or state disagreement, Tier 3 RULE_OVERRIDE recognized. No writes, no network — caller decides how to apply materialized state. Ships with 15 tests covering spec invariants: idempotent replay, dedup, order-independence for non-conflicting streams, tiebreak on device_id. Co-Authored-By: Gradata <noreply@gradata.ai>
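The three-tier decision described above reduces to a small classifier; a sketch under the commit's thresholds (field names are assumptions, not the materializer's real schema):

```python
def classify(existing: dict, incoming: dict, threshold: float = 0.15) -> str:
    """Tier 1 LWW when confidence drift stays under the threshold and
    new_state matches; Tier 3 when the event is a RULE_OVERRIDE;
    otherwise Tier 2 conflict hold (large drift or state disagreement)."""
    if incoming.get("type") == "RULE_OVERRIDE":
        return "tier3_override"
    drift = abs(incoming["confidence"] - existing["confidence"])
    if drift < threshold and incoming["new_state"] == existing["new_state"]:
        return "tier1_lww"
    return "tier2_conflict"
```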
…mission Adapter between the pure event-fold and the persistent lessons store. apply_to_lessons() non-destructively merges materialized state into an existing Lesson list — only state/confidence/fire_count overwrite; scope and metadata are preserved. Rules held in Tier 2 conflict are skipped. emit_conflict_events() walks result.conflicts and emits one RULE_CONFLICT per key so the dashboard conflict queue can surface them. Injectable emit_fn for tests. Co-Authored-By: Gradata <noreply@gradata.ai>
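A sketch of the non-destructive merge rule (dict-based stand-in for the real Lesson type; only the three named fields are overwritten, and Tier 2 holds are skipped):

```python
def apply_to_lessons(lessons: list, materialized: dict, held_keys: set) -> list:
    """Merge materialized state into an existing lesson list. Only
    state/confidence/fire_count are overwritten on a match; scope and
    other metadata survive. Keys held in Tier 2 conflict are skipped."""
    by_key = {(l["category"], l["description"]): l for l in lessons}
    for key, rule in materialized.items():
        if key in held_keys:
            continue  # Tier 2 hold: leave the local lesson untouched
        existing = by_key.get(key)
        if existing is None:
            lessons.append({"category": key[0], "description": key[1], **rule})
        else:
            for field in ("state", "confidence", "fire_count"):
                existing[field] = rule[field]
    return lessons
```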
Replaces the NotImplementedError guard on the 200 OK path with the full merge pipeline. apply=False (default) returns a preview summary with rules_materialized + conflict counts and never touches lessons.md. apply=True merges materialized state into lessons.md and emits RULE_CONFLICT events for unresolved Tier 2 conflicts. Summary shape gains rules_materialized, conflicts, applied, and conflict_events_emitted fields. Malformed JSON now returns status=error/reason=malformed_response instead of crashing. Co-Authored-By: Gradata <noreply@gradata.ai>
…ence Materializer now recognizes RULE_CONFLICT_RESOLVED events: clears the Tier 2 hold and re-applies the user-picked graduation by (ts, device_id). Resolved conflicts are pruned from result.conflicts so callers don't emit stale RULE_CONFLICT events for keys already adjudicated. New cloud._sync_state module upserts last_pull_cursor keyed by (tenant_id, device_id). pull_events(apply=True) now persists the watermark after a successful lessons.md write so the next pull can resume incrementally. Schema ensure runs lazily so brains predating Migration 003 still get a working sync_state table. Co-Authored-By: Gradata <noreply@gradata.ai>
Successive pulls now stream incrementally: when neither rebuild_from nor cursor is provided, pull_events falls back to last_pull_cursor from sync_state so the client doesn't re-download history it already applied. Added a pagination loop (capped at 50 pages) that drains end_of_stream=False responses by passing the page watermark as the next cursor. Defensive: stops if the server forgets to return a watermark, so a broken pagination contract can't spin the client. Explicit rebuild_from still wins — it's the escape hatch for rebuilding from retention, and should never be masked by a stale persisted cursor. 4048 pass / 2 skip. 5 new tests cover auto-resume, rebuild_from override, multi-page drain, no-watermark defense, and the summary shape. Co-Authored-By: Gradata <noreply@gradata.ai>
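The capped pagination loop with the no-watermark defense might look like this (the `fetch` callable and response keys follow the commit's description; the helper itself is illustrative):

```python
def drain(fetch, cursor=None, max_pages=50):
    """Drain end_of_stream=False responses by passing each page's
    watermark as the next cursor, capped at max_pages. Stops defensively
    if the server omits the watermark so a broken pagination contract
    can't spin the client."""
    events = []
    for _ in range(max_pages):
        page = fetch(cursor)
        events.extend(page.get("events", []))
        watermark = page.get("watermark")
        if watermark is not None:
            cursor = watermark
        if page.get("end_of_stream", True) or watermark is None:
            break
    return events, cursor
```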
…pull Exposes the materialize→apply pipeline under the existing `gradata cloud` family. Dry-run by default (counts printed, lessons.md untouched); --apply merges materialized state and emits RULE_CONFLICT events. Flags: --apply, --rebuild-from <watermark>, --limit <n>. No-db / no-cred paths surface as printable reasons so smoke tests can grep the output. 2 new tests cover the no-db path and the dry-run summary shape. Co-Authored-By: Gradata <noreply@gradata.ai>
…e state Ship-gate from merge-semantics.md §6: any permutation of an event pool must materialize to identical rules once sorted by ts. Two cases:
- Pure LWW pool (no Tier 2 drift) — 200 permutations × 8 keys × 4 events each → byte-equal rules dict.
- Mixed pool with injected 0.95-confidence bursts that force Tier 2 holds — 100 permutations → same rules AND same conflict key set.
Keeps under 1s so it stays in the default suite rather than hiding behind a marker. Co-Authored-By: Gradata <noreply@gradata.ai>
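The permutation property can be demonstrated against a toy LWW fold (the real materializer is richer; this only illustrates the §6 invariant being sampled):

```python
import random


def materialize(events):
    """Toy LWW fold standing in for the real materializer: after sorting
    by (ts, device_id), the last writer per key wins, so every input
    permutation must yield the same rules dict."""
    rules = {}
    for ev in sorted(events, key=lambda e: (e["ts"], e["device_id"])):
        rules[ev["key"]] = ev["state"]
    return rules


def check_permutation_invariance(pool, samples=200):
    """Sampled version of the ship-gate property: shuffled orderings of
    the same pool materialize to the same result."""
    baseline = materialize(pool)
    return all(
        materialize(random.sample(pool, len(pool))) == baseline
        for _ in range(samples)
    )
```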
Adds a conflict_reason closed-vocabulary prefix (confidence_drift / state_disagreement) and wires enrich_tags so RULE_GRADUATED, RULE_DEMOTED, RULE_OVERRIDE, RULE_CONFLICT, and RULE_CONFLICT_RESOLVED auto-attach category:<X> pulled from the payload. RULE_CONFLICT also auto-attaches conflict_reason:<Y>, and is marked required_on so broken emitters surface as validation issues. Net: cloud events now query the same way as the SDK's own learning events — brain.query_events(tags=["category:STYLE"]) returns both local corrections and cloud-graduated patterns without the caller doing manual tag plumbing. 7 new taxonomy tests. 4065 pass / 2 skip. Co-Authored-By: Gradata <noreply@gradata.ai>
…json CloudConfig.conflict_threshold (default 0.0 sentinel = use SDK default) is threaded through pull_events into materialize(). Out-of-range and unparsable values collapse to the sentinel so a broken config never silently changes merge behavior. gradata cloud sync-pull now prints the active threshold so operators can verify the override took effect without opening the config file. 4065 pass / 2 skip. 6 new tests: defaults, persist, clamp, garbage, end-to-end override flips a drift into a conflict. Co-Authored-By: Gradata <noreply@gradata.ai>
…form A second pull-and-apply of the same stream was duplicating rows in lessons.md. Root cause: apply_to_lessons matched existing lessons with raw event-payload category (lowercase), but the lessons.md serializer uppercases on persist. The match always missed on replay → every materialized rule was appended as a fresh row. Fix: case-insensitive lookup (.upper() on both sides) and uppercase category when constructing fresh Lesson rows so the in-memory form matches what comes back from parse_lessons. Updated the existing append-test assertion to lock the canonical form. Also lands 3 end-to-end disaster-recovery tests proving the ship-gate:
- Blank brain + one apply → rules identical to the origin brain's.
- Replay of the same stream → byte-identical lessons.md (regression lock for this specific bug).
- Pre-existing local lesson in a different category survives a merge.
4068 pass / 2 skip. Co-Authored-By: Gradata <noreply@gradata.ai>
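The fix boils down to normalizing category case on both sides of the lookup; a minimal sketch (hypothetical helper, not the actual apply_to_lessons):

```python
def merge_rule(lessons: dict, category: str, description: str, state: str) -> dict:
    """Case-insensitive upsert: the lessons.md serializer uppercases
    category on persist, so lookups must normalize with .upper() on both
    sides or every replay appends a duplicate row."""
    key = (category.strip().upper(), description.strip())
    lessons[key] = state
    return lessons
```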
After watermark persist, emit a summary event (events_pulled,
rules_materialized, conflicts, pages_fetched, conflict_threshold,
watermark, status) so the learning dashboard can observe sync
outcomes. Dry-run paths do not emit.
Tag taxonomy: add closed-value sync_status tag ({ok, error,
skipped}) and enrichment block for the new event.
Co-Authored-By: Gradata <noreply@gradata.ai>
Shims whose deprecation warnings have fired since 0.6.x are now
removed per docs/contributing/deprecation-policy.md (new file) and
the two-minor-version carry rule:
- gradata.patterns/ shim (use gradata.contrib.patterns / gradata.rules)
- gradata.enhancements.carl re-export (use .behavioral_engine)
- gradata.enhancements.profiling.tone_profile (orphan, 546 LOC)
- gradata.integrations.{anthropic,openai,langchain,crewai}_adapter
(use gradata.middleware.{wrap_*, LangChainCallback, CrewAIGuard})
- gradata login / gradata logout CLI + device-auth helpers
(use `gradata cloud enable --key gk_live_...`)
Live modules mistakenly colocated with the deprecated adapters are
relocated to gradata.services/:
- gradata.services.embeddings
- gradata.services.session_history
Forwarding shims kept at the old paths through v0.9.0 per policy.
Docs, install_manifest, changelog (0.7.0 entry), and affected tests
updated. Full suite: 4025 passed, 2 skipped.
Co-Authored-By: Gradata <noreply@gradata.ai>
Walkthrough: Adds a v0.7.0 cloud sync system (push/pull) with materializer and apply path, migrations (event identity, sync_state), ULID/device-id utilities, email PII redaction, package relocations (patterns/integrations→contrib/services/middleware), a unified cloud CLI, embedding/session-history services, docs, and many tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI
    participant PullClient
    participant CloudAPI
    participant Materializer
    participant Lessons
    participant Brain
    CLI->>PullClient: pull_events(brain_dir, apply=...)
    PullClient->>CloudAPI: GET /events/pull?brain_id&device_id...(auth)
    CloudAPI-->>PullClient: 200 {events, watermark, end_of_stream}
    PullClient->>Materializer: materialize(events)
    Materializer-->>PullClient: MaterializeResult(rules, conflicts, counts)
    alt apply=True
        PullClient->>Lessons: apply_to_lessons(MaterializeResult)
        Lessons-->>PullClient: updated lessons file
        PullClient->>Brain: emit(CLOUD_SYNC_COMPLETED)
        PullClient->>PullClient: update_pull_cursor(watermark)
    end
    PullClient-->>CLI: summary/status
```
```mermaid
sequenceDiagram
    participant PushClient
    participant DB
    participant CloudAPI
    participant SyncState
    loop until no pending events
        PushClient->>DB: fetch events > last_push_event_id (chunk)
        PushClient->>CloudAPI: POST /events/push (batch)
        alt 2xx
            CloudAPI-->>PushClient: success
            PushClient->>SyncState: upsert last_push_event_id
        else 4xx
            CloudAPI-->>PushClient: client error -> abort (no retry)
        else 5xx/transport
            CloudAPI-->>PushClient: transient -> retry/backoff
        end
    end
```
```mermaid
sequenceDiagram
    participant Graduation
    participant EventEmitter
    participant EventStore
    Graduation->>EventEmitter: _emit_rule_graduated(lesson, old_state, new_state, reason)
    EventEmitter->>EventEmitter: redact payload (emails)
    EventEmitter->>EventStore: append redacted event to events.jsonl
    EventEmitter->>EventStore: persist redacted event to system.db
    EventEmitter->>EventStore: write raw unredacted events.raw.jsonl (best-effort)
    EventEmitter-->>Graduation: emit result
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
….cloud.*
_cloud_sync.py was the pre-Phase-2 Supabase row-push MVP with its own
env-var scheme (GRADATA_CLOUD_SYNC / GRADATA_CLOUD_URL / GRADATA_CLOUD_KEY)
and direct PostgREST calls. It had zero production callers after the
new gradata.cloud.{client,push,pull,sync,materializer,_credentials}
package shipped.
Removed:
- src/gradata/_cloud_sync.py (440 LOC)
- tests/test_cloud_row_push.py (tested the deleted module)
Updated:
- _doctor._check_cloud_env_vars: now checks GRADATA_API_KEY /
GRADATA_ENDPOINT / GRADATA_CLOUD_SYNC_DISABLE (the env-var surface the
new gradata.cloud.* uses) instead of the old Supabase trio.
- test_cloud_credentials.test_kill_switch: now targets
cloud.push.push_pending_events status=kill_switch (the real network
entrypoint that honours the switch) instead of the deleted row-push
enabled() helper.
- test_doctor_cloud env-var tests rewritten for the new surface.
- _migrations/003_add_sync_state docstring: drop stale references.
- test_embeddings_integration: switch to gradata.services.embeddings
(missed in prior services/ relocation sweep).
Full suite: 4019 passed, 2 skipped, 0 warnings.
Co-Authored-By: Gradata <noreply@gradata.ai>
Actionable comments posted: 33
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Gradata/src/gradata/_doctor.py (1)
Lines 252-267: ⚠️ Potential issue | 🟠 Major — Reuse the unified credential resolver here.
These checks still only accept `[cloud].api_key` from `config.toml`. After the new keyfile/env credential chain, a valid setup can now be reported as `missing`/`fail`, and `cloud_auth` can be skipped entirely. `doctor --cloud` should resolve credentials the same way the runtime client does. Also applies to: 346-365
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_doctor.py` around lines 252-267, update `_check_cloud_config` to use the unified credential resolver instead of directly reading config.toml: call the shared resolver used by the runtime client (the credential resolution function that checks keyfile, env vars, and config) and base the returned status on whether that resolver yields valid credentials; return "ok" when credentials are resolved, "missing" when the resolver indicates no credentials are available, and "fail" when the resolver indicates invalid credentials or an error. Replace uses of _read_cloud_config/_gradata_config_path in _check_cloud_config and mirror the exact same replacement in the similar block at lines 346-365 (the cloud_auth-related check) so both doctor --cloud checks use the same resolver logic and outcomes. Ensure you reference the resolver function by its existing name in the codebase when integrating.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/docs/api/patterns.md`:
- Line 114: The docs import paths are wrong: replace any occurrences of
gradata.contrib.patterns.* with gradata.rules.* so the examples import the rule
system modules correctly; specifically update the import that currently reads
"from gradata.contrib.patterns.scope import classify_scope, register_task_type"
to "from gradata.rules.scope import classify_scope, register_task_type" (and do
the same for the other imports in this document that reference
gradata.contrib.patterns, updating their module paths to gradata.rules.* to
match the actual symbols).
In `@Gradata/docs/changelog.md`:
- Around line 7-13: The changelog currently claims the `gradata.patterns` shim
(and other shims) were removed in the 0.7.0 "BREAKING" entry but an earlier note
states removal is scheduled for v0.8.0; reconcile by restoring consistency:
either move the removal bullet(s) out of the 0.7.0 entry and mark them as
deprecated (keep the deprecation text and reference the planned removal in
v0.8.0), or update the wording to say "deprecated in 0.7.0; removal scheduled
for v0.8.0"; specifically edit the bullets referencing `gradata.patterns` (and
any related shim entries like `gradata.enhancements.carl`,
`gradata.integrations.*_adapter`) so they match the previously announced v0.8.0
removal plan and do not claim they are already removed. Ensure the final message
uses the exact symbols `gradata.patterns`, `gradata.enhancements.carl`, and
`gradata.integrations.{anthropic,openai,langchain,crewai}_adapter` so readers
can reconcile with code and releases.
In `@Gradata/docs/contributing/deprecation-policy.md`:
- Around line 3-37: The headings (e.g., "## Scope", "## Lifecycle", "##
Exceptions", "## Non-deprecation refactors", "## Why two versions") currently
lack surrounding blank lines and trigger markdownlint MD022; update the document
by ensuring there is a blank line before and after each section heading so every
"## ..." heading is separated from surrounding content by one empty line, then
run the linter to verify the MD022 warning is resolved.
In `@Gradata/docs/errors.md`:
- Around line 67-72: The table row for CONFLICT_TIER2_HOLD contains the
threshold expression `|Δconfidence| >= 0.15` whose literal pipes break the
Markdown table; update that cell to escape the pipes (e.g., replace
`|Δconfidence| >= 0.15` with either backtick-wrapped code ``|Δconfidence| >=
0.15`` or escaped pipes `\|Δconfidence\| >= 0.15`) so the `CONFLICT_TIER2_HOLD`
row renders correctly in the table.
In `@Gradata/docs/specs/merge-semantics.md`:
- Line 87: The spec line with the property test "apply(device_A, stream) ==
apply(device_B, stream)" currently says "10k shuffled orderings" but the PR
implements ~200+ orderings; update the spec to match the implemented test count
or mark the 10k target as a documented stretch goal. Edit the checklist item
text for the property test in merge-semantics.md to either set the number to 200
(or the exact implemented count) or add a parenthetical note such as "(200+
implemented; 10k planned)"; ensure the unique expression apply(device_A, stream)
== apply(device_B, stream) remains unchanged.
- Line 39: The spec uses a strict inequality (`|Δconfidence| < 0.15`) for Tier 1
while the implementation checklist references `conflict_threshold: 0.15`,
creating an ambiguous boundary at exactly 0.15; update the spec text (the Tier 1
condition) to use a consistent comparison such as `|Δconfidence| <=
conflict_threshold` (or explicitly state that `|Δconfidence| ==
conflict_threshold` falls into Tier 2) and ensure the symbol
`conflict_threshold` is referenced so readers and implementers (and the
checklist) share the same definitive rule.
In `@Gradata/src/gradata/_core.py`:
- Around line 589-595: The provenance event payload is using the constant
INITIAL_CONFIDENCE instead of the actual initial confidence assigned when
approval gating is active; change the event dict built in the
provenance/emission block (the dict with keys
"action","lesson_category","lesson_description","initial_confidence","source_correction_id")
to set "initial_confidence" to the variable that holds the real initial
confidence (e.g. init_conf or whatever local variable is computed when
approval_required=True) rather than INITIAL_CONFIDENCE so the emitted event
matches the created lesson; update references where the event is created in
Gradata/src/gradata/_core.py (look for approval_required, init_conf, and
INITIAL_CONFIDENCE) to use the computed init_conf variable.
- Around line 945-955: The INSERT into lesson_transitions is writing NULL for
the session column (currently passing None); update the values tuple in the
lesson_transitions insert so the session column gets the correct session
identifier (e.g., use lesson.session or the current session variable like
session_id/current_session from the surrounding scope) instead of None; ensure
the symbol sequence around this change (lesson_transitions insert, lesson, _tid,
now) remains unchanged and add a fallback to preserve lineage if the session
variable can be missing.
In `@Gradata/src/gradata/_events.py`:
- Around line 703-710: The loop over new_events currently only calls
_redact_payload for dict/list types, letting scalar queued payloads (e.g.
"data": "alice@example.com") bypass redaction; update the loop to call
_redact_payload for any existing "data" or "tags" field (not just when
isinstance(..., (dict, list))) so that scalar values are passed through
_redact_payload as well; keep the type checks unnecessary because
_redact_payload already no-ops for unsupported types and ensure you reference
the existing variables and functions in this block: new_events, _redact_payload,
and emit() to maintain the defense-in-depth safety net.
In `@Gradata/src/gradata/_migrations/_ulid.py`:
- Around line 46-50: The code currently parses iso_ts with
datetime.fromisoformat and calls dt.timestamp(), which treats naive datetimes as
local time; update the parsing logic so that if dt.tzinfo is None it is treated
as UTC (e.g. dt = dt.replace(tzinfo=timezone.utc)) before computing ts_ms;
modify the block around datetime.fromisoformat/iso_ts/ts_ms in _ulid.py to set
UTC for naive datetimes and ensure timezone is imported so new_ulid() generation
becomes deterministic across hosts.
In `@Gradata/src/gradata/_migrations/device_uuid.py`:
- Around line 47-77: The current flow may return a transient new_did without
persisting it when fpath exists but contains an invalid device id; change the
post-write replacement logic inside the block that writes tmp so that after
writing new_did you atomically replace fpath if it does not exist OR if the
existing fpath content fails _is_valid(), otherwise unlink tmp; in practice
update the branch around os.replace/tmp handling to read fpath (using
fpath.read_text and _is_valid) when fpath.exists() and call os.replace(tmp,
fpath) when the existing value is invalid, falling back to os.unlink(tmp) when
the existing value is valid, so that new_did is persisted whenever the on-disk
id is corrupt.
In `@Gradata/src/gradata/cli.py`:
- Around line 674-675: Replace the call to _resolve_brain_root() and the eager
brain_root.mkdir() with the CLI’s canonical brain resolution via
_get_brain(args) so cloud commands use the same brain as other commands; obtain
the brain root from the returned brain object (e.g., from the brain's .root /
.path attribute) instead of calling _resolve_brain_root(), and remove the
unconditional brain_root.mkdir(parents=True, exist_ok=True) so status/disconnect
do not create the directory.
In `@Gradata/src/gradata/cloud/_sync_state.py`:
- Around line 91-101: The INSERT currently uses tenant_id in place of brain_id
and conflicts only on brain_id, collapsing cursors across devices; change the
VALUES parameter order to pass the actual brain_id variable (use (brain_id,
tenant_id, device_id, cursor, now)) and update the ON CONFLICT target to include
device_id (ON CONFLICT(brain_id, device_id) DO UPDATE SET ...) so each device
keeps its own last_pull_cursor in the sync_state upsert. Ensure the parameter
tuple and conflict target use the symbols brain_id, tenant_id, device_id,
cursor, and now exactly as in this function.
In `@Gradata/src/gradata/cloud/materializer.py`:
- Around line 83-90: The materializer key builder is using the raw category case
causing mismatches with the apply path which collapses categories using upper();
update the key construction in _rule_key (the block that computes category,
pattern_hash, description, ident) to normalize category the same way as apply
(e.g., category = str(data.get("category") or "").strip().upper()) so identical
logical categories (e.g., "style" vs "STYLE") produce the same key and surface
conflicts deterministically; leave ident logic unchanged and return (category,
ident) as before.
In `@Gradata/src/gradata/cloud/pull.py`:
- Around line 218-259: The current code only calls update_pull_cursor (via
update_pull_cursor/db/tenant_id/device_id/cursor) inside the branch guarded by
if apply and (mat.rules or mat.conflicts), so when apply=True but nothing
materialized the watermark is never persisted; move or duplicate the watermark
persistence logic (the block that reads watermark = summary.get("watermark") and
calls update_pull_cursor) so it runs whenever apply is True and a watermark
exists — regardless of whether
apply_to_lessons/emit_conflict_events/write_lessons_safe ran — and keep the
existing try/except that logs debug on failure; reference apply_to_lessons,
emit_conflict_events, write_lessons_safe, and update_pull_cursor to locate the
relevant code.
- Around line 73-76: The branch that checks db.is_file() currently sets
summary["status"] = "error" and summary["reason"] = "no_db", but according to
the function contract it must return the advertised stable status "no_db";
change the assignment so summary["status"] = "no_db" (keeping summary["reason"]
= "no_db" if desired) and return summary so callers can key off status; update
the code around the db.is_file() check in pull.py (the function that constructs
summary) to use "no_db" as the status value instead of "error".
- Around line 99-106: Replace the direct read of config.api_base with the
project’s centralized endpoint resolver and then enforce HTTPS; specifically,
call the shared resolver (the function that resolves GRADATA_ENDPOINT /
GRADATA_CLOUD_API_BASE into the effective API base) to produce api_base instead
of using config.api_base, then call require_https(api_base, "api_base") and keep
the existing error handling that sets summary["status"] and summary["reason"];
update references to api_base and require_https in pull.py accordingly.
In `@Gradata/src/gradata/cloud/push.py`:
- Around line 253-254: Replace the manual open/try/finally pattern for sqlite3
connections with a context manager: use sqlite3.connect(...) as conn so the
connection is automatically closed (and committed on exit); update code that
calls conn.commit() in _write_watermark to account for the with-statement's
commit-on-exit behavior (remove redundant commits or adjust logic if explicit
commits are required), and ensure all uses of conn inside the surrounding block
reference the new context-managed conn variable.
- Around line 130-152: The upsert in _write_watermark is using ON
CONFLICT(brain_id) and inserts brain_id=tenant_id, which causes per-device state
to be overwritten; change the upsert to use ON CONFLICT(tenant_id, device_id) so
watermarks are per (tenant,device), update the INSERT/VALUES to still set
brain_id for backward compatibility but make the conflict target the composite
key (tenant_id, device_id), and ensure the sync_state table has a
UNIQUE(tenant_id, device_id) constraint (or a composite primary key) so the ON
CONFLICT(tenant_id, device_id) target is valid.
In `@Gradata/src/gradata/cloud/sync.py`:
- Around line 187-190: The check for a missing credential in _post() is likely
redundant because enabled already calls _resolved_credential(); either remove
the dead branch or make the defensive intent explicit—choose one: (A) remove the
if not credential: log.debug(...) return None block from _post() and rely on
enabled to gate calls, or (B) keep the check but add a clear comment above it
explaining the race-condition/keyfile-deletion scenario and why we still bail
out, and consider changing log.debug to log.warning if you want visibility;
reference _post(), the enabled property, and _resolved_credential() when making
the change.
In `@Gradata/src/gradata/contrib/enhancements/install_manifest.py`:
- Around line 315-324: The manifest entry was renamed to id="middleware" but
callers may still pass the legacy id "integrations", causing Unknown module
errors; update the install manifest resolution to preserve backward
compatibility by mapping the legacy id "integrations" to the new id "middleware"
(mirror how the existing "carl" alias is handled) — either add "integrations" as
an alias on the manifest entry or add a normalization mapping in the same
resolver/normalizer code path that currently recognizes the "carl" alias so that
lookups for "integrations" resolve to the middleware module.
In `@Gradata/src/gradata/enhancements/self_improvement/_graduation.py`:
- Around line 76-88: The current flow falls back to module-level _events_emit
after a failed brain.emit, which can create duplicate RULE_GRADUATED events;
change the logic so the module-level fallback (_events_emit) is only invoked
when no brain emitter is available (i.e., brain is None or lacks emit).
Specifically, in the block that calls brain.emit("RULE_GRADUATED", "graduate",
payload, []), keep the try/except to log failures from brain.emit but do NOT
call or fall through to _events_emit afterwards—either return or re-raise after
logging to stop further processing; only call _events_emit when the initial
condition (brain is None or not hasattr(brain, "emit")) is true.
In `@Gradata/src/gradata/services/embeddings.py`:
- Around line 166-172: The current embedding cache uses FIFO eviction because
accesses don't update recency; modify the logic around _embedding_cache (the
dict used with _CACHE_MAX_SIZE and the get_client().embed(desc) path) so that
when a description is found in _embedding_cache you call
_embedding_cache.move_to_end(desc) to mark it as recently used, and when
inserting a newly fetched vec you also ensure recency is set before the eviction
loop that pops with _embedding_cache.popitem(last=False); this will implement
true LRU eviction rather than FIFO.
- Around line 44-53: The SSRF guard in _is_trusted_url is too permissive because
hostname.endswith("gradata.ai") allows sibling domains; tighten it by only
allowing parsed.hostname == "gradata.ai" or
parsed.hostname.endswith(".gradata.ai") (note the leading dot) and ensure
parsed.hostname is present; also restrict the localhost/127.0.0.1 exception to
only permit safe schemes (e.g., parsed.scheme in ("http", "https")) instead of
allowing any scheme. Update the checks in _is_trusted_url (using the parsed
variable) accordingly.
In `@Gradata/src/gradata/services/session_history.py`:
- Around line 43-50: on_session_ended currently mutates the passed payload
in-place by assigning payload["rule_effectiveness"] =
self.compute_effectiveness(), which can surprise callers; change it to compute
effectiveness via self.compute_effectiveness(), call self.reset(), and return
the effectiveness data instead of modifying payload, update the on_session_ended
docstring to clearly state it returns the effectiveness dict (and no longer
mutates payload), and update any callers that relied on the in-place mutation to
merge the returned dict into their payloads; reference symbols:
on_session_ended, compute_effectiveness, reset.
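The return-instead-of-mutate contract can be sketched with a stripped-down class; the state attributes here are illustrative, the real class holds more:

```python
class SessionHistory:
    """Minimal sketch of the suggested contract."""

    def __init__(self):
        self.injected = []        # rules injected this session
        self.corrected = set()    # rules the user corrected

    def compute_effectiveness(self):
        return {r: r not in self.corrected for r in self.injected}

    def reset(self):
        self.injected, self.corrected = [], set()

    def on_session_ended(self, payload: dict) -> dict:
        """Return the effectiveness dict; `payload` is no longer mutated."""
        effectiveness = self.compute_effectiveness()
        self.reset()
        return effectiveness
```

Callers that relied on the old in-place behavior would merge the result themselves, e.g. `payload["rule_effectiveness"] = history.on_session_ended(payload)`.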
- Around line 10-11: Update the usage example in the module docstring to
reference the new canonical import path: replace "from
gradata.integrations.session_history import SessionHistory" with "from
gradata.services.session_history import SessionHistory" (the class symbol is
SessionHistory in gradata.services.session_history); leave any mentions of
forwarding shims intact but ensure the example shows the new module location so
documentation reflects the canonical API.
In `@Gradata/tests/test_cli_cloud.py`:
- Around line 29-36: The helper _run currently returns an empty string and
leaves tests to call capsys.readouterr().out; change _run(monkeypatch, *argv:
str) to accept a capsys fixture parameter (e.g., capsys) and after calling
cli.main() capture and return capsys.readouterr().out; keep the existing
SystemExit handling around cli.main() and preserve monkeypatch.setattr(sys,
"argv", ["gradata", *argv]) so callers can simply call _run(monkeypatch, capsys,
...) to get the CLI output directly.
In `@Gradata/tests/test_cloud_events_pull.py`:
- Around line 30-37: The fixture _isolate_keyfile doesn't clear all
cloud-related env vars, causing non-hermetic tests; in the _isolate_keyfile
fixture add monkeypatch.delenv calls for GRADATA_CLOUD_URL, GRADATA_CLOUD_KEY,
and GRADATA_CLOUD_SYNC (in addition to the existing
GRADATA_CLOUD_SYNC_DISABLE/GRADATA_API_KEY clears) so that _save_cfg() and tests
always control the endpoint and credentials; update references in the fixture
around monkeypatch and _creds (KEYFILE_PATH/KEYFILE_DIR) to ensure the new
deletions run before yield.
In `@Gradata/tests/test_cloud_sync_state.py`:
- Around line 34-39: Seed a non-empty cursor into the DB before calling
update_pull_cursor in test_empty_cursor_is_noop, then assert
update_pull_cursor(db, tenant_id="t", device_id="dev_a", cursor="") returns
False and that the persisted cursor value is unchanged afterwards; locate the
cursor using the existing helper or API (e.g., get_pull_cursor or the same
storage access used elsewhere in tests) and compare the stored value to the
original seeded cursor to ensure the empty-cursor call is a true no-op.
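A self-contained sketch of the strengthened test; the schema and the `update_pull_cursor` stand-in are illustrative and the real API may differ:

```python
import sqlite3

def update_pull_cursor(db, *, tenant_id, device_id, cursor) -> bool:
    """Illustrative stand-in: an empty cursor must be a strict no-op."""
    if not cursor:
        return False  # never clobber a stored watermark with ""
    db.execute(
        "UPDATE sync_state SET last_pull_cursor = ? "
        "WHERE tenant_id = ? AND device_id = ?",
        (cursor, tenant_id, device_id),
    )
    return True

# Seed a non-empty cursor first so the no-op claim is actually tested.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sync_state (tenant_id TEXT, device_id TEXT, last_pull_cursor TEXT)")
db.execute("INSERT INTO sync_state VALUES ('t', 'dev_a', 'cursor-001')")

assert update_pull_cursor(db, tenant_id="t", device_id="dev_a", cursor="") is False
assert db.execute("SELECT last_pull_cursor FROM sync_state").fetchone() == ("cursor-001",)
```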
In `@Gradata/tests/test_emit_pii_redaction.py`:
- Around line 103-104: Replace the broad pytest.raises(Exception) with an
explicit expectation for the error types the code may raise: use
pytest.raises((EventPersistenceError, RuntimeError)) around the brain.emit call
(or only EventPersistenceError if the implementation should always wrap
failures). Ensure EventPersistenceError is imported into
Gradata/tests/test_emit_pii_redaction.py and keep the same call to
brain.emit("SHOULD_NOT_LAND", "test", {"note": SECRET_EMAIL}, []) so the test
fails only for the intended error types.
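A sketch of the narrowed expectation; `EventPersistenceError` and `failing_emit` are stand-ins for the real import and the `brain.emit` call:

```python
import pytest

class EventPersistenceError(RuntimeError):
    """Stand-in; the real class must be imported into the test module."""

def failing_emit():
    # Stands in for brain.emit("SHOULD_NOT_LAND", "test", {...}, [])
    # when the redactor raises and the fail-closed path triggers.
    raise EventPersistenceError("fail-closed: nothing was written")

def test_emit_failure_is_expected_type():
    # Narrowed from pytest.raises(Exception): only these types pass.
    with pytest.raises((EventPersistenceError, RuntimeError)):
        failing_emit()
```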
In `@Gradata/tests/test_migration_002_event_identity.py`:
- Around line 171-184: emit() currently omits the identity columns event_id,
device_id, and content_hash when inserting new events, which will leave
post-migration rows permanently NULL; update the emit() implementation so its
INSERT populates event_id (UUID or deterministic id), device_id (derive from
supplied session/source/device context), and content_hash (compute from
data_json or canonicalized payload) for every new event, ensuring the same
column names are written as Migration 002 backfills; alternatively, if you
prefer the migration approach, change Migration 002 to be idempotent and
rerunnable (detect and backfill only rows with NULL identity columns) so fresh
inserts remain consistent with the historical backfill contract.
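If the emit()-side fix is chosen, the identity values could be derived roughly like this; `identity_columns` is hypothetical, and the real code uses a ULID where this sketch uses uuid4:

```python
import hashlib
import json
import uuid

def identity_columns(data: dict, device_id: str) -> tuple:
    """Sketch of populating the Migration 002 identity columns at emit()
    time: unique event_id, emitting device, and a hash over the
    canonicalized payload (key order must not change the hash)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    content_hash = hashlib.sha256(canonical.encode()).hexdigest()
    event_id = str(uuid.uuid4())  # real implementation: ULID
    return event_id, device_id, content_hash
```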
- Around line 37-40: The test helper _run_002 currently calls
_apply_numbered(conn, brain.dir) and then directly invokes module.up(conn,
tenant_id="unused"), which bypasses the runner path; modify _run_002 to only
call _apply_numbered(conn, brain.dir) (so the migration is executed via the
runner and marked applied) and remove the direct module.up(...) call, ensuring
the runner's has_applied() logic is exercised and the test reflects production
behavior.
In `@Gradata/tests/test_rule_graduated_events.py`:
- Around line 13-18: The import of Brain is only used for type annotations and
should be guarded by TYPE_CHECKING to avoid importing gradata.brain at test
collection; add "from typing import TYPE_CHECKING" and move "from gradata.brain
import Brain" into an "if TYPE_CHECKING:" block, keeping the rest of the imports
(Lesson, LessonState, graduate, init_brain) unchanged and ensuring any
annotations referencing Brain remain valid (use a forward-reference string
'Brain' or rely on future annotations if present).
---
Outside diff comments:
In `@Gradata/src/gradata/_doctor.py`:
- Around line 252-267: Update _check_cloud_config to use the unified credential
resolver instead of directly reading config.toml: call the shared resolver used
by the runtime client (the credential resolution function that checks keyfile,
env vars, and config) and base the returned status on whether that resolver
yields valid credentials; return "ok" when credentials are resolved, "missing"
when resolver indicates no credentials available, and "fail" when resolver
indicates invalid credentials or an error. Replace uses of
_read_cloud_config/_gradata_config_path in _check_cloud_config and mirror the
exact same replacement in the similar block at lines 346-365 (the
cloud_auth-related check) so both doctor --cloud checks use the same resolver
logic and outcomes. Ensure you reference the resolver function by its existing
name in the codebase when integrating.
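A sketch of the resolver-backed check; `check_cloud_config` is hypothetical, with the resolver injected so the same mapping serves both doctor --cloud blocks:

```python
def check_cloud_config(resolve_credential) -> str:
    """Derive doctor status from the shared resolver outcome instead of
    reading config.toml directly."""
    try:
        cred = resolve_credential()
    except Exception:
        return "fail"     # resolver errored / credentials invalid
    return "ok" if cred else "missing"
```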
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8127e2b0-9c06-45b8-b498-2d6dccd4cb7f
📒 Files selected for processing (69)
- Gradata/docs/api/enhancements.md
- Gradata/docs/api/patterns.md
- Gradata/docs/architecture/patterns.md
- Gradata/docs/changelog.md
- Gradata/docs/contributing/deprecation-policy.md
- Gradata/docs/errors.md
- Gradata/docs/guides/custom-domains.md
- Gradata/docs/guides/mcp.md
- Gradata/docs/sdk/middleware.md
- Gradata/docs/specs/events-pull-contract.md
- Gradata/docs/specs/merge-semantics.md
- Gradata/docs/specs/retention-clock.md
- Gradata/src/gradata/__init__.py
- Gradata/src/gradata/_cloud_sync.py
- Gradata/src/gradata/_core.py
- Gradata/src/gradata/_doctor.py
- Gradata/src/gradata/_events.py
- Gradata/src/gradata/_migrations/002_add_event_identity.py
- Gradata/src/gradata/_migrations/003_add_sync_state.py
- Gradata/src/gradata/_migrations/_ulid.py
- Gradata/src/gradata/_migrations/device_uuid.py
- Gradata/src/gradata/_tag_taxonomy.py
- Gradata/src/gradata/brain.py
- Gradata/src/gradata/cli.py
- Gradata/src/gradata/cloud/__init__.py
- Gradata/src/gradata/cloud/_apply_materialized.py
- Gradata/src/gradata/cloud/_credentials.py
- Gradata/src/gradata/cloud/_sync_state.py
- Gradata/src/gradata/cloud/materializer.py
- Gradata/src/gradata/cloud/pull.py
- Gradata/src/gradata/cloud/push.py
- Gradata/src/gradata/cloud/sync.py
- Gradata/src/gradata/contrib/enhancements/install_manifest.py
- Gradata/src/gradata/enhancements/carl.py
- Gradata/src/gradata/enhancements/profiling/__init__.py
- Gradata/src/gradata/enhancements/profiling/tone_profile.py
- Gradata/src/gradata/enhancements/self_improvement/_graduation.py
- Gradata/src/gradata/integrations/__init__.py
- Gradata/src/gradata/integrations/anthropic_adapter.py
- Gradata/src/gradata/integrations/crewai_adapter.py
- Gradata/src/gradata/integrations/embeddings.py
- Gradata/src/gradata/integrations/langchain_adapter.py
- Gradata/src/gradata/integrations/openai_adapter.py
- Gradata/src/gradata/integrations/session_history.py
- Gradata/src/gradata/patterns/__init__.py
- Gradata/src/gradata/services/__init__.py
- Gradata/src/gradata/services/embeddings.py
- Gradata/src/gradata/services/session_history.py
- Gradata/tests/security/test_device_auth_url.py
- Gradata/tests/test_bug_fixes.py
- Gradata/tests/test_cli_cloud.py
- Gradata/tests/test_cloud_apply_materialized.py
- Gradata/tests/test_cloud_credentials.py
- Gradata/tests/test_cloud_disaster_recovery.py
- Gradata/tests/test_cloud_events_pull.py
- Gradata/tests/test_cloud_events_push.py
- Gradata/tests/test_cloud_materializer.py
- Gradata/tests/test_cloud_sync.py
- Gradata/tests/test_cloud_sync_state.py
- Gradata/tests/test_doctor_cloud.py
- Gradata/tests/test_embeddings.py
- Gradata/tests/test_emit_pii_redaction.py
- Gradata/tests/test_integrations.py
- Gradata/tests/test_migration_002_event_identity.py
- Gradata/tests/test_migration_003_sync_state.py
- Gradata/tests/test_rule_graduated_events.py
- Gradata/tests/test_session_history.py
- Gradata/tests/test_spec_compliance.py
- Gradata/tests/test_tag_taxonomy.py
💤 Files with no reviewable changes (10)
- Gradata/src/gradata/enhancements/profiling/__init__.py
- Gradata/src/gradata/enhancements/carl.py
- Gradata/src/gradata/integrations/langchain_adapter.py
- Gradata/src/gradata/patterns/__init__.py
- Gradata/src/gradata/integrations/openai_adapter.py
- Gradata/src/gradata/integrations/anthropic_adapter.py
- Gradata/tests/security/test_device_auth_url.py
- Gradata/src/gradata/integrations/crewai_adapter.py
- Gradata/tests/test_integrations.py
- Gradata/src/gradata/enhancements/profiling/tone_profile.py
📜 Review details
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-04-17T17:18:07.439Z
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
- Gradata/src/gradata/_doctor.py
- Gradata/tests/test_cloud_sync_state.py
- Gradata/tests/test_emit_pii_redaction.py
- Gradata/tests/test_bug_fixes.py
- Gradata/tests/test_cloud_disaster_recovery.py
- Gradata/tests/test_cli_cloud.py
- Gradata/tests/test_cloud_credentials.py
- Gradata/tests/test_migration_002_event_identity.py
- Gradata/tests/test_rule_graduated_events.py
- Gradata/tests/test_cloud_materializer.py
- Gradata/src/gradata/_core.py
- Gradata/tests/test_cloud_events_pull.py
- Gradata/tests/test_cloud_events_push.py
🪛 LanguageTool
Gradata/docs/specs/events-pull-contract.md
[style] ~12-~12: Use ‘will’ instead of ‘going to’ if the following action is certain.
Context: ...e is the commitment." If /events/pull is going to be the disaster-recovery path for the m...
(GOING_TO_WILL)
[style] ~36-~36: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...vents with emit_ts >= rebuild_from. - If absent, the server uses the requester's...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.22.0)
Gradata/docs/errors.md
[warning] 71-71: Table column count
Expected: 4; Actual: 6; Too many cells, extra data will be missing
(MD056, table-column-count)
Gradata/docs/contributing/deprecation-policy.md
[warning] 3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 8-8: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 29-29: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 34-34: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🪛 Ruff (0.15.10)
Gradata/tests/test_emit_pii_redaction.py
[warning] 103-103: Do not assert blind exception: Exception
(B017)
Gradata/src/gradata/cloud/_sync_state.py
[warning] 54-57: Use contextlib.suppress(sqlite3.OperationalError) instead of try-except-pass
(SIM105)
Gradata/tests/test_migration_002_event_identity.py
[warning] 99-99: Unpacked variable ts is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
Gradata/src/gradata/cloud/_credentials.py
[warning] 53-56: Use contextlib.suppress(OSError) instead of try-except-pass
Replace try-except-pass with with contextlib.suppress(OSError): ...
(SIM105)
Gradata/src/gradata/cloud/materializer.py
[warning] 35-35: Import from collections.abc instead: Iterable
Import from collections.abc
(UP035)
[warning] 307-313: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
Gradata/tests/test_rule_graduated_events.py
[warning] 16-16: Move application import gradata.brain.Brain into a type-checking block
Move into type-checking block
(TC001)
Gradata/src/gradata/cloud/_apply_materialized.py
[warning] 22-22: Move application import gradata.cloud.materializer.MaterializeResult into a type-checking block
Move into type-checking block
(TC001)
Gradata/tests/test_cloud_materializer.py
[error] 320-320: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (36)
Gradata/src/gradata/__init__.py (3)
- 21-31: Version resolution fallback remains correct. The importlib.metadata/fallback chain is unchanged functionally; this is safe.
- 95-99: Pattern import guidance now matches the migrated namespace. Line 97 correctly points users to gradata.contrib.patterns, consistent with the lazy import map.
- 145-152: Lazy-load deprecation path remains intact. No behavioral drift in __getattr__; warning + dynamic import flow is preserved.
Gradata/docs/guides/mcp.md (1)
- 71-71: Import path update is correct. Line 71 aligns the example with the new gradata.contrib.patterns.mcp namespace.
Gradata/docs/guides/custom-domains.md (1)
- 45-45: Behavioral engine import migration looks consistent. Line 45 updates the docs to the new module path without changing example behavior.
Gradata/docs/api/enhancements.md (1)
- 94-94: Enhancements import migration is coherent. Line 94 consistently points CARL-related examples at gradata.enhancements.behavioral_engine.
Gradata/docs/architecture/patterns.md (1)
- 12-12: Architecture examples are consistently migrated. These snippet imports correctly reflect gradata.contrib.patterns.* for Layer 0 APIs. Also applies to: 23-23, 38-38, 66-66, 87-87
Gradata/docs/api/patterns.md (1)
- 8-104: Most pattern import migrations are correct. The updated examples for orchestrator/pipeline/parallel/reflection/evaluator/memory/guardrails/human_loop/tools/mcp align with the contrib namespace move. Also applies to: 149-162
Gradata/tests/test_spec_compliance.py (2)
- 26-33: Spec compliance migration checks are updated correctly. The architecture/enhancement importability assertions now match the new module layout. Also applies to: 77-84
- 318-322: Good strengthening of zero-dependency import assertions. Adding explicit reflection and guardrails assertions makes this check less brittle.
Gradata/tests/test_bug_fixes.py (1)
- 355-388: New import-surface regression test is solid. This effectively replaces deprecated-shim expectations with direct checks of current canonical namespaces.
Gradata/tests/test_doctor_cloud.py (1)
- 44-44: Test assertion update is correct and aligned with new CLI guidance.
Gradata/tests/test_embeddings.py (1)
- 5-5: Import migration is correct; tests now exercise the canonical services module.
Gradata/tests/test_session_history.py (1)
- 2-3: Session history test updates look good; migration + formatting changes are safe. Also applies to: 8-15, 26-34
Gradata/src/gradata/brain.py (1)
- 195-201: Subscriber wiring migration is clean and safely guarded.
Gradata/src/gradata/services/__init__.py (1)
- 1-9: Good package-level documentation for the services namespace transition.
Gradata/src/gradata/cloud/__init__.py (1)
- 31-34: Package exports are consistent with the new cloud credentials module surface.
Gradata/tests/test_cloud_sync.py (1)
- 67-99: Good coverage for conflict_threshold coercion. These tests lock down the default sentinel, valid override, out-of-range fallback, and garbage-input fallback paths.
Gradata/src/gradata/_cloud_sync.py (1)
- 229-239: Kill-switch short-circuit looks good. Checking the shared cloud kill switch before the legacy env gating keeps emergency write-disable behavior centralized and predictable.
Gradata/tests/test_tag_taxonomy.py (1)
- 49-96: Nice regression coverage for the rule-event tags. This pins both enrichment behavior and conflict_reason validation, including the non-duplication path for caller-supplied category tags.
Gradata/src/gradata/integrations/session_history.py (1)
- 1-19: Shim behavior matches the stated deprecation window. The import-time warning plus re-export keeps legacy imports working while steering callers to gradata.services.session_history.
Gradata/docs/specs/events-pull-contract.md (3)
- 34-36: Minor: consider varying sentence structure. Static analysis flagged three successive sentences beginning with "If". While this is clear and systematic for a spec, you could optionally vary the structure for readability.
- 118-126: LGTM! Smart Phase 1 stub design. The intentional NotImplementedError on a 200 response is a clever safety mechanism: it ensures nothing accidentally gets merged before the materializer ships, while still allowing the client interface to be released and frozen.
- 108-115: LGTM! Clear GA readiness invariants. The property invariants provide a concrete checklist for validating the implementation before GA: idempotent replay, order-independence, watermark monotonicity, dedup invariant, and retention boundary behavior.
Gradata/src/gradata/cloud/sync.py (2)
- 73-85: LGTM! Defensive threshold coercion. The _coerce_threshold function properly handles invalid inputs (non-numeric, out-of-range) by collapsing to the 0.0 sentinel, ensuring a broken config never silently changes merge behavior.
- 151-174: LGTM! Unified credential resolution chain. The _resolved_credential() method correctly implements the documented precedence (config.token > keyfile > env var), and the enabled property properly gates sync on both the sync_enabled flag AND credential availability.
Gradata/src/gradata/cloud/push.py (1)
- 171-189: LGTM! Robust retry logic with proper backoff. The _post_batch function correctly:
  - distinguishes non-retryable 4xx errors from transient failures
  - implements exponential backoff (2**attempt)
  - limits retries appropriately
  - returns a clear success/failure tuple
Gradata/src/gradata/services/session_history.py (1)
- 54-69: LGTM! Clean effectiveness computation. The compute_effectiveness method correctly implements the inverse relationship: a rule is effective if it was NOT corrected during the session. The early return for an empty injected_this_session avoids unnecessary computation.
Gradata/tests/test_cli_cloud.py (1)
- 118-179: LGTM! Comprehensive dry-run test with proper isolation. The test properly:
  - sets up a real Brain with a seeded event to create system.db
  - configures cloud sync with a valid token
  - mocks the HTTP response with a realistic payload
  - verifies the dry-run output format and that lessons.md is unchanged
Gradata/docs/specs/merge-semantics.md (1)
- 67-77: LGTM! Clear convergence property definition. The formal convergence guarantee with pseudocode clearly defines the required property: any two devices applying the same ordered event stream must produce bit-identical results.
Gradata/docs/specs/retention-clock.md (2)
- 37-48: LGTM! Clear tier-based retention windows. The retention windows table is comprehensive, and the "indefinite" clarification is important for user expectations and storage planning.
- 75-81: LGTM! Edge cases well documented. The spec covers important edge cases: backup restoration, duplicate pushes, cloud-only events, and migration backfill. This documentation will help prevent bugs during implementation.
Gradata/tests/test_cloud_credentials.py (3)
- 44-63: LGTM! Comprehensive credential precedence tests. The tests thoroughly cover the credential resolution chain: explicit argument wins over keyfile, keyfile wins over environment variable, and empty string is returned when nothing is set.
- 105-116: LGTM! Important integration test for the kill switch. This test verifies critical behavior: the kill switch (GRADATA_CLOUD_SYNC_DISABLE) overrides all other sync settings, ensuring users can always disable cloud sync regardless of other configuration.
- 119-128: LGTM! Verifies the unified auth chain in CloudClient. This test confirms that CloudClient properly consults the keyfile when cloud-config.json has no token, validating the unified credential resolution chain documented in sync.py.
Gradata/tests/test_cloud_events_push.py (1)
- 24-37: Good hermetic setup for the push contract tests. Rebinding the keyfile path and clearing the cloud env overrides here keeps the suite deterministic across developer shells and CI.
```diff
-from gradata.patterns.scope import classify_scope, register_task_type
+from gradata.contrib.patterns.scope import classify_scope, register_task_type
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify canonical locations for rule modules
fd -i '^(scope|rule_engine|rule_tracker)\.py$' | sort
# Verify current docs usage for these imports
rg -n --type=md -C1 'from gradata\.(contrib\.patterns|rules)\.(scope|rule_engine|rule_tracker) import' Gradata/docs
Repository: Gradata/gradata
Length of output: 830
🏁 Script executed:
# Find all Python files related to rule modules
find Gradata -name "rule_engine.py" -o -name "scope.py" -o -name "rule_tracker.py" 2>/dev/null
# Check if contrib.patterns directory exists
find Gradata -type d -name "patterns" 2>/dev/null
# Search for rule_engine references anywhere in the codebase
rg -l "rule_engine" --type=py Gradata 2>/dev/null | head -20
Repository: Gradata/gradata
Length of output: 989
🏁 Script executed:
# List contents of contrib/patterns directory
ls -la Gradata/src/gradata/contrib/patterns/
# Also check rules directory structure
ls -la Gradata/src/gradata/rules/
# Search specifically for rule_engine.py
find Gradata -name "rule_engine.py" -type f
Repository: Gradata/gradata
Length of output: 2343
🏁 Script executed:
# Check the rule_engine directory structure
ls -la Gradata/src/gradata/rules/rule_engine/
# Check what's exported from rule_engine module
cat -n Gradata/src/gradata/rules/rule_engine/__init__.py | head -50
Repository: Gradata/gradata
Length of output: 2316
🏁 Script executed:
# Check if contrib/patterns has these modules
ls -la Gradata/src/gradata/contrib/patterns/
# Check what's in contrib/patterns __init__.py
cat -n Gradata/src/gradata/contrib/patterns/__init__.py | head -100
# Check the test_spec_compliance.py at lines 64-66 mentioned in the review
sed -n '60,70p' Gradata/tests/test_spec_compliance.py
Repository: Gradata/gradata
Length of output: 7036
🏁 Script executed:
# Verify all three lines in the docs that need fixing
sed -n '113,140p' Gradata/docs/api/patterns.md
# Double-check the actual module locations
cat -n Gradata/src/gradata/rules/__init__.py | head -40
Repository: Gradata/gradata
Length of output: 1876
Update import paths in documentation to use gradata.rules.* instead of gradata.contrib.patterns.*
Lines 114, 127, and 136 import from gradata.contrib.patterns.*, but these modules are located in gradata.rules.*. The contrib/patterns directory contains pattern implementations (Pipeline, RAG, Guards, etc.), not the rule system modules. This will cause copy-paste import failures for users.
📌 Docs fix
-from gradata.contrib.patterns.scope import classify_scope, register_task_type
+from gradata.rules.scope import classify_scope, register_task_type
...
-from gradata.contrib.patterns.rule_engine import apply_rules, format_rules_for_prompt
+from gradata.rules.rule_engine import apply_rules, format_rules_for_prompt
...
-from gradata.contrib.patterns.rule_tracker import log_application
+from gradata.rules.rule_tracker import log_application
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-from gradata.contrib.patterns.scope import classify_scope, register_task_type
+from gradata.rules.scope import classify_scope, register_task_type
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/docs/api/patterns.md` at line 114, The docs import paths are wrong:
replace any occurrences of gradata.contrib.patterns.* with gradata.rules.* so
the examples import the rule system modules correctly; specifically update the
import that currently reads "from gradata.contrib.patterns.scope import
classify_scope, register_task_type" to "from gradata.rules.scope import
classify_scope, register_task_type" (and do the same for the other imports in
this document that reference gradata.contrib.patterns, updating their module
paths to gradata.rules.* to match the actual symbols).
## Scope
Applies to every public symbol, import path, CLI command, and config key in
`gradata` that is marked for removal via `DeprecationWarning`, a
`.. deprecated::` docstring tag, or a CHANGELOG entry tagged `DEPRECATED`.

## Lifecycle
1. **Mark.** The deprecating release MUST:
   - Emit `DeprecationWarning` at first use (module-level via `__getattr__`
     where possible, otherwise at function entry).
   - Add `.. deprecated:: X.Y.Z` to the symbol's docstring.
   - Add a `DEPRECATED` bullet to CHANGELOG naming the replacement.
2. **Carry.** The symbol SHALL remain callable through **two additional minor
   versions** after the deprecating release, then be removed.
   - Deprecated in 0.6.x → removed in 0.8.0 (earliest).
   - No removal inside a patch bump.
3. **Remove.** The removing release MUST:
   - Add a `BREAKING` bullet to CHANGELOG with the old path → new path.
   - Delete the shim, its tests, and any docs that reference it.

## Exceptions
- **Security fixes** may remove a deprecated path earlier. CHANGELOG must
  state "removed early for security" with a CVE or internal ticket.
- **Pre-0.7.0 relaxation.** Before 0.7.0 ships, early removal is permitted
  if the CHANGELOG `BREAKING` section names the symbol explicitly. Once
  0.7.0 ships, the two-minor-version rule binds.

## Non-deprecation refactors
Moving a non-deprecated module (rename, relocation) counts as a breaking
change and follows the same two-minor-version carry rule via a forwarding
shim. The move itself triggers the deprecation clock.

## Why two versions
One version is the window users realistically notice the warning in their
test suite. The second is the window they schedule the migration. Removing
after one minor version breaks users who followed the warning in good faith.
Add blank lines around the section headings.
markdownlint is already flagging MD022 here, so this doc will keep failing style checks until the headings are separated by blank lines.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 3-3: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 8-8: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 29-29: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 34-34: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/docs/contributing/deprecation-policy.md` around lines 3 - 37, The
headings (e.g., "## Scope", "## Lifecycle", "## Exceptions", "## Non-deprecation
refactors", "## Why two versions") currently lack surrounding blank lines and
trigger markdownlint MD022; update the document by ensuring there is a blank
line before and after each section heading so every "## ..." heading is
separated from surrounding content by one empty line, then run the linter to
verify the MD022 warning is resolved.
```python
# Migration 001 must land first so the migrations table exists etc.
_apply_inline(conn)
_apply_numbered(conn, brain.dir)
summary = module.up(conn, tenant_id="unused")
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -e
# Verify whether `_apply_numbered` includes 002 and how numbered migrations are ordered.
rg -n -C4 '\bdef _apply_numbered\b|\b_apply_numbered\(' Gradata/src/gradata
fd -a '^[0-9]{3}_.*\.py$' Gradata/src/gradata/_migrations
rg -n -C3 '002_add_event_identity' Gradata/src/gradata/_migrations Gradata/tests
Repository: Gradata/gradata
Length of output: 1538
🏁 Script executed:
# Get full _apply_numbered implementation to check if it skips already-applied migrations
sed -n '127,150p' Gradata/src/gradata/_migrations/__init__.py
# Get the test file context around lines 37-40 to see _run_002() usage
sed -n '20,60p' Gradata/tests/test_migration_002_event_identity.py
# Find and show the _run_002 helper definition
rg -n -A20 'def _run_002' Gradata/tests/test_migration_002_event_identity.py
Repository: Gradata/gradata
Length of output: 3394
🏁 Script executed:
# Show migration 002 implementation
cat Gradata/src/gradata/_migrations/002_add_event_identity.py
# Show test assertions in test_migration_002_event_identity.py
sed -n '60,120p' Gradata/tests/test_migration_002_event_identity.py
Repository: Gradata/gradata
Length of output: 10933
Simplify the test helper to call migration 002 once, via the runner path.
The helper currently calls _apply_numbered(conn, brain.dir) (which marks 002 as applied) and then immediately calls module.up(...) directly, exercising both the runner and a bypass of it. While migration 002's up() is idempotent—columns use add_column_if_missing, indexes use create_index_if_missing, and backfill only processes WHERE event_id IS NULL—the test still diverges from production behavior. In production, the runner's has_applied() check prevents re-entry after the first successful run. Refactor _run_002() to remove the direct module.up() call and rely solely on _apply_numbered() to invoke the migration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/tests/test_migration_002_event_identity.py` around lines 37 - 40, The
test helper _run_002 currently calls _apply_numbered(conn, brain.dir) and then
directly invokes module.up(conn, tenant_id="unused"), which bypasses the runner
path; modify _run_002 to only call _apply_numbered(conn, brain.dir) (so the
migration is executed via the runner and marked applied) and remove the direct
module.up(...) call, ensuring the runner's has_applied() logic is exercised and
the test reflects production behavior.
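A minimal sketch of the runner-only flow the prompt describes, using a toy migration registry; `_apply_numbered`, `has_applied`, and `up` here are illustrative stand-ins for the repo's helpers, not the real implementations:

```python
import sqlite3

applied_calls = []  # records how many times the migration body actually runs

def up(conn):
    """Toy stand-in for migration 002's up()."""
    applied_calls.append("up")
    conn.execute("ALTER TABLE events ADD COLUMN event_id TEXT")

def has_applied(conn, mig_id):
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (id TEXT PRIMARY KEY)")
    row = conn.execute("SELECT 1 FROM migrations WHERE id = ?", (mig_id,)).fetchone()
    return row is not None

def _apply_numbered(conn, mig_id="002"):
    """Runner path: skip migrations already marked applied."""
    if has_applied(conn, mig_id):
        return
    up(conn)
    conn.execute("INSERT INTO migrations (id) VALUES (?)", (mig_id,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
_apply_numbered(conn)
_apply_numbered(conn)  # re-entry is a no-op, as in production
print(len(applied_calls))  # -> 1
```

Calling the migration only through the runner keeps the test aligned with the `has_applied()` guard that production relies on.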
```python
def test_new_emit_leaves_identity_columns_null_for_now(tmp_path):
    """emit() does not yet populate identity columns — only Migration 002 backfill does.

    Wiring emit() to write event_id/device_id/content_hash is deferred; this
    test pins the current contract so a future change flips it deliberately.
    """
    brain = init_brain(tmp_path)
    brain.emit("FRESH", "src", {"k": "v"}, [])

    with _conn(brain) as conn:
        row = conn.execute(
            "SELECT event_id, device_id, content_hash FROM events WHERE type = 'FRESH'"
        ).fetchone()
        assert row == (None, None, None)
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -e
# Verify whether the event insert path populates the new identity columns.
rg -n -C4 '\bdef emit\b|INSERT INTO events|event_id|device_id|content_hash' Gradata/src/gradata
```

🏁 Script executed:

```bash
#!/bin/bash
# Find the main brain/emit implementation
find Gradata -name "*.py" -type f | xargs grep -l "def emit" | head -5
```

🏁 Script executed:

```bash
#!/bin/bash
# Find migration 002
find Gradata -name "*002*" -o -name "*migration*" | grep -E "\.py$|002"
```

🏁 Script executed:

```bash
#!/bin/bash
# Search specifically for event insertion logic in the main codebase (not tests)
rg -n "INSERT INTO events" Gradata/src --type py -A 5 -B 2
```

🏁 Script executed:

```bash
#!/bin/bash
# Examine the brain.emit() implementation
wc -l Gradata/src/gradata/brain.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Look at the emit() function in brain.py
rg -n "def emit" Gradata/src/gradata/brain.py -A 30
```

🏁 Script executed:

```bash
#!/bin/bash
# Examine _events.py for emit implementation
cat Gradata/src/gradata/_events.py | head -100
```

🏁 Script executed:

```bash
#!/bin/bash
# Look at migration 002
cat Gradata/src/gradata/_migrations/002_add_event_identity.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Find the emit() function implementation in _events.py
rg -n "^def emit" Gradata/src/gradata/_events.py -A 100
```

🏁 Script executed:

```bash
#!/bin/bash
# Search for event_id population in the insert logic
rg -n "event_id.*=" Gradata/src/gradata/_events.py | head -20
```

🏁 Script executed:

```bash
#!/bin/bash
# Look at the SQLite insert statement in emit() - lines 254-300
sed -n '254,300p' Gradata/src/gradata/_events.py
```

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's any code that populates event_id/device_id/content_hash during emit
rg -n "INSERT INTO events" Gradata/src/gradata/_events.py -A 10
```
Update emit() to populate identity columns before migration 002 is applied.
The test correctly documents current behavior but locks in a critical bug. Migration 002 backfills historical rows with event_id, device_id, and content_hash, but once marked applied, it never reruns. Meanwhile, emit() does not populate these columns during INSERT—it only provides ts, session, type, source, data_json, tags_json, valid_from, valid_until, tenant_id, and schema_version. All events inserted after migration 002 is applied will have NULL identity columns permanently, breaking the push/dedup/conflict logic this PR is shipping.
Either wire emit() to generate and populate these fields, or revise the migration contract so it remains idempotent and can backfill fresh events on each run.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/tests/test_migration_002_event_identity.py` around lines 171 - 184,
emit() currently omits the identity columns event_id, device_id, and
content_hash when inserting new events, which will leave post-migration rows
permanently NULL; update the emit() implementation so its INSERT populates
event_id (UUID or deterministic id), device_id (derive from supplied
session/source/device context), and content_hash (compute from data_json or
canonicalized payload) for every new event, ensuring the same column names are
written as Migration 002 backfills; alternatively, if you prefer the migration
approach, change Migration 002 to be idempotent and rerunnable (detect and
backfill only rows with NULL identity columns) so fresh inserts remain
consistent with the historical backfill contract.
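One possible shape for the emit()-side fix, hedged: a helper that computes the identity trio at insert time so new rows carry the same columns Migration 002 backfills. The canonical-JSON hashing and the uuid4 stand-in (the PR itself uses ULIDs) are assumptions for illustration:

```python
import hashlib
import json
import uuid

def identity_columns(data: dict, device_id: str) -> dict:
    """Compute event_id / device_id / content_hash for a new event row."""
    # Canonicalize the payload so equal data always hashes identically.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return {
        "event_id": uuid.uuid4().hex,  # the PR uses ULIDs; uuid4 is a stand-in here
        "device_id": device_id,
        "content_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }

cols = identity_columns({"k": "v"}, device_id="dev-1")
print(sorted(cols))  # -> ['content_hash', 'device_id', 'event_id']
```

Whatever scheme is chosen, the key property is that emit() and the backfill agree on column names and hashing so push/dedup compares like with like.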
```python
from pathlib import Path

from gradata._types import Lesson, LessonState
from gradata.brain import Brain
from gradata.enhancements.self_improvement import graduate
from tests.conftest import init_brain
```
🧹 Nitpick | 🔵 Trivial
Move Brain behind TYPE_CHECKING.
Brain is only used in annotations here, so importing gradata.brain at runtime adds an unnecessary application import during test collection. Ruff is already flagging this as TC001.
♻️ Proposed cleanup

```diff
 from pathlib import Path
+from typing import TYPE_CHECKING
 from gradata._types import Lesson, LessonState
-from gradata.brain import Brain
 from gradata.enhancements.self_improvement import graduate
 from tests.conftest import init_brain
+
+if TYPE_CHECKING:
+    from gradata.brain import Brain
```

🧰 Tools
🪛 Ruff (0.15.10)
[warning] 16-16: Move application import gradata.brain.Brain into a type-checking block
Move into type-checking block
(TC001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/tests/test_rule_graduated_events.py` around lines 13 - 18, The import
of Brain is only used for type annotations and should be guarded by
TYPE_CHECKING to avoid importing gradata.brain at test collection; add "from
typing import TYPE_CHECKING" and move "from gradata.brain import Brain" into an
"if TYPE_CHECKING:" block, keeping the rest of the imports (Lesson, LessonState,
graduate, init_brain) unchanged and ensuring any annotations referencing Brain
remain valid (use a forward-reference string 'Brain' or rely on future
annotations if present).
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/src/gradata/_doctor.py`:
- Around line 279-302: The cloud doctor probes currently build probe URLs from
api_url/GRADATA_API_URL causing mismatches with env-only overrides; update the
probe functions _check_cloud_reachable, _check_cloud_auth, and
_check_cloud_has_data to call the central resolve_endpoint() (from
cloud/_credentials.py) to compute the effective endpoint and use that for all
probe URLs so the probes honor GRADATA_ENDPOINT / GRADATA_CLOUD_API_BASE and any
other resolution logic in resolve_endpoint().
- Around line 282-286: The disable flag in gradata._doctor (the local `disabled
= os.environ.get("GRADATA_CLOUD_SYNC_DISABLE", "").strip().lower() in (...)`
logic) duplicates parsing and disagrees with the canonical parser; replace this
ad-hoc logic by calling the shared utility
gradata.cloud._credentials.kill_switch_set() (importing it into gradata._doctor)
so that `"on"` and all other truthy variants are interpreted consistently with
the rest of the codebase.
In `@Gradata/src/gradata/_migrations/003_add_sync_state.py`:
- Around line 37-50: The migration currently creates sync_state with brain_id as
the primary key which collapses per-device watermarks; change SYNC_STATE_SQL to
define a composite primary key (tenant_id, device_id) or add a UNIQUE(tenant_id,
device_id) and include tenant_id and device_id columns in the table definition
so watermarks are per-device, and adjust NEW_COLUMNS as needed; then update the
upsert writers referenced (Gradata/src/gradata/cloud/push.py and
Gradata/src/gradata/cloud/_sync_state.py) to use ON CONFLICT (tenant_id,
device_id) instead of ON CONFLICT(brain_id) (and update any code that
sets/assumes brain_id as the PK to use the tenant_id/device_id pair or a
device-qualified brain_id).
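The per-device watermark shape the comment proposes can be sketched against an in-memory SQLite database; column names follow the review, but the exact DDL and upsert are illustrative assumptions, not the repo's migration:

```python
import sqlite3

# Composite primary key so one device's push watermark never clobbers another's.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sync_state (
    tenant_id TEXT NOT NULL,
    device_id TEXT NOT NULL,
    last_push_event_id TEXT,
    last_pull_cursor TEXT,
    PRIMARY KEY (tenant_id, device_id)
)
""")

upsert = """
INSERT INTO sync_state (tenant_id, device_id, last_push_event_id)
VALUES (?, ?, ?)
ON CONFLICT (tenant_id, device_id)
DO UPDATE SET last_push_event_id = excluded.last_push_event_id
"""
conn.execute(upsert, ("t1", "dev-a", "evt-10"))
conn.execute(upsert, ("t1", "dev-b", "evt-20"))
conn.execute(upsert, ("t1", "dev-a", "evt-30"))  # updates dev-a only

print(conn.execute(
    "SELECT device_id, last_push_event_id FROM sync_state ORDER BY device_id"
).fetchall())
# -> [('dev-a', 'evt-30'), ('dev-b', 'evt-20')]
```

With a single-column `brain_id` key, the third upsert would have overwritten dev-b's watermark; the composite key keeps each device's cursor independent.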
- Around line 26-33: Ensure the migrations tracker exists before
checking/marking migration 003 by creating or ensuring the migrations table
up-front in _main(); call a helper (or add logic using resolve_brain_db +
table_exists/create operations) before invoking has_applied() and
mark_applied(). Specifically, in _main() (the function that calls has_applied()
and later mark_applied()), add a step that uses resolve_brain_db() and
table_exists() (or an explicit create-migrations step) and create the migrations
table if missing so mark_applied() never runs without the tracker present; use
the existing helpers (add_column_if_missing/create_index_if_missing) as needed
to create the table schema.
In `@Gradata/tests/test_embeddings_integration.py`:
- Around line 35-36: The nested patch context managers around the test client
should be flattened: replace the nested with patch.object(client, "_embed_api",
return_value=[0.5, 0.6]) as mock_api: with patch.object(client, "_embed_local")
as mock_local: blocks by a single with statement using comma-separated context
managers (e.g., with patch.object(client, "_embed_api", return_value=[0.5, 0.6])
as mock_api, patch.object(client, "_embed_local") as mock_local:) for both
occurrences so the mocks for client._embed_api and client._embed_local are
created in one line and used in the test body.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6f42ec87-a591-4762-a489-d9457f006e8d
📒 Files selected for processing (7)
- Gradata/src/gradata/_cloud_sync.py
- Gradata/src/gradata/_doctor.py
- Gradata/src/gradata/_migrations/003_add_sync_state.py
- Gradata/tests/test_cloud_credentials.py
- Gradata/tests/test_cloud_row_push.py
- Gradata/tests/test_doctor_cloud.py
- Gradata/tests/test_embeddings_integration.py
💤 Files with no reviewable changes (2)
- Gradata/tests/test_cloud_row_push.py
- Gradata/src/gradata/_cloud_sync.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-04-17T17:18:07.439Z
Applied to files:
- Gradata/tests/test_doctor_cloud.py
- Gradata/src/gradata/_doctor.py
🪛 Ruff (0.15.10)
Gradata/tests/test_embeddings_integration.py
[warning] 35-36: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
[warning] 43-44: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
🔇 Additional comments (1)
Gradata/tests/test_embeddings_integration.py (1)
3-3: Good migration to the canonical embeddings module path.

Line 3 and Line 29 are aligned with the service-layer move and keep test behavior unchanged.
Also applies to: 29-29
```python
with patch.object(client, "_embed_api", return_value=[0.5, 0.6]) as mock_api:
    with patch.object(client, "_embed_local") as mock_local:
```
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
FILE="$(fd -i '^test_embeddings_integration\.py$' | head -n1)"
python - <<'PY' "$FILE"
import re
import sys
from pathlib import Path

p = Path(sys.argv[1])
lines = p.read_text().splitlines()
for i in range(len(lines) - 1):
    if re.match(r'^\s*with patch\.object\(.*\):\s*$', lines[i]) and re.match(r'^\s+with patch\.object\(.*\):\s*$', lines[i + 1]):
        print(f"Nested patch context managers at Line {i+1}-{i+2}")
PY
```

🏁 Script executed:

```bash
fd -i 'test_embeddings_integration.py' --type f
```

🏁 Script executed:

```bash
sed -n '30,50p' Gradata/tests/test_embeddings_integration.py
```
🌐 Web query:
SIM117 flake8-simplify nested with statement
💡 Result:
SIM117 is a rule from the flake8-simplify plugin (also ported to Ruff as SIM117) that detects nested `with` statements in the same scope and suggests merging them into a single `with` statement with multiple context managers. This reduces indentation and improves readability, and is supported in Python 3+.

Example of violation:

```python
with A as a:
    with B as b:
        print("hello")
```

Suggested fix:

```python
with A as a, B as b:
    print("hello")
```

The rule was added via GitHub issue #35 and PR #36 in flake8-simplify, implemented by MartinThoma based on a suggestion by Skylion007. It is documented in the plugin's README and PyPI page. Ruff's documentation notes exemptions for certain async context managers (from anyio, asyncio, and trio) when used standalone. Fixes may be unavailable if the merged line exceeds the configured line length.
Citations:
- 1: https://github.com/MartinThoma/flake8-simplify
- 2: [SIM117] Merge with statements for context managers that have same scope MartinThoma/flake8-simplify#35
- 3: https://pypi.org/project/flake8_simplify/
- 4: https://docs.astral.sh/ruff/rules/multiple-with-statements/
Flatten nested patch context managers into a single with statement.
Lines 35-36 and 43-44 contain nested patch.object calls. Merge them using comma-separated context managers to improve readability and eliminate the SIM117 violation:
♻️ Proposed refactor
```diff
-with patch.object(client, "_embed_api", return_value=[0.5, 0.6]) as mock_api:
-    with patch.object(client, "_embed_local") as mock_local:
-        client.embed("test")
-        mock_api.assert_called_once()
-        mock_local.assert_not_called()
+with patch.object(client, "_embed_api", return_value=[0.5, 0.6]) as mock_api, patch.object(
+    client, "_embed_local"
+) as mock_local:
+    client.embed("test")
+    mock_api.assert_called_once()
+    mock_local.assert_not_called()
@@
-with patch.object(client, "_embed_api", side_effect=Exception("down")):
-    with patch.object(client, "_embed_local", return_value=[0.1]) as mock_local:
-        client.embed("test")
-        mock_local.assert_called_once()
+with patch.object(client, "_embed_api", side_effect=Exception("down")), patch.object(
+    client, "_embed_local", return_value=[0.1]
+) as mock_local:
+    client.embed("test")
+    mock_local.assert_called_once()
```

🧰 Tools
+ mock_local.assert_called_once()🧰 Tools
🪛 Ruff (0.15.10)
[warning] 35-36: Use a single with statement with multiple contexts instead of nested with statements
(SIM117)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/tests/test_embeddings_integration.py` around lines 35 - 36, The
nested patch context managers around the test client should be flattened:
replace the nested with patch.object(client, "_embed_api", return_value=[0.5,
0.6]) as mock_api: with patch.object(client, "_embed_local") as mock_local:
blocks by a single with statement using comma-separated context managers (e.g.,
with patch.object(client, "_embed_api", return_value=[0.5, 0.6]) as mock_api,
patch.object(client, "_embed_local") as mock_local:) for both occurrences so the
mocks for client._embed_api and client._embed_local are created in one line and
used in the test body.
Actionable comments posted: 5
♻️ Duplicate comments (7)
Gradata/src/gradata/_migrations/_runner.py (1)
114-135: ⚠️ Potential issue | 🟠 Major

Prevent dropping an index that is not attached to the target table.

At Line 122, `existing_unique` defaults to `False`. If Line 114 finds a global index name match but that index is not listed for `table` in `PRAGMA index_list`, Line 133 can still drop it and recreate it on the wrong table.

Suggested fix

```diff
 if index_exists(conn, index):
     if not unique:
         return False
@@
-    existing_unique = False
+    existing_unique: bool | None = None
     for _, idx_name, is_unique, *_rest in conn.execute(
         f"PRAGMA index_list({table})"
     ).fetchall():
         if idx_name == index:
             existing_unique = bool(is_unique)
             break
+    if existing_unique is None:
+        raise sqlite3.OperationalError(
+            f"Index '{index}' exists but is not attached to table '{table}'"
+        )
     if existing_unique:
         return False
     # IF EXISTS closes the TOCTOU window between index_exists() and the
```

```bash
#!/bin/bash
# Repro SQLite behavior showing why a table-ownership sentinel is required.
python - <<'PY'
import sqlite3

def create_index_if_missing_current(conn, index, table, columns, *, unique=False):
    def table_exists(c, t):
        return c.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (t,)
        ).fetchone() is not None

    def index_exists(c, i):
        return c.execute(
            "SELECT 1 FROM sqlite_master WHERE type='index' AND name=?", (i,)
        ).fetchone() is not None

    if not table_exists(conn, table):
        return False
    if index_exists(conn, index):
        if not unique:
            return False
        existing_unique = False
        for _, idx_name, is_unique, *_ in conn.execute(f"PRAGMA index_list({table})").fetchall():
            if idx_name == index:
                existing_unique = bool(is_unique)
                break
        if existing_unique:
            return False
        conn.execute(f"DROP INDEX IF EXISTS {index}")
    kw = "UNIQUE INDEX" if unique else "INDEX"
    conn.execute(f"CREATE {kw} {index} ON {table} ({columns})")
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a(id INTEGER)")
conn.execute("CREATE TABLE b(id INTEGER)")
conn.execute("CREATE INDEX idx_shared ON b(id)")
print("before:", conn.execute("SELECT name, tbl_name FROM sqlite_master WHERE type='index'").fetchall())
create_index_if_missing_current(conn, "idx_shared", "a", "id", unique=True)
print("after: ", conn.execute("SELECT name, tbl_name FROM sqlite_master WHERE type='index'").fetchall())
PY
```
if index_exists(conn, index): if not unique: return False @@ - existing_unique = False + existing_unique: bool | None = None for _, idx_name, is_unique, *_rest in conn.execute( f"PRAGMA index_list({table})" ).fetchall(): if idx_name == index: existing_unique = bool(is_unique) break + if existing_unique is None: + raise sqlite3.OperationalError( + f"Index '{index}' exists but is not attached to table '{table}'" + ) if existing_unique: return False # IF EXISTS closes the TOCTOU window between index_exists() and the#!/bin/bash # Repro SQLite behavior showing why a table-ownership sentinel is required. python - <<'PY' import sqlite3 def create_index_if_missing_current(conn, index, table, columns, *, unique=False): def table_exists(c, t): return c.execute( "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (t,) ).fetchone() is not None def index_exists(c, i): return c.execute( "SELECT 1 FROM sqlite_master WHERE type='index' AND name=?", (i,) ).fetchone() is not None if not table_exists(conn, table): return False if index_exists(conn, index): if not unique: return False existing_unique = False for _, idx_name, is_unique, *_ in conn.execute(f"PRAGMA index_list({table})").fetchall(): if idx_name == index: existing_unique = bool(is_unique) break if existing_unique: return False conn.execute(f"DROP INDEX IF EXISTS {index}") kw = "UNIQUE INDEX" if unique else "INDEX" conn.execute(f"CREATE {kw} {index} ON {table} ({columns})") return True conn = sqlite3.connect(":memory:") conn.execute("CREATE TABLE a(id INTEGER)") conn.execute("CREATE TABLE b(id INTEGER)") conn.execute("CREATE INDEX idx_shared ON b(id)") print("before:", conn.execute("SELECT name, tbl_name FROM sqlite_master WHERE type='index'").fetchall()) create_index_if_missing_current(conn, "idx_shared", "a", "id", unique=True) print("after: ", conn.execute("SELECT name, tbl_name FROM sqlite_master WHERE type='index'").fetchall()) PY🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_migrations/_runner.py` around lines 114 - 135, the code can drop a same-named index from a different table because existing_unique defaults to False and the PRAGMA check only inspects indexes listed for the target table; change the logic to verify the index actually belongs to the target table before dropping it: after index_exists(conn, index), query sqlite_master for the index row (SELECT tbl_name FROM sqlite_master WHERE type='index' AND name=?), compare tbl_name to the target table, and only proceed to DROP INDEX IF EXISTS {index} (or return False/raise) when they match; alternatively set a boolean found_on_table when iterating PRAGMA index_list({table}) and only drop/recreate when found_on_table is true (and preserve the existing unique-check logic using existing_unique).

Gradata/src/gradata/services/embeddings.py (3)
45-55: ⚠️ Potential issue | 🔴 Critical

Restrict the localhost exception to HTTP(S).

Line 50 still returns `True` for any scheme on `localhost`/`127.0.0.1`, so a misconfigured `GRADATA_API_URL` like `file://localhost/...` bypasses the SSRF guard and will still carry the bearer token. Gate the localhost exception on `parsed.scheme in {"http", "https"}`.

🔒 Proposed fix

```diff
 parsed = urlparse(url)
-if parsed.hostname in ("localhost", "127.0.0.1"):
-    return True
 host = parsed.hostname or ""
+if host in ("localhost", "127.0.0.1"):
+    return parsed.scheme in {"http", "https"}
 return bool(
     parsed.scheme == "https"
     and (host == "gradata.ai" or host.endswith(".gradata.ai"))
 )
```
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/services/embeddings.py` around lines 45 - 55, The _is_trusted_url function currently treats any scheme as trusted for localhost/127.0.0.1; update the localhost check so it only returns True when parsed.hostname is "localhost" or "127.0.0.1" AND parsed.scheme is one of {"http", "https"} (use parsed.scheme and parsed.hostname variables), then keep the existing HTTPS check for gradata.ai and subdomains unchanged; this prevents non-HTTP schemes like file:// from bypassing the SSRF guard.
87-90: ⚠️ Potential issue | 🟡 Minor

Fix the documented range for `semantic_similarity`.

`gradata._math.cosine_similarity()` can return values in `[-1.0, 1.0]`, so the current `[0.0, 1.0]` contract is incorrect.

📝 Proposed fix

```diff
 def semantic_similarity(self, text_a: str, text_b: str) -> float:
     """Cosine similarity between embeddings of two texts.

-    Returns 0.0 if either text is empty, otherwise [0.0, 1.0].
+    Returns 0.0 if either text is empty, otherwise a value in [-1.0, 1.0].
     """
```
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/services/embeddings.py` around lines 87 - 90, The docstring for semantic_similarity incorrectly states the similarity range as [0.0, 1.0]; update the triple-quoted docstring in the semantic_similarity method to reflect that it returns values in [-1.0, 1.0] (or 0.0 if either input is empty), and mention that it delegates to gradata._math.cosine_similarity so the contract matches that function's output.
111-116: ⚠️ Potential issue | 🟠 Major

Normalize nullable descriptions before embedding.

`lesson.get("description", "")` does not protect the case where the key exists with `None`; in local mode that reaches `text.lower()` and aborts clustering for the whole batch. Normalize to a string and keep `None` for blank values so the guard at Line 140 actually skips them.

🛠️ Proposed fix

```diff
 if not lessons:
     return []
 if client is None:
     client = get_client()
-vectors = [client.embed(l.get("description", "")) for l in lessons]
+vectors = []
+for lesson in lessons:
+    desc = lesson.get("description")
+    desc = desc.strip() if isinstance(desc, str) else ""
+    vectors.append(client.embed(desc) if desc else None)
 parent = list(range(len(lessons)))
 rank = [0] * len(lessons)
```
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/services/embeddings.py` around lines 111 - 116, in cluster_lessons_by_similarity, the list comprehension using l.get("description", "") allows non-string or empty values to reach the embedder and crash (e.g., None causing .lower()), so replace that comprehension with normalization: for each lesson get desc = lesson.get("description"); if desc is None keep None (so the later guard at Line 140 can skip it), otherwise convert to str(desc).strip() and if that yields an empty string set to None; then call client.embed only with the normalized non-None strings. Ensure you update the variable used later (vectors/descriptions) to reflect this normalization and preserve lesson ordering.

Gradata/src/gradata/cloud/materializer.py (1)
355-365: ⚠️ Potential issue | 🟠 Major

Defensively parse winner numerics before building `MaterializedRule`.

`RULE_CONFLICT_RESOLVED` can supply `winning_event`/`winning_snapshot` payloads that bypass `_apply_tier()`. The direct `float(...)`/`int(...)` casts here will still raise on malformed values and abort the whole pull/materialization.

Suggested fix

```diff
 for key, evt in winners.items():
     data = evt.get("data") or {}
+    try:
+        confidence = float(data.get("confidence") or 0.0)
+    except (TypeError, ValueError):
+        confidence = 0.0
+    try:
+        fire_count = int(data.get("fire_count") or 0)
+    except (TypeError, ValueError):
+        fire_count = 0
     result.rules[key] = MaterializedRule(
         category=key[0],
         description=str(data.get("description") or ""),
         state=str(data.get("new_state") or ""),
-        confidence=float(data.get("confidence") or 0.0),
-        fire_count=int(data.get("fire_count") or 0),
+        confidence=confidence,
+        fire_count=fire_count,
         winning_event_ts=str(evt.get("ts") or ""),
         winning_device_id=str(data.get("device_id") or ""),
     )
```
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/cloud/materializer.py` around lines 355 - 365, the loop that constructs MaterializedRule from winners uses direct float(...) and int(...) casts which can raise on malformed values; update the construction in the for key, evt in winners.items() block to defensively parse confidence and fire_count (from evt.get("data") or {}) using safe conversion (e.g., try/except or helper functions like safe_float/safe_int) that return a default 0.0 / 0 on failure or non-numeric input, then pass those sanitized values into MaterializedRule(...); ensure winning_event_ts and winning_device_id are also coerced to strings safely as currently done.

Gradata/tests/test_cloud_materializer.py (1)
385-388: ⚠️ Potential issue | 🟡 Minor

Drop the unused f-string prefix.

This literal has no interpolation, so Ruff will keep flagging F541.

Suggested fix

```diff
-            ts=f"2026-04-20T00:30:00Z",
+            ts="2026-04-20T00:30:00Z",
```
Verify each finding against the current code and only fix it if needed. In `@Gradata/tests/test_cloud_materializer.py` around lines 385 - 388, the ts argument in the _evt(...) call is using an unnecessary f-string (ts=f"2026-04-20T00:30:00Z"); remove the leading "f" so it becomes a plain string literal while leaving any truly interpolated args (e.g., category=f"cat{k}") unchanged; update the _evt invocations in the test where ts is a static timestamp to eliminate the unused f-string prefix.

Gradata/src/gradata/cloud/pull.py (1)
198-210: ⚠️ Potential issue | 🟠 Major

Reject malformed `events`/`watermark`/`end_of_stream` values instead of coercing them.

`parsed.get("events") or []` silently accepts falsy non-lists like `""`/`0`, and `bool(parsed.get("end_of_stream", True))` turns `"false"` into `True`. A non-scalar `watermark` also flows into pagination/persistence. Those 200s should still return `{"status": "error", "reason": "malformed_response"}` rather than looking successful.

Suggested fix

```diff
-page_events = parsed.get("events") or []
-if not isinstance(page_events, list) or any(
+if "events" in parsed:
+    page_events = parsed["events"]
+else:
+    page_events = []
+if not isinstance(page_events, list) or any(
     not isinstance(evt, dict) for evt in page_events
 ):
     log.warning("events/pull: events field is not a list of objects")
     summary["status"] = "error"
     summary["reason"] = "malformed_response"
     return summary
 all_events.extend(page_events)
 page_watermark = parsed.get("watermark")
-if page_watermark:
-    final_watermark = page_watermark
-end_of_stream = bool(parsed.get("end_of_stream", True))
+if page_watermark is not None and not isinstance(
+    page_watermark, (str, int, float)
+):
+    log.warning("events/pull: watermark is not scalar")
+    summary["status"] = "error"
+    summary["reason"] = "malformed_response"
+    return summary
+eos = parsed.get("end_of_stream", True)
+if not isinstance(eos, bool):
+    log.warning("events/pull: end_of_stream is not boolean")
+    summary["status"] = "error"
+    summary["reason"] = "malformed_response"
+    return summary
+if page_watermark is not None:
+    final_watermark = str(page_watermark)
+end_of_stream = eos
```
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/cloud/pull.py` around lines 198 - 210, The code currently coerces falsy/non-boolean values; instead validate types strictly: when reading parsed.get("events") check whether "events" is present and is a list and every item is a dict (use the page_events variable and the existing any(...) check) and if not, set summary["status"]="error" and summary["reason"]="malformed_response" and return; for watermark, read parsed.get("watermark") into page_watermark and reject (error/return) if it is present but not a scalar type you expect (e.g., not instance of (str, int, float)); for end_of_stream, do not coerce with bool(...): if "end_of_stream" in parsed then require isinstance(parsed["end_of_stream"], bool) and assign to end_of_stream, otherwise treat it as missing and continue; update uses of final_watermark and all_events accordingly so malformed inputs are rejected rather than coerced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/src/gradata/cloud/materializer.py`:
- Around line 141-145: The current try/except around parsing e_conf and i_conf
causes both to be reset to 0.0 if either cast fails; change this to parse each
confidence independently (e.g., separate try/except blocks or a small helper
like safe_parse_float) so that a malformed e_data["confidence"] does not wipe
out i_conf and vice versa; update the parsing for e_conf and i_conf (referencing
e_data, i_data, e_conf, i_conf) inside the same function in materializer.py to
preserve the valid confidence value when the other fails.
In `@Gradata/src/gradata/cloud/pull.py`:
- Around line 151-159: The User-Agent is hard-coded in the Request construction
(see req variable in pull.py) which causes incorrect SDK version reporting;
change the header to use a central version constant or package metadata instead
of "gradata-sdk/0.6" (e.g., import the shared VERSION or __version__ used
elsewhere in the package and set User-Agent to f"gradata-sdk/{VERSION}") so it
automatically reflects the current release; update the import at the top of the
module to reference that constant and use it in the headers for the Request
creation.
In `@Gradata/src/gradata/cloud/push.py`:
- Around line 52-73: _fetch_events_since currently returns events across all
tenants which can cause cross-tenant uploads; add a tenant_id parameter to
_fetch_events_since(conn, last_event_id, limit, tenant_id) and include a WHERE
condition filtering on tenant_id in the SELECT (e.g. "AND tenant_id = ?") and
pass tenant_id as a bound param before the LIMIT; update every call site that
invokes _fetch_events_since (including the batch-stamping callsite that later
sets tenant_id) to pass the active tenant_id so the query only returns rows for
that tenant.
- Around line 199-205: Validate push_pending_events input parameters
(backoff_base, timeout, max_retries, chunk_size) at the top of the function
before any network or sleep calls and return a structured error summary dict
instead of letting exceptions propagate; specifically check backoff_base >= 0,
timeout >= 0, max_retries >= 0 (and chunk_size > 0 if applicable), and if any
check fails return the same summary shape used on other failures with an error
message and non-success status. Update push_pending_events to perform these
checks before entering the try/except that currently only catches sqlite3.Error
so that invalid inputs never reach time.sleep or urllib.request.urlopen and
violate the "never raise" contract.
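The guard the prompt describes can be sketched as a small pre-flight check run before any sleep or network call; the summary keys here are assumptions about the shape the real `push_pending_events()` returns on failure:

```python
def _validate_push_params(backoff_base, timeout, max_retries, chunk_size):
    """Return an error summary for bad knobs, or None when they are sane."""
    if backoff_base < 0 or timeout < 0 or max_retries < 0 or chunk_size <= 0:
        # Structured failure instead of raising, honoring a "never raise" contract.
        return {"status": "error", "reason": "invalid_parameters", "pushed": 0}
    return None

bad = _validate_push_params(-1, 5, 3, 100)
ok = _validate_push_params(0.5, 5, 3, 100)
print(bad["reason"], ok)  # -> invalid_parameters None
```

Running this first means a negative backoff or zero chunk size can never reach `time.sleep` or `urllib.request.urlopen` and surface as an unexpected exception.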
In `@Gradata/src/gradata/services/embeddings.py`:
- Around line 65-68: The code currently returns data["embedding"] directly;
update the embedding-returning method (the function that builds req =
Request(str(self.api_url), ...) and currently returns data["embedding"]) to
validate the payload: confirm the "embedding" key exists, that its value is a
list/sequence of numeric types, and that its length matches the expected
dimensionality (use the class field used elsewhere, e.g., self.dim or the
expected_dim variable) and raise a ValueError (or similar) on any mismatch so
callers (like embed()) will trigger the existing fallback path instead of
failing later in clustering/similarity.
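A sketch of the suggested payload checks, written as a standalone helper (the real code would do this inline in the method; names here are illustrative):

```python
def validate_embedding_payload(data, expected_dim):
    """Sketch: validate an embedding API response before returning the vector."""
    if not isinstance(data, dict) or "embedding" not in data:
        raise ValueError("response missing 'embedding' key")
    vec = data["embedding"]
    if not isinstance(vec, (list, tuple)):
        raise ValueError("'embedding' is not a sequence")
    if len(vec) != expected_dim:
        raise ValueError(f"expected {expected_dim} dims, got {len(vec)}")
    # bool is a subclass of int, so exclude it explicitly
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec):
        raise ValueError("non-numeric entry in embedding")
    return list(vec)
```

Raising `ValueError` (rather than returning a default) is what lets the caller's existing fallback path take over.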
---
Duplicate comments:
In `@Gradata/src/gradata/_migrations/_runner.py`:
- Around line 114-135: The code can drop a same-named index from a different
table because existing_unique defaults to False and the PRAGMA check only
inspects indexes listed for the target table; change the logic to verify the
index actually belongs to the target table before dropping it: after
index_exists(conn, index) query sqlite_master for the index row (SELECT tbl_name
FROM sqlite_master WHERE type='index' AND name=?), compare tbl_name to the
target table, and only proceed to DROP INDEX IF EXISTS {index} (or return
False/raise) when they match; alternatively set a boolean found_on_table when
iterating PRAGMA index_list({table}) and only drop/recreate when found_on_table
is true (and preserve the existing unique-check logic using existing_unique).
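The `sqlite_master` ownership check can be sketched as a small helper (the helper name is hypothetical; the query is the one the comment suggests):

```python
import sqlite3

def index_on_table(conn, index, table):
    """Sketch: confirm an index belongs to `table` before any DROP INDEX."""
    row = conn.execute(
        "SELECT tbl_name FROM sqlite_master WHERE type='index' AND name=?",
        (index,),
    ).fetchone()
    return row is not None and row[0] == table
```

Only when this returns `True` should the migration proceed to drop and recreate the index.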
In `@Gradata/src/gradata/cloud/materializer.py`:
- Around line 355-365: The loop that constructs MaterializedRule from winners
uses direct float(...) and int(...) casts which can raise on malformed values;
update the construction in the for key, evt in winners.items() block to
defensively parse confidence and fire_count (from evt.get("data") or {}) using
safe conversion (e.g., try/except or helper functions like safe_float/safe_int)
that return a default 0.0 / 0 on failure or non-numeric input, then pass those
sanitized values into MaterializedRule(...); ensure winning_event_ts and
winning_device_id are also coerced to strings safely as currently done.
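The `safe_float` / `safe_int` helpers the comment mentions could look like this sketch (defaults of `0.0` / `0` per the suggestion):

```python
def safe_float(value, default=0.0):
    """Return float(value), or `default` for None/malformed input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

def safe_int(value, default=0):
    """Return int(value), tolerating "3.0"-style strings; `default` on failure."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return default
```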
In `@Gradata/src/gradata/cloud/pull.py`:
- Around line 198-210: The code currently coerces falsy/non-boolean values;
instead validate types strictly: when reading parsed.get("events") check whether
"events" is present and is a list and every item is a dict (use the page_events
variable and the existing any(...) check) and if not, set
summary["status"]="error" and summary["reason"]="malformed_response" and return;
for watermark, read parsed.get("watermark") into page_watermark and reject
(error/return) if it is present but not a scalar type you expect (e.g., not
instance of (str, int, float)); for end_of_stream, do not coerce with bool(...):
if "end_of_stream" in parsed then require isinstance(parsed["end_of_stream"],
bool) and assign to end_of_stream, otherwise treat it as missing and continue;
update uses of final_watermark and all_events accordingly so malformed inputs
are rejected rather than coerced.
In `@Gradata/src/gradata/services/embeddings.py`:
- Around line 45-55: The _is_trusted_url function currently treats any scheme as
trusted for localhost/127.0.0.1; update the localhost check so it only returns
True when parsed.hostname is "localhost" or "127.0.0.1" AND parsed.scheme is one
of {"http", "https"} (use parsed.scheme and parsed.hostname variables), then
keep the existing HTTPS check for gradata.ai and subdomains unchanged; this
prevents non-HTTP schemes like file:// from bypassing the SSRF guard.
- Around line 87-90: The docstring for semantic_similarity incorrectly states
the similarity range as [0.0, 1.0]; update the triple-quoted docstring in the
semantic_similarity method to reflect that it returns values in [-1.0, 1.0] (or
0.0 if either input is empty), and mention that it delegates to
gradata._math.cosine_similarity so the contract matches that function's output.
- Around line 111-116: In cluster_lessons_by_similarity, the list comprehension
using l.get("description", "") allows non-string or empty values to reach the
embedder and crash (e.g., None causing .lower()), so replace that comprehension
with normalization: for each lesson get desc = lesson.get("description"); if
desc is None keep None (so the later guard at Line 140 can skip it), otherwise
convert to str(desc).strip() and if that yields an empty string set to None;
then call client.embed only with the normalized non-None strings. Ensure you
update the variable used later (vectors/descriptions) to reflect this
normalization and preserve lesson ordering.
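The normalization step can be sketched as a helper that preserves lesson ordering (the helper name is hypothetical; the real change would be inline in `cluster_lessons_by_similarity`):

```python
def normalize_descriptions(lessons):
    """Sketch: map each lesson to a clean description string or None,
    preserving order so indexes still line up with `lessons`."""
    out = []
    for lesson in lessons:
        desc = lesson.get("description")
        if desc is not None:
            desc = str(desc).strip() or None  # blank/whitespace becomes None
        out.append(desc)
    return out
```

Only the non-`None` entries would then be passed to `client.embed`, with the `None` slots skipped by the existing guard.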
In `@Gradata/tests/test_cloud_materializer.py`:
- Around line 385-388: The ts argument in the _evt(...) call is using an
unnecessary f-string (ts=f"2026-04-20T00:30:00Z"); remove the leading "f" so it
becomes a plain string literal while leaving any truly interpolated args (e.g.,
category=f"cat{k}") unchanged; update the _evt invocations in the test where ts
is a static timestamp to eliminate the unused f-string prefix.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9aee0fca-35ea-4a54-8834-7760ddcca719
📒 Files selected for processing (8)
- Gradata/src/gradata/_doctor.py
- Gradata/src/gradata/_migrations/_runner.py
- Gradata/src/gradata/cloud/materializer.py
- Gradata/src/gradata/cloud/pull.py
- Gradata/src/gradata/cloud/push.py
- Gradata/src/gradata/services/embeddings.py
- Gradata/tests/test_cloud_events_push.py
- Gradata/tests/test_cloud_materializer.py
📜 Review details
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
- Gradata/src/gradata/_doctor.py
- Gradata/tests/test_cloud_materializer.py
- Gradata/tests/test_cloud_events_push.py
🪛 Ruff (0.15.10)
Gradata/tests/test_cloud_materializer.py
[error] 387-387: f-string without any placeholders
Remove extraneous f prefix
(F541)
Gradata/src/gradata/cloud/materializer.py
[warning] 35-35: Import from collections.abc instead: Iterable
Import from collections.abc
(UP035)
[warning] 369-375: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🔇 Additional comments (9)
Gradata/src/gradata/_doctor.py (7)
24-24: LGTM! The `urllib.parse` import is correctly added to support the `urlsplit()` call in `_check_cloud_reachable()`.
253-274: LGTM! The error messages correctly guide users to the new `gradata cloud enable` CLI command, consistent with the legacy login/logout removal documented in the PR objectives.
277-296: LGTM! The `_resolve_doctor_endpoint()` function correctly delegates to `resolve_endpoint()` from `_credentials.py`, ensuring doctor probes honor the same endpoint precedence chain as the rest of the SDK. The defensive exception handling with fallback is appropriate for a diagnostics tool.
299-332: LGTM! The function correctly imports and uses `kill_switch_set()` from the credentials module, with a fallback that exactly matches the canonical parser's truthy set ("1", "true", "yes", "on"). This ensures consistent behavior between doctor checks and the rest of the SDK.
335-353: LGTM! The function correctly uses `urlsplit()` to parse the endpoint URL and extract the actual port, ensuring that custom endpoints like `https://host:8443/api/v1` are probed at the correct port rather than always defaulting to `:443`.
379-405: LGTM! The auth check now uses `_resolve_doctor_endpoint()` for consistent endpoint resolution, and the 401/403 error message correctly directs users to `gradata cloud enable`.
408-443: LGTM! The data check now uses `_resolve_doctor_endpoint()`, completing the consistent endpoint resolution across all cloud probe functions.
Gradata/src/gradata/_migrations/_runner.py (2)
43-44: Behavior-preserving cleanup looks good. The compact `execute(...).fetchone()` form keeps the same semantics and improves readability.
105-111: Docstring clarification is solid. Good addition explaining why UNIQUE backing is required for `ON CONFLICT` targets.
Addresses 12 review items across two incremental CR rounds.

Major correctness / schema:
- _runner.create_index_if_missing: verify the index belongs to the target table via sqlite_master before dropping. Prevents an index of the same name on a different table from being silently dropped during the UNIQUE upgrade.
- push._fetch_events_since: thread tenant_id into the SELECT with "(tenant_id = ? OR tenant_id IS NULL)" so foreign-tenant rows are never uploaded under the current tenant's identity. Legacy NULL rows still push so pre-migration work is not stranded.
- push.push_pending_events: validate chunk_size / max_retries / backoff_base / timeout up front and return reason=invalid_params instead of letting negative values reach time.sleep or the network loop.
- materializer._apply_tier: parse e_conf / i_conf independently; a bad cast on one side no longer resets both to 0.0 and masks legitimate Tier 2 drift as zero-delta Tier 1.
- services.embeddings._embed_api: validate response shape (dict, embedding list, dim match, numeric entries). Malformed payloads raise ValueError so embed() falls through to the local path instead of poisoning the LRU cache.

Minor hardening:
- materializer: defensive numeric coercion in the MaterializedRule builder so a non-numeric confidence/fire_count cannot abort the whole pass.
- materializer: collapse the _MATERIALIZER_EVENT_TYPES duplicate into the existing _MATERIALIZABLE_EVENT_TYPES source of truth.
- materializer: switch the Iterable import to collections.abc.
- services.embeddings.semantic_similarity: docstring range is [-1, 1].
- services.embeddings.cluster_lessons_by_similarity: skip blank/whitespace descriptions instead of embedding empty strings.
- pull/push: User-Agent now embeds the resolved SDK version rather than the hardcoded 0.6 placeholder.
- tests/test_cloud_materializer.py: drop the extraneous f"" prefix on a literal ts.
New regression tests (+13):
- test_cloud_events_push: invalid-tunables short-circuit, foreign-tenant rows are filtered, legacy NULL-tenant rows still push.
- test_migration_runner_helpers: cross-table index drop refused, same-table non-UNIQUE -> UNIQUE upgrade works, idempotent when already UNIQUE.
- test_embeddings: API shape validation (missing key, wrong dim, non-numeric) raises; embed() falls back to local on malformed response; cluster skips blank descriptions.
- test_cloud_materializer: independent confidence parse still detects drift; materialize survives non-numeric winner fields.

Full suite: 4036 passed, 2 skipped.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/src/gradata/cloud/materializer.py`:
- Around line 97-108: The category normalization in the materializer currently
uses .lower() (variable category) which mismatches the apply path’s .upper() and
causes failed lookups; update the normalization so category =
str(data.get("category") or "").strip().upper() (keeping pattern_hash,
description and ident logic unchanged) so the returned tuple (category, ident)
matches the keys built in the apply path and prevents duplicate lesson creation.
- Around line 383-389: The exported names in the __all__ list
(CONFLICT_THRESHOLD, Conflict, MaterializedRule, MaterializeResult, materialize)
are not alphabetically sorted; update the __all__ assignment so the strings are
ordered lexicographically (e.g., "CONFLICT_THRESHOLD", "Conflict",
"MaterializedRule", "MaterializeResult", "materialize") to satisfy
static-analysis/style checks and maintain consistency.
In `@Gradata/src/gradata/cloud/push.py`:
- Around line 287-289: tenant_for(brain) and get_or_create_device_id(brain) are
invoked unguarded inside push_pending_events which violates the "never raises"
contract; wrap the calls to tenant_for and get_or_create_device_id in a
try/except that catches OSError (or a broader Exception if necessary), log the
error with context (including brain) using the module logger, and return/exit
push_pending_events gracefully (or skip pushing events) rather than allowing the
exception to propagate; update the code paths around tenant_id and device_id
initialization in push_pending_events to rely on the guarded values so no file
I/O errors escape.
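The guard could be factored as a sketch like the following. The helper name, the `(None, None)` sentinel, and passing the resolvers as arguments are all hypothetical; the real fix would wrap the two calls inline in `push_pending_events`.

```python
import logging

log = logging.getLogger("gradata.cloud.push")

def resolve_identity(brain, tenant_for, get_or_create_device_id):
    """Sketch: resolve (tenant_id, device_id) without letting I/O errors escape."""
    try:
        return tenant_for(brain), get_or_create_device_id(brain)
    except OSError as exc:
        log.warning("identity resolution failed for %s: %s", brain, exc)
        return None, None  # caller returns an error summary instead of pushing
```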
In `@Gradata/src/gradata/services/embeddings.py`:
- Around line 113-120: get_client() has a race when multiple threads call it
concurrently and may create multiple EmbeddingClient instances; add a
module-level threading.Lock (e.g., _default_client_lock) and use a
double-checked locking pattern in get_client: if _default_client is None,
acquire the lock, check again, then instantiate EmbeddingClient and assign to
_default_client; ensure you import threading and only hold the lock briefly
while constructing/assigning to avoid contention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 81374656-a56f-4359-a2d4-58a70d836403
📒 Files selected for processing (9)
- Gradata/src/gradata/_migrations/_runner.py
- Gradata/src/gradata/cloud/materializer.py
- Gradata/src/gradata/cloud/pull.py
- Gradata/src/gradata/cloud/push.py
- Gradata/src/gradata/services/embeddings.py
- Gradata/tests/test_cloud_events_push.py
- Gradata/tests/test_cloud_materializer.py
- Gradata/tests/test_embeddings.py
- Gradata/tests/test_migration_runner_helpers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest macos-latest / py3.12
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
- Gradata/src/gradata/cloud/push.py
- Gradata/tests/test_cloud_materializer.py
- Gradata/tests/test_cloud_events_push.py
🪛 Ruff (0.15.10)
Gradata/src/gradata/cloud/materializer.py
[warning] 383-389: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🔇 Additional comments (45)
Gradata/src/gradata/_migrations/_runner.py (3)
43-43: LGTM! Formatting cleanup that collapses the query onto a single line while preserving the same `row is None` / `row is not None` logic.
97-111: Clean API extension with backward compatibility. The keyword-only `unique` parameter maintains full backward compatibility with existing migration callers (all current usages in 001/002/003 use the default `unique=False`). The docstring clearly documents the SQLite-specific constraint requirement.
114-145: All prior review issues addressed; logic is correct. The implementation properly handles the three flagged scenarios:
- Cross-table guard (lines 119-124): queries `sqlite_master.tbl_name` to refuse mutation when the index belongs to a different table.
- Reliable UNIQUE detection (lines 127-138): uses `PRAGMA index_list` rather than substring matching on DDL text.
- TOCTOU window (line 143): `DROP INDEX IF EXISTS` closes the race between the existence check and the drop.
The logic flow is sound: after confirming table ownership, the `existing_unique` flag from PRAGMA is authoritative.
Gradata/tests/test_migration_runner_helpers.py (3)
16-39: Solid regression test for the cross-table bug. This test directly reproduces the scenario flagged in prior reviews: an index with the same name on a different table must not be dropped. The assertions verify both the return value and the index's continued ownership.
42-58: Good coverage of the happy-path upgrade. Verifies the non-UNIQUE → UNIQUE upgrade still works on the same table. The `for`/`else` pattern with `AssertionError` ensures the test fails explicitly if the index disappears.
61-69: Covers the idempotent no-op case. Ensures the function returns `False` without mutation when the index is already UNIQUE. Consider adding a simple test for the base case (index doesn't exist, gets created) for completeness, though the core regression scenarios are well covered.
[approve_code_changes, suggest_optional_refactor]
Gradata/src/gradata/cloud/pull.py (6)
72-76: LGTM: `no_db` status now properly returned. The status handling for missing `system.db` correctly returns `"no_db"` as the stable status value, matching the documented contract.
99-111: LGTM: endpoint resolution now uses the shared resolver. The code correctly routes through `_creds.resolve_endpoint()` so `GRADATA_ENDPOINT` / `GRADATA_CLOUD_API_BASE` env overrides take effect consistently with push and CLI.
191-206: LGTM: response shape validation is thorough. The code now validates that `parsed` is a dict and that `events` is a list of dicts before proceeding, preventing crashes from malformed server responses.
266-292: LGTM: watermark persistence failure now blocks success telemetry. The code correctly captures the return value of `update_pull_cursor()`, checks for persistence failure, and returns an error status before emitting `CLOUD_SYNC_COMPLETED`. This prevents callers from treating the run as successful when the cursor wasn't persisted.
247-264: Apply path only writes lessons when there are rules or conflicts to merge. When `apply=True` but the materialized result has neither rules nor conflicts (e.g., the page contained only tombstones or events from other devices), the lessons file is left untouched but the watermark is still persisted (lines 271-292). This is correct behavior per the comment on lines 266-270.
294-312: Telemetry emission is best-effort and doesn't affect return status. The `CLOUD_SYNC_COMPLETED` event emission is wrapped in a try/except that logs failures without changing the summary status. This is appropriate for telemetry that shouldn't block the success path.
Gradata/tests/test_cloud_materializer.py (6)
23-46: LGTM: the test helper is well-structured. The `_evt()` helper provides sensible defaults for all required fields while allowing tests to override specific values. The conditional `pattern_hash` inclusion keeps test events minimal.
96-128: Good Tier 2 conflict coverage. The tests correctly validate:
- Large confidence drift (≥0.15) triggers conflict
- State disagreement triggers conflict
- Post-conflict events are held (not advancing state)
The assertion `result.events_skipped >= 1` on line 128 appropriately allows for implementation flexibility.
131-241: Comprehensive RULE_CONFLICT_RESOLVED test coverage. The tests cover all resolution paths:
- Resolution with matching graduation history
- Resolution with embedded `winning_event` (incremental pull case)
- Resolution with `winning_snapshot` reconstruction
- Mismatched `winning_ts` clears the hold but allows subsequent graduation
This aligns with the materializer's fallback chain implementation.
338-404: Convergence property tests are valuable ship-gate coverage. The tests verify that 200+ shuffled orderings yield identical final state after sorting by `ts`. The second test (test_convergence_with_injected_conflicts) validates that conflict keys remain stable across permutations, which is important for deterministic Tier 2 behavior.
407-449: Good regression coverage for malformed confidence handling. `test_independent_confidence_parse_still_detects_drift` validates that a malformed confidence on one side doesn't zero out the other side's valid confidence. `test_materialize_survives_non_numeric_confidence_on_winner` confirms the defensive parsing in the winner-building loop.
333-335: LGTM: threshold constant pinned to spec. This assertion ensures `CONFLICT_THRESHOLD` stays at 0.15 per the spec, catching accidental changes.
Gradata/src/gradata/cloud/materializer.py (6)
141-151: LGTM: independent confidence parsing implemented. Each confidence value is now parsed in its own try/except block, preventing a malformed value on one side from zeroing out the other side's valid confidence. This addresses the previously flagged Tier 2 bypass issue.
241-255: LGTM: event type gating is correct. Non-materializable event types are properly skipped (lines 241-246), and `META_RULE_SYNTHESIZED` is explicitly held per spec §4 (lines 247-255). This prevents non-graduation events from influencing winner selection.
267-321: LGTM: RULE_CONFLICT_RESOLVED handles all resolution paths. The implementation correctly:
- Looks up matching history by `(winning_ts, winning_device_id)`
- Falls back to the embedded `winning_event` for incremental pulls
- Falls back to `winning_snapshot` reconstruction
- Clears the hold even if no winner can be reconstructed (allowing the next graduation to materialize)
357-379: LGTM: defensive numeric coercion in winner building. The try/except blocks around `confidence` and `fire_count` parsing prevent malformed data from crashing the entire materialization pass.
170-204: LGTM: DB event loading is robust. The function handles missing DB files, SQLite errors, and malformed JSON gracefully by returning partial results rather than raising. The comment on lines 181-184 explains the consolidation of event type sets.
351-355: LGTM: conflict pruning prevents stale events. Conflicts resolved later in the same stream are correctly filtered out before returning, preventing callers from emitting `RULE_CONFLICT` events for already-adjudicated keys.
Gradata/src/gradata/cloud/push.py (5)
53-123: LGTM! The `_fetch_events_since` function correctly implements tenant-scoped event fetching with proper handling of legacy `NULL` tenant rows, safe JSON parsing with fallbacks, and correct parameter binding order.
126-139: LGTM! The watermark read correctly queries by the `(tenant_id, device_id)` composite key and handles a missing table/row gracefully.
142-170: LGTM! The watermark upsert correctly uses `ON CONFLICT(tenant_id, device_id)` to target the composite unique index, maintaining per-device isolation while preserving `brain_id` for backward compatibility.
173-207: LGTM! The batch POST implementation correctly handles retry logic with exponential backoff, distinguishes between fatal 4xx and retriable 5xx/transport errors, and includes proper headers including a versioned User-Agent.
291-368: LGTM! The database workflow is correctly wrapped with `sqlite3.Error` handling, the watermark only advances after full batch acceptance, and the connection is properly closed in the `finally` block. The partial 2xx safeguards (rejected list and accepted count mismatch) are well-implemented.
Gradata/tests/test_cloud_events_push.py (6)
24-40: LGTM! The isolation fixture correctly rebinds keyfile paths and clears all relevant environment variables including the new endpoint overrides.
84-96: Test schema relies on the production `_ensure_sync_state_schema` for the composite key index. The test fixture creates `sync_state` without the `UNIQUE INDEX idx_sync_state_tenant_device` that `push.py` requires for `ON CONFLICT(tenant_id, device_id)`. This works because `push_pending_events` calls `_ensure_sync_state_schema()` before any watermark operations, which idempotently adds the missing index. This is the intended behavior per the comment in `_sync_state.py` ("lets a callsite use the cursor API on brains that predate 003"). No change needed, but documenting this coupling for future maintainers.
138-153: LGTM! The mock response class correctly implements the context manager protocol, and the default empty body aligns with the push client's backward-compatible handling of missing `accepted` keys.
189-218: LGTM! Excellent end-to-end test that verifies the complete happy path including request body structure, summary counters, and watermark persistence in the database.
358-409: LGTM! The partial 2xx tests provide essential regression coverage for the watermark-advancement gate. Both scenarios (rejected list and count mismatch) correctly assert that the watermark remains unset, preventing permanently skipped events.
434-516: LGTM! The tenant isolation tests correctly verify that foreign tenant rows are excluded and legacy `NULL` tenant rows are included, matching the `_fetch_events_since` SQL logic in the production code.
Gradata/src/gradata/services/embeddings.py (6)
50-51: Localhost SSRF guard still allows arbitrary schemes. The `_is_trusted_url` method returns `True` for localhost/127.0.0.1 without validating the scheme. A URL like `file://localhost/etc/passwd` would pass the trust check. Restrict localhost to safe schemes.
🔒 Proposed fix
```diff
-        if parsed.hostname in ("localhost", "127.0.0.1"):
-            return True
+        if parsed.hostname in ("localhost", "127.0.0.1"):
+            return parsed.scheme in {"http", "https"}
```
1-28: LGTM! Module structure, imports, and constants are well-organized with clear documentation.
57-80: LGTM! The API response validation is thorough: it correctly rejects non-dict responses, wrong dimensions, non-numeric entries, and booleans. Exceptions properly trigger the local embedding fallback.
82-97: LGTM! Good use of MD5 for deterministic trigram hashing with clear documentation explaining why the built-in `hash()` is unsuitable. The normalization correctly handles edge cases.
123-171: LGTM! Clean union-find implementation with path splitting and union-by-rank for near-O(α(n)) amortized complexity. The blank-description filtering avoids unnecessary embedding calls, and the pairwise similarity check is appropriate for single-linkage clustering.
174-225: LGTM! Thread-safe LRU cache implementation with proper lock handling, released during the blocking `embed()` call to avoid serializing the thread pool. The `move_to_end()` on cache hits ensures true LRU eviction.
Gradata/tests/test_embeddings.py (4)
1-31: LGTM! Import updates correctly reflect the module relocation to `gradata.services.embeddings`. Existing similarity tests are preserved.
33-45: LGTM! Clean context manager helper for mocking `urlopen` responses in tests.
47-88: LGTM! Good coverage of malformed API response scenarios including missing keys, wrong dimensions, non-numeric values, and the local fallback path.
91-104: LGTM! Effective test for blank-description handling: confirms both empty strings and whitespace-only strings are excluded from clustering while identical descriptions properly merge.
Addresses 4 items from CR review at 16:53:31Z (commit 0705be6).
- materializer._rule_key: normalize category to upper() instead of lower(), matching the apply path (_apply_materialized.py) and the lessons.md convention. Functionally equivalent today, since both sides uppercase on lookup, but aligns MaterializedRule.category with Lesson.category so external readers no longer need to re-normalize.
- materializer.__all__: alphabetical sort (Ruff RUF022).
- push.push_pending_events: guard tenant_for() / get_or_create_device_id() with try/except OSError so a corrupted brain dir returns a summary instead of raising; the public contract is "never raises".
- pull.pull_events: same identity-resolution guard for symmetry.
- embeddings.get_client: double-checked locking around _default_client initialization so concurrent async event-bus handlers can't race into creating two clients.

New regression tests (+2):
- test_identity_resolution_oserror_returns_summary: forces OSError out of tenant_for() and asserts summary.reason == "identity_error".
- test_get_client_is_threadsafe_singleton: 16 threads racing on the first get_client() call all receive the same instance.

Full suite: 4038 passed, 2 skipped.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 3
♻️ Duplicate comments (1)
Gradata/src/gradata/cloud/pull.py (1)
219-222: ⚠️ Potential issue | 🟠 Major
Reject non-boolean `end_of_stream` and non-string `watermark`. The response-shape validation is still incomplete here: `bool(parsed.get("end_of_stream", True))` turns values like `"false"` into `True`, and a dict/list `watermark` will later be stringified and persisted. These should fail as `malformed_response` instead of being coerced.
Proposed fix
```diff
         all_events.extend(page_events)
         page_watermark = parsed.get("watermark")
+        if page_watermark is not None and not isinstance(page_watermark, str):
+            log.warning("events/pull: watermark field is not a string")
+            summary["status"] = "error"
+            summary["reason"] = "malformed_response"
+            return summary
+        page_end_of_stream = parsed.get("end_of_stream", True)
+        if not isinstance(page_end_of_stream, bool):
+            log.warning("events/pull: end_of_stream field is not a boolean")
+            summary["status"] = "error"
+            summary["reason"] = "malformed_response"
+            return summary
         if page_watermark:
             final_watermark = page_watermark
-        end_of_stream = bool(parsed.get("end_of_stream", True))
+        end_of_stream = page_end_of_stream
         pages_fetched += 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/cloud/pull.py` around lines 219 - 222, The code coercing response fields is accepting wrong types: instead of bool(parsed.get("end_of_stream", True)) and implicitly stringifying watermark, validate types explicitly — check parsed.get("end_of_stream") is a bool and parsed.get("watermark") is a str (or absent) before assigning to end_of_stream and final_watermark (symbols: parsed, end_of_stream, page_watermark, final_watermark); if either type check fails, raise or return the existing malformed_response error path (use the same malformed_response mechanism used elsewhere) so non-boolean end_of_stream and non-string watermark are rejected rather than coerced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/src/gradata/cloud/materializer.py`:
- Around line 201-206: The loop that parses rows currently appends any valid
JSON value (in for loop over rows) which can be non-object types and later
breaks functions like _rule_key() and materialize(db_path=...) that expect
evt["data"].get(); after json.loads(data_json) in the loop (the block handling
data = json.loads(...)), check that the decoded value is a mapping/dict (e.g.,
isinstance(data, dict)) and skip (continue) non-dict values so only object JSON
rows are appended to out.
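A minimal sketch of the guard this comment asks for. The names `load_event_rows`, `rows`, and `out` are illustrative stand-ins; the real `_load_events_from_db` signature differs:

```python
import json

def load_event_rows(rows):
    """Parse (type, data_json) rows, keeping only rows whose data decodes to a dict."""
    out = []
    for ev_type, data_json in rows:
        try:
            data = json.loads(data_json)
        except json.JSONDecodeError:
            continue  # unparseable rows were already skipped by the existing code
        if not isinstance(data, dict):
            continue  # scalars/lists/strings would crash evt["data"].get(...) later
        out.append({"type": ev_type, "data": data})
    return out

rows = [
    ("RULE_GRADUATED", '{"category": "GIT"}'),  # kept: decodes to a dict
    ("RULE_GRADUATED", "42"),                   # scalar: skipped
    ("RULE_GRADUATED", "[1, 2, 3]"),            # list: skipped
    ("RULE_GRADUATED", '"hi"'),                 # string: skipped
    ("RULE_GRADUATED", "not json"),             # unparseable: skipped
]
```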
In `@Gradata/src/gradata/cloud/pull.py`:
- Around line 173-175: The code calls resp.read().decode("utf-8") which can
raise UnicodeDecodeError and bypass the JSON/error-guarding logic; wrap the byte
read and decode (the resp.read() / .decode("utf-8") operation around the
urllib.request.urlopen block) in a try/except that catches UnicodeDecodeError
and handles it exactly like a malformed_response (e.g., populate the stable
summary/error state and return or raise the same malformed_response path), and
apply the same change to the second occurrence of resp.read().decode("utf-8")
around the block referenced at lines 194-200 so both places consistently treat
bad UTF-8 as malformed_response.
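The decode guard can be sketched in isolation as below; `read_body` and the `summary` shape are hypothetical stand-ins for the real pull.py internals:

```python
def read_body(raw: bytes, summary: dict):
    """Decode an HTTP body, mapping bad UTF-8 onto the malformed_response error path."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # treated exactly like a malformed JSON payload: populate the summary, no raise
        summary["status"] = "error"
        summary["reason"] = "malformed_response"
        return None
```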
In `@Gradata/tests/test_cloud_events_push.py`:
- Around line 215-218: Replace the manual open/close pattern with a context
manager: use "with sqlite3.connect(brain / 'system.db') as conn:" and run wm =
conn.execute("SELECT last_push_event_id FROM sync_state").fetchone() inside that
block (no explicit conn.close()), then assert wm[0] ==
"01HN000000000000000000000C"; this makes the code consistent with other tests
that use the same pattern.
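For reference, a self-contained sketch of the suggested pattern. One caveat worth noting: sqlite3's connection context manager scopes the transaction (commit on success, rollback on error) and does not itself close the connection, so `contextlib.closing` is added here when a guaranteed close is wanted:

```python
import contextlib
import sqlite3

# `with conn:` commits/rolls back; contextlib.closing adds the close.
with contextlib.closing(sqlite3.connect(":memory:")) as conn:
    with conn:
        conn.execute("CREATE TABLE sync_state (last_push_event_id TEXT)")
        conn.execute(
            "INSERT INTO sync_state VALUES (?)", ("01HN000000000000000000000C",)
        )
    wm = conn.execute("SELECT last_push_event_id FROM sync_state").fetchone()
```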
---
Duplicate comments:
In `@Gradata/src/gradata/cloud/pull.py`:
- Around line 219-222: The code coercing response fields is accepting wrong
types: instead of bool(parsed.get("end_of_stream", True)) and implicitly
stringifying watermark, validate types explicitly — check
parsed.get("end_of_stream") is a bool and parsed.get("watermark") is a str (or
absent) before assigning to end_of_stream and final_watermark (symbols: parsed,
end_of_stream, page_watermark, final_watermark); if either type check fails,
raise or return the existing malformed_response error path (use the same
malformed_response mechanism used elsewhere) so non-boolean end_of_stream and
non-string watermark are rejected rather than coerced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e1a8adfb-da09-4dc2-8c29-1a8489442092
📒 Files selected for processing (6)
Gradata/src/gradata/cloud/materializer.py
Gradata/src/gradata/cloud/pull.py
Gradata/src/gradata/cloud/push.py
Gradata/src/gradata/services/embeddings.py
Gradata/tests/test_cloud_events_push.py
Gradata/tests/test_embeddings.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-04-17T17:18:07.439Z
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
Gradata/src/gradata/cloud/push.py
Gradata/tests/test_cloud_events_push.py
🔇 Additional comments (28)
Gradata/src/gradata/services/embeddings.py (7)
50-51: Localhost SSRF guard still accepts any scheme. The localhost check returns `True` for any URL scheme, including potentially dangerous ones like `file://localhost/etc/passwd`. This concern was already raised in a previous review.
1-28: LGTM! Module structure is clean. The import of `cosine_similarity` at line 27 after the constants is slightly unconventional (typically all imports are grouped at the top), but this is a minor style preference and doesn't affect correctness.
57-80: LGTM! Thorough validation of API response shape. The explicit `not isinstance(v, bool)` check correctly handles Python's bool-is-int subtyping quirk. Any validation failure surfaces as an exception that the outer `embed()` catches for clean fallback.
82-97: LGTM! The MD5 usage is correctly justified in the comment: it avoids `PYTHONHASHSEED` non-determinism. Short texts (< 3 chars) produce zero vectors, which `cosine_similarity` handles gracefully by returning `0.0`.
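A hedged sketch of a deterministic local fallback embedding in the spirit described above. Trigram hashing via MD5 is an assumption for illustration; the real `embeddings.py` may bucket differently:

```python
import hashlib
import math

def local_embed(text: str, dim: int = 16) -> list[float]:
    """Deterministic fallback embedding: hash character trigrams with MD5.

    MD5 (rather than the built-in hash()) keeps vectors stable across
    processes, since hash() is randomized by PYTHONHASHSEED.
    """
    vec = [0.0] * dim
    for i in range(len(text) - 2):  # texts under 3 chars yield a zero vector
        tri = text[i : i + 3].encode("utf-8")
        bucket = int.from_bytes(hashlib.md5(tri).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

A zero vector from a short text is harmless downstream as long as the cosine-similarity helper treats zero-norm inputs as similarity `0.0`, as the review notes.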
110-129: LGTM! Double-checked locking pattern correctly implemented. The outer check avoids lock contention on the hot path after initialization; the inner check prevents duplicate instantiation under concurrent first-call races.
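The double-checked locking pattern praised here, as a generic sketch (`EmbeddingClient` and `get_client` are illustrative names, not the project's API):

```python
import threading

_client = None
_client_lock = threading.Lock()

class EmbeddingClient:
    pass

def get_client() -> EmbeddingClient:
    """Return a process-wide singleton using double-checked locking."""
    global _client
    if _client is None:            # fast path: no lock once initialized
        with _client_lock:
            if _client is None:    # re-check inside the lock to close the race
                _client = EmbeddingClient()
    return _client
```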
132-180: LGTM! Union-find implementation is well-optimized with path splitting and union-by-rank for near-O(α(n)) amortized operations. Blank descriptions correctly produce `None` vectors and remain as singleton clusters.
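For reference, union-find with path splitting and union by rank looks like this in isolation (a generic sketch, not the project's implementation):

```python
class UnionFind:
    """Union-find with path splitting and union by rank (near-O(α(n)) amortized)."""

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            # path splitting: point each visited node at its grandparent
            self.parent[x], x = self.parent[self.parent[x]], self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:  # union by rank: attach the shorter tree
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
```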
183-234: LGTM! Thread-safe LRU cache implementation is solid. The lock is correctly released during the `embed()` call to avoid serializing the thread pool on network I/O. The `move_to_end()` on cache hits implements true LRU semantics, and no redundant call is made on inserts since `OrderedDict.__setitem__` already places new keys at the end.
Gradata/tests/test_embeddings.py (5)
3-9: LGTM! Import updates correctly reference the relocated `gradata.services.embeddings` module.
33-44: LGTM! Clean test helper that correctly implements the context manager protocol needed to mock `urlopen` responses.
47-88: LGTM! Comprehensive coverage of `_embed_api` validation paths. The tests correctly verify that malformed responses raise `ValueError` and that `embed()` gracefully falls back to local embeddings.
91-104: LGTM! Good coverage of blank-description handling. The assertion correctly expects two singleton clusters for the blank entries and one cluster of size 2 for the identical descriptions.
107-132: LGTM! Well-structured concurrency test. The `Event` barrier ensures threads race simultaneously on `get_client()`, and the final assertion correctly verifies singleton semantics by checking all returned instances share the same `id()`.
Gradata/src/gradata/cloud/pull.py (1)
113-122: Nice contract hardening for corrupt brain dirs. Catching `OSError` around `tenant_for()` / `get_or_create_device_id()` keeps `pull_events()` on the documented summary-return path instead of surfacing filesystem corruption as an exception. Based on learnings, PR `#102` addressed identity helpers returning a summary on corruption instead of raising.
144-154: Independent confidence parsing is the right fix. A malformed confidence on one side no longer wipes out the valid value from the other side before the Tier 2 comparison.
Gradata/src/gradata/cloud/push.py (6)
1-51: Module structure and imports look good. The module docstring clearly documents the endpoint contract, safety properties (never raises, honors kill switch, watermark advancement semantics), and the overall design. The constants for chunking and retry are reasonable defaults.
53-123: Tenant scoping correctly addresses cross-tenant upload risk. The query filter `(tenant_id = ? OR tenant_id IS NULL)` ensures:
- Only current tenant's rows are pushed
- Legacy pre-tenant rows (NULL) are still included for migration continuity
JSON parsing handles malformed data gracefully with fallback values.
Based on learnings, `_events.py` required a tenant-scoped fallback SELECT for dedup; this follows the same pattern.
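The tenant filter can be demonstrated in isolation (the table layout here is simplified for illustration, not the real events schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, tenant_id TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO events (tenant_id, type) VALUES (?, ?)",
    [("t-current", "A"), ("t-other", "B"), (None, "C")],  # C is a legacy pre-tenant row
)
rows = conn.execute(
    "SELECT type FROM events WHERE (tenant_id = ? OR tenant_id IS NULL) ORDER BY id",
    ("t-current",),
).fetchall()
# current tenant's rows plus legacy NULL rows; the foreign tenant is excluded
```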
126-170: Watermark read/write correctly scoped to (tenant_id, device_id). The `_read_watermark` query uses `WHERE tenant_id = ? AND device_id = ?`, and `_write_watermark` uses `ON CONFLICT(tenant_id, device_id)` targeting the composite unique index created by `_ensure_sync_state_schema`. This correctly isolates watermarks per device for multi-device scenarios.
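A minimal stand-alone sketch of the upsert pattern the review describes. The column set is simplified and `write_watermark` is an illustrative name, not the real `_write_watermark`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sync_state (
           tenant_id TEXT, device_id TEXT, last_push_event_id TEXT,
           UNIQUE (tenant_id, device_id)
       )"""
)

def write_watermark(tenant_id, device_id, event_id):
    # ON CONFLICT targets the composite unique index, so each
    # (tenant, device) pair keeps exactly one watermark row
    conn.execute(
        """INSERT INTO sync_state (tenant_id, device_id, last_push_event_id)
           VALUES (?, ?, ?)
           ON CONFLICT (tenant_id, device_id)
           DO UPDATE SET last_push_event_id = excluded.last_push_event_id""",
        (tenant_id, device_id, event_id),
    )

write_watermark("t1", "d1", "evt-A")
write_watermark("t1", "d1", "evt-B")  # same pair: updates in place
write_watermark("t1", "d2", "evt-C")  # new device: new row
```

Note that SQLite raises "ON CONFLICT clause does not match any PRIMARY KEY or UNIQUE constraint" if the targeted unique index is missing, which is why the schema-upgrade step matters.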
173-207: Retry logic with exponential backoff is well-implemented. The distinction between 4xx (client error, fatal) and 5xx/transport errors (transient, retry) is correct. Backoff is applied only between retries (`if attempt < max_retries`), avoiding unnecessary delay after the final failure.
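A generic sketch of the retry policy described above. `post_with_retry` and its signature are assumptions for illustration, not the real push.py helper:

```python
import time

def post_with_retry(send, max_retries: int = 3, base_delay: float = 0.5):
    """Call `send()` until success: 4xx is fatal, 5xx/transport errors retry with backoff."""
    status = None
    for attempt in range(max_retries + 1):
        try:
            status = send()
        except OSError:
            status = None  # transport error: treat as transient
        if status is not None and 200 <= status < 300:
            return status
        if status is not None and 400 <= status < 500:
            return status  # client error: retrying cannot help
        if attempt < max_retries:  # back off only between attempts
            time.sleep(base_delay * (2 ** attempt))
    return status
```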
210-298: Comprehensive input validation maintains the "never raises" contract. The validation sequence properly handles:
- Invalid tunable parameters (returns `invalid_params`)
- Missing database, kill switch, disabled sync, missing credential, non-HTTPS endpoint
- Identity resolution failures (`OSError` caught, returns `identity_error`)

All paths return a structured summary dict.
300-377: Main push loop correctly handles partial acceptance and database errors. The partial 2xx safeguard (lines 338-355) prevents watermark advancement when `rejected` is non-empty or `accepted != len(events)`, ensuring rejected events aren't permanently skipped. The `sqlite3.Error` handler maintains the "never raises" contract for database failures.
Gradata/tests/test_cloud_events_push.py (8)
24-40: Isolation fixture properly includes endpoint override vars. The fixture now clears `GRADATA_ENDPOINT` and `GRADATA_CLOUD_API_BASE` (lines 31-32), preventing CI/shell environment leakage from affecting test behavior.
43-135: Test helper creates a well-structured minimal brain. The helper creates the necessary schema and identity files. The sync_state table intentionally lacks the UNIQUE index on `(tenant_id, device_id)`; this tests that `_ensure_sync_state_schema` correctly upgrades the schema at runtime.
156-187: Early-exit tests correctly verify all short-circuit paths. Tests cover missing database, kill switch, disabled config, missing credential, and non-HTTPS rejection, all returning the expected structured error summaries.
189-259: Happy path and batching tests provide solid coverage. The tests verify:
- Correct request body shape and event count
- Watermark persistence after successful push
- Resume semantics (skip already-pushed events)
- Chunked batching produces the expected batch sizes `[2, 2, 1]` for 5 events with `chunk_size=2`
262-325: Retry tests correctly verify the 4xx-fatal vs 5xx-transient distinction. The tests assert:
- 4xx errors trigger immediate failure (1 call only)
- 5xx errors exhaust retries (`1 + max_retries` calls)
- Transient failure followed by success yields `status=ok`

Sleep is monkeypatched for fast execution.
358-409: Partial 2xx regression tests verify watermark safety. These critical tests ensure the watermark isn't advanced when:
- The server returns a non-empty `rejected` list
- The `accepted` count doesn't match the events sent

Both correctly assert `status=error`, `reason=batch_rejected`, and verify no watermark row exists.
434-516: Tenant scoping tests verify cross-tenant isolation and legacy row handling. The tests confirm:
- Foreign tenant rows (different `tenant_id`) are excluded from push
- Legacy rows (`tenant_id IS NULL`) are included under the current tenant

This matches the `(tenant_id = ? OR tenant_id IS NULL)` filter in `_fetch_events_since`.
519-529: Identity resolution error test verifies the "never raises" contract. The test confirms that `OSError` from corrupted brain files is caught and returns a structured error summary (`reason=identity_error`) rather than propagating.
…materializer rows
CodeRabbit round on batch-5 flagged four residual hardening gaps; this
lands them together so the PR stays on the narrow cloud-sync footprint.
pull.py — response decode:
- ``resp.read().decode("utf-8")`` could raise ``UnicodeDecodeError``
that bypassed the HTTP / transport / JSON guards, escaping the
"never raise — return summary" contract. Now caught and mapped to
``reason=malformed_response`` like other payload failures.
pull.py — end_of_stream / watermark type validation:
- ``bool(parsed.get("end_of_stream", True))`` coerced the string
``"false"`` to True, silently claiming the stream was drained.
- A dict / list ``watermark`` would later be ``str()``-ified and
persisted into ``sync_state`` as garbage, poisoning every
subsequent incremental pull.
Both now fail fast as ``malformed_response`` before the value is
consulted.
materializer._load_events_from_db — skip non-dict ``data`` payloads:
- A legacy events row whose ``data_json`` decoded to a scalar / list
/ string would reach ``_rule_key`` / ``_apply_tier`` where
``evt["data"].get(...)`` crashes. Drop non-mapping payloads
alongside the existing JSONDecodeError skip.
test_cloud_events_push.py — sqlite context manager:
- Replaced the manual ``connect`` / ``close`` pair in the happy-path
watermark assertion with ``with sqlite3.connect(...) as conn:`` so
a failing assertion can't leak the connection handle and mask the
real failure behind a Windows file-lock on teardown.
Regression coverage (+4 tests, full suite 4042 passed):
- test_server_200_non_utf8_body_returns_malformed_response
- test_server_200_non_string_watermark_is_rejected
- test_server_200_non_bool_end_of_stream_is_rejected
- test_load_events_from_db_skips_non_dict_data
Co-Authored-By: Gradata <noreply@gradata.ai>
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/src/gradata/cloud/materializer.py`:
- Around line 184-192: The default event types tuple is built from
_MATERIALIZABLE_EVENT_TYPES which includes META_RULE_SYNTHESIZED but the
materialize loop (which checks for META_RULE_SYNTHESIZED and always skips it)
makes fetching those rows wasted I/O; change the default building of types in
_load_events_from_db (the variable types and placeholders/query construction) to
filter out META_RULE_SYNTHESIZED (e.g., remove it from the tuple when
event_types is None) or explicitly pass a filtered tuple so the SQL query never
includes META_RULE_SYNTHESIZED, ensuring the SELECT only requests materializable
types that will be processed.
In `@Gradata/src/gradata/cloud/pull.py`:
- Around line 282-297: The review points out that summary["applied"] is only set
when lessons were actually written (inside the mat.rules or mat.conflicts
branch), so callers can't distinguish "apply ran successfully but nothing to
change" from "apply not run"; change the logic so that when the apply path
completes successfully (i.e., apply=True and you persist the watermark and no
exception occurred), you set summary["applied"] = True regardless of whether
mat.rules or mat.conflicts were present. Locate the apply path code handling mat
(symbols: mat.rules, mat.conflicts, parse_lessons, apply_to_lessons,
write_lessons_safe, summary) and move or add the summary["applied"] = True
assignment to the successful completion point of that apply branch (after
persisting the watermark and after any lessons write try/except), keeping the
existing error handling that sets summary["status"] = "error" and returns on
exceptions.
In `@Gradata/tests/test_cloud_events_pull.py`:
- Around line 23-27: The brain fixture creates a Brain instance and emits an
event but never persists cloud config, so tests that expect the config to exist
may hit the "config load failed" path; update the fixture (or tests) so the
Brain's configuration is saved by calling its _save_cfg() with sensible defaults
after instantiation (e.g., call Brain(...); b.emit(...); b._save_cfg() )—or
alternatively add a brief comment on the fixture documenting that callers must
call _save_cfg() explicitly; reference the fixture name brain and the
Brain._save_cfg method when making the change.
In `@Gradata/tests/test_cloud_events_push.py`:
- Around line 84-96: The test schema for sync_state is missing the
UNIQUE(tenant_id, device_id) index that production upsert logic in
_write_watermark() expects; update the test table creation in the test helper so
the CREATE TABLE or a subsequent CREATE UNIQUE INDEX ensures UNIQUE(tenant_id,
device_id) exists (i.e., add UNIQUE constraint on tenant_id and device_id or
create a unique index for those columns) so ON CONFLICT(tenant_id, device_id)
behavior in _write_watermark() is exercised correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d21de32d-e687-4e67-af8d-78702e0d1b85
📒 Files selected for processing (5)
Gradata/src/gradata/cloud/materializer.py
Gradata/src/gradata/cloud/pull.py
Gradata/tests/test_cloud_events_pull.py
Gradata/tests/test_cloud_events_push.py
Gradata/tests/test_cloud_materializer.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest windows-latest / py3.12
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.11
- GitHub Check: pytest macos-latest / py3.12
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-04-17T17:18:07.439Z
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
Gradata/tests/test_cloud_events_pull.py
Gradata/tests/test_cloud_materializer.py
Gradata/tests/test_cloud_events_push.py
🔇 Additional comments (6)
Gradata/tests/test_cloud_events_push.py (1)
1-528: LGTM! Comprehensive test coverage for `push_pending_events()` including:
- Early-exit status mappings (no_db, kill_switch, disabled, no_credential, https_required)
- Happy path with watermark advancement
- Resume from persisted watermark
- Chunked batching across multiple POSTs
- Retry semantics (4xx fatal, 5xx retried with backoff)
- Partial 2xx rejection handling
- Tenant scoping (foreign tenant rows excluded, legacy NULL rows included)
- Identity resolution error handling
Past review comments have been addressed (env var isolation, partial 2xx test, context manager usage).
Gradata/src/gradata/cloud/pull.py (1)
1-349: LGTM overall: well-structured pull client with comprehensive error handling. The implementation correctly:
- Uses shared endpoint resolver for symmetric env override handling
- Validates response JSON shape (dict, events list, watermark type, end_of_stream type)
- Handles UTF-8 decode failures as malformed_response
- Persists watermark only after successful apply and checks return value
- Emits telemetry only after watermark persistence succeeds
- Uses dynamic SDK version in User-Agent header
All past review comments have been addressed.
Gradata/tests/test_cloud_events_pull.py (1)
1-583: LGTM! Comprehensive contract test suite for `pull_events()` covering:
- All early-exit status mappings (no_db, kill_switch, disabled, no_credential, https_required, 501 → disabled_server_side, 410 → rewind_beyond_retention)
- Dry-run vs. apply behavior (lessons.md mutation, applied flag)
- Watermark persistence after successful apply
- Malformed response handling (non-JSON, non-UTF8, wrong types)
- Request composition (brain_id, device_id, Authorization header, limit clamping)
- Auto-resume from persisted cursor
- Pagination (end_of_stream, cursor advancement, defensive stop on missing watermark)
- Configurable conflict threshold
- Credential resolution from keyfile
- Telemetry emission (CLOUD_SYNC_COMPLETED on apply only)
Past review comments have been addressed (env var isolation expanded).
Gradata/tests/test_cloud_materializer.py (2)
452-496: Excellent regression test for `_load_events_from_db` skipping non-dict data. This test directly validates the fix for the past review comment about rows with scalar/list `data_json` crashing the materializer. The test correctly inserts well-formed and malformed rows (scalar `42`, list `[1,2,3]`, string `"hi"`) and verifies only the dict row is returned.
1-496: LGTM! Thorough test coverage for the materializer including:
- Single-device graduation and promotion semantics
- Tier 1 LWW (ts wins, device_id tiebreak, small drift)
- Tier 2 conflict detection (confidence drift, state disagreement, post-conflict hold)
- Conflict resolution via RULE_CONFLICT_RESOLVED with various payload variants (winning_ts lookup, winning_event, winning_snapshot)
- Tier 3 RULE_OVERRIDE clearing conflict holds
- Property invariants (idempotent replay, duplicate noop, order-independence)
- Convergence across 200+ shuffled orderings
- Regression tests for independent confidence parsing and non-dict data skipping
Past review comment (extraneous f-string) has been addressed.
Gradata/src/gradata/cloud/materializer.py (1)
1-397: LGTM overall: well-structured materializer with comprehensive Tier 1/2/3 handling. The implementation correctly:
- Normalizes category to uppercase for consistent key matching with the apply path
- Parses confidence values independently to avoid masking legitimate Tier 2 conflicts
- Filters event types to only process materializable events
- Handles RULE_CONFLICT_RESOLVED with multiple fallback strategies (history lookup, winning_event, winning_snapshot)
- Defensively parses numeric fields when building MaterializedRule
- Skips non-dict data_json values from the database
- Prunes conflicts that were resolved within the same stream
All past review comments have been addressed.
conn.execute(
    """
    CREATE TABLE sync_state (
        brain_id TEXT PRIMARY KEY,
        last_push_at TEXT,
        updated_at TEXT,
        device_id TEXT,
        last_push_event_id TEXT,
        last_pull_cursor TEXT,
        tenant_id TEXT
    )
    """
)
🧹 Nitpick | 🔵 Trivial
Test sync_state schema lacks the unique index required by production code.
The test helper creates sync_state without the UNIQUE(tenant_id, device_id) index that _write_watermark() relies on (see Gradata/src/gradata/cloud/push.py lines 155-161 using ON CONFLICT(tenant_id, device_id)). SQLite rejects an ON CONFLICT clause whose target matches no PRIMARY KEY or UNIQUE constraint, so the upsert only works here because _ensure_sync_state_schema upgrades the table at runtime; this divergence means the test schema no longer mirrors production.
♻️ Proposed fix to align test schema with production
conn.execute(
"""
CREATE TABLE sync_state (
brain_id TEXT PRIMARY KEY,
last_push_at TEXT,
updated_at TEXT,
device_id TEXT,
last_push_event_id TEXT,
last_pull_cursor TEXT,
- tenant_id TEXT
+ tenant_id TEXT,
+ UNIQUE(tenant_id, device_id)
)
"""
)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/tests/test_cloud_events_push.py` around lines 84 - 96, The test
schema for sync_state is missing the UNIQUE(tenant_id, device_id) index that
production upsert logic in _write_watermark() expects; update the test table
creation in the test helper so the CREATE TABLE or a subsequent CREATE UNIQUE
INDEX ensures UNIQUE(tenant_id, device_id) exists (i.e., add UNIQUE constraint
on tenant_id and device_id or create a unique index for those columns) so ON
CONFLICT(tenant_id, device_id) behavior in _write_watermark() is exercised
correctly.
materializer._load_events_from_db — drop META_RULE_SYNTHESIZED from the
default types tuple. The materialize() loop always skips that type (see
line 255), so including it in the SQL IN (...) list was pure I/O waste
— every pulled meta-rule row was fetched just to be discarded in the
next loop iteration. Callers that genuinely want every type can still
pass ``event_types`` explicitly. Also guards against an empty types
tuple so a malformed override can't produce a ``WHERE type IN ()``
query.
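The two guards described above can be sketched like this (the constant values and `build_query` are illustrative stand-ins, not the real `_load_events_from_db`):

```python
META_RULE_SYNTHESIZED = "META_RULE_SYNTHESIZED"
_MATERIALIZABLE_EVENT_TYPES = frozenset(
    {"RULE_GRADUATED", "RULE_DEMOTED", META_RULE_SYNTHESIZED}
)

def build_query(event_types=None):
    """Build the SELECT for materializable events.

    Default excludes META_RULE_SYNTHESIZED, which the materialize loop
    always skips, so those rows are never fetched just to be discarded.
    """
    if event_types is None:
        types = tuple(sorted(_MATERIALIZABLE_EVENT_TYPES - {META_RULE_SYNTHESIZED}))
    else:
        types = tuple(event_types)
    if not types:
        # a malformed override must not produce `WHERE type IN ()`
        raise ValueError("event_types must not be empty")
    placeholders = ", ".join("?" for _ in types)
    return f"SELECT * FROM events WHERE type IN ({placeholders})", types
```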
pull.py — ``applied`` now reflects "apply path ran end-to-end", not
"lessons write touched at least one byte":
- Previous behavior: a pull that materialized to an empty delta kept
``applied=False``, indistinguishable from apply=False preview mode.
Callers scheduling follow-ups based on that flag would re-run the
same drained page forever.
- Fix: set ``summary["applied"] = True`` only after every hard
failure path (lessons_write_failed, watermark_persist_failed) has
had its chance to short-circuit, but unconditionally on the empty-
delta happy path.
test_cloud_events_pull.py — document that the ``brain`` fixture
intentionally omits cloud-config; individual tests call ``_save_cfg``
with per-test flags so the unconfigured path stays exercisable.
test_cloud_events_push.py — document that the in-test ``sync_state``
table intentionally lacks UNIQUE(tenant_id, device_id). That's the
regression test for ``_ensure_sync_state_schema`` — if the migration
ever regresses, the first upsert will fail with "ON CONFLICT clause
does not match any PRIMARY KEY or UNIQUE constraint" and every push
test breaks loudly.
Regression coverage (+1 test, full suite 4043 passed):
- test_apply_true_on_empty_delta_still_reports_applied
Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Gradata/src/gradata/cloud/pull.py`:
- Around line 282-298: The summary dict currently only gets
"conflict_events_emitted" when the branch for mat.rules or mat.conflicts runs,
causing an inconsistent schema; ensure summary always contains this key by
initializing summary["conflict_events_emitted"] = 0 before the if (or assigning
it after the block) so that when apply=True and there are no rules/conflicts
callers can rely on summary; update the code around the summary variable in
pull.py where mat and emit_conflict_events are used (referencing summary, mat,
and emit_conflict_events) to set a default zero value and overwrite it with
emit_conflict_events(mat) inside the existing branch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c294d02f-2303-4d15-b5a9-c6cf9c8738eb
📒 Files selected for processing (4)
Gradata/src/gradata/cloud/materializer.py
Gradata/src/gradata/cloud/pull.py
Gradata/tests/test_cloud_events_pull.py
Gradata/tests/test_cloud_events_push.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-04-17T17:18:07.439Z
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
Gradata/tests/test_cloud_events_push.py
Gradata/tests/test_cloud_events_pull.py
🔇 Additional comments (13)
Gradata/tests/test_cloud_events_push.py (3)
84-105: Test schema design is intentional and well-documented. The deliberate omission of `UNIQUE(tenant_id, device_id)` serves as a regression test for the schema migration in `_ensure_sync_state_schema()`. The inline comment (lines 84-92) clearly documents this rationale. This is a sound testing strategy that will catch migration regressions early.
198-226: Comprehensive happy-path test with proper assertions. This test validates the complete push flow: event capture, HTTP request shaping, watermark advancement, and summary fields. The use of a context manager for SQLite (line 224) is consistent with other tests. Good coverage of the core contract.
366-394: Regression test for partial acceptance correctly validates watermark safety. This test ensures that when the server returns a 2xx with rejected events, the watermark is not advanced, preventing rejected events from being permanently skipped on subsequent runs. The assertion `row is None or row[0] is None` correctly handles both "no row inserted" and "row exists but watermark is NULL" cases.
Gradata/src/gradata/cloud/pull.py (2)
1-36: Module structure and imports are well-organized. The module docstring clearly documents the contract behavior for different HTTP status codes and the `apply=False` default. Imports are clean, and the use of `__version__` for User-Agent ensures version consistency.
144-148: Good defensive bounds on pagination. `PAGE_CAP = 50` prevents runaway servers from causing infinite loops. Combined with `limit=1000`, this bounds a single `pull_events()` call to at most 50,000 events: a reasonable safety limit that still allows large syncs.
Gradata/tests/test_cloud_events_pull.py (3)
34-54: Comprehensive environment isolation fixture. The `_isolate_keyfile` fixture correctly clears all cloud-related environment variables, including the newer `GRADATA_ENDPOINT`, `GRADATA_CLOUD_API_BASE`, and `GRADATA_API_URL`. This ensures tests remain hermetic regardless of the runner's environment.
225-255: Excellent regression test for empty-delta behavior. This test validates a critical fix: `applied=True` must reflect "the apply path ran", not "bytes were written". The test correctly verifies both the summary flag and watermark persistence, ensuring empty-delta pulls don't re-fetch the same page indefinitely.
578-602: Telemetry test correctly validates emission and payload structure. The test verifies that `CLOUD_SYNC_COMPLETED` is emitted exactly once with all expected fields. The mock recorder pattern cleanly captures events for assertion. Good coverage of the telemetry contract.
Gradata/src/gradata/cloud/materializer.py (5)
39-55: Event type allowlist is well-documented. The `_MATERIALIZABLE_EVENT_TYPES` frozenset with its inline comment clearly explains why certain event types are excluded from winner selection. This prevents spec-excluded types like `CORRECTION` from affecting graduation state.
144-154: Independent confidence parsing prevents cross-contamination. The separate try/except blocks for `e_conf` and `i_conf` ensure that a malformed confidence on one side doesn't zero out both values. This was correctly refactored from the previous coupled parsing.
281-335: RULE_CONFLICT_RESOLVED handling is robust, with multiple fallback mechanisms.

The implementation correctly handles resolver events that arrive in incremental pulls without the original graduation:

- First checks `graduation_history` for events in the same batch
- Falls back to the embedded `winning_event` payload
- Falls back to `winning_snapshot` scalar fields and reconstructs a synthetic event

This addresses the previous critical issue about resolvers becoming no-ops on incremental pulls.
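The three-step fallback can be sketched as follows. `winning_event` and `winning_snapshot` come from the review; `winning_event_id` and the overall function shape are assumptions:

```python
def resolve_winning_event(resolver, graduation_history):
    """Locate the winning graduation for a RULE_CONFLICT_RESOLVED event,
    trying each fallback in order (illustrative field names)."""
    winner_id = resolver.get("winning_event_id")

    # 1. Same-batch lookup: the graduation may have arrived alongside.
    for ev in graduation_history:
        if ev.get("event_id") == winner_id:
            return ev

    # 2. Full winning event embedded in the resolver payload.
    if resolver.get("winning_event"):
        return resolver["winning_event"]

    # 3. Reconstruct a synthetic event from snapshot scalar fields.
    snap = resolver.get("winning_snapshot")
    if snap:
        return {"event_id": winner_id, "synthetic": True, **snap}

    # Only when none of the fallbacks survive does the resolver no-op.
    return None
```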
184-194: Default event types correctly exclude `META_RULE_SYNTHESIZED` to avoid wasted I/O.

Since the materialize loop always skips `META_RULE_SYNTHESIZED` events, excluding them from the default DB query prevents fetching rows that will never be processed. The guard at lines 193-194 for empty `types` also prevents malformed SQL.
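The empty-`types` guard matters because an empty tuple would render an `IN ()` clause, which SQLite rejects as a syntax error. A sketch of such a query builder (hypothetical function, illustrative schema):

```python
def build_events_query(types):
    """Build a parameterized SELECT over the requested event types.

    Raises rather than emitting malformed SQL when `types` is empty:
    "... WHERE type IN ()" is invalid in SQLite.
    """
    if not types:
        raise ValueError("at least one event type is required")
    placeholders = ", ".join("?" for _ in types)
    sql = f"SELECT * FROM events WHERE type IN ({placeholders})"
    return sql, tuple(types)
```

Placeholders keep the query safe against injection, and excluding always-skipped types from `types` up front avoids fetching rows the loop would discard anyway.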
371-393: Defensive numeric coercion in result building handles malformed data gracefully.

The separate try/except blocks for `confidence` and `fire_count` ensure that malformed values in winning events don't crash the entire materialization pass. Defaulting to `0.0`/`0` on failure is a safe fallback.
…ys present

CodeRabbit batch-7 flagged a subtle schema inconsistency: ``summary["conflict_events_emitted"]`` was only populated inside the ``if mat.rules or mat.conflicts:`` branch, so callers on the empty-delta apply path had to fall back to ``.get(..., 0)`` while callers on the non-empty path could index directly. Initialize the key to ``0`` at the top of the ``if apply:`` branch so the summary schema is stable across both paths, and ``emit_conflict_events(mat)`` just overwrites it when rules/conflicts actually ran.

Regression lock-in: extend the existing empty-delta test to assert ``result["conflict_events_emitted"] == 0``.

Co-Authored-By: Gradata <noreply@gradata.ai>
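The fix described in the commit can be sketched as below. The `rules`/`conflicts` names follow the commit message; the dict-based `mat` argument and the count logic are illustrative:

```python
def apply_summary(mat, apply=True):
    """Sketch of the stabilized summary schema.

    The key is initialized to 0 up front, so the empty-delta and
    non-empty apply paths expose the same schema and callers can
    index it directly instead of using .get(..., 0).
    """
    summary = {}
    if apply:
        summary["conflict_events_emitted"] = 0  # always present
        if mat["rules"] or mat["conflicts"]:
            # the real emit_conflict_events(mat) would overwrite here
            summary["conflict_events_emitted"] = len(mat["conflicts"])
    return summary
```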
@coderabbitai review

✅ Actions performed: review triggered.
Summary
Ships Phase 2 of cloud sync end-to-end, plus a significant legacy-bloat audit that removes ~1,600 net LOC of deprecated shims and dead code.
Cloud sync (Phase 2)

- (3a76c3ad)
- `/events/push` with watermark + retry (ac4ecc4e)
- `/events/pull` client, materializer (Tier 1 LWW + Tier 2 conflict detection), and apply pipeline (ac31aba7 → a2b61d13)
- (f8762487)
- `gradata cloud enable/rotate-key/status/disconnect` + `gradata cloud sync-pull` (082fc05a, cf945139)
- (64bee364)
- `cloud-config.json` (266a76fe)
- (cfa1cfac)
- `CLOUD_SYNC_COMPLETED` telemetry emitted after watermark persist, with `sync_status` tag enrichment (04d651f1)
- (4c86d323)

Legacy bloat cleanup (c8a106f9)

Shims whose deprecation warnings fired since 0.6.x are now removed per the new `docs/contributing/deprecation-policy.md` (two-minor-version carry rule):

- `gradata.patterns/` shim → use `gradata.contrib.patterns/`
- `gradata.rules`
- `gradata.enhancements.carl` → use `.behavioral_engine`
- `gradata.enhancements.profiling.tone_profile` (orphan, 546 LOC)
- `gradata.integrations.{anthropic,openai,langchain,crewai}_adapter` → use `gradata.middleware.{wrap_*, LangChainCallback, CrewAIGuard}`
- `gradata login`/`gradata logout` CLI + device-auth helpers → use `gradata cloud enable --key gk_live_...`

Live modules colocated with the deprecated tree relocated to `gradata.services/`:

- `gradata.services.embeddings`
- `gradata.services.session_history`

Forwarding shims kept at old paths through v0.9.0 per policy.
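A forwarding shim of the kind kept at the old paths typically re-exports the new module and warns at import time. This helper is an illustrative sketch, not the project's actual shim code:

```python
import warnings

def deprecated_alias(old_name, new_name, target):
    """Return `target` while warning that `old_name` moved to `new_name`.

    A forwarding shim module left at the old import path can call this
    at import time; the DeprecationWarning gives downstream code the
    two-minor-version notice required by the deprecation policy.
    """
    warnings.warn(
        f"{old_name} moved to {new_name}; the old path will be "
        f"removed in v0.9.0 per the deprecation policy",
        DeprecationWarning,
        stacklevel=2,
    )
    return target
```

With `stacklevel=2`, the warning points at the importer's line rather than the shim itself, which makes the migration target easy to find.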
Supporting

- (a6e9c518)
- `RULE_GRADUATED` events at every state transition (731cb536)
- (10df605a)

Test plan

- (`pytest tests/`)
- `CLOUD_SYNC_COMPLETED` emits on apply, not on dry-run
- `services.embeddings`/`services.session_history` still importable from old paths
- `gradata cloud enable` → `gradata cloud sync-pull` against staging
- `CLOUD_SYNC_COMPLETED` in dashboard after real pull

Breaking changes
See the CHANGELOG `[0.7.0]` entry. Public symbols/paths removed have emitted `DeprecationWarning` for at least two minor versions per the new deprecation policy.

Generated with Gradata