fix(sync): drain sync_queue on daemon startup (GRA-1245 / epic GRA-1198)#213
Conversation
Previously, daemon restart left rows in sync_queue with synced_at IS NULL until the next correction event triggered a flush. If the daemon crashed mid-batch, pending syncs sat indefinitely. This change adds gradata._sync_worker.drain_sync_queue(), a synchronous one-shot drainer that GradataDaemon.start() now invokes BEFORE binding the HTTP listener. The hook is idempotent, safe to call concurrently with the background SyncWorker (both paths open their own short-lived sqlite connections; the cloud /ingest endpoint dedupes any duplicate POST on event_id), and emits a discoverable info log line 'sync queue drained at startup: N rows' (visible in journalctl / daemon.log). Summary: - New gradata._sync_worker.drain_sync_queue(brain_dir, api_key, ingest_url=...) -> int - New GradataDaemon._drain_sync_queue_at_startup() called before _try_bind() - New tests/test_sync_drain_on_startup.py covers happy path, idempotency, missing api_key, fresh brain, concurrent callers, and the daemon hook itself Layering: Layer 0 helper called from Layer 2 daemon; no Layer 0 -> 2 import introduced. Risk: low — startup-drain failures are caught and logged; the background SyncWorker still handles steady-state and retries.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 Walkthrough
WalkthroughThis PR introduces a synchronous ChangesSync Queue Startup Drain
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.21.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/src/gradata/_sync_worker.py`:
- Around line 85-90: The function drain_sync_queue has multiple early-return
paths that don't log the startup no-op; add a discoverability debug log before
each early return so all no-op exits emit the same startup-drain message.
Specifically, in _sync_worker.py add a logger.debug call (matching the existing
style, e.g. "drain_sync_queue: startup no-op - reason") immediately before the
early returns where db_path is missing (the not db_path.exists() branch) and the
other early-return block around lines 101-105, referencing the same logger used
elsewhere so that drain_sync_queue, db_path, and api_key code paths all log the
startup-drain info.
In `@Gradata/tests/test_sync_drain_on_startup.py`:
- Around line 315-317: The assertion is tautological because a prior check
already ensures pending is zero; replace the current assertion with a true
validation of returned accounting by removing the fallback clause and asserting
that sum(results) >= 10 (i.e., assert sum(results) >= 10) so the test actually
verifies drain_sync_queue's reported drained count; locate the assertion
referencing results and _count_pending(db_path) in the
test_sync_drain_on_startup test and update it accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 094e5896-95cb-4b9c-bec7-306de8732ef1
📒 Files selected for processing (3)
Gradata/src/gradata/_sync_worker.pyGradata/src/gradata/daemon.pyGradata/tests/test_sync_drain_on_startup.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: pytest (py3.11)
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest (py3.12)
- GitHub Check: pytest ubuntu-latest / py3.11
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest windows-latest / py3.12
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefersentence-transformersfor local embeddings,google-genaifor Gemini embeddings,cryptographyfor AES-GCM encrypted system.db,bm25sfor BM25 rule ranking, andmem0aifor external memory adapters — guard all optional dependency imports withtry / except ImportErrorat the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bareexcept: pass— use typed exceptions or at minimumlogger.warning(...)withexc_info=Trueto avoid silent failure in a memory product
Never import from out-of-scope sibling directories../Sprites/or../Hausgem/withingradata/*code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to../Sprites/,../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from insidegradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/daemon.pyGradata/src/gradata/_sync_worker.py
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: SetBRAIN_DIRenvironment variable viatmp_pathin conftest.py for test isolation — ensure_paths.pymodule cache refreshes when callingBrain.init()directly inside tests
Add unit tests intests/test_*.pyfor every CI push without LLM calls (deterministic); mark integration tests with@pytest.mark.integrationand skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_sync_drain_on_startup.py
🔇 Additional comments (3)
Gradata/src/gradata/_sync_worker.py (1)
36-36: LGTM!Also applies to: 47-84, 92-100, 107-151, 318-318
Gradata/src/gradata/daemon.py (1)
1040-1068: LGTM!Also applies to: 1075-1080
Gradata/tests/test_sync_drain_on_startup.py (1)
1-314: LGTM!
| if not db_path.exists(): | ||
| return 0 | ||
| if not api_key: | ||
| # Cloud sync not configured — nothing to do. Don't block startup. | ||
| logger.debug("drain_sync_queue: no api_key, skipping") | ||
| return 0 |
There was a problem hiding this comment.
Emit the startup-drain info log on all early no-op exits.
Line 85, Line 87, and Line 101 return early without the expected discoverability log. That makes startup behavior harder to audit in production.
🔧 Suggested patch
db_path = brain_dir / "system.db"
if not db_path.exists():
+ logger.info("sync queue drained at startup: 0 rows")
return 0
if not api_key:
# Cloud sync not configured — nothing to do. Don't block startup.
logger.debug("drain_sync_queue: no api_key, skipping")
+ logger.info("sync queue drained at startup: 0 rows")
return 0
@@
except sqlite3.DatabaseError as exc:
# No sync_queue table yet (fresh brain) or DB locked — let the
# background worker handle it on its first tick.
logger.debug("drain_sync_queue: pre-check failed (%s); skipping", exc)
+ logger.info("sync queue drained at startup: 0 rows")
return 0Also applies to: 101-105
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/_sync_worker.py` around lines 85 - 90, The function
drain_sync_queue has multiple early-return paths that don't log the startup
no-op; add a discoverability debug log before each early return so all no-op
exits emit the same startup-drain message. Specifically, in _sync_worker.py add
a logger.debug call (matching the existing style, e.g. "drain_sync_queue:
startup no-op - reason") immediately before the early returns where db_path is
missing (the not db_path.exists() branch) and the other early-return block
around lines 101-105, referencing the same logger used elsewhere so that
drain_sync_queue, db_path, and api_key code paths all log the startup-drain
info.
| # Total drained across both runners equals 10 (sum across threads; one | ||
| # may see 10, the other 0, or any split — but no row is lost). | ||
| assert sum(results) >= 10 or _count_pending(db_path) == 0 |
There was a problem hiding this comment.
Fix the tautological assertion in the concurrent test.
Line 314 already guarantees pending is zero, so Line 317 always passes and doesn’t validate drain_sync_queue return accounting.
✅ Suggested assertion
- assert sum(results) >= 10 or _count_pending(db_path) == 0
+ assert 10 <= sum(results) <= 20📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Total drained across both runners equals 10 (sum across threads; one | |
| # may see 10, the other 0, or any split — but no row is lost). | |
| assert sum(results) >= 10 or _count_pending(db_path) == 0 | |
| # Total drained across both runners equals 10 (sum across threads; one | |
| # may see 10, the other 0, or any split — but no row is lost). | |
| assert 10 <= sum(results) <= 20 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/tests/test_sync_drain_on_startup.py` around lines 315 - 317, The
assertion is tautological because a prior check already ensures pending is zero;
replace the current assertion with a true validation of returned accounting by
removing the fallback clause and asserting that sum(results) >= 10 (i.e., assert
sum(results) >= 10) so the test actually verifies drain_sync_queue's reported
drained count; locate the assertion referencing results and
_count_pending(db_path) in the test_sync_drain_on_startup test and update it
accordingly.
…-1198) (#214) The .claude-plugin/ directory itself was already removed in a prior cleanup (see CHANGELOG: 'Remove orphaned gradata-plugin/ subdir (#54)'). What remained were stale string references in docs and examples now that the SDK ships all subcommands directly (PRs #208/#209/#210/#211 + #213). Changes: - .dockerignore: removed dead .claude-plugin exclude line - examples/with_claude_code.py: replaced '/plugin install gradata' language with the canonical 'gradata install --agent claude-code' - examples/README.md: fix broken link to .claude-plugin/README.md - CHANGELOG.md: BREAKING entry under Unreleased documenting the retirement This closes out the kill-the-plugin epic (GRA-1198 / GH #206) from the references side. Anyone who installed via /plugin marketplace before 2026-05-20 must migrate to the SDK install path. Verified: - pip install /home/olive/work/gradata-sdk/Gradata in a fresh venv succeeds - gradata install --agent claude-code --brain /tmp/test-brain --help works - pytest tests/ -x -q passes (816 tests, 7 skipped, 1 known-skip on test_byo_key_provider for missing httpx in dev env unrelated to this) - ruff check clean on touched files - grep for 'claude-plugin|gradata-plugin' on src/ + docs/ shows only the intentional CHANGELOG entries (current BREAKING note + historical refs) Branch authored by delegate_task subagent (hit max_iterations on PR-open); parent agent verified + extracted clean diff + opened PR. Co-authored-by: data-engineer <data-engineer@gradata.ai>
GRA-1245: drain sync_queue on daemon startup
Summary
Daemon restart used to leave rows in
sync_queuewithsynced_at IS NULLuntil the next correction event triggered a flush. If the daemon crashed mid-batch, pending syncs sat indefinitely.This PR adds a one-shot synchronous drain that runs before the HTTP listener opens, so the first request after a daemon (re)start never observes a stale backlog.
What changed
src/gradata/_sync_worker.py— new module-leveldrain_sync_queue(brain_dir, api_key, ingest_url=...) -> intthat synchronously flushes pending rows. Bounded (max 64 batches × 50 rows), idempotent, and gracefully no-ops on missingsystem.db/ missingapi_key/ missingsync_queuetable so startup never fails because cloud sync is misconfigured.src/gradata/daemon.py—GradataDaemon.start()now calls a new_drain_sync_queue_at_startup()hook before_try_bind(). Wraps the drain intry/exceptso a drain error never blocks daemon boot; the backgroundSyncWorkerstill handles steady-state.sync queue drained at startup: N rows(info, loggergradata.sync_worker) so operators can confirm viajournalctlordaemon.log.Concurrency contract
Safe to call concurrently with the background
SyncWorker:/api/v1/ingestendpoint deduplicates onevent_id.synced_atare filtered out bypeek_pending, so re-invocation is a no-op.Test plan
New
tests/test_sync_drain_on_startup.py(7 tests, all passing):test_drain_sync_queue_flushes_pending_rows— seed 5 pending rows, call drain, assert allsynced_atpopulated + log line emitted + 5 HTTP POSTs.test_drain_sync_queue_is_idempotent— second call on a clean queue drains 0 rows.test_drain_sync_queue_noop_without_api_key—api_key=None→ 0 rows, no crash, rows remain pending.test_drain_sync_queue_noop_when_db_missing— fresh brain dir.test_drain_sync_queue_noop_when_table_missing— DB withoutsync_queuetable.test_daemon_startup_hook_drains_before_listener— callsGradataDaemon._drain_sync_queue_at_startup()directly (no HTTP server bound), proves the integration.test_drain_sync_queue_safe_concurrent_with_worker— two concurrent drainers; queue ends fully drained, no errors, no double-mark from the local DB's perspective.Existing
test_sync_worker.py,test_sync_queue.py,test_daemon_sync.pyall still pass (34 tests total across all sync files).Ruff:
ruff check --fix+ruff formatclean on all touched files.Layering check
gradata._sync_worker(Layer 0) gains a module-level helper.gradata.daemon(Layer 2) imports fromgradata._sync_worker(Layer 0).Risk
Low.
SyncWorkercontinues to handle steady-state and retries anything the startup drain skipped.Kanban: t_bd7b2bf4 · Epic: GRA-1198