feat: retain orchestrator + cluster-level injection by Gradata · Pull Request #96 · Gradata/gradata

Gradata · 2026-04-16T21:53:35Z

Summary

RetainOrchestrator: 3-phase event persistence with crash recovery + delta detection
Cluster-level injection: summaries replace individual rules, saves injection slots
2871 tests passing (+30 new)

Test plan

30 new tests (23 orchestrator + 7 cluster injection)
Full suite: 2871 passed

Generated with Gradata

- RetainOrchestrator: 3-phase event persistence adapted from Hindsight.
  Phase 1 (read): delta detection via dedup keys.
  Phase 2 (atomic): JSONL + SQLite write with crash recovery cursor.
  Phase 3 (best-effort): manifest update.
  23 new tests.
- Cluster-level injection: inject cluster summaries instead of individual rules when cluster has confidence >= 0.75, size >= 3, no contradictions. Saves injection slots (10 → fewer items, more semantic density). 7 new tests.

2871 tests passing (+30 new).

Co-Authored-By: Gradata <noreply@gradata.ai>

coderabbitai · 2026-04-16T21:53:48Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 76758a8 and aa29714.

📒 Files selected for processing (4)

src/gradata/_events.py
src/gradata/hooks/inject_brain_rules.py
tests/test_cluster_injection.py
tests/test_retain_orchestrator.py

📝 Walkthrough

- Introduces RetainOrchestrator class for batch-persisting queued event dictionaries with cursor-based deduplication and a 3-phase workflow:
  - Phase 1: Delta detection via stable per-event dedup keys (ts|type|source)
  - Phase 2: Atomic append to events.jsonl and conditional SQLite insert with crash-recovery cursor
  - Phase 3: Best-effort manifest update
- Public API additions to RetainOrchestrator: queue(event), pending_count property, flush() with structured result reporting
- Cluster-level rule injection: Replaces individual-rule injections with cluster summaries when clusters meet thresholds (confidence ≥ 0.75, size ≥ 3, no contradictions), reducing injection slots and increasing semantic density
- Fallback behavior: Cluster injection degrades gracefully to individual rule injection if clustering is unavailable
- Comprehensive test coverage: 23 tests for orchestrator (queue, delta detection, crash recovery, phase interactions), 7 tests for cluster injection (qualifying/non-qualifying clusters, contradictions, confidence filtering)
- Full test suite: 2,871 tests passing (+30 new tests)
- No breaking changes: Existing inject_brain_rules.main() signature unchanged; new RetainOrchestrator is additive
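
To make the Phase 1 delta detection concrete, here is a minimal sketch of a `ts|type|source` dedup key and the filtering step built on it. The function names `dedup_key` and `select_new_events` are hypothetical and not taken from the PR; the sketch only assumes that each event is a dictionary carrying `ts`, `type`, and `source` fields.

```python
from typing import Iterable


def dedup_key(event: dict) -> str:
    """Join the ts, type, and source fields into one stable key (assumed field names)."""
    return "|".join(str(event.get(field, "")) for field in ("ts", "type", "source"))


def select_new_events(pending: Iterable[dict], existing_keys: set[str]) -> list[dict]:
    """Keep pending events whose key is not yet persisted, dropping repeats within the batch."""
    seen = set(existing_keys)  # copy so the caller's set is not mutated
    fresh = []
    for event in pending:
        key = dedup_key(event)
        if key not in seen:
            seen.add(key)
            fresh.append(event)
    return fresh
```

Because the key depends only on the event's own fields, replaying the same queue after a restart should yield an empty delta, which is the property the crash-recovery tests appear to rely on.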

Walkthrough

Introduces RetainOrchestrator class for batch-persisting event dictionaries with cursor-based deduplication and 3-phase flush logic. Enhances rule injection hook to cluster qualifying lessons into single compressed summary lines. Adds comprehensive test coverage for both features.
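
The qualification rule for a cluster summary can be written as a single predicate. This is a sketch under assumptions: the `Cluster` dataclass and its field names are hypothetical stand-ins, and only the three thresholds (confidence at least 0.75, at least 3 rules, no contradictions) come from the PR description.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 0.75   # threshold stated in the PR description
MIN_CLUSTER_SIZE = 3    # threshold stated in the PR description


@dataclass
class Cluster:
    """Hypothetical cluster shape, used for illustration only."""
    category: str
    confidence: float
    rules: list[str]
    has_contradictions: bool = False


def qualifies_for_summary(cluster: Cluster) -> bool:
    """True when the cluster may be injected as one summary line."""
    return (
        cluster.confidence >= MIN_CONFIDENCE
        and len(cluster.rules) >= MIN_CLUSTER_SIZE
        and not cluster.has_contradictions
    )
```

A cluster that fails any one of the three checks would be left to the individual-rule path.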

Changes

| Cohort / File(s) | Summary |
|---|---|
| Event Orchestration `src/gradata/_events.py` | New `RetainOrchestrator` class implementing queue/flush persistence with cursor-based deduplication, 3-phase flush (read existing → write new events → update manifest), database integration with `INSERT OR IGNORE`, and structured error reporting. |
| Rule Clustering Enhancement `src/gradata/hooks/inject_brain_rules.py` | Enhanced rule injection to optionally cluster filtered lessons (min_cluster_size=3, confidence threshold 0.75, no contradictions) into single summary lines, falling back to individual rules if clustering is unavailable or not applicable. |
| Test Suites `tests/test_retain_orchestrator.py`, `tests/test_cluster_injection.py` | New test modules: `test_retain_orchestrator.py` validates orchestrator queue/flush behavior, deduplication across restarts, cursor persistence, crash recovery, and phase-specific error handling; `test_cluster_injection.py` validates cluster injection scenarios including qualification criteria, formatting, and fallback to non-clustered rules. |
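
The `INSERT OR IGNORE` integration mentioned in the table is what makes the SQLite side idempotent. The sketch below shows the general technique with Python's standard `sqlite3` module. The table layout (a `dedup_key` primary key plus a JSON `payload` column) and the helper name `insert_events` are assumptions for illustration, not the schema used in `_events.py`.

```python
import json
import sqlite3


def insert_events(conn: sqlite3.Connection, events: list[dict]) -> int:
    """Insert events idempotently and return how many rows were actually added."""
    # Assumed schema: the primary key on dedup_key is what lets OR IGNORE skip repeats.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "dedup_key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    inserted = 0
    with conn:  # one transaction: commits on success, rolls back on exception
        for event in events:
            key = f"{event['ts']}|{event['type']}|{event['source']}"
            cur = conn.execute(
                "INSERT OR IGNORE INTO events (dedup_key, payload) VALUES (?, ?)",
                (key, json.dumps(event, sort_keys=True)),
            )
            inserted += cur.rowcount  # 1 for a new row, 0 when the key already exists
    return inserted
```

With this shape, re-running a flush after a crash between the JSONL append and the cursor save would not duplicate database rows, since a repeated key is silently skipped.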

Sequence Diagrams

sequenceDiagram
    participant Client
    participant Orchestrator as RetainOrchestrator
    participant FileSystem
    participant Database
    participant Manifest

    Client->>Orchestrator: queue(event)
    Orchestrator->>Orchestrator: Accumulate in pending queue

    Client->>Orchestrator: flush()
    activate Orchestrator
    
    rect rgba(100, 149, 237, 0.5)
    Note over Orchestrator, FileSystem: Phase 1: Read Existing
    Orchestrator->>FileSystem: Load events.jsonl (best-effort)
    Orchestrator->>FileSystem: Load .event_cursor.json
    Orchestrator->>Orchestrator: Compute existing_keys<br/>(dedup from ts|type|source)
    Orchestrator->>Orchestrator: Identify new events
    end
    
    rect rgba(60, 179, 113, 0.5)
    Note over Orchestrator, Database: Phase 2: Write & Persist
    Orchestrator->>FileSystem: Append new events to events.jsonl
    Orchestrator->>Database: Ensure events table
    loop For each event
        Orchestrator->>Database: INSERT OR IGNORE
    end
    Orchestrator->>FileSystem: Save cursor (last_committed_key)
    end
    
    rect rgba(184, 134, 11, 0.5)
    Note over Orchestrator, Manifest: Phase 3: Post-Write (best-effort)
    Orchestrator->>Manifest: update_manifest(brain_dir)
    end
    
    Orchestrator->>Client: Return {written, errors, phases}
    deactivate Orchestrator
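
The flow in the diagram above can be approximated with a small file-only stand-in. This is a sketch, not the merged `RetainOrchestrator`: the class name `MiniOrchestrator`, the injected `update_manifest` callable, and the phase labels in the result are assumptions, and the SQLite step is omitted for brevity. The file names `events.jsonl` and `.event_cursor.json` and the `{written, errors, phases}` result shape follow the diagram.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class MiniOrchestrator:
    """Toy 3-phase flush: read existing keys, append new events, best-effort manifest."""

    def __init__(self, brain_dir: Path,
                 update_manifest: Optional[Callable[[Path], None]] = None):
        self.brain_dir = Path(brain_dir)
        self.events_path = self.brain_dir / "events.jsonl"
        self.cursor_path = self.brain_dir / ".event_cursor.json"
        self.update_manifest = update_manifest  # hypothetical hook for Phase 3
        self._pending: list[dict] = []

    @staticmethod
    def _key(event: dict) -> str:
        return f"{event.get('ts')}|{event.get('type')}|{event.get('source')}"

    def queue(self, event: dict) -> None:
        self._pending.append(event)

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def flush(self) -> dict:
        result = {"written": 0, "errors": [], "phases": {}}

        # Phase 1 (read): collect keys already on disk, skipping unreadable lines.
        existing = set()
        if self.events_path.exists():
            for line in self.events_path.read_text(encoding="utf-8").splitlines():
                try:
                    existing.add(self._key(json.loads(line)))
                except (json.JSONDecodeError, AttributeError):
                    continue
        new_events = []
        for event in self._pending:
            key = self._key(event)
            if key not in existing:
                existing.add(key)
                new_events.append(event)
        result["phases"]["read"] = "ok"

        # Phase 2 (write): append, then record the cursor; a failure stops the flush.
        try:
            if new_events:
                with self.events_path.open("a", encoding="utf-8") as fh:
                    for event in new_events:
                        fh.write(json.dumps(event) + "\n")
                cursor = {"last_committed_key": self._key(new_events[-1])}
                self.cursor_path.write_text(json.dumps(cursor), encoding="utf-8")
            result["written"] = len(new_events)
            result["phases"]["write"] = "ok"
            self._pending.clear()
        except OSError as exc:
            result["errors"].append(f"write: {exc}")
            result["phases"]["write"] = "failed"
            return result

        # Phase 3 (best-effort): a manifest failure is reported but does not undo Phase 2.
        try:
            if self.update_manifest is not None:
                self.update_manifest(self.brain_dir)
            result["phases"]["manifest"] = "ok"
        except Exception as exc:
            result["errors"].append(f"manifest: {exc}")
            result["phases"]["manifest"] = "failed"
        return result
```

The design point worth noting is the asymmetry between phases: a write failure returns early and leaves the queue intact for a retry, while a manifest failure is only logged in `errors`.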

sequenceDiagram
    participant Hook
    participant Lessons as Lesson Parser
    participant Clustering
    participant Filter as Rule Filter
    participant Output

    Hook->>Lessons: parse_lessons(data)
    Lessons->>Hook: Return lesson list

    alt Clustering available
        Hook->>Clustering: cluster_rules(filtered_lessons,<br/>min_cluster_size=3)
        Clustering->>Clustering: Form clusters by category
        
        rect rgba(60, 179, 113, 0.5)
        Note over Clustering: Per-Cluster Decision
        alt confidence >= 0.75 AND<br/>not contradictions
            Clustering->>Output: Emit [CLUSTER:...|N rules]
        else Low confidence OR contradictions
            Clustering->>Filter: Pass to individual rules
        end
        end
    else Clustering unavailable
        Hook->>Filter: Proceed with all lessons
    end

    Filter->>Filter: Build individual [RULE:]/[PATTERN:] lines<br/>for non-clustered rules
    Filter->>Output: Append individual lines
    Output->>Hook: Return combined lines
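
The branching in the diagram above, including the fallback when clustering is unavailable, can be sketched as a partition step that decides which lessons are covered by a summary and which are emitted individually. The function `plan_injection`, the dictionary keys `rules`, `confidence`, and `contradictions`, and the `cluster_fn` parameter are hypothetical; the actual line formats for `[CLUSTER:...]`, `[RULE:]`, and `[PATTERN:]` are deliberately left out because the source elides them.

```python
from typing import Callable, Optional


def plan_injection(
    lessons: list[str],
    cluster_fn: Optional[Callable[..., list[dict]]] = None,
    min_cluster_size: int = 3,
    min_confidence: float = 0.75,
) -> tuple[list[dict], list[str]]:
    """Return (clusters to summarise, lessons to inject individually)."""
    if cluster_fn is None:
        return [], list(lessons)  # clustering unavailable: inject every lesson
    try:
        clusters = cluster_fn(lessons, min_cluster_size=min_cluster_size)
    except Exception:
        return [], list(lessons)  # degrade gracefully on any clustering failure
    chosen = [
        c for c in clusters
        if c["confidence"] >= min_confidence
        and len(c["rules"]) >= min_cluster_size
        and not c["contradictions"]
    ]
    # Lessons covered by a chosen cluster are replaced by that cluster's summary.
    covered = {rule for c in chosen for rule in c["rules"]}
    remaining = [lesson for lesson in lessons if lesson not in covered]
    return chosen, remaining
```

As a worked example of the slot saving: with 10 lessons and one qualifying cluster of 4, the output is 1 summary plus 6 individual lines, so 7 slots instead of 10.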

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(hooks): inject meta-rules into LLM context at session start #45: Modifies the same inject_brain_rules.main() function to add meta-rule loading/injection, which may intersect with cluster-based rule compression changes in rule injection logic.

Suggested labels

feature

Gradata merged commit 638ae22 into main Apr 16, 2026
3 of 5 checks passed

greptile-apps Bot reviewed Apr 16, 2026

coderabbitai Bot added the feature label Apr 16, 2026

This was referenced Apr 17, 2026

feat: wire RetainOrchestrator + dedup-safe emit() writes #98

Merged

feat: multi-tenant SDK + cloud alignment (tenant_id, visibility, clusters, sync_state) #102

Merged

Gradata deleted the feat/behavioral-engine branch April 17, 2026 17:47

This was referenced Apr 17, 2026

docs(cloud): v2 Postgres-as-monolith schema upgrade #105

Merged

feat(hooks): opt-out env kill switches for 6 SDK hooks + audit fixes #133

Merged

feat(inject): session-start emits synthesized prose instead of N rule lines #140

Merged
