feat: retain orchestrator + cluster-level injection #96

Merged
Gradata merged 1 commit into main from feat/behavioral-engine on Apr 16, 2026
@Gradata Gradata commented Apr 16, 2026

Summary

  • RetainOrchestrator: 3-phase event persistence with crash recovery + delta detection
  • Cluster-level injection: summaries replace individual rules, saves injection slots
  • 2871 tests passing (+30 new)

Test plan

  • 30 new tests (23 orchestrator + 7 cluster injection)
  • Full suite: 2871 passed

Generated with Gradata

- RetainOrchestrator: 3-phase event persistence adapted from Hindsight.
  Phase 1 (read): delta detection via dedup keys. Phase 2 (atomic):
  JSONL + SQLite write with crash recovery cursor. Phase 3 (best-effort):
  manifest update. 23 new tests.
- Cluster-level injection: inject cluster summaries instead of individual
  rules when cluster has confidence >= 0.75, size >= 3, no contradictions.
  Saves injection slots (10 → fewer items, more semantic density). 7 new tests.

2871 tests passing (+30 new).

Co-Authored-By: Gradata <noreply@gradata.ai>
@Gradata Gradata merged commit 638ae22 into main Apr 16, 2026
3 of 5 checks passed

@greptile-apps greptile-apps Bot left a comment

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 33e46cc8-9ade-471a-a4a9-e21abed77611

📥 Commits

Reviewing files that changed from the base of the PR and between 76758a8 and aa29714.

📒 Files selected for processing (4)
  • src/gradata/_events.py
  • src/gradata/hooks/inject_brain_rules.py
  • tests/test_cluster_injection.py
  • tests/test_retain_orchestrator.py

📝 Walkthrough
  • Introduces RetainOrchestrator class for batch-persisting queued event dictionaries with cursor-based deduplication and 3-phase workflow:
    • Phase 1: Delta detection via stable per-event dedup keys (ts|type|source)
    • Phase 2: Atomic append to events.jsonl and conditional SQLite insert with crash-recovery cursor
    • Phase 3: Best-effort manifest update
  • Public API additions to RetainOrchestrator: queue(event), pending_count property, flush() with structured result reporting
  • Cluster-level rule injection: Replaces individual-rule injections with cluster summaries when clusters meet thresholds (confidence ≥ 0.75, size ≥ 3, no contradictions), reducing injection slots and increasing semantic density
  • Fallback behavior: Cluster injection degrades gracefully to individual rule injection if clustering is unavailable
  • Comprehensive test coverage: 23 tests for orchestrator (queue, delta detection, crash recovery, phase interactions), 7 tests for cluster injection (qualifying/non-qualifying clusters, contradictions, confidence filtering)
  • Full test suite: 2,871 tests passing (+30 new tests)
  • No breaking changes: Existing inject_brain_rules.main() signature unchanged; new RetainOrchestrator is additive
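
The Phase 1 delta detection described above can be sketched as follows. This is a minimal illustration, not the code in src/gradata/_events.py: the helper names dedup_key and select_new_events are hypothetical, and only the ts|type|source key layout is taken from the walkthrough.

```python
from __future__ import annotations

from typing import Iterable


def dedup_key(event: dict) -> str:
    """Build a stable key from the ts, type and source fields (missing fields become '')."""
    return "|".join(str(event.get(name, "")) for name in ("ts", "type", "source"))


def select_new_events(pending: Iterable[dict], existing_keys: set[str]) -> list[dict]:
    """Return pending events whose key is neither persisted nor repeated in the batch.

    Hypothetical helper: the real orchestrator may structure this differently.
    """
    seen = set(existing_keys)  # copy so the caller's set is not mutated
    fresh = []
    for event in pending:
        key = dedup_key(event)
        if key in seen:
            continue  # already on disk, or a duplicate earlier in this batch
        seen.add(key)
        fresh.append(event)
    return fresh
```

Because the key is derived only from event fields, the same event queued again after a restart maps to the same key and is skipped.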

Walkthrough

Introduces RetainOrchestrator class for batch-persisting event dictionaries with cursor-based deduplication and 3-phase flush logic. Enhances rule injection hook to cluster qualifying lessons into single compressed summary lines. Adds comprehensive test coverage for both features.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Event Orchestration: src/gradata/_events.py | New RetainOrchestrator class implementing queue/flush persistence with cursor-based deduplication, 3-phase flush (read existing → write new events → update manifest), database integration with INSERT OR IGNORE, and structured error reporting. |
| Rule Clustering Enhancement: src/gradata/hooks/inject_brain_rules.py | Enhanced rule injection to optionally cluster filtered lessons (min_cluster_size=3, confidence threshold 0.75, no contradictions) into single summary lines, falling back to individual rules if clustering is unavailable or not applicable. |
| Test Suites: tests/test_retain_orchestrator.py, tests/test_cluster_injection.py | New test modules: test_retain_orchestrator.py validates orchestrator queue/flush behavior, deduplication across restarts, cursor persistence, crash recovery, and phase-specific error handling; test_cluster_injection.py validates cluster injection scenarios including qualification criteria, formatting, and fallback to non-clustered rules. |
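
For readers unfamiliar with the INSERT OR IGNORE integration mentioned in the summary, here is a minimal sketch using the standard sqlite3 module. The table schema, the dedup_key column, and the insert_events helper are assumptions made for illustration; the PR does not show the actual schema.

```python
import sqlite3


def insert_events(conn: sqlite3.Connection, events: list) -> int:
    """Insert events idempotently and return how many rows were actually added.

    Assumed schema: a primary key on dedup_key lets SQLite silently skip
    rows that were already written by an earlier (possibly crashed) flush.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "dedup_key TEXT PRIMARY KEY, ts TEXT, type TEXT, source TEXT)"
    )
    inserted = 0
    with conn:  # one transaction: commit on success, roll back on error
        for event in events:
            key = f"{event['ts']}|{event['type']}|{event['source']}"
            cur = conn.execute(
                "INSERT OR IGNORE INTO events (dedup_key, ts, type, source) "
                "VALUES (?, ?, ?, ?)",
                (key, str(event["ts"]), event["type"], event["source"]),
            )
            inserted += cur.rowcount  # 1 if inserted, 0 if ignored
    return inserted
```

The design point is that replaying the same batch after a crash is harmless: the second attempt inserts nothing and raises no constraint error.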

Sequence Diagrams

sequenceDiagram
    participant Client
    participant Orchestrator as RetainOrchestrator
    participant FileSystem
    participant Database
    participant Manifest

    Client->>Orchestrator: queue(event)
    Orchestrator->>Orchestrator: Accumulate in pending queue

    Client->>Orchestrator: flush()
    activate Orchestrator
    
    rect rgba(100, 149, 237, 0.5)
    Note over Orchestrator, FileSystem: Phase 1: Read Existing
    Orchestrator->>FileSystem: Load events.jsonl (best-effort)
    Orchestrator->>FileSystem: Load .event_cursor.json
    Orchestrator->>Orchestrator: Compute existing_keys<br/>(dedup from ts|type|source)
    Orchestrator->>Orchestrator: Identify new events
    end
    
    rect rgba(60, 179, 113, 0.5)
    Note over Orchestrator, Database: Phase 2: Write & Persist
    Orchestrator->>FileSystem: Append new events to events.jsonl
    Orchestrator->>Database: Ensure events table
    loop For each event
        Orchestrator->>Database: INSERT OR IGNORE
    end
    Orchestrator->>FileSystem: Save cursor (last_committed_key)
    end
    
    rect rgba(184, 134, 11, 0.5)
    Note over Orchestrator, Manifest: Phase 3: Post-Write (best-effort)
    Orchestrator->>Manifest: update_manifest(brain_dir)
    end
    
    Orchestrator->>Client: Return {written, errors, phases}
    deactivate Orchestrator
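
The flow in the diagram above can be approximated by the sketch below. It is an assumption-laden illustration, not the merged implementation: the flush function signature, the phase labels, and the temp-file cursor write are hypothetical, the SQLite step is omitted, and only the file names events.jsonl and .event_cursor.json and the {written, errors, phases} result shape come from the diagram.

```python
import json
from pathlib import Path


def _key(event: dict) -> str:
    return f"{event.get('ts')}|{event.get('type')}|{event.get('source')}"


def flush(brain_dir: Path, pending: list, update_manifest=None) -> dict:
    """Hypothetical three-phase flush: read, write, best-effort manifest."""
    result = {"written": 0, "errors": [], "phases": []}
    log_path = brain_dir / "events.jsonl"
    cursor_path = brain_dir / ".event_cursor.json"

    # Phase 1: read existing keys and keep only events not yet persisted.
    existing = set()
    if log_path.exists():
        for line in log_path.read_text(encoding="utf-8").splitlines():
            try:
                existing.add(_key(json.loads(line)))
            except (json.JSONDecodeError, AttributeError):
                continue  # skip lines that are not valid JSON objects
    new_events = []
    for event in pending:
        key = _key(event)
        if key not in existing:
            existing.add(key)
            new_events.append(event)
    result["phases"].append("read")

    # Phase 2: append to the log, then record the cursor via an atomic rename.
    if new_events:
        try:
            with log_path.open("a", encoding="utf-8") as fh:
                for event in new_events:
                    fh.write(json.dumps(event) + "\n")
            tmp = cursor_path.with_suffix(".tmp")
            cursor = {"last_committed_key": _key(new_events[-1])}
            tmp.write_text(json.dumps(cursor), encoding="utf-8")
            tmp.replace(cursor_path)
        except OSError as exc:
            result["errors"].append(f"write: {exc}")
            return result
        result["written"] = len(new_events)
    result["phases"].append("write")

    # Phase 3: a manifest failure is reported but never undoes the write.
    if update_manifest is not None:
        try:
            update_manifest(brain_dir)
            result["phases"].append("manifest")
        except Exception as exc:
            result["errors"].append(f"manifest: {exc}")
    return result
```

Keeping Phase 3 outside the write path means a broken manifest updater shows up in errors while written still reflects what reached disk.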
sequenceDiagram
    participant Hook
    participant Lessons as Lesson Parser
    participant Clustering
    participant Filter as Rule Filter
    participant Output

    Hook->>Lessons: parse_lessons(data)
    Lessons->>Hook: Return lesson list

    alt Clustering available
        Hook->>Clustering: cluster_rules(filtered_lessons,<br/>min_cluster_size=3)
        Clustering->>Clustering: Form clusters by category
        
        rect rgba(60, 179, 113, 0.5)
        Note over Clustering: Per-Cluster Decision
        alt confidence >= 0.75 AND<br/>not contradictions
            Clustering->>Output: Emit [CLUSTER:...|N rules]
        else Low confidence OR contradictions
            Clustering->>Filter: Pass to individual rules
        end
        end
    else Clustering unavailable
        Hook->>Filter: Proceed with all lessons
    end

    Filter->>Filter: Build individual [RULE:]/[PATTERN:] lines<br/>for non-clustered rules
    Filter->>Output: Append individual lines
    Output->>Hook: Return combined lines
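
The per-cluster decision in the diagram above can be sketched as a simple partition. The Cluster dataclass, qualifies, and partition are hypothetical names; only the thresholds (confidence >= 0.75, size >= 3, no contradictions) come from the PR description, and the actual return type of cluster_rules is not shown in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for whatever cluster_rules returns."""
    category: str
    rules: list[str]
    confidence: float
    contradictions: list[str] = field(default_factory=list)


def qualifies(cluster: Cluster, min_confidence: float = 0.75, min_size: int = 3) -> bool:
    """Apply the three thresholds stated in the PR description."""
    return (
        cluster.confidence >= min_confidence
        and len(cluster.rules) >= min_size
        and not cluster.contradictions
    )


def partition(clusters: list[Cluster]) -> tuple[list[Cluster], list[str]]:
    """Split clusters into those summarized as one line and rules injected individually."""
    summarized: list[Cluster] = []
    individual: list[str] = []
    for cluster in clusters:
        if qualifies(cluster):
            summarized.append(cluster)
        else:
            individual.extend(cluster.rules)  # fall back to per-rule injection
    return summarized, individual
```

Under these assumptions, a qualifying cluster of three rules occupies one injection slot instead of three, which is the slot saving the PR describes.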

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

feature
