
feat: S102 — MiroFish P0-P2 roadmap implementation (9 features)#24

Merged
Gradata merged 16 commits into main from feat/s102-mirofish-p0-p2
Apr 11, 2026

Conversation

@Gradata Gradata (Owner) commented Apr 10, 2026

Summary

Implements all 9 features from the MiroFish expert panel P0-P2 roadmap. Three priority levels executed in parallel worktrees with TDD.

P0 — Tightly Coupled (semantic severity, rule suppression, taxonomy)

  • P0.1 Semantic severity: EmbeddingClient.semantic_similarity() + adjust_severity_by_semantics() — downgrades severity when meaning is preserved despite high edit distance
  • P0.2 Rule suppression tracking: RULE_SUPPRESSION events emitted at all 4 filter stages in apply_rules() (domain_disabled, relevance_threshold, assumption_invalid, conflict)
  • P0.3 Rosch 6-category taxonomy: 15 flat categories restructured into 6 parent categories (CONTENT/VOICE/STRUCTURE/PROCESS/CONSTRAINT/QUALITY) with subordinate mappings. Fully backward-compatible
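
The P0.1 downgrade step can be sketched as follows. This is a hedged reconstruction from the test plan and review comments, not the actual implementation: the real function in src/gradata/enhancements/diff_engine.py operates on a DiffResult (with threshold=0.85), whereas this simplified version takes the severity label directly.

```python
# Hypothetical sketch of P0.1: severity downgrades one level when semantic
# similarity shows the meaning was preserved despite a large edit distance.
_SEVERITY_DOWNGRADE = {
    "discarded": "major",
    "major": "moderate",
    "moderate": "minor",
    "minor": "as-is",
    # "as-is" omitted: already the floor, no further downgrade
}

def adjust_severity_by_semantics(severity: str, semantic_similarity: float,
                                 threshold: float = 0.85) -> str:
    # Downgrade only when similarity meets the threshold.
    if semantic_similarity >= threshold:
        return _SEVERITY_DOWNGRADE.get(severity, severity)
    return severity

assert adjust_severity_by_semantics("moderate", 0.90) == "minor"
assert adjust_severity_by_semantics("moderate", 0.50) == "moderate"
```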

P1 — Moderately Coupled (Bayesian confidence, implicit approval, intent classifier)

  • P1.1 Bayesian confidence: Existing beta_posterior() from _stats.py integrated into confidence pipeline. Alpha/beta parameters on Lesson, Bayesian+FSRS blend that ramps from pure FSRS (< 5 obs) to 70% Bayesian (>= 20 obs). Roundtrip persistence in lessons.md
  • P1.2 Implicit approval: Positive signal patterns (looks good, perfect, ship it, etc.) added to implicit_feedback.py. Emits OUTPUT_ACCEPTED events alongside existing negative signals
  • P1.3 Intent classifier: New detection/intent_classifier.py classifies corrections by intent (factual_correction, compliance, formatting, clarity, conciseness, completeness, tone_shift, preference, unknown)
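
The P1.1 blend can be illustrated with a small sketch. The ramp endpoints (pure FSRS below 5 observations, 70% Bayesian at 20 or more) come from the summary above; the linear interpolation between them and the helper names here are assumptions, stand-ins for the real _bayesian_confidence() in self_improvement.py.

```python
# Hedged sketch of the Bayesian+FSRS confidence blend (P1.1).
def beta_posterior_mean(alpha: float, beta_param: float) -> float:
    # Mean of a Beta(alpha, beta_param) posterior.
    return alpha / (alpha + beta_param)

def bayesian_blend_weight(observations: int) -> float:
    # Pure FSRS below 5 observations; 70% Bayesian at 20+.
    if observations < 5:
        return 0.0
    if observations >= 20:
        return 0.7
    return 0.7 * (observations - 5) / 15  # assumed linear ramp

def blended_confidence(fsrs_conf: float, alpha: float, beta_param: float) -> float:
    # Observations beyond the uninformative Beta(1,1) prior.
    n = int(alpha + beta_param - 2)
    w = bayesian_blend_weight(n)
    return (1 - w) * fsrs_conf + w * beta_posterior_mean(alpha, beta_param)

assert bayesian_blend_weight(3) == 0.0
assert bayesian_blend_weight(25) == 0.7
assert 0.0 <= blended_confidence(0.5, 10.0, 2.0) <= 1.0
```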

P2 — Independent (recurring patterns, temporal scope, cognitive load)

  • P2.1 Recurring patterns: detect_recurring_patterns() groups lessons by category and identifies cross-correction patterns (3+ corrections in same category)
  • P2.2 Temporal scope: 3 new fields on RuleScope (temporal_relevance, max_idle_sessions, created_session) + temporal_decay() with configurable exponential decay
  • P2.3 Cognitive load tag: New cognitive_load closed tag (intrinsic/extraneous/germane) with auto-inference from correction category
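
The P2.2 decay can be sketched with a curve that reproduces the numbers in the test plan below (fresh = 1.0, midpoint in 0.4-0.8, under 0.15 at the idle limit, 0.05 beyond). The quadratic-exponential shape and the 0.05 floor are assumptions; the real temporal_decay() lives in src/gradata/_scope.py and may use a different curve.

```python
import math

# Hypothetical sketch of temporal_decay() (P2.2).
def temporal_decay(sessions_since_fire: int, max_idle: int) -> float:
    if max_idle <= 0 or sessions_since_fire <= 0:
        return 1.0  # no temporal limit configured, or rule fired this session
    if sessions_since_fire > max_idle:
        return 0.05  # past the idle limit: minimal residual weight
    # ~0.61 at the midpoint, ~0.14 at the limit
    return max(0.05, math.exp(-2.0 * (sessions_since_fire / max_idle) ** 2))

assert temporal_decay(0, 10) == 1.0
assert 0.4 < temporal_decay(5, 10) < 0.8
assert temporal_decay(10, 10) < 0.15
assert temporal_decay(11, 10) == 0.05
```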

Stats

  • Baseline: 1,858 tests
  • Final: 1,919 tests (+61 new)
  • 2 pre-existing test isolation failures in TestBug7DomainAgnostic (pass individually, fail in suite due to module-level TAXONOMY pollution — not caused by these changes)
  • No new dependencies added
  • All changes extend existing files (1 new file: intent_classifier.py)

Test plan

  • Full test suite: 1,919 passed, 23 skipped
  • P0 semantic similarity: identical texts = 1.0, different texts < 0.8
  • P0 severity downgrade: moderate→minor at 0.90 sim, no change at 0.50
  • P0 suppression events: emitted at domain, threshold, assumption, conflict stages
  • P0 Rosch taxonomy: all existing categories map to a parent, parent_category() roundtrips
  • P1 Bayesian: alpha increments on reinforce, beta on contradict, blend produces valid [0,1]
  • P1 alpha/beta roundtrip: format_lessons→parse_lessons preserves values
  • P1 approval signals: "looks good" detected, "now do the next task" not detected
  • P1 intent: factual markers trigger factual_correction, compliance terms trigger compliance
  • P2 recurring: 3+ same-category lessons detected, < 3 returns empty
  • P2 temporal decay: fresh=1.0, midpoint=0.4-0.8, at-limit<0.15, beyond=0.05
  • P2 cognitive load: FACTUAL→intrinsic, STYLE→extraneous, PROCESS→germane, explicit not overridden
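
The P1.2 approval check above can be exercised with a minimal sketch. The three patterns are the examples named in the PR; the real APPROVAL_PATTERNS in src/gradata/hooks/implicit_feedback.py is broader, and the function name detect_approval is a hypothetical stand-in.

```python
import re

# Hedged sketch of approval-signal detection (P1.2).
APPROVAL_PATTERNS = [
    re.compile(r"\blooks good\b", re.IGNORECASE),
    re.compile(r"\bperfect\b", re.IGNORECASE),
    re.compile(r"\bship it\b", re.IGNORECASE),
]

def detect_approval(message: str) -> bool:
    # True when any approval pattern matches the user message.
    return any(p.search(message) for p in APPROVAL_PATTERNS)

assert detect_approval("looks good, thanks")
assert not detect_approval("now do the next task")
```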

Generated with Gradata

Greptile Summary

This PR implements all 9 features from the MiroFish P0-P2 roadmap across 22 files, adding semantic severity downgrading, rule suppression tracking, Rosch taxonomy restructuring, Bayesian confidence blending, implicit approval signals, intent classification, recurring pattern detection, temporal scope decay, and cognitive load tagging to the Gradata SDK.

The implementation is well-structured overall — TDD approach with +61 tests, no new dependencies, and all new features extend existing files cleanly. Two actionable bugs were found:

  • Scope field loss in apply_rules (P1): When apply_rules rebuilds RuleScope to inject a detected task type, it manually copies only 5 of 10 fields, silently dropping agent_type, namespace, and all 3 new P2.2 temporal fields (temporal_relevance, max_idle_sessions, created_session). A dataclasses.replace() call fixes this.
  • Unguarded emit in log_suppression (P1): The new log_suppression function calls emit bare, unlike log_application which wraps it in try-except. A bus failure at any of the 4 suppression call sites in apply_rules would propagate and abort the entire rule-selection pipeline, violating Rule 4.
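
The first fix works because dataclasses.replace copies every field and overrides only the ones named, so newly added fields survive automatically. The RuleScope below is a reduced stand-in with a subset of the real fields, purely for illustration.

```python
import dataclasses

# Reduced stand-in for the real RuleScope (frozen, like the original).
@dataclasses.dataclass(frozen=True)
class RuleScope:
    domain: str = ""
    task_type: str = ""
    temporal_relevance: str = ""
    max_idle_sessions: int = 0

scope = RuleScope(domain="docs", temporal_relevance="recent", max_idle_sessions=5)
# replace() preserves all other fields while injecting the detected task type.
enriched = dataclasses.replace(scope, task_type="writing")
assert enriched.task_type == "writing"
assert enriched.max_idle_sessions == 5      # preserved, not reset to default
assert enriched.temporal_relevance == "recent"
```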

Style issues (repeated import json aliases in loop body, _COGNITIVE_LOAD_MAP as a local dict) are minor and can be addressed as a follow-up.

Confidence Score: 4/5

Safe to merge after fixing the two P1 bugs — scope field loss and unguarded emit in log_suppression.

Large, well-tested PR (+61 tests, 1919 total passing) with clean architecture across 9 features. Two P1 bugs exist: the scope reconstruction regression in apply_rules silently discards P2.2 temporal fields and is triggered whenever user_message is provided without a pre-set task_type, and the unguarded emit in log_suppression can abort the rule pipeline on bus failure. Both are targeted one-line fixes. No security issues, no new dependencies, prior thread concerns about _p shadowing are confirmed resolved.

src/gradata/rules/rule_engine.py (scope reconstruction at line 549) and src/gradata/rules/rule_tracker.py (bare emit in log_suppression at line 105)

Important Files Changed

Filename Overview
src/gradata/rules/rule_engine.py Adds log_suppression calls at 4 filter stages — but scope reconstruction at line 549 silently drops all new P2.2 temporal fields when rebuilding the scope.
src/gradata/rules/rule_tracker.py New log_suppression function emits RULE_SUPPRESSION events but the bare emit call lacks a try-except guard, unlike log_application — a failing event bus will propagate and abort apply_rules.
src/gradata/detection/intent_classifier.py New dual-layer regex intent classifier; clean structure, good use of `from __future__ import annotations`.
src/gradata/_scope.py Adds temporal_relevance, max_idle_sessions, created_session fields to RuleScope and new temporal_decay() function — correct implementation, scope_from_dict properly coerces int fields.
src/gradata/_tag_taxonomy.py Adds Rosch 6-category hierarchy, cognitive_load closed tag. Loop variable renamed from _p to _rp (fixing prior thread bug). _COGNITIVE_LOAD_MAP should be a module-level constant.
src/gradata/enhancements/self_improvement.py Adds alpha/beta Bayesian fields to parse_lessons/format_lessons and _bayesian_confidence() blend function. Multiple import json aliases inside loop body are unnecessary but harmless.
src/gradata/enhancements/diff_engine.py Adds adjust_severity_by_semantics() and semantic_similarity field to DiffResult — clean implementation, no issues.
src/gradata/integrations/embeddings.py Adds semantic_similarity() to EmbeddingClient — straightforward cosine similarity wrapper with correct empty-string guard.
src/gradata/hooks/implicit_feedback.py Adds APPROVAL_PATTERNS and emits OUTPUT_ACCEPTED events — emit is wrapped in try-except, consistent with existing pattern.
src/gradata/_types.py Adds alpha and beta_param float fields to Lesson with correct defaults (1.0 = Beta(1,1) uninformative prior).

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[apply_rules called with RuleScope + user_message] --> B{scope.task_type empty AND user_message set?}
    B -- Yes --> C[detect_task_type from user_message]
    C --> D[RuleScope rebuilt - drops agent_type, namespace, temporal_relevance, max_idle_sessions, created_session]
    B -- No --> E[Use original scope]
    D --> F[Step 1: Eligibility gate PATTERN + RULE only]
    E --> F
    F --> G[Step 1.5: Domain disable check]
    G -- Suppressed --> H[log_suppression reason: domain_disabled]
    G -- Passes --> I[Step 2/3: Scope scoring]
    H --> I
    I -- relevance lt 0.3 --> J[log_suppression reason: relevance_threshold]
    I -- relevance ge 0.3 --> K[Step 3.5: Assumption check]
    J --> K
    K -- Invalid --> L[log_suppression reason: assumption_invalid]
    K -- Valid --> M[Step 4/5: Sort by state x difficulty x relevance x confidence]
    L --> M
    M --> N{graph conflict check}
    N -- Conflict --> O[log_suppression reason: conflict]
    N -- No conflict --> P[Step 6: Build AppliedRule objects]
    O --> P
    P --> Q[Return AppliedRule list]
```

Comments Outside Diff (2)

  1. src/gradata/rules/rule_engine.py, line 549-556 (link)

    P1 New RuleScope fields are silently dropped when scope is enriched with task type

    This PR adds five new fields to RuleScope: agent_type, namespace, temporal_relevance, max_idle_sessions, created_session. However, when apply_rules reconstructs a frozen scope after detecting a task type, it only copies the original five fields — the new ones are lost.

    Any caller that sets, for example, temporal_relevance="recent" or max_idle_sessions=10 and relies on task-type auto-detection will silently get a scope with those fields zeroed out, making the P2 temporal decay feature inoperative for that code path.


  2. src/gradata/rules/rule_engine.py, line 549-556 (link)

    P1 Scope reconstruction drops all new P2.2 temporal fields

    When apply_rules rebuilds the scope to inject a detected task_type, the manual RuleScope(...) constructor only copies 5 of the 10 fields. The fields added by P2.2 — temporal_relevance, max_idle_sessions, created_session — plus the pre-existing agent_type and namespace are silently reset to their defaults.

    Any caller that sets up temporal decay (e.g. max_idle_sessions=5) and also passes a user_message without a pre-set task_type will have those settings discarded mid-pipeline.

    Use dataclasses.replace instead so all fields are preserved automatically:

    ```suggestion
            scope = dataclasses.replace(scope, task_type=detected_tt)
    ```

    You'll need import dataclasses at the top of the file (or from dataclasses import replace and call replace(scope, task_type=detected_tt)).

---

Path: src/gradata/rules/rule_tracker.py, line 96-105

**`log_suppression` `emit` call unprotected — violates EventBus handler safety rule**

`log_application` (lines 59-67) wraps its `emit` in `try/except` and returns `None` on failure. `log_suppression` calls `emit` bare. If the event bus raises (DB locked, bus unavailable), the exception propagates through `apply_rules` and can abort the entire rule-selection pipeline.

`log_suppression` is called at all four filter stages (domain_disabled, relevance_threshold, assumption_invalid, conflict), so the blast radius is wide. Per Rule 4, all EventBus handler calls must be wrapped in try-except.

```suggestion
    try:
        emit("RULE_SUPPRESSION", source="rule_engine", data=data, tags=tags, session=session, ctx=ctx)
    except Exception:
        logging.getLogger("gradata.rule_tracker").warning(
            "Failed to log suppression for %s: %s", rule_id, reason
        )
```

---

Path: src/gradata/enhancements/self_improvement.py, line 339-346

**`import json` repeated inside `parse_lessons` loop body**

Lines 339, 348, and 355 each re-import `json` under different aliases (`_json_bp`, `_json`, `_json_md`) inside the same `while` loop. Python caches module imports, so this is not a correctness issue, but it is unusual, clutters the function, and can confuse static analysis.

`json` is already used elsewhere in this file and is a stdlib module, so it should be imported once at the top level. The same pattern appears at lines 837, 1069, 1074, and 1081. Consolidate all of these into a single top-level `import json`.

---

Path: src/gradata/enhancements/diff_engine.py, line 123-140

**`adjust_severity_by_semantics` `"as-is"` floor intent is implicit**

The `"as-is"` severity label is not present in `_SEVERITY_DOWNGRADE`, so a result already at `"as-is"` (edit distance < 0.02) will never be downgraded even at similarity 1.0. This is likely intentional (can't go lower than "as-is"), but the intent is implicit since the key is simply absent rather than mapped to a floor sentinel.

Consider adding a clarifying comment:

```suggestion
_SEVERITY_DOWNGRADE: dict[str, str] = {
    "discarded": "major",
    "major": "moderate",
    "moderate": "minor",
    "minor": "as-is",
    # "as-is" intentionally omitted — already the floor, no further downgrade possible
}
```

---

Path: src/gradata/_tag_taxonomy.py, line 391-415

**`_COGNITIVE_LOAD_MAP` recreated on every `enrich_tags` call**

`_COGNITIVE_LOAD_MAP` is a plain constant dict defined inside `enrich_tags`. It is allocated fresh on every invocation. Since `enrich_tags` is called in the hot path for every `CORRECTION` event, this allocates unnecessary objects at runtime and is harder to unit-test in isolation.

Move it to module level alongside `ROSCH_HIERARCHY` and the other constants:

```python
_COGNITIVE_LOAD_MAP: dict[str, str] = {
    "FACTUAL": "intrinsic",
    "CONTENT": "intrinsic",
    ...
}
```

Reviews (2): Last reviewed commit: "fix: Rosch loop variable _p shadowed _pa..."


Oliver Le and others added 15 commits April 10, 2026 12:03
Co-Authored-By: Gradata <noreply@gradata.ai>
…ence

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Adds RULE_SUPPRESSION event emission to rule_tracker.py for tracking
why rules are filtered out during apply_rules pipeline.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Add alpha/beta_param persistence in parse_lessons/format_lessons for
round-trip serialization. Bayesian blending already wired in Task 1.

Co-Authored-By: Gradata <noreply@gradata.ai>
Add APPROVAL_PATTERNS to implicit_feedback hook for detecting user
approval signals (looks good, perfect, ship it, etc). Emit
OUTPUT_ACCEPTED event when approval detected.

Co-Authored-By: Gradata <noreply@gradata.ai>
Regex-based classifier that determines correction intent: factual,
compliance, formatting, clarity, conciseness, completeness, tone_shift,
preference, or unknown. Returns CorrectionIntent with confidence and
evidence.

Co-Authored-By: Gradata <noreply@gradata.ai>
…hanged

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…kward compat

Co-Authored-By: Gradata <noreply@gradata.ai>
…nomy

P0.1: Semantic similarity on EmbeddingClient + severity adjustment
P0.2: RULE_SUPPRESSION events at all apply_rules filter stages
P0.3: Rosch 6-category hierarchy (CONTENT/VOICE/STRUCTURE/PROCESS/CONSTRAINT/QUALITY)

Co-Authored-By: Gradata <noreply@gradata.ai>
P1.1: Beta posterior integration into confidence pipeline (Bayesian+FSRS blend)
P1.2: Positive signal detection + OUTPUT_ACCEPTED events
P1.3: Dual-layer intent classifier (surface edit + why it changed)

Co-Authored-By: Gradata <noreply@gradata.ai>
P2.1: Cross-correction recurring pattern detection
P2.2: Temporal scope with decay curves + idle session limits
P2.3: Cognitive load taxonomy tag (intrinsic/extraneous/germane)

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai

coderabbitai Bot commented Apr 10, 2026

📝 Walkthrough

Adds temporal scoping to RuleScope, extends tag taxonomy with Rosch hierarchy and cognitive-load tags, introduces intent classification for corrections, Bayesian confidence blending for lessons, semantic-similarity-aware diff severity adjustments, recurring-pattern detection, approval-signal emission, and rule suppression logging; plus related tests.

Changes

  • Temporal Scoping (src/gradata/_scope.py): Added temporal_relevance, max_idle_sessions, created_session to RuleScope; new temporal_decay(...); scope_from_dict() coerces integer fields from strings and defaults on conversion errors.
  • Tag Taxonomy & Enrichment (src/gradata/_tag_taxonomy.py): Expanded category values (VOICE, QUALITY); added cognitive_load closed taxonomy; introduced Rosch hierarchy constants and helpers (ROSCH_HIERARCHY, ROSCH_PARENTS, parent_category(), subordinates_of()); enrich_tags() auto-adds cognitive_load for CORRECTION when absent.
  • Lesson Type Extensions (src/gradata/_types.py): Added alpha: float and beta_param: float to the Lesson dataclass (defaults 1.0).
  • Intent Classification (src/gradata/detection/intent_classifier.py, src/gradata/detection/__init__.py): New intent classifier module with INTENTS, the CorrectionIntent dataclass, and classify_intent(correction_text, original_text=""); re-exported symbols in the detection package.
  • Behavioral Pattern Detection (src/gradata/enhancements/behavioral_extractor.py): Added the RecurringPattern dataclass and detect_recurring_patterns(lessons, min_corrections=3) to aggregate lessons by category and compute pattern confidences.
  • Semantic Similarity & Severity (src/gradata/enhancements/diff_engine.py, src/gradata/integrations/embeddings.py): DiffResult gains semantic_similarity; added adjust_severity_by_semantics(result, semantic_similarity, threshold=0.85) to downgrade severity when similarity meets the threshold; EmbeddingClient.semantic_similarity(text_a, text_b) computes cosine similarity.
  • Bayesian Confidence Blending (src/gradata/enhancements/self_improvement.py): parse_lessons() reads per-lesson Beta params (alpha, beta_param); added _bayesian_blend_weight() and _bayesian_confidence() using the beta posterior; update_confidence() updates alpha/beta_param and blends FSRS with Bayesian confidence; format_lessons() serializes Beta params when non-default.
  • Approval Signal Detection (src/gradata/hooks/implicit_feedback.py): Added an approval signal category and patterns; when approvals are detected, emits OUTPUT_ACCEPTED with approval snippets and a message preview in addition to the existing IMPLICIT_FEEDBACK emission.
  • Rule Suppression Tracking (src/gradata/rules/rule_tracker.py, src/gradata/rules/rule_engine.py): Added VALID_SUPPRESSION_REASONS and log_suppression(...) to emit RULE_SUPPRESSION events (reason, relevance rounded to 4 decimals, competing_rules); apply_rules() optionally calls log_suppression for domain-disabled, low-relevance, assumption-failure, and conflict filtering.
  • Tests Added (tests/...): New tests covering approval signals, Bayesian confidence, behavioral extractor, diff severity adjustment, embedding similarity, intent classification, rule suppression logging, scope temporal behavior, tag taxonomy cognitive-load, and Lesson beta fields.
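
The semantic-similarity cohort above reduces to a cosine similarity over embedding vectors. This sketch shows only that core; the real EmbeddingClient.semantic_similarity embeds both texts first and guards empty strings, which is elided here.

```python
import math

# Cosine similarity between two embedding vectors.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: no meaningful similarity
    return dot / (norm_a * norm_b)

assert abs(cosine_similarity([1.0, 0.0], [1.0, 0.0]) - 1.0) < 1e-9  # identical
assert cosine_similarity([1.0, 0.0], [0.0, 1.0]) == 0.0             # orthogonal
```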

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

breaking-change

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 34.21%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title clearly and specifically describes the main objective: implementing nine features from the MiroFish P0-P2 roadmap, a substantial feature set across multiple priority levels.
  • Description check ✅ Passed: The PR description comprehensively explains all 9 features from the MiroFish P0-P2 roadmap with clear categorization, test coverage metrics, and specific implementation details.


@coderabbitai coderabbitai Bot added the feature label Apr 10, 2026
Comment thread src/gradata/rules/rule_tracker.py (outdated)
Comment on lines +78 to +105

```python
def log_suppression(
    rule_id: str,
    reason: str,
    relevance: float,
    competing_rule_ids: list[str] | None = None,
    ctx=None,
    session: int | None = None,
) -> None:
    """Emit a RULE_SUPPRESSION event when a rule is filtered out.

    Args:
        rule_id: The rule that was suppressed.
        reason: One of VALID_SUPPRESSION_REASONS.
        relevance: The relevance score at time of suppression.
        competing_rule_ids: Other rules that caused suppression (for conflicts).
        ctx: Optional BrainContext.
        session: Optional session number.
    """
    from gradata._events import emit

    data = {
        "rule_id": rule_id,
        "reason": reason,
        "relevance": round(relevance, 4),
        "competing_rules": competing_rule_ids or [],
    }
    tags = [f"rule:{rule_id}", f"suppression:{reason}"]
    emit("RULE_SUPPRESSION", source="rule_engine", data=data, tags=tags, session=session, ctx=ctx)
```

P1 log_suppression has no try-except around emit, unlike log_application

log_application (line 59) wraps its emit call in a try-except and returns None on failure. log_suppression emits bare — if emit raises (e.g. the event bus is unavailable or the DB is locked), the exception propagates up through apply_rules, potentially breaking the entire rule-selection pipeline.

log_suppression is called in four places inside apply_rules (domain_disabled, relevance_threshold, assumption_invalid, conflict stages), so the blast radius is wide.

Suggested change

```suggestion
def log_suppression(
    rule_id: str,
    reason: str,
    relevance: float,
    competing_rule_ids: list[str] | None = None,
    ctx=None,
    session: int | None = None,
) -> None:
    try:
        from gradata._events import emit

        data = {
            "rule_id": rule_id,
            "reason": reason,
            "relevance": round(relevance, 4),
            "competing_rules": competing_rule_ids or [],
        }
        tags = [f"rule:{rule_id}", f"suppression:{reason}"]
        emit("RULE_SUPPRESSION", source="rule_engine", data=data, tags=tags, session=session, ctx=ctx)
    except Exception:
        logging.getLogger("gradata.rule_tracker").warning(
            "Failed to log suppression for %s (reason=%s)", rule_id, reason
        )
```



Comment thread src/gradata/_scope.py
```python
    if max_idle <= 0:
        return 1.0
    if sessions_since_fire <= 0:
        return 1.0
```

P2 import math inside function body

math is a stdlib module and can be imported at the top of the file without any cost. Placing the import inside the function body is unusual for a non-optional/lazy dependency and can confuse static analysis tools.

Move this to the top-level imports alongside the other stdlib imports.


Comment thread src/gradata/rules/rule_tracker.py

```python
    reason: str,
    relevance: float,
    competing_rule_ids: list[str] | None = None,
    ctx=None,
```

P2 ctx parameter missing | None type annotation

The project rule requires | None instead of a bare = None default for optional parameters. The session parameter on the same function correctly uses int | None = None, but ctx is untyped.

Suggested change

```suggestion
    ctx: object | None = None,
```


Comment on lines +149 to +153

```python
    if _HEDGE_WORDS.search(correction_text):
        return CorrectionIntent(
            intent="clarity",
            confidence=0.60,
            evidence=_HEDGE_WORDS.search(correction_text).group(),  # type: ignore[union-attr]
```

P2 _HEDGE_WORDS.search() called twice for the same input

The condition on line 149 already confirms the pattern matches; the inner call on line 153 repeats the search just to retrieve the match object. Cache the result instead.

Suggested change
-if _HEDGE_WORDS.search(correction_text):
-    return CorrectionIntent(
-        intent="clarity",
-        confidence=0.60,
-        evidence=_HEDGE_WORDS.search(correction_text).group(),  # type: ignore[union-attr]
+hedge_match = _HEDGE_WORDS.search(correction_text)
+if hedge_match:
+    return CorrectionIntent(
+        intent="clarity",
+        confidence=0.60,
+        evidence=hedge_match.group(),
+    )
Path: src/gradata/detection/intent_classifier.py, lines 149-153


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 30

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/gradata/_scope.py (1)

27-52: 🧹 Nitpick | 🔵 Trivial

Consider documenting new RuleScope fields.

The class docstring documents the original five fields but omits the newer additions: agent_type, namespace, temporal_relevance, max_idle_sessions, and created_session. Adding brief descriptions would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_scope.py` around lines 27 - 52, The RuleScope dataclass
docstring only documents domain, task_type, audience, channel, and stakes;
update the class docstring for RuleScope to include brief descriptions for the
new fields agent_type, namespace, temporal_relevance, max_idle_sessions, and
created_session so their intent and acceptable values are clear (e.g.,
agent_type: agent role like "researcher"; namespace: scope tag like
"api-endpoint"; temporal_relevance: "evergreen"/"seasonal"/"recent"/"";
max_idle_sessions: int auto-suppression count; created_session: session index
when scope assigned). Ensure the wording is concise and follows the existing
Args style used in the docstring.
src/gradata/enhancements/diff_engine.py (1)

60-68: ⚠️ Potential issue | 🟡 Minor

Docstring threshold is stale after threshold change.

The docstring documents "discarded" as distance >= 0.60, but Line 89 changed the threshold to 0.80. Update the docstring to match the new threshold.

📝 Proposed fix
             * ``"as-is"``    — distance < 0.02
             * ``"minor"``    — distance < 0.10
             * ``"moderate"`` — distance < 0.30
-            * ``"major"``    — distance < 0.60
-            * ``"discarded"``— distance >= 0.60
+            * ``"major"``    — distance < 0.80
+            * ``"discarded"``— distance >= 0.80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/diff_engine.py` around lines 60 - 68, Update the
docstring in diff_engine.py to reflect the new threshold for the "discarded"
severity: change the prose that currently says `"discarded" — distance >= 0.60`
to match the implementation's threshold of 0.80 (i.e., `"discarded" — distance
>= 0.80`) so the documentation in the module (around the severity description
for compression_distance/edit_distance) matches the logic in the code (see the
severity mapping described and the implementation that uses 0.80).
src/gradata/integrations/embeddings.py (1)

146-169: ⚠️ Potential issue | 🟠 Major

Thread safety: _embedding_cache is modified without synchronization.

The cache is accessed from async handlers (async_handler=True) which may run concurrently. OrderedDict is not thread-safe for concurrent modifications. This could cause race conditions during cache eviction or insertion.

As per coding guidelines, integrations require "thread safety on shared state."

🔒 Proposed fix using threading.Lock
+import threading
+
 # Embedding cache: LRU-style OrderedDict, capped at _CACHE_MAX_SIZE.
 _embedding_cache: OrderedDict[str, list[float]] = OrderedDict()
+_cache_lock = threading.Lock()

Then wrap cache access in _embed_and_cache:

             if desc and desc not in _embedding_cache:
                 vec = get_client().embed(desc)
                 if vec:
-                    _embedding_cache[desc] = vec
-                    # Evict oldest entries when cache exceeds max size
-                    while len(_embedding_cache) > _CACHE_MAX_SIZE:
-                        _embedding_cache.popitem(last=False)
+                    with _cache_lock:
+                        _embedding_cache[desc] = vec
+                        # Evict oldest entries when cache exceeds max size
+                        while len(_embedding_cache) > _CACHE_MAX_SIZE:
+                            _embedding_cache.popitem(last=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/integrations/embeddings.py` around lines 146 - 169, The
module-level OrderedDict _embedding_cache is mutated without synchronization
inside _embed_and_cache (registered by subscribe_to_bus) leading to race
conditions; introduce a module-level threading.Lock (e.g.,
_embedding_cache_lock) and wrap all reads/writes/evictions of _embedding_cache
(contains checks, assignment _embedding_cache[desc] = vec, and the while loop
that pops items when > _CACHE_MAX_SIZE) in the lock to ensure atomicity; ensure
the lock is acquired before checking desc in _embedding_cache and held until
after eviction completes, then released.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_scope.py`:
- Line 73: Move the local "import math" out of the function and place it with
the module-level imports at the top of src/gradata/_scope.py; locate the inline
"import math" occurrence (the one shown in the diff) and remove it from inside
the function, then add a single module-level import line "import math" so
functions in this module can reference math without repeated lookup overhead.
- Around line 55-77: temporal_decay can return >1.0 if a caller passes floor >
1.0; clamp the final returned multiplier to the promised upper bound of 1.0 by
ensuring the return value is min(1.0, max(floor, round(decay, 4))). Update the
return expression in temporal_decay (which uses sessions_since_fire, max_idle,
floor, steepness and computes decay) so the result is always within [floor, 1.0]
(effectively [min(floor,1.0), 1.0] when floor > 1.0).

In `@src/gradata/_tag_taxonomy.py`:
- Around line 100-101: The loop `for _p in ROSCH_PARENTS:` is shadowing the
imported module alias `_p` (used elsewhere as `_p.BRAIN_DIR`), causing
AttributeError; rename the loop variable (e.g., `for parent in ROSCH_PARENTS:`
or `for rp in ROSCH_PARENTS:`) and update the body to `_SUB_TO_PARENT[parent] =
parent` (or `_SUB_TO_PARENT[rp] = rp`) so the imported `_p` remains available
for later uses like `_p.BRAIN_DIR`.
- Around line 391-415: The cognitive-load enrichment currently only reads
category from data ("category") and misses categories supplied as tags; update
the logic around event_type/prefixes where data, enriched, and
_COGNITIVE_LOAD_MAP are used so that if data.get("category") is empty it falls
back to extracting a "category:*" value from the tags list (or prefixes/tags
provided), normalize it to upper() before looking up in _COGNITIVE_LOAD_MAP,
compute cl = _COGNITIVE_LOAD_MAP.get(cat) and append the same enriched tag
("cognitive_load:{cl}") when found; ensure you reference the existing variables
event_type, prefixes, data, enriched, and _COGNITIVE_LOAD_MAP so the change
integrates with the current flow.
- Around line 104-111: Add "from __future__ import annotations" at the top of
the module so PEP 604 union types used in parent_category and subordinates_of
(e.g., "str | None" and "set[str]") are evaluated as forward references; update
the module header to include this import before any other imports or top-level
code to satisfy SDK type-safety requirements.

In `@src/gradata/detection/intent_classifier.py`:
- Line 52: _UPDATE: The _LIST_RE regex currently requires a '.' or ')' after the
list marker so plain markdown bullets like "- item" and "* item" aren't matched;
update the _LIST_RE pattern (symbol: _LIST_RE) to accept bare bullet markers (-,
*, +) or numbered markers (digits followed by '.' or ')') and allow one or more
whitespace characters after the marker (e.g., change to a pattern that uses an
alternation like (?:[-*+]|\d+[.)])\s+ with re.MULTILINE) so plain bullets are
classified as formatting.
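A minimal sketch of the broadened pattern this item describes, assuming the `_LIST_RE` name from the comment (the real module's surrounding logic may differ):

```python
import re

# Accept bare bullets (-, *, +) as well as numbered markers (1. or 1)),
# followed by at least one whitespace character, at line starts.
_LIST_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+", re.MULTILINE)


def looks_like_list(text: str) -> bool:
    """Heuristic: two or more matched markers suggests a formatted list."""
    return len(_LIST_RE.findall(text)) >= 2
```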

In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 662-670: In detect_recurring_patterns, remove the unnecessary
string quotes used for forward references (e.g., change annotations like
list["Lesson"] and dict[str, list["Lesson"]] to list[Lesson] and dict[str,
list[Lesson]]); since from __future__ import annotations is already enabled,
update the type hints for the lessons parameter, the by_cat variable, and the
return type RecurringPattern accordingly so Ruff UP037 no longer flags them
(referencing detect_recurring_patterns, Lesson, RecurringPattern, and by_cat to
locate the changes).

In `@src/gradata/enhancements/diff_engine.py`:
- Around line 123-140: Clamp the incoming semantic_similarity to the [0.0, 1.0]
range before using it in logic and before placing it in the returned DiffResult:
in adjust_severity_by_semantics, compute a clamped_similarity = max(0.0,
min(1.0, semantic_similarity)), use clamped_similarity for the comparison
against _SEMANTIC_DOWNGRADE_THRESHOLD and for assigning semantic_similarity in
the returned DiffResult, leaving other fields unchanged; reference
adjust_severity_by_semantics, DiffResult, _SEMANTIC_DOWNGRADE_THRESHOLD, and
_SEVERITY_DOWNGRADE when making the change.

In `@src/gradata/enhancements/self_improvement.py`:
- Around line 463-474: The code truncates stored float Beta parameters by using
int(...) on lesson.alpha and lesson.beta_param; instead preserve the fractional
values when computing total_obs, successes, and trials used by beta_posterior.
Change total_obs to use lesson.alpha + lesson.beta_param - 2 (no int), pass
successes=max(0, lesson.alpha - 1) and trials=max(0, total_obs) (no int casts),
keep the early return when _bayesian_blend_weight(total_obs) == 0.0, compute
bayesian_conf from post["posterior_mean"], blend with lesson.confidence and
round the final value as before. Ensure references: total_obs, lesson.alpha,
lesson.beta_param, _bayesian_blend_weight, beta_posterior, posterior_mean,
lesson.confidence.
- Around line 338-346: The code assumes json.loads(meta_line[len("Beta
params:"):].strip()) returns a mapping and calls bp.get(...), which fails for
valid non-object JSON (list/string/number); modify the parse logic (the block
handling meta_line.startswith("Beta params:")) to verify bp is a dict (e.g.,
isinstance(bp, dict)) before using bp.get to set alpha and beta_param_val, and
otherwise fall back to the existing default values (alpha=1.0,
beta_param_val=1.0) so malformed-but-valid JSON won't raise an exception.
- Line 459: The type annotation on function _bayesian_confidence uses a quoted
forward reference ("Lesson") unnecessarily; remove the quotes so the signature
reads def _bayesian_confidence(lesson: Lesson) -> float: (keep using the already
enabled from __future__ import annotations) to satisfy Ruff UP037 and avoid
redundant stringified annotations.
- Around line 446-456: The ramp currently yields 0.0 at total_observations ==
_BAYESIAN_BLEND_MIN_OBS because the interpolation uses (total_observations -
_BAYESIAN_BLEND_MIN_OBS); change the interpolation in _bayesian_blend_weight to
start the Bayesian component at that observation by using (total_observations -
_BAYESIAN_BLEND_MIN_OBS + 1) in the numerator (keep the same denominator and
min/max guards with _BAYESIAN_BLEND_MIN_OBS and _BAYESIAN_BLEND_MAX_OBS and the
0.7 cap) so that total_observations == _BAYESIAN_BLEND_MIN_OBS yields a small >0
weight and ramps up to 0.7 by _BAYESIAN_BLEND_MAX_OBS.

In `@src/gradata/hooks/implicit_feedback.py`:
- Around line 48-56: APPROVAL_PATTERNS currently matches phrases like "perfect"
and "_detect_signals()" records approvals even when negated (e.g., "not
perfect"), causing "main()" to emit OUTPUT_ACCEPTED incorrectly; update
detection so approvals are rejected when preceded by negation. Specifically, in
_detect_signals() (the function that iterates APPROVAL_PATTERNS), after finding
a match check the immediate context (e.g., up to 3 tokens before the match) for
negation tokens like "not", "no", "never", "n't", "hardly", "barely" or negating
phrases, and only count the approval if no negation is present; alternatively,
update APPROVAL_PATTERNS to include negative-lookbehind/anchored patterns to
avoid matching when preceded by common negators. Ensure the same fix is applied
to the other approval-matching block referenced (lines ~115-125) so main() only
emits OUTPUT_ACCEPTED for truly positive, non-negated praise.
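One way to implement the negation guard, sketched with hypothetical pattern and helper names (the real `APPROVAL_PATTERNS` and `_detect_signals()` are more involved):

```python
import re

# Illustrative subset of positive-signal patterns from the PR description.
APPROVAL_PATTERNS = [r"\blooks good\b", r"\bperfect\b", r"\bship it\b"]
_NEGATORS = {"not", "no", "never", "hardly", "barely"}


def is_genuine_approval(text: str) -> bool:
    """Match an approval pattern, rejecting it when one of the three
    tokens immediately before the match is a negator (or ends in n't)."""
    lowered = text.lower()
    for pattern in APPROVAL_PATTERNS:
        m = re.search(pattern, lowered)
        if not m:
            continue
        preceding = lowered[: m.start()].split()[-3:]
        if any(tok.strip(".,!?'") in _NEGATORS or tok.endswith("n't")
               for tok in preceding):
            continue  # negated praise, e.g. "not perfect"
        return True
    return False
```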

In `@src/gradata/integrations/embeddings.py`:
- Around line 80-88: The semantic_similarity method can return negative values
from cosine_similarity despite the docstring promise of [0.0, 1.0]; update
semantic_similarity (which calls self.embed and cosine_similarity) to clamp the
computed similarity into the [0.0, 1.0] range before returning (e.g., ensure
result = min(1.0, max(0.0, similarity))). Keep the early-return for empty inputs
unchanged.
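Since cosine similarity ranges over [-1, 1], the clamp can be sketched like this standalone illustration (not the project's `semantic_similarity`, which also handles embedding and early returns):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity; returns 0.0 for a zero-norm vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def clamped_similarity(a: list[float], b: list[float]) -> float:
    # Cosine lies in [-1, 1]; clamp to the docstring's promised [0.0, 1.0].
    return min(1.0, max(0.0, cosine_similarity(a, b)))
```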

In `@src/gradata/rules/rule_engine.py`:
- Line 579: The four identical inline imports of log_suppression inside
apply_rules() should be consolidated: move "from gradata.rules.rule_tracker
import log_suppression" to a single import at the top of the apply_rules()
function (or module-level if no circular import issues) and remove the
duplicated imports; then update the conditional checks to use the
already-imported symbol (e.g., "if _ctx and log_suppression:") so the logic is
unchanged but the import is not repeated.
- Around line 579-586: The log_suppression call for domain-disabled is missing
the _ctx guard and currently always emits an event; wrap the block that calls
log_suppression(rule_id=_make_rule_id(lesson), reason="domain_disabled",
relevance=0.0, ctx=_ctx) in an if _ctx: check (matching the other suppression
calls) so suppression logging only runs when _ctx is truthy.
- Line 512: The parameter `_ctx` in rule_engine.py is missing a type annotation;
update the function signature(s) that declare `_ctx` to add a precise type
(e.g., use typing.Optional[typing.Mapping[str, typing.Any]] or a specific
context class if one exists in the codebase) and add any necessary imports from
typing (Optional, Mapping, Any) or the concrete context type; ensure all usages
of `_ctx` in functions/methods that accept it reflect the new annotation so the
SDK layer maintains type safety.

In `@src/gradata/rules/rule_tracker.py`:
- Around line 78-105: The function log_suppression currently accepts any reason
string and emits a suppression tag, so validate the reason at the API boundary
by checking that the provided reason is a member of VALID_SUPPRESSION_REASONS
(import or reference VALID_SUPPRESSION_REASONS at top of the function), and if
not raise a ValueError (or TypeError) with a clear message; keep the rest of the
behavior (building data, tags, and calling emit("RULE_SUPPRESSION", ...))
unchanged so only known suppression reasons produce suppression:* tags and
analytics remain consistent.
- Around line 96-105: The log_suppression path currently calls
emit("RULE_SUPPRESSION", ...) directly which allows emit failures to propagate
and abort rule evaluation; update the log_suppression function to wrap the emit
call in a try/except block (catch Exception) so emission is best-effort: on
exception, log the error (using the existing logging facility or process logger)
and return without re-raising. Keep the same data, tags, session and ctx
arguments when calling emit and ensure the suppression telemetry remains
non-blocking by swallowing errors from emit in log_suppression.

In `@tests/test_approval_signals.py`:
- Around line 42-46: Add a regression test in test_negation_not_approval that
calls _detect_signals with negated-praise examples (e.g., "Not perfect.", "Not
exactly.", "That's not great.") and assert that the resulting signal types (from
the list comprehension using s["type"]) include "negation" but do NOT include
"approval" to ensure APPROVAL_PATTERNS does not trigger OUTPUT_ACCEPTED for
negated praise; place these additional assertions alongside the existing checks
in the test_negation_not_approval function.

In `@tests/test_bayesian_confidence.py`:
- Around line 18-21: The test uses loose inequalities for lesson.alpha and
lesson.beta_param after calling update_confidence([lesson], corrections); change
these to assert the exact expected increments (alpha and beta_param should
increase by exactly +1.0 per branch) using precise comparisons (e.g., assert
lesson.alpha == pytest.approx(<expected_value>) and assert lesson.beta_param ==
pytest.approx(<expected_value>)) so the test fails if update_confidence
double-counts; apply the same tightening for the other assertions around lines
33-36 that check alpha/beta_param to ensure they assert exact expected values
rather than just > checks.

In `@tests/test_behavioral_extractor.py`:
- Around line 24-28: The tests currently use loose assertions on the result of
detect_recurring_patterns(lessons); tighten them to assert deterministic
expected values: assert len(patterns) == 1, assert patterns[0].category ==
"TONE" (keep), and assert patterns[0].correction_count == 3 (replace >= 3), and
apply the same tightening to the assertions around lines 57-58 that check
pattern count and counts so they assert exact expected values rather than
inequalities or bare truthiness; locate these checks by the
detect_recurring_patterns call and the patterns[index] references in the test
file and update them accordingly.

In `@tests/test_diff_engine.py`:
- Around line 8-52: Add two unit tests to TestSemanticSeverityAdjustment to
cover the exact boundary and the minor→as-is path: one test (e.g.,
test_at_threshold_boundary_downgrades) that creates a DiffResult with
severity="moderate" and calls adjust_severity_by_semantics(result,
semantic_similarity=0.85) expecting severity "minor", and another test (e.g.,
test_minor_downgrades_to_as_is) that creates a DiffResult with severity="minor"
and low edit/compression distances and calls
adjust_severity_by_semantics(result, semantic_similarity=0.90) expecting
severity "as-is"; put them alongside the existing tests so
adjust_severity_by_semantics and DiffResult are exercised for the boundary and
minor→as-is downgrade paths.
- Around line 21-30: The test test_low_similarity_no_change doesn't assert that
semantic_similarity was recorded on the returned DiffResult; update the test to
call adjust_severity_by_semantics(result, semantic_similarity=0.50) and assert
that adjusted.semantic_similarity == 0.50 (in addition to the existing severity
assertion) so the DiffResult produced by adjust_severity_by_semantics preserves
the semantic_similarity field; reference the DiffResult instance and the
adjust_severity_by_semantics function in your change.

In `@tests/test_embeddings.py`:
- Around line 22-26: Refactor the test_empty_text_returns_zero to use
pytest.mark.parametrize: replace the three separate assertions with a single
parametrized test that iterates over pairs of inputs (e.g., ("", "hello"),
("hello", ""), ("", "")) and asserts EmbeddingClient().semantic_similarity(a, b)
== 0.0; keep the test name (test_empty_text_returns_zero) and the call to
EmbeddingClient.semantic_similarity to locate the code easily.

In `@tests/test_intent_classifier.py`:
- Around line 48-50: The test test_formatting_list currently only covers
numbered lists; extend it to also assert classify_intent correctly returns
intent "formatting" for bullet markdown variants by adding inputs like "- First
item\n- Second item" and "* First item\n* Second item" (or parametrize the test)
and verify result.intent == "formatting" for each; update the
test_formatting_list function to call classify_intent with these additional
strings (or iterate over a list of examples) so bullet-style markdown is
covered.

In `@tests/test_rule_tracker_suppression.py`:
- Around line 60-65: Extend test_valid_suppression_reasons_complete to include a
negative-path check: call log_suppression(...) with reason="typo" (using the
same parameters/context as other tests) and assert the call fails for unknown
reasons (e.g., raises ValueError or returns a falsy result depending on the
API); this complements the whitelist check of VALID_SUPPRESSION_REASONS and
prevents silent acceptance of unknown reasons.

In `@tests/test_scope.py`:
- Around line 21-37: Replace exact float equality checks in the tests that call
temporal_decay with pytest.approx to avoid brittle comparisons: update
test_decay_fresh_rule (assert temporal_decay(...) == 1.0),
test_decay_beyond_limit (assert temporal_decay(...) == 0.05), and
test_decay_zero_max_idle (assert temporal_decay(...) == 1.0) to use
pytest.approx(1.0) or pytest.approx(0.05) respectively; ensure pytest is
imported in tests/test_scope.py if not already.

In `@tests/test_tag_taxonomy.py`:
- Around line 13-20: The tests currently use truthy checks; update
test_validate_valid to assert explicit values: call
validate_tag("cognitive_load:intrinsic"),
validate_tag("cognitive_load:extraneous"), and
validate_tag("cognitive_load:germane") and for each assert that valid is True
(use "valid is True") and assert the exact expected msg returned by validate_tag
(e.g., "msg is None" or the specific success message your validate_tag
implementation returns) so failures are diagnostic; refer to the validate_tag
function to determine the correct expected msg and replace the loose "assert
valid" lines accordingly.
- Around line 26-37: The three nearly identical tests that call enrich_tags with
event_type="CORRECTION" and different data["category"] values should be combined
into a single parametrized test: create a pytest.mark.parametrize on (category,
expected_tag) covering ("FACTUAL","cognitive_load:intrinsic"),
("STYLE","cognitive_load:extraneous"), ("PROCESS","cognitive_load:germane") and
replace the three functions (test_enrich_factual_is_intrinsic,
test_enrich_style_is_extraneous, test_enrich_process_is_germane) with one
function (e.g., test_enrich_cognitive_load_mapping) that calls enrich_tags([],
event_type="CORRECTION", data={"category": category}) and asserts expected_tag
in tags; this keeps coverage while reducing duplication and follows the tests/**
guideline.
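The consolidation could take roughly this shape. `enrich_tags` and `_COGNITIVE_LOAD_MAP` below are minimal stand-ins so the example is self-contained, not the real implementations:

```python
import pytest

_COGNITIVE_LOAD_MAP = {
    "FACTUAL": "intrinsic",
    "STYLE": "extraneous",
    "PROCESS": "germane",
}


def enrich_tags(tags: list[str], event_type: str, data: dict) -> list[str]:
    """Stand-in: append a cognitive_load tag for CORRECTION events."""
    out = list(tags)
    cl = _COGNITIVE_LOAD_MAP.get(str(data.get("category", "")).upper())
    if event_type == "CORRECTION" and cl:
        out.append(f"cognitive_load:{cl}")
    return out


@pytest.mark.parametrize(
    ("category", "expected_tag"),
    [
        ("FACTUAL", "cognitive_load:intrinsic"),
        ("STYLE", "cognitive_load:extraneous"),
        ("PROCESS", "cognitive_load:germane"),
    ],
)
def test_enrich_cognitive_load_mapping(category, expected_tag):
    tags = enrich_tags([], event_type="CORRECTION", data={"category": category})
    assert expected_tag in tags
```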


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7f314496-ca4c-4111-8b3e-434739c13121

📥 Commits

Reviewing files that changed from the base of the PR and between bfcdb35 and 6c2a50a.

📒 Files selected for processing (22)
  • src/gradata/_scope.py
  • src/gradata/_tag_taxonomy.py
  • src/gradata/_types.py
  • src/gradata/detection/__init__.py
  • src/gradata/detection/intent_classifier.py
  • src/gradata/enhancements/behavioral_extractor.py
  • src/gradata/enhancements/diff_engine.py
  • src/gradata/enhancements/self_improvement.py
  • src/gradata/hooks/implicit_feedback.py
  • src/gradata/integrations/embeddings.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/rules/rule_tracker.py
  • tests/test_approval_signals.py
  • tests/test_bayesian_confidence.py
  • tests/test_behavioral_extractor.py
  • tests/test_diff_engine.py
  • tests/test_embeddings.py
  • tests/test_intent_classifier.py
  • tests/test_rule_tracker_suppression.py
  • tests/test_scope.py
  • tests/test_tag_taxonomy.py
  • tests/test_types.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/integrations/embeddings.py
  • src/gradata/_types.py
  • src/gradata/detection/__init__.py
  • src/gradata/hooks/implicit_feedback.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/rules/rule_tracker.py
  • src/gradata/enhancements/behavioral_extractor.py
  • src/gradata/enhancements/diff_engine.py
  • src/gradata/_scope.py
  • src/gradata/detection/intent_classifier.py
  • src/gradata/_tag_taxonomy.py
  • src/gradata/enhancements/self_improvement.py
src/gradata/integrations/**

⚙️ CodeRabbit configuration file

src/gradata/integrations/**: Framework integrations and nervous system modules. Check for: SSRF protection on any URL from env vars,
thread safety on shared state, embedding cache must be bounded, event bus handlers must never raise.

Files:

  • src/gradata/integrations/embeddings.py
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_scope.py
  • tests/test_embeddings.py
  • tests/test_rule_tracker_suppression.py
  • tests/test_diff_engine.py
  • tests/test_behavioral_extractor.py
  • tests/test_approval_signals.py
  • tests/test_bayesian_confidence.py
  • tests/test_intent_classifier.py
  • tests/test_tag_taxonomy.py
  • tests/test_types.py
src/gradata/hooks/**

⚙️ CodeRabbit configuration file

src/gradata/hooks/**: JavaScript hooks for Claude Code integration. Check for: no shell injection (no execSync with user
input), temp files must use per-user subdirectory, HTTP calls must have timeouts, errors must be silent (never block
the tool chain).

Files:

  • src/gradata/hooks/implicit_feedback.py
🪛 GitHub Actions: SDK CI
src/gradata/enhancements/behavioral_extractor.py

[error] 663-663: ruff check failed (UP037): Remove quotes from type annotation lessons: list["Lesson"] (fixable with --fix).


[error] 670-670: ruff check failed (UP037): Remove quotes from type annotation by_cat: dict[str, list["Lesson"]] (fixable with --fix).

src/gradata/_tag_taxonomy.py

[error] 100-101: ruff check failed (F402): Import _p from line 15 shadowed by loop variable at src/gradata/_tag_taxonomy.py:100.

src/gradata/enhancements/self_improvement.py

[error] 459-459: ruff check failed (UP037): Remove quotes from type annotation _bayesian_confidence(lesson: "Lesson") (fixable with --fix).

🔇 Additional comments (7)
src/gradata/rules/rule_engine.py (1)

122-146: LGTM — Formatting changes.

The whitespace and multi-line formatting changes across signal lists, string assembly, RuleScope construction, and sorting keys improve readability without altering logic.

Also applies to: 313-316, 357-361, 377-383, 421-430, 440-452, 665-676, 708-718, 832-844

src/gradata/_scope.py (2)

325-337: LGTM — Robust int coercion in scope_from_dict.

The defensive handling of stringified int fields with try/except and fallback to default value 0 ensures backward compatibility with JSON/SQLite serialization.


85-114: LGTM — Formatting changes in classification constants.

The whitespace adjustments in keyword lists are consistent and do not alter behavior.

src/gradata/integrations/embeddings.py (1)

171-181: LGTM!

The reformatted bus.on() registrations improve readability with no semantic change.

tests/test_embeddings.py (1)

8-20: LGTM!

Good coverage of key scenarios: identical texts returning ~1.0 with appropriate tolerance, and semantically different texts producing lower similarity. Proper use of pytest.approx for floating-point comparison.

src/gradata/enhancements/diff_engine.py (2)

78-78: LGTM!

The optional semantic_similarity field with None default maintains backward compatibility with existing DiffResult consumers.


306-306: LGTM!

Cosmetic reformatting of SequenceMatcher call with no functional change.

Comment thread src/gradata/_scope.py
Comment on lines +55 to +77
def temporal_decay(
sessions_since_fire: int,
max_idle: int,
floor: float = 0.05,
steepness: float = 3.0,
) -> float:
"""Compute temporal decay multiplier for rule confidence.

Uses exponential decay: exp(-steepness * ratio^2) where
ratio = sessions_since_fire / max_idle.

Returns decay multiplier in [floor, 1.0]. If max_idle=0, returns 1.0 (evergreen).
"""
if max_idle <= 0:
return 1.0
if sessions_since_fire <= 0:
return 1.0

import math

ratio = sessions_since_fire / max_idle
decay = math.exp(-steepness * ratio * ratio)
return max(floor, round(decay, 4))

⚠️ Potential issue | 🟡 Minor

Add upper bound clamp to temporal_decay return value.

The docstring promises a return value in [floor, 1.0], but if floor > 1.0 is passed (invalid but possible), the function returns > 1.0. This could cause downstream issues if the multiplier is used with confidence values that must stay in [0.0, 1.0].

Proposed defensive fix
     ratio = sessions_since_fire / max_idle
     decay = math.exp(-steepness * ratio * ratio)
-    return max(floor, round(decay, 4))
+    return min(1.0, max(floor, round(decay, 4)))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_scope.py` around lines 55 - 77, temporal_decay can return >1.0
if a caller passes floor > 1.0; clamp the final returned multiplier to the
promised upper bound of 1.0 by ensuring the return value is min(1.0, max(floor,
round(decay, 4))). Update the return expression in temporal_decay (which uses
sessions_since_fire, max_idle, floor, steepness and computes decay) so the
result is always within [floor, 1.0] (effectively [min(floor,1.0), 1.0] when
floor > 1.0).
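A minimal runnable sketch of the clamped version, adapted from the diff above (parameter defaults taken from the docstring; not the committed code):

```python
import math

def temporal_decay(
    sessions_since_fire: int,
    max_idle: int,
    floor: float = 0.05,
    steepness: float = 3.0,
) -> float:
    """Exponential decay multiplier, clamped to [floor, 1.0] on both ends."""
    if max_idle <= 0 or sessions_since_fire <= 0:
        return 1.0  # evergreen rules and freshly fired rules do not decay
    ratio = sessions_since_fire / max_idle
    decay = math.exp(-steepness * ratio * ratio)
    # min(1.0, ...) is the proposed fix: even an invalid floor > 1.0
    # can no longer push the multiplier above 1.0.
    return min(1.0, max(floor, round(decay, 4)))
```

With the clamp in place, temporal_decay(30, 20, floor=1.5) returns 1.0 rather than leaking 1.5 into the confidence pipeline.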

Comment thread src/gradata/_scope.py
if sessions_since_fire <= 0:
return 1.0

import math
🧹 Nitpick | 🔵 Trivial

Consider moving import math to module level.

The math import inside the function incurs a small lookup overhead on each call. Since math is a standard library module with no circular dependency concerns, moving it to the module-level imports improves clarity and efficiency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_scope.py` at line 73, Move the local "import math" out of the
function and place it with the module-level imports at the top of
src/gradata/_scope.py; locate the inline "import math" occurrence (the one shown
in the diff) and remove it from inside the function, then add a single
module-level import line "import math" so functions in this module can reference
math without repeated lookup overhead.

Comment thread src/gradata/_tag_taxonomy.py Outdated
Comment on lines +104 to +111
def parent_category(category: str) -> str | None:
"""Return the Rosch superordinate parent for a category, or None."""
return _SUB_TO_PARENT.get(category.upper())


def subordinates_of(parent: str) -> set[str]:
"""Return the set of subordinate categories under a Rosch parent."""
return set(ROSCH_HIERARCHY.get(parent.upper(), set()))
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify requirement presence in this changed SDK module
rg -n '^from __future__ import annotations$' src/gradata/_tag_taxonomy.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Check the file exists and view lines around 104-111
wc -l src/gradata/_tag_taxonomy.py
echo "---"
sed -n '1,20p' src/gradata/_tag_taxonomy.py
echo "---"
sed -n '100,115p' src/gradata/_tag_taxonomy.py

Repository: Gradata/gradata

Length of output: 1372


🏁 Script executed:

#!/bin/bash
# Check for print() statements in the file
echo "=== print() statements ==="
rg -n 'print\(' src/gradata/_tag_taxonomy.py

echo "=== Hardcoded paths or path issues ==="
rg -n "(\.json|\.py|/|\\\\)" src/gradata/_tag_taxonomy.py | head -20

echo "=== Count of type annotations in file ==="
rg -c '\w+\s*\|\s*\w+|str \|' src/gradata/_tag_taxonomy.py

echo "=== Other potential __future__ imports ==="
rg -n '^from __future__' src/gradata/_tag_taxonomy.py

Repository: Gradata/gradata

Length of output: 1159


Add from __future__ import annotations for SDK type-safety compliance.

The PEP 604 union syntax (e.g., str | None, set[str]) at lines 104–111 requires the future-annotations import. Per coding guidelines, src/gradata/**/*.py must include this import for type safety.

Minimal compliant patch
+from __future__ import annotations
+
 import json
 import re
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_tag_taxonomy.py` around lines 104 - 111, Add "from __future__
import annotations" at the top of the module so PEP 604 union types used in
parent_category and subordinates_of (e.g., "str | None" and "set[str]") are
evaluated as forward references; update the module header to include this import
before any other imports or top-level code to satisfy SDK type-safety
requirements.
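The effect of the future import can be seen directly: annotations become lazily evaluated strings, so the PEP 604 union never has to be resolved at definition time even on Python versions that predate native `str | None` support (the one-entry mapping below is a hypothetical stand-in, not the module's real hierarchy):

```python
from __future__ import annotations

# Hypothetical one-entry hierarchy, for illustration only.
_SUB_TO_PARENT = {"ACCURACY": "QUALITY"}

def parent_category(category: str) -> str | None:
    """PEP 604 union in the signature; safe because annotations stay lazy."""
    return _SUB_TO_PARENT.get(category.upper())

# With the future import active, the annotation is stored as an
# unevaluated string rather than a runtime union object.
print(parent_category.__annotations__["return"])  # prints: str | None
```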

Comment on lines +391 to +415
if event_type == "CORRECTION" and "cognitive_load" not in prefixes:
cat = data.get("category", "").upper()
_COGNITIVE_LOAD_MAP = {
"FACTUAL": "intrinsic",
"CONTENT": "intrinsic",
"DATA_INTEGRITY": "intrinsic",
"ENTITIES": "intrinsic",
"CONTEXT": "intrinsic",
"STYLE": "extraneous",
"TONE": "extraneous",
"COMMUNICATION": "extraneous",
"PRESENTATION": "extraneous",
"COST": "extraneous",
"PROCESS": "germane",
"DRAFTING": "germane",
"ACCURACY": "germane",
"THOROUGHNESS": "germane",
"ARCHITECTURE": "germane",
"CONSTRAINT": "germane",
"STRATEGY": "germane",
"POSITIONING": "germane",
}
cl = _COGNITIVE_LOAD_MAP.get(cat)
if cl:
enriched.append(f"cognitive_load:{cl}")
⚠️ Potential issue | 🟠 Major

Cognitive-load enrichment ignores existing category:* tags.

On Line 392, inference only reads data["category"]. If category is supplied in tags (common in event pipelines) and omitted from data, no cognitive_load:* tag is derived.

Derive category from tags when data is missing
     if event_type == "CORRECTION" and "cognitive_load" not in prefixes:
-        cat = data.get("category", "").upper()
+        cat = data.get("category", "")
+        if not cat:
+            category_tag = next((t for t in enriched if t.startswith("category:")), "")
+            cat = category_tag.split(":", 1)[1] if category_tag else ""
+        cat = cat.upper()
         _COGNITIVE_LOAD_MAP = {
             "FACTUAL": "intrinsic",
             "CONTENT": "intrinsic",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_tag_taxonomy.py` around lines 391 - 415, The cognitive-load
enrichment currently only reads category from data ("category") and misses
categories supplied as tags; update the logic around event_type/prefixes where
data, enriched, and _COGNITIVE_LOAD_MAP are used so that if data.get("category")
is empty it falls back to extracting a "category:*" value from the tags list (or
prefixes/tags provided), normalize it to upper() before looking up in
_COGNITIVE_LOAD_MAP, compute cl = _COGNITIVE_LOAD_MAP.get(cat) and append the
same enriched tag ("cognitive_load:{cl}") when found; ensure you reference the
existing variables event_type, prefixes, data, enriched, and _COGNITIVE_LOAD_MAP
so the change integrates with the current flow.
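The proposed fallback can be sketched as a small helper (resolve_category and cognitive_load_tag are hypothetical names for illustration; the real code inlines this logic inside enrich_tags, and the map here is truncated to two entries):

```python
def resolve_category(data, tags):
    """Prefer data['category']; otherwise fall back to a 'category:*' tag."""
    cat = data.get("category", "")
    if not cat:
        tag = next((t for t in tags if t.lower().startswith("category:")), "")
        cat = tag.split(":", 1)[1] if tag else ""
    return cat.upper()

# Truncated stand-in for _COGNITIVE_LOAD_MAP.
_COGNITIVE_LOAD_MAP = {"FACTUAL": "intrinsic", "STYLE": "extraneous"}

def cognitive_load_tag(data, tags):
    """Return the derived cognitive_load tag, or None when unmapped."""
    cl = _COGNITIVE_LOAD_MAP.get(resolve_category(data, tags))
    return f"cognitive_load:{cl}" if cl else None
```

With the fallback, enrich_tags(["category:FACTUAL"], "CORRECTION", {}) would derive cognitive_load:intrinsic instead of silently skipping enrichment.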

Comment on lines +48 to +50
def test_formatting_list(self):
result = classify_intent("1. First item\n2. Second item")
assert result.intent == "formatting"
⚠️ Potential issue | 🟡 Minor

Cover markdown bullets, not only numbered lists.

test_formatting_list() only exercises numbered-list input (1. ...). Add - item and * item cases as well; otherwise regressions in bullet detection will slip through unnoticed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_intent_classifier.py` around lines 48 - 50, The test
test_formatting_list currently only covers numbered lists; extend it to also
assert classify_intent correctly returns intent "formatting" for bullet markdown
variants by adding inputs like "- First item\n- Second item" and "* First
item\n* Second item" (or parametrize the test) and verify result.intent ==
"formatting" for each; update the test_formatting_list function to call
classify_intent with these additional strings (or iterate over a list of
examples) so bullet-style markdown is covered.
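A self-contained sketch of what the additional cases should exercise (the regex is a hypothetical stand-in for classify_intent's list detection, not the real implementation in detection/intent_classifier.py):

```python
import re

# Matches numbered items ("1. ") and markdown bullets ("- ", "* ")
# at the start of any line.
_LIST_RE = re.compile(r"^(?:\d+\.|[-*])\s+", re.MULTILINE)

def looks_like_list(text: str) -> bool:
    return bool(_LIST_RE.search(text))

cases = [
    "1. First item\n2. Second item",  # already covered by the test
    "- First item\n- Second item",    # missing: dash bullets
    "* First item\n* Second item",    # missing: star bullets
]
assert all(looks_like_list(c) for c in cases)
assert not looks_like_list("plain prose without any list")
```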

Comment on lines +60 to +65
def test_valid_suppression_reasons_complete(self):
assert "relevance_threshold" in VALID_SUPPRESSION_REASONS
assert "domain_disabled" in VALID_SUPPRESSION_REASONS
assert "conflict" in VALID_SUPPRESSION_REASONS
assert "assumption_invalid" in VALID_SUPPRESSION_REASONS
assert len(VALID_SUPPRESSION_REASONS) == 4
⚠️ Potential issue | 🟡 Minor

Test invalid suppression reasons, not just the whitelist contents.

test_valid_suppression_reasons_complete() only checks the constant definition. Please add a negative-path assertion for log_suppression(..., reason="typo") once the API rejects unknown reasons, otherwise suppression:typo telemetry can regress silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_rule_tracker_suppression.py` around lines 60 - 65, Extend
test_valid_suppression_reasons_complete to include a negative-path check: call
log_suppression(...) with reason="typo" (using the same parameters/context as
other tests) and assert the call fails for unknown reasons (e.g., raises
ValueError or returns a falsy result depending on the API); this complements the
whitelist check of VALID_SUPPRESSION_REASONS and prevents silent acceptance of
unknown reasons.
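The negative path could look like this once log_suppression validates its input (a sketch: the real function's signature and error type may differ, and ValueError is only an assumption):

```python
VALID_SUPPRESSION_REASONS = frozenset(
    {"domain_disabled", "relevance_threshold", "assumption_invalid", "conflict"}
)

def log_suppression(rule_id, reason):
    """Hypothetical validating logger for RULE_SUPPRESSION events."""
    if reason not in VALID_SUPPRESSION_REASONS:
        raise ValueError(f"unknown suppression reason: {reason!r}")
    return {"event": "RULE_SUPPRESSION", "rule_id": rule_id, "reason": reason}

# Negative-path check the review asks for: an unknown reason must be rejected.
try:
    log_suppression("rule-42", "typo")
except ValueError:
    rejected = True
else:
    rejected = False
assert rejected
```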

Comment thread tests/test_scope.py
Comment on lines +21 to +37
def test_decay_fresh_rule(self):
assert temporal_decay(sessions_since_fire=0, max_idle=20) == 1.0

def test_decay_midpoint(self):
factor = temporal_decay(sessions_since_fire=10, max_idle=20)
assert 0.4 < factor < 0.8

def test_decay_at_limit(self):
factor = temporal_decay(sessions_since_fire=20, max_idle=20)
assert factor < 0.15

def test_decay_beyond_limit(self):
factor = temporal_decay(sessions_since_fire=30, max_idle=20)
assert factor == 0.05

def test_decay_zero_max_idle(self):
assert temporal_decay(sessions_since_fire=100, max_idle=0) == 1.0
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the test file
find . -name "test_scope.py" -type f

Repository: Gradata/gradata

Length of output: 80


🏁 Script executed:

# Read the specific test file to verify the assertions at lines 21-37
cat -n tests/test_scope.py | sed -n '15,40p'

Repository: Gradata/gradata

Length of output: 1247


🏁 Script executed:

# Let's also check if pytest.approx is already imported or used elsewhere in the test file
grep -n "pytest.approx\|from pytest\|import pytest" tests/test_scope.py

Repository: Gradata/gradata

Length of output: 41


Use pytest.approx for exact equality assertions on float values.

Lines 22, 34, and 37 use exact equality (==) to compare floating-point results from temporal_decay. Switch these to pytest.approx(...) to comply with test guidelines and avoid brittle assertions that fail on harmless rounding differences. Lines 26 and 30 already follow best practices with range and inequality checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_scope.py` around lines 21 - 37, Replace exact float equality
checks in the tests that call temporal_decay with pytest.approx to avoid brittle
comparisons: update test_decay_fresh_rule (assert temporal_decay(...) == 1.0),
test_decay_beyond_limit (assert temporal_decay(...) == 0.05), and
test_decay_zero_max_idle (assert temporal_decay(...) == 1.0) to use
pytest.approx(1.0) or pytest.approx(0.05) respectively; ensure pytest is
imported in tests/test_scope.py if not already.
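For a stdlib-only illustration of the same idea, math.isclose plays the role pytest.approx plays in the suite (the simplified temporal_decay below mirrors the _scope.py diff shown earlier; in the real tests the assertions would read `assert temporal_decay(...) == pytest.approx(1.0)`):

```python
import math

def temporal_decay(sessions_since_fire: int, max_idle: int,
                   floor: float = 0.05, steepness: float = 3.0) -> float:
    # Simplified copy of the function under test.
    if max_idle <= 0 or sessions_since_fire <= 0:
        return 1.0
    ratio = sessions_since_fire / max_idle
    return max(floor, round(math.exp(-steepness * ratio * ratio), 4))

# Tolerant comparisons instead of brittle float ==.
assert math.isclose(temporal_decay(0, 20), 1.0)
assert math.isclose(temporal_decay(30, 20), 0.05)
assert math.isclose(temporal_decay(100, 0), 1.0)
```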

Comment on lines +13 to +20
def test_validate_valid(self):
valid, msg = validate_tag("cognitive_load:intrinsic")
assert valid
valid, msg = validate_tag("cognitive_load:extraneous")
assert valid
valid, msg = validate_tag("cognitive_load:germane")
assert valid

⚠️ Potential issue | 🟡 Minor

Use explicit assertion values in test_validate_valid.

On Line 15, Line 17, and Line 19, assert valid checks only truthiness. Assert exact outputs (valid is True and expected msg) to make failures diagnostic.

Proposed test tightening
     def test_validate_valid(self):
         valid, msg = validate_tag("cognitive_load:intrinsic")
-        assert valid
+        assert valid is True
+        assert msg == "ok"
         valid, msg = validate_tag("cognitive_load:extraneous")
-        assert valid
+        assert valid is True
+        assert msg == "ok"
         valid, msg = validate_tag("cognitive_load:germane")
-        assert valid
+        assert valid is True
+        assert msg == "ok"
As per coding guidelines, `tests/**`: "assertions check specific values not just truthiness".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (resulting test, indentation restored):

    def test_validate_valid(self):
        valid, msg = validate_tag("cognitive_load:intrinsic")
        assert valid is True
        assert msg == "ok"
        valid, msg = validate_tag("cognitive_load:extraneous")
        assert valid is True
        assert msg == "ok"
        valid, msg = validate_tag("cognitive_load:germane")
        assert valid is True
        assert msg == "ok"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_tag_taxonomy.py` around lines 13 - 20, The tests currently use
truthy checks; update test_validate_valid to assert explicit values: call
validate_tag("cognitive_load:intrinsic"),
validate_tag("cognitive_load:extraneous"), and
validate_tag("cognitive_load:germane") and for each assert that valid is True
(use "valid is True") and assert the exact expected msg returned by validate_tag
(e.g., "msg is None" or the specific success message your validate_tag
implementation returns) so failures are diagnostic; refer to the validate_tag
function to determine the correct expected msg and replace the loose "assert
valid" lines accordingly.

Comment on lines +26 to +37
def test_enrich_factual_is_intrinsic(self):
tags = enrich_tags([], event_type="CORRECTION", data={"category": "FACTUAL"})
assert "cognitive_load:intrinsic" in tags

def test_enrich_style_is_extraneous(self):
tags = enrich_tags([], event_type="CORRECTION", data={"category": "STYLE"})
assert "cognitive_load:extraneous" in tags

def test_enrich_process_is_germane(self):
tags = enrich_tags([], event_type="CORRECTION", data={"category": "PROCESS"})
assert "cognitive_load:germane" in tags

🧹 Nitpick | 🔵 Trivial

Parametrize cognitive-load enrichment mapping tests.

Lines 26-37 duplicate the same boundary-style assertion pattern. Parametrizing these cases keeps coverage equivalent and improves maintainability.

Parametrized variant
+import pytest
+
 class TestCognitiveLoadTaxonomy:
@@
-    def test_enrich_factual_is_intrinsic(self):
-        tags = enrich_tags([], event_type="CORRECTION", data={"category": "FACTUAL"})
-        assert "cognitive_load:intrinsic" in tags
-
-    def test_enrich_style_is_extraneous(self):
-        tags = enrich_tags([], event_type="CORRECTION", data={"category": "STYLE"})
-        assert "cognitive_load:extraneous" in tags
-
-    def test_enrich_process_is_germane(self):
-        tags = enrich_tags([], event_type="CORRECTION", data={"category": "PROCESS"})
-        assert "cognitive_load:germane" in tags
+    @pytest.mark.parametrize(
+        "category,expected",
+        [
+            ("FACTUAL", "cognitive_load:intrinsic"),
+            ("STYLE", "cognitive_load:extraneous"),
+            ("PROCESS", "cognitive_load:germane"),
+        ],
+    )
+    def test_enrich_category_mapping(self, category, expected):
+        tags = enrich_tags([], event_type="CORRECTION", data={"category": category})
+        assert expected in tags
As per coding guidelines, `tests/**`: "parametrized tests preferred for boundary conditions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_tag_taxonomy.py` around lines 26 - 37, The three nearly identical
tests that call enrich_tags with event_type="CORRECTION" and different
data["category"] values should be combined into a single parametrized test:
create a pytest.mark.parametrize on (category, expected_tag) covering
("FACTUAL","cognitive_load:intrinsic"), ("STYLE","cognitive_load:extraneous"),
("PROCESS","cognitive_load:germane") and replace the three functions
(test_enrich_factual_is_intrinsic, test_enrich_style_is_extraneous,
test_enrich_process_is_germane) with one function (e.g.,
test_enrich_cognitive_load_mapping) that calls enrich_tags([],
event_type="CORRECTION", data={"category": category}) and asserts expected_tag
in tags; this keeps coverage while reducing duplication and follows the tests/**
guideline.

…eload

The module-level `for _p in ROSCH_PARENTS` overwrote the `import
gradata._paths as _p` alias, causing `_load_taxonomy()` to fail
silently (hasattr check on a string). Renamed loop var to `_rp`.

Co-Authored-By: Gradata <noreply@gradata.ai>

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
src/gradata/_tag_taxonomy.py (2)

391-415: ⚠️ Potential issue | 🟠 Major

Cognitive-load enrichment still ignores existing category:* tags.

When category is supplied in input tags but omitted from data, no cognitive_load:* tag is derived. For example, enrich_tags(["category:FACTUAL"], "CORRECTION", {}) will not add cognitive_load:intrinsic.

Proposed fix: fall back to extracting category from tags
     if event_type == "CORRECTION" and "cognitive_load" not in prefixes:
         cat = data.get("category", "").upper()
+        if not cat:
+            category_tag = next((t for t in enriched if t.startswith("category:")), "")
+            cat = category_tag.split(":", 1)[1].upper() if category_tag else ""
         _COGNITIVE_LOAD_MAP = {

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_tag_taxonomy.py` around lines 391 - 415, The cognitive-load
enrichment uses data.get("category") only, so when category is present as a tag
(e.g. "category:FACTUAL") it is ignored; modify the block that computes cat
(currently using data.get("category", "").upper()) to fallback to parsing the
input tags for a "category:<VALUE>" entry (split or scan the tags list,
case-insensitive) when data lacks category, then upper() that value before
looking it up in _COGNITIVE_LOAD_MAP and appending the resulting
"cognitive_load:<...>" to enriched (references: event_type, prefixes, data,
tags/enriched, _COGNITIVE_LOAD_MAP).

12-15: ⚠️ Potential issue | 🟠 Major

Add from __future__ import annotations for SDK type-safety compliance.

The file uses PEP 604 union syntax (str | None, dict | None, set[str]) at lines 85, 104, 109, 332, 348, but lacks the required future-annotations import. Per coding guidelines, src/gradata/**/*.py must include this import.

Proposed fix
+from __future__ import annotations
+
 import json
 import re

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_tag_taxonomy.py` around lines 12 - 15, Add the line "from
__future__ import annotations" at the top of src/gradata/_tag_taxonomy.py
(immediately after any module docstring and before other imports) so PEP 604
union types like "str | None", "dict | None", and "set[str]" used throughout the
file are valid; no other code changes needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/gradata/_tag_taxonomy.py`:
- Around line 391-415: The cognitive-load enrichment uses data.get("category")
only, so when category is present as a tag (e.g. "category:FACTUAL") it is
ignored; modify the block that computes cat (currently using
data.get("category", "").upper()) to fallback to parsing the input tags for a
"category:<VALUE>" entry (split or scan the tags list, case-insensitive) when
data lacks category, then upper() that value before looking it up in
_COGNITIVE_LOAD_MAP and appending the resulting "cognitive_load:<...>" to
enriched (references: event_type, prefixes, data, tags/enriched,
_COGNITIVE_LOAD_MAP).
- Around line 12-15: Add the line "from __future__ import annotations" at the
top of src/gradata/_tag_taxonomy.py (immediately after any module docstring and
before other imports) so PEP 604 union types like "str | None", "dict | None",
and "set[str]" used throughout the file are valid; no other code changes needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 856da7ac-0af5-4f4d-99c6-a71aa4a7dbd4

📥 Commits

Reviewing files that changed from the base of the PR and between 6c2a50a and 55a4ea3.

📒 Files selected for processing (1)
  • src/gradata/_tag_taxonomy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print() statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/_tag_taxonomy.py
🔇 Additional comments (7)
src/gradata/_tag_taxonomy.py (7)

97-111: LGTM — Loop variable shadowing is fixed.

The loop variables now correctly use _parent, _subs, _sub, and _rp, avoiding the previously flagged shadowing of the _p module alias. The Rosch hierarchy helpers parent_category() and subordinates_of() are clean implementations.


71-76: LGTM — New cognitive_load taxonomy.

The closed taxonomy for cognitive load types aligns with Sweller's Cognitive Load Theory (intrinsic, extraneous, germane) and is correctly configured as optional.


32-51: LGTM — Category values extended with Rosch parents.

The addition of VOICE and QUALITY as valid category values is consistent with their definition in ROSCH_HIERARCHY. Formatting improvements aid readability.


331-333: LGTM — Function signatures are well-typed.

The validate_tags and enrich_tags signatures are clear and appropriately typed. The PEP 604 union syntax concern is addressed in the earlier comment about missing from __future__ import annotations.

Also applies to: 347-349


133-144: LGTM — Formatting improvements.

Value sets and dictionaries reformatted to one item per line. This improves readability and produces cleaner diffs for future changes.

Also applies to: 150-162, 168-178, 184-184, 190-190, 196-208, 214-214, 252-254, 363-372, 382-388


228-230: LGTM — Defensive attribute checks.

The hasattr(_p, "BRAIN_DIR") guards prevent AttributeError when the module attribute is not initialized. This is appropriate defensive coding for optional configuration.

Also applies to: 288-288, 296-296


85-92: LGTM — Rosch hierarchy structure is consistent.

The subordinate categories map correctly to either core (_CORE_TAXONOMY) or domain (_DOMAIN_CATEGORIES) values. The explicit allowance for subordinates under multiple parents (e.g., ACCURACY under both PROCESS and QUALITY) is well-documented at lines 82-83.
