feat: S102 — MiroFish P0-P2 roadmap implementation (9 features) #24
Conversation
Adds RULE_SUPPRESSION event emission to rule_tracker.py to track why rules are filtered out during the apply_rules pipeline. Co-Authored-By: Gradata <noreply@gradata.ai>
Add alpha/beta_param persistence in parse_lessons/format_lessons for round-trip serialization. Bayesian blending was already wired in Task 1.
Add APPROVAL_PATTERNS to the implicit_feedback hook to detect user approval signals ("looks good", "perfect", "ship it", etc.). Emit an OUTPUT_ACCEPTED event when approval is detected.
Regex-based classifier that determines correction intent: factual, compliance, formatting, clarity, conciseness, completeness, tone_shift, preference, or unknown. Returns a CorrectionIntent with confidence and evidence.
P0.1: Semantic similarity on EmbeddingClient + severity adjustment
P0.2: RULE_SUPPRESSION events at all apply_rules filter stages
P0.3: Rosch 6-category hierarchy (CONTENT/VOICE/STRUCTURE/PROCESS/CONSTRAINT/QUALITY)
P1.1: Beta posterior integration into confidence pipeline (Bayesian+FSRS blend)
P1.2: Positive signal detection + OUTPUT_ACCEPTED events
P1.3: Dual-layer intent classifier (surface edit + why it changed)
P2.1: Cross-correction recurring pattern detection
P2.2: Temporal scope with decay curves + idle session limits
P2.3: Cognitive load taxonomy tag (intrinsic/extraneous/germane)
📝 Walkthrough

Adds temporal scoping to RuleScope, extends the tag taxonomy with a Rosch hierarchy and cognitive-load tags, introduces intent classification for corrections, Bayesian confidence blending for lessons, semantic-similarity-aware diff severity adjustments, recurring-pattern detection, approval-signal emission, and rule suppression logging; plus related tests.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
```python
def log_suppression(
    rule_id: str,
    reason: str,
    relevance: float,
    competing_rule_ids: list[str] | None = None,
    ctx=None,
    session: int | None = None,
) -> None:
    """Emit a RULE_SUPPRESSION event when a rule is filtered out.

    Args:
        rule_id: The rule that was suppressed.
        reason: One of VALID_SUPPRESSION_REASONS.
        relevance: The relevance score at time of suppression.
        competing_rule_ids: Other rules that caused suppression (for conflicts).
        ctx: Optional BrainContext.
        session: Optional session number.
    """
    from gradata._events import emit

    data = {
        "rule_id": rule_id,
        "reason": reason,
        "relevance": round(relevance, 4),
        "competing_rules": competing_rule_ids or [],
    }
    tags = [f"rule:{rule_id}", f"suppression:{reason}"]
    emit("RULE_SUPPRESSION", source="rule_engine", data=data, tags=tags, session=session, ctx=ctx)
```
**`log_suppression` has no try-except around `emit`, unlike `log_application`**
log_application (line 59) wraps its emit call in a try-except and returns None on failure. log_suppression emits bare — if emit raises (e.g. the event bus is unavailable or the DB is locked), the exception propagates up through apply_rules, potentially breaking the entire rule-selection pipeline.
log_suppression is called in four places inside apply_rules (domain_disabled, relevance_threshold, assumption_invalid, conflict stages), so the blast radius is wide.
```suggestion
def log_suppression(
    rule_id: str,
    reason: str,
    relevance: float,
    competing_rule_ids: list[str] | None = None,
    ctx=None,
    session: int | None = None,
) -> None:
    try:
        from gradata._events import emit
        data = {
            "rule_id": rule_id,
            "reason": reason,
            "relevance": round(relevance, 4),
            "competing_rules": competing_rule_ids or [],
        }
        tags = [f"rule:{rule_id}", f"suppression:{reason}"]
        emit("RULE_SUPPRESSION", source="rule_engine", data=data, tags=tags, session=session, ctx=ctx)
    except Exception:
        logging.getLogger("gradata.rule_tracker").warning(
            "Failed to log suppression for %s (reason=%s)", rule_id, reason
        )
```
**Rule Used:** Code Review Rules — Rule 1: Never use print() … ([source](https://app.greptile.com/review/custom-context?memory=dee613fe-ca52-4382-b9d7-fad6d0b079ec))
Path: `src/gradata/rules/rule_tracker.py`, lines 78-105
```python
if max_idle <= 0:
    return 1.0
if sessions_since_fire <= 0:
    return 1.0
```
**`import math` inside function body**

`math` is a stdlib module and can be imported at the top of the file without any cost. Placing the import inside the function body is unusual for a non-optional, non-lazy dependency and can confuse static analysis tools. Move it to the top-level imports alongside the other stdlib imports.
Path: `src/gradata/_scope.py`, line 71
```python
    reason: str,
    relevance: float,
    competing_rule_ids: list[str] | None = None,
    ctx=None,
```
**`ctx` parameter missing `| None` type annotation**
The project rule requires | None instead of a bare = None default for optional parameters. The session parameter on the same function correctly uses int | None = None, but ctx is untyped.
```suggestion
    ctx: object | None = None,
```
Path: `src/gradata/rules/rule_tracker.py`, line 83
```python
if _HEDGE_WORDS.search(correction_text):
    return CorrectionIntent(
        intent="clarity",
        confidence=0.60,
        evidence=_HEDGE_WORDS.search(correction_text).group(),  # type: ignore[union-attr]
```
**`_HEDGE_WORDS.search()` called twice for the same input**
The condition on line 149 already confirms the pattern matches; the inner call on line 153 repeats the search just to retrieve the match object. Cache the result instead.
```suggestion
    hedge_match = _HEDGE_WORDS.search(correction_text)
    if hedge_match:
        return CorrectionIntent(
            intent="clarity",
            confidence=0.60,
            evidence=hedge_match.group(),
        )
```
Path: `src/gradata/detection/intent_classifier.py`, lines 149-153
Actionable comments posted: 30
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
**src/gradata/_scope.py (1)**

27-52: 🧹 Nitpick | 🔵 Trivial — Consider documenting new `RuleScope` fields.

The class docstring documents the original five fields but omits the newer additions: `agent_type`, `namespace`, `temporal_relevance`, `max_idle_sessions`, and `created_session`. Adding brief descriptions would improve maintainability.

**src/gradata/enhancements/diff_engine.py (1)**
60-68: ⚠️ Potential issue | 🟡 Minor — Docstring threshold is stale after threshold change.

The docstring documents `"discarded"` as `distance >= 0.60`, but line 89 changed the threshold to `0.80`. Update the docstring to match the new threshold.

📝 Proposed fix:

```diff
 * ``"as-is"`` — distance < 0.02
 * ``"minor"`` — distance < 0.10
 * ``"moderate"`` — distance < 0.30
-* ``"major"`` — distance < 0.60
-* ``"discarded"`` — distance >= 0.60
+* ``"major"`` — distance < 0.80
+* ``"discarded"`` — distance >= 0.80
```

**src/gradata/integrations/embeddings.py (1)**
146-169: ⚠️ Potential issue | 🟠 Major — Thread safety: `_embedding_cache` is modified without synchronization.

The cache is accessed from async handlers (`async_handler=True`) which may run concurrently. `OrderedDict` is not thread-safe for concurrent modifications, so cache insertion and eviction can race. As per coding guidelines, integrations require "thread safety on shared state."

🔒 Proposed fix using `threading.Lock`:

```diff
+import threading
+
 # Embedding cache: LRU-style OrderedDict, capped at _CACHE_MAX_SIZE.
 _embedding_cache: OrderedDict[str, list[float]] = OrderedDict()
+_cache_lock = threading.Lock()
```

Then wrap cache access in `_embed_and_cache`:

```diff
 if desc and desc not in _embedding_cache:
     vec = get_client().embed(desc)
     if vec:
-        _embedding_cache[desc] = vec
-        # Evict oldest entries when cache exceeds max size
-        while len(_embedding_cache) > _CACHE_MAX_SIZE:
-            _embedding_cache.popitem(last=False)
+        with _cache_lock:
+            _embedding_cache[desc] = vec
+            # Evict oldest entries when cache exceeds max size
+            while len(_embedding_cache) > _CACHE_MAX_SIZE:
+                _embedding_cache.popitem(last=False)
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/_scope.py`:
- Line 73: Move the local "import math" out of the function and place it with
the module-level imports at the top of src/gradata/_scope.py; locate the inline
"import math" occurrence (the one shown in the diff) and remove it from inside
the function, then add a single module-level import line "import math" so
functions in this module can reference math without repeated lookup overhead.
- Around line 55-77: temporal_decay can return >1.0 if a caller passes floor >
1.0; clamp the final returned multiplier to the promised upper bound of 1.0 by
ensuring the return value is min(1.0, max(floor, round(decay, 4))). Update the
return expression in temporal_decay (which uses sessions_since_fire, max_idle,
floor, steepness and computes decay) so the result is always within [floor, 1.0]
(effectively [min(floor,1.0), 1.0] when floor > 1.0).
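A minimal sketch of what the clamped decay could look like (the exponential form, the `steepness` default, and the rounding are illustrative assumptions; the clamp to `[min(floor, 1.0), 1.0]` is the change this finding asks for):

```python
import math

def temporal_decay(
    sessions_since_fire: int,
    max_idle: int,
    floor: float = 0.05,
    steepness: float = 3.0,
) -> float:
    """Decay multiplier for an idle rule, clamped to [min(floor, 1.0), 1.0]."""
    if max_idle <= 0 or sessions_since_fire <= 0:
        return 1.0
    decay = math.exp(-steepness * sessions_since_fire / max_idle)
    # Clamp so a caller-supplied floor > 1.0 cannot push the result above 1.0.
    return min(1.0, max(floor, round(decay, 4)))
```

With this ordering, `max(floor, ...)` enforces the lower bound and the outer `min(1.0, ...)` restores the promised upper bound even for a pathological `floor`.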
In `@src/gradata/_tag_taxonomy.py`:
- Around line 100-101: The loop `for _p in ROSCH_PARENTS:` is shadowing the
imported module alias `_p` (used elsewhere as `_p.BRAIN_DIR`), causing
AttributeError; rename the loop variable (e.g., `for parent in ROSCH_PARENTS:`
or `for rp in ROSCH_PARENTS:`) and update the body to `_SUB_TO_PARENT[parent] =
parent` (or `_SUB_TO_PARENT[rp] = rp`) so the imported `_p` remains available
for later uses like `_p.BRAIN_DIR`.
- Around line 391-415: The cognitive-load enrichment currently only reads
category from data ("category") and misses categories supplied as tags; update
the logic around event_type/prefixes where data, enriched, and
_COGNITIVE_LOAD_MAP are used so that if data.get("category") is empty it falls
back to extracting a "category:*" value from the tags list (or prefixes/tags
provided), normalize it to upper() before looking up in _COGNITIVE_LOAD_MAP,
compute cl = _COGNITIVE_LOAD_MAP.get(cat) and append the same enriched tag
("cognitive_load:{cl}") when found; ensure you reference the existing variables
event_type, prefixes, data, enriched, and _COGNITIVE_LOAD_MAP so the change
integrates with the current flow.
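A hedged sketch of the fallback (the helper name `enrich_cognitive_load` and the category-to-load mapping values are illustrative assumptions drawn from the PR description, not the module's actual code):

```python
_COGNITIVE_LOAD_MAP = {  # assumed mapping, per the PR description
    "FACTUAL": "intrinsic",
    "STYLE": "extraneous",
    "PROCESS": "germane",
}

def enrich_cognitive_load(data: dict, tags: list[str]) -> list[str]:
    """Append a cognitive_load:* tag, falling back to a category:* tag."""
    enriched = list(tags)
    cat = data.get("category") or ""
    if not cat:
        # Fall back to a category supplied as a tag, e.g. "category:factual".
        for t in tags:
            if t.startswith("category:"):
                cat = t.split(":", 1)[1]
                break
    cl = _COGNITIVE_LOAD_MAP.get(cat.upper())
    if cl:
        enriched.append(f"cognitive_load:{cl}")
    return enriched
```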
- Around line 104-111: Add "from __future__ import annotations" at the top of
the module so PEP 604 union types used in parent_category and subordinates_of
(e.g., "str | None" and "set[str]") are evaluated as forward references; update
the module header to include this import before any other imports or top-level
code to satisfy SDK type-safety requirements.
In `@src/gradata/detection/intent_classifier.py`:
- Line 52: The _LIST_RE regex currently requires a '.' or ')' after the
list marker so plain markdown bullets like "- item" and "* item" aren't matched;
update the _LIST_RE pattern (symbol: _LIST_RE) to accept bare bullet markers (-,
*, +) or numbered markers (digits followed by '.' or ')') and allow one or more
whitespace characters after the marker (e.g., change to a pattern that uses an
alternation like (?:[-*+]|\d+[.)])\s+ with re.MULTILINE) so plain bullets are
classified as formatting.
In `@src/gradata/enhancements/behavioral_extractor.py`:
- Around line 662-670: In detect_recurring_patterns, remove the unnecessary
string quotes used for forward references (e.g., change annotations like
list["Lesson"] and dict[str, list["Lesson"]] to list[Lesson] and dict[str,
list[Lesson]]); since from __future__ import annotations is already enabled,
update the type hints for the lessons parameter, the by_cat variable, and the
return type RecurringPattern accordingly so Ruff UP037 no longer flags them
(referencing detect_recurring_patterns, Lesson, RecurringPattern, and by_cat to
locate the changes).
In `@src/gradata/enhancements/diff_engine.py`:
- Around line 123-140: Clamp the incoming semantic_similarity to the [0.0, 1.0]
range before using it in logic and before placing it in the returned DiffResult:
in adjust_severity_by_semantics, compute a clamped_similarity = max(0.0,
min(1.0, semantic_similarity)), use clamped_similarity for the comparison
against _SEMANTIC_DOWNGRADE_THRESHOLD and for assigning semantic_similarity in
the returned DiffResult, leaving other fields unchanged; reference
adjust_severity_by_semantics, DiffResult, _SEMANTIC_DOWNGRADE_THRESHOLD, and
_SEVERITY_DOWNGRADE when making the change.
In `@src/gradata/enhancements/self_improvement.py`:
- Around line 463-474: The code truncates stored float Beta parameters by using
int(...) on lesson.alpha and lesson.beta_param; instead preserve the fractional
values when computing total_obs, successes, and trials used by beta_posterior.
Change total_obs to use lesson.alpha + lesson.beta_param - 2 (no int), pass
successes=max(0, lesson.alpha - 1) and trials=max(0, total_obs) (no int casts),
keep the early return when _bayesian_blend_weight(total_obs) == 0.0, compute
bayesian_conf from post["posterior_mean"], blend with lesson.confidence and
round the final value as before. Ensure references: total_obs, lesson.alpha,
lesson.beta_param, _bayesian_blend_weight, beta_posterior, posterior_mean,
lesson.confidence.
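A small sketch of the float-preserving computation (function names and signatures are simplified stand-ins for the module's own; `beta_posterior` here assumes a Beta(1, 1) prior, which is an assumption for illustration):

```python
def beta_posterior(successes: float, trials: float) -> dict[str, float]:
    # Beta(1, 1) prior: posterior mean = (successes + 1) / (trials + 2).
    return {"posterior_mean": (successes + 1.0) / (trials + 2.0)}

def bayesian_confidence(alpha: float, beta_param: float, confidence: float,
                        blend_weight: float) -> float:
    """Blend stored confidence with the Beta posterior, keeping fractions."""
    total_obs = alpha + beta_param - 2  # no int() truncation
    if blend_weight == 0.0:
        return confidence
    post = beta_posterior(successes=max(0.0, alpha - 1), trials=max(0.0, total_obs))
    bayesian_conf = post["posterior_mean"]
    return round((1 - blend_weight) * confidence + blend_weight * bayesian_conf, 4)
```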
- Around line 338-346: The code assumes json.loads(meta_line[len("Beta
params:"):].strip()) returns a mapping and calls bp.get(...), which fails for
valid non-object JSON (list/string/number); modify the parse logic (the block
handling meta_line.startswith("Beta params:")) to verify bp is a dict (e.g.,
isinstance(bp, dict)) before using bp.get to set alpha and beta_param_val, and
otherwise fall back to the existing default values (alpha=1.0,
beta_param_val=1.0) so malformed-but-valid JSON won't raise an exception.
- Line 459: The type annotation on function _bayesian_confidence uses a quoted
forward reference ("Lesson") unnecessarily; remove the quotes so the signature
reads def _bayesian_confidence(lesson: Lesson) -> float: (keep using the already
enabled from __future__ import annotations) to satisfy Ruff UP037 and avoid
redundant stringified annotations.
- Around line 446-456: The ramp currently yields 0.0 at total_observations ==
_BAYESIAN_BLEND_MIN_OBS because the interpolation uses (total_observations -
_BAYESIAN_BLEND_MIN_OBS); change the interpolation in _bayesian_blend_weight to
start the Bayesian component at that observation by using (total_observations -
_BAYESIAN_BLEND_MIN_OBS + 1) in the numerator (keep the same denominator and
min/max guards with _BAYESIAN_BLEND_MIN_OBS and _BAYESIAN_BLEND_MAX_OBS and the
0.7 cap) so that total_observations == _BAYESIAN_BLEND_MIN_OBS yields a small >0
weight and ramps up to 0.7 by _BAYESIAN_BLEND_MAX_OBS.
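A sketch of the adjusted ramp (the MIN/MAX constants are assumed values for illustration; the `+ 1` in the numerator is the change this finding asks for):

```python
_BAYESIAN_BLEND_MIN_OBS = 3   # assumed constants for illustration
_BAYESIAN_BLEND_MAX_OBS = 20

def bayesian_blend_weight(total_observations: float) -> float:
    """Ramp from a small positive weight at MIN_OBS up to a 0.7 cap at MAX_OBS."""
    if total_observations < _BAYESIAN_BLEND_MIN_OBS:
        return 0.0
    span = _BAYESIAN_BLEND_MAX_OBS - _BAYESIAN_BLEND_MIN_OBS
    # "+ 1" so the weight is already nonzero at exactly MIN_OBS observations.
    frac = (total_observations - _BAYESIAN_BLEND_MIN_OBS + 1) / span
    return min(0.7, 0.7 * frac)
```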
In `@src/gradata/hooks/implicit_feedback.py`:
- Around line 48-56: APPROVAL_PATTERNS currently matches phrases like "perfect"
and "_detect_signals()" records approvals even when negated (e.g., "not
perfect"), causing "main()" to emit OUTPUT_ACCEPTED incorrectly; update
detection so approvals are rejected when preceded by negation. Specifically, in
_detect_signals() (the function that iterates APPROVAL_PATTERNS), after finding
a match check the immediate context (e.g., up to 3 tokens before the match) for
negation tokens like "not", "no", "never", "n't", "hardly", "barely" or negating
phrases, and only count the approval if no negation is present; alternatively,
update APPROVAL_PATTERNS to include negative-lookbehind/anchored patterns to
avoid matching when preceded by common negators. Ensure the same fix is applied
to the other approval-matching block referenced (lines ~115-125) so main() only
emits OUTPUT_ACCEPTED for truly positive, non-negated praise.
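A hedged sketch of negation-aware matching (the approval pattern and negator list are illustrative, not the project's actual `APPROVAL_PATTERNS`):

```python
import re

APPROVAL_RE = re.compile(r"\b(looks good|perfect|ship it|great)\b", re.IGNORECASE)
NEGATORS = {"not", "no", "never", "hardly", "barely", "isn't", "wasn't"}

def is_approval(text: str) -> bool:
    """True only for non-negated praise: checks up to 3 tokens before the match."""
    m = APPROVAL_RE.search(text)
    if not m:
        return False
    preceding = re.findall(r"[\w']+", text[: m.start()].lower())[-3:]
    return not any(tok in NEGATORS for tok in preceding)
```

A negative lookbehind in the pattern itself would also work, but the token-window check handles multi-word gaps like "not exactly great".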
In `@src/gradata/integrations/embeddings.py`:
- Around line 80-88: The semantic_similarity method can return negative values
from cosine_similarity despite the docstring promise of [0.0, 1.0]; update
semantic_similarity (which calls self.embed and cosine_similarity) to clamp the
computed similarity into the [0.0, 1.0] range before returning (e.g., ensure
result = min(1.0, max(0.0, similarity))). Keep the early-return for empty inputs
unchanged.
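A sketch of the clamp on a plain cosine similarity (operating on precomputed vectors rather than calling `self.embed`, so the example stays self-contained):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def semantic_similarity(vec_a: list[float], vec_b: list[float]) -> float:
    """Cosine similarity clamped to the [0.0, 1.0] range the docstring promises."""
    if not vec_a or not vec_b:
        return 0.0
    return min(1.0, max(0.0, cosine_similarity(vec_a, vec_b)))
```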
In `@src/gradata/rules/rule_engine.py`:
- Line 579: The four identical inline imports of log_suppression inside
apply_rules() should be consolidated: move "from gradata.rules.rule_tracker
import log_suppression" to a single import at the top of the apply_rules()
function (or module-level if no circular import issues) and remove the
duplicated imports; then update the conditional checks to use the
already-imported symbol (e.g., "if _ctx and log_suppression:") so the logic is
unchanged but the import is not repeated.
- Around line 579-586: The log_suppression call for domain-disabled is missing
the _ctx guard and currently always emits an event; wrap the block that calls
log_suppression(rule_id=_make_rule_id(lesson), reason="domain_disabled",
relevance=0.0, ctx=_ctx) in an if _ctx: check (matching the other suppression
calls) so suppression logging only runs when _ctx is truthy.
- Line 512: The parameter `_ctx` in rule_engine.py is missing a type annotation;
update the function signature(s) that declare `_ctx` to add a precise type
(e.g., use typing.Optional[typing.Mapping[str, typing.Any]] or a specific
context class if one exists in the codebase) and add any necessary imports from
typing (Optional, Mapping, Any) or the concrete context type; ensure all usages
of `_ctx` in functions/methods that accept it reflect the new annotation so the
SDK layer maintains type safety.
In `@src/gradata/rules/rule_tracker.py`:
- Around line 78-105: The function log_suppression currently accepts any reason
string and emits a suppression tag, so validate the reason at the API boundary
by checking that the provided reason is a member of VALID_SUPPRESSION_REASONS
(import or reference VALID_SUPPRESSION_REASONS at top of the function), and if
not raise a ValueError (or TypeError) with a clear message; keep the rest of the
behavior (building data, tags, and calling emit("RULE_SUPPRESSION", ...))
unchanged so only known suppression reasons produce suppression:* tags and
analytics remain consistent.
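A minimal sketch of the boundary validation (the four reason strings are inferred from the `apply_rules` stages named earlier in this review; the real `VALID_SUPPRESSION_REASONS` may differ):

```python
VALID_SUPPRESSION_REASONS = frozenset(
    {"domain_disabled", "relevance_threshold", "assumption_invalid", "conflict"}
)

def validate_suppression_reason(reason: str) -> str:
    """Reject unknown reasons at the API boundary so suppression:* tags stay clean."""
    if reason not in VALID_SUPPRESSION_REASONS:
        raise ValueError(
            f"Unknown suppression reason {reason!r}; "
            f"expected one of {sorted(VALID_SUPPRESSION_REASONS)}"
        )
    return reason
```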
- Around line 96-105: The log_suppression path currently calls
emit("RULE_SUPPRESSION", ...) directly which allows emit failures to propagate
and abort rule evaluation; update the log_suppression function to wrap the emit
call in a try/except block (catch Exception) so emission is best-effort: on
exception, log the error (using the existing logging facility or process logger)
and return without re-raising. Keep the same data, tags, session and ctx
arguments when calling emit and ensure the suppression telemetry remains
non-blocking by swallowing errors from emit in log_suppression.
In `@tests/test_approval_signals.py`:
- Around line 42-46: Add a regression test in test_negation_not_approval that
calls _detect_signals with negated-praise examples (e.g., "Not perfect.", "Not
exactly.", "That's not great.") and assert that the resulting signal types (from
the list comprehension using s["type"]) include "negation" but do NOT include
"approval" to ensure APPROVAL_PATTERNS does not trigger OUTPUT_ACCEPTED for
negated praise; place these additional assertions alongside the existing checks
in the test_negation_not_approval function.
In `@tests/test_bayesian_confidence.py`:
- Around line 18-21: The test uses loose inequalities for lesson.alpha and
lesson.beta_param after calling update_confidence([lesson], corrections); change
these to assert the exact expected increments (alpha and beta_param should
increase by exactly +1.0 per branch) using precise comparisons (e.g., assert
lesson.alpha == pytest.approx(<expected_value>) and assert lesson.beta_param ==
pytest.approx(<expected_value>)) so the test fails if update_confidence
double-counts; apply the same tightening for the other assertions around lines
33-36 that check alpha/beta_param to ensure they assert exact expected values
rather than just > checks.
In `@tests/test_behavioral_extractor.py`:
- Around line 24-28: The tests currently use loose assertions on the result of
detect_recurring_patterns(lessons); tighten them to assert deterministic
expected values: assert len(patterns) == 1, assert patterns[0].category ==
"TONE" (keep), and assert patterns[0].correction_count == 3 (replace >= 3), and
apply the same tightening to the assertions around lines 57-58 that check
pattern count and counts so they assert exact expected values rather than
inequalities or bare truthiness; locate these checks by the
detect_recurring_patterns call and the patterns[index] references in the test
file and update them accordingly.
In `@tests/test_diff_engine.py`:
- Around line 8-52: Add two unit tests to TestSemanticSeverityAdjustment to
cover the exact boundary and the minor→as-is path: one test (e.g.,
test_at_threshold_boundary_downgrades) that creates a DiffResult with
severity="moderate" and calls adjust_severity_by_semantics(result,
semantic_similarity=0.85) expecting severity "minor", and another test (e.g.,
test_minor_downgrades_to_as_is) that creates a DiffResult with severity="minor"
and low edit/compression distances and calls
adjust_severity_by_semantics(result, semantic_similarity=0.90) expecting
severity "as-is"; put them alongside the existing tests so
adjust_severity_by_semantics and DiffResult are exercised for the boundary and
minor→as-is downgrade paths.
- Around line 21-30: The test test_low_similarity_no_change doesn't assert that
semantic_similarity was recorded on the returned DiffResult; update the test to
call adjust_severity_by_semantics(result, semantic_similarity=0.50) and assert
that adjusted.semantic_similarity == 0.50 (in addition to the existing severity
assertion) so the DiffResult produced by adjust_severity_by_semantics preserves
the semantic_similarity field; reference the DiffResult instance and the
adjust_severity_by_semantics function in your change.
In `@tests/test_embeddings.py`:
- Around line 22-26: Refactor the test_empty_text_returns_zero to use
pytest.mark.parametrize: replace the three separate assertions with a single
parametrized test that iterates over pairs of inputs (e.g., ("", "hello"),
("hello", ""), ("", "")) and asserts EmbeddingClient().semantic_similarity(a, b)
== 0.0; keep the test name (test_empty_text_returns_zero) and the call to
EmbeddingClient.semantic_similarity to locate the code easily.
In `@tests/test_intent_classifier.py`:
- Around line 48-50: The test test_formatting_list currently only covers
numbered lists; extend it to also assert classify_intent correctly returns
intent "formatting" for bullet markdown variants by adding inputs like "- First
item\n- Second item" and "* First item\n* Second item" (or parametrize the test)
and verify result.intent == "formatting" for each; update the
test_formatting_list function to call classify_intent with these additional
strings (or iterate over a list of examples) so bullet-style markdown is
covered.
In `@tests/test_rule_tracker_suppression.py`:
- Around line 60-65: Extend test_valid_suppression_reasons_complete to include a
negative-path check: call log_suppression(...) with reason="typo" (using the
same parameters/context as other tests) and assert the call fails for unknown
reasons (e.g., raises ValueError or returns a falsy result depending on the
API); this complements the whitelist check of VALID_SUPPRESSION_REASONS and
prevents silent acceptance of unknown reasons.
In `@tests/test_scope.py`:
- Around line 21-37: Replace exact float equality checks in the tests that call
temporal_decay with pytest.approx to avoid brittle comparisons: update
test_decay_fresh_rule (assert temporal_decay(...) == 1.0),
test_decay_beyond_limit (assert temporal_decay(...) == 0.05), and
test_decay_zero_max_idle (assert temporal_decay(...) == 1.0) to use
pytest.approx(1.0) or pytest.approx(0.05) respectively; ensure pytest is
imported in tests/test_scope.py if not already.
In `@tests/test_tag_taxonomy.py`:
- Around line 13-20: The tests currently use truthy checks; update
test_validate_valid to assert explicit values: call
validate_tag("cognitive_load:intrinsic"),
validate_tag("cognitive_load:extraneous"), and
validate_tag("cognitive_load:germane") and for each assert that valid is True
(use "valid is True") and assert the exact expected msg returned by validate_tag
(e.g., "msg is None" or the specific success message your validate_tag
implementation returns) so failures are diagnostic; refer to the validate_tag
function to determine the correct expected msg and replace the loose "assert
valid" lines accordingly.
- Around line 26-37: The three nearly identical tests that call enrich_tags with
event_type="CORRECTION" and different data["category"] values should be combined
into a single parametrized test: create a pytest.mark.parametrize on (category,
expected_tag) covering ("FACTUAL","cognitive_load:intrinsic"),
("STYLE","cognitive_load:extraneous"), ("PROCESS","cognitive_load:germane") and
replace the three functions (test_enrich_factual_is_intrinsic,
test_enrich_style_is_extraneous, test_enrich_process_is_germane) with one
function (e.g., test_enrich_cognitive_load_mapping) that calls enrich_tags([],
event_type="CORRECTION", data={"category": category}) and asserts expected_tag
in tags; this keeps coverage while reducing duplication and follows the tests/**
guideline.
---
Outside diff comments:
In `@src/gradata/_scope.py`:
- Around line 27-52: The RuleScope dataclass docstring only documents domain,
task_type, audience, channel, and stakes; update the class docstring for
RuleScope to include brief descriptions for the new fields agent_type,
namespace, temporal_relevance, max_idle_sessions, and created_session so their
intent and acceptable values are clear (e.g., agent_type: agent role like
"researcher"; namespace: scope tag like "api-endpoint"; temporal_relevance:
"evergreen"/"seasonal"/"recent"/""; max_idle_sessions: int auto-suppression
count; created_session: session index when scope assigned). Ensure the wording
is concise and follows the existing Args style used in the docstring.
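A minimal sketch of how the expanded dataclass docstring might read. The field list and value descriptions follow the review's wording; the defaults and the exact types beyond those descriptions are assumptions:

```python
from dataclasses import dataclass

@dataclass
class RuleScope:
    """Scope constraints controlling when a rule applies.

    Args:
        domain: Subject domain the rule belongs to.
        task_type: Kind of task being performed.
        audience: Intended reader of the output.
        channel: Delivery medium for the output.
        stakes: Impact level of an error.
        agent_type: Agent role the rule targets, e.g. "researcher".
        namespace: Scope tag such as "api-endpoint".
        temporal_relevance: One of "evergreen", "seasonal", "recent", or "".
        max_idle_sessions: Idle-session count that triggers auto-suppression.
        created_session: Session index at which the scope was assigned.
    """

    domain: str = ""
    task_type: str = ""
    audience: str = ""
    channel: str = ""
    stakes: str = ""
    agent_type: str = ""
    namespace: str = ""
    temporal_relevance: str = ""
    max_idle_sessions: int = 0
    created_session: int = 0
```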
In `@src/gradata/enhancements/diff_engine.py`:
- Around line 60-68: Update the docstring in diff_engine.py to reflect the new
threshold for the "discarded" severity: change the prose that currently says
`"discarded" — distance >= 0.60` to match the implementation's threshold of 0.80
(i.e., `"discarded" — distance >= 0.80`) so the documentation in the module
(around the severity description for compression_distance/edit_distance) matches
the logic in the code (see the severity mapping described and the implementation
that uses 0.80).
In `@src/gradata/integrations/embeddings.py`:
- Around line 146-169: The module-level OrderedDict _embedding_cache is mutated
without synchronization inside _embed_and_cache (registered by subscribe_to_bus)
leading to race conditions; introduce a module-level threading.Lock (e.g.,
_embedding_cache_lock) and wrap all reads/writes/evictions of _embedding_cache
(contains checks, assignment _embedding_cache[desc] = vec, and the while loop
that pops items when > _CACHE_MAX_SIZE) in the lock to ensure atomicity; ensure
the lock is acquired before checking desc in _embedding_cache and held until
after eviction completes, then released.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7f314496-ca4c-4111-8b3e-434739c13121
📒 Files selected for processing (22)
src/gradata/_scope.py, src/gradata/_tag_taxonomy.py, src/gradata/_types.py, src/gradata/detection/__init__.py, src/gradata/detection/intent_classifier.py, src/gradata/enhancements/behavioral_extractor.py, src/gradata/enhancements/diff_engine.py, src/gradata/enhancements/self_improvement.py, src/gradata/hooks/implicit_feedback.py, src/gradata/integrations/embeddings.py, src/gradata/rules/rule_engine.py, src/gradata/rules/rule_tracker.py, tests/test_approval_signals.py, tests/test_bayesian_confidence.py, tests/test_behavioral_extractor.py, tests/test_diff_engine.py, tests/test_embeddings.py, tests/test_intent_classifier.py, tests/test_rule_tracker_suppression.py, tests/test_scope.py, tests/test_tag_taxonomy.py, tests/test_types.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/integrations/embeddings.py, src/gradata/_types.py, src/gradata/detection/__init__.py, src/gradata/hooks/implicit_feedback.py, src/gradata/rules/rule_engine.py, src/gradata/rules/rule_tracker.py, src/gradata/enhancements/behavioral_extractor.py, src/gradata/enhancements/diff_engine.py, src/gradata/_scope.py, src/gradata/detection/intent_classifier.py, src/gradata/_tag_taxonomy.py, src/gradata/enhancements/self_improvement.py
src/gradata/integrations/**
⚙️ CodeRabbit configuration file
src/gradata/integrations/**: Framework integrations and nervous system modules. Check for: SSRF protection on any URL from env vars,
thread safety on shared state, embedding cache must be bounded, event bus handlers must never raise.
Files:
src/gradata/integrations/embeddings.py
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
tests/test_scope.py, tests/test_embeddings.py, tests/test_rule_tracker_suppression.py, tests/test_diff_engine.py, tests/test_behavioral_extractor.py, tests/test_approval_signals.py, tests/test_bayesian_confidence.py, tests/test_intent_classifier.py, tests/test_tag_taxonomy.py, tests/test_types.py
src/gradata/hooks/**
⚙️ CodeRabbit configuration file
src/gradata/hooks/**: JavaScript hooks for Claude Code integration. Check for: no shell injection (no execSync with user
input), temp files must use per-user subdirectory, HTTP calls must have timeouts, errors must be silent (never block
the tool chain).
Files:
src/gradata/hooks/implicit_feedback.py
🪛 GitHub Actions: SDK CI
src/gradata/enhancements/behavioral_extractor.py
[error] 663-663: ruff check failed (UP037): Remove quotes from type annotation lessons: list["Lesson"] (fixable with --fix).
[error] 670-670: ruff check failed (UP037): Remove quotes from type annotation by_cat: dict[str, list["Lesson"]] (fixable with --fix).
src/gradata/_tag_taxonomy.py
[error] 100-101: ruff check failed (F402): Import _p from line 15 shadowed by loop variable at src/gradata/_tag_taxonomy.py:100.
src/gradata/enhancements/self_improvement.py
[error] 459-459: ruff check failed (UP037): Remove quotes from type annotation _bayesian_confidence(lesson: "Lesson") (fixable with --fix).
🔇 Additional comments (7)
src/gradata/rules/rule_engine.py (1)
122-146: LGTM — Formatting changes. The whitespace and multi-line formatting changes across signal lists, string assembly, RuleScope construction, and sorting keys improve readability without altering logic.
Also applies to: 313-316, 357-361, 377-383, 421-430, 440-452, 665-676, 708-718, 832-844
src/gradata/_scope.py (2)
325-337: LGTM — Robust int coercion in `scope_from_dict`. The defensive handling of stringified int fields with try/except and fallback to the default value 0 ensures backward compatibility with JSON/SQLite serialization.
85-114: LGTM — Formatting changes in classification constants. The whitespace adjustments in keyword lists are consistent and do not alter behavior.
src/gradata/integrations/embeddings.py (1)
171-181: LGTM! The reformatted `bus.on()` registrations improve readability with no semantic change.
tests/test_embeddings.py (1)
8-20: LGTM! Good coverage of key scenarios: identical texts returning ~1.0 with appropriate tolerance, and semantically different texts producing lower similarity. Proper use of `pytest.approx` for floating-point comparison.
src/gradata/enhancements/diff_engine.py (2)
78-78: LGTM! The optional `semantic_similarity` field with a `None` default maintains backward compatibility with existing `DiffResult` consumers.
306-306: LGTM! Cosmetic reformatting of the `SequenceMatcher` call with no functional change.
```python
def temporal_decay(
    sessions_since_fire: int,
    max_idle: int,
    floor: float = 0.05,
    steepness: float = 3.0,
) -> float:
    """Compute temporal decay multiplier for rule confidence.

    Uses exponential decay: exp(-steepness * ratio^2) where
    ratio = sessions_since_fire / max_idle.

    Returns decay multiplier in [floor, 1.0]. If max_idle=0, returns 1.0 (evergreen).
    """
    if max_idle <= 0:
        return 1.0
    if sessions_since_fire <= 0:
        return 1.0

    import math

    ratio = sessions_since_fire / max_idle
    decay = math.exp(-steepness * ratio * ratio)
    return max(floor, round(decay, 4))
```
Add upper bound clamp to temporal_decay return value.
The docstring promises a return value in [floor, 1.0], but if floor > 1.0 is passed (invalid but possible), the function returns > 1.0. This could cause downstream issues if the multiplier is used with confidence values that must stay in [0.0, 1.0].
Proposed defensive fix
```diff
     ratio = sessions_since_fire / max_idle
     decay = math.exp(-steepness * ratio * ratio)
-    return max(floor, round(decay, 4))
+    return min(1.0, max(floor, round(decay, 4)))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_scope.py` around lines 55 - 77, temporal_decay can return >1.0
if a caller passes floor > 1.0; clamp the final returned multiplier to the
promised upper bound of 1.0 by ensuring the return value is min(1.0, max(floor,
round(decay, 4))). Update the return expression in temporal_decay (which uses
sessions_since_fire, max_idle, floor, steepness and computes decay) so the
result is always within [floor, 1.0] (effectively [min(floor,1.0), 1.0] when
floor > 1.0).
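As a sanity check, re-declaring the function locally with the proposed clamp applied reproduces the boundary values the tests below expect (a sketch; only the clamp differs from the code above):

```python
import math

def temporal_decay(sessions_since_fire: int, max_idle: int,
                   floor: float = 0.05, steepness: float = 3.0) -> float:
    # Local mirror of the function above, with the reviewer's upper-bound clamp added.
    if max_idle <= 0 or sessions_since_fire <= 0:
        return 1.0
    ratio = sessions_since_fire / max_idle
    decay = math.exp(-steepness * ratio * ratio)
    return min(1.0, max(floor, round(decay, 4)))

assert temporal_decay(0, 20) == 1.0        # fresh rule: no decay
assert temporal_decay(10, 20) == 0.4724    # midpoint: exp(-3 * 0.25), rounded to 4 places
assert temporal_decay(30, 20) == 0.05      # beyond the idle limit the floor takes over
assert temporal_decay(5, 20, floor=2.0) == 1.0  # invalid floor > 1.0 is now clamped
```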
```python
    if sessions_since_fire <= 0:
        return 1.0

    import math
```
🧹 Nitpick | 🔵 Trivial
Consider moving import math to module level.
The math import inside the function incurs a small lookup overhead on each call. Since math is a standard library module with no circular dependency concerns, moving it to the module-level imports improves clarity and efficiency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_scope.py` at line 73, Move the local "import math" out of the
function and place it with the module-level imports at the top of
src/gradata/_scope.py; locate the inline "import math" occurrence (the one shown
in the diff) and remove it from inside the function, then add a single
module-level import line "import math" so functions in this module can reference
math without repeated lookup overhead.
```python
def parent_category(category: str) -> str | None:
    """Return the Rosch superordinate parent for a category, or None."""
    return _SUB_TO_PARENT.get(category.upper())


def subordinates_of(parent: str) -> set[str]:
    """Return the set of subordinate categories under a Rosch parent."""
    return set(ROSCH_HIERARCHY.get(parent.upper(), set()))
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify requirement presence in this changed SDK module
rg -n '^from __future__ import annotations$' src/gradata/_tag_taxonomy.py
```
Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
```bash
#!/bin/bash
# Check the file exists and view lines around 104-111
wc -l src/gradata/_tag_taxonomy.py
echo "---"
sed -n '1,20p' src/gradata/_tag_taxonomy.py
echo "---"
sed -n '100,115p' src/gradata/_tag_taxonomy.py
```
Repository: Gradata/gradata
Length of output: 1372
🏁 Script executed:
```bash
#!/bin/bash
# Check for print() statements in the file
echo "=== print() statements ==="
rg -n 'print\(' src/gradata/_tag_taxonomy.py
echo "=== Hardcoded paths or path issues ==="
rg -n "(\.json|\.py|/|\\\\)" src/gradata/_tag_taxonomy.py | head -20
echo "=== Count of type annotations in file ==="
rg -c '\w+\s*\|\s*\w+|str \|' src/gradata/_tag_taxonomy.py
echo "=== Other potential __future__ imports ==="
rg -n '^from __future__' src/gradata/_tag_taxonomy.py
```
Repository: Gradata/gradata
Length of output: 1159
Add from __future__ import annotations for SDK type-safety compliance.
The PEP 604 union syntax (e.g., str | None, set[str]) at lines 104–111 requires the future-annotations import. Per coding guidelines, src/gradata/**/*.py must include this import for type safety.
Minimal compliant patch
```diff
+from __future__ import annotations
+
 import json
 import re
```
🤖 Prompt for AI Agents
import re🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_tag_taxonomy.py` around lines 104 - 111, Add "from __future__
import annotations" at the top of the module so PEP 604 union types used in
parent_category and subordinates_of (e.g., "str | None" and "set[str]") are
evaluated as forward references; update the module header to include this import
before any other imports or top-level code to satisfy SDK type-safety
requirements.
```python
if event_type == "CORRECTION" and "cognitive_load" not in prefixes:
    cat = data.get("category", "").upper()
    _COGNITIVE_LOAD_MAP = {
        "FACTUAL": "intrinsic",
        "CONTENT": "intrinsic",
        "DATA_INTEGRITY": "intrinsic",
        "ENTITIES": "intrinsic",
        "CONTEXT": "intrinsic",
        "STYLE": "extraneous",
        "TONE": "extraneous",
        "COMMUNICATION": "extraneous",
        "PRESENTATION": "extraneous",
        "COST": "extraneous",
        "PROCESS": "germane",
        "DRAFTING": "germane",
        "ACCURACY": "germane",
        "THOROUGHNESS": "germane",
        "ARCHITECTURE": "germane",
        "CONSTRAINT": "germane",
        "STRATEGY": "germane",
        "POSITIONING": "germane",
    }
    cl = _COGNITIVE_LOAD_MAP.get(cat)
    if cl:
        enriched.append(f"cognitive_load:{cl}")
```
Cognitive-load enrichment ignores existing category:* tags.
On Line 392, inference only reads data["category"]. If category is supplied in tags (common in event pipelines) and omitted from data, no cognitive_load:* tag is derived.
Derive category from tags when data is missing
```diff
 if event_type == "CORRECTION" and "cognitive_load" not in prefixes:
-    cat = data.get("category", "").upper()
+    cat = data.get("category", "")
+    if not cat:
+        category_tag = next((t for t in enriched if t.startswith("category:")), "")
+        cat = category_tag.split(":", 1)[1] if category_tag else ""
+    cat = cat.upper()
     _COGNITIVE_LOAD_MAP = {
         "FACTUAL": "intrinsic",
         "CONTENT": "intrinsic",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_tag_taxonomy.py` around lines 391 - 415, The cognitive-load
enrichment currently only reads category from data ("category") and misses
categories supplied as tags; update the logic around event_type/prefixes where
data, enriched, and _COGNITIVE_LOAD_MAP are used so that if data.get("category")
is empty it falls back to extracting a "category:*" value from the tags list (or
prefixes/tags provided), normalize it to upper() before looking up in
_COGNITIVE_LOAD_MAP, compute cl = _COGNITIVE_LOAD_MAP.get(cat) and append the
same enriched tag ("cognitive_load:{cl}") when found; ensure you reference the
existing variables event_type, prefixes, data, enriched, and _COGNITIVE_LOAD_MAP
so the change integrates with the current flow.
```python
def test_formatting_list(self):
    result = classify_intent("1. First item\n2. Second item")
    assert result.intent == "formatting"
```
Cover markdown bullets, not only numbered lists.
test_formatting_list() only exercises 1. ... input. Add - item and * item cases too, otherwise formatting regressions around bullet detection will slip through.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_intent_classifier.py` around lines 48 - 50, The test
test_formatting_list currently only covers numbered lists; extend it to also
assert classify_intent correctly returns intent "formatting" for bullet markdown
variants by adding inputs like "- First item\n- Second item" and "* First
item\n* Second item" (or parametrize the test) and verify result.intent ==
"formatting" for each; update the test_formatting_list function to call
classify_intent with these additional strings (or iterate over a list of
examples) so bullet-style markdown is covered.
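What the extended test could look like, exercised here against a hypothetical regex stand-in for `classify_intent` (the real classifier's patterns may differ):

```python
import re
import pytest

# Hypothetical stand-in for detection.intent_classifier.classify_intent,
# reduced to the single list-detection rule under discussion here.
_LIST_RE = re.compile(r"^\s*(?:\d+\.|[-*])\s+\S", re.MULTILINE)

def classify_intent_stub(text: str) -> str:
    return "formatting" if _LIST_RE.search(text) else "unknown"

@pytest.mark.parametrize(
    "text",
    [
        "1. First item\n2. Second item",  # numbered list (already covered)
        "- First item\n- Second item",    # dash bullets
        "* First item\n* Second item",    # asterisk bullets
    ],
)
def test_formatting_list_variants(text):
    assert classify_intent_stub(text) == "formatting"
```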
```python
def test_valid_suppression_reasons_complete(self):
    assert "relevance_threshold" in VALID_SUPPRESSION_REASONS
    assert "domain_disabled" in VALID_SUPPRESSION_REASONS
    assert "conflict" in VALID_SUPPRESSION_REASONS
    assert "assumption_invalid" in VALID_SUPPRESSION_REASONS
    assert len(VALID_SUPPRESSION_REASONS) == 4
```
Test invalid suppression reasons, not just the whitelist contents.
test_valid_suppression_reasons_complete() only checks the constant definition. Please add a negative-path assertion for log_suppression(..., reason="typo") once the API rejects unknown reasons, otherwise suppression:typo telemetry can regress silently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_rule_tracker_suppression.py` around lines 60 - 65, Extend
test_valid_suppression_reasons_complete to include a negative-path check: call
log_suppression(...) with reason="typo" (using the same parameters/context as
other tests) and assert the call fails for unknown reasons (e.g., raises
ValueError or returns a falsy result depending on the API); this complements the
whitelist check of VALID_SUPPRESSION_REASONS and prevents silent acceptance of
unknown reasons.
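A sketch of the negative-path test, using a hypothetical validating `log_suppression` stub, since the real API's failure mode (raise vs. falsy return) isn't pinned down here:

```python
import pytest

VALID_SUPPRESSION_REASONS = {
    "relevance_threshold",
    "domain_disabled",
    "conflict",
    "assumption_invalid",
}

def log_suppression(rule_id: str, reason: str) -> None:
    # Hypothetical validating stub; the real API may instead return a falsy result.
    if reason not in VALID_SUPPRESSION_REASONS:
        raise ValueError(f"unknown suppression reason: {reason!r}")

def test_unknown_reason_rejected():
    with pytest.raises(ValueError):
        log_suppression("rule-1", reason="typo")
```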
```python
def test_decay_fresh_rule(self):
    assert temporal_decay(sessions_since_fire=0, max_idle=20) == 1.0

def test_decay_midpoint(self):
    factor = temporal_decay(sessions_since_fire=10, max_idle=20)
    assert 0.4 < factor < 0.8

def test_decay_at_limit(self):
    factor = temporal_decay(sessions_since_fire=20, max_idle=20)
    assert factor < 0.15

def test_decay_beyond_limit(self):
    factor = temporal_decay(sessions_since_fire=30, max_idle=20)
    assert factor == 0.05

def test_decay_zero_max_idle(self):
    assert temporal_decay(sessions_since_fire=100, max_idle=0) == 1.0
```
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the test file
find . -name "test_scope.py" -type fRepository: Gradata/gradata
Length of output: 80
🏁 Script executed:
# Read the specific test file to verify the assertions at lines 21-37
cat -n tests/test_scope.py | sed -n '15,40p'Repository: Gradata/gradata
Length of output: 1247
🏁 Script executed:
# Let's also check if pytest.approx is already imported or used elsewhere in the test file
grep -n "pytest.approx\|from pytest\|import pytest" tests/test_scope.pyRepository: Gradata/gradata
Length of output: 41
Use pytest.approx for exact equality assertions on float values.
Lines 22, 34, and 37 use exact equality (==) to compare floating-point results from temporal_decay. Switch these to pytest.approx(...) to comply with test guidelines and avoid brittle assertions that fail on harmless rounding differences. Lines 26 and 30 already follow best practices with range and inequality checks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_scope.py` around lines 21 - 37, Replace exact float equality
checks in the tests that call temporal_decay with pytest.approx to avoid brittle
comparisons: update test_decay_fresh_rule (assert temporal_decay(...) == 1.0),
test_decay_beyond_limit (assert temporal_decay(...) == 0.05), and
test_decay_zero_max_idle (assert temporal_decay(...) == 1.0) to use
pytest.approx(1.0) or pytest.approx(0.05) respectively; ensure pytest is
imported in tests/test_scope.py if not already.
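The difference `pytest.approx` makes, in miniature:

```python
from pytest import approx

# Binary floats can't represent 0.1 or 0.2 exactly, so exact == is brittle:
assert (0.1 + 0.2 == 0.3) is False      # fails under plain equality
assert 0.1 + 0.2 == approx(0.3)         # passes: approx applies a relative tolerance

# The review's rewrites for tests/test_scope.py would then read:
# assert temporal_decay(sessions_since_fire=0, max_idle=20) == approx(1.0)
# assert temporal_decay(sessions_since_fire=30, max_idle=20) == approx(0.05)
```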
```python
def test_validate_valid(self):
    valid, msg = validate_tag("cognitive_load:intrinsic")
    assert valid
    valid, msg = validate_tag("cognitive_load:extraneous")
    assert valid
    valid, msg = validate_tag("cognitive_load:germane")
    assert valid
```
Use explicit assertion values in test_validate_valid.
On Line 15, Line 17, and Line 19, assert valid checks only truthiness. Assert exact outputs (valid is True and expected msg) to make failures diagnostic.
Proposed test tightening
```diff
 def test_validate_valid(self):
     valid, msg = validate_tag("cognitive_load:intrinsic")
-    assert valid
+    assert valid is True
+    assert msg == "ok"
     valid, msg = validate_tag("cognitive_load:extraneous")
-    assert valid
+    assert valid is True
+    assert msg == "ok"
     valid, msg = validate_tag("cognitive_load:germane")
-    assert valid
+    assert valid is True
+    assert msg == "ok"
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def test_validate_valid(self):
    valid, msg = validate_tag("cognitive_load:intrinsic")
    assert valid is True
    assert msg == "ok"
    valid, msg = validate_tag("cognitive_load:extraneous")
    assert valid is True
    assert msg == "ok"
    valid, msg = validate_tag("cognitive_load:germane")
    assert valid is True
    assert msg == "ok"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_tag_taxonomy.py` around lines 13 - 20, The tests currently use
truthy checks; update test_validate_valid to assert explicit values: call
validate_tag("cognitive_load:intrinsic"),
validate_tag("cognitive_load:extraneous"), and
validate_tag("cognitive_load:germane") and for each assert that valid is True
(use "valid is True") and assert the exact expected msg returned by validate_tag
(e.g., "msg is None" or the specific success message your validate_tag
implementation returns) so failures are diagnostic; refer to the validate_tag
function to determine the correct expected msg and replace the loose "assert
valid" lines accordingly.
```python
def test_enrich_factual_is_intrinsic(self):
    tags = enrich_tags([], event_type="CORRECTION", data={"category": "FACTUAL"})
    assert "cognitive_load:intrinsic" in tags

def test_enrich_style_is_extraneous(self):
    tags = enrich_tags([], event_type="CORRECTION", data={"category": "STYLE"})
    assert "cognitive_load:extraneous" in tags

def test_enrich_process_is_germane(self):
    tags = enrich_tags([], event_type="CORRECTION", data={"category": "PROCESS"})
    assert "cognitive_load:germane" in tags
```
🧹 Nitpick | 🔵 Trivial
Parametrize cognitive-load enrichment mapping tests.
Lines 26-37 duplicate the same boundary-style assertion pattern. Parametrizing these cases keeps coverage equivalent and improves maintainability.
Parametrized variant
```diff
+import pytest
+
 class TestCognitiveLoadTaxonomy:
@@
-    def test_enrich_factual_is_intrinsic(self):
-        tags = enrich_tags([], event_type="CORRECTION", data={"category": "FACTUAL"})
-        assert "cognitive_load:intrinsic" in tags
-
-    def test_enrich_style_is_extraneous(self):
-        tags = enrich_tags([], event_type="CORRECTION", data={"category": "STYLE"})
-        assert "cognitive_load:extraneous" in tags
-
-    def test_enrich_process_is_germane(self):
-        tags = enrich_tags([], event_type="CORRECTION", data={"category": "PROCESS"})
-        assert "cognitive_load:germane" in tags
+    @pytest.mark.parametrize(
+        "category,expected",
+        [
+            ("FACTUAL", "cognitive_load:intrinsic"),
+            ("STYLE", "cognitive_load:extraneous"),
+            ("PROCESS", "cognitive_load:germane"),
+        ],
+    )
+    def test_enrich_category_mapping(self, category, expected):
+        tags = enrich_tags([], event_type="CORRECTION", data={"category": category})
+        assert expected in tags
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_tag_taxonomy.py` around lines 26 - 37, The three nearly identical
tests that call enrich_tags with event_type="CORRECTION" and different
data["category"] values should be combined into a single parametrized test:
create a pytest.mark.parametrize on (category, expected_tag) covering
("FACTUAL","cognitive_load:intrinsic"), ("STYLE","cognitive_load:extraneous"),
("PROCESS","cognitive_load:germane") and replace the three functions
(test_enrich_factual_is_intrinsic, test_enrich_style_is_extraneous,
test_enrich_process_is_germane) with one function (e.g.,
test_enrich_cognitive_load_mapping) that calls enrich_tags([],
event_type="CORRECTION", data={"category": category}) and asserts expected_tag
in tags; this keeps coverage while reducing duplication and follows the tests/**
guideline.
…eload The module-level `for _p in ROSCH_PARENTS` overwrote the `import gradata._paths as _p` alias, causing `_load_taxonomy()` to fail silently (hasattr check on a string). Renamed loop var to `_rp`. Co-Authored-By: Gradata <noreply@gradata.ai>
♻️ Duplicate comments (2)
src/gradata/_tag_taxonomy.py (2)
391-415: ⚠️ Potential issue | 🟠 Major — Cognitive-load enrichment still ignores existing `category:*` tags. When `category` is supplied in the input `tags` but omitted from `data`, no `cognitive_load:*` tag is derived. For example, `enrich_tags(["category:FACTUAL"], "CORRECTION", {})` will not add `cognitive_load:intrinsic`.
Proposed fix: fall back to extracting category from tags
```diff
 if event_type == "CORRECTION" and "cognitive_load" not in prefixes:
     cat = data.get("category", "").upper()
+    if not cat:
+        category_tag = next((t for t in enriched if t.startswith("category:")), "")
+        cat = category_tag.split(":", 1)[1].upper() if category_tag else ""
     _COGNITIVE_LOAD_MAP = {
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_tag_taxonomy.py` around lines 391-415: the cognitive-load enrichment uses data.get("category") only, so when category is present as a tag (e.g. "category:FACTUAL") it is ignored; modify the block that computes cat (currently using data.get("category", "").upper()) to fall back to parsing the input tags for a "category:<VALUE>" entry (split or scan the tags list, case-insensitive) when data lacks category, then upper() that value before looking it up in _COGNITIVE_LOAD_MAP and appending the resulting "cognitive_load:<...>" to enriched (references: event_type, prefixes, data, tags/enriched, _COGNITIVE_LOAD_MAP).
12-15: ⚠️ Potential issue | 🟠 Major — Add `from __future__ import annotations` for SDK type-safety compliance. The file uses PEP 604 union syntax (`str | None`, `dict | None`, `set[str]`) at lines 85, 104, 109, 332, 348, but lacks the required future-annotations import. Per coding guidelines, `src/gradata/**/*.py` must include this import.
Proposed fix
```diff
+from __future__ import annotations
+
 import json
 import re
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_tag_taxonomy.py` around lines 12-15: add "from __future__ import annotations" at the top of src/gradata/_tag_taxonomy.py (immediately after any module docstring and before other imports) so PEP 604 union types like "str | None", "dict | None", and "set[str]" used throughout the file are valid; no other code changes needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/gradata/_tag_taxonomy.py`:
- Around line 391-415: The cognitive-load enrichment uses data.get("category")
only, so when category is present as a tag (e.g. "category:FACTUAL") it is
ignored; modify the block that computes cat (currently using
data.get("category", "").upper()) to fallback to parsing the input tags for a
"category:<VALUE>" entry (split or scan the tags list, case-insensitive) when
data lacks category, then upper() that value before looking it up in
_COGNITIVE_LOAD_MAP and appending the resulting "cognitive_load:<...>" to
enriched (references: event_type, prefixes, data, tags/enriched,
_COGNITIVE_LOAD_MAP).
- Around line 12-15: Add the line "from __future__ import annotations" at the
top of src/gradata/_tag_taxonomy.py (immediately after any module docstring and
before other imports) so PEP 604 union types like "str | None", "dict | None",
and "set[str]" used throughout the file are valid; no other code changes needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 856da7ac-0af5-4f4d-99c6-a71aa4a7dbd4
📒 Files selected for processing (1)
src/gradata/_tag_taxonomy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/_tag_taxonomy.py
🔇 Additional comments (7)
src/gradata/_tag_taxonomy.py (7)
97-111: LGTM — Loop variable shadowing is fixed. The loop variables now correctly use `_parent`, `_subs`, `_sub`, and `_rp`, avoiding the previously flagged shadowing of the `_p` module alias. The Rosch hierarchy helpers `parent_category()` and `subordinates_of()` are clean implementations.
71-76: LGTM — New cognitive_load taxonomy. The closed taxonomy for cognitive load types aligns with Sweller's Cognitive Load Theory (intrinsic, extraneous, germane) and is correctly configured as optional.
32-51: LGTM — Category values extended with Rosch parents. The addition of `VOICE` and `QUALITY` as valid category values is consistent with their definition in `ROSCH_HIERARCHY`. Formatting improvements aid readability.
331-333: LGTM — Function signatures are well-typed. The `validate_tags` and `enrich_tags` signatures are clear and appropriately typed. The PEP 604 union syntax concern is addressed in the earlier comment about the missing `from __future__ import annotations`.
Also applies to: 347-349
133-144: LGTM — Formatting improvements. Value sets and dictionaries reformatted to one item per line. This improves readability and produces cleaner diffs for future changes.
Also applies to: 150-162, 168-178, 184-184, 190-190, 196-208, 214-214, 252-254, 363-372, 382-388
228-230: LGTM — Defensive attribute checks. The `hasattr(_p, "BRAIN_DIR")` guards prevent `AttributeError` when the module attribute is not initialized. This is appropriate defensive coding for optional configuration.
Also applies to: 288-288, 296-296
85-92: LGTM — Rosch hierarchy structure is consistent. The subordinate categories map correctly to either core (`_CORE_TAXONOMY`) or domain (`_DOMAIN_CATEGORIES`) values. The explicit allowance for subordinates under multiple parents (e.g., `ACCURACY` under both `PROCESS` and `QUALITY`) is well-documented at lines 82-83.
Summary
Implements all 9 features from the MiroFish expert panel P0-P2 roadmap. Three priority levels executed in parallel worktrees with TDD.
P0 — Tightly Coupled (semantic severity, rule suppression, taxonomy)
- `EmbeddingClient.semantic_similarity()` + `adjust_severity_by_semantics()` — downgrades severity when meaning is preserved despite high edit distance
- `RULE_SUPPRESSION` events emitted at all 4 filter stages in `apply_rules()` (domain_disabled, relevance_threshold, assumption_invalid, conflict)

P1 — Moderately Coupled (Bayesian confidence, implicit approval, intent classifier)
- `beta_posterior()` from `_stats.py` integrated into the confidence pipeline. Alpha/beta parameters on Lesson, Bayesian+FSRS blend that ramps from pure FSRS (< 5 obs) to 70% Bayesian (>= 20 obs). Round-trip persistence in lessons.md
- `implicit_feedback.py` emits `OUTPUT_ACCEPTED` events alongside existing negative signals
- `detection/intent_classifier.py` classifies corrections by intent (factual_correction, compliance, formatting, clarity, conciseness, completeness, tone_shift, preference, unknown)

P2 — Independent (recurring patterns, temporal scope, cognitive load)
- `detect_recurring_patterns()` groups lessons by category and identifies cross-correction patterns (3+ corrections in same category)
- `RuleScope` (temporal_relevance, max_idle_sessions, created_session) + `temporal_decay()` with configurable exponential decay
- `cognitive_load` closed tag (intrinsic/extraneous/germane) with auto-inference from correction category

Stats
- `TestBug7DomainAgnostic` (pass individually, fail in suite due to module-level TAXONOMY pollution — not caused by these changes)
- (`intent_classifier.py`)

Test plan
Generated with Gradata
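The P1.1 confidence blend described in the summary — pure FSRS below 5 observations, 70% Bayesian at 20 or more — can be sketched as below. The linear ramp between those endpoints and all function names are assumptions; only the endpoints and the beta-posterior mean come from the PR description:

```python
def beta_posterior_mean(alpha: float, beta: float) -> float:
    # Mean of a Beta(alpha, beta) posterior over acceptance probability.
    return alpha / (alpha + beta)


def blend_weight(n_obs: int) -> float:
    # Pure FSRS below 5 observations, 70% Bayesian at 20+, linear in between
    # (the linear interpolation is an assumption for this sketch).
    if n_obs < 5:
        return 0.0
    if n_obs >= 20:
        return 0.7
    return 0.7 * (n_obs - 5) / 15


def blended_confidence(fsrs: float, alpha: float, beta: float, n_obs: int) -> float:
    w = blend_weight(n_obs)
    return (1 - w) * fsrs + w * beta_posterior_mean(alpha, beta)


print(blend_weight(4), blend_weight(20))  # prints 0.0 0.7
```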
Greptile Summary
This PR implements all 9 features from the MiroFish P0-P2 roadmap across 22 files, adding semantic severity downgrading, rule suppression tracking, Rosch taxonomy restructuring, Bayesian confidence blending, implicit approval signals, intent classification, recurring pattern detection, temporal scope decay, and cognitive load tagging to the Gradata SDK.
The implementation is well-structured overall — TDD approach with +61 tests, no new dependencies, and all new features extend existing files cleanly. Two actionable bugs were found:
- `apply_rules` (P1): When `apply_rules` rebuilds `RuleScope` to inject a detected task type, it manually copies only 5 of 10 fields, silently dropping `agent_type`, `namespace`, and all 3 new P2.2 temporal fields (`temporal_relevance`, `max_idle_sessions`, `created_session`). A `dataclasses.replace()` call fixes this.
- `emit` in `log_suppression` (P1): The new `log_suppression` function calls `emit` bare, unlike `log_application`, which wraps it in try-except. A bus failure at any of the 4 suppression call sites in `apply_rules` would propagate and abort the entire rule-selection pipeline, violating Rule 4.

Style issues (repeated `import json` aliases in loop body, `_COGNITIVE_LOAD_MAP` as a local dict) are minor and can be addressed as a follow-up.
Confidence Score: 4/5
Safe to merge after fixing the two P1 bugs — scope field loss and unguarded emit in log_suppression.
Large, well-tested PR (+61 tests, 1919 total passing) with clean architecture across 9 features. Two P1 bugs exist: the scope reconstruction regression in apply_rules silently discards P2.2 temporal fields and is triggered whenever user_message is provided without a pre-set task_type, and the unguarded emit in log_suppression can abort the rule pipeline on bus failure. Both are targeted one-line fixes. No security issues, no new dependencies, prior thread concerns about _p shadowing are confirmed resolved.
src/gradata/rules/rule_engine.py (scope reconstruction at line 549) and src/gradata/rules/rule_tracker.py (bare emit in log_suppression at line 105)
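The fix for the unguarded emit can be sketched as follows. The bus API is assumed here (the emitter is passed as a plain callable); only the try-except guard mirroring `log_application` is the point:

```python
import logging

logger = logging.getLogger(__name__)


def log_suppression(emit, rule_id: str, reason: str) -> None:
    # Hedged sketch: wrap the event-bus emit so a bus failure cannot abort
    # rule selection, matching the guard already used by log_application.
    payload = {"event": "RULE_SUPPRESSION", "rule_id": rule_id, "reason": reason}
    try:
        emit(payload)
    except Exception:
        # Per Rule 4, tracking must never break the pipeline; log and continue.
        logger.warning("RULE_SUPPRESSION emit failed for %s", rule_id, exc_info=True)


def broken_bus(_payload: dict) -> None:
    raise RuntimeError("bus down")


log_suppression(broken_bus, "r1", "conflict")  # logs a warning instead of raising
```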
Important Files Changed
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[apply_rules called with RuleScope + user_message] --> B{scope.task_type empty AND user_message set?}
    B -- Yes --> C[detect_task_type from user_message]
    C --> D[RuleScope rebuilt - drops agent_type, namespace, temporal_relevance, max_idle_sessions, created_session]
    B -- No --> E[Use original scope]
    D --> F[Step 1: Eligibility gate PATTERN + RULE only]
    E --> F
    F --> G[Step 1.5: Domain disable check]
    G -- Suppressed --> H[log_suppression reason: domain_disabled]
    G -- Passes --> I[Step 2/3: Scope scoring]
    H --> I
    I -- relevance lt 0.3 --> J[log_suppression reason: relevance_threshold]
    I -- relevance ge 0.3 --> K[Step 3.5: Assumption check]
    J --> K
    K -- Invalid --> L[log_suppression reason: assumption_invalid]
    K -- Valid --> M[Step 4/5: Sort by state x difficulty x relevance x confidence]
    L --> M
    M --> N{graph conflict check}
    N -- Conflict --> O[log_suppression reason: conflict]
    N -- No conflict --> P[Step 6: Build AppliedRule objects]
    O --> P
    P --> Q[Return AppliedRule list]
```

Comments Outside Diff (2)
src/gradata/rules/rule_engine.py, lines 549-556
`RuleScope` fields are silently dropped when scope is enriched with task type
This PR adds five new fields to `RuleScope`: `agent_type`, `namespace`, `temporal_relevance`, `max_idle_sessions`, `created_session`. However, when `apply_rules` reconstructs a frozen scope after detecting a task type, it only copies the original five fields — the new ones are lost.
Any caller that sets, for example, `temporal_relevance="recent"` or `max_idle_sessions=10` and relies on task-type auto-detection will silently get a scope with those fields zeroed out, making the P2 temporal decay feature inoperative for that code path.
src/gradata/rules/rule_engine.py, lines 549-556
When `apply_rules` rebuilds the scope to inject a detected `task_type`, the manual `RuleScope(...)` constructor only copies 5 of the 10 fields. The fields added by P2.2 — `temporal_relevance`, `max_idle_sessions`, `created_session` — plus the pre-existing `agent_type` and `namespace` are silently reset to their defaults.
Any caller that sets up temporal decay (e.g. `max_idle_sessions=5`) and also passes a `user_message` without a pre-set `task_type` will have those settings discarded mid-pipeline.
Use `dataclasses.replace` instead so all fields are preserved automatically. You'll need `import dataclasses` at the top of the file (or `from dataclasses import replace` and call `replace(scope, task_type=detected_tt)`).
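The `dataclasses.replace` approach suggested above can be illustrated with a trimmed-down stand-in. Field names mirror the review, but their types and defaults are assumptions for this sketch:

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class RuleScope:
    # Minimal stand-in for the real frozen RuleScope; defaults are assumed.
    task_type: str = ""
    agent_type: str = ""
    namespace: str = ""
    temporal_relevance: str = ""
    max_idle_sessions: int = 0
    created_session: int = 0


scope = RuleScope(max_idle_sessions=5, namespace="docs")

# replace() copies every field and overrides only task_type, so the P2.2
# temporal fields survive the enrichment step instead of resetting to defaults.
enriched = dataclasses.replace(scope, task_type="refactor")

print(enriched.max_idle_sessions, enriched.namespace)  # prints 5 docs
```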
Reviews (2): Last reviewed commit: "fix: Rosch loop variable _p shadowed _pa..."
Context used:
Rule 1: Never use print() ... (source)