fix(ragas): detect and skip instructor#1658 silent-zero scores from Google provider by decko · Pull Request #180 · decko/raki

decko · 2026-04-24T01:19:53Z

Summary

Detects when the instructor library silently returns value=0.0 / reason=None (instead of raising a ValidationError) for Google/Gemini structured output (upstream bug instructor#1658)
Adds InstructorSilentZeroError and is_instructor_silent_zero() in src/raki/metrics/ragas/adapter.py; integrates detection in all four Ragas metric scorers (faithfulness, precision, recall, relevancy)
Returns score=None (N/A) when every session in a run hits a silent-zero failure, preventing misleading 0.0 averages
Removes top_p from Google LLM model_args (mirrors existing Anthropic fix) to prevent API rejection
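The detection rule in the summary can be sketched as follows. This is a minimal illustration, not the code in `adapter.py`. `ScoreResult` is a hypothetical stand-in for the Pydantic model instructor returns, and `check_result` is a hypothetical per-session guard. Treating an empty-string reason as "no reason" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


class InstructorSilentZeroError(RuntimeError):
    """Raised when instructor returns default values instead of a parse error."""


@dataclass
class ScoreResult:
    # Hypothetical stand-in for the Pydantic model instructor returns.
    value: float
    reason: Optional[str] = None


def is_instructor_silent_zero(result: ScoreResult, provider: str) -> bool:
    """Return True only for the instructor#1658 signature on the Google provider."""
    if provider != "google":
        return False
    # A legitimate 0.0 carries a reason; the silent zero has none.
    # Assumption: an empty-string reason is treated the same as None.
    return result.value == 0.0 and not result.reason


def check_result(result: ScoreResult, provider: str) -> float:
    # Hypothetical per-session guard: report the silent zero as an error, not a score.
    if is_instructor_silent_zero(result, provider):
        raise InstructorSilentZeroError(
            f"instructor#1658: provider={provider!r} returned value=0.0 with no reason"
        )
    return result.value
```

Restricting the check to `provider == "google"` keeps a reason-less 0.0 from other providers on the normal path, which matches the first acceptance criterion.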

Acceptance Criteria

Silent-zero detection fires only for provider=="google" with value==0.0 AND no reason
Legitimate 0.0 scores (with a reason) are preserved
score=None returned when all sessions hit silent-zero failures
top_p removed from Google LLM path
20 new unit tests added and passing
All 926 tests pass (4 slow/LLM integration tests deselected per convention)
No pre-commit hook failures (ruff check, ruff format, ty check)
Merge conflict in tests/test_cli.py resolved (kept ruff-formatted multi-line signature)
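The rule "score=None when all sessions hit silent-zero failures" can be sketched as a small aggregation step. `aggregate_scores` is a hypothetical helper, not the project's code. Here a `None` entry marks a session skipped because of the silent zero. The real scorers track a `silent_zero_failures` counter instead.

```python
from statistics import mean
from typing import Optional, Sequence


def aggregate_scores(session_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average valid session scores; return None (N/A) if every session was a silent zero.

    Hypothetical helper: None entries mark sessions skipped because of
    instructor#1658 rather than scored as 0.0.
    """
    valid = [s for s in session_scores if s is not None]
    if session_scores and not valid:
        # Every session hit the silent zero: report N/A, not a misleading 0.0.
        return None
    if not valid:
        # No sessions at all: mirror the pre-existing 0.0 fallback noted in review.
        return 0.0
    # Skipped sessions are excluded, so they cannot drag the average down.
    return mean(valid)
```

Legitimate zeros stay in the average, so `[0.0, 0.0]` still aggregates to `0.0`.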

Review Results

Verification ✅

926 passed, 4 deselected. All 20 new tests pass. ruff check, ruff format, and ty check are clean. With raki validate --deep, operational metrics compute successfully. The 12 warning[unused-ignore-comment] diagnostics from ty all pre-exist on the base branch.

Code Review 🔴 → ✅ (rework applied)

CRITICAL (fixed): Unresolved Git conflict markers in tests/test_cli.py at lines 1623–1629 caused a SyntaxError. Resolved by keeping the ruff-formatted multi-line method signature and removing the conflict markers.

Minor (acknowledged, non-blocking): there is no partial-success test (some sessions succeed, some hit a silent zero). There is also a pre-existing 0.0 fallback when scores is empty and no recognized failure type was tracked; this was not introduced by this PR.

Refs

Refs #169

Assisted-by: Claude Opus 4.6 (1M context) noreply@anthropic.com
Assigned-by: decko

…oogle provider

When the instructor library fails to parse Google/Gemini structured output, it silently returns a Pydantic model with default values (value=0.0, reason=None) instead of raising a ValidationError. This causes Ragas metrics to record a misleading 0.0 for affected sessions, pulling the average score down without any indication of failure.

Fix:
- Add InstructorSilentZeroError (RuntimeError subclass) to adapter.py so the per-session error handler can distinguish this failure from other errors.
- Add is_instructor_silent_zero(result, provider) detection function that fires only for provider="google" + result.value==0.0 + no reason.
- Apply detection in all four Ragas metrics (faithfulness, precision, recall, relevancy): raise InstructorSilentZeroError when detected, track silent_zero_failures, and return score=None when ALL failures are silent zeros.
- Pop top_p from Google LLM model_args after llm_factory() call, mirroring the existing Anthropic fix. Google also rejects temperature + top_p together, which is one path that triggers the silent-zero bug.
- Add ty: ignore[unresolved-attribute] annotations for model_args.pop() calls (the attribute exists at runtime but is not in the type stub).
- Add towncrier fragment changes/169.fix.

Closes #169

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Assigned-by: soda-orchestrator
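The top_p removal amounts to dropping one key from the LLM wrapper's model_args mapping after construction. In this sketch `FakeLLM` is a hypothetical stand-in for whatever `llm_factory()` returns. The commit message supports only one detail of it: the object carries a mutable `model_args` attribute that the type stub does not declare. `drop_conflicting_sampling_args` and `PROVIDERS_WITHOUT_TOP_P` are also hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeLLM:
    # Hypothetical stand-in for the object returned by llm_factory().
    provider: str
    model_args: dict[str, Any] = field(default_factory=dict)


# Providers that reject temperature and top_p sent together, per the commit
# message (Anthropic was already handled; this PR adds Google).
PROVIDERS_WITHOUT_TOP_P = {"anthropic", "google"}


def drop_conflicting_sampling_args(llm: FakeLLM) -> FakeLLM:
    """Remove top_p for providers that refuse it alongside temperature."""
    if llm.provider in PROVIDERS_WITHOUT_TOP_P:
        # pop() with a default is a no-op when top_p was never set.
        llm.model_args.pop("top_p", None)
    return llm
```

Popping with a default makes the step safe to run unconditionally after construction, whether or not the factory populated `top_p`.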

Removed stash conflict markers (<<<, ===, >>>) and kept the ruff-formatted multi-line signature for test_docs_path_within_cwd_but_outside_manifest_parent_accepted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When some sessions score normally and others hit the instructor#1658 silent zero bug, the failure count and warning are now included in the metric details. Previously silent zero failures were tracked but not surfaced when valid scores also existed. Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Assigned-by: decko
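Surfacing partial failures can be sketched as attaching the count and a warning to the metric details whenever any session was skipped. The `MetricResult` shape, the `build_result` helper, and the `silent_zero_failures` / `warning` detail keys are hypothetical. The commit states only that the failure count and a warning now appear in the details when valid scores also exist. For brevity, this sketch returns None whenever no valid score exists and ignores the pre-existing 0.0 fallback.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Optional


@dataclass
class MetricResult:
    # Hypothetical result shape; the real metric result may differ.
    score: Optional[float]
    details: dict[str, Any] = field(default_factory=dict)


def build_result(valid_scores: list[float], silent_zero_failures: int) -> MetricResult:
    details: dict[str, Any] = {}
    if silent_zero_failures:
        # Report skipped sessions even when other sessions scored normally.
        details["silent_zero_failures"] = silent_zero_failures
        details["warning"] = (
            f"{silent_zero_failures} session(s) skipped: instructor#1658 silent zero"
        )
    # Average only the sessions that produced a real score; N/A if none did.
    score = mean(valid_scores) if valid_scores else None
    return MetricResult(score=score, details=details)
```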

decko and others added 2 commits April 23, 2026 22:07

decko added the ai-assisted Implemented with AI assistance label Apr 24, 2026

decko merged commit f73128d into main Apr 24, 2026
4 checks passed

decko deleted the soda/169 branch April 24, 2026 01:48

decko mentioned this pull request Apr 24, 2026

refactor: extract shared error handling from 4 Ragas metric files #182

Closed

decko merged 3 commits into main from soda/169

decko commented Apr 24, 2026
