fix: add strict mode to prevent silent fallback degradation during benchmarking by abrichr · Pull Request #154 · OpenAdaptAI/openadapt-evals

abrichr · 2026-03-20T00:46:16Z

Summary

Adds a strict: bool = False parameter to ScrubMiddleware, extract_workflow(), and generate_transcript()
When strict=True, these components raise errors instead of silently falling back to degraded behavior (unscrubbed screenshots, 1:1 transcript-to-step mapping, placeholder entries)
Prevents evaluating/training on a different system than intended (e.g., no PII scrubbing when you think scrubbing is active, or trivial workflow extraction when VLM is failing)
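As a rough illustration of the strict/non-strict split for the scrubbing case, the pattern might look like the sketch below. The helper names load_optional and scrub_or_fallback, the logger name, and the module name passed in are all hypothetical; this is not the actual ScrubMiddleware code, only a minimal sketch of "raise in strict mode, warn and fall back otherwise".

```python
import importlib
import logging

logger = logging.getLogger("strict_sketch")  # hypothetical logger name


def load_optional(module_name: str, strict: bool = False):
    """Import an optional dependency; return None if missing and not strict."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        if strict:
            # Strict mode: surface the missing dependency to the caller.
            raise
        logger.warning("%s is not installed; continuing without it", module_name)
        return None


def scrub_or_fallback(screenshot, scrub_fn, strict: bool = False):
    """Apply scrub_fn; on failure re-raise (strict) or return the input unscrubbed."""
    try:
        return scrub_fn(screenshot)
    except Exception:
        if strict:
            # Strict mode: never hand back an unscrubbed screenshot silently.
            raise
        logger.warning("scrubbing failed; returning unscrubbed screenshot")
        return screenshot
```

With strict=False both helpers degrade quietly apart from a log line, which is the behavior the PR describes as risky during benchmarking; with strict=True the same failures stop the run.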

Changes

scrub_middleware.py: ScrubMiddleware(adapter, strict=True) raises ImportError if openadapt-privacy is missing, re-raises on scrubbing failure
extract.py: extract_workflow(..., strict=True) raises ValueError on VLM parse failure, re-raises on VLM call failure (instead of falling back to 1:1 mapping)
transcript.py: generate_transcript(..., strict=True) re-raises on VLM call failure, raises ValueError if parser returns only placeholder entries (instead of silently continuing)
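The extract.py behavior described above could be sketched roughly as follows. This is not the real extract_workflow(): the function name extract_steps_sketch, the injected call_vlm callable, and the assumption that the VLM returns a JSON list of step names are all invented for illustration.

```python
import json


def extract_steps_sketch(transcript, call_vlm, strict=False):
    """Turn transcript lines into workflow steps via a VLM call.

    transcript: list of str, one entry per recorded action.
    call_vlm:   hypothetical callable returning a JSON string (a list of steps).
    """
    try:
        raw = call_vlm(transcript)
    except Exception:
        if strict:
            raise  # strict: re-raise the VLM call failure unchanged
        return list(transcript)  # non-strict: 1:1 transcript-to-step mapping

    try:
        steps = json.loads(raw)  # JSONDecodeError is a ValueError subclass
        if not isinstance(steps, list):
            raise ValueError("expected a JSON list of steps")
    except ValueError as exc:
        if strict:
            raise ValueError(f"VLM response could not be parsed: {exc}") from exc
        return list(transcript)  # non-strict: same 1:1 fallback
    return steps
```

The point of the strict branch is that a 1:1 fallback still returns a plausible-looking result, so without an exception a failing VLM would be indistinguishable from trivial extraction in downstream metrics.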

Test plan

Verify ScrubMiddleware(adapter, strict=True) raises ImportError when openadapt-privacy is not installed
Verify ScrubMiddleware(adapter, strict=False) still falls back silently (no behavior change for existing callers)
Verify extract_workflow(transcript, strict=True) raises ValueError when VLM response cannot be parsed
Verify generate_transcript(session, strict=True) raises on VLM call failure
Verify all three functions work identically to before when strict=False (default)
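One way the transcript-related checks in this plan might be automated is sketched below with unittest. It runs against a small stand-in, generate_transcript_sketch, rather than the real generate_transcript(); the stand-in, the PLACEHOLDER text, and the call_vlm parameter are assumptions made only so the tests are self-contained.

```python
import unittest

PLACEHOLDER = "[no description]"  # hypothetical placeholder entry


def generate_transcript_sketch(frames, call_vlm, strict=False):
    """Stand-in for generate_transcript(); describes each frame via call_vlm."""
    try:
        entries = [call_vlm(frame) for frame in frames]
    except Exception:
        if strict:
            raise  # strict: re-raise the VLM call failure
        entries = [PLACEHOLDER for _ in frames]  # non-strict: placeholders
    if strict and entries and all(e == PLACEHOLDER for e in entries):
        raise ValueError("parser returned only placeholder entries")
    return entries


def failing_vlm(frame):
    raise RuntimeError("VLM call failed")


class StrictTranscriptTests(unittest.TestCase):
    def test_strict_reraises_on_vlm_failure(self):
        with self.assertRaises(RuntimeError):
            generate_transcript_sketch(["f1"], failing_vlm, strict=True)

    def test_default_falls_back_to_placeholders(self):
        result = generate_transcript_sketch(["f1", "f2"], failing_vlm)
        self.assertEqual(result, [PLACEHOLDER, PLACEHOLDER])

    def test_strict_rejects_placeholder_only_output(self):
        with self.assertRaises(ValueError):
            generate_transcript_sketch(["f1"], lambda f: PLACEHOLDER, strict=True)

    def test_strict_matches_default_on_success(self):
        describe = lambda f: f"user clicked {f}"
        self.assertEqual(
            generate_transcript_sketch(["a"], describe, strict=True),
            generate_transcript_sketch(["a"], describe),
        )
```

Saved as a module, these can be run with python -m unittest; the same shape would apply to the real functions by swapping the stand-in for the actual import and stubbing the VLM call.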

🤖 Generated with Claude Code

…nchmarking

When strict=True, components that previously degraded silently now raise errors instead, ensuring benchmarking/training runs use the intended system configuration (e.g., PII scrubbing active, VLM extraction working).

- ScrubMiddleware: raise ImportError if openadapt-privacy missing, re-raise on scrubbing failure
- extract_workflow(): raise ValueError on VLM parse failure, re-raise on VLM call failure
- generate_transcript(): re-raise on VLM call failure, raise ValueError if parser returns only placeholders

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

abrichr force-pushed the fix/strict-mode-fallbacks branch from ec2f193 to 77039e4 Compare March 20, 2026 00:54

abrichr merged commit 62934ab into main Mar 20, 2026
1 check failed

abrichr merged 1 commit into main from fix/strict-mode-fallbacks
