
fix: add strict mode to prevent silent fallback degradation during benchmarking#154

Merged
abrichr merged 1 commit into main from fix/strict-mode-fallbacks on Mar 20, 2026

@abrichr (Member) commented Mar 20, 2026

Summary

  • Adds a strict: bool = False parameter to ScrubMiddleware, extract_workflow(), and generate_transcript()
  • When strict=True, these components raise errors instead of silently falling back to degraded behavior (unscrubbed screenshots, 1:1 transcript-to-step mapping, placeholder entries)
  • Prevents evaluating or training on a different system than intended (e.g., no PII scrubbing when you think scrubbing is active, or trivial 1:1 workflow extraction when the VLM is failing)
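The core pattern behind the strict flag can be sketched as a small wrapper. This is a minimal illustration, not code from the PR: run_with_fallback, primary, fallback, and the warning log emitted on fallback are all hypothetical names and choices made for this sketch.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def run_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    *,
    strict: bool = False,
    what: str = "operation",
) -> T:
    """Run `primary`; on failure either re-raise (strict) or degrade to `fallback`.

    Hypothetical helper for illustration only; not part of the PR.
    """
    try:
        return primary()
    except Exception:
        if strict:
            # Strict mode: surface the original failure instead of degrading.
            raise
        # Default mode: keep the degraded behavior, but leave a trace in the
        # logs (this logging is an addition of the sketch, not the PR).
        logger.warning("%s failed; using degraded fallback", what, exc_info=True)
        return fallback()
```

With strict=False the wrapper keeps the degrade-and-continue behavior; with strict=True the original exception propagates unchanged, so a misconfigured benchmark run fails loudly instead of quietly measuring the fallback path.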

Changes

  • scrub_middleware.py: ScrubMiddleware(adapter, strict=True) raises ImportError if openadapt-privacy is missing, re-raises on scrubbing failure
  • extract.py: extract_workflow(..., strict=True) raises ValueError on VLM parse failure, re-raises on VLM call failure (instead of falling back to 1:1 mapping)
  • transcript.py: generate_transcript(..., strict=True) re-raises on VLM call failure, raises ValueError if the parser returns only placeholder entries (instead of silently continuing)
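A hypothetical sketch of the ScrubMiddleware behavior described above, assuming the privacy package is importable as openadapt_privacy and that scrubbing can be modeled as a bytes-to-bytes function. StrictScrubber, scrub_fn, and privacy_module are illustrative names, not the real ScrubMiddleware API, which wraps an adapter.

```python
import importlib.util
from typing import Callable


class StrictScrubber:
    """Hypothetical stand-in for ScrubMiddleware's strict/non-strict behavior.

    `privacy_module` and `scrub_fn` are assumptions made for illustration.
    """

    def __init__(
        self,
        scrub_fn: Callable[[bytes], bytes],
        strict: bool = False,
        privacy_module: str = "openadapt_privacy",  # assumed import name
    ) -> None:
        self.scrub_fn = scrub_fn
        self.strict = strict
        # find_spec returns None when a top-level module cannot be found.
        self.available = importlib.util.find_spec(privacy_module) is not None
        if not self.available and strict:
            raise ImportError(
                f"{privacy_module!r} is not installed; refusing to run unscrubbed"
            )

    def scrub(self, screenshot: bytes) -> bytes:
        if not self.available:
            # Non-strict fallback: pass the screenshot through unscrubbed.
            return screenshot
        try:
            return self.scrub_fn(screenshot)
        except Exception:
            if self.strict:
                # Strict mode: re-raise the scrubbing failure.
                raise
            # Non-strict fallback: return the original, unscrubbed bytes.
            return screenshot
```

Checking for the module at construction time means a strict run fails before any screenshot is processed, rather than partway through a benchmark.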

Test plan

  • Verify ScrubMiddleware(adapter, strict=True) raises ImportError when openadapt-privacy is not installed
  • Verify ScrubMiddleware(adapter, strict=False) still falls back silently (no behavior change for existing callers)
  • Verify extract_workflow(transcript, strict=True) raises ValueError when VLM response cannot be parsed
  • Verify generate_transcript(session, strict=True) raises on VLM call failure
  • Verify all three functions behave exactly as before when strict=False (the default)
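The extract_workflow() checks in the test plan could look roughly like the following. It assumes, purely for illustration, that the VLM returns a JSON list of step strings; parse_workflow_steps and the test functions are hypothetical and do not reflect the actual extract.py parser.

```python
import json
from typing import List


def parse_workflow_steps(
    vlm_response: str, transcript: List[str], strict: bool = False
) -> List[str]:
    """Hypothetical parser: expects a JSON list of step strings from the VLM."""
    try:
        steps = json.loads(vlm_response)
        if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
            raise ValueError("VLM response is not a JSON list of strings")
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        if strict:
            raise ValueError(f"could not parse VLM response: {exc}") from exc
        # Non-strict fallback: one workflow step per transcript entry (1:1).
        return list(transcript)
    return steps


def test_strict_raises_on_unparseable_response() -> None:
    try:
        parse_workflow_steps("not json", ["a", "b"], strict=True)
    except ValueError:
        return
    raise AssertionError("expected ValueError in strict mode")


def test_default_falls_back_to_one_to_one_mapping() -> None:
    # strict=False (the default) keeps the old silent 1:1 mapping.
    assert parse_workflow_steps("not json", ["a", "b"]) == ["a", "b"]
```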

🤖 Generated with Claude Code

…nchmarking

When strict=True, components that previously degraded silently now raise
errors instead, ensuring benchmarking/training runs use the intended
system configuration (e.g., PII scrubbing active, VLM extraction working).

- ScrubMiddleware: raise ImportError if openadapt-privacy missing,
  re-raise on scrubbing failure
- extract_workflow(): raise ValueError on VLM parse failure,
  re-raise on VLM call failure
- generate_transcript(): re-raise on VLM call failure,
  raise ValueError if parser returns only placeholders
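The placeholder rule for generate_transcript() can be sketched as a post-parse check. TranscriptEntry, its is_placeholder field, and check_transcript are hypothetical; the real transcript.py data structures are not shown in this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptEntry:
    """Hypothetical transcript entry; not the real transcript.py type."""

    timestamp: float
    text: str
    is_placeholder: bool = False


def check_transcript(
    entries: List[TranscriptEntry], strict: bool = False
) -> List[TranscriptEntry]:
    """Raise in strict mode if every parsed entry is a placeholder.

    An empty list is passed through unchanged; that is a choice of this
    sketch, since the PR only describes the all-placeholder case.
    """
    if strict and entries and all(e.is_placeholder for e in entries):
        raise ValueError("transcript parser returned only placeholder entries")
    return entries
```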

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abrichr force-pushed the fix/strict-mode-fallbacks branch from ec2f193 to 77039e4 on March 20, 2026 00:54
abrichr merged commit 62934ab into main on Mar 20, 2026
1 check failed