feat(inference): add benchmark performance guardrails #219
TobiBu merged 4 commits into feat/posterior-predictive-outputs
Conversation
Pull request overview
Adds an inference “performance guardrails” utility to evaluate benchmark results against configurable runtime/objective thresholds, with tests and documentation so regressions can be detected early in CI or local benchmarking workflows.
Changes:
- Introduces `rubix.inference.performance_guardrails` with threshold dataclasses and pass/fail check helpers for optimization and VI benchmarks (a usage sketch follows this list).
- Exports the guardrail APIs from `rubix.inference` for public use and adds Sphinx docs coverage.
- Adds unit tests covering pass/fail scenarios for both optimization and VI guardrail checks.
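For orientation, a minimal usage sketch. Only the names `ObjectiveThresholds` and `check_ifu_optimization_guardrails` appear in this PR; the metrics shape, the call signature, and the result's `passed` attribute are assumptions, not the merged API.

```python
# Hypothetical sketch -- the metrics dict, call signature, and result shape
# are assumptions; only the imported names are confirmed by this PR.
from rubix.inference import (
    ObjectiveThresholds,
    check_ifu_optimization_guardrails,
)

# Example benchmark output; field names are illustrative.
metrics = {"final_loss": 4.2e-4, "best_loss": 3.1e-4}

# Fail the run if the best loss exceeds 1e-3.
thresholds = ObjectiveThresholds(max_best_loss=1e-3)

report = check_ifu_optimization_guardrails(metrics, thresholds)
if not report.passed:  # `passed` is referenced in the review comment below
    raise RuntimeError(f"Performance guardrails failed: {report}")
```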
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `rubix/inference/performance_guardrails.py` | New guardrail checking utilities and threshold/result dataclasses. |
| `rubix/inference/__init__.py` | Re-exports the guardrail APIs at the package level. |
| `tests/test_inference_performance_guardrails.py` | Unit tests for guardrail pass/fail behavior. |
| `docs/rubix.inference.rst` | Adds the new module to the Sphinx API docs. |
| `docs/inference_workflows.rst` | Documents the intended usage pattern for guardrails in workflows. |
```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ObjectiveThresholds:
    """Thresholds for objective quality checks."""

    max_final_loss: Optional[float] = None
    max_best_loss: Optional[float] = None
    max_final_objective: Optional[float] = None
    max_best_objective: Optional[float] = None
```
ObjectiveThresholds mixes optimization loss thresholds (max_final_loss/max_best_loss) with VI objective thresholds (max_final_objective/max_best_objective), but each guardrail checker only evaluates a subset. As written, setting the “wrong” fields for a given checker will be silently ignored and can incorrectly report passed=True. Consider splitting this into two threshold dataclasses (optimization vs VI), or add explicit validation in check_ifu_optimization_guardrails/check_vi_guardrails to raise a ValueError when irrelevant threshold fields are set.
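One possible shape of the validation option, as a sketch: the threshold field names follow the diff above, but the checker's real signature and the `metrics` argument are assumptions.

```python
def check_ifu_optimization_guardrails(metrics, thresholds: ObjectiveThresholds):
    # Reject VI-only thresholds instead of silently ignoring them.
    if (
        thresholds.max_final_objective is not None
        or thresholds.max_best_objective is not None
    ):
        raise ValueError(
            "max_final_objective/max_best_objective are VI thresholds and are "
            "not evaluated here; use check_vi_guardrails or leave them unset."
        )
    ...  # existing loss checks against max_final_loss / max_best_loss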
@copilot apply changes based on this feedback
feat(inference): add synthetic science recipe workflow
Agent-Logs-Url: https://github.com/AstroAI-Lab/rubix/sessions/13c7f31b-922d-4a76-8508-d6d549378715
Co-authored-by: TobiBu <7574273+TobiBu@users.noreply.github.com>
@copilot can you try again to apply changes based on the feedback in this PR?
The changes are applied in the commit above. This removes the silent-ignore issue: passing the wrong threshold type to a checker now results in a type mismatch rather than an incorrect pass/fail report. Tests and docs have been updated accordingly.
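A sketch of how the described split could look; the commit's actual class names are not shown in this thread, so these are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Separate frozen dataclasses per checker: passing VI thresholds to the
# optimization checker becomes a visible type mismatch rather than a no-op.
@dataclass(frozen=True)
class OptimizationThresholds:
    max_final_loss: Optional[float] = None
    max_best_loss: Optional[float] = None

@dataclass(frozen=True)
class VIThresholds:
    max_final_objective: Optional[float] = None
    max_best_objective: Optional[float] = None

def check_vi_guardrails(metrics: dict, thresholds: VIThresholds):
    ...

# A static checker (e.g. mypy) now flags this misuse:
# check_vi_guardrails(metrics, OptimizationThresholds(max_best_loss=1e-3))
```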
Merged commit 30e4be3 into feat/posterior-predictive-outputs