Skip to content

fix(infra): switch eval alarm to simple metric (no SEARCH)#141

Merged
cipher813 merged 1 commit into
mainfrom
fix/eval-alarm-no-search
May 3, 2026
Merged

fix(infra): switch eval alarm to simple metric (no SEARCH)#141
cipher813 merged 1 commit into
mainfrom
fix/eval-alarm-no-search

Conversation

@cipher813
Copy link
Copy Markdown
Owner

Summary

Caught at deploy — PR 4c's alarm script failed with `ValidationError: SEARCH is not supported on Metric Alarms`. CloudWatch dashboards support SEARCH; alarms don't. Switched to a simple metric alarm against the new `agent_quality_score_4w_mean_min` floor metric (paired with research PR #96).

```
--namespace AlphaEngine/Eval
--metric-name agent_quality_score_4w_mean_min
--statistic Minimum --period 86400 --evaluation-periods 1
--threshold 3.0 --comparison-operator LessThanThreshold
```

Operator workflow unchanged

Alarm fires → operator clicks dashboard quality-trend page to identify which combo triggered. The SEARCH design wouldn't have surfaced per-combo identity in the alarm body anyway.

Pairs with

`alpha-engine-research` PR #96 — emits the floor metric this alarm reads.

Deploy after merge

`./infrastructure/setup_eval_quality_alarm.sh` (idempotent, overwrites in place).

Test plan

  • `bash -n setup_eval_quality_alarm.sh` syntax-clean.
  • `pytest tests/ -q` → 433 passed (no test logic affected).
  • After research PR fix(universe_returns): use NYSE trading-day arithmetic for forward windows #96 merges + Lambda redeploys + this PR merges + alarm script re-runs: `aws cloudwatch describe-alarms --alarm-names alpha-engine-eval-quality-regression` returns `StateValue: INSUFFICIENT_DATA` (no firing — first emission ~Sat 5/9).

🤖 Generated with Claude Code

PR 4c's setup_eval_quality_alarm.sh used a SEARCH expression to
reduce across (judged_agent_id, criterion, judge_model) combos at
alarm-evaluation time. Caught at deploy:

  ValidationError: SEARCH is not supported on Metric Alarms.

CloudWatch dashboards support SEARCH; alarms don't. Switched to a
simple metric alarm against the new agent_quality_score_4w_mean_min
floor metric that the rolling-mean Lambda emits (alpha-engine-research
PR #96).

  --namespace AlphaEngine/Eval
  --metric-name agent_quality_score_4w_mean_min
  --statistic Minimum --period 86400 --evaluation-periods 1
  --threshold 3.0 --comparison-operator LessThanThreshold

Operator workflow unchanged: alarm fires → operator clicks dashboard
quality-trend page to identify which combo triggered. The SEARCH
design wouldn't have surfaced per-combo identity in the alarm body
either.

bash -n syntax-clean. Tests 433 still pass (no test logic affected).

Deploy after merge: ./infrastructure/setup_eval_quality_alarm.sh
(idempotent — overwrites the alarm in place).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit b53f66e into main May 3, 2026
1 check passed
@cipher813 cipher813 deleted the fix/eval-alarm-no-search branch May 3, 2026 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant