Summary
In a future major version, remove EvaluationResponse.is_safe, top-level EvaluationResponse.confidence, and per-control EvaluatorResult.confidence.
Evaluation responses should expose the actual evaluation artifacts: matches, errors, non_matches, and reason.
Enforcement code should derive allow/block/steer behavior directly from matches, errors, control actions, and on_evaluation_error, rather than reading a precomputed boolean or confidence score.
Motivation
Agent Control evaluation is fundamentally about control outcomes:
which controls matched
which controls failed
which controls did not match
which actions are attached to matched controls (deny, steer, observe)
is_safe is derived state, not source-of-truth state. It compresses control actions, evaluator errors, fail-open/fail-closed policy, and SDK/server merge behavior into one boolean.
Top-level confidence is not a clear confidence concept. The current value is a mixture of:
an engine-side successful/evaluated ratio with special deny/error cases
min(local, server) when merging SDK-local and server evaluations
Per-control EvaluatorResult.confidence is also not consistently meaningful. Most deterministic evaluators return fixed 1.0 / 0.0 values, while classifier-style evaluators may use it as a score. That creates a shared field whose semantics vary by evaluator.
Keeping these confidence fields risks misleading API consumers into treating them as calibrated probabilities.
Removing these fields forces clients to reason from explicit evaluation results instead of ambiguous summary numbers.
Current behavior
models/src/agent_control_models/evaluation.py defines:
EvaluationResponse.is_safe: bool
EvaluationResponse.confidence: float
models/src/agent_control_models/controls.py defines:
EvaluatorResult.confidence: float
The engine computes is_safe from matched deny/steer controls and deny-control errors in engine/src/agent_control_engine/core.py.
The engine computes top-level confidence in engine/src/agent_control_engine/core.py using successful/evaluated counts and special deny/error cases.
Composite condition confidence is derived from child evaluator confidence in engine/src/agent_control_engine/core.py, even though many evaluator confidence values are fixed or non-calibrated.
The Python SDK currently uses is_safe in several enforcement paths:
sdks/python/src/agent_control/evaluation.py
sdks/python/src/agent_control/integrations/_core.py
sdks/python/src/agent_control/control_decorators.py
The SDK also exposes EvaluationResult.is_confident(...), logs confidence, and merges local/server confidence with min(...).
Observability events include avg_confidence, so this change affects event schemas and stats too.
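As a rough sketch, today's top-level number behaves approximately like the following (hypothetical function names; the real logic in engine/src/agent_control_engine/core.py has more special cases):

```python
# Hypothetical reconstruction of today's top-level confidence: an
# evaluation-success ratio with special deny/error cases, min-merged
# across SDK-local and server evaluations. Not a calibrated probability.

def top_level_confidence(evaluated: int, successful: int, deny_error: bool) -> float:
    if evaluated == 0:
        return 1.0                    # nothing evaluated, nothing failed
    if deny_error:
        return 0.0                    # special case: a deny control errored
    return successful / evaluated     # mostly a success ratio

def merged_confidence(local: float, server: float) -> float:
    return min(local, server)         # SDK-local/server merge behavior
```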
Expected behavior
The following fields are removed:
EvaluationResponse.is_safe
EvaluationResponse.confidence
EvaluatorResult.confidence
A clean evaluation response then carries only the evaluation artifacts, for example: { "reason": null, "matches": [], "errors": [], "non_matches": [] }
Each ControlMatch.result should describe whether that specific evaluator matched, failed, and what metadata/message it returned, without a required confidence score.
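As a non-normative sketch, a confidence-free per-control result could carry fields like these (field names are illustrative, not a final schema):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class EvaluatorOutcome:
    # Illustrative stand-in for ControlMatch.result: whether this evaluator
    # matched, whether it failed, and what it returned -- no confidence field.
    matched: bool
    error: Optional[str] = None       # set when the evaluator itself failed
    message: Optional[str] = None     # human-readable explanation
    metadata: dict[str, Any] = field(default_factory=dict)  # evaluator-specific payload
```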
Consumers that need an allow/block decision should derive it from:
matches
errors
on_evaluation_error from issue "Define control-level semantics for evaluator failures and timeouts" (#199)
Consumers that need evaluator-specific scores should carry those scores in evaluator-specific metadata with evaluator-specific naming, not in a universal confidence field.
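For illustration, a consumer could derive an allow/block/steer decision from the artifacts alone; the attribute names below (control.action, control.on_evaluation_error) are assumptions, not settled API:

```python
# Illustrative consumer-side derivation: no is_safe, no confidence.
def decide(response) -> str:
    # Any matched deny control blocks outright.
    if any(m.control.action == "deny" for m in response.matches):
        return "block"
    # Evaluator failures block only for fail-closed controls,
    # per each control's on_evaluation_error policy.
    if any(e.control.on_evaluation_error == "block" for e in response.errors):
        return "block"
    # A matched steer control redirects rather than blocks.
    if any(m.control.action == "steer" for m in response.matches):
        return "steer"
    return "allow"   # observe matches are logged elsewhere, never block
```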
Reproduction (if bug)
Read the engine/src/agent_control_engine/core.py confidence calculation.
Observe that top-level confidence is not model confidence; it is mostly an evaluation-success ratio with special cases.
Inspect built-in deterministic evaluators and observe that most EvaluatorResult.confidence values are fixed values rather than calibrated scores.
Inspect SDK enforcement paths and observe that is_safe is used as a convenience decision gate even though the underlying response already contains matches, errors, and actions.
Proposed solution (optional)
Remove is_safe and confidence from EvaluationResponse and EvaluationResult in the next major version.
Remove confidence from EvaluatorResult and all ControlMatch.result payloads.
Remove composite-confidence calculation from the engine.
Remove EvaluationResult.__bool__ or redefine it only after a separate explicit design decision; do not silently preserve bool(result) as an alias for removed is_safe.
Remove EvaluationResult.is_confident(...).
Update all evaluator implementations to stop returning confidence.
For evaluator-specific scores that remain useful, move them into evaluator metadata with explicit names, such as:
score
risk_score
classifier_score
provider-specific raw score fields
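For example, a classifier-style evaluator might report its score in metadata like this (a hypothetical payload, not a defined schema):

```python
# Hypothetical evaluator result: the score keeps an evaluator-specific
# name inside metadata instead of a shared, universal confidence field.
result = {
    "matched": True,
    "message": "prompt-injection classifier flagged the input",
    "metadata": {
        "classifier_score": 0.93,                               # this evaluator's own scale
        "provider_raw": {"label": "injection", "score": 0.93},  # provider-specific raw fields
    },
}
```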
Update Python SDK enforcement to derive behavior directly:
blocking deny match -> raise ControlViolationError
steer match -> raise ControlSteerError
non-blocking fail-open error -> keep visible in errors, do not block
observe matches -> log/emit observability only
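A minimal sketch of that enforcement loop, assuming hypothetical attribute names (control.action, control.blocking, control.on_evaluation_error) and placeholder exception classes standing in for the SDK's real ones:

```python
import logging

logger = logging.getLogger("agent_control")

class ControlViolationError(Exception):   # placeholder; the SDK defines the real one
    pass

class ControlSteerError(Exception):       # placeholder; the SDK defines the real one
    pass

def enforce(response) -> None:
    # Derive behavior directly from matches, errors, and control actions.
    for match in response.matches:
        control = match.control
        if control.action == "deny" and getattr(control, "blocking", True):
            raise ControlViolationError(match)             # blocking deny match
        if control.action == "steer":
            raise ControlSteerError(match)                 # steer match
        if control.action == "observe":
            logger.info("observe match: %s", control)      # observability only
    for err in response.errors:
        if err.control.on_evaluation_error == "block":
            raise ControlViolationError(err)               # fail-closed error
        logger.warning("fail-open evaluator error: %s", err)  # visible, not blocking
```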
Update local/server merge to merge matches, errors, and non_matches, then derive enforcement from the merged artifacts rather than computing local.is_safe and server.is_safe.
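A sketch of such a merge, assuming EvaluationResponse accepts the artifact fields as keywords (the import path below mirrors the models layout above and is an assumption):

```python
# Hypothetical merge of SDK-local and server evaluations: concatenate the
# artifacts and derive enforcement from the merged response afterwards.
# Real code may also need de-duplication by control id.
from agent_control_models.evaluation import EvaluationResponse

def merge_responses(local: EvaluationResponse, server: EvaluationResponse) -> EvaluationResponse:
    return EvaluationResponse(
        matches=[*local.matches, *server.matches],
        errors=[*local.errors, *server.errors],
        non_matches=[*local.non_matches, *server.non_matches],
        reason=local.reason or server.reason,
    )
```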
Update SDK decorator/integration paths to stop depending on response-level is_safe or confidence.
Update observability event models and stats to remove or replace confidence / avg_confidence.
Update TypeScript SDK generated models after the OpenAPI schema changes.
Additional context
models/src/agent_control_models/evaluation.py
models/src/agent_control_models/controls.py
models/src/agent_control_models/observability.py
engine/src/agent_control_engine/core.py
sdks/python/src/agent_control/evaluation.py
sdks/python/src/agent_control/integrations/_core.py
sdks/python/src/agent_control/control_decorators.py
sdks/python/src/agent_control/evaluation_events.py
server/src/agent_control_server/observability/store/postgres.py