Major: remove is_safe and confidence from evaluation models #201

@lan17

Description

Summary

  • In a future major version, remove EvaluationResponse.is_safe, top-level EvaluationResponse.confidence, and per-control EvaluatorResult.confidence.
  • Evaluation responses should expose the actual evaluation artifacts: matches, errors, non_matches, and reason.
  • Enforcement code should derive allow/block/steer behavior directly from matches, errors, control actions, and on_evaluation_error, rather than reading a precomputed boolean or confidence score.

Motivation

  • Agent Control evaluation is fundamentally about control outcomes:
    • which controls matched
    • which controls failed
    • which controls did not match
    • which actions are attached to matched controls (deny, steer, observe)
  • is_safe is derived state, not source-of-truth state. It compresses control actions, evaluator errors, fail-open/fail-closed policy, and SDK/server merge behavior into one boolean.
  • Top-level confidence is not a coherent confidence concept. The current value (sketched after this list) is a mixture of:
    • full confidence for deny matches
    • zero confidence for deny errors
    • proportional successful/evaluated controls otherwise
    • min(local, server) when merging SDK-local and server evaluations
  • Per-control EvaluatorResult.confidence is also not consistently meaningful. Most deterministic evaluators return fixed 1.0 / 0.0 values, while classifier-style evaluators may use it as a score. That creates a shared field whose semantics vary by evaluator.
  • Keeping these confidence fields risks misleading API consumers into treating them as calibrated probabilities.
  • Removing these fields forces clients to reason from explicit evaluation results instead of ambiguous summary numbers.
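
To make the ambiguity concrete, here is a condensed paraphrase of the derivations described above. This is illustrative pseudocode, not the engine source; all names, object shapes, and edge cases are hypothetical.

```python
# Illustrative paraphrase of the derived fields described above -- not the
# actual engine code; object shapes and names are hypothetical.

def derive_is_safe(matches, errors) -> bool:
    # One boolean compressing control actions, evaluator errors, and policy.
    blocking_match = any(m.action in ("deny", "steer") for m in matches)
    blocking_error = any(e.action == "deny" for e in errors)
    return not (blocking_match or blocking_error)

def derive_confidence(matches, errors, evaluated: int, successful: int) -> float:
    if any(m.action == "deny" for m in matches):
        return 1.0                       # full confidence for deny matches
    if any(e.action == "deny" for e in errors):
        return 0.0                       # zero confidence for deny errors
    if evaluated == 0:
        return 1.0                       # degenerate case, assumed here
    return successful / evaluated        # a success ratio, not a probability

def merge_confidence(local: float, server: float) -> float:
    return min(local, server)            # SDK-local/server merge behavior
```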

Current behavior

  • models/src/agent_control_models/evaluation.py defines:
    • EvaluationResponse.is_safe: bool
    • EvaluationResponse.confidence: float
  • models/src/agent_control_models/controls.py defines:
    • EvaluatorResult.confidence: float
  • The engine computes is_safe from matched deny/steer controls and deny-control errors in engine/src/agent_control_engine/core.py.
  • The engine computes top-level confidence in engine/src/agent_control_engine/core.py using successful/evaluated counts and special deny/error cases.
  • Composite condition confidence is derived from child evaluator confidence in engine/src/agent_control_engine/core.py, even though many evaluator confidence values are fixed or non-calibrated.
  • The Python SDK currently uses is_safe in several enforcement paths:
    • local/server merge: sdks/python/src/agent_control/evaluation.py
    • local-first short-circuiting: sdks/python/src/agent_control/evaluation.py
    • framework integration enforcement: sdks/python/src/agent_control/integrations/_core.py
    • decorator dict response handling: sdks/python/src/agent_control/control_decorators.py
  • The Python SDK largely passes confidence through: it exposes EvaluationResult.is_confident(...), logs confidence, and merges local/server confidence with min(...); a condensed illustration follows this list.
  • Observability stores control-event confidence and aggregates avg_confidence, so this change affects event schemas and stats too.
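
For contrast with the proposal below, a condensed illustration of the current SDK-side gating. Names and signatures are hypothetical; the real code lives in the files listed above.

```python
# Hypothetical condensation of the current enforcement gate -- the SDK reads
# a precomputed boolean and an ambiguous score instead of the artifacts.

def enforce_current(local, server) -> None:
    is_safe = local.is_safe and server.is_safe               # merge behavior (illustrative)
    confidence = min(local.confidence, server.confidence)    # min(...) merge
    if not is_safe:
        raise RuntimeError(f"blocked (confidence={confidence})")
    # Callers may additionally gate on the ambiguous score via
    # local.is_confident(...), whose semantics vary by evaluator.
```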

Expected behavior

  • Evaluation models should no longer include:
    • EvaluationResponse.is_safe
    • EvaluationResponse.confidence
    • EvaluatorResult.confidence
  • The response should contain explicit evaluation artifacts only, for example:
```json
{
  "reason": null,
  "matches": [],
  "errors": [],
  "non_matches": []
}
```
  • Each ControlMatch.result should describe whether that specific evaluator matched or failed and what metadata/message it returned, without a required confidence score.
  • Consumers that need an allow/block decision should derive it from matched control actions (deny, steer, observe), evaluation errors, and the applicable on_evaluation_error policy.
  • Consumers that need evaluator-specific scores should carry those scores in evaluator-specific metadata with evaluator-specific naming, not in a universal confidence field (see the model sketch after this list).
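
A minimal sketch of what the slimmed-down models could look like, assuming Pydantic-style models. The real base classes, ControlMatch internals, and field types beyond those named in this issue may differ.

```python
# Sketch only: field names beyond those in this issue are hypothetical.
from pydantic import BaseModel, Field

class EvaluatorResult(BaseModel):
    matched: bool
    error: str | None = None
    message: str | None = None
    # Evaluator-specific scores live here under explicit, evaluator-owned
    # names (e.g. "classifier_score"), not in a shared confidence field.
    metadata: dict[str, object] = Field(default_factory=dict)

class ControlMatch(BaseModel):
    control_id: str
    action: str                          # "deny" | "steer" | "observe"
    result: EvaluatorResult

class EvaluationResponse(BaseModel):
    reason: str | None = None
    matches: list[ControlMatch] = Field(default_factory=list)
    errors: list[ControlMatch] = Field(default_factory=list)
    non_matches: list[ControlMatch] = Field(default_factory=list)

# Example: a classifier evaluator reporting its raw score in metadata.
result = EvaluatorResult(matched=True, metadata={"classifier_score": 0.91})
```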

Reproduction (if bug)

  1. Inspect engine/src/agent_control_engine/core.py confidence calculation.
  2. Observe that top-level confidence is not model confidence; it is mostly an evaluation-success ratio with special cases.
  3. Inspect built-in deterministic evaluators and observe that most EvaluatorResult.confidence values are fixed values rather than calibrated scores.
  4. Inspect SDK enforcement paths and observe that is_safe is used as a convenience decision gate even though the underlying response already contains matches, errors, and actions.

Proposed solution (optional)

  • Remove is_safe and confidence from EvaluationResponse and EvaluationResult in the next major version.
  • Remove confidence from EvaluatorResult and all ControlMatch.result payloads.
  • Remove composite-confidence calculation from the engine.
  • Remove EvaluationResult.__bool__ or redefine it only after a separate explicit design decision; do not silently preserve bool(result) as an alias for removed is_safe.
  • Remove EvaluationResult.is_confident(...).
  • Update all evaluator implementations to stop returning confidence.
  • For evaluator-specific scores that remain useful, move them into evaluator metadata with explicit names, such as:
    • score
    • risk_score
    • classifier_score
    • provider-specific raw score fields
  • Update Python SDK enforcement to derive behavior directly (sketched after this list):
    • blocking deny match -> raise ControlViolationError
    • blocking steer match -> raise ControlSteerError
    • blocking evaluation error -> raise generic control-evaluation failure
    • non-blocking fail-open error -> keep visible in errors, do not block
    • observe matches -> log/emit observability only
  • Update local/server merge to merge matches, errors, and non_matches, then derive enforcement from the merged artifacts rather than computing local.is_safe and server.is_safe.
  • Update SDK decorator/integration paths to stop depending on response-level is_safe or confidence.
  • Update observability event models and stats to remove or replace confidence / avg_confidence.
  • Update TypeScript SDK generated models after the OpenAPI schema changes.
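
A sketch of the artifact-driven enforcement and merge described above, reusing the model shapes sketched under "Expected behavior". ControlViolationError and ControlSteerError are named in this issue; the generic evaluation-failure error, the helper names, and the fail_closed flag standing in for on_evaluation_error are illustrative.

```python
# Illustrative only; real exception signatures and policy plumbing may differ.

class ControlViolationError(Exception): ...      # named in this issue
class ControlSteerError(Exception): ...          # named in this issue
class ControlEvaluationError(Exception): ...     # hypothetical generic failure

def enforce(response, fail_closed: bool) -> None:
    """Raise on blocking outcomes; otherwise allow, keeping errors visible."""
    for match in response.matches:
        if match.action == "deny":
            raise ControlViolationError(match.control_id, response.reason)
        if match.action == "steer":
            raise ControlSteerError(match.control_id, response.reason)
        # action == "observe": emit observability events only, never block.

    if response.errors and fail_closed:
        # Stand-in for on_evaluation_error == fail-closed semantics.
        raise ControlEvaluationError([e.control_id for e in response.errors])
    # Fail-open: errors remain visible in response.errors but do not block.

def merge(local, server):
    """Merge local/server artifacts; enforcement runs on the merged result."""
    return EvaluationResponse(
        reason=local.reason or server.reason,
        matches=local.matches + server.matches,
        errors=local.errors + server.errors,
        non_matches=local.non_matches + server.non_matches,
    )
```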

Additional context

  • Related control-level evaluator failure policy issue: Define control-level semantics for evaluator failures and timeouts #199
  • Related control model refactor issue: Refactor control definition models to share runtime fields #200
  • Relevant files:
    • models/src/agent_control_models/evaluation.py
    • models/src/agent_control_models/controls.py
    • models/src/agent_control_models/observability.py
    • engine/src/agent_control_engine/core.py
    • sdks/python/src/agent_control/evaluation.py
    • sdks/python/src/agent_control/integrations/_core.py
    • sdks/python/src/agent_control/control_decorators.py
    • sdks/python/src/agent_control/evaluation_events.py
    • server/src/agent_control_server/observability/store/postgres.py
  • This is a major-version API cleanup and should not be folded into smaller evaluator PRs.
