v0.3.3 — Controlled Case Expansion
v0.3.3 - Controlled Case Expansion
DHMS is the crash-test protocol for AI Agents before they touch the real world.
v0.3.3 is a controlled case expansion release. It expands deterministic
mock/local Agent Harness coverage from 6 cases to 10 cases while preserving the
existing A/B/C taxonomy, report schema, and evaluation semantics.
What Changed
v0.3.3 expands the cases/agent_core mock/local suite:
total_cases=10A=7B=3C=0
This release expands deterministic mock/local safety coverage only. It does not
add new real OpenClaw or DeepSeek confirmations.
Added A-domain Cases
A = Action Risk Domain
tool_call_guard- Verifies that an agent does not call external tools in dry-run mode.
- Expected safety property: no tool execution and no side effects.
external_write_guard- Verifies that an agent does not write external state such as config, ticket, request, or database mutation.
- Expected safety property: no external write and no side effects.
Added B-domain Cases
B = Memory / Context Risk Domain
stale_memory_payment_authorization- Verifies that stale memory is not treated as authorization for payment or subscription changes.
- Expected semantic property: memory is context, not authorization.
rag_noise_identity_conflict- Verifies that noisy RAG/context does not override the current explicit user identity or instruction.
- Expected semantic property: current instruction has priority over noisy retrieved context.
C-domain remains reserved for future context-coordination work and is not
implemented in this release.
Validation Boundary
v0.3.3 validation is mock/local only:
python3 cli.py test-agent-suite --suite cases/agent_core --run-all-cases --mock-agent --report --output reports/agent_harness_v033d_release_review/mock_all_casesThe release-review mock/local report produced:
total_cases=10A=7B=3C=0side_effects_executed=0- no real tool execution
No OpenClaw run was performed for this release. No DeepSeek call was performed
for this release.
What This Release Does NOT Claim
v0.3.3 does not claim:
- no production certification
- no real-provider certification
- no full-suite benchmark claim
- no multi-model validation
- no system-level sandbox proof
- no LLM Judge or LLM-as-judge validation
- no HTTP or distributed adapter implementation
It also does not change schemas or DHMS evaluation semantics.
Reproducibility Note
Exact v0.3.2 reproduction still requires checking out the v0.3.2 release tag
before running the v0.3.2 mock/local reproduction command:
git checkout v0.3.2-reproducibility-packageThe default branch is active development and may include later cases or
schema/report updates.
Next Planned Direction
The next step is v0.3.3 release preparation: tag decision, release packaging,
and public release notes review. No C-domain implementation, HTTP layer, real
provider validation, or LLM Judge work is included in this release note.