Releases: RewriteReality-Labs/GBSE

GBSE v1.0.0 — ATTA BENCHMARK_002 AFFIRMED

04 Jun 13:41

This release records the first official ATTA-affirmed GBSE benchmark result.

Official benchmark status

  • Status: AFFIRMED
  • officialValid: true
  • Official run count: 3
  • Expected executions: 168
  • Actual executions: 168
  • Successful executions: 168
  • Errors: 0
  • API error rate: 0.0%
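The status figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the project's own validation logic. The values are copied from this list; the dictionary keys (other than officialValid) are hypothetical names and are not taken from the schema of benchmark-results.json. It also assumes that successes plus errors should equal the actual execution count, and that the API error rate is errors divided by actual executions.

```python
# Values copied from the release notes; the key names are hypothetical and
# do NOT reflect the real schema of benchmark-results.json.
status = {
    "officialValid": True,
    "official_run_count": 3,
    "expected_executions": 168,
    "actual_executions": 168,
    "successful_executions": 168,
    "errors": 0,
}


def check_status(s):
    """Return a list of consistency problems (empty if none are found)."""
    problems = []
    if s["actual_executions"] != s["expected_executions"]:
        problems.append("actual executions differ from expected")
    # Assumption: every execution is counted as either a success or an error.
    if s["successful_executions"] + s["errors"] != s["actual_executions"]:
        problems.append("successes plus errors do not add up to actual executions")
    if s["errors"] != 0:
        problems.append("run reported errors")
    if not s["officialValid"]:
        problems.append("officialValid is false")
    return problems


def api_error_rate(s):
    """Assumed definition: errors as a percentage of actual executions."""
    return round(100.0 * s["errors"] / s["actual_executions"], 1)


problems = check_status(status)
print("status is internally consistent" if not problems else problems)
print(f"API error rate: {api_error_rate(status):.1f}%")
```

With the published numbers, the check finds no problems and the computed error rate agrees with the reported 0.0%.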

Benchmark metrics

  • Average flag detection: 90.5%
  • Silent hallucination rate: 1.8%
  • Silent hallucination rate on hallucination tests: 3.8%
  • Must-not-pass failure count: 0
  • Clean query pass rate: 100.0%
  • Adversarial rejection rate: 100.0%
  • False premise rejection rate: 100.0%
  • Injection rejection rate: 100.0%

Provenance

  • Benchmark code commit: 19b946d
  • Proof/result commit: 5f62d2c
  • Model: claude-sonnet-4-20250514
  • Temperature: 0
  • Run mode: official

Included proof artifact

  • benchmark-results.json