Revise benchmark evaluation metrics and runners #230

Merged: bihius merged 2 commits into main from benchmark-refactor on Jun 6, 2026

bihius (Owner) commented on Jun 6, 2026

Summary

  • Remove universal TPR/FPR targets from the evaluation plan and keep only RPS degradation as a soft lab guardrail
  • Add a tagged labeled corpus runner for defensible TP/FN/TN/FP counting
  • Rework go-ftw, ZAP, and Nuclei reporting so only the corpus publishes clean classification metrics
  • Update collector logic, lab wiring, and docs to match the revised methodology
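
As a rough illustration of the counting described above, here is a minimal Python sketch of TP/FN/TN/FP tallying over a labeled corpus, plus a relative RPS degradation figure. It is not the code from this PR: the names `ConfusionCounts`, `count_outcomes`, and `rps_degradation`, the `"attack"`/`"benign"` labels, and the assumption that each sample yields a single blocked/allowed outcome are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for corpus classification counts."""
    tp: int = 0  # attack sample that was blocked
    fn: int = 0  # attack sample that was allowed
    tn: int = 0  # benign sample that was allowed
    fp: int = 0  # benign sample that was blocked

    @property
    def tpr(self) -> Optional[float]:
        # True positive rate; None when the corpus has no attack samples.
        attacks = self.tp + self.fn
        return self.tp / attacks if attacks else None

    @property
    def fpr(self) -> Optional[float]:
        # False positive rate; None when the corpus has no benign samples.
        benign = self.fp + self.tn
        return self.fp / benign if benign else None


def count_outcomes(samples: Iterable[Tuple[str, bool]]) -> ConfusionCounts:
    """Tally (label, blocked) pairs; labels are assumed to be 'attack' or 'benign'."""
    tp = fn = tn = fp = 0
    for label, blocked in samples:
        if label == "attack":
            if blocked:
                tp += 1
            else:
                fn += 1
        elif label == "benign":
            if blocked:
                fp += 1
            else:
                tn += 1
        else:
            # Untagged or unknown samples are rejected rather than guessed at.
            raise ValueError(f"unknown label: {label!r}")
    return ConfusionCounts(tp=tp, fn=fn, tn=tn, fp=fp)


def rps_degradation(baseline_rps: float, protected_rps: float) -> float:
    """Relative throughput loss versus the baseline (0.1 means 10% slower)."""
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    return (baseline_rps - protected_rps) / baseline_rps
```

Rejecting samples with an unknown label, instead of silently assigning them to a class, is one way to keep the counts defensible: every published number then traces back to an explicitly tagged sample.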

Testing

  • Added and passed unit tests for the new metrics helpers and FTW classification logic
  • Verified shell syntax for the lab runners and setup scripts
  • Verified the new metrics module compiles and the benchmark Make targets expand correctly

bihius merged commit 0d25d48 into main on Jun 6, 2026
3 checks passed
bihius added a commit that referenced this pull request Jun 6, 2026
M3 and M5 are complete; M6 benchmark work is underway (PR #230 merged).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>