Benchmark dataset evaluating code review and security analysis tools on the OpenSSF CVE Benchmark.
Last updated: February 8, 2026
Final evaluation results in JSONL format with fields:
- cve_id: CVE identifier
- variant: fixed or unfixed
- detected_issues: Issues found by the tool
- TP, FP, TN, FN: Classification metrics
- judge_reasoning: Explanation of the judgment
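As a rough illustration of how these fields might be consumed, the sketch below sums the TP, FP, TN, and FN fields across JSONL records and derives precision and recall. It rests on assumptions not stated in this document: that the four classification fields hold numeric counts (0/1 or integers), and that detected_issues is a list. The function name summarize, the sample records, and the placeholder CVE IDs are made up for illustration.

```python
import json
from collections import Counter


def summarize(lines):
    """Sum TP/FP/TN/FN over JSONL lines and derive precision and recall."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for key in ("TP", "FP", "TN", "FN"):
            # Assumption: each field is a numeric count; missing fields count as 0.
            totals[key] += int(record.get(key, 0))

    tp, fp, fn = totals["TP"], totals["FP"], totals["FN"]
    # Return None instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return {"totals": dict(totals), "precision": precision, "recall": recall}


# Made-up sample records that follow the field list above (not real results).
sample = [
    '{"cve_id": "CVE-0000-0001", "variant": "unfixed", "detected_issues": ["example"], '
    '"TP": 1, "FP": 0, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0001", "variant": "fixed", "detected_issues": ["example"], '
    '"TP": 0, "FP": 1, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
]

summary = summarize(sample)
```

To run the same aggregation on an actual results file, an open file handle can be passed in place of the list, for example `with open(path) as f: summarize(f)`, where path points at whichever JSONL file is being inspected.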
Intermediate formatted results from each tool, normalized for comparison.
Original tool outputs per CVE, preserving the exact response from each tool.
The archive/ directory contains prompts and data from earlier benchmark runs: