
[codex] Add benchmark leakage audit assistant #207

Open
Davifek wants to merge 1 commit into SCIBASE-AI:main from Davifek:codex/benchmark-leakage-audit-16


Davifek commented May 18, 2026

@algora-pbc /claim #16

Summary

Adds a distinct benchmark-leakage-audit-assistant/ slice for the AI-Powered Research Assistant Suite bounty. This module focuses on ML/research evaluation hygiene before publication or release review.

It detects:

  • train/test overlap by record ID or normalized content fingerprint
  • benchmark contamination in the training corpus
  • final holdout or test-set use during model selection
  • weak split provenance such as missing seed or manifest hash
  • missing reproducibility packet artifacts
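
The first check in the list above, overlap by record ID or normalized content fingerprint, could look roughly like the following. This is a minimal sketch rather than the PR's actual code: the function names `fingerprint` and `findOverlap`, the `{ id, text }` record shape, and the choice of SHA-256 over a lowercased, whitespace-collapsed string are all assumptions made for illustration.

```javascript
// Hypothetical sketch: function names and the { id, text } record shape are
// assumptions for illustration, not taken from the PR's source.
const { createHash } = require("node:crypto");

// Normalize text so trivially different copies hash to the same fingerprint:
// Unicode-normalize, lowercase, collapse whitespace runs, trim the ends.
function fingerprint(text) {
  const normalized = text
    .normalize("NFKC")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Return one finding per test record that also appears in the training split,
// matching first on record ID and then on content fingerprint.
function findOverlap(trainRecords, testRecords) {
  const trainIds = new Set(trainRecords.map((r) => r.id));
  const trainPrints = new Set(trainRecords.map((r) => fingerprint(r.text)));
  const findings = [];
  for (const record of testRecords) {
    if (trainIds.has(record.id)) {
      findings.push({ id: record.id, reason: "id" });
    } else if (trainPrints.has(fingerprint(record.text))) {
      findings.push({ id: record.id, reason: "fingerprint" });
    }
  }
  return findings;
}

// Synthetic sample data only.
const trainSplit = [
  { id: "a1", text: "The quick  brown fox." },
  { id: "a2", text: "Gradient descent converges." },
];
const testSplit = [
  { id: "a1", text: "Completely different text." }, // same ID as a training record
  { id: "t7", text: "the QUICK brown fox." }, // same content after normalization
  { id: "t8", text: "An unseen example." }, // clean
];

console.log(findOverlap(trainSplit, testSplit));
```

Exact hashing after normalization only catches copies that differ in case or whitespace. Paraphrased or partially overlapping records would need a fuzzier technique, which this sketch does not attempt.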

Demo

A short demo artifact is included: benchmark-leakage-audit-assistant/demo.gif

Run locally:

node benchmark-leakage-audit-assistant/demo.js

Verification

Passed locally:

node benchmark-leakage-audit-assistant/test.js
node benchmark-leakage-audit-assistant/demo.js
git diff --cached --check

The tests cover a contaminated project that is blocked, a clean project that passes without false positives, and reviewer-ready evidence/remediation fields for every finding.
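
As an illustration of that test shape, here is a hedged sketch using only `node:assert`. The `auditProvenance` function, the manifest fields `seed` and `manifestHash`, the `blocked`/`pass` status values, and the finding layout are hypothetical stand-ins. The real module's API is not shown in this PR description.

```javascript
// Hypothetical sketch: auditProvenance, the manifest shape, and the finding
// fields are assumptions for illustration, not the PR's actual API.
const { strictEqual, deepStrictEqual, ok } = require("node:assert");

// Flag a split manifest that lacks a seed or a manifest hash.
// A seed of 0 is treated as present; only undefined, null, or "" is missing.
function auditProvenance(manifest) {
  const findings = [];
  for (const field of ["seed", "manifestHash"]) {
    const value = manifest[field];
    if (value === undefined || value === null || value === "") {
      findings.push({
        rule: `missing-${field}`,
        evidence: `split manifest has no "${field}" value`,
        remediation: `record "${field}" when the split is generated`,
      });
    }
  }
  return { status: findings.length > 0 ? "blocked" : "pass", findings };
}

// Weak-provenance case: the missing manifest hash blocks the project.
const weak = auditProvenance({ seed: 42 });
strictEqual(weak.status, "blocked");
strictEqual(weak.findings.length, 1);

// Clean case: no findings, so no false positives.
const clean = auditProvenance({ seed: 42, manifestHash: "sha256:abc123" });
strictEqual(clean.status, "pass");
deepStrictEqual(clean.findings, []);

// Every finding carries reviewer-ready evidence and remediation fields.
for (const finding of weak.findings) {
  ok(finding.evidence && finding.remediation);
}

console.log("all checks passed");
```

Keeping the checks as plain assertions in a script matches the dependency-free approach described in the notes, since the file can be run directly with `node` and exits non-zero on the first failed assertion.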

Notes

  • Dependency-free Node.js standard library implementation
  • Synthetic sample data only
  • No credentials, external services, or network access required
  • Includes a requirement map for AI-Powered Research Assistant Suite #16
