v1.5.1: open-source polish

@sanchitmonga22 released this 27 May 21:47

Open-source polish release. Addresses every finding from the pre-publish audit pass — security, licensing, UX, hygiene. No code-behaviour changes; safe to take.

Full release notes: docs/release-notes/v1.5.1.md.

What changed

Licensing simplified

  • Deleted NOTICE.md, LICENSE-DATA, LICENSE.md.
  • Single LICENSE (MIT) now covers code, data, charts, and docs prose.
  • Citation request lives in the README's bibtex block.

Documentation rewritten

  • README.md — six-cell headline table, full prereq + quickstart with realistic time/cost estimates, "picking a config for real work" section distilled from the v1.5 leaderboard, full bench CLI table.
  • AGENTS.md — refreshed for v1.5.0: D6 task class documented, v1.5 configs added to the tree, conventions reflect that single-letter codes (A/B/D) are retired.
  • CODE_OF_CONDUCT.md — short and direct.
  • Source-tree docstrings + per-task READMEs — lib.* rewritten to core.*; "Category D / B / X" rewritten to refactors / real-prs / puzzles end-to-end.

UX cleanup

  • Deleted scripts/reproduce.sh: ./bench setup already does the prereq checks, and the smoke run is ./bench sweep --config configs/v1.4-smoke.yaml.
  • Deleted logs/v3.3/ — historical sweep logs moved out of git. logs/ is now gitignored.

Hygiene

  • __version__ bumped 0.1.0 → 1.5.1 (it was stuck at 0.1.0).
  • pytest moved from runtime deps to [dev] extras.
  • Removed unused pytest -m slow filter from CI + docs.
  • .github/ISSUE_TEMPLATE/new_model.md: broken configs/variants/_template.yaml reference fixed.
  • docs/HYBRID_ROUTING_DESIGN.md + v1.4.{0,1} release notes: jq snippets updated from legacy D::cline::heuristic to refactors::cline::heuristic.
  • Test aliases r10_cline, r6_mini_swe_agent renamed.

Privacy

  • Sanitized 263 absolute-path leaks (/Users/<owner>/...) in tracked raw.jsonl / progress.log. JSON re-validated on every row (520 rows, 0 parse errors).
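
As an illustration of the kind of pass described above, here is a minimal sketch that replaces home-directory prefixes in JSONL rows and then re-parses every row. The regex, the `/Users/<owner>` placeholder handling, and the function names `sanitize_line` and `sanitize_jsonl` are assumptions made for this example; the release notes do not say how the project's own sanitizer works.

```python
import json
import re

# Assumption: leaks look like /Users/<name>/... (macOS home directories).
# The negative lookahead skips rows that already carry the placeholder,
# so re-running the pass does not inflate the replacement count.
HOME_PREFIX = re.compile(r"/Users/(?!<owner>)[^/\s\"']+")
PLACEHOLDER = "/Users/<owner>"  # hypothetical placeholder for this sketch


def sanitize_line(line: str) -> tuple[str, int]:
    """Return the line with home prefixes replaced, plus the number of hits."""
    return HOME_PREFIX.subn(PLACEHOLDER, line)


def sanitize_jsonl(lines):
    """Sanitize each JSONL row, then re-parse it to confirm it is still valid JSON."""
    cleaned, replaced, parse_errors = [], 0, 0
    for line in lines:
        new_line, hits = sanitize_line(line)
        replaced += hits
        try:
            json.loads(new_line)
        except json.JSONDecodeError:
            parse_errors += 1
        cleaned.append(new_line)
    return cleaned, replaced, parse_errors


if __name__ == "__main__":
    rows = ['{"cwd": "/Users/alice/proj/run", "ok": true}']
    cleaned, replaced, errors = sanitize_jsonl(rows)
    print(cleaned[0], replaced, errors)
```

Re-parsing after the substitution matters because a text-level replacement can in principle break quoting; counting parse errors per row gives the same kind of "N rows, 0 parse errors" summary reported above.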

Verification

  • 120 fast tests pass on Python 3.11 + 3.12 (CI matrix).
  • ruff check src/ tests/ clean.

Citation

If you use this benchmark, a citation would be appreciated. The BibTeX entry is in the README.


📦 Dataset

results-v1.5.1.tar.gz is byte-identical to the v1.5.0 dataset — v1.5.1 added 0 new benchmark rows (it is an open-source-polish release). It is attached here so visitors landing on the Latest release can download the data directly. The canonical 1,704-row dataset is unchanged since v1.5.0.

gh release download v1.5.1 -p results-v1.5.1.tar.gz   # or v1.5.0 — same bytes
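
To check the "same bytes" claim locally after downloading both archives, a SHA-256 comparison is one option. The sketch below is only an example: the helper names `sha256_of` and `same_bytes` are hypothetical, and the v1.5.0 archive filename in the usage comment is an assumption, since these notes only name results-v1.5.1.tar.gz.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(a: Path, b: Path) -> bool:
    """True when both files have the same SHA-256 digest."""
    return sha256_of(a) == sha256_of(b)


# Usage (the second filename is an assumption, not taken from the release page):
# same_bytes(Path("results-v1.5.1.tar.gz"), Path("results-v1.5.0.tar.gz"))
```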