
Release 0.7.1


dylanuys released this 12 Jun 22:43 (commit e0a1a22)

Summary

Two changes shipping together.

Per-dataset sampling floors stopped mode budgets from meaning anything once the corpus grew: small mode asks for 600 total video samples, but the per-dataset minimums inflated that to ~24,200 across 238 datasets, pushing entrance exams past their 2-hour timeout. The floors now scale down to fit the mode's total budget; full-mode allocations are unchanged.

Version single-sourcing removes the duplicated version string: pyproject.toml is now the only place the version lives.

What changes

Sampling: per-dataset floors now respect the mode's total budget

Previously, calculate_weighted_dataset_sampling clamped every dataset's allocation up to static floors (REGULAR_DATASET_MIN_SAMPLES=100, GASSTATION_DATASET_MIN_SAMPLES=500). With the video corpus at 238 datasets, these floors silently overrode the mode targets: the first v18 entrance exam processed only 154 of 238 datasets before hitting the 2-hour timeout, and the model was incorrectly failed and blocked.
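A quick check of the ~24,200 figure, assuming (an inference from the numbers, not something these notes state) that 237 of the 238 video datasets use the regular floor and one uses the gasstation floor:

$$
237 \times 100 + 1 \times 500 = 23{,}700 + 500 = 24{,}200 \gg 600
$$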

Each floor is now capped at an even share of target_total_samples (target_total_samples // num_datasets, gasstation-weighted), so a mode's budget holds regardless of corpus size:

| Run | Before | After |
|---|---|---|
| small video, 238 datasets | ~24,200 samples (timeout) | ~486 samples (~20–40 min) |
| full video | 107/dataset, ~25,900 samples | unchanged |
| full image | 282/dataset, ~55,000 samples | unchanged |

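Full-mode allocations are numerically identical: the cap only binds when a mode's even share falls below the static floors (full mode's 107 and 282 per dataset both sit above the regular floor of 100), which only happened when the budget was already being ignored. Production benchmark scores are therefore unaffected. If more per-dataset signal is wanted in exams, BENCHMARK_TOTAL_OVERRIDES["small"] is now an honest dial (e.g. 600 → 2,400 ≈ 10/dataset).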
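A minimal sketch of the capped floors, under stated assumptions: the cap is taken to be min(static floor, even share), and "gasstation-weighted" is read as giving gasstation datasets five times the regular even share (the same 500:100 ratio the static floors use). capped_floors and floor_total are hypothetical helpers, not gasbench functions, and the 237/1 regular/gasstation split is inferred from the ~24,200 figure rather than stated in these notes.

```python
# Hypothetical sketch of budget-capped sampling floors; not gasbench's actual code.
REGULAR_DATASET_MIN_SAMPLES = 100
GASSTATION_DATASET_MIN_SAMPLES = 500
# Assumption: gasstation datasets keep the same 5x multiple that the static floors encode.
GASSTATION_WEIGHT = GASSTATION_DATASET_MIN_SAMPLES // REGULAR_DATASET_MIN_SAMPLES


def capped_floors(target_total_samples: int, num_datasets: int) -> tuple[int, int]:
    """Return (regular_floor, gasstation_floor), each capped by a share of the budget."""
    even_share = target_total_samples // num_datasets
    regular_floor = min(REGULAR_DATASET_MIN_SAMPLES, even_share)
    gasstation_floor = min(
        GASSTATION_DATASET_MIN_SAMPLES,
        target_total_samples * GASSTATION_WEIGHT // num_datasets,
    )
    return regular_floor, gasstation_floor


def floor_total(target_total_samples: int, num_regular: int, num_gasstation: int) -> int:
    """Minimum number of samples implied by the floors alone."""
    num_datasets = num_regular + num_gasstation
    regular_floor, gasstation_floor = capped_floors(target_total_samples, num_datasets)
    return num_regular * regular_floor + num_gasstation * gasstation_floor


if __name__ == "__main__":
    # Small video: 600-sample budget over 238 datasets (assumed 237 regular + 1 gasstation).
    print(capped_floors(600, 238))        # (2, 12)
    print(floor_total(600, 237, 1))       # 486
    # Full video: an even share of 107/dataset is above both static floors, so nothing changes.
    print(capped_floors(107 * 238, 238))  # (100, 500)
    # Raising the small budget to 2,400 gives roughly 10 samples per regular dataset.
    print(capped_floors(2400, 238))       # (10, 50)
```

Taking the minimum means the static floors still apply unchanged whenever the budget can afford them, which is why the full-mode rows in the table do not move.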

Version: pyproject.toml is the single source

The version string was hardcoded in both pyproject.toml and gasbench/__init__.py (as __version__), so the two had to be bumped in lockstep. __init__.py now derives __version__ via importlib.metadata.version("gasbench"), falling back to 0.0.0+unknown for uninstalled source checkouts. The CLI --version flag is unaffected.