Features
- Support preparing remote WebDatasets and media metadata with local temporary SQLite indexes before upload (#221)
- Replace custom Fastseek video seeking with PyAV/FFmpeg index-based seeking and probe fallback, including AVI support (#214)
- Add permission handling for generated local metadata files during `prepare`/`prepare-media` (#216)
- Add `all` install extra and document CPU torch backend installation (#210)
- Add DSS URL support, including `dss://...`, `filesystem+dss://...`, and DSS auxiliary paths (#209, #213, #226)
Fixes
- Fix detailed sample printing by rendering nested samples as YAML with improved tensor/array summaries (#235)
- Fix `load_dataset()` with dict configs by calling `post_initialize()` (#245)
- Fix GC freeze handling for CUDA, Dynamo output graphs, and distributed work objects in workers (#237)
- Fix duplicate `SkipSample` export/catching by using the canonical error class and deprecating the wrapper alias (#223)
- Fix stale remote local-copy cache invalidation using remote modified timestamps (#221)
- Remove redundant SQLite sample-key index while preserving duplicate-key rollback/reporting in batched index writes (#224, #225)
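
To illustrate the idea behind the stale-cache fix (#221), the following is a minimal sketch of timestamp-based invalidation. It is not the library's implementation: `CacheEntry` and `needs_refresh` are hypothetical names, and the comparison rule (refresh whenever the remote timestamp differs from the recorded one) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical record kept for each local copy of a remote file."""

    local_path: str
    remote_mtime: float  # remote modified timestamp recorded when the copy was made


def needs_refresh(entry: Optional[CacheEntry], remote_mtime: float) -> bool:
    """Return True if the local copy is missing or no longer matches the remote file."""
    if entry is None:
        # Nothing cached yet, so the file has to be fetched.
        return True
    # Assumption: any change in the remote timestamp invalidates the local copy.
    return remote_mtime != entry.remote_mtime
```

With this rule, a local copy is reused only while the remote file reports the same modified timestamp that was recorded at copy time.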
Other
- Batch WebDataset indexing and SQLite writes for faster `prepare_dataset()` on large datasets (#225)
- Update dependencies, including `multi-storage-client>=0.40.0` and `av>=17.0.0`; remove old Fastseek-only optional dependencies (#210, #214)
- Add no-extra install smoke test and CI workflow (#204)
- Update docs for remote prepare, DSS URLs, AV decoding, and remote filesystem auxiliary data (#246)
- Add Cursor unittest workflow/rules for local development (#225)
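
As a rough illustration of batched SQLite index writes with duplicate-key rollback and reporting (#224, #225), here is a sketch using only the Python standard library. The table layout, column names, and the `create_index_db` and `write_batch` helpers are hypothetical and do not reflect the project's actual schema or code; uniqueness is assumed to be enforced by a primary key.

```python
import sqlite3
from typing import List, Sequence, Tuple


def create_index_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical sample index; the primary key enforces unique keys."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples (sample_key TEXT PRIMARY KEY, tar_offset INTEGER)"
    )
    return conn


def write_batch(conn: sqlite3.Connection, rows: Sequence[Tuple[str, int]]) -> List[str]:
    """Insert a batch in one transaction; on conflict, roll back and return duplicate keys."""
    try:
        with conn:  # commits on success, rolls back the whole batch on an exception
            conn.executemany(
                "INSERT INTO samples (sample_key, tar_offset) VALUES (?, ?)", rows
            )
        return []
    except sqlite3.IntegrityError:
        # The batch was rolled back; work out which keys caused the conflict.
        keys = [key for key, _ in rows]
        placeholders = ",".join("?" for _ in keys)
        existing = {
            key
            for (key,) in conn.execute(
                f"SELECT sample_key FROM samples WHERE sample_key IN ({placeholders})", keys
            )
        }
        seen = set()
        duplicates = []
        for key in keys:
            if key in existing or key in seen:
                duplicates.append(key)
            seen.add(key)
        return duplicates
```

Writing one transaction per batch instead of one per row is what typically makes this kind of indexing faster, and the rollback keeps the index free of partially written batches.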
New Contributors
- @bbuschkaemper made their first contribution in #223
Full Changelog: 7.3.2...7.4.0