What's Changed 🚀
✨ Features
- feat: add try_cast function for safe type conversion @XuQianJin-Stars (#6960)
- feat(shuffle): default flight shuffle compression to lz4 @colin-ho (#7071)
- feat(iceberg): support branch and tag reads @jackylee-ch (#7042)
- feat: thread assume_sorted_and_aligned_partitions parameter through ASOF join @euanlimzx (#7067)
- feat(checkpoint): distributed observability counters @rohitkulshreshtha (#7026)
- feat(functions): add UUIDv7 timestamp-extraction partition transforms @jaychia (#7032)
- feat: tests for range repartition hints in clustering spec @euanlimzx (#7044)
- feat: allow custom datasources to specify clustering spec @jaychia (#7031)
- feat(iceberg): accept PathLike metadata paths in read_iceberg @jackylee-ch (#7040)
- feat(ext): native extension support on ray @rchowell (#6996)
- feat: support PostgreSQL as a structured data source via Gravitino @qingfeng-occ (#6989)
- feat: Better extension display @universalmind303 (#6802)
- feat(gcs): implement delete for GCS object store @daiping8 (#6958)
- feat(inline-agg): symbolize string group keys in multi-column grouped aggregation @BABTUNA (#6748)
- feat: first_value / last_value aggs for window functions @euanlimzx (#6974)
- feat(iceberg): auto-configure IOConfig for Alibaba Cloud OSS @plusplusjiajia (#6993)
- feat: nearest asof joins @euanlimzx (#6953)
- feat(functions): add sample_interval_seconds to video_frames @TheR1sing3un (#6832)
- feat: wrappers for using Daft's FileIO / open from top-level ops @srilman (#6986)
- feat: configurable flight shuffle compression @colin-ho (#6979)
- feat(checkpoint): idempotent write_deltalake + cross-sink helpers @rohitkulshreshtha (#6932)
🐛 Bug Fixes
- fix(asof-join): use unknown clustering spec instead of hash @euanlimzx (#7075)
- fix(ray): configure worker startup timeout on runner @colin-ho (#7055)
- fix(shuffle): concat recordbatches before repartition @colin-ho (#7064)
- fix(flight-shuffle): reduce coordinator memory to O(map_tasks + partitions) @colin-ho (#7056)
- fix(planner): propagate clustering through bound projections @jaychia (#7033)
- fix(filesystem): fix pyarrow fs memory by caching by value, not identity @rchowell (#7025)
- fix: strip trailing slash from resolved write/read paths @colin-ho (#6987)
- fix(integration): ray data deprecated api fix @kyuds (#6871)
- fix(io): use checked_shl and saturating_mul in retry jitter calculation @ARDA7787 (#6949)
- fix: cast generic file arrays to media-specific types @everettVT (#6848)
- fix(daft-distributed): remove unused hash_map_macro feature gate @mikedep333 (#6981)
- fix: gate Self import on Python 3.11+ @everettVT (#6971)
🚀 Performance
- perf: update jemalloc 5.3.0 → 5.3.1 to fix muzzy decay performance bug @everettVT (#7059)
- perf(window): reduce unnecessary copies of data in finalize() step of window functions to reduce memory usage @euanlimzx (#7006)
- perf(window): return Series from window ops to reduce intermediate RecordBatch copies @euanlimzx (#7011)
- perf(flotilla): batch flight partition refs per finalize @cckellogg (#7008)
- perf: Specialized Code Path for Hash Join on Ints @srilman (#6644)
- perf: enable per-column compression configuration @rchowell (#6884)
♻️ Refactor
- refactor(checkpoint): drop "seal" vocabulary from Rust API surface @rohitkulshreshtha (#7078)
- refactor(file): rename File byte-range fields to position/size @aaron-ang (#6747)
- refactor(distributed): rename needs_hash_repartition to can_skip_hash_repartition @euanlimzx (#7053)
- refactor(checkpoint): bincode-encoded versioned FileMetadata blobs @rohitkulshreshtha (#7001)
📖 Documentation
- docs: standardize Slack links to use daft.ai/slack @everettVT (#7066)
- docs: reorder UDF API docs to prioritize new API and deprecate legacy @everettVT (#7028)
- docs(paimon): clarify object-store IO config usage @jackylee-ch (#7029)
- docs: add shuffle algorithms tuning guide @colin-ho (#7017)
- docs(lance): fix daft-lance repository links @jackylee-ch (#7016)
✅ Tests
- test(lance): use daft-lance APIs directly @jackylee-ch (#7010)
- test(parquet): cover embedding and image roundtrips @jackylee-ch (#7021)
- test(iceberg): cover partition field transform conversion @jackylee-ch (#7024)
- test(lance): import test helpers from daft-lance @jackylee-ch (#7015)
- test(checkpoint): optional real-S3 routing for checkpoint test suites @rohitkulshreshtha (#6934)
👷 CI
🔧 Maintenance
- chore(deps): bump the all group across 1 directory with 59 updates @dependabot[bot] (#7000)
- chore: bump minimum PyArrow version to 16 @0xdeadd (#6868)
⬆️ Dependencies
- chore(deps): bump the all group across 1 directory with 59 updates @dependabot[bot] (#7000)
Full Changelog: v0.7.14...v0.7.15