What's Changed
Features
- feat(droid): Add daft.datasets.droid @srilman (#7089)
- feat(temporal): add Spark-style timezone conversions @BABTUNA (#6919)
- feat: add top-level daft.concat() for concatenating multiple DataFrames @Liusixuuu (#7105)
- feat(file): Add a file_exists expression @srilman (#7140)
- feat: Add DataFrame.to_torch_dataloader @srilman (#6997)
- feat(daft-ext): add scalar #[daft_func] proc macro @rchowell (#7043)
- feat(iceberg): support SQL read_iceberg ignore_corrupt_files @jackylee-ch (#7130)
- feat: add Spark-compatible string functions (translate, substring_index, soundex, ascii, chr, space) @XuQianJin-Stars (#7070)
- feat(iceberg): support branch and tag reads in SQL @jackylee-ch (#7084)
- feat: add support for range partition hints in clustering spec @euanlimzx (#7050)
- feat: tests for aligned asof join @euanlimzx (#7072)
- feat: add ignore_corrupt_files option to read_parquet, read_csv and read_iceberg @chenghuichen (#6520)
- feat(ray): Implement dynamic scale-in for RaySwordfishActor @huleilei (#5903)
- feat(delta-lake): support column mapping for reads @aaron-ang (#7005)
- feat(functions): add string distance/similarity functions @nish2292 (#7068)
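
For readers unfamiliar with the Spark string functions named in #7070, here is a minimal plain-Python sketch of how Spark documents the behavior of two of them, substring_index and translate. It is only an illustration of those semantics. It does not use Daft, the function names below are stand-ins rather than Daft's API, and whether Daft matches Spark on every edge case is an assumption this sketch does not verify.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Spark-style substring_index (illustrative stand-in, not Daft's API).

    count > 0: keep everything left of the count-th delimiter from the left.
    count < 0: keep everything right of the |count|-th delimiter from the right.
    count == 0 or an empty delimiter: return an empty string.
    """
    if count == 0 or delim == "":
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def translate(text: str, matching: str, replace: str) -> str:
    """Spark-style translate (illustrative stand-in, not Daft's API).

    Each character in `matching` maps to the character at the same position
    in `replace`; characters with no counterpart are deleted.
    """
    table = {}
    for i, ch in enumerate(matching):
        if ord(ch) in table:
            continue  # the first occurrence of a repeated character wins
        table[ord(ch)] = replace[i] if i < len(replace) else None
    return text.translate(table)


left = substring_index("www.apache.org", ".", 2)    # "www.apache"
right = substring_index("www.apache.org", ".", -1)  # "org"
swapped = translate("AaBbCc", "abc", "123")         # "A1B2C3"
dropped = translate("hello", "lo", "x")             # "hexx"
```
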
Bug Fixes
- fix(scan): estimate scan size from materialized buffer, not encoded size @madvart (#7161)
- fix(scan): use parquet metadata for scan task size estimates @desmondcheongzx (#6542)
- fix(session): list_namespaces mirrors list_tables resolution @YuangGao (#7144)
- fix(tracing): honor OTEL resource env config @RitwijParmar (#7061)
- fix(parquet): Use chunk size for ArrayReader buffer size @srilman (#7149)
- fix(ci): pin maturin <1.14 to keep --timings=html working @XiaoHongbo-Hope (#7145)
- fix: Fix flaky dataloader test @srilman (#7142)
- fix: detect unschedulable actor UDFs and warn on over-requested concurrency @XiaoHongbo-Hope (#7113)
- fix(session): list_tables mirrors get_table resolution and returns Identifier @YuangGao (#7126)
- fix: actor UDF followed by BlockingSink deadlocks due to held channel sender @XiaoHongbo-Hope (#7115)
- fix(sql): allow read_parquet file options @jackylee-ch (#7128)
- fix(parquet): cast map values to explicit schema @jackylee-ch (#7093)
- fix(mcap): support http urls through fsspec handler @RitwijParmar (#7062)
- fix(deps): bump ray floor to >=2.11.0 @rohitkulshreshtha (#7116)
- fix(native-runner): assign unique plan-cache fingerprint when caller omits one @TechyMT (#7088)
- fix(flotilla): tolerate node churn during worker startup @desmondcheongzx (#7100)
- fix(deps): gate vllm to Linux so macOS/Windows resolve without CUDA wheels @rohitkulshreshtha (#7095)
- fix: pass options in Gravitino PostgreSQL read method @qingfeng-occ (#7047)
Performance
- perf(inline-agg): add BoolAnd and BoolOr accumulator types @BABTUNA (#6984)
- perf(inline-agg): add Product accumulator type @BABTUNA (#6975)
Documentation
- docs: explain lazy execution in architecture flow @everettVT (#6994)
- docs: collapse sidebar nav and group Guide sections @colin-ho (#7131)
- docs: update uuid function docstring @everettVT (#7106)
Tests
- test(hf): retry transient network errors in HF integration tests @XuQianJin-Stars (#7120)
- test: tolerate HuggingFace Hub HTTP 429 in classifier and IO tests @XuQianJin-Stars (#7080)
- test(parquet): cover read_parquet edge cases @jackylee-ch (#7085)
Maintenance
- chore: upgrade arrow-rs from 57.1.0 to 59.0.0 @mikedep333 (#7141)
- chore: Replace TOS support with OpenDAL @srilman (#7135)
- chore: upgrade opendal from 0.55 to 0.57.0 @XuQianJin-Stars (#7037)
New Contributors
- @Liusixuuu made their first contribution in #7105
- @nish2292 made their first contribution in #7068
- @TechyMT made their first contribution in #7088
- @RitwijParmar made their first contribution in #7061
- @XiaoHongbo-Hope made their first contribution in #7113
Full Changelog: v0.7.15...v0.7.16