diff --git a/dev/changelog/0.15.0.md b/dev/changelog/0.15.0.md
new file mode 100644
index 0000000000..f57fbe1921
--- /dev/null
+++ b/dev/changelog/0.15.0.md
@@ -0,0 +1,207 @@
+
+
+# DataFusion Comet 0.15.0 Changelog
+
+This release consists of 142 commits from 19 contributors. See credits at the end of this changelog for more information.
+
+**Fixed bugs:**
+
+- fix: enable native_datafusion Spark SQL tests previously ignored in #3315 [#3696](https://github.com/apache/datafusion-comet/pull/3696) (andygrove)
+- fix: route file-not-found errors through SparkError JSON path [#3699](https://github.com/apache/datafusion-comet/pull/3699) (andygrove)
+- fix: fall back from native_datafusion for duplicate fields in case-insensitive mode [#3687](https://github.com/apache/datafusion-comet/pull/3687) (andygrove)
+- fix: enable more Spark SQL tests for `native_datafusion` (`DynamicPartitionPruningSuite` / `ExplainSuite`) [#3694](https://github.com/apache/datafusion-comet/pull/3694) (andygrove)
+- fix: Correct GetArrayItem null handling for dynamic indices and re-enable native execution [#3709](https://github.com/apache/datafusion-comet/pull/3709) (0lai0)
+- fix: enable native_datafusion Spark SQL tests for #3320, #3401, #3719 [#3718](https://github.com/apache/datafusion-comet/pull/3718) (andygrove)
+- fix: Native engine crashes on literal DateTrunc and TimestampTrunc [#3668](https://github.com/apache/datafusion-comet/pull/3668) (0lai0)
+- fix: Use the loaded Comet extension too (Spark 3.5.8) [#3707](https://github.com/apache/datafusion-comet/pull/3707) (martin-g)
+- fix: Use thread context classloader for Iceberg class loading [#3738](https://github.com/apache/datafusion-comet/pull/3738) (karuppayya)
+- fix: disable ANSI mode in benchmarks to avoid exceptions on invalid input [#3750](https://github.com/apache/datafusion-comet/pull/3750) (parthchandra)
+- fix: fix string to timestamp cast for UTC timestamps [#3656](https://github.com/apache/datafusion-comet/pull/3656) (parthchandra)
+- fix: native error message not propagated to SparkException on empty errorClass [#3727](https://github.com/apache/datafusion-comet/pull/3727) (manuzhang)
+- fix: add timezone and special formats support for cast string to timestamp [#3730](https://github.com/apache/datafusion-comet/pull/3730) (parthchandra)
+- fix: handle inf/-inf/nan in ShimSparkErrorConverter cast overflow [#3768](https://github.com/apache/datafusion-comet/pull/3768) (manuzhang)
+- fix: handle scalar decimal value overflow correctly in ANSI mode [#3803](https://github.com/apache/datafusion-comet/pull/3803) (parthchandra)
+- fix: correct array_append return type and mark as Compatible [#3795](https://github.com/apache/datafusion-comet/pull/3795) (andygrove)
+- fix: remove broken directBuffer feature for parquet reads [#3814](https://github.com/apache/datafusion-comet/pull/3814) (andygrove)
+- fix: remove unnecessary IgnoreCometNativeDataFusion tags from 3.5.8 diff [#3831](https://github.com/apache/datafusion-comet/pull/3831) (andygrove)
+- fix: query tolerance= in SQL file tests now also asserts Comet native execution [#3797](https://github.com/apache/datafusion-comet/pull/3797) (andygrove)
+- fix: include scan impl in PR Linux artifact names [#3853](https://github.com/apache/datafusion-comet/pull/3853) (manuzhang)
+- fix: correct invalid Option.contains assertion in cast test [#3851](https://github.com/apache/datafusion-comet/pull/3851) (manuzhang)
+- fix: native_datafusion: case-insensitive mode doesn't detect duplicate/ambiguous Parquet fields [#3808](https://github.com/apache/datafusion-comet/pull/3808) (vaibhawvipul)
+- fix: cache object stores and bucket regions to reduce DNS query volume [#3802](https://github.com/apache/datafusion-comet/pull/3802) (andygrove)
+- fix: skip Comet columnar shuffle for stages with DPP scans [#3879](https://github.com/apache/datafusion-comet/pull/3879) (andygrove)
+- fix: Native_datafusion reports correct files and bytes scanned [#3798](https://github.com/apache/datafusion-comet/pull/3798) (0lai0)
+- fix: address clippy collapsible_match warnings [#3863](https://github.com/apache/datafusion-comet/pull/3863) (manuzhang)
+- fix: parameterize file count in Native_datafusion metrics test [#3896](https://github.com/apache/datafusion-comet/pull/3896) (0lai0)
+- fix: Make cast string to timestamp compatible with Spark [#3884](https://github.com/apache/datafusion-comet/pull/3884) (parthchandra)
+- fix: add EmptySchemaShufflePartitioner and test from #3858 [#3893](https://github.com/apache/datafusion-comet/pull/3893) (mbutrovich)
+- fix: use min instead of max when capping write buffer size to Int range [#3914](https://github.com/apache/datafusion-comet/pull/3914) (andygrove)
+- fix: Update TPC-DS q36a golden file for Spark 4.0 decimal UNION widening change [#3915](https://github.com/apache/datafusion-comet/pull/3915) (parthchandra)
+- fix: audit array_insert expression for correctness and test coverage [#3890](https://github.com/apache/datafusion-comet/pull/3890) (andygrove)
+- fix: handle ambiguous and non-existent local times [#3865](https://github.com/apache/datafusion-comet/pull/3865) (matthewalex4)
+- fix: improve tracing feature [#3688](https://github.com/apache/datafusion-comet/pull/3688) (andygrove)
+- fix: make tan and atan2 compatible [#3849](https://github.com/apache/datafusion-comet/pull/3849) (kazuyukitanimura)
+- fix: checkSparkAnswer displays incorrect labels [#3927](https://github.com/apache/datafusion-comet/pull/3927) (parthchandra)
+- fix: support full-width and null characters, and negative scale in string to decimal [#3922](https://github.com/apache/datafusion-comet/pull/3922) (parthchandra)
+- fix: enable Corr [#3892](https://github.com/apache/datafusion-comet/pull/3892) (kazuyukitanimura)
+- fix: array to array cast [#2897](https://github.com/apache/datafusion-comet/pull/2897) (manuzhang)
+- fix: exclude tpcds-plan-stability extended.txt files from rat license check [#3964](https://github.com/apache/datafusion-comet/pull/3964) (andygrove)
+- fix: use UTC for Arrow schema timezone in SparkToColumnar conversions [#3878](https://github.com/apache/datafusion-comet/pull/3878) (andygrove)
+- fix: remove spurious .flatten call that garbled SortMergeJoin fallback messages [#3968](https://github.com/apache/datafusion-comet/pull/3968) (andygrove)
+- fix: Add legacy mode handling to cast Decimal to String [#3939](https://github.com/apache/datafusion-comet/pull/3939) (parthchandra)
+- fix: improve test coverage for decimal to primitive type casts [#3948](https://github.com/apache/datafusion-comet/pull/3948) (parthchandra)
+- fix: fix decimal div and add tests [#3952](https://github.com/apache/datafusion-comet/pull/3952) (parthchandra)
+- fix: make shuffle fallback decisions sticky across planning passes [#3982](https://github.com/apache/datafusion-comet/pull/3982) (andygrove)
+
+**Performance related:**
+
+- perf: Coalesce broadcast exchange batches before broadcasting [#3703](https://github.com/apache/datafusion-comet/pull/3703) (mbutrovich)
+- perf: stop using FFI in native shuffle read path [#3731](https://github.com/apache/datafusion-comet/pull/3731) (andygrove)
+- perf: Enable native c2r for more queries [#3764](https://github.com/apache/datafusion-comet/pull/3764) (andygrove)
+- perf: Mark more operators as FFI safe to avoid deep copies [#3765](https://github.com/apache/datafusion-comet/pull/3765) (andygrove)
+- perf: remove BufReader wrapper when copying spill files to shuffle output [#3861](https://github.com/apache/datafusion-comet/pull/3861) (andygrove)
+- fix: share unified memory pools across native execution contexts within a task [#3924](https://github.com/apache/datafusion-comet/pull/3924) (andygrove)
+
+**Implemented enhancements:**
+
+- feat: Add PR review skill for Comet expression reviews [#3711](https://github.com/apache/datafusion-comet/pull/3711) (andygrove)
+- feat: add sort_array benchmark [#3758](https://github.com/apache/datafusion-comet/pull/3758) (grorge123)
+- feat: Support Spark expression days [#3746](https://github.com/apache/datafusion-comet/pull/3746) (0lai0)
+- feat: expose comet metrics through Sparks external monitoring system [#3708](https://github.com/apache/datafusion-comet/pull/3708) (coderfender)
+- feat: support SQL aggregate FILTER (WHERE ...) clause in native execution [#3835](https://github.com/apache/datafusion-comet/pull/3835) (viirya)
+- feat: Implement CRC32C algorithm [#3822](https://github.com/apache/datafusion-comet/pull/3822) (snmvaughan)
+- feat: add audit-comet-expression Claude Code skill [#3793](https://github.com/apache/datafusion-comet/pull/3793) (andygrove)
+- feat: enable native_datafusion scan in auto mode [#3781](https://github.com/apache/datafusion-comet/pull/3781) (andygrove)
+- feat: support LEAD and LAG window functions with IGNORE NULLS [#3876](https://github.com/apache/datafusion-comet/pull/3876) (viirya)
+- feat: add standalone shuffle benchmark tool [#3752](https://github.com/apache/datafusion-comet/pull/3752) (andygrove)
+- feat: Mark array_compact as Compatible and improve test coverage [#3889](https://github.com/apache/datafusion-comet/pull/3889) (andygrove)
+- feat: add native support for get_json_object expression [#3747](https://github.com/apache/datafusion-comet/pull/3747) (andygrove)
+- feat: Support Spark expression hours [#3804](https://github.com/apache/datafusion-comet/pull/3804) (0lai0)
+- feat: add support for date_from_unix_date expression [#3144](https://github.com/apache/datafusion-comet/pull/3144) (andygrove)
+- feat: support spark bin function [#3928](https://github.com/apache/datafusion-comet/pull/3928) (kazantsev-maksim)
+- feat: support sort_array expression [#3706](https://github.com/apache/datafusion-comet/pull/3706) (grorge123)
+
+**Documentation updates:**
+
+- docs: Add some .lldbint configurations for debugging document [#3686](https://github.com/apache/datafusion-comet/pull/3686) (wForget)
+- docs: document Iceberg Spark tests in contributor guide [#3777](https://github.com/apache/datafusion-comet/pull/3777) (mbutrovich)
+- docs: document negative zero cast-to-string incompatibility [#3811](https://github.com/apache/datafusion-comet/pull/3811) (andygrove)
+- docs: Add docs about global singletons to development guide [#3809](https://github.com/apache/datafusion-comet/pull/3809) (mbutrovich)
+- docs: add bug triage guide for prioritizing open issues [#3812](https://github.com/apache/datafusion-comet/pull/3812) (andygrove)
+- docs: broaden area:writer and area:scan label descriptions [#3843](https://github.com/apache/datafusion-comet/pull/3843) (andygrove)
+- docs: expand profiling guide with JVM and async-profiler coverage [#3628](https://github.com/apache/datafusion-comet/pull/3628) (andygrove)
+- doc: GetArrayItem is now supported [#3880](https://github.com/apache/datafusion-comet/pull/3880) (kazuyukitanimura)
+- docs: update Iceberg docs to reflect capabilities [#3961](https://github.com/apache/datafusion-comet/pull/3961) (mbutrovich)
+- docs: clarify Maven staging behavior across release candidates [#3963](https://github.com/apache/datafusion-comet/pull/3963) (andygrove)
+- docs: document CI test suite registration requirement [#3943](https://github.com/apache/datafusion-comet/pull/3943) (andygrove)
+- docs: Add documentation for running spark-sql-perf [#3950](https://github.com/apache/datafusion-comet/pull/3950) (andygrove)
+
+**Other:**
+
+- ci: remove Java Iceberg integration tests from CI [iceberg] [#3673](https://github.com/apache/datafusion-comet/pull/3673) (andygrove)
+- build: revert "chore(deps): bump runs-on/action from 2.0.3 to 2.1.0" [#3714](https://github.com/apache/datafusion-comet/pull/3714) (blaginin)
+- chore(deps): bump lz4_flex from 0.12.0 to 0.12.1 in /native [#3713](https://github.com/apache/datafusion-comet/pull/3713) (dependabot[bot])
+- chore: Add changelog for 0.14.0 release [#3681](https://github.com/apache/datafusion-comet/pull/3681) (andygrove)
+- chore: bump version to 0.15.0-SNAPSHOT [#3715](https://github.com/apache/datafusion-comet/pull/3715) (andygrove)
+- chore: update documentation links for 0.14.0 release [#3716](https://github.com/apache/datafusion-comet/pull/3716) (andygrove)
+- Fix: map_from_arrays() with NULL inputs causes native crash [#3356](https://github.com/apache/datafusion-comet/pull/3356) (kazantsev-maksim)
+- chore: Refactor planner random and partition expressions [#3704](https://github.com/apache/datafusion-comet/pull/3704) (coderfender)
+- test: enable ignored 4.0 tests, enable ansi mode [#3454](https://github.com/apache/datafusion-comet/pull/3454) (parthchandra)
+- chore: keep original error message for failed SQL test [#3725](https://github.com/apache/datafusion-comet/pull/3725) (comphead)
+- build: lint as a separate step [#3717](https://github.com/apache/datafusion-comet/pull/3717) (blaginin)
+- chore(deps): bump lz4_flex from 0.12.1 to 0.13.0 in /native [#3744](https://github.com/apache/datafusion-comet/pull/3744) (dependabot[bot])
+- chore(deps): bump runs-on/action from 2.0.3 to 2.1.0 [#3741](https://github.com/apache/datafusion-comet/pull/3741) (dependabot[bot])
+- chore: remove iceberg-java integration [#3739](https://github.com/apache/datafusion-comet/pull/3739) (andygrove)
+- chore: refactor to extract `common` and `jni-bridge` as separate crates [#3667](https://github.com/apache/datafusion-comet/pull/3667) (andygrove)
+- chore(deps): bump rustls-webpki from 0.103.9 to 0.103.10 in /native [#3751](https://github.com/apache/datafusion-comet/pull/3751) (dependabot[bot])
+- chore(deps): bump github/codeql-action from 4.32.6 to 4.33.0 [#3742](https://github.com/apache/datafusion-comet/pull/3742) (dependabot[bot])
+- chore(deps): bump cc from 1.2.56 to 1.2.57 in /native in the all-other-cargo-deps group [#3743](https://github.com/apache/datafusion-comet/pull/3743) (dependabot[bot])
+- chore: extract shuffle module into separate crate [#3749](https://github.com/apache/datafusion-comet/pull/3749) (andygrove)
+- chore: run Spark 3.4 tests with `native_datafusion` scan [#3722](https://github.com/apache/datafusion-comet/pull/3722) (andygrove)
+- chore: [native_datafusion] replace #3311 references with specific issues in 3.5.8 diff [#3761](https://github.com/apache/datafusion-comet/pull/3761) (andygrove)
+- chore: fix allocations in schema adapter for `native_datafusion` scan [#3755](https://github.com/apache/datafusion-comet/pull/3755) (comphead)
+- chore: update Iceberg Java diffs after #3739 [iceberg] [#3774](https://github.com/apache/datafusion-comet/pull/3774) (mbutrovich)
+- chore(deps): update datafusion to 52.4.0 [iceberg] [#3769](https://github.com/apache/datafusion-comet/pull/3769) (andygrove)
+- test: Port DateTimeUtilsSuite timestamp format tests in Comet [#3780](https://github.com/apache/datafusion-comet/pull/3780) (parthchandra)
+- build: add `CometDateTimeUtilsSuite` to CI workflow [#3782](https://github.com/apache/datafusion-comet/pull/3782) (andygrove)
+- chore: Run Spark 4.0 SQL tests with native_datafusion scan [#3728](https://github.com/apache/datafusion-comet/pull/3728) (andygrove)
+- Test: Add test coverage and documentation for SumDecimal/AvgDecimal nullability behavior [#3766](https://github.com/apache/datafusion-comet/pull/3766) (vaibhawvipul)
+- tests: fix Iceberg test diffs for Spark 3.4 [iceberg] [#3785](https://github.com/apache/datafusion-comet/pull/3785) (mbutrovich)
+- ci: run Iceberg Spark tests on all PRs and commits to main branch [#3792](https://github.com/apache/datafusion-comet/pull/3792) (mbutrovich)
+- chore(deps): bump github/codeql-action from 4.33.0 to 4.34.1 [#3805](https://github.com/apache/datafusion-comet/pull/3805) (dependabot[bot])
+- chore(deps): bump the all-other-cargo-deps group in /native with 3 updates [#3806](https://github.com/apache/datafusion-comet/pull/3806) (dependabot[bot])
+- refactor: reorganize shuffle crate module structure [#3772](https://github.com/apache/datafusion-comet/pull/3772) (andygrove)
+- chore: update git plugin to allow worktrees [#3815](https://github.com/apache/datafusion-comet/pull/3815) (parthchandra)
+- chore: Remove `SupportsComet` interface [#3818](https://github.com/apache/datafusion-comet/pull/3818) (andygrove)
+- Replace catalyst.util.fileToString with Files.readString [#3844](https://github.com/apache/datafusion-comet/pull/3844) (snmvaughan)
+- test: cast negative zero to string [#3829](https://github.com/apache/datafusion-comet/pull/3829) (kazuyukitanimura)
+- test: add SQL file test for casting double to string [#3854](https://github.com/apache/datafusion-comet/pull/3854) (andygrove)
+- chore(deps): bump jni from 0.21.1 to 0.22.4 in /native [#3753](https://github.com/apache/datafusion-comet/pull/3753) (manuzhang)
+- test: ceil and floor works correctly for Decimal128 [#3848](https://github.com/apache/datafusion-comet/pull/3848) (kazuyukitanimura)
+- chore(deps): bump the all-other-cargo-deps group in /native with 2 updates [#3899](https://github.com/apache/datafusion-comet/pull/3899) (dependabot[bot])
+- chore(deps): bump github/codeql-action from 4.34.1 to 4.35.1 [#3898](https://github.com/apache/datafusion-comet/pull/3898) (dependabot[bot])
+- chore(deps): bump actions/github-script from 7 to 8 [#3897](https://github.com/apache/datafusion-comet/pull/3897) (dependabot[bot])
+- chore: add SQL tests for FIRST/LAST aggregates [#3891](https://github.com/apache/datafusion-comet/pull/3891) (comphead)
+- test: do not ignore test SPARK-48037 [#2774](https://github.com/apache/datafusion-comet/pull/2774) (kazuyukitanimura)
+- deps: upgrade to DataFusion 53.0, Arrow to 58.1 [#3629](https://github.com/apache/datafusion-comet/pull/3629) (mbutrovich)
+- chore: `native_datafusion` to report scan task input metrics [#3842](https://github.com/apache/datafusion-comet/pull/3842) (comphead)
+- test: improve array_distinct test coverage and incompatibility description [#3887](https://github.com/apache/datafusion-comet/pull/3887) (andygrove)
+- ci: remove native_datafusion CI workflows after 4f5eaf0 [#3908](https://github.com/apache/datafusion-comet/pull/3908) (mbutrovich)
+- test: Enable more Spark tests [#3905](https://github.com/apache/datafusion-comet/pull/3905) (kazuyukitanimura)
+- chore: reenable `width_bucket` test [#3910](https://github.com/apache/datafusion-comet/pull/3910) (comphead)
+- chore: reenable `array_contains` tests [#3912](https://github.com/apache/datafusion-comet/pull/3912) (comphead)
+- chore: reenable `array_remove` tests [#3917](https://github.com/apache/datafusion-comet/pull/3917) (comphead)
+- chore(deps): bump actions/github-script from 8 to 9 [#3923](https://github.com/apache/datafusion-comet/pull/3923) (dependabot[bot])
+- chore: Remove redundant `parquet.enable.dictionary` ConfigMatrix from SQL tests [#3866](https://github.com/apache/datafusion-comet/pull/3866) (andygrove)
+- chore(deps): bump rand from 0.10.0 to 0.10.1 in /native [#3942](https://github.com/apache/datafusion-comet/pull/3942) (dependabot[bot])
+- chore: add changelog for 0.14.1 [#3944](https://github.com/apache/datafusion-comet/pull/3944) (andygrove)
+- deps: bump iceberg-rust to latest after picking up fixes for #3856 and #3860 [#3958](https://github.com/apache/datafusion-comet/pull/3958) (mbutrovich)
+- deps: update to `opendal` revision where object_store `get_ranges` performance fixed [#3965](https://github.com/apache/datafusion-comet/pull/3965) (comphead)
+- chore: Register task completion listener to ensure CometExecIterator is always closed [#3959](https://github.com/apache/datafusion-comet/pull/3959) (wForget)
+- chore(deps): bump github/codeql-action from 4.35.1 to 4.35.2 [#3972](https://github.com/apache/datafusion-comet/pull/3972) (dependabot[bot])
+- deps: upgrade to DataFusion 53.1 [#3946](https://github.com/apache/datafusion-comet/pull/3946) (mbutrovich)
+
+## Credits
+
+Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
+
+```
+    55  Andy Grove
+    14  Parth Chandra
+    14  dependabot[bot]
+    12  Matt Butrovich
+     8  Oleks V
+     7  KAZUYUKI TANIMURA
+     7  Manu Zhang
+     6  ChenChen Lai
+     2  Bhargava Vadlamani
+     2  Dmitrii Blaginin
+     2  Han-Wen Tsao
+     2  Kazantsev Maksim
+     2  Liang-Chi Hsieh
+     2  Steve Vaughan
+     2  Vipul Vaibhaw
+     2  Zhen Wang
+     1  Karuppayya
+     1  Martin Grigorov
+     1  Matthew Alex
+```
+
+Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.